Subscribe Bookmark RSS Feed

Insert list of column names into a formula

robot

Community Trekker

Joined:

Feb 27, 2012

Hi,

I am using JMP10.  I am trying to create a script to delete duplicate rows of data in a data table.  An example of my attempt is as follows:

// Start.

// Create data table.

dt = open("$SAMPLE_DATA/Big Class.jmp");

dt << Select Randomly(5);

subdt = dt << Subset(Output Table Name("subset"), Invisible);

dt << Concatenate(subdt, Append to First Table);

Close(subdt, No Save);

dt << Clear Select;

// Find and delete duplicate rows for bigClass.

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, :name, :age, :sex, :height, :weight ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// Find and delete duplicate rows for generic data table.

colnames = dt << Get Column Names();

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, /* Insert colnames here. */ ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// End.

My problem is I would like to make this script run on a generic data table.  How can I insert my colnames list into my formula?  Is there a more clever way to do this?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

5 REPLIES
Solution

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

robot

Community Trekker

Joined:

Feb 27, 2012

Thanks MS!  Very clever.  I will still have to study exactly how it works.

ms

Super User

Joined:

Jun 23, 2011

I was too quick. The first posted version would fail for complex column names that require the :Name("e.g. ∆X") construct. I edited the code above and now it should work more generally.

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

It is clever to use Col Min() with an index column to locate duplicates.

I usually use Tables->Summary to get a data table free of duplicates. Just Group by all of the columns that would define a duplicate. This has the added advantage of providing a column (N Rows) that tells you how many duplicate rows there were in the original table.

4480_JMPScreenSnapz001.png

-Jeff

-Jeff
robot

Community Trekker

Joined:

Feb 27, 2012

Thanks Jeff.