cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
robot
Level VI

Insert list of column names into a formula

Hi,

I am using JMP10.  I am trying to create a script to delete duplicate rows of data in a data table.  An example of my attempt is as follows:

// Start.

// Create data table.

dt = open("$SAMPLE_DATA/Big Class.jmp");

dt << Select Randomly(5);

subdt = dt << Subset(Output Table Name("subset"), Invisible);

dt << Concatenate(subdt, Append to First Table);

Close(subdt, No Save);

dt << Clear Select;

// Find and delete duplicate rows for bigClass.

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, :name, :age, :sex, :height, :weight ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// Find and delete duplicate rows for generic data table.

colnames = dt << Get Column Names();

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, /* Insert colnames here. */ ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// End.

My problem is I would like to make this script run on a generic data table.  How can I insert my colnames list into my formula?  Is there a more clever way to do this?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ms
Super User (Alumni) ms
Super User (Alumni)

Re: Insert list of column names into a formula

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

View solution in original post

5 REPLIES 5
ms
Super User (Alumni) ms
Super User (Alumni)

Re: Insert list of column names into a formula

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

robot
Level VI

Re: Insert list of column names into a formula

Thanks MS!  Very clever.  I will still have to study exactly how it works.

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Insert list of column names into a formula

I was too quick. The first posted version would fail for complex column names that require the :Name("e.g. ∆X") construct. I edited the code above and now it should work more generally.

Jeff_Perkinson
Community Manager Community Manager

Re: Insert list of column names into a formula

It is clever to use Col Min() with an index column to locate duplicates.

I usually use Tables->Summary to get a data table free of duplicates. Just Group by all of the columns that would define a duplicate. This has the added advantage of providing a column (N Rows) that tells you how many duplicate rows there were in the original table.

4480_JMPScreenSnapz001.png

-Jeff

-Jeff
robot
Level VI

Re: Insert list of column names into a formula

Thanks Jeff.