We’re asking you to select a content label when starting a new topic in the Discussions area. Read more to find out why.
Choose Language Hide Translation Bar
Highlighted
robot
Level VI

Insert list of column names into a formula

Hi,

I am using JMP10.  I am trying to create a script to delete duplicate rows of data in a data table.  An example of my attempt is as follows:

// Start.

// Create data table.

dt = open("$SAMPLE_DATA/Big Class.jmp");

dt << Select Randomly(5);

subdt = dt << Subset(Output Table Name("subset"), Invisible);

dt << Concatenate(subdt, Append to First Table);

Close(subdt, No Save);

dt << Clear Select;

// Find and delete duplicate rows for bigClass.

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, :name, :age, :sex, :height, :weight ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// Find and delete duplicate rows for generic data table.

colnames = dt << Get Column Names();

icol = dt << New Column( "Index", Numeric, Continuous, Formula( Sequence( 1, N Row( dt ), 1, 1 ) ) );

repcol = dt << New Column( "Repeat Data",

  Numeric,

  Continuous,

  Formula( If( :Index == Col Minimum( :Index, /* Insert colnames here. */ ), 1, 0 ) )

);

dt << Select Where( As Column(repcol) == 0 );

dt << Delete Rows();

dt << Delete Columns( {icol, repcol} );

// End.

My problem is I would like to make this script run on a generic data table.  How can I insert my colnames list into my formula?  Is there a more clever way to do this?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
ms
Super User ms
Super User

Re: Insert list of column names into a formula

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

View solution in original post

5 REPLIES 5
Highlighted
ms
Super User ms
Super User

Re: Insert list of column names into a formula

Brilliant idea to use Col Min() to find duplicates!

Below is an alternative approach based on your idea that appears to work. It is simpler because there is no need for any temporary formula columns.

// Create example data table.

Open( "$SAMPLE_DATA/Big Class.jmp" ) << Concatenate( Open( "$SAMPLE_DATA/Big Class.jmp" ), Append to First Table );

// Delete duplicates from generic table

dt = Current Data Table();

col_min = Parse( "Col Min( Row(), :Name(\!"" || Concat Items( dt << get column names( string ), "\!"), :Name(\!"" ) || "\!"))" );

duplicates = [];

For Each Row( If( col_min != Row(), duplicates |/= Row() ) );

dt << delete rows( duplicates );

View solution in original post

Highlighted
robot
Level VI

Re: Insert list of column names into a formula

Thanks MS!  Very clever.  I will still have to study exactly how it works.

Highlighted
ms
Super User ms
Super User

Re: Insert list of column names into a formula

I was too quick. The first posted version would fail for complex column names that require the :Name("e.g. ∆X") construct. I edited the code above and now it should work more generally.

Highlighted
Jeff_Perkinson
Community Manager Community Manager

Re: Insert list of column names into a formula

It is clever to use Col Min() with an index column to locate duplicates.

I usually use Tables->Summary to get a data table free of duplicates. Just Group by all of the columns that would define a duplicate. This has the added advantage of providing a column (N Rows) that tells you how many duplicate rows there were in the original table.

4480_JMPScreenSnapz001.png

-Jeff

-Jeff
Highlighted
robot
Level VI

Re: Insert list of column names into a formula

Thanks Jeff.