cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
bk
bk
Level I

removing duplicate items in a table

Hi All,

i have a table with ~20 columns and about 500 rows, two of these columns are Serial No. and Time stamp, and the rest is miscellaneous measurment data.

in some instances i have rows of data that have the same Serial No. but all other values are different.

my goal is to write a script to clean up the data table so that only rows with unique Serial No. are left.

I want to first select items with the same serial number and then identify from that subset the one with the latest Timestamp (which is just a numeric value), and delete the rest,

so i have a cleaned up table with unique Serial No.'s left.

Would appreciate if someone here could guide me how to write this bit of code. i've reviewd other examples on the forum, but am still struggling with this.

Thanks

Bble, ript

1 ACCEPTED SOLUTION

Accepted Solutions
mpb
mpb
Level VII

Re: removing duplicate items in a table

This code shows one way to go about it. I ran it against this data:

3898_pastedImage_2.png

mydt = data table("MyDemo");

mydtsummary = Data Table( "MyDemo" ) << Summary(

    Group( :SN ),

    Max( :DT ),

    statistics column name format( "stat of column" )

);

mydt << Update(

    With( Data Table( mydtsummary) ),

    Match Columns( :SN = :SN ),

    Add Columns from Update table( :Max of DT )

);

mydt << select where(:DT == :Max of DT) << invert row selection;

mydt << delete rows;

mydt << delete columns("Max of DT");

close(mydtsummary, nosave);

View solution in original post

3 REPLIES 3
mpb
mpb
Level VII

Re: removing duplicate items in a table

This code shows one way to go about it. I ran it against this data:

3898_pastedImage_2.png

mydt = data table("MyDemo");

mydtsummary = Data Table( "MyDemo" ) << Summary(

    Group( :SN ),

    Max( :DT ),

    statistics column name format( "stat of column" )

);

mydt << Update(

    With( Data Table( mydtsummary) ),

    Match Columns( :SN = :SN ),

    Add Columns from Update table( :Max of DT )

);

mydt << select where(:DT == :Max of DT) << invert row selection;

mydt << delete rows;

mydt << delete columns("Max of DT");

close(mydtsummary, nosave);

bk
bk
Level I

Re: removing duplicate items in a table

hey mpb,

Awesome!, modified it to suit my data, and it sure works just fine!

thanks a lot

regards

Jeff_Perkinson
Community Manager Community Manager

Re: removing duplicate items in a table

If you're not looking for a scripting solution. You can do this interactively by creating a row state column with a formula.


3903_JMPScreenSnapz014.png


This uses the Col Maximum function which takes optional arguments for By Variables.


You can also use this function with the Select Where message in a script.


dt = Open( "$SAMPLE_DATA\Big Class.jmp");

dt << select where(

  Height== Col Max(

     Height, Age, Sex

     )

  );


-Jeff

-Jeff