Subscribe Bookmark RSS Feed

removing duplicate items in a table

bk

Community Trekker

Joined:

Jul 13, 2013

Hi All,

i have a table with ~20 columns and about 500 rows, two of these columns are Serial No. and Time stamp, and the rest is miscellaneous measurment data.

in some instances i have rows of data that have the same Serial No. but all other values are different.

my goal is to write a script to clean up the data table so that only rows with unique Serial No. are left.

I want to first select items with the same serial number and then identify from that subset the one with the latest Timestamp (which is just a numeric value), and delete the rest,

so i have a cleaned up table with unique Serial No.'s left.

Would appreciate if someone here could guide me how to write this bit of code. i've reviewd other examples on the forum, but am still struggling with this.

Thanks

Bble, ript

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

This code shows one way to go about it. I ran it against this data:

3898_pastedImage_2.png

mydt = data table("MyDemo");

mydtsummary = Data Table( "MyDemo" ) << Summary(

    Group( :SN ),

    Max( :DT ),

    statistics column name format( "stat of column" )

);

mydt << Update(

    With( Data Table( mydtsummary) ),

    Match Columns( :SN = :SN ),

    Add Columns from Update table( :Max of DT )

);

mydt << select where(:DT == :Max of DT) << invert row selection;

mydt << delete rows;

mydt << delete columns("Max of DT");

close(mydtsummary, nosave);

3 REPLIES
Solution

This code shows one way to go about it. I ran it against this data:

3898_pastedImage_2.png

mydt = data table("MyDemo");

mydtsummary = Data Table( "MyDemo" ) << Summary(

    Group( :SN ),

    Max( :DT ),

    statistics column name format( "stat of column" )

);

mydt << Update(

    With( Data Table( mydtsummary) ),

    Match Columns( :SN = :SN ),

    Add Columns from Update table( :Max of DT )

);

mydt << select where(:DT == :Max of DT) << invert row selection;

mydt << delete rows;

mydt << delete columns("Max of DT");

close(mydtsummary, nosave);

bk

Community Trekker

Joined:

Jul 13, 2013

hey mpb,

Awesome!, modified it to suit my data, and it sure works just fine!

thanks a lot

regards

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

If you're not looking for a scripting solution. You can do this interactively by creating a row state column with a formula.


3903_JMPScreenSnapz014.png


This uses the Col Maximum function which takes optional arguments for By Variables.


You can also use this function with the Select Where message in a script.


dt = Open( "$SAMPLE_DATA\Big Class.jmp");

dt << select where(

  Height== Col Max(

     Height, Age, Sex

     )

  );


-Jeff

-Jeff