Solved: removing duplicate items in a table

bk · Jul 13, 2013 08:59 PM

Hi All,

I have a table with ~20 columns and about 500 rows. Two of these columns are Serial No. and Time stamp, and the rest are miscellaneous measurement data.

In some instances I have rows of data that have the same Serial No. but all other values are different.

My goal is to write a script to clean up the data table so that only rows with unique Serial No. are left.

I want to first select items with the same serial number, then identify from that subset the one with the latest Timestamp (which is just a numeric value), and delete the rest,

so I have a cleaned-up table with only unique Serial No.'s left.

Would appreciate it if someone here could guide me on how to write this bit of code. I've reviewed other examples on the forum, but am still struggling with this.

Thanks

mpb · Oct 18, 2016 1:17 PM

This code shows one way to go about it. I ran it against this data:

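// Reference the table to clean up.
mydt = data table("MyDemo");

// Build a summary table holding the latest DT for each SN.
// The name format "stat of column" names the new column "Max of DT".
mydtsummary = Data Table( "MyDemo" ) << Summary(
	Group( :SN ),
	Max( :DT ),
	statistics column name format( "stat of column" )
);

// Copy Max of DT back onto every row of the original table, matched by SN.
mydt << Update(
	With( Data Table( mydtsummary) ),
	Match Columns( :SN = :SN ),
	Add Columns from Update table( :Max of DT )
);

// Select the latest row per SN, then invert so the older duplicates are selected.
mydt << select where(:DT == :Max of DT) << invert row selection;
mydt << delete rows;

// Remove the helper column and close the summary table.
mydt << delete columns("Max of DT");
close(mydtsummary, nosave);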

bk · Jul 15, 2013 02:09 AM

Hey mpb,

Awesome! I modified it to suit my data, and it works just fine.

Thanks a lot.

Regards

Jeff_Perkinson · Oct 18, 2016 1:17 PM

If you're not looking for a scripting solution, you can do this interactively by creating a row state column with a formula.

This uses the Col Maximum function, which takes optional arguments for By variables.

You can also use this function with the Select Where message in a script.

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Select the tallest row(s) within each age and sex group.
dt << select where(
	:height == Col Max( :height, :age, :sex )
);

-Jeff
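
Applied to the original question, the same idea might look like the sketch below. It is a minimal sketch, not a tested solution: the table name "MyDemo" and the column names :SN and :DT are assumptions borrowed from mpb's reply, so substitute your own.

```jsl
// Assumed names: table "MyDemo", serial number column :SN, numeric timestamp column :DT.
dt = Data Table( "MyDemo" );

// Col Maximum( :DT, :SN ) gives the largest DT within each SN group,
// so this selects every row that is older than the newest row for its serial number.
dt << Select Where( :DT < Col Maximum( :DT, :SN ) );

// Delete the older duplicates, but only if any rows were actually selected.
If( N Rows( dt << Get Selected Rows ) > 0,
	dt << Delete Rows
);
```

This avoids the temporary summary table and helper column. Note that if two rows share both the serial number and the latest timestamp, both are kept, so the result is only guaranteed unique per serial number when timestamps within a serial number are distinct.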
