BookmarkSubscribe
Choose Language Hide Translation Bar
OtisZeca
Occasional Contributor

Selecting and removing duplicates row with condition

Hi guys,

 

I've searched around the forum however didnt come close to what I wanted to achieve, or maybe I missed out.

 Sorting.jpg

Anyway, I need help on the table above. As can be seen, I have duplicates as per highlighted by color. The serial number is same for most part because the part is inspected on different section(as show 1A,1B,2A,2B,3A,3B,4A,4B). Usually if the part fails inspection it will be re-inspected, which show in the re-test column(0 for no retest, 1 for 1st, etc..) Now I can only differentiate the part by the number of retest or date within the same serial number, so I would like to retain the latest re-test result and delete the duplicates of previous. Meaning to say as per highlighted box above, for the blue or red, I want to retain the re-test 1 and remove the re-test 0. This is kinda headache for me, so I would need community's advise on the script to select the condition within the serial number and section to figure out the  latest re-test, then retain the latest re-test and remove previous duplicates. I hope you all have any input for this. Thank you

0 Kudos
2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Selecting and removing duplicates row with condition

There is a function called

     Select Duplicate Rows()

that can be used for this, however it is based upon finding rows following a unique row.  Therefore, the table needs to be sorted in reverse before using the function.  Below is a sample script to do this.  Since I did not have a sample data table to run this on, the script has not been tested, however it is very close to being what you need.

Documentation on the function can be found in the Scripting Index

  •      Help==>Scripting Indes
Names Default To Here( 1 );
dt = Current Data Table();

// Add a new column to preserve the current order
dt << New Column( "rowNum", formula( Row() ) );
// Remove the formula to convert values to static values
:rowNum << delete property( "formula" );

// Sort the data table in referse order, since one wants to 
// keep the most recent value
dt << sort( by( rowNum ), order( descending ), Replace Table( 1 ) );

// Find the duplicate rows
dt << Select Duplicate Rows( Match( :SerialNo, :Date, :Retest ) );

// Delete Duplicates
dt << delete rows;

// Reorder the table
dt << sort( by( rowNum ), order( ascending ), Replace Table( 1 ) );

// Delete rowNum column
dt << delete columns( "rowNum" );
Jim
ms
Super User ms
Super User

Re: Selecting and removing duplicates row with condition

If you have an earlier version of JMP, <<Select Duplicate Rows may not be available. Then try the below approach. The code should select and delete the duplicate rows (based on matching SerialNo and Section) with the earliest date. Sorting does not matter.

 

dt = Current Data Table();
dt << select where(Col Max(:Date, :SerialNo, :Section) > :Date());
dt << delete rows();
4 REPLIES 4
txnelson
Super User

Re: Selecting and removing duplicates row with condition

There is a function called

     Select Duplicate Rows()

that can be used for this, however it is based upon finding rows following a unique row.  Therefore, the table needs to be sorted in reverse before using the function.  Below is a sample script to do this.  Since I did not have a sample data table to run this on, the script has not been tested, however it is very close to being what you need.

Documentation on the function can be found in the Scripting Index

  •      Help==>Scripting Indes
Names Default To Here( 1 );
dt = Current Data Table();

// Add a new column to preserve the current order
dt << New Column( "rowNum", formula( Row() ) );
// Remove the formula to convert values to static values
:rowNum << delete property( "formula" );

// Sort the data table in referse order, since one wants to 
// keep the most recent value
dt << sort( by( rowNum ), order( descending ), Replace Table( 1 ) );

// Find the duplicate rows
dt << Select Duplicate Rows( Match( :SerialNo, :Date, :Retest ) );

// Delete Duplicates
dt << delete rows;

// Reorder the table
dt << sort( by( rowNum ), order( ascending ), Replace Table( 1 ) );

// Delete rowNum column
dt << delete columns( "rowNum" );
Jim
OtisZeca
Occasional Contributor

Re: Selecting and removing duplicates row with condition

I made abit of amendments to suit my table. Thanks!

0 Kudos
ms
Super User ms
Super User

Re: Selecting and removing duplicates row with condition

If you have an earlier version of JMP, <<Select Duplicate Rows may not be available. Then try the below approach. The code should select and delete the duplicate rows (based on matching SerialNo and Section) with the earliest date. Sorting does not matter.

 

dt = Current Data Table();
dt << select where(Col Max(:Date, :SerialNo, :Section) > :Date());
dt << delete rows();
OtisZeca
Occasional Contributor

Re: Selecting and removing duplicates row with condition

Great help! Thanks!

I was building a something like this, somehow my combination got wrong on the column selection.

Thank you!

0 Kudos