cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
OtisZeca
Level I

Selecting and removing duplicates row with condition

Hi guys,

 

I've searched around the forum however didnt come close to what I wanted to achieve, or maybe I missed out.

 Sorting.jpg

Anyway, I need help on the table above. As can be seen, I have duplicates as per highlighted by color. The serial number is same for most part because the part is inspected on different section(as show 1A,1B,2A,2B,3A,3B,4A,4B). Usually if the part fails inspection it will be re-inspected, which show in the re-test column(0 for no retest, 1 for 1st, etc..) Now I can only differentiate the part by the number of retest or date within the same serial number, so I would like to retain the latest re-test result and delete the duplicates of previous. Meaning to say as per highlighted box above, for the blue or red, I want to retain the re-test 1 and remove the re-test 0. This is kinda headache for me, so I would need community's advise on the script to select the condition within the serial number and section to figure out the  latest re-test, then retain the latest re-test and remove previous duplicates. I hope you all have any input for this. Thank you

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Selecting and removing duplicates row with condition

There is a function called

     Select Duplicate Rows()

that can be used for this, however it is based upon finding rows following a unique row.  Therefore, the table needs to be sorted in reverse before using the function.  Below is a sample script to do this.  Since I did not have a sample data table to run this on, the script has not been tested, however it is very close to being what you need.

Documentation on the function can be found in the Scripting Index

  •      Help==>Scripting Indes
Names Default To Here( 1 );
dt = Current Data Table();

// Add a new column to preserve the current order
dt << New Column( "rowNum", formula( Row() ) );
// Remove the formula to convert values to static values
:rowNum << delete property( "formula" );

// Sort the data table in referse order, since one wants to 
// keep the most recent value
dt << sort( by( rowNum ), order( descending ), Replace Table( 1 ) );

// Find the duplicate rows
dt << Select Duplicate Rows( Match( :SerialNo, :Date, :Retest ) );

// Delete Duplicates
dt << delete rows;

// Reorder the table
dt << sort( by( rowNum ), order( ascending ), Replace Table( 1 ) );

// Delete rowNum column
dt << delete columns( "rowNum" );
Jim

View solution in original post

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Selecting and removing duplicates row with condition

If you have an earlier version of JMP, <<Select Duplicate Rows may not be available. Then try the below approach. The code should select and delete the duplicate rows (based on matching SerialNo and Section) with the earliest date. Sorting does not matter.

 

dt = Current Data Table();
dt << select where(Col Max(:Date, :SerialNo, :Section) > :Date());
dt << delete rows();

View solution in original post

5 REPLIES 5
txnelson
Super User

Re: Selecting and removing duplicates row with condition

There is a function called

     Select Duplicate Rows()

that can be used for this, however it is based upon finding rows following a unique row.  Therefore, the table needs to be sorted in reverse before using the function.  Below is a sample script to do this.  Since I did not have a sample data table to run this on, the script has not been tested, however it is very close to being what you need.

Documentation on the function can be found in the Scripting Index

  •      Help==>Scripting Indes
Names Default To Here( 1 );
dt = Current Data Table();

// Add a new column to preserve the current order
dt << New Column( "rowNum", formula( Row() ) );
// Remove the formula to convert values to static values
:rowNum << delete property( "formula" );

// Sort the data table in referse order, since one wants to 
// keep the most recent value
dt << sort( by( rowNum ), order( descending ), Replace Table( 1 ) );

// Find the duplicate rows
dt << Select Duplicate Rows( Match( :SerialNo, :Date, :Retest ) );

// Delete Duplicates
dt << delete rows;

// Reorder the table
dt << sort( by( rowNum ), order( ascending ), Replace Table( 1 ) );

// Delete rowNum column
dt << delete columns( "rowNum" );
Jim
OtisZeca
Level I

Re: Selecting and removing duplicates row with condition

I made abit of amendments to suit my table. Thanks!

u757707
Level I

Re: Selecting and removing duplicates row with condition

If you want to remove duplicate rows matching all columns except one(or a few), this is my working solution.
Advantage is no need to specify every column name.

// delete duplicate rows, where only column ID is different 
	column_list = dtProcess << Get Column Names( string );		
	Remove From( column_list, As List( Loc( column_list, "ID" ) ) ); // remove "ID"
	dtProcess << Select Duplicate Rows( Match(As List(column_list)));
	doublerows = dtProcess << get selected rows;
	num_selected = n rows(doublerows);	
	if(num_selected >0, 
		dtProcess << delete rows;
		print("Removed " || char(num_selected) || " rows from process table.")
		,
		print("No duplicate rows in process table, excluding column ID.")
	);

:

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Selecting and removing duplicates row with condition

If you have an earlier version of JMP, <<Select Duplicate Rows may not be available. Then try the below approach. The code should select and delete the duplicate rows (based on matching SerialNo and Section) with the earliest date. Sorting does not matter.

 

dt = Current Data Table();
dt << select where(Col Max(:Date, :SerialNo, :Section) > :Date());
dt << delete rows();
OtisZeca
Level I

Re: Selecting and removing duplicates row with condition

Great help! Thanks!

I was building a something like this, somehow my combination got wrong on the column selection.

Thank you!