Share your ideas for the JMP Scripting Unsession at Discovery Summit by September 17th. We hope to see you there!
Choose Language Hide Translation Bar
Highlighted
Srishti
Level III

Removing duplicates rows

This is not a simple removal of duplicates- I want to remove duplicate rows by their value matched in 2 columns- in 1 column, there are the unique clinic ids, in the other are the year of diagnosis of a disease. i want to apply the condotion that duplicates for a unique clinic id (ie patient) should be removed if the corresponding years of diagnosis are within a year of each other. any provision for such a thing in jmp? eg, a diagnosis in 2015 and 2016 should be considered as one, and the year 2015 should be retained. i have tried the join and summary options without success. any ideas?

3 REPLIES 3
Highlighted
txnelson
Super User

Re: Removing duplicates rows

Here is a small script that I believe will find the duplicates.  It assumes the date column has a JMP date value in it.  The names of the columns used in the script, will have to be changed in the script, to match the actual names in your data table.

Names Default To Here( 1 );
dt = Current Data Table();
delList = {};

For( i = 1, i <= N Rows( dt ), i++,
	curID = dt:id[i];
	curRows = dt << get rows where( :id == curID );
	minIDdate = Min( :date[curRows] );
	curDateList = As List( :date[curRows] );
	minIDRow = (curRows)[Loc( curDateList, minIDdate )];
	If( i != minIDRow,
		If( :date[i] <= minIDdate + In Years( 1 ),
			Insert Into( delList, i )
		)
	);
);

// dt << delete rows( delList );
// or
dt << New Column( "Duplicate", character );
dt:Duplicate[delList] = "Duplicate";
Jim
Highlighted
Srishti
Level III

Re: Removing duplicates rows

thank you so much for your reply. i am a beginner at JMP, and am not sure how to run the script. could you tell me where to substitute the table and column names in this script? eg, table name- Preg, Coulmn names- Clinic number, Year.

 

I appreciate the help!

Highlighted
ms
Super User ms
Super User

Re: Removing duplicates rows

Here's another variant of the same idea. It will identify duplicates according to your rules. For 2015 and 2016, 2015 will be retained; for 2014, 2015, 2017, 2018, 2015 and 2017 will be retained, but beware that if there are more than two years consecutive years e.g. 2015, 2016, 2017, only 2015 will be retained. 

 

// Example table
dt = New Table("test",
    Add Rows(13),
    New Column("Clinic", "Character", Set Values({"A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "E", "E", "E"})),
    New Column("Year", Set Values([2016, 2017, 2015, 2016, 2015, 2016, 2015, 2018, 2010, 2011, 2016, 2017, 2018]))
);

// Select multiples based on criteria: same clinic and same or adjacent year
dt << sort(Replace Table, By(:Clinic, :Year));
dt << select where(Col Min(Row(), :Clinic, If(Row() == 1, 0, Dif(:Year)) <= 1) < Row());
// dt << delete rows;
Article Labels

    There are no labels assigned to this post.