Solved: Re: How to eliminate the first duplicate row?

LarsBirger · Jan 29, 2019 03:17 PM

Dear community, I have a question: how can I eliminate the first row of each pair of duplicates?

Example

Nr	Date	Count
4567	2017-01-21	1
5555	2017-02-03	1
5555	2017-02-10	2
8745	2015-03-10	1
8345	2016-05-01	1
9563	2016-01-02	1
9563	2016-01-10	2

Is there a script that, in this example, finds the duplicates that I have marked here with bold text (the first row of each repeated Nr, i.e. 5555 on 2017-02-03 and 9563 on 2016-01-02) and either eliminates these rows directly or creates an additional column stating, for example, Keep or Eliminate?

I am using JMP Pro 14.0.0 (64-bit)

Sincerely yours

Lars Enochsson, M.D., Ph.D.

Professor of Surgery

Department of Surgical and Perioperative Sciences

Umeå University

Head of the Swedish Registry of Gallstone Surgery and ERCP, GallRiks

Scientific Secretary of the Swedish Surgical Society

E-mail: lars.enochsson@umu.se

ms · Jan 29, 2019 2:16 PM

This script should work too:

// Delete all duplicate rows except the one with the latest date (sort order does not matter).
dt = Current Data Table();
dt << Select Where(Col Max(:Date, :Nr) > :Date);
dt << Delete rows;

Edit: If duplicates can share the same date, this works better, but the table must be sorted by date.

dt << Select Where(Col Max(Row(), :Nr) > Row());
dt << Delete Rows;
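
For the other option raised in the question, a flag column instead of deletion, the same row comparison could be placed in a column formula. This is a minimal sketch rather than part of the original reply: the column name "Action" is a hypothetical choice, and it assumes the table is sorted by date within each Nr so that the last row of each group is the one to keep.

```jsl
// Sketch: flag rows instead of deleting them.
// Assumes rows are sorted by date within each Nr.
dt = Current Data Table();
dt << New Column( "Action", // "Action" is a hypothetical column name
	Character,
	Formula(
		// A row is flagged when a later row with the same Nr exists.
		If( Col Max( Row(), :Nr ) > Row(), "Eliminate", "Keep" )
	)
);
```

On the example table this should mark the rows 5555 / 2017-02-03 and 9563 / 2016-01-02 as Eliminate and all other rows as Keep.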

cwillden · Jan 29, 2019 04:03 PM

Hi Lars,

Will duplicates always be sequentially ordered or could there be records between? If they are always sequential like in your example, you could do something like this:

dt = Current Data Table();
del_rows = {}; //initiate list to contain list of rows that are duplicates

for(i = 1, i < N Row(dt), i++, //stop one row before the end so that :Nr[i+1] always exists
	if(:Nr[i] == :Nr[i+1], insert into(del_rows,i)) //if Nr for current row is same as next, then put current row in del_rows
);

dt << Delete Rows(del_rows); //delete all rows in del_rows

-- Cameron Willden

txnelson · Jan 29, 2019 04:04 PM

Here is a script that will get the job done:

Names Default To Here( 1 );
dt = New Table( "Example",
	Add Rows( 7 ),
	New Script(
		"Source",
		Data Table( "Transpose of Untitled 17" ) <<
		Subset( All rows, Selected columns only( 0 ) )
	),
	New Column( "Nr",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [4567, 5555, 5555, 8745, 8345, 9563, 9563] )
	),
	New Column( "Date",
		Numeric,
		"Continuous",
		Format( "yyyy-mm-dd", 12 ),
		Input Format( "yyyy-mm-dd" ),
		Set Values(
			[3567801600, 3568924800, 3569529600, 3508790400, 3544905600, 3534537600,
			3535228800]
		)
	),
	New Column( "Count",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 1, 2, 1, 1, 1, 2] )
	)
);

dt << select duplicate rows( Match( :Nr ) );
wait(5);
dt2 = dt << subset( invisible, selected rows( 1 ), selected columns( 0 ) );
dt << delete rows;
Wait(5);
dt = dt << Update( With( dt2 ), Match Columns( :Nr = :Nr ) );

Close( dt2, nosave );

Jim

LarsBirger · Jan 30, 2019 03:52 AM

Thanks for the quick reply. This script really did the trick. There are not so many people in Sweden using JMP, at least not in the academic world. Usually they use SPSS or Stata. However, with this active community there is no reason to change. Even our statistician up here at Umeå University is impressed. /Lars

txnelson · Jan 30, 2019 04:00 AM

Well, I appreciate the "Thanks". I feel even better about responding to you, now that I know you are a fellow Scandinavian. My Great Grandfather, Nels Knutson, immigrated to America from Bergen, Norway.

Jim

ms · Jan 30, 2019 06:51 AM

I actually first encountered JMP in 1994 at Uppsala university, Sweden and have used it ever since. Even if JMP is still not very commonly used in Swedish academia, I have "converted" quite a few SPSS users along the way.

Agree, this community is great.

txnelson · Jan 30, 2019 11:01 AM

@ms ,

Your years of experience and knowledge with JMP are obvious in your Community Discussion responses. I really appreciate your involvement in the Community.

Jim