cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
Yngeinstn
Level IV

Delete Duplicate Data from Table using Time Stamp

I apologize, i know this question has been asked quite a few times on this forum but I haven't been able to find a solution to my problem.

I have a large data set  ~24,000 rows of test data. Our process is to run a 5 step validation on 5 die and once we get the results we begin testing which ultimately means i am going to have repeat data on those 5 sites.  This 5 step validation verifies the contact points which sometimes results in tearing down our test equipment and setting it back up. I am looking to exclude the duplicate data from the 1st test (touchdown). Our tests are timestamped so it is easy to distinguish between old and new. If anyone has a great solution to my problem i am all ears. Attached is the data table where i am trying to remove the duplicates.

 

Thank You,

Yng

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

I think this will do the trick....it simply finds all of the rows for a given RowCol that have a timestamp(ts) less than the maximum timestamp for that RowCol, and then deletes them.  It takes a couple of minutes to process, but it seems to work

Names Default To Here( 1 );
dt = Current Data Table();

// remove formula leaving static data
try(dt:RowCol << delete formula);

Summarize( dt, byGroup = by( "RowCol" ) );

For( i = 1, i <= N Items( byGroup ), i++,
	theRows = dt << get rows where( dt:RowCol == byGroup[i] & :ts < Col Max( dt:ts, :RowCol ) );
	If( N Rows( theRows ) > 1,
		dt << delete rows( theRows )
	);
);
Jim

View solution in original post

txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

I rethought through the issue, and came up with a totally methodology.  This code will work for wafers

Names Default To Here( 1 );
dt = Current Data Table();

// Summarize the data to find the maximum :ts for each wafer/rowcol
dtSum = dt << Summary(
	Group( :wafer_number, :RowCol ),
	Max( :ts ),
	Freq( "None" ),
	Weight( "None" ),
	statistics column name format( "column" ),
	link to original table( 0 )
);

// Join the data together matching only where the max :ts for each
// wafer/rowcol
dtFinal = dt << Join(
	With( dtSum ),
	Merge Same Name Columns,
	By Matching Columns( :wafer_number = :wafer_number, :RowCol = :RowCol, :ts = :ts ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),
	Preserve main table order( 1 )
);
dtSum << delete columns("N Rows");
Close( dtSum, nosave );

// Delete all rows that do not match the maximum value of :ts within
// each Wafer, RowCol
dtFinal << select where( Not( :Match Flag == 3 ) );
dtFinal << delete rows;

// Get rid of the Match Flag column which is an artifact of the Join
dtFinal << delete columns("Match Flag");
Jim

View solution in original post

10 REPLIES 10
txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

I am having trouble identifying from your attached data table, the rows that contain the data for Test 1 that you want to have deleted.  Could you please provide the row numbers for 1 or two cases that represent the target data?

Jim
Yngeinstn
Level IV

Re: Delete Duplicate Data from Table using Time Stamp

Sorry about that.. I deleted the unnecessary columns and reattached.

test_output is the data that i need. RowCol is the identifier of the devices i am testing.

 

I attached the summary table of the RowCol {r2c3sr1sc1, r2c8sr1sc1, r4c5sr1sc1, r7c3sr1sc1, r7c8sr1sc1} have the duplicate data. I need to delete the initial tests of those devices from the data set. They can be will have the earliest Time Stamp.. These 5 devices are used in every set-up which will require me to remove from each test_output.

txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

I think this will do the trick....it simply finds all of the rows for a given RowCol that have a timestamp(ts) less than the maximum timestamp for that RowCol, and then deletes them.  It takes a couple of minutes to process, but it seems to work

Names Default To Here( 1 );
dt = Current Data Table();

// remove formula leaving static data
try(dt:RowCol << delete formula);

Summarize( dt, byGroup = by( "RowCol" ) );

For( i = 1, i <= N Items( byGroup ), i++,
	theRows = dt << get rows where( dt:RowCol == byGroup[i] & :ts < Col Max( dt:ts, :RowCol ) );
	If( N Rows( theRows ) > 1,
		dt << delete rows( theRows )
	);
);
Jim
Yngeinstn
Level IV

Re: Delete Duplicate Data from Table using Time Stamp

Thank you so much! I am absolutely amazed at the skill you gentlefolk have with regards to scripting.. Here i was thinking that it was going to be some elaborate script that would accomplish this and you manage to get it in 10 lines..

 

You and Mr. gzmorgan0 have been extremely helpful.. 

 

Side note: my company has a P.O in the cycle that gets us JMP 14 which makes me super excited..

 

Thank you again Mr. txnelson..

 

Yng (not so Yng anymore)

Yngeinstn
Level IV

Re: Delete Duplicate Data from Table using Time Stamp

I am curious! I am an Industrial Engineer and coding is very difficult for me.. If() and For() have thrown me for a loop (forgive the pun) since the beginning. How were you able to determine 9 as the number?

Thanks Again
txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

Oops....that should be a 1 not a nine
Jim
Yngeinstn
Level IV

Re: Delete Duplicate Data from Table using Time Stamp

txnelson,

 

One more question regarding removing duplicate data.. I forgot to mention that i do a blanket data pull that includes all wafers in order for me to do an overall distribution and one way t-test analysis.

 

The data table that i attached was just the test data for one wafer..  Where in the script would i put the wafer_number? Would i include it in the summarize() part?

 

I have attached the new data table for reference.

 

I had to delete the other one because it was pointed out to me that i had my sql script that pull from our servers...

 

txnelson
Super User

Re: Delete Duplicate Data from Table using Time Stamp

I rethought through the issue, and came up with a totally methodology.  This code will work for wafers

Names Default To Here( 1 );
dt = Current Data Table();

// Summarize the data to find the maximum :ts for each wafer/rowcol
dtSum = dt << Summary(
	Group( :wafer_number, :RowCol ),
	Max( :ts ),
	Freq( "None" ),
	Weight( "None" ),
	statistics column name format( "column" ),
	link to original table( 0 )
);

// Join the data together matching only where the max :ts for each
// wafer/rowcol
dtFinal = dt << Join(
	With( dtSum ),
	Merge Same Name Columns,
	By Matching Columns( :wafer_number = :wafer_number, :RowCol = :RowCol, :ts = :ts ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),
	Preserve main table order( 1 )
);
dtSum << delete columns("N Rows");
Close( dtSum, nosave );

// Delete all rows that do not match the maximum value of :ts within
// each Wafer, RowCol
dtFinal << select where( Not( :Match Flag == 3 ) );
dtFinal << delete rows;

// Get rid of the Match Flag column which is an artifact of the Join
dtFinal << delete columns("Match Flag");
Jim
Yngeinstn
Level IV

Re: Delete Duplicate Data from Table using Time Stamp

Thank you txnelson,

 

I am going to try this out this morning and will incorporate it in my server pull script / table manipulation.

 

Much appreciated again..