Problem with script to delete redundant measurements specific to certain conditi...

FreshUser · Jun 9, 2023 9:00 AM

I have a data table that is made up of multiple rows of individual measurements in-series that I am trimming before doing a table split function on, however I have a problem with some redundant measurements that I would like to remove but the current code I'm using is extremely slow and likely inefficient for what I need.

Background Information:

Each row is the result of an individual automated measurement on a piece of material performed in series and produces a report with the following columns:
- A unique part ID assigned to each individual piece of material [ PartID ]
- The Lot ID associated with the raw material used to create the individual piece of material [ LotID ]
- The actual measurement value for the measurement performed [ Value ]
- Measurement result assigned based off of the measurement value(Pass/Fail/Rework) [ Result ]
- Measurement timestamp [ Date_Created ]

My equipment produces dozens of rows of measurements per piece of material, and processes thousands of pieces of material per report - those reports are then fed into JMP as my original data table. Each report also contains multiple lots, so the data table has multiple references to "PartID = 1" because each lot is given a sequential number starting at 1, so I need to use both PartID and LotID to refer to a given piece of material. In the case where the result is "Rework", the measurement equipment stops any further measurements on that individual piece of material, re-orients the piece of material, and returns it to the start of the line re-measuring all previous tests on the material but still keeping the same unique part ID that was associated with the piece of material.

Desired Outcome:

I am wanting to use script to go through the data table and eliminate all measurements up to (and including) the rework result, since after rework all the measurements are repeated and rework can affect the measurement values. To do this I am currently using JSL script to loop through each row and do the following:

Identify whether the result is "Rework"; if so then do the following:
1. Identify the PartID for the given piece of material as "xPartID"
2. Identify the LotID for the given piece of material as "xLot"
3. Identify the Date_Created for the given measurement as "xDate"
4. Select and delete all rows where (PartID = xPartID) and (LotID = xLot) and (Date_Created <= xDate)
If the result is not "Rework" then bail out of the loop, no need to continue.

I keep having extreme slowdowns and basically the code leads to crashing the program. If I run it step-by-step it seems to work, but if left to run it eventually dies. Here's the actual code:

//Recoding the two potential "Rework" results into a shared value "aRework"; putting the letter "a" in front of it so if I sort by the column ascending then all of these values are at the start of the list, to hopefully make it easier and faster to process them.
dt << Recode Column(
	dt:result,
	{Map Value(
		_rcOrig,
		{"FailOnHighRework", "aRework", 
		"FailOnLowRework", "aRework"},
		Unmatched( _rcNow )
	)},
	Update Properties( 1 ),
	Target Column( :result )
);

//Sorting the table using the result column after the recode, so all of the rework rows are at the start of the table
SortColumn = "result";
dt << Sort( By( :result ), Replace Table, Order( Ascending ), Suppress formula evaluation( 1 ) );

//now run the row-by-row script
dt << begin data update;
for each row(
	If( :result == "aRework",
		xRow = Row();
		xPartID=:PartID[Row()];
		xLot=:lot[Row()];
		xDate=:date_created[Row()];
		dt <<select where(:partid == xPartID & :lot==xLot & :date_created<=xDate);
		dt << delete rows();
	);
	//if we run out of reworks, bail out - it's over, we're done here
	if(:result!="aRework",
		Break();
	);
);
dt << End Data Update;
//------------------

I am on JMP 16.2.0 (570548).

txnelson · Oct 11, 2022 12:18 PM

I believe you are running into a problem where you are looping through a data table using a For Each Row, and you keep changing the table you are looping through by deleting rows, and the loop is getting confused.

Here is an untested piece of code that I think will eliminate the issue and should run fairly effecicently.

Names Default To Here( 1 );
//Recoding the two potential "Rework" results into a shared value "aRework"; putting the letter "a" in front of it so if I sort by the column ascending then all of these values are at the start of the list, to hopefully make it easier and faster to process them.
dt << Recode Column(
	dt:result,
	{Map Value( _rcOrig, {"FailOnHighRework", "aRework", "FailOnLowRework", "aRework"}, Unmatched( _rcNow ) )},
	Update Properties( 1 ),
	Target Column( :result )
);

dt << select where( :result == "aRework" );
dt2 = dt << subset( selected rows( 1 ), selected columns( 0 ) );

// Remove aRework rows from original table
dt << delete rows;

// Loop across the aRework table and delete rows in orininal table
For( i = 1, i <= N Rows( dt2 ), i++,
	dt << select where(
		(dt:partid == dt2:PartID[i] & dt:lot == dt2:Lot[i] & dt:date_created <= dt2:Date_created[i]);
		If( N Rows( dt << get selected ) > 0,
			dt << delete rows
		);
	)
);

close( dt2, nosave );

I did not have a sample data table to work with, so the code is untested, but the approach should allow you to fix any issues and get it to work.

Jim

FreshUser · Oct 11, 2022 01:32 PM

Hello, and thank you!

I am running the script against my data table now; I think you were definitely right on my iterating through rows on a table that is actively having the row being considered deleted, which was probably messing with the loop logic.

I'm running the code against my data table now and will report back. I'm also going to try to see if I can sanitize some snippet of my data table to allow for me to include it in this post. Thank you so much for your assistance!

FreshUser · Oct 11, 2022 02:39 PM

Hello,

The script appears to be working as intended, and when run against a very small dataset (10k rows total) it appears to be good and fast, however I'm having significant slowdowns with the actual dataset I'm working with (1.6 million rows) such that I can't tell if it's killing the program (does not respond, left it running for around 1.5 hours and it was still seemingly running but process didn't respond and ultimately had to "End Task" it through task manager) - any advice on how to improve? Running against 10k rows took maybe < 10 seconds, so I'd hope that for 1.6 million it would take maybe 30 minutes if it scaled linearly, but that doesn't seem to be the case.

I'm uploading a sanitized subset of the data table for reference with the suggested script improvement saved to data table. The sanitized table only had a single LotID present, but this is not typically the case and is just for minimizing the total size of the file for upload.

Here's the current version of the script I'm trying to run against the table:

dt = current data table();

dt << select where( :result == "aRework" );
dt2 = dt << subset( selected rows( 1 ), selected columns( 0 ) );

// Remove aRework rows from original table
dt << delete rows;

// Loop across the aRework table and delete rows in orininal table
dt << begin data update;
TotalRows=N Rows(dt2);
For( i = 1, i <= N Rows( dt2 ), i+=1,
	dt << select where(dt:partid == dt2:PartID[i] & dt:lot == dt2:Lot[i] & dt:date_created <= dt2:Date_created[i]);
		If( N Rows( dt << get selected rows ) > 0,
			dt << delete rows
		);
);
dt << end data update;

close( dt2, nosave );

Once again, thank you so much for the help!

txnelson · Oct 11, 2022 03:54 PM

Here is a version of the script that prints out after every 100 rows in dt have been proces

dt = Current Data Table();

dt << select where( :result == "aRework" );
dt2 = dt << subset( selected rows( 1 ), selected columns( 0 ) );

// Remove aRework rows from original table
dt << delete rows;

// Loop across the aRework table and delete rows in orininal table
//dt << begin data update;
start = 1;
end = 0;
alldt2 = N Rows( dt2 );
While( N Rows( dt2 ) > 100,
	TotalRows = N Rows( dt2 );
	If( TotalRows > 100,
		TotalRows = 100
	);
	end = end + TotalRows;
	For( i = 1, i <= TotalRows, i += 1,
		dt << select where(
			dt:partid == dt2:PartID[i] & dt:lot == dt2:Lot[i] & dt:date_created <= dt2
			:Date_created[i]
		);
		If( N Rows( dt << get selected rows ) > 0,
			dt << delete rows
		);
	);
//dt << end data update;
	dt2 << delete rows( 1 :: 100 );
	Show(
		"Rows " || Char( start ) || " to " || Char( end ) || " of " || Char( alldt2 ) ||
		" processed"
	);
);
Close( dt2, nosave );

You could also set it up to just process x number of rows and then rerun it to process more rows.

Jim

Problem with script to delete redundant measurements specific to certain conditions

Re: Problem with script to delete redundant measurements specific to certain conditions

Re: Problem with script to delete redundant measurements specific to certain conditions

Re: Problem with script to delete redundant measurements specific to certain conditions

Re: Problem with script to delete redundant measurements specific to certain conditions