cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Jackie_
Level VI

Faster iteration

Hi,


Trying to optimize the iterations.

 

I have a reference table where each column is a name from a main data table column. I want to iterate through each row of the reference data table and then add the values from Column(dt_main, "Values") to New A and New B Columns in the reference table if Column(dt_ref, "Nrows")>2 & Comment[i] == "A > B".

 

I am using an associative array because I am interested only in the last recorded values of each type A and B, mapped to the corresponding X/Y/Wafer ID.

 

The below script takes a very long time. Is there a way to optimize?

 

Names Default To Here( 1 );
start = Tick Seconds();
dt_main = Data Table( "maintable" );
dt_ref = Data Table( "referencetable" );


Try( dt_ref << delete columns( :New N, :New R ) );
dt_ref << New Column( "New A", "Numeric" );
dt_ref << New Column( "New B", "Numeric" );
nrows = [];

For( nn = 1, nn <= N Rows( dt_ref ), nn++, 
	
	Insert Into( nrows, nn )
);

dt_ref << begin data update;
For Each( {r, i}, nrows, 
	
	
	If( dt_ref:N Rows[i] > 2 & dt_ref:Comment[i] == "A > B", 
/// Create AA to get the last recorded values for each A and B to the corresponding X/Y/Wafer ID from dt_main
		countsessions = Associative Array(
			dt_main:Type[dt_main << get rows where( dt_main:X Coord == dt_ref:X Coord[i] & dt_main:Y Coord == dt_ref:Y Coord[i] & dt_main:Wafer ID == dt_ref:Wafer ID[i] )], 
			
			dt_main:Value[dt_main << get rows where( dt_main:X Coord == dt_ref:X Coord[i] & dt_main:Y Coord == dt_ref:Y Coord[i] & dt_main:Wafer ID == dt_ref:Wafer ID[i] )];
			
		);
		
		Try( dt_ref:New A[i] = countsessions["A"] );
		Try( dt_ref:New B[i] = countsessions["B"] );
		
	)
);

dt_ref << end data update;
end = Tick Seconds();

Show( Char( end - start ) );

 

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Faster iteration

I took a different route, using the built in JMP Select, Join, Split and Update platforms.  It works very fast, and I appear to get the same results, except for the A>B rows that in your reference table have missing values....but the values I come up with match the N(A) and N(b) columns.  So I may be completely off base.

Names Default To Here( 1 );
start = Tick Seconds();
dt_main = Data Table( "maintable" );
dt_ref = Data Table( "referencetable" );


Try( dt_ref << delete columns( :New A, :New B ) );

// Create a subset of the maintable that contains only data for the rows in
// the reference table that have comment value of A>B
// Select only A>B rows in reference table
dt_ref << select where(:Comment == "A > B");
dtAB = dt_ref<<subset( selected rows(1), columns(:X Coord,:Y Coord,:Wafer ID));

// Use Join to create a new main table with only matching wafer, x, y
dtMainJoin = dt_main << Join(
	With( dtAB ),
	Merge Same Name Columns,
	Match Flag( 0 ),
	By Matching Columns(
		:X Coord = :X Coord, :Y Coord = :Y Coord, :Wafer ID = :Wafer ID
	),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 ),
	Output Table( "Join of maintable with Subset of referencetable 2" )
);

// Clean up by deleting dtAB
close( dtAB, nosave );

// Get rid of duplicate data, saving only the last values.
// Use Select Duplicate Rows() to find the duplicates, but since it retains the 1st
// row found, reverse the data table and then apply Select Duplicate Rows()
dtMainJoin << New Column("RowNum", set each value(Row()));

dtMainJoin = dtMainJoin << sort(by(:RowNum), Order(descending), Replace table(1));
found = dtMainJoin << Select Duplicate Rows(match(:Type, :X Coord, :Y Coord, :Wafer ID));

If(found>0, dtMainJoin << delete rows);
dtMainJoin = dtMainJoin << sort(by(:RowNum), Order(ascending), Replace table(1));
dtMainJoin << delete columns(:RowNum);

// Split the table to create the New A and New B Columns
dtSplit = dtMainJoin <<
Split(
	Split By( :Type ),
	Split( :Value ),
	Group( :X Coord, :Y Coord, :Wafer ID ),
	Output Table(
		"Split of Join of maintable with Subset of referencetable 2 by Type"
	),
	Sort by Column Property
);

// Clean up dtMainJoin table by closing it
close(dtMainJoin, nosave);

// Rename the A and B columns
dtSplit:A << set name("New A");
dtSplit:B << set name("New B");

// Use Update to merge the dtSplit data into the reference table
dt_ref << Update(
	With(
		Data Table(
			dtSplit
		)
	),
	Match Columns( :X Coord = :X Coord, :Y Coord = :Y Coord, :Wafer ID = :Wafer ID )
);

// Clean up
close( dtSplit, nosave );

dt_ref << end data update;
end = Tick Seconds();

Show( Char( end - start ) );
Jim

View solution in original post

1 REPLY 1
txnelson
Super User

Re: Faster iteration

I took a different route, using the built in JMP Select, Join, Split and Update platforms.  It works very fast, and I appear to get the same results, except for the A>B rows that in your reference table have missing values....but the values I come up with match the N(A) and N(b) columns.  So I may be completely off base.

Names Default To Here( 1 );
start = Tick Seconds();
dt_main = Data Table( "maintable" );
dt_ref = Data Table( "referencetable" );


Try( dt_ref << delete columns( :New A, :New B ) );

// Create a subset of the maintable that contains only data for the rows in
// the reference table that have comment value of A>B
// Select only A>B rows in reference table
dt_ref << select where(:Comment == "A > B");
dtAB = dt_ref<<subset( selected rows(1), columns(:X Coord,:Y Coord,:Wafer ID));

// Use Join to create a new main table with only matching wafer, x, y
dtMainJoin = dt_main << Join(
	With( dtAB ),
	Merge Same Name Columns,
	Match Flag( 0 ),
	By Matching Columns(
		:X Coord = :X Coord, :Y Coord = :Y Coord, :Wafer ID = :Wafer ID
	),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 ),
	Output Table( "Join of maintable with Subset of referencetable 2" )
);

// Clean up by deleting dtAB
close( dtAB, nosave );

// Get rid of duplicate data, saving only the last values.
// Use Select Duplicate Rows() to find the duplicates, but since it retains the 1st
// row found, reverse the data table and then apply Select Duplicate Rows()
dtMainJoin << New Column("RowNum", set each value(Row()));

dtMainJoin = dtMainJoin << sort(by(:RowNum), Order(descending), Replace table(1));
found = dtMainJoin << Select Duplicate Rows(match(:Type, :X Coord, :Y Coord, :Wafer ID));

If(found>0, dtMainJoin << delete rows);
dtMainJoin = dtMainJoin << sort(by(:RowNum), Order(ascending), Replace table(1));
dtMainJoin << delete columns(:RowNum);

// Split the table to create the New A and New B Columns
dtSplit = dtMainJoin <<
Split(
	Split By( :Type ),
	Split( :Value ),
	Group( :X Coord, :Y Coord, :Wafer ID ),
	Output Table(
		"Split of Join of maintable with Subset of referencetable 2 by Type"
	),
	Sort by Column Property
);

// Clean up dtMainJoin table by closing it
close(dtMainJoin, nosave);

// Rename the A and B columns
dtSplit:A << set name("New A");
dtSplit:B << set name("New B");

// Use Update to merge the dtSplit data into the reference table
dt_ref << Update(
	With(
		Data Table(
			dtSplit
		)
	),
	Match Columns( :X Coord = :X Coord, :Y Coord = :Y Coord, :Wafer ID = :Wafer ID )
);

// Clean up
close( dtSplit, nosave );

dt_ref << end data update;
end = Tick Seconds();

Show( Char( end - start ) );
Jim