LaserGuy
Level II

Nearest Neighbor Algorithm: Need Optimization Help

Hi Everyone,

 

I have created the following script. It is meant to analyze a data table where each row corresponds to a chip in a wafer, and each row has a coordinate (x,y) and some parameters. The script will go through each row, look for other rows that correspond to dies that are "neighbors" to this die, and calculate the median and standard deviation of some parameter for that neighborhood.

 

The issue is that, while it works, it takes a long time to run on data sets with tens of thousands of rows.

 

I would really appreciate it if anyone can give pointers on how to optimize the script to reduce the computation time.

 

Code Below:

dt0 = current data table();
	dt0 << new column("NN_median");
	dt0 << new column("NN_stdev");

for ( i = 1, i <= N Rows(dt0), i++,

	//Find rows corresponding to neighboring dies
	m = dt0[0,{"x", "y"}];
	v = ( abs( m[0,1] - :x[i] ) <= 2 )
		&
		( abs( m[0,2] - :y[i] ) <= 2 );
	rows = loc(v);
	
	// Create new DT of neighborhood

	dt1 = dt0 << subset( Rows(rows), Columns(:param), invisible );
	
	//Calculate Median and Std Dev
	tb = Tabulate(
		Add Table(
			Column Table( Statistics( Median, Std Dev ) ),
			Row Table( Analysis Columns( column("param") ) )
		)
	);

	dt2 = tb << make into data table( invisible );
	tb << close window;
	
	current data table(dt2);
	placeholder_median = :Median[1];
	placeholder_stdev = :Std Dev[1];
	
	close(dt2, no save);
	close(dt1, no save);
	
	current data table(dt0);
	:NN_median[i] = placeholder_median;
	:NN_stdev[i] = placeholder_stdev;

);
Accepted Solution
txnelson
Super User

Re: Nearest Neighbor Algorithm: Need Optimization Help

I obtained identical results in 11 seconds using:

dt0 = Current Data Table();
dt0 << New Column( "NN_median" );
dt0 << New Column( "NN_stdev" );
	
m = dt0[0, {"x", "y"}];
For( i = 1, i <= N Rows( dt0 ), i++, 

	//Find rows corresponding to neighboring dies
	
	v = (Abs( m[0, 1] - :x[i] ) <= 2) & (Abs( m[0, 2] - :y[i] ) <= 2);
	rows = Loc( v );
	dt0:NN_stdev[i] = Std Dev( dt0:param[rows] );
	dt0:NN_median[i] = Quantile( .5, dt0:param[rows] );
);

while your code took over 13 minutes.

Jim
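
The same idea can be pushed a little further by reading x, y, and param into matrices once and writing both result columns back in a single step, so the loop never touches the data table. The following is a minimal, unbenchmarked sketch rather than part of the original answer; the small "wafer demo" table and its values are made up for illustration, and the window of 2 in each direction matches the scripts above.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table (made-up values); substitute your own wafer data
dt0 = New Table( "wafer demo",
	New Column( "x", Numeric, Continuous, Set Values( [1, 2, 3, 4, 8] ) ),
	New Column( "y", Numeric, Continuous, Set Values( [1, 2, 3, 4, 8] ) ),
	New Column( "param", Numeric, Continuous, Set Values( [10, 12, 11, 15, 30] ) )
);

// Read each column into a matrix once, outside the loop
n = N Rows( dt0 );
x = Column( dt0, "x" ) << Get Values;
y = Column( dt0, "y" ) << Get Values;
p = Column( dt0, "param" ) << Get Values;

// Result vectors, initialised to missing
med = J( n, 1, . );
sd = J( n, 1, . );

For( i = 1, i <= n, i++,
	// Rows whose x and y are both within 2 of die i (die i itself is included)
	rows = Loc( (Abs( x - x[i] ) <= 2) & (Abs( y - y[i] ) <= 2) );
	med[i] = Quantile( 0.5, p[rows] );
	sd[i] = Std Dev( p[rows] );
);

// Write both result columns back in one step each
dt0 << New Column( "NN_median", Numeric, Continuous, Set Values( med ) );
dt0 << New Column( "NN_stdev", Numeric, Continuous, Set Values( sd ) );
```

Whether this is measurably faster than the loop above will depend on the table size; the main difference is that it avoids per-row reads and writes on the data table columns.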


ian_jmp
Staff

Re: Nearest Neighbor Algorithm: Need Optimization Help

I didn't do any benchmarking, but I would expect 'KDTable()' to be a good fit for larger, more general problems:

Names Default To Here( 1 );

n = 100;		// Size of square

// Make a table of x and y locations
dtx = AsTable((1::n)`, << Invisible);
dty = AsTable((1::n)`, << Invisible);
dt = dty << Join( With( dtx ), Cartesian Join );
Close(dtx, NoSave);
Close(dty, NoSave);
Column(dt, 1) << setName("x");
Column(dt, 2) << setName("y");
dt << deleteProperty("Source");
dt << setName("xy Locations");

// Put these locations into a matrix
mat = dt << getAsMatrix;
tbl = KDTable( mat );

// Pick a point in the interior, so we don't need to decide on any boundary conditions
myX = RandomInteger(2, n-1);
myY = RandomInteger(2, n-1);
myRow = Loc(mat[0, 1] == myX & mat[0, 2] == myY);
myRow = myRow[1];

// Find the 8 nearest neighbours to the chosen point
{rows, dist} = tbl << K nearest rows( 8, myRow ); 
Show( myRow, rows );

// Check it worked
myXvals = Column(dt, "x")[myRow];
myYvals = Column(dt, "y")[myRow];
dt << NewColumn("Distance", Numeric, Continuous, Formula(sqrt((:x - myXvals)^2 +(:y - myYvals)^2)));
dt << selectRows(VConcat(myRow, rows`));
gb = dt << Graph Builder(
	Size( 529, 453 ),
	Show Control Panel( 0 ),
	Variables( X( :x ), Y( :y ), Color( :Distance ) ),
	Elements( Points( X, Y, Legend( 5 ) ) )
);
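
To carry this over to the original question, one possibility is to query the KDTable once per row and summarise param over the returned rows. This is only a sketch built from the calls already shown above (KDTable and K nearest rows); the 3 x 3 demo table, the choice k = 4, and the output column names are assumptions for illustration. Note that K nearest rows returns a fixed number of closest rows, which is not the same neighbourhood as the window of 2 in x and y used in the original script.

```jsl
Names Default To Here( 1 );

// Hypothetical 3 x 3 grid with made-up param values
dt = New Table( "kd demo",
	New Column( "x", Numeric, Continuous, Set Values( [1, 1, 1, 2, 2, 2, 3, 3, 3] ) ),
	New Column( "y", Numeric, Continuous, Set Values( [1, 2, 3, 1, 2, 3, 1, 2, 3] ) ),
	New Column( "param", Numeric, Continuous, Set Values( [10, 12, 11, 15, 14, 13, 20, 18, 19] ) )
);

mat = dt[0, {"x", "y"}];                      // coordinates as a matrix
p = Column( dt, "param" ) << Get Values;      // parameter as a column vector
tbl = KDTable( mat );

n = N Rows( dt );
k = 4;                                        // assumed neighbour count; must be smaller than n
med = J( n, 1, . );
sd = J( n, 1, . );

For( i = 1, i <= n, i++,
	// Same call pattern as the example above: k closest rows to row i
	{rows, dist} = tbl << K nearest rows( k, i );
	// As in the example above, the query row is added back explicitly
	idx = V Concat( i, rows` );
	med[i] = Quantile( 0.5, p[idx] );
	sd[i] = Std Dev( p[idx] );
);

dt << New Column( "KNN_median", Numeric, Continuous, Set Values( med ) );
dt << New Column( "KNN_stdev", Numeric, Continuous, Set Values( sd ) );
```

On an irregular layout, such as a round wafer with missing dies near the edge, a fixed k pulls in rows from further away, so the results are not directly comparable with the window-based version.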

 

LaserGuy
Level II

Re: Nearest Neighbor Algorithm: Need Optimization Help

Thanks! I am learning JMP scripting syntax as I go along. Your code certainly bypasses quite a few unnecessary operations.