Franck_R
Level III

Near neighbors mean calculation

Hi everyone,

 

I need to calculate the mean of the neighboring dies/components on a microelectronic wafer, for example over a 3x3 box like this:

[Image: box.jpg]

The reference component is in the centre of a box and therefore surrounded by 8 components whose mean I want to calculate, for each row of the data table (then calculate the same thing for a 5x5 box and 16 neighbors...)

It will be used as a new feature for machine learning.

 

My code is:

dt = current data table();

// building the KD table
matrice = (dt:row << getvalues) || (dt:col << getvalues);
table = KDTable(matrice);
for( i=1, i<=nrows(dt),i++,
	// get the 3x3 neighbors
	neighbors = table << Knearestrows(8,i);
	neighbors_number = neighbors[1,1];
	// Select the neighbors and get their metrics values and calculate the mean
	dt << select rows (neighbors_number );
	mean_selection = Col Mean( If( Selected(), :metric, . ) );
	:mean_metric[i] = mean_selection;
	dt << clear select
);

Does anyone have a smarter, more efficient way to do this calculation?

thanks!

7 REPLIES
XanGregg
Staff

Re: Near neighbors mean calculation

I'm not sure how the efficiency compares, but here is an alternative for you to try out.

  • Using begin/end data update() will help for any method
  • Using Nearest Neighbor is clever (and maybe even more efficient) but be aware that it will always return the 8 nearest, even at the edges of the wafer where the neighbors will all be on one side.
  • Matrix subscripting may be more efficient than Col Mean().

 

 

dt = current data table();
metrics = dt[0, "metric"];
rows = dt[0, "row"];
cols = dt[0, "col"];
d = 1;	// 1 => 3x3, 2 => 5x5

dt << Begin Data Update();
For Each Row(
	neighbors = Loc( :col - d <= cols <= :col + d & :row - d <= rows <= :row + d );
	mean all = Mean( metrics[neighbors] );
	nn = N Rows( neighbors );
	mean without center = (mean all * nn - :metric) / (nn - 1);
	:mm = mean without center;
);
dt << End Data Update();
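
As a quick sanity check of the "mean without center" arithmetic, here is a minimal sketch on a made-up 3x3 wafer. The toy values and the names r and c are assumptions for illustration only, not part of the original table. For the corner die the box is clipped to 4 dies, so the result should be the mean of its 3 real neighbors.

```jsl
// Hypothetical 3x3 wafer, metric values 1..9 stored row by row
rows = [1, 1, 1, 2, 2, 2, 3, 3, 3];
cols = [1, 2, 3, 1, 2, 3, 1, 2, 3];
metrics = [1, 2, 3, 4, 5, 6, 7, 8, 9];
d = 1; // 1 => 3x3 box

r = 1; c = 1; // corner die (hypothetical names), stored at index 1
neighbors = Loc( c - d <= cols <= c + d & r - d <= rows <= r + d );
nn = N Rows( neighbors ); // dies inside the clipped box: values 1, 2, 4, 5
mean all = Mean( metrics[neighbors] ); // (1 + 2 + 4 + 5) / 4
mean without center = (mean all * nn - metrics[1]) / (nn - 1); // (12 - 1) / 3
Show( nn, mean without center );
```

Note that nn counts every die in the box, so this shortcut appears to assume that the metric column has no missing values; if it can, that case probably needs separate handling.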
Craige_Hales
Super User

Re: Near neighbors mean calculation

I'm pretty sure the size of the problem will determine whether the kdtable() or loc() approach is better. I'd guess loc() will win for a 10x10 or smaller matrix, and kdtable for 100x100 or bigger.

Xan's point about the edges is important. Loc() might be the easiest way to get the right answer.

Craige
Franck_R
Level III

Re: Near neighbors mean calculation

Thank you very much, it's very interesting!
I sometimes use Begin Data Update() but I didn't think of it here: good point. Once it is added, the Loc() solution is actually a bit slower than knearestrows(). For 20,000 rows I find:

old script without begin update: 116s !

knearestrows() with begin update: 5s

Loc() with begin update: 14s

However, for the knearestrows() version I had to change the way I calculate the mean, since Select Rows can no longer be used inside the begin/end update.
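
One possible way to compute the mean without Select Rows is to subscript a matrix of metric values with the row numbers that K Nearest Rows returns. This is only a sketch of that idea, not necessarily the script used for the timings above. It assumes the columns are named row, col, metric and mean_metric as in the earlier scripts, and that the query row itself is not among the returned neighbors (as the original script implies). The names coords, nbrRows and nbrDist are made up for the example.

```jsl
dt = Current Data Table();
metrics = dt[0, "metric"]; // all metric values as a column matrix
// two-column matrix of die coordinates, one row per die
coords = dt[0, "row"] || dt[0, "col"];
table = KDTable( coords );

dt << Begin Data Update();
For( i = 1, i <= N Rows( dt ), i++,
	// K Nearest Rows returns a list: {row numbers, distances}
	{nbrRows, nbrDist} = table << K Nearest Rows( 8, i );
	// mean by matrix subscripting, no row selection needed
	:mean_metric[i] = Mean( metrics[nbrRows] );
);
dt << End Data Update();
```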

 

For the edges you are perfectly right, I had thought of putting in:

neighbors = table << Knearestrows( {8, 1.5}, i );


to avoid this problem by using a limited search radius.

Thanks again!

 
 
Franck_R
Level III

Re: Near neighbors mean calculation

On the other hand, I realize that the radius limit is not a perfect solution, because an extra element still ends up in the array.
For example, with a radius of 1.5 the result still contains a value of 2, because this is the value that stops the neighbor search... So it doesn't give me exactly what I want; I'll have to search a little more!
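
A possible workaround, sketched here under assumptions, is to keep the plain 8-nearest search and then drop any neighbor whose distance exceeds the radius. The diagonal neighbors of a 3x3 box sit at distance sqrt(2) (about 1.41) and the next dies are at distance 2, so a cutoff of 1.5 separates them. This assumes unit die spacing, the column names used earlier, that K Nearest Rows returns a list of row numbers and distances, and that the query row itself is not returned. The names radius, nbrRows, nbrDist and keep are made up for the example.

```jsl
dt = Current Data Table();
metrics = dt[0, "metric"];
table = KDTable( dt[0, "row"] || dt[0, "col"] );
radius = 1.5; // between sqrt(2) and 2, so only the 3x3 box passes

dt << Begin Data Update();
For( i = 1, i <= N Rows( dt ), i++,
	{nbrRows, nbrDist} = table << K Nearest Rows( 8, i );
	// positions of the neighbors that really lie inside the 3x3 box
	keep = Loc( nbrDist <= radius );
	:mean_metric[i] = If( N Rows( keep ) > 0,
		Mean( metrics[nbrRows[keep]] ),
		. // isolated die: no neighbor within the radius
	);
);
dt << End Data Update();
```

With this filter, a die at the wafer edge should average only the neighbors that actually exist inside its box, rather than 8 dies pulled from one side.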

Craige_Hales
Super User

Re: Near neighbors mean calculation

You could pad enough rows and cols of missing values, left, right, top, bottom.  Depending how far you go with extending the neighborhood, you might have 8 complete missing value wafers surrounding the real wafer. 

There is another way to do this, also requiring the dummy rows and cols, which will be really fast:

JMP 2D matrices can be indexed as 1D linear matrices.

[ 1 2,
  3 4,
  5 6 ] (3 rows, 2 cols)

also looks like 

[ 1,
  2,
  3,
  4,
  5,
  6 ] (6 rows, 1 col when using one subscript)

If the wafer is NRows x NCols then put it in the middle of a 3NRows x 3NCols matrix (I'll call it M3).

You can make a matrix of 1D subscripts to index M3 by adding 1 to move horizontally and by adding 3NCols to move vertically. To extract a 3x3 submatrix from M3, use the index matrix and the shape() function.

something like this:

M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];

indexer2x2 = [1 2 10 11]; // top left 2x2 sub-matrix

For( x = 2, x <= 5, x += 1,
	For( y = 2, y <= 5, y += 1,
		Write( "\!n ", x, " ", y, " ", Shape( M3[indexer2x2 + x + y * 9], 2 ) );
	
	)
);

2 2 [. ., . 1]
2 3 [. 1, . 4]
2 4 [. 4, . 7]
2 5 [. 7, . .]
3 2 [. ., 1 2]
3 3 [1 2, 4 5]
3 4 [4 5, 7 8]
3 5 [7 8, . .]
4 2 [. ., 2 3]
4 3 [2 3, 5 6]
4 4 [5 6, 8 9]
4 5 [8 9, . .]
5 2 [. ., 3 .]
5 3 [3 ., 6 .]
5 4 [6 ., 9 .]
5 5 [9 ., . .]

 

You might need this too: Using Loc with a 2D Matrix 

You can make indexer3x3, etc. and just reuse M3 for each level. The For loop bounds (x = 2, x <= 5 above) need to go a bit further each time.

Edit: the Shape() function may be unneeded if you are just getting the mean of the indexed elements, but it helps show what happened above. Also, JSL matrices have special behavior when the index is less than 1, and you will not get the error message you might hope for! If the indexer contains a zero (or -1, etc.) the result will seem very strange.
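
Following the same idea, a 3x3 neighborhood mean can be sketched with an indexer of eight offsets around a center position. The name ringOffsets and the center formula are assumptions written for this particular 9-column M3, and the sketch assumes Mean() skips the missing padding elements, so edge dies average only their real neighbors. The offsets are negative, but they are always added to a center index of at least 31 here, so no subscript falls below 1.

```jsl
M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];

// 1D offsets of the 8 neighbors around a center, for a matrix 9 columns wide
ringOffsets = [-10 -9 -8 -1 1 8 9 10];

For( r = 1, r <= 3, r++,
	For( c = 1, c <= 3, c++,
		// linear index of wafer element (r, c): skip 3 padding rows and 3 padding cols
		center = (r + 2) * 9 + (c + 3);
		// mean of the surrounding ring, center excluded
		Write( "\!n", r, " ", c, " ", Mean( M3[ringOffsets + center] ) );
	)
);
```

For the middle element (r = 2, c = 2) the ring covers 1, 2, 3, 4, 6, 7, 8 and 9, while for the corner element (r = 1, c = 1) only 2, 4 and 5 are non-missing.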

Craige
ron_horne
Super User (Alumni)

Re: Near neighbors mean calculation

Hi @Franck_R ,

Perhaps this can also help you: Add-In: Spatial Data Analysis 

 

If what you are looking for is the spatial correlation, Moran's I is perhaps the most basic concept.

 

Ron

 

 

Franck_R
Level III

Re: Near neighbors mean calculation

Thanks for all this, I'm going to dig more deeply into it.