Near neighbors mean calculation

Franck_R — Fri, 09 Jun 2023 22:02:16 GMT

Hi everyone,

I have to calculate the mean of the neighboring dies/components on a microelectronic wafer, for example in a 3x3 box like this:

The reference component is at the centre of the box and is therefore surrounded by 8 components whose mean I want to calculate, for each row of the data table (and then calculate the same thing for a 5x5 box and 16 neighbors...)

It will be used as a new feature for machine learning.

My code is:

dt = current data table();

// building the KD table
matrice = (dt:row << getvalues) || (dt:col << getvalues);

table = KDTable(matrice);

for( i=1, i<=nrows(dt),i++,
	// get the 3x3 neighbors
	neighbors = table << Knearestrows(8,i);
	neighbors_number = neighbors[1,1];
	// Select the neighbors and get their metrics values and calculate the mean
	dt << select rows (neighbors_number );
	mean_selection = Col Mean( If( Selected(), :metric, . ) );
	:mean_metric[i] = mean_selection;
	dt << clear select
);

Does anyone have a smart way to do this calculation more efficiently?

thanks!

Re: Near neighbors mean calculation

XanGregg — Mon, 04 Jan 2021 20:32:23 GMT

I'm not sure how the efficiency compares, but here is an alternative for you to try out.

Using Begin Data Update()/End Data Update() will help with any method.
Using Nearest Neighbor is clever (and maybe even more efficient) but be aware that it will always return the 8 nearest, even at the edges of the wafer where the neighbors will all be on one side.
Matrix subscripting may be more efficient than Col Mean().

dt = current data table();
metrics = dt[0, "metric"];
rows = dt[0, "row"];
cols = dt[0, "col"];
d = 1;	// 1 => 3x3, 2 => 5x5

dt << Begin Data Update();
For Each Row(
	neighbors = Loc( :col - d <= cols <= :col + d & :row - d <= rows <= :row + d );
	mean all = Mean( metrics[neighbors] );
	nn = N Rows( neighbors );
	mean without center = (mean all * nn - :metric) / (nn - 1);
	:mm = mean without center;
);
dt << End Data Update();
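To see how the Loc() window behaves at the edges, here is a minimal sketch on plain matrices, with no data table involved. The 3x3 grid and its values are hypothetical; the chained comparison and the "subtract the center" arithmetic follow the script above.

```jsl
// Hypothetical 3x3 wafer: one entry per die
rows    = [1, 1, 1, 2, 2, 2, 3, 3, 3];
cols    = [1, 2, 3, 1, 2, 3, 1, 2, 3];
metrics = [1, 2, 3, 4, 5, 6, 7, 8, 9];
d = 1;	// 1 => 3x3 window

For( i = 1, i <= N Rows( metrics ), i++,
	// dies inside the window, including the center die itself
	neighbors = Loc( cols[i] - d <= cols <= cols[i] + d & rows[i] - d <= rows <= rows[i] + d );
	nn = N Rows( neighbors );
	// remove the center value, then average over the remaining dies
	mm = (Sum( metrics[neighbors] ) - metrics[i]) / (nn - 1);
	Write( "\!n", rows[i], " ", cols[i], ": ", nn - 1, " neighbors, mean ", mm );
);
```

A corner die only finds 3 neighbors and an edge die 5, so the mean is taken over the dies that really exist rather than over 8 nearest rows pulled in from further away.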

Re: Near neighbors mean calculation

Craige_Hales — Mon, 04 Jan 2021 23:09:27 GMT

I'm pretty sure the size of the problem will determine whether the kdtable() or loc() approach is better. I'd guess loc() will win for a 10x10 or smaller matrix, and kdtable for 100x100 or bigger.

Xan's point about the edges is important. Loc() might be the easiest way to get the right answer.

Re: Near neighbors mean calculation

Franck_R — Tue, 05 Jan 2021 20:01:32 GMT

Thank you very much, it's very interesting!
I sometimes use "begin data update" but I didn't think of it here: good point. Once the begin update is added, the Loc() solution is actually a bit slower than knearestrows(). For 20,000 rows I find:

old script without begin update: 116s !

knearestrows() with begin update: 5s

Loc() with begin update: 14s

But for knearestrows() I had to change the way I calculate the mean, since we can no longer use select rows with the begin/end update.
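The changed script is not shown in the thread. One possible way to compute the mean without selecting rows is to subscript a matrix of metric values with the returned row numbers. This is only a sketch: it assumes that K Nearest Rows returns a list whose first item is the matrix of neighbor row numbers, and that the columns row, col, metric and mean_metric already exist. The names coords, metrics, means and result are introduced here for illustration.

```jsl
dt = Current Data Table();

// coordinates for the KD table and the values to average
coords = (dt:row << Get Values) || (dt:col << Get Values);
metrics = dt:metric << Get Values;
table = KDTable( coords );

// collect the results in a matrix instead of writing cell by cell
means = J( N Rows( dt ), 1, . );
For( i = 1, i <= N Rows( dt ), i++,
	result = table << K Nearest Rows( 8, i );	// assumed: {row numbers, distances}
	means[i] = Mean( metrics[result[1]] );
);

// a single write back to the table
dt:mean_metric << Set Values( means );
```

Because the table is only touched once at the end, this version does not depend on row selection and does not need Begin Data Update() around the loop.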

For the edges you are perfectly right, I had thought of putting in:

neighbors = table << Knearestrows( {8, 1.5}, i );

to avoid this problem by using a limited radius.

Thanks again!

Re: Near neighbors mean calculation

Franck_R — Tue, 05 Jan 2021 09:00:03 GMT

On the other hand, I realize that the radius limit is not a perfect solution, because we still get one extra element in the array.
For example, a search with a radius of 1.5 still returns a neighbor at distance 2, because this is the value that stops the neighbor search... So it doesn't give me exactly what I want, I'll have to search a little more!
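One possible workaround, sketched here under assumptions, is to request the 8 nearest rows without a radius and then drop the ones that are too far away, using the distances that come back with the row numbers. The matrices nbr and dist below are hypothetical stand-ins for such a result at a wafer corner, and the grid is assumed to have unit spacing, so that real 3x3 neighbors are at distance 1 or about 1.414.

```jsl
// Hypothetical result for a corner die: row numbers and their distances
nbr  = [2 11 12 3 21 13 22 23];
dist = [1 1 1.414 2 2 2.236 2.236 2.828];
metrics = 1 :: 30;	// hypothetical metric values, one per table row

// keep only the neighbors that fall inside the 3x3 box
keep = Loc( dist <= 1.5 );
kept = nbr[keep];
Write( "\!nkept rows: ", kept, "  mean: ", Mean( metrics[kept] ) );
```

Filtering after the search means an extra far-away row can no longer slip into the mean, at the cost of one Loc() call per die.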

Re: Near neighbors mean calculation

Craige_Hales — Tue, 05 Jan 2021 13:01:44 GMT

You could pad enough rows and cols of missing values on the left, right, top, and bottom. Depending on how far you go with extending the neighborhood, you might have 8 complete missing-value wafers surrounding the real wafer.

There is another way to do this, also requiring the dummy rows and cols, which will be really fast:

JMP 2D matrices can be indexed as 1D linear matrices.

[ 1 2,
3 4,
5 6 ] (3 rows, 2 cols)

also looks like

[ 1, 2, 3, 4, 5, 6 ] (6 rows, 1 col when using one subscript)

If the wafer is NRows x NCols then put it in the middle of a 3NRows x 3NCols matrix (I'll call it M3).

You can make a matrix of 1D subscripts to index M3 by adding 1 to move horizontally and by adding 3NCols to move vertically. To extract a 3x3 submatrix from M3, use the index matrix and the shape() function.

something like this:

M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];

indexer2x2 = [1 2 10 11]; // top left 2x2 sub-matrix

For( x = 2, x <= 5, x += 1,
	For( y = 2, y <= 5, y += 1,
		Write( "\!n ", x, " ", y, " ", Shape( M3[indexer2x2 + x + y * 9], 2 ) );
	
	)
);

2 2 [. ., . 1]
2 3 [. 1, . 4]
2 4 [. 4, . 7]
2 5 [. 7, . .]
3 2 [. ., 1 2]
3 3 [1 2, 4 5]
3 4 [4 5, 7 8]
3 5 [7 8, . .]
4 2 [. ., 2 3]
4 3 [2 3, 5 6]
4 4 [5 6, 8 9]
4 5 [8 9, . .]
5 2 [. ., 3 .]
5 3 [3 ., 6 .]
5 4 [6 ., 9 .]
5 5 [9 ., . .]

You might need this too:

You can make indexer3x3, etc., and just reuse M3 for each level. The For loops (x = 2, x <= 5 above) need to go a bit further each time.

Edit: the Shape() function may be unneeded if you are just getting the mean of the indexed elements, but it helps show what happened above. Also, JSL matrices have special behavior when the index is less than 1, and you will not get the error message you might hope for! If the indexer contains a zero (or -1, etc.), the result will seem very strange.
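As a sketch of how the same indexing idea could give the 8-neighbor mean directly, the offsets below address the ring around a center element of the padded 9-column M3 from above. The names ring and center are introduced here for illustration, and the sketch assumes that Mean() skips missing values in a matrix argument, so the padding drops out of the average.

```jsl
M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];

// linear offsets of the 8 neighbors: +/-1 moves one column, +/-9 moves one row
ring = [-10 -9 -8 -1 1 8 9 10];

// the real wafer occupies rows 4..6 and cols 4..6 of M3
For( r = 4, r <= 6, r++,
	For( c = 4, c <= 6, c++,
		center = (r - 1) * 9 + c;	// 1D index of element (r, c)
		Write( "\!n", r - 3, " ", c - 3, " ", Mean( M3[center + ring] ) );
	)
);
```

With the padding in place, center + ring never drops below 1, which avoids the strange behavior of small indexes mentioned above.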

Re: Near neighbors mean calculation

ron_horne — Tue, 05 Jan 2021 20:06:41 GMT

Hi @Franck_R ,

Perhaps this can also help you:

If what you are looking for is the spatial correlation, Moran's I is perhaps the most basic concept.

Ron

Re: Near neighbors mean calculation

Franck_R — Wed, 06 Jan 2021 07:44:55 GMT

Thanks for all this, I'm going to dig more deeply into it