
Near neighbors mean calculation

Hi everyone,


I need to calculate the mean of the neighboring dies/components on a microelectronic wafer, for example over a 3x3 box.


The reference component is at the centre of the box and is therefore surrounded by 8 components whose mean I want to calculate, for each row of the data table (and then calculate the same thing for a 5x5 box and 16 neighbors...).

It will be used as a new feature for machine learning.


My code is:

dt = current data table();

// building the KD table
matrice = (dt:row << getvalues) || (dt:col << getvalues);
table = KDTable( matrice );

for( i = 1, i <= nrows( dt ), i++,
	// get the 3x3 neighbors
	neighbors = table << Knearestrows( 8, i );
	neighbors_number = neighbors[1, 1];
	// Select the neighbors and get their metrics values and calculate the mean
	dt << select rows( neighbors_number );
	mean_selection = Col Mean( If( Selected(), :metric, . ) );
	:mean_metric[i] = mean_selection;
	dt << clear select
);

Does anyone have a smarter way to do this calculation more efficiently?



Re: Near neighbors mean calculation

I'm not sure how the efficiency compares, but here is an alternative for you to try out.

  • Using begin/end data update() will help for any method
  • Using Nearest Neighbor is clever (and maybe even more efficient) but be aware that it will always return the 8 nearest, even at the edges of the wafer where the neighbors will all be on one side.
  • Matrix subscripting may be more efficient than Col Mean().



dt = current data table();
metrics = dt[0, "metric"];
rows = dt[0, "row"];
cols = dt[0, "col"];
d = 1;	// 1 => 3x3, 2 => 5x5

dt << Begin Data Update();
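For Each Row(
	// rows whose (row, col) fall inside the (2d+1) x (2d+1) box around the current die
	neighbors = Loc( :col - d <= cols <= :col + d & :row - d <= rows <= :row + d );
	mean all = Mean( metrics[neighbors] );
	nn = N Rows( neighbors );
	// remove the center die's own value from the mean
	mean without center = (mean all * nn - :metric) / (nn - 1);
	// :mm is the output column; it must already exist in dt
	:mm = mean without center;
);
dt << End Data Update();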

Re: Near neighbors mean calculation

I'm pretty sure the size of the problem will determine whether the kdtable() or loc() approach is better. I'd guess loc() will win for a 10x10 or smaller matrix, and kdtable for 100x100 or bigger.

Xan's point about the edges is important. Loc() might be the easiest way to get the right answer.


Re: Near neighbors mean calculation

Thank you very much, it's very interesting!
I sometimes use "begin data update" but I didn't think of it here: good point. Once it is added to both scripts, the Loc() solution is actually a bit slower than knearestrows(). Here is what I find for 20,000 rows:

old script without begin update: 116s !

knearestrows() with begin update: 5s

Loc() with begin update: 14s

but for the knearestrows I had to change the way I calculate the mean since we can no longer use select rows with the begin/end update.


For the edges you are perfectly right, I had thought of putting in:

neighbors = table << Knearestrows( {8, 1.5}, i );

to avoid this problem by limiting the search radius.

Thanks again!


Re: Near neighbors mean calculation

On the other hand, I realize that the radius limit is not a perfect solution, because we still get an extra element in the array.
For example, a radius of 1.5 returns a neighbor at distance 2, because this is the value that stops the neighbor search... So it doesn't give me exactly what I want; I'll have to search a little more!
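
One possible workaround, sketched here under stated assumptions rather than as a tested solution: request the 8 nearest rows without a radius limit, then keep only the neighbors whose returned distance is at most 1.5. On a grid with unit spacing, the 8 cells of the 3x3 box are at most sqrt(2) (about 1.41) away, so the filter keeps exactly those and drops the far-away rows picked up at the wafer edge. The sketch assumes that K Nearest Rows returns a list of two matrices (row numbers, then Euclidean distances) and that the row itself is not among its own neighbors; the small table is made up for illustration. The mean is taken by matrix subscripting, so no row selection is needed inside Begin/End Data Update.

```jsl
// Hypothetical 3x3 wafer, for illustration only
dt = New Table( "wafer demo",
	New Column( "row", Numeric, Values( [1, 1, 1, 2, 2, 2, 3, 3, 3] ) ),
	New Column( "col", Numeric, Values( [1, 2, 3, 1, 2, 3, 1, 2, 3] ) ),
	New Column( "metric", Numeric, Values( [1, 2, 3, 4, 5, 6, 7, 8, 9] ) ),
	New Column( "mean_metric", Numeric )
);

metrics = dt:metric << Get Values; // all metric values as a matrix
table = KDTable( (dt:row << Get Values) || (dt:col << Get Values) );
max dist = 1.5; // keeps the 3x3 box only (assumes unit grid spacing)

dt << Begin Data Update;
For( i = 1, i <= N Rows( dt ), i++,
	// assumed return value: {row numbers, distances}
	{rows, dist} = table << K Nearest Rows( 8, i );
	keep = Loc( dist <= max dist ); // positions of the true 3x3 neighbors
	If( N Rows( keep ) > 0,
		dt:mean_metric[i] = Mean( metrics[rows[keep]] )
	);
);
dt << End Data Update;
```

For a 5x5 box the same idea would need 24 neighbors and a different test, since a circular distance cutoff no longer matches a square box exactly.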


Re: Near neighbors mean calculation

You could pad enough rows and cols of missing values, left, right, top, bottom.  Depending how far you go with extending the neighborhood, you might have 8 complete missing value wafers surrounding the real wafer. 

There is another way to do this, also requiring the dummy rows and cols, which will be really fast:

JMP 2D matrices can be indexed as 1D linear matrices.

[ 1 2,
  3 4,
  5 6 ] (3 rows, 2 cols)

also looks like

[ 1, 2, 3, 4, 5, 6 ] (6 rows, 1 col when using one subscript)

If the wafer is NRows x NCols then put it in the middle of a 3NRows x 3NCols matrix (I'll call it M3).

You can make a matrix of 1D subscripts to index M3 by adding 1 to move horizontally and by adding 3NCols to move vertically. To extract a 3x3 submatrix from M3, use the index matrix and the shape() function.

something like this:

M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];

indexer2x2 = [1 2 10 11]; // top left 2x2 sub-matrix

For( x = 2, x <= 5, x += 1,
	For( y = 2, y <= 5, y += 1,
		Write( "\!n ", x, " ", y, " ", Shape( M3[indexer2x2 + x + y * 9], 2 ) )
	)
);

2 2 [. ., . 1]
2 3 [. 1, . 4]
2 4 [. 4, . 7]
2 5 [. 7, . .]
3 2 [. ., 1 2]
3 3 [1 2, 4 5]
3 4 [4 5, 7 8]
3 5 [7 8, . .]
4 2 [. ., 2 3]
4 3 [2 3, 5 6]
4 4 [5 6, 8 9]
4 5 [8 9, . .]
5 2 [. ., 3 .]
5 3 [3 ., 6 .]
5 4 [6 ., 9 .]
5 5 [9 ., . .]


You might need this too: Using Loc with a 2D Matrix 

You can make indexer3x3, etc. and just reuse the M3 for each level. The For loop bounds (x = 2, x <= 5 in the example above) need to go a bit further each time.

Edit: the Shape() function may be unneeded if you are just getting the mean of the indexed elements, but it helps show what happened above. Also, JSL matrices have special behavior when an index is less than 1, and you will not get the error message you might hope for! If the indexer contains a zero (or -1, etc.), the result will seem very strange.
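
Putting the padding and the 1D indexing together, a minimal sketch of the neighbor mean might look like the following. It pads by only d rows and columns rather than a full 3x, builds the indexer as offsets relative to the center cell (with the center left out), and relies on Mean() skipping the missing padding values. The matrix W and the names P, offsets, and result are made up for illustration; in practice W would be filled from the row, col, and metric columns.

```jsl
// Hypothetical 3x3 wafer of metric values, for illustration only
W = [1 2 3, 4 5 6, 7 8 9];
nr = N Rows( W );
nc = N Cols( W );
d = 1; // 1 => 3x3 window, 2 => 5x5 window

// Pad with d rows/columns of missing values on every side
P = J( nr + 2 * d, nc + 2 * d, . );
P[Index( d + 1, d + nr ), Index( d + 1, d + nc )] = W;
pc = N Cols( P );

// Linear offsets of the window cells relative to the center, center excluded:
// +1 moves one column to the right, +pc moves one row down
offsets = J( (2 * d + 1) ^ 2 - 1, 1, 0 );
k = 0;
For( r = -d, r <= d, r++,
	For( c = -d, c <= d, c++,
		If( r != 0 | c != 0,
			k++;
			offsets[k] = r * pc + c;
		)
	)
);

// Neighbor mean for every real die; the padding guarantees the index stays >= 1
result = J( nr, nc, . );
For( i = 1, i <= nr, i++,
	For( j = 1, j <= nc, j++,
		center = (i + d - 1) * pc + (j + d); // linear index of die (i, j) inside P
		result[i, j] = Mean( P[center + offsets] );
	)
);
Show( result );
```

Because the padding is exactly d cells wide, the smallest index ever used is 1, which sidesteps the odd behavior for indices below 1 mentioned above.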


Re: Near neighbors mean calculation

Hi @Franck_R ,

Perhaps this can also help you: Add-In: Spatial Data Analysis 


If what you are looking for is the spatial correlation, Moran's I is perhaps the most basic concept.






Re: Near neighbors mean calculation

Thanks for all this, I'm going to dig more deeply into it