Near neighbors mean calculation
Hi everyone,
I need to calculate the mean of the neighboring dies/components on a microelectronic wafer, for example over a 3x3 box.
The reference component is at the centre of the box and is therefore surrounded by 8 components whose mean I want to calculate, for each row of the data table (and then do the same for a 5x5 box, which adds a ring of 16 more neighbors...)
It will be used as a new feature for machine learning.
My code is:
dt = Current Data Table();
// Build the KD table from the (row, col) coordinates
matrice = (dt:row << Get Values) || (dt:col << Get Values);
table = KDTable( matrice );
For( i = 1, i <= N Rows( dt ), i++,
    // Get the 8 nearest neighbors (the 3x3 box)
    neighbors = table << K Nearest Rows( 8, i );
    neighbors_number = neighbors[1, 1];
    // Select the neighbors, then average their metric values
    dt << Select Rows( neighbors_number );
    mean_selection = Col Mean( If( Selected(), :metric, . ) );
    :mean_metric[i] = mean_selection;
    dt << Clear Select;
);
Does anyone have a smarter, more efficient way to do this calculation?
Thanks!
Re: Near neighbors mean calculation
I'm not sure how the efficiency compares, but here is an alternative for you to try out.
- Using begin/end data update() will help for any method
- Using Nearest Neighbor is clever (and maybe even more efficient) but be aware that it will always return the 8 nearest, even at the edges of the wafer where the neighbors will all be on one side.
- Matrix subscripting may be more efficient than Col Mean().
dt = Current Data Table();
// Pull the columns into matrices once, outside the loop
metrics = dt[0, "metric"];
rows = dt[0, "row"];
cols = dt[0, "col"];
d = 1; // 1 => 3x3, 2 => 5x5
dt << Begin Data Update();
For Each Row(
    // Rows whose (row, col) falls inside the box, center included
    neighbors = Loc( :col - d <= cols <= :col + d & :row - d <= rows <= :row + d );
    mean all = Mean( metrics[neighbors] );
    nn = N Rows( neighbors );
    // Remove the center die's contribution from the box mean
    mean without center = (mean all * nn - :metric) / (nn - 1);
    :mm = mean without center;
);
dt << End Data Update();
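To see what the Loc() box test does at the wafer edges, here is a minimal sketch on a made-up 3x3 wafer held in plain matrices. The values and the names r, c, box, and m are illustrative and not from the posts above; the box test and the center-removal arithmetic are the same as in the script above.

```jsl
// Made-up 3x3 wafer: one entry per die
rows    = [1, 1, 1, 2, 2, 2, 3, 3, 3];
cols    = [1, 2, 3, 1, 2, 3, 1, 2, 3];
metrics = [10, 20, 30, 40, 50, 60, 70, 80, 90];
d = 1; // 1 => 3x3 box

For( i = 1, i <= N Rows( metrics ), i++,
    r = rows[i];
    c = cols[i];
    // Dies inside the box around die i, center included
    box = Loc( c - d <= cols <= c + d & r - d <= rows <= r + d );
    nn = N Rows( box );
    // Remove the center's contribution from the box mean
    m = (Mean( metrics[box] ) * nn - metrics[i]) / (nn - 1);
    Write( "\!n", i, " neighbors=", nn - 1, " mean=", m );
);
```

Corner dies end up with 3 neighbors and edge dies with 5, instead of 8 dies pulled in from one side. The center-removal arithmetic assumes there are no missing metric values inside the box.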
Re: Near neighbors mean calculation
I'm pretty sure the size of the problem will determine whether the kdtable() or loc() approach is better. I'd guess loc() will win for a 10x10 or smaller matrix, and kdtable for 100x100 or bigger.
Xan's point about the edges is important. Loc() might be the easiest way to get the right answer.
Re: Near neighbors mean calculation
Thank you very much, this is very interesting!
I sometimes use Begin Data Update() but I didn't think of it here: good point. With it added, the Loc() solution is actually slower than K Nearest Rows(). For 20,000 rows I find:
old script without begin update: 116 s!
K Nearest Rows() with begin update: 5 s
Loc() with begin update: 14 s
For K Nearest Rows(), though, I had to change the way I calculate the mean, since Select Rows can no longer be used inside the begin/end update.
You are perfectly right about the edges. I had thought of using:
neighbors = table << K Nearest Rows( {8, 1.5}, i );
to avoid this problem by limiting the search radius.
Thanks again!
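One possible way to compute the mean without Select Rows (a sketch only, not necessarily the change described above) is to index a matrix of metric values with the row numbers returned by K Nearest Rows(). It assumes that the message returns a list of two matrices, row numbers first and distances second, and that a numeric column named mean_metric already exists. The names kdt, nbrRows, nbrDist, and result are illustrative.

```jsl
dt = Current Data Table();
metrics = dt[0, "metric"];
kdt = KDTable( dt[0, "row"] || dt[0, "col"] );
result = J( N Rows( dt ), 1, . ); // one mean per row, filled in below

For( i = 1, i <= N Rows( dt ), i++,
    // Assumption: the result is {row numbers, distances}
    {nbrRows, nbrDist} = kdt << K Nearest Rows( 8, i );
    // Average the metric over the neighbor rows, no selection needed
    result[i] = Mean( metrics[nbrRows] );
);

// Write the whole column in one step
dt:mean_metric << Set Values( result );
```

Because the column is written once at the end, this version does not touch the data table inside the loop at all.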
Re: Near neighbors mean calculation
On the other hand, I realize that the radius limit is not a perfect solution, because an extra element still ends up in the returned array.
For example, with a radius of 1.5 the result contains a distance of 2, because that is the value at which the neighbor search stops... So it doesn't give me exactly what I want; I'll have to search a little more!
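A possible workaround, sketched here under assumptions, is to ask for the 8 nearest rows without a radius and then filter on the returned distances. It assumes that K Nearest Rows() returns {row numbers, distances}, that the distances are Euclidean in row/col units, and that the query row itself is not among the results. The wafer data and variable names are made up for illustration.

```jsl
// Made-up 3x3 wafer: one (row, col) pair and one metric value per die
coords = [1 1, 1 2, 1 3, 2 1, 2 2, 2 3, 3 1, 3 2, 3 3];
metrics = [10, 20, 30, 40, 50, 60, 70, 80, 90];
kdt = KDTable( coords );
radius = 1.5; // between sqrt(2) (diagonal neighbor) and 2 (next die over)

For( i = 1, i <= N Rows( coords ), i++,
    // Assumption: the result is {row numbers, distances}
    {nbrRows, nbrDist} = kdt << K Nearest Rows( 8, i );
    // Keep only the neighbors that fall inside the radius
    keep = Loc( nbrDist <= radius );
    If( N Rows( keep ) > 0,
        m = Mean( metrics[nbrRows[keep]] ),
        m = . // no neighbor within the radius
    );
    Write( "\!n", i, " neighbors=", N Rows( keep ), " mean=", m );
);
```

The same idea covers a 5x5 box with 24 neighbors and a radius between 2*sqrt(2) (about 2.83) and 3. From 7x7 upward a circle no longer matches the square box, because the box corners (3*sqrt(2), about 4.24) are farther away than the first die outside the box (4).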
Re: Near neighbors mean calculation
You could pad the wafer with enough rows and cols of missing values on the left, right, top, and bottom. Depending on how far you go with extending the neighborhood, you might have 8 complete missing-value wafers surrounding the real wafer.
There is another way to do this, also requiring the dummy rows and cols, which will be really fast:
JMP 2D matrices can be indexed as 1D linear matrices.
[ 1 2,
3 4,
5 6 ] (3 rows, 2 cols)
also looks like
[ 1,
2,
3,
4,
5,
6] (6 rows, 1 col when using one subscript)
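As a small illustration of the two subscript forms (the names A, r, c, and k are just for this sketch), the single subscript of element (r, c) follows the row-major order shown above:

```jsl
A = [1 2, 3 4, 5 6]; // 3 rows, 2 cols
r = 2;
c = 1;
// Row-major linear index of element (r, c)
k = (r - 1) * N Cols( A ) + c;
// Both subscripts pick the same element
Show( A[r, c], A[k] );
```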
If the wafer is NRows x NCols, put it in the middle of a 3NRows x 3NCols matrix (I'll call it M3).
You can make a matrix of 1D subscripts to index M3 by adding 1 to move horizontally and by adding 3NCols to move vertically. To extract a 3x3 submatrix from M3, use the index matrix and the Shape() function.
Something like this:
M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];
indexer2x2 = [1 2 10 11]; // top left 2x2 sub-matrix
For( x = 2, x <= 5, x += 1,
    For( y = 2, y <= 5, y += 1,
        Write( "\!n ", x, " ", y, " ", Shape( M3[indexer2x2 + x + y * 9], 2 ) );
    )
);
2 2 [. ., . 1]
2 3 [. 1, . 4]
2 4 [. 4, . 7]
2 5 [. 7, . .]
3 2 [. ., 1 2]
3 3 [1 2, 4 5]
3 4 [4 5, 7 8]
3 5 [7 8, . .]
4 2 [. ., 2 3]
4 3 [2 3, 5 6]
4 4 [5 6, 8 9]
4 5 [8 9, . .]
5 2 [. ., 3 .]
5 3 [3 ., 6 .]
5 4 [6 ., 9 .]
5 5 [9 ., . .]
You might need this too: Using Loc with a 2D Matrix
You can make indexer3x3, etc. and just reuse M3 for each level. The For loop bounds need to go a bit further each time.
Edit: the Shape() function may be unneeded if you are just getting the mean of the indexed elements, but it helps show what happened above. One more thing: JSL matrices have special behavior when the index is less than 1, and you will not get the error message you might hope for! If the indexer contains a zero (or -1, etc.), the result will seem very strange.
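A minimal sketch of the neighbor mean on the padded matrix, assuming Mean() skips missing values in a matrix argument as suggested above. Instead of a full 3x3 indexer it uses the 8 linear-index offsets around the center, so the center die is left out directly. The names w, offsets, and center are illustrative.

```jsl
// Padded wafer from the example above: 3x3 values in the middle of a 9x9 matrix
M3 = J( 9, 9, . );
M3[[4 5 6], [4 5 6]] = [1 2 3, 4 5 6, 7 8 9];
w = N Cols( M3 ); // adding w moves down one row

// Linear-index offsets of the 8 neighbors relative to the center
offsets = ([-1 0 1] - w) || [-1 1] || ([-1 0 1] + w);

For( r = 4, r <= 6, r++,
    For( c = 4, c <= 6, c++,
        center = (r - 1) * w + c; // row-major index of die (r, c)
        // Padding cells are missing, so only real neighbors contribute
        Write( "\!n", M3[center], " ", Mean( M3[center + offsets] ) );
    )
);
```

Since the padding keeps every index at 1 or above, the loop never runs into the low-index behavior mentioned above. A 5x5 neighborhood would need the offsets extended by two rows and two columns in each direction.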
Re: Near neighbors mean calculation
Hi @Franck_R ,
Perhaps this can also help you: Add-In: Spatial Data Analysis
If what you are looking for is the spatial correlation, Moran's I is perhaps the most basic concept.
Ron
Re: Near neighbors mean calculation
Thanks for all this, I'm going to dig into it more deeply.