Hi everyone,
I have to calculate the mean of the neighboring dies/components on a microelectronic wafer, for example over a 3x3 box.
The reference component is at the centre of the box and is therefore surrounded by 8 components whose mean I want to calculate, for each row of the data table (and then do the same thing for a 5x5 box and 16 neighbors...).
It will be used as a new feature for machine learning.
My code is:
dt = Current Data Table();
// build the KD table from the (row, col) coordinates
matrice = (dt:row << Get Values) || (dt:col << Get Values);
table = KDTable( matrice );
For( i = 1, i <= N Rows( dt ), i++,
	// get the 3x3 neighbors (the 8 nearest rows)
	neighbors = table << K Nearest Rows( 8, i );
	neighbors_number = neighbors[1,1];
	// select the neighbors, get their metric values and calculate the mean
	dt << Select Rows( neighbors_number );
	mean_selection = Col Mean( If( Selected(), :metric, . ) );
	:mean_metric[i] = mean_selection;
	dt << Clear Select;
);
Does anyone have a smarter, more efficient way to do this calculation?
Thanks!
I'm not sure how the efficiency compares, but here is an alternative for you to try out.
dt = Current Data Table();
metrics = dt[0, "metric"];
rows = dt[0, "row"];
cols = dt[0, "col"];
d = 1; // 1 => 3x3, 2 => 5x5
dt << Begin Data Update();
For Each Row(
	// rows whose coordinates fall inside the box around the current die (center included)
	neighbors = Loc( :col - d <= cols <= :col + d & :row - d <= rows <= :row + d );
	mean all = Mean( metrics[neighbors] );
	nn = N Rows( neighbors );
	// remove the center die's contribution from the mean
	mean without center = (mean all * nn - :metric) / (nn - 1);
	:mm = mean without center;
);
dt << End Data Update();
I'm pretty sure the size of the problem will determine whether the kdtable() or loc() approach is better. I'd guess loc() will win for a 10x10 or smaller matrix, and kdtable for 100x100 or bigger.
Xan's point about the edges is important. Loc() might be the easiest way to get the right answer.
Thank you very much, this is very interesting!
I sometimes use "begin data update" but I didn't think of it here: good point. With it in place, the Loc() solution is actually slower than knearestrows(). For 20,000 rows I find:
old script without begin update: 116 s!
knearestrows() with begin update: 5 s
Loc() with begin update: 14 s
For the knearestrows() version I had to change the way I calculate the mean, since select rows can no longer be used inside the begin/end update.
You are perfectly right about the edges. I had thought of using:
neighbors = table << K Nearest Rows( {8, 1.5}, i );
to avoid the problem by limiting the search radius.
Thanks again!
On the other hand, I realize that the radius limit is not a perfect solution, because the returned array still contains one extra element.
For example, with a radius of 1.5 the result contains a distance of 2, because that is the value that stops the neighbor search... So it doesn't give me exactly what I want; I'll have to search a little more!
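One possible way around the extra element is to filter on the returned distances instead of trusting the array length. This is only a sketch under assumptions: that K Nearest Rows returns a list of two matrices (row numbers, then distances), that the row and col columns hold integer grid coordinates, and that a numeric column mean_metric already exists. Collecting the results in a matrix also avoids select rows, so it works inside or outside a begin/end update.

```jsl
dt = Current Data Table();
metrics = dt[0, "metric"];
table = KDTable( (dt:row << Get Values) || (dt:col << Get Values) );
result = J( N Rows( dt ), 1, . );
For( i = 1, i <= N Rows( dt ), i++,
	// assumed return shape: {row numbers, distances}
	{nbr rows, nbr dist} = table << K Nearest Rows( {8, 1.5}, i );
	// keep only real rows that are truly inside the radius
	keep = Loc( nbr rows > 0 & nbr dist <= 1.5 );
	If( N Rows( keep ) > 0,
		result[i] = Mean( metrics[nbr rows[keep]] )
	);
);
// write all means in one step
dt:mean_metric << Set Values( result );
```

On an integer grid the diagonal neighbors are at distance sqrt(2), about 1.41, and the next ring starts at 2, so a cutoff of 1.5 matches the 3x3 box. For the full 5x5 box the same idea would need 24 neighbors and a cutoff between 2*sqrt(2) (about 2.83) and 3. From 7x7 onward a circle no longer matches the square box, so the Loc() or padded-matrix approaches are safer there.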
You could pad enough rows and cols of missing values, left, right, top, bottom. Depending how far you go with extending the neighborhood, you might have 8 complete missing value wafers surrounding the real wafer.
There is another way to do this, also requiring the dummy rows and cols, which will be really fast:
JMP 2D matrices can be indexed as 1D linear matrices.
[ 1 2,
3 4,
5 6 ] (3 rows, 2 cols)
also looks like
[ 1,
2,
3,
4,
5,
6] (6 rows, 1 col when using one subscript)
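A small sketch of that equivalence, assuming the row-major order shown above: the single subscript for element (r, c) of a matrix with nc columns is (r - 1) * nc + c. The variable names are only for illustration.

```jsl
A = [1 2, 3 4, 5 6];     // 3 rows, 2 cols
nc = N Cols( A );
r = 2;
c = 1;
k = (r - 1) * nc + c;    // single subscript for element (r, c)
Show( A[r, c], A[k] );   // both subscripts refer to the same element
```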
If the wafer is NRows x NCols then put it in the middle of a 3NRows x 3NCols matrix (I'll call it M3).
You can make a matrix of 1D subscripts to index M3 by adding 1 to move horizontally and by adding 3NCols to move vertically. To extract a 3x3 submatrix from M3, use the index matrix and the shape() function.
something like this:
M3 = [. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . 1 2 3 . . .,
. . . 4 5 6 . . .,
. . . 7 8 9 . . .,
. . . . . . . . .,
. . . . . . . . .,
. . . . . . . . .];
indexer2x2 = [1 2 10 11]; // top left 2x2 sub-matrix
For( x = 2, x <= 5, x += 1,
	For( y = 2, y <= 5, y += 1,
		Write( "\!n ", x, " ", y, " ", Shape( M3[indexer2x2 + x + y * 9], 2 ) );
	)
);
Output:
2 2 [. ., . 1]
2 3 [. 1, . 4]
2 4 [. 4, . 7]
2 5 [. 7, . .]
3 2 [. ., 1 2]
3 3 [1 2, 4 5]
3 4 [4 5, 7 8]
3 5 [7 8, . .]
4 2 [. ., 2 3]
4 3 [2 3, 5 6]
4 4 [5 6, 8 9]
4 5 [8 9, . .]
5 2 [. ., 3 .]
5 3 [3 ., 6 .]
5 4 [6 ., 9 .]
5 5 [9 ., . .]
You might need this too: Using Loc with a 2D Matrix
You can make an indexer3x3, etc., and just reuse M3 for each level. The for loops (x = 2, x <= 5 above) need to go a bit further each time.
Edit: the Shape() function may be unneeded if you are just getting the mean of the indexed elements, but it helps show what happened above. Also: JSL matrices have special behavior when the index is less than 1, and you will not get the error message you might hope for! If the indexer contains a zero (or -1, etc.), the result will seem very strange.
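Here is a rough sketch of how the padded-matrix idea could be applied to the data table itself. It is not tested on real wafer data and rests on several assumptions: row and col are integer coordinates, there is at most one die per (row, col) position, and the new column name mean_neighbors is made up for the example. It pads by only d cells on each side rather than a full wafer, which is enough to keep every index at 1 or above and to stop the box from wrapping around a row end.

```jsl
dt = Current Data Table();
metrics = dt[0, "metric"];
d = 1;                                     // 1 => 3x3, 2 => 5x5
// shift coordinates so the wafer starts d cells in from the top-left corner
r = dt[0, "row"];
c = dt[0, "col"];
r = r - Min( r ) + 1 + d;
c = c - Min( c ) + 1 + d;
nr = Max( r ) + d;                         // padded matrix size
nc = Max( c ) + d;
M = J( nr, nc, . );                        // missing values everywhere
lin = (r - 1) * nc + c;                    // 1D subscript of each die
For( i = 1, i <= N Rows( dt ), i++,
	M[lin[i]] = metrics[i]
);
// 1D offsets of the box around a die, center excluded
offsets = [];
For( dr = -d, dr <= d, dr++,
	For( dc = -d, dc <= d, dc++,
		If( dr != 0 | dc != 0,
			offsets = offsets || (dr * nc + dc)
		)
	)
);
result = J( N Rows( dt ), 1, . );
For( i = 1, i <= N Rows( dt ), i++,
	// Mean() skips the missing padding cells, so edge dies average fewer neighbors
	result[i] = Mean( M[lin[i] + offsets] )
);
dt << New Column( "mean_neighbors", Numeric, Set Values( result ) );
```

Because the center offset is left out of the indexer, there is no need to subtract the center value afterwards, and a die with no neighbors at all simply gets a missing mean.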
Hi @Franck_R ,
Perhaps this can also help you: Add-In: Spatial Data Analysis
If what you are looking for is the spatial correlation, Moran's I is perhaps the most basic concept.
Ron
Thanks for all this, I'm going to dig more deeply into it