Nearest Neighbor to detect outliers

Created:
Mar 7, 2020 9:31 PM
| Last Modified: Mar 7, 2020 10:12 PM
(1377 views)

I have a data set with three columns: X, Y, and Z. There are several rows in the table. The X and Y coordinates in each row give one location in XY space, and Z is the height at that location. I want to create a 3x3 cluster at each XY location and calculate the average of the 8 nearest neighbor points (excluding the center point). Then I want to subtract the Z height of the center XY location from the calculated average; if the difference is greater than a user-defined threshold, that XY location is called an outlier. This continues until all XY locations have been checked. Are there any built-in functions in JMP that I can use? How can I script it in JSL? Also, how are the edge coordinates handled? An example image is shown below. Any guidance will be highly appreciated.

1 ACCEPTED SOLUTION

There are no functions that can be called for this very specialized data process. Here is a short script that performs the essential task.

```
Names Default to Here( 1 );
dt1 = Current Data Table();

// dimensions of results
nr = Col Maximum( dt1:Y ) - 1; // assumes first level is 0
nc = Col Maximum( dt1:X ) - 1; // assumes first level is 0
net height = [];

// iterate over 3x3 cells; c indexes X and r indexes Y, matching the
// { c, r, ... } order stored under the "X", "Y" column names below
// (use <= instead of < in both loops to include the last interior row and column)
For( c = 1, c < nc, c++,
	For( r = 1, r < nr, r++,
		// rows of the nine points in the 3x3 cell centered on (c, r)
		cell = dt1 << Get Rows Where(
			Or(
				:X == c - 1 & :Y == r - 1,
				:X == c - 1 & :Y == r,
				:X == c - 1 & :Y == r + 1,
				:X == c & :Y == r - 1,
				:X == c & :Y == r,
				:X == c & :Y == r + 1,
				:X == c + 1 & :Y == r - 1,
				:X == c + 1 & :Y == r,
				:X == c + 1 & :Y == r + 1
			)
		);
		target = dt1 << Get Rows Where( :X == c & :Y == r );
		// mean of the 8 neighbors (cell sum minus center, over 8) minus the center height
		net height |/= Matrix( {c, r, (((Sum( dt1:Z[cell] ) - dt1:Z[target]) / 8) - dt1:Z[target])[1]} )`;
	);
);

dt2 = As Table( net height, << Column Names( {"X", "Y", "Net Z"} ) );

// assume user threshold is 1.5
threshold = 1.5;
dt2 << New Table Variable( "Threshold", "1.5" );
dt2 << New Column( "Outlier", "Numeric", "Nominal",
	Values( net height[0, 3] > threshold ),
	Value Labels( {0 = "No", 1 = "Yes"} ),
	Use Value Labels( 1 )
);

dt2 << New Script( "Plot Outliers",
	Current Data Table() << Graph Builder(
		Size( 522, 454 ),
		Show Control Panel( 0 ),
		Variables( X( :X ), Y( :Y ), Color( :Outlier ) ),
		Elements( Points( X, Y, Legend( 3 ) ) )
	);
);
```
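
As a quick check of the net-height arithmetic used in the script above, here is a minimal sketch with made-up heights. The values and the names `cellZ`, `targetZ`, and `netZ` are illustrative only and are not part of the original script.

```
Names Default To Here( 1 );

// nine hypothetical heights in one 3x3 cell; the center (35) is a spike
cellZ = [30, 30, 30, 30, 35, 30, 30, 30, 30];
targetZ = 35;

// (sum of all nine - center) / 8 is the mean of the 8 neighbors;
// subtracting the center again gives the signed net height
netZ = (Sum( cellZ ) - targetZ) / 8 - targetZ;
Show( netZ );
```

Here the neighbor mean is 30, so the net height is 30 - 35 = -5. Because the difference is signed, a point that sits above its neighbors gives a negative value and would not exceed a positive threshold. If spikes in both directions should count as outliers, comparing `Abs( netZ )` to the threshold is one option.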

Learn it once, use it forever!

3 REPLIES 3

Re: Nearest Neighbor to detect outliers

A similar idea, with a different implementation:

```
// make sample data
dt = New Table( "sample",
	New Column( "x", Numeric, "Continuous", Format( "Best", 12 ) ),
	New Column( "y", Numeric, "Continuous", Format( "Best", 12 ) ),
	New Column( "z", Numeric, "Continuous", Format( "Best", 12 ) )
);
For( ix = 0, ix < 200, ix += 1,
	For( iy = 0, iy < 250, iy += 1,
		dt << addrows( 1 );
		dt:x = ix;
		dt:y = iy;
		dt:z = Random Normal( 30, .25 );
	)
);

// load into a matrix that is 1 row/col bigger all around
// this assumes the x/y data is gridded, integers, no holes
// but no particular order
xmin = Col Min( dt:x );
xmax = Col Max( dt:x );
ymin = Col Min( dt:y );
ymax = Col Max( dt:y );
m = J( ymax - ymin + 3, xmax - xmin + 3, . ); // +3 makes a border of missing values on all sides
For( i = 1, i <= N Rows( dt ), i += 1,
	// subtract minimum makes it zero based. add 1 to get one-based, but add 2 to leave the border
	m[dt:y[i] - ymin + 2, dt:x[i] - xmin + 2] = dt:z[i]; // copy each z to its x,y (col,row) element
);

meanmat = J( ymax - ymin + 1, xmax - xmin + 1, . ); // +1 is original size
// the x and y loops do not include the border
For( ix = 2, ix <= N Cols( m ) - 1, ix += 1,
	For( iy = 2, iy <= N Rows( m ) - 1, iy += 1,
		// smallmat is 3x3 and may include missing values from border
		smallmat = m[(iy - 1) :: (iy + 1), (ix - 1) :: (ix + 1)];
		smallmat[2, 2] = .; // remove center point from consideration
		meanmat[iy - 1, ix - 1] = Mean( smallmat ); // mean ignores missing values: mean([8 2 .])==5
	)
);

threshold = .5; // detection threshold
// the error matrix has 0 for ok, 1 for beyond threshold. the subscripts on m[]
// remove the border to make it line up with the meanmat.
error = Abs( m[2 :: (ymax - ymin + 2), 2 :: (xmax - xmin + 2)] - meanmat ) > threshold;

// add the outlier indicator back to the table
dt << New Column( "outliers" );
For( i = 1, i <= N Rows( dt ), i += 1,
	dt:outliers[i] = error[dt:y[i] - ymin + 1, dt:x[i] - xmin + 1]
);

// fiddle with the table's row states to make the graph
dt << colorOrMarkByColumn( outliers );
dt << selectwhere( outliers == 1 );

// a graph
dt << Surface Plot(
	Columns( :x, :y, :z ),
	Datapoints Choice( "Points" ),
	Response( :z ),
	Surface Color Method( "Solid", "Solid", "Solid", "Solid" ),
	SetVariableAxis( :x, Axis Data( {} ) ),
	SetVariableAxis( :y, Axis Data( {} ) ),
	SetZAxis( :z, Current Value( 30.5 ) ),
	SetXVariable( :x ),
	SetYVariable( :y ),
	Frame3D(
		Set Graph Size( 900, 900 ),
		Set Rotation( -89, 1, -35 )
	)
);
```

This does assume the data's x and y coordinates are consecutive integers. Make sure the edge behavior is what you expect and make sure the center is left out the way you expect. Test carefully! For example, a 2x2 case like this:

Craige

Re: Nearest Neighbor to detect outliers

Thank you very much. Your solution does exactly what I asked for. I just changed < to <= in the nested for loop to get one additional row and column. I am going to add a variable n for size of nearest neighbor array in case I need to use a different size like 5x5 etc.
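
The variable window size mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from either reply: the function name `flagOutliers`, its arguments, and the sample matrix are hypothetical, and it assumes the heights have already been loaded into a complete matrix (rows = y, columns = x), as in the matrix-based reply. Instead of padding with a border of missing values, it clips the window at the grid edges, so edge points are compared with however many neighbors they actually have.

```
Names Default To Here( 1 );

// hypothetical helper: returns a 0/1 matrix the same size as zmat
// zmat      : heights on a complete grid (rows = y, cols = x)
// half      : window half-width (1 gives 3x3, 2 gives 5x5)
// threshold : absolute difference that counts as an outlier
flagOutliers = Function( {zmat, half, threshold},
	{Default Local},
	nr = N Rows( zmat );
	nc = N Cols( zmat );
	flags = J( nr, nc, 0 );
	For( r = 1, r <= nr, r++,
		For( c = 1, c <= nc, c++,
			// clip the window at the grid edges
			r1 = Max( 1, r - half );
			r2 = Min( nr, r + half );
			c1 = Max( 1, c - half );
			c2 = Min( nc, c + half );
			win = zmat[r1 :: r2, c1 :: c2];
			// blank the center so it does not contribute to the mean
			win[r - r1 + 1, c - c1 + 1] = .;
			flags[r, c] = Abs( zmat[r, c] - Mean( win ) ) > threshold;
		)
	);
	flags;
);

// made-up 3x4 grid with one spike
z = [30 30 30 30, 30 35 30 30, 30 30 30 30];
Show( flagOutliers( z, 1, 2 ) );
```

Note that a single spike also shifts the neighbor mean of every point around it, so a threshold that is too small will flag the spike's neighbors as well. As with the scripts above, it is worth testing the edge and corner behavior on a tiny grid before trusting it on real data.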
