rverma
Level II

Nearest Neighbor to detect outliers

I have a data set with three columns: X, Y, and Z. Each row gives one location in XY space (the X and Y coordinates) and the height Z at that location. At each XY location I want to form a 3x3 cluster and calculate the average Z of the 8 nearest neighbors (excluding the center point). I then subtract the Z height of the center location from that average; if the difference is greater than a user-defined threshold, that XY location is called an outlier. This continues until all XY locations have been checked. Are there any built-in functions in JMP that I can use? How can I script it in JSL? Also, how are the edge coordinates handled? An example image is shown below. Any guidance will be highly appreciated.

Array.PNG

1 ACCEPTED SOLUTION

Re: Nearest Neighbor to detect outliers

JMP has no built-in function for this very specialized calculation. Here is a short script that performs the essential task.

 

Names Default to Here( 1 );

dt1 = Current Data Table();

// dimensions of results (r is matched against :X, c against :Y below)
nr = Col Maximum( dt1:X ) - 1; // assumes first level is 0
nc = Col Maximum( dt1:Y ) - 1; // assumes first level is 0

net height = [];

// iterate over 3x3 cells
For( c = 1, c < nc, c++,
	For( r = 1, r < nr, r++,
		cell = dt1 << Get Rows Where(
			Or(
				:X == r-1 & :Y == c-1,
				:X == r-1 & :Y == c,
				:X == r-1 & :Y == c+1,
				:X == r   & :Y == c-1,
				:X == r   & :Y == c,
				:X == r   & :Y == c+1,
				:X == r+1 & :Y == c-1,
				:X == r+1 & :Y == c,
				:X == r+1 & :Y == c+1,
			)
		);
		target = dt1 << Get Rows Where( :X == r & :Y == c );
		// store X (= r), Y (= c), and the mean of the 8 neighbors minus the center height
		net height |/= Matrix( { r, c, (((Sum( dt1:Z[cell] ) - dt1:Z[target]) / 8) - dt1:Z[target])[1] } )`;
	);
);

dt2 = As Table( net height, << Column Names( { "X", "Y", "Net Z" } ) );

// assume user threshold is 1.5
threshold = 1.5;
dt2 << New Table Variable( "Threshold", "1.5" );
dt2 << New Column( "Outlier", "Numeric", "Nominal",
	Values( net height[0,3] > threshold ),
	Value Labels( {0 = "No", 1 = "Yes"} ),
	Use Value Labels( 1 )
);
dt2 << New Script( "Plot Outliers",
	Current Data Table() << Graph Builder(
		Size( 522, 454 ),
		Show Control Panel( 0 ),
		Variables( X( :X ), Y( :Y ), Color( :Outlier ) ),
		Elements( Points( X, Y, Legend( 3 ) ) )
	);
);

Craige_Hales
Staff (Retired)

Re: Nearest Neighbor to detect outliers

A similar idea with a different implementation:

// make sample data
dt = New Table( "sample",
	New Column( "x", Numeric, "Continuous", Format( "Best", 12 ) ),
	New Column( "y", Numeric, "Continuous", Format( "Best", 12 ) ),
	New Column( "z", Numeric, "Continuous", Format( "Best", 12 ) )
);
For( ix = 0, ix < 200, ix += 1,
	For( iy = 0, iy < 250, iy += 1,
		dt << addrows( 1 );
		dt:x = ix;
		dt:y = iy;
		dt:z = Random Normal( 30, .25 );
	)
);

// load into a matrix that is 1 row/col bigger all around
// this assumes the x/y data is gridded, integers, no holes
// but no particular order

xmin = Col Min( dt:x );
xmax = Col Max( dt:x );
ymin = Col Min( dt:y );
ymax = Col Max( dt:y );

m = J( ymax - ymin + 3, xmax - xmin + 3, . );// +3 makes a border of missing values on all sides

For( i = 1, i <= N Rows( dt ), i += 1,
	// subtract minimum makes it zero based. add 1 to get one-based, but add 2 to leave the border
	m[dt:y[i] - ymin + 2, dt:x[i] - xmin + 2] = dt:z[i]; // copy each z to its x,y (col,row) element
);

meanmat = J( ymax - ymin + 1, xmax - xmin + 1, . ); // +1 is original size

// the x and y loops do not include the border 
For( ix = 2, ix <= N Cols( m ) - 1, ix += 1,
	For( iy = 2, iy <= N Rows( m ) - 1, iy += 1,
		// smallmat is 3x3 and may include missing values from border
		smallmat = m[(iy - 1) :: (iy + 1), (ix - 1) :: (ix + 1)];
		smallmat[2, 2] = .;// remove center point from consideration
		meanmat[iy - 1, ix - 1] = Mean( smallmat ); // mean ignores missing values: mean([8 2 .])==5
	)
);

threshold = .5; // detection threshold

// the error matrix has 0 for ok, 1 for beyond threshold. the subscripts on m[]
// remove the border to make it line up with the meanmat.
error = Abs( m[2 :: (ymax - ymin + 2), 2 :: (xmax - xmin + 2)] - meanmat ) > threshold;

// add the outlier indicator back to the table
dt << New Column( "outliers" );
For( i = 1, i <= N Rows( dt ), i += 1,
	dt:outliers[i] = error[dt:y[i] - ymin + 1, dt:x[i] - xmin + 1]
);

// fiddle with the table's row states to make the graph
dt << colorOrMarkByColumn( outliers );
dt << selectwhere( outliers == 1 );

// a graph
dt << Surface Plot(
	Columns( :x, :y, :z ),
	Datapoints Choice( "Points" ),
	Response( :z ),
	Surface Color Method( "Solid", "Solid", "Solid", "Solid" ),
	SetVariableAxis( :x, Axis Data( {} ) ),
	SetVariableAxis( :y, Axis Data( {} ) ),
	SetZAxis( :z, Current Value( 30.5 ) ),
	SetXVariable( :x ),
	SetYVariable( :y ),
	Frame3D(
		Set Graph Size( 900, 900 ),
		Set Rotation( -89, 1, -35 )

	)
);

Red outliers above and below blue cloud

This does assume the data's x and y coordinates are consecutive integers. Make sure the edge behavior is what you expect and make sure the center is left out the way you expect. Test carefully! For example, a 2x2 case like this:

Tiny test case making use of the missing value border, a lot!
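
A 2x2 case of this kind can be checked by hand. The sketch below is only an illustration: the four heights are made up and are not the values from the screenshot. It reuses the same missing-value border and the same Mean() call as the script above, so every point has just three real neighbors.

```jsl
Names Default To Here( 1 );

// made-up 2x2 heights (an assumption, not the data in the screenshot)
z = [10 20, 30 40];

// pad with a one-cell border of missing values on all sides
m = J( 4, 4, . );
m[2 :: 3, 2 :: 3] = z;

meanmat = J( 2, 2, . );
For( ix = 2, ix <= 3, ix += 1,
	For( iy = 2, iy <= 3, iy += 1,
		smallmat = m[(iy - 1) :: (iy + 1), (ix - 1) :: (ix + 1)];
		smallmat[2, 2] = .; // leave the center out
		meanmat[iy - 1, ix - 1] = Mean( smallmat ); // missing values are ignored
	)
);

Show( meanmat );
```

Worked by hand, the neighbor means are 30 for the point with height 10 (mean of 20, 30, 40), 80/3 for the 20, 70/3 for the 30, and 20 for the 40. If the script reports something else, the edge handling or the center exclusion is not doing what you expect.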

Craige
rverma
Level II

Re: Nearest Neighbor to detect outliers

Thank you very much. Your solution does exactly what I asked for. I just changed < to <= in the nested for loop to get one additional row and column. I am going to add a variable n for size of nearest neighbor array in case I need to use a different size like 5x5 etc.
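
One possible way to add that window-size variable is sketched here. It follows the padded-matrix idea from Craige's reply rather than the accepted script, and it assumes the heights are already in a matrix on a complete grid and that n is odd. The function name neighborMean and its arguments are hypothetical, chosen only for this illustration.

```jsl
Names Default To Here( 1 );

// hypothetical helper: mean of the neighbors in an n x n window, center excluded
// z = matrix of heights on a complete grid, n = odd window size (3, 5, ...)
neighborMean = Function( {z, n},
	{Default Local},
	h = (n - 1) / 2; // half-width of the window, also the border width
	nr = N Rows( z );
	nc = N Cols( z );
	padded = J( nr + 2 * h, nc + 2 * h, . ); // border of missing values, h cells wide
	padded[(h + 1) :: (h + nr), (h + 1) :: (h + nc)] = z;
	result = J( nr, nc, . );
	For( r = 1, r <= nr, r++,
		For( c = 1, c <= nc, c++,
			win = padded[r :: (r + 2 * h), c :: (c + 2 * h)]; // n x n window centered on (r, c)
			win[h + 1, h + 1] = .; // drop the center point
			result[r, c] = Mean( win ); // missing border cells are ignored
		)
	);
	result;
);

// usage: 3x3 grid with a spike in the middle, 3x3 window
z = [1 1 1, 1 9 1, 1 1 1];
Show( neighborMean( z, 3 ) );
```

With n = 3 this should reproduce the 3x3 behavior, and n = 5 widens the border to two cells. Near the edges the mean is taken over fewer real neighbors, so it is worth checking that this is the edge behavior you want.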