
Challenge 11 - Outlier Screening

Outlier screening is an anomaly detection methodology where the data is typically numeric. JMP has four outlier screening methods under the Analyze > Screening > Explore Outliers menu option. Two of these methods are univariate, applicable to outliers that can be identified from a single column of data.

These methods take the standard approach of defining outliers as observations that lie far from the center of the data. Algorithmically defining "far" and "center" is at the heart of these methods. They work well when the data is unimodal and unbounded, but not so well when the outliers occur close to a boundary, or as clusters between major modes or at the ends of the data values.

In these situations, standard outlier screening methods either find no outliers or require multiple iterations. Three examples that elicit these behaviors are given in the Three Outlier Examples.JMP data table below.
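To illustrate why a far-from-center rule can miss a cluster between two modes, here is a minimal Python sketch. It is not one of JMP's methods: the median/MAD rule, the function name `mad_outliers`, the cutoff `k=3.0`, and the made-up data are all assumptions chosen for illustration.

```python
import statistics

def mad_outliers(values, k=3.0):
    """Flag values more than k robust standard deviations from the median."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # when the data are normally distributed.
    scale = 1.4826 * mad
    if scale == 0:
        return [False] * len(values)
    return [abs(v - center) / scale > k for v in values]

# Hypothetical data (not the challenge table): two modes near 0 and 10,
# plus a small cluster of three points sitting between them.
low_mode = [i * 0.1 for i in range(-10, 11)]
high_mode = [10 + i * 0.1 for i in range(-10, 11)]
between = [4.9, 5.0, 5.1]
bimodal = low_mode + high_mode + between

flags = mad_outliers(bimodal)
print(sum(flags), "points flagged")
```

In this made-up data the median lands on the in-between cluster itself, so those three points get the smallest scores of all and nothing is flagged.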

For this month’s challenge, develop an algorithm that will find these types of outliers in addition to those found by standard techniques. Assume the outliers are univariate and the data numeric. The Probe.JMP sample data table is a good data set for testing your algorithm.

Comments
mzwald
Staff

This was a fun exercise! Here is my submission for the Bimodal column of the attached data table. I will admit it took some fine-tuning of the K parameters to get the desired results. For the other two columns, it was less clear to me what should be considered an outlier.

 

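// Open the challenge data table
dt = Open( "Three Outlier Examples.jmp" );

// Score each row of Bimodal by its nearest-neighbor distances (K = 20)
obj1 = dt << Explore Outliers(
	Y( :Bimodal ),
	K Nearest Neighbor Outliers( K( 20 ), Impute Missing( 1 ) )
);

// Save the nearest-neighbor distances to the data table
obj1 << Save NN Distances;

// Screen the saved distance column with a robust (Huber) fit
obj2 = dt << Explore Outliers(
	Y( :Nearest 20 Distance ),
	Robust Fit Outliers( K Sigma( 200 ), Huber( 1 ) )
);

// Select and color the rows flagged on the distance column
obj2 << Select Rows (:"Nearest 20 Distance"n) << Color Rows (:"Nearest 20 Distance"n);

// Plot Bimodal to inspect the flagged rows
dt << Graph Builder(
	Size( 489, 454 ),
	Show Control Panel( 0 ),
	Variables( X( :Bimodal ) ),
	Elements( Points( X, Legend( 3 ) ) )
);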
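
For readers without JMP, the two-stage idea in the script above can be sketched in plain Python. This is only an approximation under stated assumptions: it assumes the saved column holds the distance to the K-th nearest neighbor (as the name "Nearest 20 Distance" suggests), and it swaps the Robust Fit Outliers step for a simple cutoff at a multiple of the median distance. The function names, `k=5`, `factor=5.0`, and the data are hypothetical.

```python
import statistics

def knn_distances(values, k):
    """Distance from each value to its k-th nearest neighbor (excluding itself)."""
    if not 1 <= k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    result = []
    for i, v in enumerate(values):
        gaps = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        result.append(gaps[k - 1])
    return result

def knn_outliers(values, k=5, factor=5.0):
    """Flag values whose k-th neighbor distance exceeds factor * median distance.

    The median-ratio cutoff is a stand-in for a robust fit on the distances.
    """
    distances = knn_distances(values, k)
    cutoff = factor * statistics.median(distances)
    return [d > cutoff for d in distances]

# Hypothetical data: two modes near 0 and 10 and a three-point cluster between.
low_mode = [i * 0.1 for i in range(-10, 11)]
high_mode = [10 + i * 0.1 for i in range(-10, 11)]
between = [4.9, 5.0, 5.1]
bimodal = low_mode + high_mode + between

flags = knn_outliers(bimodal)
flagged = [v for v, is_out in zip(bimodal, flags) if is_out]
print(flagged)
```

In this sketch k has to be at least as large as the outlier cluster: with `k=2`, each in-between point finds its neighbors inside the cluster and looks as dense as the modes. That may be one reason the K parameters need tuning.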


 

DonMcCormack
Staff

Thanks @mzwald, I like the approach.