Challenge 11

DonMcCormack · Sep 28, 2021 01:18 PM

Outlier screening is an anomaly detection methodology where the data is typically numeric. JMP has four outlier screening methods under the Analyze > Screening > Explore Outliers menu option. Two of these methods are univariate, applicable to outliers that can be identified from a single column of data.

These methods take the standard approach of defining outliers as observations that lie far from the center of the data. How "far" and "center" are defined algorithmically is at the heart of these methods. They work well when the data is unimodal and unbounded, but less well when the outliers occur close to a boundary, or when they form clusters between major modes or at the ends of the data range.
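To make the "far from the center" idea concrete, here is a minimal sketch of a quantile-range rule written in plain Python rather than JSL. It is not JMP's implementation: the function name and the settings tail=0.1 and q=3 are assumptions chosen for illustration, and the data is made up. The rule catches a lone extreme value in unimodal data but misses a small cluster sitting between two modes, because the two modes stretch the interquantile range.

```python
import statistics

def quantile_range_outliers(values, tail=0.1, q=3.0):
    """Flag values more than q * (interquantile range) beyond the tail quantiles.

    Hypothetical helper for illustration; tail and q are assumed settings.
    """
    # With n=100, statistics.quantiles returns the 1st..99th percentiles.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    lower = pct[round(tail * 100) - 1]
    upper = pct[round((1 - tail) * 100) - 1]
    spread = upper - lower
    lo_cut = lower - q * spread
    hi_cut = upper + q * spread
    return [v for v in values if v < lo_cut or v > hi_cut]

# Made-up unimodal data (8.0 to 12.0) with one extreme value.
unimodal = [10 + 0.1 * i for i in range(-20, 21)] + [60.0]

# Made-up bimodal data: modes near 0 and 100, plus three points in between.
low_mode = [0.1 * i for i in range(-20, 21)]
high_mode = [100 + 0.1 * i for i in range(-20, 21)]
bimodal = low_mode + [49.5, 50.0, 50.5] + high_mode

print(quantile_range_outliers(unimodal))  # [60.0]
print(quantile_range_outliers(bimodal))   # []
```

In the bimodal case the cutoffs land roughly 300 units beyond either mode, so the three points near 50 are never considered far from anything.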

In these situations, standard outlier screening methods either find no outliers or require multiple iterations to find them. Three examples that elicit these behaviors are given in the Three Outlier Examples.JMP data table below.

For this month’s challenge, develop an algorithm that will find these types of outliers in addition to those found by standard techniques. Assume the outliers are univariate and the data is numeric. The Probe.JMP sample data table is a good data set for testing your algorithm.

mzwald · 09-28-2021

This was a fun exercise! Here is my submission for the Bimodal column in the attached data table. I will admit it took some fine-tuning of the K parameters (K and K Sigma) to get the desired results. For the other two columns, it was less clear to me what should be considered an outlier.

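dt = Open( "Three Outlier Examples.jmp" );

// Step 1: compute k-nearest-neighbor distances for the Bimodal column.
obj1 = dt << Explore Outliers(
	Y( :Bimodal ),
	K Nearest Neighbor Outliers( K( 20 ), Impute Missing( 1 ) )
);

// Save the distances to the data table as new columns,
// including "Nearest 20 Distance", which is used below.
obj1 << Save NN Distances;

// Step 2: screen the saved distances themselves for outliers
// using a robust (Huber) fit with a very wide K Sigma.
obj2 = dt << Explore Outliers(
	Y( :"Nearest 20 Distance"n ),
	Robust Fit Outliers( K Sigma( 200 ), Huber( 1 ) )
);

// Select and color the rows flagged in step 2.
obj2 << Select Rows( :"Nearest 20 Distance"n ) << Color Rows( :"Nearest 20 Distance"n );

// Plot the Bimodal column so the colored rows are visible.
dt << Graph Builder(
	Size( 489, 454 ),
	Show Control Panel( 0 ),
	Variables( X( :Bimodal ) ),
	Elements( Points( X, Legend( 3 ) ) )
);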
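For readers without JMP at hand, the two-stage idea in the script above (compute each point's distance to its k-th nearest neighbor, then look for unusually large distances) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the data is made up, and a simple "more than ratio times the median distance" cutoff stands in for JMP's Huber robust fit, so the ratio of 10 has no relation to the K Sigma( 200 ) used above.

```python
import statistics

def kth_neighbor_distance(values, k):
    """Distance from each value to its k-th nearest neighbor (brute force, 1-D)."""
    dists = []
    for i, v in enumerate(values):
        others = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        dists.append(others[k - 1])
    return dists

def flag_isolated(values, k=5, ratio=10.0):
    """Flag values whose k-th neighbor distance exceeds ratio * median distance.

    Hypothetical helper; the median-based cutoff is an assumed stand-in
    for a robust fit, not what JMP computes.
    """
    dists = kth_neighbor_distance(values, k)
    cutoff = ratio * statistics.median(dists)
    return [v for v, d in zip(values, dists) if d > cutoff]

# Made-up bimodal data: modes near 0 and 100, plus three points in between.
low_mode = [0.1 * i for i in range(-20, 21)]
high_mode = [100 + 0.1 * i for i in range(-20, 21)]
values = low_mode + [49.5, 50.0, 50.5] + high_mode

print(flag_isolated(values, k=5))  # [49.5, 50.0, 50.5]
print(flag_isolated(values, k=1))  # []
```

The second call shows why k needs tuning: with k smaller than the size of the outlier cluster, each stray point finds its neighbors inside that cluster and looks unremarkable. That may be part of why a fairly large K was needed in the script above, though that is a guess rather than something stated in the post.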

DonMcCormack · 09-29-2021

Thanks @mzwald, I like the approach.