Re: Outliers more than 3 standard deviations in JMP

NeF · Aug 14, 2018 03:23 AM

Hi,

Is there any way to detect outliers more than 3 standard deviations from the mean in JMP?

I should say that I am familiar with the "Explore Outliers" option as well as the "Levey Jennings" control chart, but is there another way to detect outliers beyond 3 SDs?

thanks,

Ne

Thomas1 · Aug 14, 2018 10:57 AM

This can be solved by creating a new formula column with the formula:

If( :Data > Col Mean( :Data ) + Col Std Dev( :Data ) * 3,
   1,
   0
)

Here :Data is your source column; the new column is 1 for flagged rows and 0 otherwise.

If you want the values beyond 3 SDs themselves rather than a 0/1 flag, the formula is:

If( :Data > Col Mean( :Data ) + Col Std Dev( :Data ) * 3,
:Data,
  "."
)

NeF · Aug 14, 2018 03:38 PM

Dear Thomas1,

Thank you for your kind reply!

Ne

Thomas1 · Aug 15, 2018 01:05 AM

My second formula, which should return the values > 3 SDs, contains an error and therefore doesn't work. You have to replace the quoted "." (a character string) with an unquoted . (the numeric missing value).

So the correct formula, in order to get the values, is:

If( :Data > Col Mean( :Data ) + Col Std Dev( :Data ) * 3,
   :Data,
   .
)
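
Note that the formulas above only test the upper side (mean + 3 SD). If low outliers matter as well, a two-sided flag can be sketched like this, assuming the same placeholder column :Data. This is a variation on the formula above, not something from the original reply:

```jsl
// Flag rows more than 3 standard deviations from the mean, in either direction.
// :Data is the placeholder column name used above; substitute your own column.
If( Abs( :Data - Col Mean( :Data ) ) > 3 * Col Std Dev( :Data ),
	1,
	0
)
```

As with the one-sided version, paste this into the Formula Editor of a new column.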

NeF · Aug 15, 2018 05:49 AM

hi,

Thanks a lot. I noticed that while using it yesterday.

best,

Ne

gzmorgan0 · Aug 14, 2018 06:32 PM

Just a simple addendum to @Thomas1's response. If you have severe outliers, they can inflate Col Std Dev considerably and also bias Col Mean, so the 3 SD limit itself shifts.

An alternative is to use median + k * pseudo sigma for the upper screening limit and median - k * pseudo sigma for the lower screening limit.

Here is the column formula for a column named weight, using the quartiles and k = 5. For raw data, quantiles 0.85 and 0.15 are sometimes used to compute the pseudo sigma, with 6 as the multiplier for ps.

Local( {ps},
	ps = (Col Quantile( :weight, 0.75 ) - Col Quantile( :weight, 0.25 )) / 1.349;
	If(
		:weight > Col Quantile( :weight, 0.5 ) + 5 * ps, 1,
		:weight < Col Quantile( :weight, 0.5 ) - 5 * ps, -1,
		0
	);
)
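
For reference, the divisor 1.349 is the interquartile range of a standard normal distribution, i.e. 2 * Normal Quantile( 0.75 ), which makes ps an estimate of sigma when the bulk of the data is roughly normal. The 0.85/0.15 variant mentioned above could be sketched the same way. The divisor below (about 2.07) is my assumption, chosen by the same normal-scaling logic, since the post does not state which divisor is used with those quantiles; k is just a local name for the multiplier:

```jsl
Local( {ps, k = 6},
	// Q(0.85) - Q(0.15) spans 2 * Normal Quantile( 0.85 ) sigmas for normal data
	// (assumed scaling; the post does not give the divisor for these quantiles)
	ps = (Col Quantile( :weight, 0.85 ) - Col Quantile( :weight, 0.15 ))
		/ (2 * Normal Quantile( 0.85 ));
	If(
		:weight > Col Quantile( :weight, 0.5 ) + k * ps, 1,   // high outlier
		:weight < Col Quantile( :weight, 0.5 ) - k * ps, -1,  // low outlier
		0
	);
)
```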

If you are scripting, the Distribution platform computes a Robust Mean and Robust Std Dev. Here is a simple script to get these values.

Names Default To Here(1);

dt = Open( "$sample_data/Big class.jmp");

dist = dt << Distribution(
	Continuous Distribution(
		Column( :height ),
		Quantiles( 0 ),
		Horizontal Layout( 1 ),
		Histogram( 0 ),
		Vertical( 0 ),
		Outlier Box Plot( 0 ),
		Customize Summary Statistics(
			Trimmed Mean( 1 ),
			Robust Mean( 1 ),
			Robust Std Dev( 1 ),
			Set Alpha Level( 0.05 )
		)
	)
);

snames = report(dist)["Summary Statistics"][TableBox(1)][StringColBox(1)] << get;
svalues = report(dist)["Summary Statistics"][TableBox(1)][NumberColBox(1)] << get;
stats = Associative Array(snames, svalues);  //create a keyed list

r_xb = stats["Robust Mean"];
r_sd = stats["Robust Standard Deviation"];

show(r_xb, r_sd);

//now use r_xb + <4|5|6> * r_sd  and  r_xb - <4|5|6> * r_sd for screening limits
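
To turn those limits into an actual row selection, one possible continuation of the script above is sketched below. It relies on dt, r_xb and r_sd from that script; k, lim_lo, lim_hi and flagged are illustrative names, not part of the original post:

```jsl
// Continues the script above: dt, r_xb and r_sd must already exist.
k = 4;                     // multiplier; 4, 5 or 6 as suggested above
lim_hi = r_xb + k * r_sd;  // upper screening limit
lim_lo = r_xb - k * r_sd;  // lower screening limit

// Select every row of :height that falls outside the screening limits
dt << Select Where( :height > lim_hi | :height < lim_lo );
flagged = dt << Get Selected Rows;

Show( lim_lo, lim_hi, N Rows( flagged ) );
```

The selected rows are then highlighted in the data table, so they can be reviewed, excluded or colored from the Rows menu.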

NeF · Aug 15, 2018 05:50 AM

Hi gzmorgan0,
It's really helpful!
thanks,
Nehai