JMP Grubbs' outlier test scripts - different versions (started by Mark Bailey)

Hani · Aug 24, 2018 08:26 AM

Hello,

At work I was handed 3 different scripts for Grubbs' outlier test (below). They were all written (or started) by Mark Bailey, but they are different versions. Does anyone know the difference between these 3 scripts? Thank you very much. Hani.

Script 1: GrubbsOutlierTest Sequential.jsl

/*

GrubbsSequentialOutlierTest.jsl
05Jun2003

Copyright (c) 2003 by SAS Institute Inc., Cary, NC 27513, USA. All rights reserved.

Note: please read the disclaimer at the end of this script.

Purpose
This script demonstrates a principle.

Author
Mark Bailey (SAS Institute)

Contact
mark.bailey@sas.com

Usage
Simply open a data table and then run this script by any one of these methods:

	Edit > Run Script
	Control-R
	Click "Run Script" button in tool bar

Future Improvement Ideas
None at this time.

*/

Names Default to Here( 1 );

dlg = Column Dialog(
	yCol = Col List( "Y, Data",
		Data Type( Numeric ),
		Min Col(1),
		Max Col(1)
	),
	Line Up( 2,
		"Significance", a = Edit Number( 0.05 )
	),
	"Select data for outlier test"
);

If( dlg["Button"] == -1, Throw( "User cancelled" ) );
Remove From( dlg ); Eval List( dlg );

dt = Current Data Table();

	// process as single sample
	dist = dt << Distribution(
		Y( yCol[1] ),
		Normal Quantile Plot( 1 ),
		Fit Distribution( Normal( Goodness of Fit( 1 ) ) )
	);
	distr = dist << Report;

	yVal = yCol[1] << Get As Matrix;
	exRows = dt << Get Excluded Rows();
	yval[exRows] = [];
	n = N Row( yVal );

	t0Sqr =  t Quantile( 1 - a/(2*n), n-2 )^2;

	g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );
	g0 = ((n-1)/Sqrt(n)) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );


// Sequential testing: while an outlier is detected, hide and exclude the most
// extreme value, then rerun the analysis on the remaining rows.
If( g > g0,
	distr << Close Window();
	For( , g > g0, ,
		dist = dt << Distribution(
			Y( yCol[1] ),
			Normal Quantile Plot( 1 ),
			Fit Distribution( Normal( Goodness of Fit( 1 ) ) )
		);
		distr = dist << Report;

		yVal = yCol[1] << Get As Matrix;
		exRows = dt << Get Excluded Rows();
		yVal[exRows] = [];
		n = N Row( yVal );

		t0Sqr = t Quantile( 1 - a/(2*n), n-2 )^2;

		g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );
		g0 = ((n-1)/Sqrt(n)) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );

		ycolumn = dlg["ycol"];
		If( g > g0,
			Summarize( maxvalue = Max( Column( ycolumn ) ) );
			Summarize( minvalue = Min( Column( ycolumn ) ) );
			Summarize( meanvalue = Mean( Column( ycolumn ) ) );
			// Exclude whichever extreme lies farther from the mean.
			If( (maxvalue - meanvalue) > (meanvalue - minvalue),
				dt << Select Where( As Column( dt, ycolumn ) == maxvalue ) << Hide and Exclude,
				dt << Select Where( As Column( dt, ycolumn ) == minvalue ) << Hide and Exclude
			);
			distr << Close Window();
		);
	);
);
	distr[Outline Box(2)] << Append(
		Outline Box( "Grubbs' Outlier Test",
			Table Box(
				String Col Box( "Statistic", {"G", "G("||Char(a)||")"} ),
				Number Col Box( "Estimate", Matrix( {g, g0} ) )
			),
			Text Box(
				If( g>g0,
					"Outlier detected",
					"No outlier detected"
				)
			)
		)
	);


/*
Revision History (date, change, person)
05Jun2003, created, Mark Bailey
06Sep2013, correctly disregards excluded rows, Mark Bailey
03Oct2014, add By groups, Mark Bailey
15Nov2017,removed By groups and added sequential testing, Hadley Myers
*/

/*
Disclaimer by 
SAS Institute Inc. 

License Agreement for Corrective Code or 
Additional Functionality 

SAS INSTITUTE INC. IS PROVIDING YOU WITH THE COMPUTER SOFTWARE CODE INCLUDED WITH THIS AGREEMENT ("CODE") ON AN "AS IS" BASIS, AND AUTHORIZES YOU TO USE THE CODE SUBJECT TO THE TERMS HEREOF.  BY USING THE CODE, YOU AGREE TO THESE TERMS.  YOUR USE OF THE CODE IS AT YOUR OWN RISK.  SAS INSTITUTE INC. MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT AND TITLE, WITH RESPECT TO THE CODE. 

The Code is intended to be used solely as part of a product ("Software") you currently have licensed from SAS or one of its subsidiaries or authorized agents ("SAS"). The Code is designed to either correct an error in the Software or to add functionality to the Software, but has not necessarily been tested.  Accordingly, SAS makes no representation or warranty that the Code will operate error-free.  SAS is under no obligation to maintain or support the Code.

Neither SAS nor its licensors shall be liable to you or any third party for any general, special, direct, indirect, consequential, incidental or other damages whatsoever arising out of or related to your use or inability to use the Code, even if SAS has been advised of the possibility of such damages.

Except as otherwise provided above, the Code is governed by the same agreement that governs the Software.  If you do not have an existing agreement with SAS governing the Software, you may not use the Code. 

(SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.)
*/

Script 2: Grubbs Outlier Test 2.jsl

/*

GrubbsOutlierTest.jsl
05Jun2003

Copyright (c) 2003 by SAS Institute Inc., Cary, NC 27513, USA. All rights reserved.

Note: please read the disclaimer at the end of this script.

Purpose
This script demonstrates a principle.

Author
Mark Bailey (SAS Institute)

Contact
mark.bailey@sas.com

Usage
Simply open a data table and then run this script by any one of these methods:

	Edit > Run Script
	Control-R
	Click "Run Script" button in tool bar

Future Improvement Ideas
None at this time.

*/

Names Default to Here( 1 );

dlg = Column Dialog(
	yCol = Col List( "Y, Data",
		Data Type( Numeric ),
		Min Col(1),
		Max Col(1)
	),
	bCol = Col List( "By",
		Max Col(1)
	),
	Line Up( 2,
		"Significance", a = Edit Number( 0.05 )
	),
	"Select data for outlier test"
);

If( dlg["Button"] == -1, Throw( "User cancelled" ) );
Remove From( dlg ); Eval List( dlg );

dt = Current Data Table();

If( N Items( bCol ),

	// process by group
	dist = dt << Distribution(
		Y( yCol[1] ),
		By( bCol[1] ),
		Normal Quantile Plot( 1 ),
		Fit Distribution( Normal( Goodness of Fit( 1 ) ) )
	);
	distr = dist << Report;

	bCol = Column( bCol[1] );
	Summarize( group = By( bCol ) );

	yy = yCol[1] << Get As Matrix;
	exRows = dt << Get Excluded Rows();
	yy[exRows] = .;

	For( i = 1, i <= N Items( group ), i++,
		groupName = Trim( Word( 2, distr[i][OutlineBox(1)] << Get Title, "=" ) );
		getRows = dt << Get Rows Where( bCol[] == groupName );
		yVal = yy[getRows];
		yVal[Loc( Is Missing( yVal ) )] = [];
		n = N Row( yVal );

		t0Sqr = t Quantile( 1 - a/(2*n), n-2 )^2;

		g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );
		g0 = ((n-1)/Sqrt(n)) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );

		distr[i][Outline Box(2)] << Append(
			Outline Box( "Grubbs' Outlier Test",
				Table Box(
					String Col Box( "Statistic", {"G", "G("||Char(a)||")"} ),
					Number Col Box( "Estimate", Matrix( {g, g0} ) )
				),
				Text Box(
					If( g>g0,
						"Outlier detected",
						"No outlier detected"
					)
				)
			)
		);
	),
	
	// process as single sample
	dist = dt << Distribution(
		Y( yCol[1] ),
		Normal Quantile Plot( 1 ),
		Fit Distribution( Normal( Goodness of Fit( 1 ) ) )
	);
	distr = dist << Report;

	yVal = yCol[1] << Get As Matrix;
	exRows = dt << Get Excluded Rows();
	yval[exRows] = [];
	n = N Row( yVal );

	t0Sqr =  t Quantile( 1 - a/(2*n), n-2 )^2;

	g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );
	g0 = ((n-1)/Sqrt(n)) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );

	distr[Outline Box(2)] << Append(
		Outline Box( "Grubbs' Outlier Test",
			Table Box(
				String Col Box( "Statistic", {"G", "G("||Char(a)||")"} ),
				Number Col Box( "Estimate", Matrix( {g, g0} ) )
			),
			Text Box(
				If( g>g0,
					"Outlier detected",
					"No outlier detected"
				)
			)
		)
	);
);

/*
Revision History (date, change, person)
05Jun2003, created, Mark Bailey
06Sep2013, correctly disregards excluded rows, Mark Bailey
03Oct2014, add By groups, Mark Bailey
*/

/*
Disclaimer by 
SAS Institute Inc. 

License Agreement for Corrective Code or 
Additional Functionality 

SAS INSTITUTE INC. IS PROVIDING YOU WITH THE COMPUTER SOFTWARE CODE INCLUDED WITH THIS AGREEMENT ("CODE") ON AN "AS IS" BASIS, AND AUTHORIZES YOU TO USE THE CODE SUBJECT TO THE TERMS HEREOF.  BY USING THE CODE, YOU AGREE TO THESE TERMS.  YOUR USE OF THE CODE IS AT YOUR OWN RISK.  SAS INSTITUTE INC. MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT AND TITLE, WITH RESPECT TO THE CODE. 

The Code is intended to be used solely as part of a product ("Software") you currently have licensed from SAS or one of its subsidiaries or authorized agents ("SAS"). The Code is designed to either correct an error in the Software or to add functionality to the Software, but has not necessarily been tested.  Accordingly, SAS makes no representation or warranty that the Code will operate error-free.  SAS is under no obligation to maintain or support the Code.

Neither SAS nor its licensors shall be liable to you or any third party for any general, special, direct, indirect, consequential, incidental or other damages whatsoever arising out of or related to your use or inability to use the Code, even if SAS has been advised of the possibility of such damages.

Except as otherwise provided above, the Code is governed by the same agreement that governs the Software.  If you do not have an existing agreement with SAS governing the Software, you may not use the Code. 

(SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.)
*/

Script 3: GrubbsOutlierTest.jsl

/*

GrubbsOutlierTest.jsl
05Jun2003

Copyright (c) 2003 by SAS Institute Inc., Cary, NC 27513, USA. All rights reserved.

Note: please read the disclaimer at the end of this script.

Purpose
This script demonstrates a principle.

Author
Mark Bailey (SAS Institute)

Contact
mark.bailey@sas.com

Usage
Simply run this script by any one of these methods:

	Edit > Run Script
	Control-R
	Click "Run Script" button in tool bar

Future Improvement Ideas
None at this time.

*/

Clear Globals();

dlg = Column Dialog(
	yCol = Col List( "Y, Data",
		Data Type( Numeric ),
		Min Col(1),
		Max Col(1)
	),
	Line Up( 2,
		"Significance", a = Edit Number( 0.05 )
	),
	"Select data for outlier test"
);

If( dlg["Button"] == -1, Throw( "User cancelled" ) );
Remove From( dlg ); Eval List( dlg );

yCol = Column( yCol[1] );

dist = Distribution(
	Continuous Distribution(
		Column( yCol ),
		Quantiles(1),
		Moments(1),
		Normal Quantile Plot(1)
	)
);

yVal = yCol << Get As Matrix;
yRes = yVal - Mean( yVal );
n = N Row( yVal );

g = Maximum( Abs( yRes ) ) / Std Dev( yVal );
t0 = Abs( t Quantile( a/(2*n), n-2 ) );
g0 = ((n-1)/Sqrt(n)) * Sqrt( t0^2 / (n - 2 + t0^2) );

p = 2 * n * (1 - t Distribution( Sqrt( g^2*(2-n)/(2+g^2-1/n-n) ), n-2 ));

distr = dist << Report;

distr[Outline Box(2)] << Append(
	Outline Box( "Grubbs' Outlier Test",
		Table Box(
			String Col Box( "Statistic", {"G", "G("||Char(a)||")", "p>|G|"} ),
			Number Col Box( "Estimate", Matrix( {g, g0, p} ) )
		),
		Text Box(
			If( g>g0,
				"Outlier detected",
				"No outlier detected"
			)
		)
	)
);

distr["Quantiles"] << Close;
distr["Moments"] << Close;

/*
Revision History (date, change, person)
05Jun2003, created, Mark Bailey
*/

/*
Disclaimer by 
SAS Institute Inc. 

License Agreement for Corrective Code or 
Additional Functionality 

SAS INSTITUTE INC. IS PROVIDING YOU WITH THE COMPUTER SOFTWARE CODE INCLUDED WITH THIS AGREEMENT ("CODE") ON AN "AS IS" BASIS, AND AUTHORIZES YOU TO USE THE CODE SUBJECT TO THE TERMS HEREOF.  BY USING THE CODE, YOU AGREE TO THESE TERMS.  YOUR USE OF THE CODE IS AT YOUR OWN RISK.  SAS INSTITUTE INC. MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT AND TITLE, WITH RESPECT TO THE CODE. 

The Code is intended to be used solely as part of a product ("Software") you currently have licensed from SAS or one of its subsidiaries or authorized agents ("SAS"). The Code is designed to either correct an error in the Software or to add functionality to the Software, but has not necessarily been tested.  Accordingly, SAS makes no representation or warranty that the Code will operate error-free.  SAS is under no obligation to maintain or support the Code.

Neither SAS nor its licensors shall be liable to you or any third party for any general, special, direct, indirect, consequential, incidental or other damages whatsoever arising out of or related to your use or inability to use the Code, even if SAS has been advised of the possibility of such damages.

Except as otherwise provided above, the Code is governed by the same agreement that governs the Software.  If you do not have an existing agreement with SAS governing the Software, you may not use the Code. 

(SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.)
*/

Mark_Bailey · Aug 24, 2018 09:10 AM

All three scripts perform Grubbs' test for outliers. (Be sure that this test is appropriate for the contamination process that produces the aberrant values.) They are listed in reverse chronological order. The bottom version is the original. The middle version was extended to accommodate groups of data through a variable in the By role. The top version is the latest and replaced By groups with sequential testing.

These changes were documented in the Revision History comments of all three versions.
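
For readers comparing the versions, the calculation they share can be tried on its own. The following is a minimal sketch that reuses the expressions for G and the critical value from the scripts above. The sample values in yVal are made up for illustration, and a simply mirrors the dialog's default significance of 0.05.

```jsl
Names Default To Here( 1 );

// Hypothetical sample; the last value is deliberately extreme.
yVal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5];
a = 0.05;	// significance level, same default as the dialog
n = N Row( yVal );

// Test statistic: largest absolute deviation from the mean, in SD units.
g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );

// Two-sided critical value, written as in the scripts.
t0Sqr = t Quantile( 1 - a / (2 * n), n - 2 ) ^ 2;
g0 = ((n - 1) / Sqrt( n )) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );

// An outlier is flagged when g exceeds g0.
Show( g, g0, g > g0 );
```

Running this from a script window writes the three values to the log, which makes it easy to check a script's report against a hand calculation.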

Hani · Aug 24, 2018 10:13 AM

Thank you very much for your quick reply. I am not familiar with JMP scripting and I do not really understand the grouping notion here, as I usually look for outliers in the data of one variable: I select a variable in a table, run the script, and receive the outlier results for that variable. Could you please explain how the "By group" works and how it influences the results?

Also, when I tested the 3 scripts with one set of data, I received the same results from the two more recent scripts (which differ from those of the first, older script), and they seem more relevant to me. Does this mean that the first script should no longer be used?

Thanks a lot again,

Hani

Mark_Bailey · Aug 24, 2018 10:24 AM

The matter of the By role has nothing to do with scripting. You should read the introductory books about JMP and speed up your on-boarding. See Help > Books > Using JMP and Basic Analysis.

We say that you cast a variable (data column) in an analysis role. The standard roles are Y, X, Freq, Weight, and By. For example, I might have a sample of data from 5 batches. I have the measurement in one column, Data, and the label for the batch in another column, Lot. I could summarize the data by selecting Analyze > Distribution, selecting Data and clicking Y, then selecting Lot and clicking By. I will get a separate analysis for each batch. That is what the By role is for. The original script for Grubbs' test was modified to perform the test for each level in the By variable.
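
The same launch can be scripted. This is only a sketch of the Data and Lot example: the table name and values are invented, and two batches are used instead of five to keep it short.

```jsl
Names Default To Here( 1 );

// Hypothetical table: measurement in Data, batch label in Lot.
dt = New Table( "Batch Example",
	New Column( "Lot", Character, Nominal,
		Set Values( {"A", "A", "A", "A", "B", "B", "B", "B"} )
	),
	New Column( "Data", Numeric, Continuous,
		Set Values( [10.1, 9.9, 10.0, 10.2, 11.0, 11.3, 10.9, 11.1] )
	)
);

// Data in the Y role, Lot in the By role: one Distribution report per lot.
dist = dt << Distribution(
	Y( :Data ),
	By( :Lot )
);
```

Leaving out the By( :Lot ) argument gives a single analysis of all rows, which corresponds to the single-sample branch of the script.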

I do not see any reason to use the older, original script.

Hani · Aug 24, 2018 11:12 AM

Thanks for the reply. Yes, I know how the separate analysis with "By" works; sorry, I just didn't notice the "By" window in the modified version, especially as I didn't need to do a "By" analysis.

Also, I now finally understand that the difference in the results is due to the fact that I used "hide and exclude" on the outlying observations before executing a second run of the test. I was trapped because the descriptive results excluded the outlying observations, but the outlier test did not. I should have deleted the rows from the dataset instead. And indeed, from now on we are going to use the latest version (sequential). Thanks!
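
The sequential version automates the exclude-and-retest cycle described above. As a rough illustration of the idea only, the sketch below repeats the test on a plain matrix and drops the most extreme value each time one is flagged. It is not the script's actual mechanism, which hides and excludes rows in the data table and reruns Distribution. The data and the names removed, outlierFound, and worst are hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical sample with two suspicious values.
yVal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5, 10.1, 9.9, 7.0];
a = 0.05;
removed = {};	// values flagged so far, in order of removal

outlierFound = 1;
While( outlierFound & (N Row( yVal ) > 2),
	outlierFound = 0;
	n = N Row( yVal );
	dev = Abs( yVal - Mean( yVal ) );

	// Same statistic and critical value as in the scripts.
	g = Maximum( dev ) / Std Dev( yVal );
	t0Sqr = t Quantile( 1 - a / (2 * n), n - 2 ) ^ 2;
	g0 = ((n - 1) / Sqrt( n )) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );

	If( g > g0,
		// Drop the single most extreme value, then test again.
		idx = Loc( dev == Maximum( dev ) );
		worst = idx[1];
		Insert Into( removed, yVal[worst] );
		yVal[worst, 0] = [];
		outlierFound = 1;
	);
);

Show( removed, yVal );
```

The loop stops as soon as a pass finds no outlier, or when too few values remain for the t quantile to be defined.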

Mark_Bailey · Aug 24, 2018 12:01 PM

The scripts identify excluded rows (hidden doesn't matter) and remove them from the test. The report for the outlier test doesn't mention the excluded rows because the Distribution report covers that fact already.
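
To see that step in isolation, here is a small sketch built around the same two lines the newer scripts use (Get Excluded Rows, then deleting those positions from the matrix). The table and its values are made up for the example.

```jsl
Names Default To Here( 1 );

// Hypothetical table with one extreme value.
dt = New Table( "Exclusion Example",
	New Column( "Data", Numeric, Continuous,
		Set Values( [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5] )
	)
);

// Exclude (without hiding) the row holding the extreme value.
dt << Select Where( :Data == 12.5 ) << Exclude;
dt << Clear Select;

// Same pattern as the scripts: read every value, then drop excluded rows.
yVal = Column( dt, "Data" ) << Get As Matrix;
exRows = dt << Get Excluded Rows();
yVal[exRows] = [];

// Compare the table's row count with the number of values kept for the test.
Show( N Rows( dt ), N Row( yVal ) );
```

The original 2003 script has no such step, which is consistent with the revision note "correctly disregards excluded rows" that appears only in the later versions.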

Glad you find the script useful.
