Grubbs' Outlier Test (Version 2)

(This script is a new version that provides By group processing. Finally! Note that the p-value reported in the first version is no longer available.)

This script adds the two-tailed outlier test by Grubbs to the Distribution platform. The normal quantile plot and Goodness of Fit test are opened to help assess the assumption that the sample was drawn from a normal population.
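
For reference, the two-tailed Grubbs test is usually stated as below. This is the textbook form, not a transcription of the script, so the script's internal computation may differ in detail. The symbols are assumptions introduced here: n is the number of non-missing observations, x̄ and s are the sample mean and standard deviation, and t is the upper critical value of the t distribution with n − 2 degrees of freedom at significance level α/(2n).

```latex
% Test statistic: largest absolute deviation from the mean, in units of s
G = \frac{\max_{i=1,\dots,n} \lvert x_i - \bar{x} \rvert}{s}

% Reject the hypothesis of no outliers at level \alpha when
G > \frac{n-1}{\sqrt{n}}
    \sqrt{\frac{t_{\alpha/(2n),\,n-2}^{2}}{\,n - 2 + t_{\alpha/(2n),\,n-2}^{2}\,}}
```

Because n appears in both the critical value and the degrees of freedom, it should count only the observations actually used to compute x̄ and s.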

Simply open the data table with the numeric variable to be evaluated, then open and run the script. Select the data column and click Y, Data. Specify the desired level of significance (alpha is 0.05 by default). This example is based on the variable height in the Big Class data table in the Sample Data folder.

9210_Capture 1.jpg

Click OK.

9211_Capture 2.jpg

The pattern of the markers in the normal quantile plot appears to be linear, and none of the markers is outside the region designated by the dotted red curves. The sample and, therefore, the population are judged to be normal in this case. The test at the bottom of the platform is not significant at the specified level.

Now the same analysis is performed using a By variable. This example uses sex as the grouping variable.

9212_Capture 3.jpg

Click OK.

9213_Capture 4.jpg

Comments
MTOF

Hi Mark

Thanks for posting this.

It would be great if it could be done BY a grouping variable.

BR, Marianne

msharp

If the data set happens to have any missing values, this code incorrectly calculates N for the Grubbs test. This can be problematic if the number of missing values is rather large.

See the following correction:

lines 76 and 110

n = N Rows( yVal ) - N Missing( yVal );

ghartel

The script breaks if the By variable is numeric. A simple way to fix this is to change bCol to the character data type after the line

dt = Current Data Table();
// If a By column was selected, convert it to character before grouping
If( N Items( bCol ),
               bCol[1] << Set Data Type( Character );
);

 

/* You could also fix the script to work with numeric By variables without changing them, but this seemed simpler, and you can always change it back to numeric at the end. */

 

Cheers

Gunter

 

Jan

Hi, thanks for the script.

It would be helpful to also include in the output which value was actually considered an outlier
(or at least provide an indication whether it concerns the lowest or the highest value).

Brice_L

Thank you for the script. I second Jan's recommendation to have some method to indicate the values that are detected to be outliers.
