Jun 23, 2011

Grubbs' Outlier Test (Version 2)

(This new version of the script adds By group processing. Finally! Note that the p-value reported in the first version is no longer available.)

This script adds the two-tailed outlier test by Grubbs to the Distribution platform. The normal quantile plot and Goodness of Fit test are opened to help assess the assumption that the sample was drawn from a normal population.
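
For reference, the two-tailed Grubbs' test is usually stated as below. This is the standard textbook form, not a transcription of the script, whose own computation is not shown in this post. Here N is the number of non-missing observations, s is the sample standard deviation, and t is the upper critical value of the t distribution with N - 2 degrees of freedom at level alpha/(2N).

```latex
G = \frac{\max_i \lvert y_i - \bar{y} \rvert}{s},
\qquad
\text{reject ``no outliers'' at level } \alpha \text{ if }\;
G > \frac{N-1}{\sqrt{N}}
\sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}}
```

Because both the statistic and the critical value depend on N, the count of observations has to be correct for the test to be valid.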

Simply open the data table with the numeric variable to be evaluated, then open and run the script. Select the data column and click Y, Data. Specify the desired level of significance (alpha is 0.05 by default). This example is based on the variable height in the Big Class data table in the Sample Data folder.

9210_Capture 1.jpg

Click OK.

9211_Capture 2.jpg

The pattern of the markers in the normal quantile plot appears to be linear, and none of the markers is outside the region bounded by the dotted red curves. The sample and, therefore, the population are judged to be normal in this case. The test at the bottom of the platform is not significant at the specified level.

Now the same analysis is performed using a By variable. This example uses sex as the grouping variable.

9212_Capture 3.jpg

Click OK.

9213_Capture 4.jpg
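
As a rough illustration of what By group processing means for this test, the sketch below computes the Grubbs statistic separately for each level of a grouping variable. It is written in Python rather than JSL, it is not the script's code, and the function name `grubbs_by_group`, the row layout, and the data values are all hypothetical (they are not taken from Big Class).

```python
import math
import statistics
from collections import defaultdict

def grubbs_by_group(rows, y_key, by_key):
    """Return {level: (n, G)} with the two-sided Grubbs statistic per group."""
    groups = defaultdict(list)
    for row in rows:
        y = row[y_key]
        # Skip missing responses so they do not count toward a group's n.
        if y is None or math.isnan(y):
            continue
        groups[row[by_key]].append(y)

    results = {}
    for level, ys in groups.items():
        if len(ys) < 3:
            continue  # too few values for the test to be meaningful
        mean = statistics.fmean(ys)
        sd = statistics.stdev(ys)  # sample standard deviation (n - 1 denominator)
        if sd == 0:
            continue  # all values identical, statistic undefined
        results[level] = (len(ys), max(abs(y - mean) for y in ys) / sd)
    return results

# Hypothetical data, for illustration only.
rows = [
    {"sex": "F", "height": 59.0}, {"sex": "F", "height": 61.0},
    {"sex": "F", "height": 55.0}, {"sex": "F", "height": 66.0},
    {"sex": "M", "height": 60.0}, {"sex": "M", "height": 63.0},
    {"sex": "M", "height": 65.0}, {"sex": "M", "height": 74.0},
]
results = grubbs_by_group(rows, "height", "sex")
for level, (n, g) in results.items():
    print(level, n, round(g, 4))
```

Each group gets its own mean, standard deviation, and n, so a value that is unremarkable in the pooled data can still stand out within its group, and vice versa.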


Hi Mark

Thanks for posting this.

It would be great if it could be done BY a grouping variable.

BR, Marianne

If the data set has any missing values, this code calculates N for the Grubbs test incorrectly. This can be a problem if the number of missing values is large.

See the following correction for lines 76 and 110 of the script:

n = N Rows( yVal ) - N Missing( yVal );
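
To show why the count matters, here is a minimal Python sketch (not the script's JSL) that drops missing values before computing n and the Grubbs statistic. The function name `grubbs_statistic` and the sample values are hypothetical, and treating `None` and NaN as missing is an assumption of this sketch.

```python
import math
import statistics

def grubbs_statistic(values):
    """Return (n, G) for the two-sided Grubbs statistic, skipping missing values."""
    # Keep only observed values; this mirrors subtracting the missing count from the row count.
    observed = [v for v in values if v is not None and not math.isnan(v)]
    n = len(observed)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 non-missing values")
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    g = max(abs(v - mean) for v in observed) / sd
    return n, g

# Hypothetical column with two missing entries.
y_val = [61.0, 63.0, None, 59.0, 70.0, float("nan"), 62.0]
n, g = grubbs_statistic(y_val)
print("rows:", len(y_val), "non-missing n:", n, "G:", round(g, 4))
```

With the uncorrected count, the test would use 7 where only 5 values were observed, which changes the critical value that G is compared against.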
