garethw
Level I

Help Interpreting KSL test results in Goodness of Fit

Hi All,

I'm looking at some goodness-of-fit results. While the Shapiro-Wilk results have some (sparse) documentation, the Kolmogorov-Smirnov-Lilliefors test seems to have a different form and no documentation. A quick search shows nothing on the forum either, and Google is quite quiet on the subject.

My results are:

For a Normal Distribution:

D                     Prob > D
0.3752777      < 0.0100*

I do not believe that this is a normal distribution. Can I reject the null hypothesis because D > alpha?

Why is Prob > D significant? Does this just mean the test result is unlikely to be a random occurrence?

For the LogNormal Distribution:

D                     Prob > D
0.030143        < 0.0100*

This is close to a LogNormal distribution - it is mostly within the limits on the normal quantile plot but wanders a little at the ends. As D is now less than alpha, does this mean that it fits a LogNormal distribution?

FYI, there are ~159k data points.

The best link I can find on the subject is http://homepages.cae.wisc.edu/~ie642/content/Techniques/KS-Test.htm, which is what I am basing my assumptions on.

Thanks for any further information you can provide,

Gareth


7 REPLIES
ylee
Level III

Re: Help Interpreting KSL test results in Goodness of Fit

I have similar questions, and it seems there are very few resources on this topic.

1.  How does JMP calculate and report the KSL test results?

2.  How do I interpret the D and Prob>D values reported by JMP?

3.  Does the user have control over the alpha / significance level?  If yes, how?

 

Re: Help Interpreting KSL test results in Goodness of Fit

The details and formulas of this test can be found on the following page:  http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univaria...

The null hypothesis for the KSL test is that the data are distributed as whatever distribution you have fit.  Suppose you fit a normal distribution.  Then the null hypothesis is that the data are normally distributed.

In order to interpret the test, the user must decide on an alpha level.  Suppose you pick an alpha value of .05.

Receiving a p-value of 0.1500 indicates that one cannot reject the null hypothesis (because the chosen alpha of .05 is less than the p-value).
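
For intuition outside of JMP, here is a minimal Python sketch of that decision rule using the Lilliefors implementation in statsmodels (this is only an illustration of the logic, not JMP's internal code, and the data, column, and alpha below are made up):

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.5, size=5000)   # hypothetical, right-skewed measurement

alpha = 0.05

# H0: "the data are normally distributed" (Lilliefors/KSL variant of the KS test,
# i.e. the mean and sigma are estimated from the data rather than specified in advance).
d_norm, p_norm = lilliefors(x, dist="norm")

# H0: "the data are lognormally distributed", tested by checking the logs for normality.
d_lognorm, p_lognorm = lilliefors(np.log(x), dist="norm")

for label, d, p in [("Normal", d_norm, p_norm), ("LogNormal", d_lognorm, p_lognorm)]:
    decision = "reject H0" if p < alpha else "cannot reject H0"
    print(f"{label:10s} D = {d:.5f}   Prob>D = {p:.4f}   -> {decision}")

In other words, the comparison is always Prob>D (the p-value) against alpha, never D against alpha; D is just the test statistic, the largest gap between the empirical CDF and the fitted distribution's CDF.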

ron_horne
Super User (Alumni)

Re: Help Interpreting KSL test results in Goodness of Fit

With a sample size this large, everything comes out as significant (for good or bad), since every little difference is "detectable". With this sample I would first match things visually.

I also find the p-values of this test rather suspiciously rounded. If you just omit some observations and repeat the test, it will still give the exact same p-value.
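
A quick way to see the sample-size effect described here is a small hypothetical simulation (not the original data): generate data that is only slightly non-normal and run the same test at a small and a very large n.

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(7)

def slightly_non_normal(n):
    # Mostly normal with 2% mild contamination -- close to normal on a histogram.
    base = rng.normal(0, 1, size=n)
    bump = rng.normal(2.5, 1, size=int(0.02 * n))
    return np.concatenate([base, bump])

for n in (500, 159_000):
    x = slightly_non_normal(n)
    d, p = lilliefors(x, dist="norm")
    print(f"n = {n:>7d}   D = {d:.5f}   Prob>D = {p:.4f}")

# Typically the small sample is not significant while the large one is,
# even though the underlying departure from normality is identical.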

Re: Help Interpreting KSL test results in Goodness of Fit

JMP is not actually calculating the p-values in this case. JMP is looking them up from tabled values. That is why you can omit some observations and get the same p-value. This is also why JMP gives the < or > symbol. This tells the user that the p-value is slightly less than or slightly more than the reported value.
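
The censored reporting can be mimicked with a plain table lookup. The constants below are the commonly published large-sample Lilliefors critical-value approximations (D_crit roughly c/sqrt(n)); they are used here purely for illustration and are not necessarily the table JMP uses.

import math

# alpha -> c, where D_crit ~ c / sqrt(n); illustrative constants, not JMP's internal table.
LEVELS = [(0.15, 0.768), (0.10, 0.805), (0.05, 0.886), (0.01, 1.031)]

def reported_p(d, n):
    """Return a censored, table-style p-value string for a Lilliefors statistic d."""
    crit = [(alpha, c / math.sqrt(n)) for alpha, c in LEVELS]
    if d < crit[0][1]:
        return "> 0.1500"    # less extreme than the largest tabled level
    if d >= crit[-1][1]:
        return "< 0.0100"    # more extreme than the smallest tabled level
    for (a_hi, c_hi), (a_lo, c_lo) in zip(crit, crit[1:]):
        if c_hi <= d < c_lo:
            return f"between {a_lo:.4f} and {a_hi:.4f}"

# Dropping a handful of rows barely moves d or n, so the reported bound does not change:
print(reported_p(0.0301, n=159_000))   # -> < 0.0100
print(reported_p(0.0301, n=158_900))   # -> < 0.0100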



You are correct about the sample size/significance issue.



ylee
Level III

Re: Help Interpreting KSL test results in Goodness of Fit

Thank you for the pointers.

 

While a sample size of ~159k is considered large and highly sensitive in the context of rejecting the null hypothesis, is there a number above which we consider a sample size "large and highly sensitive"? And is this number based on theory, or on a general rule of thumb? In my case, I could be working with a sample size of ~40k.

 

To add, my intention is to avoid visual inspection of the histograms. I could be dealing with ~40k units of a particular product, where ~3000 different kinds of measurements are taken on each unit, and we expect every kind of measurement to show normal behaviour. If possible, I would like to run the normality test in JMP and report the p-value for each of the 3000 measurements. Scanning 3000 p-values as a first gross screen would let me quickly identify problematic (non-normal) measurements instead of viewing 3000 histograms manually. For this we need (1) accuracy and robustness from the normality test, and (2) JSL support for automating the reporting of these p-values. Any advice on such a usage model would be very much appreciated.
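
Outside of JMP, the screening loop described above looks roughly like this in Python with statsmodels (the data, column names, and 0.05 cutoff are placeholders; inside JMP the equivalent would be a JSL loop over columns or the combined-table trick described in the next reply):

import numpy as np
import pandas as pd
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)

# Stand-in for ~3000 measurement columns on ~40k units (here: 5 columns, 2000 rows).
df = pd.DataFrame({f"meas_{i}": rng.normal(10, 2, size=2000) for i in range(5)})
df["meas_4"] = rng.exponential(2, size=2000)          # one deliberately non-normal column

alpha = 0.05
rows = []
for col in df.columns:
    d, p = lilliefors(df[col].dropna().to_numpy(), dist="norm")
    rows.append({"measurement": col, "D": d, "Prob>D": p, "non_normal": p < alpha})

screen = pd.DataFrame(rows).sort_values("Prob>D")      # worst offenders first
print(screen.to_string(index=False))

With ~40k rows per column, the earlier caveat applies in full: expect many statistically significant but practically unimportant rejections, so it helps to rank by D (an effect-size-like quantity) as well as by the p-value.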

ron_horne
Super User (Alumni)

Re: Help Interpreting KSL test results in Goodness of Fit

In this case you may want to try the following steps:

1) Open your data table.

2) Run a Distribution with all the variables you want.

3) Hold the Ctrl key and fit the normal distribution to all variables.


4) Hold the Ctrl key and select Goodness of Fit.


5) Now comes the trick: right-click on one of the Goodness of Fit tables and choose Make Combined Data Table.


Now you should have a fully functional table with all your results at once. This will allow you to sort by the statistic value or p-value, or even make a graph of what you just got.
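
If you export that combined table (for example to CSV), the follow-up screening is then only a few lines. A hedged sketch: the file name and column names below are hypothetical placeholders, since the exact columns depend on your JMP version and the distributions you fit.

import pandas as pd

# Hypothetical export of the combined Goodness of Fit table; column names are placeholders.
gof = pd.read_csv("combined_gof.csv")                  # e.g. columns: Variable, D, Prob>D
flagged = gof[gof["Prob>D"] < 0.05].sort_values("D", ascending=False)
print(f"{len(flagged)} of {len(gof)} measurements flagged as non-normal")
print(flagged.to_string(index=False))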

 

hope this helps.

ron