Discussions

jmpquestions123 · Jun 8, 2023 5:38 PM

I wanted to obtain the Shapiro-Wilk test for a distribution, but I cannot get it. I tried the method posted in the discussion about legacy fitters, but that only gives me the KSL test. If I do not use the legacy fitters, I get the Anderson-Darling test instead. My sample size is extremely large (>30k); I am not sure whether that has anything to do with it. Thank you for your help.

tonya_mauldin · Sep 1, 2021 07:29 AM

For the legacy distributions, the Shapiro-Wilk test for normality is reported when the sample size is less than or equal to 2000. The KSL test is computed for samples larger than 2000. When the sample size is large, the Shapiro-Wilk test has very high power. Hence even a small difference between your distribution and the normal distribution assumed under the null hypothesis is detected and leads to a rejection of the null hypothesis.
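
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not JMP's implementation: the function names are hypothetical, the 2000 cutoff is taken from the post, and the statistic computed is the usual Kolmogorov-Smirnov distance with the mean and standard deviation estimated from the sample (the Lilliefors setting, which is what KSL is assumed to mean here). No p-value is computed.

```python
from statistics import NormalDist, fmean, stdev

SHAPIRO_WILK_MAX_N = 2000  # cutoff quoted in the post above


def legacy_normality_test_name(n):
    """Name of the test the post says is reported for a sample of size n."""
    return "Shapiro-Wilk" if n <= SHAPIRO_WILK_MAX_N else "KSL"


def lilliefors_statistic(data):
    """Largest gap D between the empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the data."""
    n = len(data)
    fitted_cdf = NormalDist(fmean(data), stdev(data)).cdf
    d = 0.0
    for i, x in enumerate(sorted(data), start=1):
        f = fitted_cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(legacy_normality_test_name(30_000))
print(round(lilliefors_statistic([1.0, 2.0, 3.0]), 4))
```

With more than 30,000 rows the first call reports KSL, which matches what the original poster saw with the legacy fitters.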

jmpquestions123 · Sep 1, 2021 09:47 AM

Thank you for your message. So between KSL and Anderson-Darling, which test is a good substitute for Shapiro-Wilk?

tonya_mauldin · Sep 1, 2021 10:07 AM

Anderson-Darling is a more powerful test and is the one we suggest be used.

jmpquestions123 · Sep 1, 2021 10:43 AM

I am confused about how to interpret it. The Anderson-Darling p-values for all ~50 of my variables are significant, but the distributions of many of them look normal.

Mark_Bailey · Sep 1, 2021 12:00 PM

Your large sample (N > 30,000) leads to the detection of non-normal features (departures from the ideal normal CDF) that might be statistically significant but practically unimportant.
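
A small simulation can make this concrete. The sketch below is an illustration only, not JMP's algorithm: it computes the Anderson-Darling statistic for normality with estimated mean and standard deviation, and uses the D'Agostino and Stephens approximation for the p-value. The data are made up: standard normal draws plus a small quadratic term, which gives a mild skewness of about 0.3. The function names, the seed, and the skew coefficient are all assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A^2, approximate p-value) for H0: data are normal,
    with the mean and standard deviation estimated from the sample."""
    n = len(data)
    mean, sd = fmean(data), stdev(data)
    z = sorted((x - mean) / sd for x in data)
    std_normal = NormalDist()
    log_cdf = [math.log(std_normal.cdf(v)) for v in z]
    # log(1 - F(z)) computed as log F(-z) to avoid cancellation in the upper tail
    log_sf = [math.log(std_normal.cdf(-v)) for v in z]
    s = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    a2 = -n - s / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n**2)  # small-sample adjustment
    if a2_star >= 0.6:
        # Cap the argument so the quadratic fit cannot turn upward for
        # very large statistics; the result is then an upper bound on p.
        a = min(a2_star, 10.0)
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    elif a2_star > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star**2)
    elif a2_star > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star**2)
    return a2_star, p


def nearly_normal_sample(n, skew_coef=0.05, seed=2021):
    """Hypothetical data: normal draws with a mild skew added."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        sample.append(z + skew_coef * z * z)
    return sample


large = nearly_normal_sample(30_000)
small = large[:200]  # the same process, observed with far fewer rows

stat_large, p_large = anderson_darling_normal(large)
stat_small, p_small = anderson_darling_normal(small)
print(f"n = {len(large)}: A*^2 = {stat_large:.2f}, p = {p_large:.3g}")
print(f"n = {len(small)}: A*^2 = {stat_small:.2f}, p = {p_small:.3g}")
```

The data-generating process is identical in both runs; only the number of rows differs. The statistic grows roughly in proportion to N for a fixed departure from normality, so a shape that looks normal in a histogram can still be rejected decisively at N = 30,000.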

ron_horne · Sep 2, 2021 07:01 AM

Thank you @Mark_Bailey for this comment. This is a very important point for all statistical significance testing. Unfortunately, too many people are fixated on p-values without looking at visuals or the size of the coefficients. In some research fields this is partially avoided by using the minimal sample size needed to detect a practically meaningful difference.

Yet, it is very difficult to explain to some people, including reviewers and researchers, that it is very easy to get statistical significance that is meaningless.
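
To put rough numbers on the sample-size point, here is a minimal sketch using the textbook normal approximation for a two-sided, one-sample z-test. It is a simplification for illustration, not a method taken from this thread, and the function names and default values (alpha = 0.05, power = 0.80) are assumptions. Effect sizes are expressed in standard deviation units.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    """Sum of the critical value and the power quantile of the standard normal."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def sample_size_for_effect(effect_size, alpha=0.05, power=0.80):
    """Smallest n that detects a mean shift of effect_size standard deviations."""
    return math.ceil((_z_total(alpha, power) / effect_size) ** 2)


def detectable_effect(n, alpha=0.05, power=0.80):
    """Mean shift (in standard deviations) detectable with the given power at size n."""
    return _z_total(alpha, power) / math.sqrt(n)


print(sample_size_for_effect(0.5))          # rows needed for a half-SD shift
print(round(detectable_effect(30_000), 4))  # shift detectable with 30,000 rows
```

Under these assumptions, a half-standard-deviation shift needs only a few dozen observations, while 30,000 observations can detect a shift of under 0.02 standard deviations, which is rarely of practical interest.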

Mark_Bailey · Sep 2, 2021 7:51 AM

Most of the statistics we are taught were developed 80-100 years ago, when small samples were the rule. So the issue was, "What can I convincingly learn from or decide with a small sample?" Inferential techniques started then. Such situations still exist (e.g., no data available, need to experiment...) but a new situation arose 25 years ago. We now have massive databases from which to learn. Inference is pointless. Everything is statistically significant. That is, even the smallest differences or smallest parameter estimates are statistically significant. So, predictive modeling does not try to infer significance. Instead, the focus is on predictive performance through feature / model selection. It is a different challenge and requires a different mindset.

Also, in this specific case, why is a normal distribution assumed? What is supposed to be normally distributed? Data? An estimate? And what are the consequences of non-normality? How much of a departure from normal is necessary to compromise the result or decision? How robust is the method?

Mark_Bailey · Sep 1, 2021 11:57 AM

@tonya_mauldin explained this very well in a JMP Blog post when the changes were first made.

Testing for Normality