Hans_Hsu
Level I

How to do hypothesis test with highly right skewed data that contains many zeros?

Our data are defect counts on substrates. Suppose we implement a new cleaning method and want to know whether it is better than the original method; that is, we want to perform a hypothesis test to decide.

The problem is that in both samples most of the values are zero, and the distribution is right-skewed, with a tail of several defects per substrate. Is there a good method for performing a hypothesis test in this case?

 

My original idea was to transform the data to a normal distribution and then perform a two-sample t test. Since most of the values are zero, I tried log(x+1) as the transform, but the transformed data still fail the normal fit in JMP's continuous fit.
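
One way to see why the transform cannot help here: log(x+1) maps 0 to 0, so the spike of zeros survives the transform unchanged, and no monotone transform can spread it out. The short Python sketch below illustrates this. The defect counts are made-up values for illustration only, not the data from this post.

```python
import math

# Hypothetical defect counts per substrate (illustrative values only).
counts = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 0, 0, 12, 0, 0, 1]

# log(x + 1) transform, as tried in the post.
transformed = [math.log1p(x) for x in counts]


def zero_fraction(values):
    """Share of observations that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# The share of zeros is the same before and after the transform,
# so the transformed data still has a point mass at zero.
print(zero_fraction(counts))
print(zero_fraction(transformed))
```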

3 REPLIES

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

I can think of two approaches. The first is a nonparametric test; these are available in the Oneway platform along with the t tests. The second is to define a meaningful sample statistic (e.g., the 0.9 quantile) and use a bootstrap to obtain a p-value for the difference.
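
As a rough illustration of the second approach, here is a minimal Python sketch of a one-sided bootstrap test on the difference in 0.9 quantiles. It is not JMP's implementation. The function names, the sample values, and the choice to resample both groups from the pooled data (to mimic the null hypothesis of no difference) are all assumptions made for this example.

```python
import math
import random


def quantile(values, q):
    """Empirical quantile: the smallest observation with at least
    a fraction q of the data at or below it."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def bootstrap_p_value(old, new, q=0.9, n_boot=2000, seed=0):
    """One-sided p-value for 'old has a larger q-quantile than new'.

    Assumption: the null distribution is approximated by resampling
    both groups, with replacement, from the pooled data.
    """
    rng = random.Random(seed)
    observed = quantile(old, q) - quantile(new, q)
    pooled = list(old) + list(new)
    extreme = 0
    for _ in range(n_boot):
        res_old = rng.choices(pooled, k=len(old))
        res_new = rng.choices(pooled, k=len(new))
        if quantile(res_old, q) - quantile(res_new, q) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_boot + 1)


# Hypothetical defect counts (illustrative values only).
old_method = [0] * 14 + [1, 2, 3, 5, 8, 13]
new_method = [0] * 17 + [1, 1, 2]

diff, p = bootstrap_p_value(old_method, new_method)
print("difference in 0.9 quantiles:", diff)
print("bootstrap p-value:", p)
```

With count data that is mostly zeros, the 0.9 quantile takes only a few distinct values, so the resampled differences are heavily tied; a different statistic (for example, the mean or the share of non-zero substrates) may be more informative.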

Learn it once, use it forever!
Hans_Hsu
Level I

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

Thank you for your reply. For the nonparametric tests, I looked up the JMP help, and it says:

Wilcoxon Test --> powerful for logistic distributions

Median Test --> powerful for double-exponential distributions

van der Waerden Test --> powerful for normal distributions

Kolmogorov Smirnov Test --> not so sure

 

So my question is: for an extremely right-skewed distribution, which nonparametric method is more suitable?

Accepted Solution

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

The logistic distribution is symmetric, so that choice is not the best for your data. I would not use the Wilcoxon test.

 

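The double exponential distribution (a name sometimes used for the Gumbel distribution) is skewed, like your data, so it would be a better choice. Note, however, that the same name is also used for the Laplace distribution, which is symmetric, and that is the usual sense when the power of the Median test is discussed, so treat this as a rough guide rather than a guarantee. I would use the Median test.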
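
To make the idea of a median test concrete, here is a minimal Python sketch of a Mood-style median test: it counts how many observations in each group fall above the pooled median and applies a chi-square test (1 degree of freedom, no continuity correction) to the resulting 2x2 table. This is an assumption-laden illustration and may not match what JMP computes in the Oneway platform. The function name and the sample values are hypothetical.

```python
import math


def median_test(a, b):
    """Mood-style median test for two samples.

    Returns (grand_median, chi_square, p_value). Values equal to the
    grand median are counted as 'not above' it.
    """
    pooled = sorted(list(a) + list(b))
    n = len(pooled)
    mid = n // 2
    grand_median = pooled[mid] if n % 2 else (pooled[mid - 1] + pooled[mid]) / 2

    # 2x2 table: rows are groups, columns are above / not above the median.
    table = []
    for group in (a, b):
        above = sum(1 for x in group if x > grand_median)
        table.append([above, len(group) - above])

    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        return grand_median, 0.0, 1.0  # degenerate table, no evidence

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(Z^2 > chi2) for standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return grand_median, chi2, p_value


# Hypothetical defect counts (illustrative values only).
old_method = [0] * 15 + [1, 2, 3, 4, 5]
new_method = [0] * 19 + [1]

print(median_test(old_method, new_method))
```

When more than half of the pooled values are zero, the grand median is 0, so the test reduces to comparing the share of substrates with at least one defect in each group. The size of the non-zero counts is ignored.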
