Hans_Hsu
Level I

How to do hypothesis test with highly right skewed data that contains many zeros?

The data we have are defect counts on substrates. Suppose we implement a new cleaning method and want to know whether it is better than the original method; that is, we want to perform a hypothesis test to decide.

The problem is that in both samples most of the values are zero, and the rest are right-skewed out to several defects. Is there a good method for performing a hypothesis test in this case?

 

My original idea was to transform the data to a normal distribution and then perform a two-sample t test. Since most of the values are zero, I tried log(x+1) to transform the data, but the result still failed to fit a normal distribution in JMP's continuous fit.
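A small sketch of why the log(x+1) transform cannot help here: it maps 0 to 0, so the spike of zeros survives the transform unchanged and no monotone transform can spread it into a bell shape. The counts below are simulated, hypothetical data (about 70% zeros), not the poster's measurements.

```python
import math
import random

random.seed(0)

# Hypothetical defect counts: roughly 70% of substrates are clean,
# the rest carry between 1 and 20 defects.
counts = [0 if random.random() < 0.7 else random.randint(1, 20)
          for _ in range(200)]

# log(x + 1) transform, as tried in the post.
transformed = [math.log1p(c) for c in counts]

# log1p(0) is exactly 0.0, so every zero count stays at zero.
zero_share_raw = sum(c == 0 for c in counts) / len(counts)
zero_share_log = sum(t == 0.0 for t in transformed) / len(transformed)

print(f"share of zeros before transform: {zero_share_raw:.2f}")
print(f"share of zeros after transform:  {zero_share_log:.2f}")
```

Both shares are identical, which is consistent with the normal fit failing after the transform.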

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

The logistic distribution is symmetric, so that choice is not the best for your data. I would not use the Wilcoxon test.

 

The double exponential distribution (also known as the Gumbel distribution) is skewed, like your data, so it would be a better choice. I would use the Median test.
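For readers who want to see what a median test does with this kind of data, here is a minimal sketch of the textbook two-sample Mood's median test (chi-square form, no continuity correction). It is not JMP's implementation, and the two samples are hypothetical. One point worth noticing: when more than half of the pooled values are zero, the grand median is 0, so the test reduces to comparing the share of substrates with at least one defect.

```python
import math
import statistics

def median_test(a, b):
    """Two-sample Mood's median test (chi-square, 1 df, no correction)."""
    grand_median = statistics.median(list(a) + list(b))
    # Count observations strictly above the grand median in each sample.
    above_a = sum(x > grand_median for x in a)
    above_b = sum(x > grand_median for x in b)
    table = [[above_a, len(a) - above_a],
             [above_b, len(b) - above_b]]
    n = len(a) + len(b)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return grand_median, table, chi2, p_value

# Hypothetical defect counts for 40 substrates per cleaning method.
old_method = [0] * 30 + [1, 1, 2, 2, 3, 4, 5, 6, 8, 12]
new_method = [0] * 36 + [1, 1, 2, 3]

grand_median, table, chi2, p_value = median_test(old_method, new_method)
print(grand_median, table, round(chi2, 3), round(p_value, 3))
```

Because the test only looks at which side of the grand median each value falls on, it ignores how large the nonzero counts are.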


3 REPLIES 3

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

I can think of two approaches. The first is a non-parametric test; these are available in the Oneway platform along with the t tests. The second is to define a meaningful sample statistic (e.g., the 0.9 quantile) and use a bootstrap to obtain a p-value for the difference.
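The second approach could be sketched roughly as follows, outside JMP, in plain Python. This is one possible reading of "bootstrap a p-value", not the reply author's exact procedure: both groups are resampled with replacement from the pooled data (which imposes the null hypothesis of no difference), and the p-value is one-sided for "the old method has the larger statistic". The function names, the nearest-rank quantile definition, and the data are all assumptions made for illustration.

```python
import math
import random

def quantile(values, q):
    """Nearest-rank sample quantile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def bootstrap_p_value(old, new, stat, n_boot=5000, seed=0):
    """One-sided bootstrap p-value for H1: stat(old) > stat(new)."""
    rng = random.Random(seed)
    observed = stat(old) - stat(new)
    pooled = list(old) + list(new)  # pooling enforces the null hypothesis
    hits = 0
    for _ in range(n_boot):
        resampled_old = rng.choices(pooled, k=len(old))
        resampled_new = rng.choices(pooled, k=len(new))
        if stat(resampled_old) - stat(resampled_new) >= observed:
            hits += 1
    # The +1 terms keep the estimated p-value away from exactly zero.
    return (hits + 1) / (n_boot + 1)

# Hypothetical defect counts for 40 substrates per cleaning method.
old_method = [0] * 30 + [1, 1, 2, 2, 3, 4, 5, 6, 8, 12]
new_method = [0] * 36 + [1, 1, 2, 3]

p_q90 = bootstrap_p_value(old_method, new_method,
                          stat=lambda v: quantile(v, 0.9))
print(f"bootstrap p-value for the 0.9 quantile difference: {p_q90:.4f}")
```

The `stat` argument can be swapped for any other summary that is meaningful for the process, such as the mean defect count or the share of substrates with at least one defect.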

Hans_Hsu
Level I

Re: How to do hypothesis test with highly right skewed data that contains many zeros?

Thank you for your reply. For the nonparametric tests, I looked up the JMP help, and it says:

Wilcoxon Test --> powerful for logistic distributions

Median Test --> powerful for double-exponential distributions

van der Waerden Test --> powerful for normal distributions

Kolmogorov Smirnov Test --> not so sure

 

So my question is: for an extremely right-skewed distribution, which nonparametric method is more suitable?
