Solved: How to do hypothesis test with highly right skewed data that contains many zeros?

Hans_Hsu · Aug 6, 2019 03:00 AM

The data we have is "defect count on substrate". Suppose we implement a new cleaning method and want to know whether it is better than the original method; that is, we want to run a hypothesis test to decide.

The problem is that in both samples most of the values are zero, and the rest are right skewed out to several defects. Is there a good method for a hypothesis test in this case?

My original idea was to transform the data to a normal distribution and then perform a two-sample t test. Since most of the values are zero, I tried log(x+1) as the transform, but the result still fails to fit a normal distribution in JMP's continuous fit.

Mark_Bailey · Aug 6, 2019 07:30 AM

I can think of two approaches. The first is a non-parametric test; these are available in the Oneway platform along with the t tests. The second is to define a meaningful sample statistic (e.g., the 0.9 quantile) and use a bootstrap to obtain a p-value for the difference.
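The second approach can be sketched outside JMP as follows. This is a minimal Python illustration, not a JMP procedure: it resamples both groups from the pooled data (so the null hypothesis is "both methods give the same distribution") and reports a one-sided p-value for "the old method has the larger statistic". The function names, the quantile definition, and the defect counts in the example are all assumptions made up for illustration.

```python
import math
import random


def quantile(values, q):
    """Empirical quantile: smallest observation with at least a
    fraction q of the data at or below it."""
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]


def pooled_bootstrap_p_value(old, new, stat, n_boot=5000, seed=0):
    """One-sided p-value for stat(old) - stat(new) under the null that
    both groups come from the same distribution (pooled resampling)."""
    rng = random.Random(seed)
    observed = stat(old) - stat(new)
    pooled = list(old) + list(new)
    hits = 0
    for _ in range(n_boot):
        # Draw both groups with replacement from the pooled sample.
        fake_old = rng.choices(pooled, k=len(old))
        fake_new = rng.choices(pooled, k=len(new))
        if stat(fake_old) - stat(fake_new) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Hypothetical defect counts, mostly zeros with a right tail.
    old = [0] * 30 + [1] * 8 + [2] * 5 + [4, 5, 7, 9, 12, 15, 20]
    new = [0] * 42 + [1] * 5 + [2, 3, 4]
    diff, p = pooled_bootstrap_p_value(old, new, lambda v: quantile(v, 0.9))
    print("difference in 0.9 quantile:", diff, "p-value:", p)
```

The statistic is passed in as a function, so the mean defect count or the fraction of non-zero substrates can be swapped in when the chosen quantile turns out to be zero in both groups.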

Hans_Hsu · Aug 8, 2019 09:39 PM

Thank you for your reply. For the nonparametric tests, I looked up the JMP help, and it says:

Wilcoxon Test --> powerful for logistic distributions

Median Test --> powerful for double-exponential distributions

van der Waerden Test --> powerful for normal distributions

Kolmogorov Smirnov Test --> not so sure

So my question is: for an extremely right-skewed distribution, which nonparametric method is more suitable?

Mark_Bailey · Aug 9, 2019 07:25 AM

The logistic distribution is symmetric, so that choice is not the best for your data. I would not use the Wilcoxon test.

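The double exponential distribution in the JMP help is the Laplace distribution. It is symmetric too, but it has heavier tails than the logistic distribution (the skewed Gumbel distribution is a different distribution that sometimes goes by a similar name). Of the tests you listed, I would use the Median test.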
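One thing to keep in mind with the Median test on data like this: when more than half of the pooled counts are zero, the pooled median is zero, so the test effectively compares the share of substrates with at least one defect in each group. A minimal Python sketch of Mood's median test in its 2x2 chi-square form shows this. It is an illustration only, not JMP's own computation (no continuity correction, and the function name and example counts are made up).

```python
import math
from statistics import median


def mood_median_test(a, b):
    """Mood's median test: chi-square test (1 df) on counts above
    vs. at-or-below the pooled median in each group."""
    grand = median(list(a) + list(b))
    above_a = sum(x > grand for x in a)
    above_b = sum(x > grand for x in b)
    table = [[above_a, len(a) - above_a],
             [above_b, len(b) - above_b]]
    n = len(a) + len(b)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col or 0 in row:
        # Nothing above (or nothing below) the median: no evidence either way.
        return grand, table, 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return grand, table, chi2, p_value


if __name__ == "__main__":
    # Hypothetical defect counts: the pooled median is 0 here.
    old = [0] * 30 + [1] * 12 + [3] * 5 + [8, 11, 20]
    new = [0] * 45 + [1] * 4 + [2]
    print(mood_median_test(old, new))
```

With a pooled median of zero, the size of the non-zero counts plays no role in the result; only how many of them there are in each group matters.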
