Level III

## Simple t-test with large N

I know this should be obvious... I have a stacked table. Fit Y by X gives a beautiful plot showing that the mean of category A is higher than the mean of category B (it's a bit on the subtle side, but it's there). The p-value backs it up, but of course it will, because N is in the thousands. ANOM also shows that both means differ from the overall mean, but I think I'll have the same problem there. The variances are different and the distributions are normal. What would be the best approach here? Divide 0.05 by N and use that value to determine significance? Not use significance at all and look at confidence intervals? Any and all suggestions gratefully appreciated!

7 REPLIES
Super User

## Re: Simple t-test with large N

One simple way to handle this is to create a stratified random subset of your data with 30 observations in each group, and then run the t-Test.
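As a rough illustration of the subsampling idea outside JMP, here is a minimal Python sketch. The data are synthetic stand-ins (the means, standard deviations, and group sizes are assumptions, not the poster's values), and `stratified_subsample` and `welch_t` are hypothetical helper names. Because the variances are reported to differ, the sketch uses Welch's unequal-variance t statistic rather than a pooled one.

```python
import math
import random
import statistics


def stratified_subsample(groups, n_per_group, rng):
    """Draw n_per_group values without replacement from each group."""
    return {name: rng.sample(values, n_per_group) for name, values in groups.items()}


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


rng = random.Random(0)

# Synthetic stand-in data (assumed values, for illustration only).
groups = {
    "A": [rng.gauss(100.5, 10.0) for _ in range(10_000)],
    "B": [rng.gauss(100.0, 12.0) for _ in range(15_000)],
}

subset = stratified_subsample(groups, 30, rng)
t, df = welch_t(subset["A"], subset["B"])
print(f"Welch t = {t:.3f} on about {df:.1f} degrees of freedom")
```

The statistic can then be compared against a t table at the reported degrees of freedom. Note that a subsample of 30 per group deliberately throws away power, so a subtle difference may no longer be detected at all.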

Jim
Level III

## Re: Simple t-test with large N

Thanks Jim,

Great idea, and simple, which is always good. I'm thinking that if I wanted to be a bit more thorough, I could repeat the process 10-15 times and look at the distribution of the p-values.

Best,

Greg

Super User

## Re: Simple t-test with large N

Then you need to start to think about adjusting for the number of tests.
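One common way to adjust for the number of tests is the Holm step-down procedure, a uniformly more powerful refinement of the plain Bonferroni correction. The sketch below is only an illustration: the p-values are made up, `holm_adjust` is a hypothetical helper name, and the thread does not say which adjustment was intended.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values from five hypothetical repeated subsample tests.
p_values = [0.012, 0.20, 0.03, 0.004, 0.47]
adjusted = holm_adjust(p_values)
for raw, adj in zip(p_values, adjusted):
    print(f"raw p = {raw:.3f}, Holm-adjusted p = {adj:.3f}")
```

Keep in mind that repeated subsamples drawn from the same table are not independent experiments, so any such adjustment is a rough guard rather than an exact correction.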

Jim
Staff

## Re: Simple t-test with large N

You could be a lot more thorough. You can use the bootstrap feature if you have JMP Pro. It will produce the empirical sampling distribution of your statistic (e.g., difference in the means) based on thousands of samples and produce a data table with the results for analysis in the Distribution platform.
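For readers without JMP Pro, the same idea can be sketched in a few lines of Python. This is not JMP's implementation, only a plain percentile bootstrap of the difference in means under assumed synthetic data; `bootstrap_mean_diff` is a hypothetical helper name, and the group sizes are kept small so the example runs quickly.

```python
import random
import statistics


def bootstrap_mean_diff(a, b, n_boot, rng):
    """Resample each group with replacement and collect mean(a*) - mean(b*)."""
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    return diffs


rng = random.Random(1)

# Synthetic stand-in data (assumed values, for illustration only).
group_a = [rng.gauss(100.5, 10.0) for _ in range(1_000)]
group_b = [rng.gauss(100.0, 12.0) for _ in range(1_500)]

diffs = bootstrap_mean_diff(group_a, group_b, n_boot=1_000, rng=rng)

# Cutting the distribution into 40 equal parts puts the first and last
# cut points at the 2.5th and 97.5th percentiles: a 95% percentile interval.
cuts = statistics.quantiles(diffs, n=40)
lo, hi = cuts[0], cuts[-1]
print(f"bootstrap 95% interval for the mean difference: ({lo:.3f}, {hi:.3f})")
```

The list `diffs` plays the role of the results table mentioned above: it can be plotted as a histogram or summarized in whatever way answers the question at hand.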

Learn it once, use it forever!
Super User

## Re: Simple t-test with large N

Don't put methods in front of the question... What is the question you are trying to answer with your data? What does your data represent? Where does it come from (as in, why are you so lucky to have so many observations)? What decisions are to be made based on this data? (\$5 decisions have fewer ramifications with respect to "significance" than \$5M decisions.) What other variables might you need to account for? Is there a time element to the data? And on and on... I like the confidence intervals, bootstrapping, and sampling ideas, but only if they make sense as a way to answer the question behind the data.
Level III

## Re: Simple t-test with large N

Thanks everyone. It is image data. Every pixel can be thought of as an independent measurement that detects counts, and I run my imaging experiments to ensure that I have good precision at each pixel based on Poisson counting statistics (there is no background that needs to be worried about). Groups of pixels can be clustered to represent a "feature" based on other correlative image data. I'm trying to sort out the best way to show that the mean of the, e.g., 10,000 pixels that make up feature A is different from the mean of the 15,000 pixels that make up feature B. There are times I believe I am overthinking this, and other times I don't think it is that straightforward.

Level III

## Re: Simple t-test with large N

If I understand your situation correctly, you have a "statistical" difference but you are not sure you have a "practical" difference.  Is that correct?

Just because a difference is detected using a Hypothesis Test, it doesn’t mean that that difference is practically significant.

The observed difference needs to be weighed against normal process variation, practical requirements, and the size of the sample.

Sometimes the sample size is so large (because that much data happens to be available) that very small differences will show up as significant.
A statistically significant difference is not always a practical difference. It is good practice to establish what you think is a "practical" difference.
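To make the statistical-versus-practical distinction concrete, here is a small Python sketch that reports the difference in means alongside a confidence interval and a standardized effect size. Everything in it is an assumption for illustration: the data are synthetic, `mean_diff_summary` is a hypothetical helper, the interval and p-value use a large-sample normal approximation, and Cohen's d with a pooled standard deviation is only one of several effect-size conventions.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_diff_summary(a, b, level=0.95):
    """Summarize mean(a) - mean(b) using a large-sample normal approximation."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Unequal-variance (Welch-style) standard error of the difference.
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Pooled standard deviation, used only to standardize the difference.
    pooled_sd = math.sqrt(
        ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
    )
    return {
        "diff": diff,
        "ci": (diff - z * se, diff + z * se),
        "cohens_d": diff / pooled_sd,
        "p_value": 2 * (1 - NormalDist().cdf(abs(diff / se))),
    }


rng = random.Random(2)

# Synthetic stand-in data (assumed values, for illustration only).
group_a = [rng.gauss(100.5, 10.0) for _ in range(10_000)]
group_b = [rng.gauss(100.0, 12.0) for _ in range(15_000)]

summary = mean_diff_summary(group_a, group_b)
print(f"difference = {summary['diff']:.3f}")
print(f"95% CI     = ({summary['ci'][0]:.3f}, {summary['ci'][1]:.3f})")
print(f"Cohen's d  = {summary['cohens_d']:.3f}")
print(f"p-value    = {summary['p_value']:.2g}")
```

With thousands of observations per group, the p-value can be very small even when the standardized effect is tiny, which is why the interval and the effect size are usually the more informative numbers to compare against a practical threshold.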

We have used JMP's "Equivalence Test" for this:

JMP Help => https://www.jmp.com/support/help/en/15.0/#page/jmp/test-equivalence.shtml?os=win&source=application#...
The equivalence test assesses whether a population mean is equivalent to a hypothesized value. You must define a threshold difference that is considered equivalent to no difference. The Test Equivalence option uses the Two One-Sided Tests (TOST) approach. Two one-sided t tests are constructed for the null hypotheses that the difference between the true mean and the hypothesized value exceeds the threshold. If both null hypotheses are rejected, this implies that the true difference does not exceed the threshold. You conclude that the mean can be considered practically equivalent to the hypothesized value.
When you select the Test Equivalence option, you specify the Hypothesized Mean, the threshold difference (Difference Considered Practically Zero), and the Confidence Level. The Confidence Level is 1 - alpha, where alpha is the significance level for each one-sided test.
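The help text above describes a one-sample test against a hypothesized value. For the two-group case in this thread, the same TOST logic can be sketched in Python as follows. This is not JMP's implementation: `tost_two_sample` is a hypothetical helper, the data are synthetic, the threshold values are arbitrary, and the p-values use a normal approximation that is only reasonable because the groups are large.

```python
import math
import random
import statistics
from statistics import NormalDist


def tost_two_sample(a, b, threshold, alpha=0.05):
    """Two one-sided tests for |mean(a) - mean(b)| < threshold (large samples)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    norm = NormalDist()
    # H0: true difference <= -threshold; rejected when diff sits well above it.
    p_lower = 1 - norm.cdf((diff + threshold) / se)
    # H0: true difference >= +threshold; rejected when diff sits well below it.
    p_upper = norm.cdf((diff - threshold) / se)
    # Equivalence is claimed only if BOTH one-sided nulls are rejected.
    equivalent = max(p_lower, p_upper) < alpha
    return diff, p_lower, p_upper, equivalent


rng = random.Random(3)

# Synthetic stand-in data (assumed values, for illustration only).
group_a = [rng.gauss(100.5, 10.0) for _ in range(10_000)]
group_b = [rng.gauss(100.0, 12.0) for _ in range(15_000)]

# Arbitrary example thresholds: a wide one and a very tight one.
wide = tost_two_sample(group_a, group_b, threshold=2.0)
tight = tost_two_sample(group_a, group_b, threshold=0.1)
print(f"threshold 2.0: diff = {wide[0]:.3f}, equivalent = {wide[3]}")
print(f"threshold 0.1: diff = {tight[0]:.3f}, equivalent = {tight[3]}")
```

The outcome depends entirely on the threshold, which has to come from subject-matter knowledge about what size of difference matters, not from the data.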
