Hi Alex,
Like Smoore2, I think your approaches are all reasonable, and with small samples it's nice to try several techniques and hope for a consensus. In this case it doesn't appear you'll have one, so I wanted to add a few thoughts about the choice between parametric and nonparametric tests, as well as your concern about the homogeneity of variances among your groups.
First, regarding the assumption of normality, I want to address your comment "Not to mention that making any assumptions about the normality of the data is meaningless when n = 3." Keep in mind that the assumption of normality is not about your sample being normally distributed, but rather about your sample being drawn from a population that is normally distributed. This is a subtle point but an important one. My sample could have the values {5,5,5,5,5,5,6,6,6,6,6,6}, and the assumption of normality could be perfectly upheld IFF the population from which those values were drawn is normally distributed (this sample would be highly improbable, but that wouldn't make the assumption false, and wouldn't affect the meaningfulness of the parametric test results). If the population is normal, the sampling distribution of the sample mean is necessarily normal, which means our parametric tests will return p-values that are meaningful (other assumptions also being true).

In cases where the population is not normal, we can trust that the sampling distribution of the sample mean will be normally distributed when sample sizes are "large" (thanks to the ever-beautiful central limit theorem, though what counts as "large" for a given situation is somewhat debatable). Your sample sizes are nowhere near large enough to fall back on the CLT, but I bring this up because I'd like to ask: given your knowledge of whatever you are measuring, do you have any reason to believe the population is normal? For instance, if you were collecting data on IQ and you had only three IQ scores for each group, you could trust your parametric tests without concern, since IQ is known to be normally distributed (indeed, the tests are structured and scored to ensure that property). If you can draw on extra-statistical information to justify the assumption that the population from which your sample is drawn is normally distributed, you don't need to justify that assumption in a data-driven way.
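To make that point concrete, here's a quick simulation sketch (in Python with scipy rather than JSL, purely for illustration): when both groups really are drawn from the same normal population, the two-sample t-test holds its nominal 5% false alarm rate even at n = 3.

```python
import numpy as np
from scipy import stats

# Simulation sketch: with samples of n = 3 drawn from a NORMAL population,
# the two-sample t-test keeps its nominal 5% false alarm rate even though
# each sample is tiny. (Illustrative only; population parameters are made up.)
rng = np.random.default_rng(1)

n, reps, alpha = 3, 20_000, 0.05
false_alarms = 0
for _ in range(reps):
    # Both groups come from the SAME normal population, so the null is true.
    a = rng.normal(loc=100, scale=15, size=n)
    b = rng.normal(loc=100, scale=15, size=n)
    _, p = stats.ttest_ind(a, b)
    false_alarms += (p < alpha)

print(f"Empirical false alarm rate: {false_alarms / reps:.3f}")  # close to 0.05
```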
Second, regarding your observed unequal variances, how unequal are we talking about here? T-tests are not entirely robust to violations of homogeneity of variance, but false alarm rates do not increase considerably until the variances in your groups are quite unequal (and your equal sample sizes help with the robustness of the t-test to unequal variances). If you would like to go the route of generating all pairwise t-tests NOT assuming equal variances, there is certainly a way we can make this happen through scripting. Alternatively, you can use the Local Data Filter (available under the red triangle in Fit Y by X >> Script >> Local Data Filter), select two levels, produce the t-test (not assuming equal variances), and click through the different comparisons. Not entirely elegant, but it would eliminate the need to relaunch the platform after global filtering, or to iteratively subset your data.
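In case it's useful to see what those unpooled comparisons amount to outside JMP, here's a minimal sketch in Python with scipy; the group names and values are made up for illustration. Passing equal_var=False gives the Welch version of the t-test, i.e., the "not assuming equal variances" option.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data: three groups of n = 3 (made-up values for illustration).
groups = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 7.9, 6.5],
    "C": [5.1, 5.3, 9.8],
}

# All pairwise Welch t-tests: equal_var=False drops the pooled-variance
# assumption, which is what "t-test not assuming equal variances" means.
for (name1, x1), (name2, x2) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(x1, x2, equal_var=False)
    print(f"{name1} vs {name2}: t = {t:.2f}, p = {p:.3f}")
```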
Finally, it's worth clarifying (for yourself, not for us) the root of your concerns here. Is your chief concern with these tests (in the context of small samples) that you might false alarm, or that you might miss a real effect? If it's the former, nonparametric tests are probably best, and alpha-controlled nonparametric pairwise tests at that (like the Steel-Dwass); if it's the latter (and you have some reason to think your assumptions are generally reasonable), then parametric tests will probably serve you better (and you might want to relax your criterion for statistical significance if this is exploratory work).
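Steel-Dwass itself isn't available in scipy, so I can't sketch that exact procedure, but as a rough, more conservative stand-in (not the same test), all pairwise Mann-Whitney comparisons with a Bonferroni correction illustrate the trade-off; the data here are again made up.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data (made-up values). Pairwise Mann-Whitney (Wilcoxon
# rank-sum) tests with a Bonferroni correction control the family-wise
# false alarm rate, as a crude stand-in for Steel-Dwass.
groups = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 7.9, 6.5],
    "C": [5.1, 5.3, 9.8],
}

pairs = list(combinations(groups.items(), 2))
for (name1, x1), (name2, x2) in pairs:
    u, p = stats.mannwhitneyu(x1, x2, alternative="two-sided")
    p_adj = min(1.0, p * len(pairs))  # Bonferroni adjustment
    print(f"{name1} vs {name2}: U = {u:.1f}, adjusted p = {p_adj:.3f}")

# Caveat: with n = 3 per group, the smallest two-sided exact p-value a
# Mann-Whitney test can produce is 0.10, so no Bonferroni-corrected
# comparison can reach 0.05. That is a concrete example of how small-n
# nonparametric tests trade fewer false alarms for less power.
```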
I hope some of this helps!
Julian