RvEDelft
Level I

Categorical data analysis: Chi-square for large sample sizes adjustment

Dear JMP community,

 

We use JMP Pro 16.2.0. We are running an experiment with categorical independent variables and categorical dependent variables via the internet. The point is that if you have a large number of participants determining the frequency counts, the chi-square will always reach significance. You can see that in the fictive data in the JMP file I included and in the analysis in the included journal. I multiplied the low frequencies by 100, and you see that the mosaic graphs look exactly the same, but in the low-frequency condition the result is not significant, whereas in the high-frequency condition it is very significant. That is the "problem" of the chi-square test: it reaches significance easily with large sample sizes (the F-test is less sensitive).
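To make the issue concrete outside JMP, here is a minimal sketch in Python/SciPy (not JMP; the counts below are only placeholders, since the real numbers are in the attached data table): the same cell proportions are far from significant at the low frequencies but highly significant once every cell is multiplied by 100.

```python
# Illustration only (Python/SciPy, not JMP). Placeholder 2x3 table of counts:
# multiplying every cell by 100 keeps the proportions (and the mosaic plot)
# identical, but the chi-square p-value collapses.
import numpy as np
from scipy.stats import chi2_contingency

low = np.array([[20, 15, 25],
                [15, 20, 25]])   # hypothetical low-frequency counts
high = low * 100                 # same proportions, 100x the sample size

for name, table in (("low", low), ("high", high)):
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"{name}: chi2 = {chi2:.2f}, df = {dof}, p = {p:.4g}")
```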

 

There are solutions for this: sometimes you can convert counts into proportions and conduct an F-test, but in our case this is not possible. Another solution is described in this paper:

 

http://expsylab.psych.uoa.gr/fileadmin/expsylab.psych.uoa.gr/uploads/papers/Bergh_2015.pdf

 

My question is: does JMP Pro have any implementation for this problem, or is there an add-in that you know of? I think with the availability of data-gathering programs via the internet (e.g., Amazon Turk, Qualtrics), dealing with these large sample sizes becomes more and more relevant, especially in categorical data analysis. Perhaps there is an option under Consumer Research, but I have not discovered it.

 

Hope the problem is clear,

 

Thanks for your insights,

 

Kind regards,

Rene

4 REPLIES
Victor_G
Super User

Re: Categorical data analysis: Chi-square for large sample sizes adjustment

Hi @RvEDelft,

 

Welcome to the Community!

 

Just briefly looking at your problem, did you try the other analyses available under the "Measures of Association" report (accessible by clicking on the red triangle)? It depends on what your question is, but there are numerous other options, and some already take dataset size into account: Measures of Association Report (jmp.com)

One example with nominal variables like yours is provided here: Example of the Measures of Association Option (jmp.com), where the Lambda and Uncertainty measures are used to help make a decision (in general, to know how much variation in Y can be explained by variation in X).

 

When using your dataset with the frequency columns at the low and high levels, you can see that the Lambda and Uncertainty measures do not change with dataset size/frequency of values.

 

FREQLOW vs. FREQHIGH:

[two screenshots of the Measures of Association reports for the low- and high-frequency columns]
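To see why these measures are insensitive to the overall count, here is a rough sketch (Python/NumPy, not JMP's own implementation, and with placeholder counts): both Lambda and the Uncertainty coefficient are ratios of quantities that all scale linearly with the total, so multiplying the table by 100 leaves them unchanged.

```python
# Sketch only (Python/NumPy), not JMP's implementation; placeholder counts.
import numpy as np

def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting the column variable from the row variable."""
    n = table.sum()
    modal_col = table.sum(axis=0).max()      # correct predictions using only the column margin
    modal_per_row = table.max(axis=1).sum()  # correct predictions when the row category is known
    return (modal_per_row - modal_col) / (n - modal_col)

def uncertainty(table):
    """U(col|row): proportional reduction in entropy of the column variable."""
    p = table / table.sum()
    p_col = p.sum(axis=0)
    p_row = p.sum(axis=1)
    h_col = -np.sum(p_col * np.log(p_col))
    h_cond = -np.sum(p * np.log(p / p_row[:, None]))
    return (h_col - h_cond) / h_col

low = np.array([[30, 10, 10],
                [10, 10, 30]])
print(gk_lambda(low), gk_lambda(low * 100))       # identical values
print(uncertainty(low), uncertainty(low * 100))   # identical values
```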

 

I don't know your topic and objective, and I'm not an expert concerning these statistical tests, but literature on this topic can be found on the internet.

I hope this first answer may help you,

 

 

Victor GUILLER
L'Oréal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
dale_lehman
Level VII

Re: Categorical data analysis: Chi-square for large sample sizes adjustment

I am not going to directly address your question, as there are others who can respond better. But, in general, I think almost all large data sets will end up showing statistical significance for almost any analysis you do - the null being tested is of no association, and things are, in reality, almost always associated. So I would not be troubled by the fact that the chi-squared tests always show significance in large data sets. To the extent that your question is focused on the generation of large data sets from things like Amazon Turk, etc., I think a bigger issue is that these large samples cannot generally be considered random samples of the populations you wish to study. That is, the statistical significance relies on random sampling, and the samples you are getting, while large, are not likely to be random. That would concern me more than the fact that everything ends up showing statistical significance.

RvEDelft
Level I

Re: Categorical data analysis: Chi-square for large sample sizes adjustment

Ok,

My question is just about the chi-square. I think your remarks relate more to the validity of a test, not to its reliability. The first is, I think, very important. But most often the significance adepts win. You will never find papers with null results, which is a pity. I once had a fellow PhD student who had a perfect experimental design but never found significant effects.

 

To elaborate on this further: students of mine who have to do user research with a small sample have problems with that. There is one simple question that you can ask yourself: a person interacting with a product in a certain context encounters a problem. Would this be the only person in the world who has this problem? Probably not; I do not think anyone would argue otherwise. No statistics are available for this...

Please stay with my initial question...

 

I am fine with making another link on the validity aspect.

 

Rene

 

 

 

 

dale_lehman
Level VII

Re: Categorical data analysis: Chi-square for large sample sizes adjustment

Well, I said I was not going to directly address your question.  In any case, I've looked at it again and it is not my area of expertise, but I do have a question I'd appreciate getting informed about.  You refer to this as a "problem" with the chi-square test.  I don't understand why it is referred to as a problem.  It seems to me that the test is doing exactly what it should - the very same differences in the response of Variable A to Variable B are deemed to be more likely to have been obtained via random sampling (from a no relationship assumption) when the sample size is 240 rather than 24000.  I don't understand the context under which this is a "problem."  And, when you further explain about your students often having small samples, I don't see this as a problem that can be cured via statistical manipulation.  Small samples are highly variable so finding statistical significance should be more difficult, should it not?

 

Since this is not my area of expertise, I genuinely would like to understand the context better as to why this is considered a problem and what sort of "solution" is being sought.  From my very limited understanding, I would be highly suspicious of any solution that made the small sample case more definitive, as well as being more suspicious of any solution that demanded a higher standard for statistical significance with larger sample sizes.  On this last point, it is one of my major criticisms of NHST that the null hypothesis being tested is virtually never true, so rejecting it should typically be expected.  It is the sizes of the differences or strength of the relationship that are of interest, not whether or not you can reject the null.  Inability to reject the null just tells me that there is too much variability relative to your sample size to say anything (and from my experience this is true for most student survey type research using primary data).
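To make that last point concrete, here is a sketch (Python/SciPy, with an invented table, not the original data): the estimated strength of the association (Cramér's V) depends only on the cell proportions and is identical at n = 240 and n = 24000, while the chi-square p-value moves from clearly non-significant to vanishingly small.

```python
# Sketch only (Python/SciPy): identical proportions at n = 240 and n = 24000
# give the same Cramer's V but very different chi-square p-values.
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import association

small = np.array([[45, 35, 40],
                  [35, 45, 40]])   # n = 240, invented counts
large = small * 100                # n = 24000, identical proportions

for table in (small, large):
    chi2, p, dof, _ = chi2_contingency(table)
    v = association(table, method="cramer")
    print(f"n = {table.sum():>5}: Cramer's V = {v:.3f}, p = {p:.3g}")
```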