slamer2000
Level II

How can I do a chi-square test when 20% of the cells have an expected count less than 5?

I got the warning messages shown below after running some analyses on a few variables. A stats book I read gives these steps for fixing the problem: "If the smallest expected cell frequency is less than five, the analyst should recode a row or column variable, combining rows or columns until an adequate expected cell frequency is achieved."

However, I don't really understand this or how to do it in JMP. Can someone explain how I might modify things so that I won't get this warning message, either by following the book's instructions or in a better way? I've also heard that one can alternatively use Fisher's exact test?

Thanks!

expected count jmp.jpg

 

11 REPLIES

Re: How do I correct the cells that have an expected count less than 5?

Let's see what we can do.

First of all, I believe that you have the Y and X roles reversed. The responses (e.g., Neutral) should be in the Y role (levels running across the top of the contingency table). Second, your example predictor variable X is ordinal: it is a ranking of decreasing cycling activity. As such, you should (1) apply the ordinal modeling type and (2) add a Value Ordering column property. For example, in the table that you posted, the first level shown should likely be the last level, after Less Than Once a Week. You might be able to take advantage of the ordinal nature of X and Y and use a more powerful association test than the contingency table chi-square.

The fact that you are getting no or just a few responses in adjoining levels means that you can collapse them into a single level. The example you show has essentially all of its mass on the positive responses: only 8 of the 190 responses are neutral or negative. You have a lot of observations, but spreading them out over 25 cells means that you are likely to fail the validity check.
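To make the collapsing idea concrete, here is a minimal Python sketch of what happens to the expected counts when sparse adjoining levels are merged. This is not JMP code, and the table is invented for illustration (it is smaller than the 25-cell table in the thread); the function names are hypothetical. The expected count for a cell is row total × column total / grand total.

```python
def expected_counts(table):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def collapse_columns(table, groups):
    """Merge columns; `groups` lists the column indices that are summed into each new column."""
    return [[sum(row[j] for j in group) for group in groups] for row in table]


def share_below(expected, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Made-up counts (assumption, not the poster's data).
# Rows: three activity levels. Columns: SD, D, Neutral, A, SA.
table = [
    [0, 1, 2, 30, 40],
    [1, 0, 2, 25, 30],
    [0, 1, 1, 27, 30],
]

# Merge the three sparse columns (SD, D, Neutral) into one level.
collapsed = collapse_columns(table, [[0, 1, 2], [3], [4]])

print("share of cells with expected count < 5, before:", share_below(expected_counts(table)))
print("share of cells with expected count < 5, after: ", share_below(expected_counts(collapsed)))
print("smallest expected count after collapsing:", min(min(r) for r in expected_counts(collapsed)))
```

With very lopsided data like this, collapsing raises the small expected counts but may still leave some cells below 5, so the merged table has to be rechecked rather than assumed valid.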

slamer2000
Level II

Re: How do I correct the cells that have an expected count less than 5?

Thanks Mark for sticking with me so far! I've tried collapsing the levels and doing everything else you mentioned. And silly me, I realized that I'd been looking at the actual counts, forgetting that they're not the same as the expected counts. That is what you were saying earlier: validity is based not on zero counts but on expected counts less than 5. I already knew this, but it went over my head. Anyway, sorry about that. Looking at my results after making all the changes and adjustments you suggested, it seems that with my data there are always one or two lingering cells with expected counts less than 1, as in the example below (Figure 1).

 

I found this guideline on a website: "If both variables have 4 to 6 levels, then you can trust the results if either of the following is true:

1. All cells have expected counts of at least 2.

2. All cells have expected counts of at least 1, and 50% or fewer of the cells have expected counts of less than 5."
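The quoted guideline can be written as a small check on the expected counts. The sketch below is plain Python, not JMP, and only encodes the two conditions as quoted; the function name and both example tables are hypothetical.

```python
def chi_square_guideline_ok(observed):
    """Apply the quoted rule of thumb to a table of observed counts.

    Returns True if either quoted condition holds. Raises ValueError when the
    table is outside the 4-to-6-level range that the guideline covers.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    if not (4 <= n_rows <= 6 and 4 <= n_cols <= 6):
        raise ValueError("the quoted guideline covers only 4 to 6 levels per variable")

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    expected = [r * c / grand for r in row_totals for c in col_totals]

    # Condition 1: all cells have expected counts of at least 2.
    all_at_least_2 = min(expected) >= 2
    # Condition 2: all at least 1, and 50% or fewer of the cells are below 5.
    share_under_5 = sum(e < 5 for e in expected) / len(expected)
    all_at_least_1_and_few_small = min(expected) >= 1 and share_under_5 <= 0.5

    return all_at_least_2 or all_at_least_1_and_few_small


# Made-up 4x4 tables (assumptions for illustration only).
sparse = [
    [0, 2, 20, 30],
    [1, 3, 25, 20],
    [0, 2, 22, 25],
    [0, 1, 20, 19],
]
denser = [
    [3, 5, 20, 30],
    [4, 6, 25, 20],
    [3, 5, 22, 25],
    [2, 4, 20, 19],
]

print("sparse table passes:", chi_square_guideline_ok(sparse))
print("denser table passes:", chi_square_guideline_ok(denser))
```

A single column with a total of 1, as in the first table, is enough to push an expected count below 1 and fail both conditions.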

 

And so I'm hoping to at least satisfy the second point, but as you can see, one of the cells under Neutral/D/SD doesn't quite cut it. In the end, following your suggestion (and others from forums) to use other association tests for ordinal data, I'll experiment with tests suited to ordinal data if there's not much more I can do. But I'd still like your thoughts on whether there's any way I might eventually be able to use the chi-square method for this data, unless you strongly suggest that I move on and use association tests for ordinal data because they would give better results.

 

Also, I'm not exactly sure how to compute ordinal measures of association in JMP, but I'm guessing that when you use the Fit Y by X function, it runs all these tests for you depending on what your data is like. When I clicked the orange tab button next to "Contingency.." and selected Measures of Association, it showed me output for a lot of different tests (Figure 2), some of which I recognized as tests for ordinal data, like Somers' D. Is this how I find the results for different tests without having to manually select the right test, as SPSS would require?
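For readers wondering what a measure like Somers' D captures, here is a rough pure-Python sketch based on concordant and discordant pairs. It is not how JMP computes its Measures of Association output; it assumes the rows and columns are already in rank order and treats the column variable as the response. Goodman and Kruskal's gamma is included only as a related measure built from the same pair counts. The table and function names are hypothetical.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a table with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell with every cell in a later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # later row and later column: same direction
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # later row but earlier column: opposite direction
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def somers_d_columns_given_rows(table):
    """Somers' D with the column variable as the response.

    (concordant - discordant) divided by the number of pairs not tied on the row variable.
    """
    concordant, discordant = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    untied_on_rows = (n * n - sum(r * r for r in row_totals)) // 2
    return (concordant - discordant) / untied_on_rows


def goodman_kruskal_gamma(table):
    """Gamma ignores all tied pairs: (C - D) / (C + D)."""
    concordant, discordant = concordant_discordant(table)
    return (concordant - discordant) / (concordant + discordant)


# Made-up 3x3 counts with ordered levels (assumption for illustration).
example = [
    [12, 5, 2],
    [6, 10, 5],
    [2, 6, 14],
]
print("Somers' D (columns given rows):", somers_d_columns_given_rows(example))
print("Gamma:", goodman_kruskal_gamma(example))
```

Both measures use only the ordering of the levels, which is why they can detect a monotone trend that the ordinary chi-square test, which ignores order, may miss.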

 

Figure 1

Figure 2