- How can I do a chi-square test when 20% of the cel...


May 6, 2017 5:58 AM
(2191 views)

I got these warning messages (see below) after running an analysis on some of my variables. A stats book I read says that, to fix the problem, I should follow steps such as: "If the smallest expected cell frequency is less than five, the analyst should recode a row or column variable, combining rows or columns until an adequate expected cell frequency is achieved."

However, I don't really understand this or how to do it in JMP. Can someone explain how I might modify things so that I won't get this warning message, either by following the book's instructions or in some better way? I've also heard that one can alternatively use Fisher's exact test?

Thanks!

1 ACCEPTED SOLUTION


May 7, 2017 2:13 PM
(2544 views)

Solution

The chi-square test used in the Contingency platform requires at least 80% of the cells to have an *expected count* of at least 5. Otherwise, the sum of the cell chi-squares will not follow a chi-square distribution, and so your test (*p*-value) will not be valid.

The usual, simple solution in such cases is to combine levels. For example, in your case you might replace the five levels with just three (Disagree, Neutral, Agree) and see if your assumption is valid. This way does not exclude any data from the analysis.
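To make the rule and the "combine levels" fix concrete, here is a minimal sketch in plain Python (not JSL, and not how JMP computes things internally). The counts are made up for illustration and are not the poster's data; the helper names `expected_counts`, `share_of_small_cells`, and `collapse_columns` are hypothetical. It treats a cell as "small" when its expected count is less than 5.

```python
# Hypothetical counts (NOT the poster's data): two groups (rows) by
# five Likert levels (columns), ordered Strongly Disagree .. Strongly Agree.
observed = [
    [3, 4, 8, 16, 9],
    [2, 3, 10, 15, 10],
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def share_of_small_cells(table, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)

def collapse_columns(table, groups):
    """Sum the columns listed in each group to form a coarser table."""
    return [[sum(row[i] for i in group) for group in groups] for row in table]

# Five levels -> three: Disagree (0+1), Neutral (2), Agree (3+4).
collapsed = collapse_columns(observed, [(0, 1), (2,), (3, 4)])

print(share_of_small_cells(observed))   # share of small cells before combining
print(share_of_small_cells(collapsed))  # share of small cells after combining
```

The rule of thumb is met when the printed share is 0.20 or less. With other data, combining levels may not be enough, so the share should be re-checked after every recode.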

Learn it once, use it forever!

11 REPLIES


May 6, 2017 7:31 AM
(2188 views)

I made a quick scan of your results (which means my statistical viewpoint may be less than accurate), and it appears that few respondents answered "Disagree" or "Strongly Disagree". So you may want to do one of two things: either eliminate those respondents or recode their answers.

To eliminate those responses, you can open a local data filter by going to the red triangle at the top of the display and selecting Local Data Filter. There you will be able to select only the data you want. Alternatively, go to the data table, select all rows that match either of the choices, right-click on the row state column for one of the selected rows, and select "Hide and Exclude". When you rerun the analysis, those rows will be left out.

If you want to recode, go to the column in question, "Recycling can.........", select that column, and then in the Columns panel at the left of the data table click the red triangle and select Recode. The window that pops up will let you combine Disagree and Strongly Disagree, or perhaps move both of these choices into Neutral.
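For readers who want to see what the two options do to the data, here is a rough sketch in plain Python rather than JMP. The responses are invented for illustration, and the names `RARE` and `RECODE` are hypothetical. Excluding shrinks the sample, while recoding keeps every respondent and only merges levels.

```python
from collections import Counter

# Invented responses for illustration only (not the poster's data).
responses = [
    "Agree", "Disagree", "Neutral", "Strongly Disagree",
    "Agree", "Strongly Agree", "Neutral", "Agree",
]

# Option 1: exclude the rarely chosen levels
# (roughly what Hide and Exclude or a data filter does).
RARE = {"Disagree", "Strongly Disagree"}
kept = [r for r in responses if r not in RARE]

# Option 2: recode, here merging Strongly Disagree into Disagree.
# Levels missing from the mapping are left unchanged.
RECODE = {"Strongly Disagree": "Disagree"}
recoded = [RECODE.get(r, r) for r in responses]

print(len(responses), len(kept), len(recoded))
print(Counter(recoded))
```

Moving both levels into Neutral instead would only change the mapping, for example `{"Strongly Disagree": "Neutral", "Disagree": "Neutral"}`.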

Jim


May 7, 2017 3:52 AM
(2167 views)


May 7, 2017 6:07 AM
(2159 views)

Your concerns are valid. And what I illustrated is a method to allow you to collapse or eliminate data so that you can meet the requirements of the statistical test. But as you indicate, that method comes with a cost.

Jim


May 7, 2017 6:56 AM
(2155 views)


May 7, 2017 7:34 AM
(2153 views)

The data are what the data are. The statistics are based upon the relationships between the different cells. You may be able to overcome the issue by increasing the number of data points, but that would require getting more subjects. Any concatenations or eliminations that you choose to make need to be evaluated against the real world. Many times, the very small cell counts are meaningless because of the low number of responses. In those cases, you really cannot draw any conclusion from those results, so even if you left them in, I would not want to make any statement about those cells. Taking them out, therefore, may allow you to show stronger relationships with the remaining data.

Jim




May 8, 2017 1:27 AM
(2129 views)


May 8, 2017 4:51 AM
(2116 views)

First of all, a clarification: the validity is not based on zero counts, but cells with expected counts less than five.

JMP provides the exact test only for 2x2 contingency tables. JMP Pro or SAS/STAT provides the exact test for larger tables.
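As an illustration of what the exact test does in the 2x2 case, here is a small sketch in plain Python. It is not JMP's implementation; the function name `fisher_exact_2x2` and the table are hypothetical. With the margins held fixed, it sums the hypergeometric probabilities of every table that is no more likely than the observed one (one common two-sided definition), using exact fractions to avoid rounding problems when probabilities tie.

```python
from fractions import Fraction
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d      # row totals
    c1 = a + c                 # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return Fraction(comb(r1, x) * comb(r2, c1 - x), denom)

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible top-left values
    p_obs = prob(a)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs)

# Hypothetical 2x2 table, for illustration only.
p_value = fisher_exact_2x2([[3, 1], [1, 3]])
print(float(p_value))
```

Extending this to larger tables means enumerating every table with the same margins, which grows quickly and is not attempted in this sketch.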

Learn it once, use it forever!


May 8, 2017 9:06 AM
(2106 views)

Yes, I am aware that the validity is based on cells with an expected count less than 5. I just highlighted those cells to show why I wouldn't be able to combine the levels, since there were cells with an expected count less than 5 in both the Neutral and the Disagree levels.

Well, it is unfortunate that I would need to get that software in order to do this test. So is there any other solution to my problem?