May 27, 2014

Customizing your crosstabs: Categorical platform preferences

The Categorical platform has a wide array of statistics, including means, standard deviations, overall chi-squared tests, and pairwise cell and column chi-squared statistics. There are also Share and Frequency charts, summary reports to help navigate large numbers of tables, and specialized statistics for multiple responses.


Now, there may be some statistics you want to see all the time. For example, some JMP users always want to see the pairwise cell statistics (Compare Each Cell option). Others always want the means and standard deviations. To save a trip to the menu and a click, use the Categorical platform preferences. 


To get to the preferences specifically for the Categorical platform, go to File->Preferences->Platforms->Categorical. Selecting that panel shows that nearly every option available on the Categorical menu can also be set as a preference. If you always want to see the Compare Each Cell tests, make sure the box is checked in the Preferences; you can also set which statistic should be used by default to determine statistical significance. Closing the Preferences window saves the preference, and future reports will show the Compare Each Cell statistics by default.




Now, if I open the Consumer table from the Sample Data library and run the “Career by Age Group” script, the letters for cell compares will appear by default. The letters correspond to columns (or rows) in the data table. When one column’s letter appears in another column’s cell, the two cells are statistically significantly different. When a lot of letters pile up in one cell, that cell is very different from the others in its row.




Compare Each Cell means exactly what it says: JMP performs a chi-squared test for each pair of cells across each row. The null hypothesis for each test is that the Share percentages (labeled “Share” in the top left corner) are the same for that pair of cells. (For more information on the Compare Each Cell statistic, see the JMP documentation.)
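To make the idea of a single pairwise comparison concrete, here is a minimal Python sketch of a Pearson chi-squared test on two shares. It is an illustration only, not JMP's implementation: the function name `pairwise_share_test` and the counts are hypothetical, and the exact computation JMP performs for Compare Each Cell may differ.

```python
import math

def pairwise_share_test(count_a, total_a, count_b, total_b):
    """Pearson chi-squared test (1 df) that two groups have the same share.

    count_a of total_a respondents in group A gave the response,
    count_b of total_b in group B. Assumes every expected count is > 0.
    """
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, (total_a - count_a) + (total_b - count_b)]
    grand_total = total_a + total_b

    chi_sq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Hypothetical counts: 30 of 50 in one age group agree, 20 of 50 in another.
chi_sq, p_value = pairwise_share_test(30, 50, 20, 50)
print(f"chi-squared = {chi_sq:.2f}, p = {p_value:.4f}")
```

With a 0.05 cut-off, a pair like this hypothetical one would be flagged as different. The platform repeats a comparison of this kind for every pair of cells in every row.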


Now, if you took a Statistics 101 class in college, this might bring back a memory of a harried professor jumping up and down and yelling something about “experimentwise error” and “torturing your data.” If you were taking a nap about then, just think about this: Each test uses a p-value of 0.05 as its cut-off (0.10 for the lowercase letters). Testing each pair of cells across each row results in 42 tests. Even if the counts were totally random, the probability of seeing at least one letter in at least one cell is about 0.88 (that is, 1 − 0.95^42, treating the 42 tests as independent).
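The 0.88 figure can be checked with the usual experimentwise error formula, 1 − (1 − α)^n. The short Python sketch below assumes the tests are independent, which is a simplification, and the function name `experimentwise_error_rate` is just an illustrative choice.

```python
def experimentwise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# 42 pairwise tests at the 0.05 cut-off, as in the report discussed above.
print(round(experimentwise_error_rate(42, 0.05), 2))  # 0.88
```

The same function shows how quickly the rate grows: even a handful of tests pushes it well above the nominal 0.05.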


Because I used to be one of those harried professors, my preferred crosstab uses an overall chi-squared test for statistical significance. The overall tests are requested with the “Test Response Homogeneity” option on the red triangle menu for the Categorical platform.




The overall chi-squared test does one test per table. It’s less susceptible to the ballooning experimentwise error rate that you get with Compare Each Cell, but you do miss a key piece of information that the Compare Each Cell test gives you: Are there any cells that are really different? That is, which groups are driving that statistically significant result?


One easy way to see that is by adding the Cell Chisq statistic to the crosstab (right below Test Response Homogeneity on the red triangle menu). Again, going back to Statistics 101, the chi-squared statistic comes from comparing the count we see in each cell to the count we would expect to see if there were no relationship between the Age Group column and the I am working on my career column. I can request the Cell Chisq statistic from the Categorical platform red triangle menu.




Requesting Cell Chisq adds color-coding to the report. For each cell, the color saturation indicates how different that cell’s count is from the expected count. Deep colors indicate counts that are very different from expected, and light colors indicate counts that are about the same as expected. The hue indicates the direction of the difference: red means the count is higher than expected, and blue means it is lower.
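For readers who want to see the arithmetic behind the coloring, the Python sketch below computes the expected count and the Pearson chi-squared contribution, (observed − expected)² / expected, for each cell. The counts are invented for illustration and are not the Consumer data; the function name `cell_chi_square` is hypothetical, and how JMP maps these values to color saturation is not covered here.

```python
def cell_chi_square(table):
    """Return (expected, contributions) for a table of observed counts.

    Expected counts assume no relationship between rows and columns.
    Each contribution is (observed - expected)**2 / expected.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    contributions = [
        [(obs - exp) ** 2 / exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, contributions

# Hypothetical counts: rows are responses, columns are three groups.
counts = [[30, 20, 10],
          [10, 20, 30]]
expected, contributions = cell_chi_square(counts)

for obs_row, exp_row, chi_row in zip(counts, expected, contributions):
    for obs, exp, chi in zip(obs_row, exp_row, chi_row):
        direction = "above" if obs > exp else "below" if obs < exp else "equal to"
        print(f"observed {obs}, expected {exp:.1f}, "
              f"cell chi-square {chi:.2f} ({direction} expected)")

# Summing the cell contributions gives the overall Pearson chi-squared statistic.
overall = sum(sum(row) for row in contributions)
print(f"overall chi-square: {overall:.2f}")
```

In this made-up table the first and last groups carry the entire statistic, while the middle group sits exactly at its expected counts. That is the kind of pattern the deep and light colors are meant to surface at a glance.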



From the color-coded crosstab, we can see that the younger Age Group (25-29) tended to answer “Agree” more often to the I am working on my career question, while people in the greater than 54 category tended to answer “Disagree” more often. Those two categories are the main drivers of the statistically significant result we see in the table. Now, with one test (i.e., without torturing our data), we were able to determine not just if there was a difference, but who in the survey showed the most differences from the “average” or “norm” for the rest of the sample.


The report above is just about right for me. It’s what I’d like to see every time I make a Categorical report. I could go to the File->Preferences menu and set these options to be this way all the time. But in the Categorical platform, there’s an easier way to take your current report and make it your default.


On the red triangle menu, almost at the bottom, there is a Set Preferences option.



Selecting that option from the red triangle menu brings up a dialog that shows you your current settings, limited to those that are relevant to what you’re seeing in your report right now. If a setting is on, the box immediately to its left is checked.



I can’t take credit for this idea — that belongs to the previous developer for the Categorical platform — but I absolutely love this feature. There are so many settings for the Categorical platform that you could quickly get lost in trying to set them all correctly. This simplifies that process for you. For every option that you want to make the default in the future, just check the “Set” checkbox on the left next to the name of the option.


If you look at the top right of the preferences box, you can see that Test Response Homogeneity and Cell Chisq are checked to show that they’re currently on. I know I always want those two tests, so I check those boxes. I also always want to see my Share Chart, because a visual tends to give more information than raw numbers, so I’ve checked that too.


The top left of the window has two options: 1) Submit Platform Preferences, which sets the Categorical platform preferences for your copy of JMP, and 2) Create Platform Preference Script. If you’re working with other JMP users, and you want to make sure you’re all creating the same reports, you can create a JSL script with your platform preferences and send it to your colleagues. If they run the script with their copy of JMP, they’ll get the same settings you have on your copy without having to go through all the steps.



Note: As the platform developer, I know which statistics are always on by default, so I know which boxes I have to check to get the report above. If you’re an experienced Categorical user, you probably know which statistics appear most of the time and can pick and choose. If I didn’t know, I could have just checked “Set” next to every option that was already checked. When I click OK, the window disappears, and my platform settings are set. Now if I run the “Home Needs Improvement by School Age Children” script in the Consumer table, the Cell Chisq and the Test Response Homogeneity options are on by default, and I get my preferred report without having to make any extra clicks.



If you find yourself making the same clicks over and over in any platform, chances are that you’d benefit from setting some of those platform preferences permanently by going to File->Preferences->Platforms. The Categorical platform has an added feature that lets you set the preferences based on the report in front of you, which saves you time and keeps you from needing to remember every click you made to get there.