joshua
Level III

How to check categorical to categorical correlations in data

Hi,

I have data like this, and I would like to check the correlation between the Fail column and the other categorical columns.

Which statistical test in JMP does this, and how do I apply it?
Any suggestions?

 

joshua_0-1608843343180.png

 

4 REPLIES
txnelson
Super User

Re: How to check categorical to categorical correlations in data

Use the Fit Y by X platform. Place Fail in as the X Factor and the other columns of interest in as the Response variables. It will provide you with an R Square and a Chi Square test of significance.
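To make the chi-square test of significance concrete, here is a minimal sketch of the Pearson form of the statistic computed by hand in Python. The counts are made up for illustration (they are not from the posted data), and the function name `pearson_chi_square` is hypothetical. The p-value shortcut shown only holds for a 2x2 table (1 degree of freedom).

```python
import math

# Made-up counts for illustration only.
# Rows: Fail = yes / no. Columns: two levels of another categorical column.
table = [[30, 10],
         [20, 40]]

def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

chi2, dof = pearson_chi_square(table)

# Closed-form upper tail probability, valid only when dof == 1.
p_value = math.erfc(math.sqrt(chi2 / 2)) if dof == 1 else None

print(f"chi-square = {chi2:.3f}, dof = {dof}, p-value = {p_value}")
```

A small p-value says the observed counts are unlikely if Fail and the other column were unrelated.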
Jim

Re: How to check categorical to categorical correlations in data

I want to add a few comments to @txnelson's guidance. First, I don't know your statistics background, so I assume that it is minimal. You will get a lot out of the explanation and examples for the Contingency platform.

 

I also recommend that you use the Multiple Correspondence Analysis platform. It is specifically designed for categorical data across many levels and variables. Its primary strength is a bi-plot to aid interpretation. The chi square statistic can be used to decide the statistical significance of the association. The odds ratio can be used to assess the strength of the association. But neither of these statistics tells you about the nature of the association. MCA will do that for you.
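As a rough illustration of the odds ratio as a strength measure, here is a small Python sketch for a 2x2 table. The counts are made up, and the 95% interval uses the common log-scale normal approximation, which is an assumption of this sketch rather than a statement about what JMP reports.

```python
import math

# Made-up 2x2 counts for illustration only.
# Rows: Fail = yes / no. Columns: level A / level B of another column.
a, b = 30, 10   # Fail = yes
c, d = 20, 40   # Fail = no

# Odds of level A versus level B among failures, divided by the same odds among non-failures.
odds_ratio = (a * d) / (b * c)

# Approximate 95% confidence interval on the log scale (normal approximation).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"odds ratio = {odds_ratio:.2f}, approx. 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An odds ratio near 1 means little association. The further it is from 1 (in either direction), the stronger the association.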

joshua
Level III

Re: How to check categorical to categorical correlations in data

Hi Mark,@markbailey

 

Thanks for the links. Yes, my statistics background is limited, but I have the ambition to learn ;) I tried one of the examples, and it gave a very complex result that is hard for me to digest.

 

I looked at https://www.jmp.com/support/help/en/15.2/#page/jmp/multiple-correspondence-analysis.shtml# and have a few questions.

 

I don't understand how to tell whether a chi square value is good or bad.

What is the threshold for deciding?

How do I select the dimensions for the y axis and x axis? Which ones are best when there are multiple dimensions in the dropdown?

What is the lambda number on the y and x axes?

 

joshua_0-1608923156226.png

 

 

Re: How to check categorical to categorical correlations in data

JMP output can be rather verbose at times. But you do not necessarily need all of it. I will use the same example for my explanation.

 

The MCA bi-plot shows you all the levels of each variable to understand the nature of the association between variables. The chi square value is a distance measure. So there is no good or bad chi square. There is no threshold for comparison. It is just information. The idea is that you might first decide if an association is statistically significant and then investigate the nature of it with MCA. MCA is interpretative.

 

Let's say that two variables are strongly associated. In what way? The MCA plot tells you. Your original categorical variables are re-cast into a continuous measure of distance from the centroid of all the data. Think of the centroid as the average row in your data table. Each new distance measure defines a new dimension or axis. These dimensions are orthogonal as you would expect for Cartesian coordinates. Each dimension represents unique information from all the original categorical variables. These dimensions are ordered in descending inertia. Think of inertia as the total distance from the centroid in one dimension. (The symbol for inertia is the lower case Greek lambda.) The first dimension therefore maximally separates the levels. The second dimension separates the levels less than the first but more than the remaining dimensions, and so on. So the first two dimensions show you the most information about the association.
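If it helps to see where the dimensions and their inertias come from, here is a minimal sketch of simple correspondence analysis on a single two-way table using NumPy. This is the two-variable special case, not necessarily the exact computation JMP performs for MCA, and the counts are made up for illustration. Each squared singular value is the inertia (lambda) of one dimension, and they come out in descending order.

```python
import numpy as np

# Made-up counts for illustration only.
# Rows: country (American, European, Japanese). Columns: size (Large, Medium, Small).
counts = np.array([[40, 30, 10],
                   [5, 20, 15],
                   [5, 50, 90]], dtype=float)

n = counts.sum()
P = counts / n                 # joint proportions
r = P.sum(axis=1)              # row masses
c = P.sum(axis=0)              # column masses

# Standardized residuals from independence: departures from the centroid.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# The SVD yields orthogonal dimensions ordered by descending singular value.
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
inertias = sv ** 2             # lambda for each dimension

# Principal coordinates of the row and column levels (the points on a bi-plot).
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

# Total inertia equals the Pearson chi-square divided by the sample size.
chi2 = n * (S ** 2).sum()

print("inertia per dimension:", np.round(inertias, 4))
print("share of total inertia:", np.round(inertias / inertias.sum(), 4))
print("chi-square / n:", round(chi2 / n, 4))
```

The share of total inertia per dimension is one way to judge how much of the association the first two axes capture.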

 

Start with one variable, such as sex. Notice that the male and female levels in your plot are well away from the origin in opposite directions along the second dimension, but they are close along the first dimension (i.e., close to 0). Now look at another variable: country. The American and Japanese levels are well away from the origin along the first dimension but close to 0 on the second dimension. On the other hand, the European level is close to 0 on the first dimension but quite negative in the second dimension. You can assess both the direction and distance away from the origin for levels of the same variable.

 

Now consider two variables at a time, such as country and size. The general association between these two variables can be seen as Japanese with Small, American with Large, and finally European with Medium. You can only assess direction away from the origin for levels of different variables.
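One way to cross-check what the plot suggests is to look at Pearson residuals for the same pair of variables: a large positive residual means that pair of levels occurs together more often than independence would predict. The sketch below uses made-up counts chosen to mimic the pattern described above. They are not the actual example data.

```python
import math

# Made-up counts for illustration only.
countries = ["American", "European", "Japanese"]
sizes = ["Large", "Medium", "Small"]
counts = [[40, 30, 10],
          [5, 20, 15],
          [5, 50, 90]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Pearson residual for each cell: (observed - expected) / sqrt(expected).
residuals = []
for i, row in enumerate(counts):
    res_row = []
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        res_row.append((observed - expected) / math.sqrt(expected))
    residuals.append(res_row)

# For each country, report the size with the largest positive residual.
strongest = {}
for i, country in enumerate(countries):
    j_best = max(range(len(sizes)), key=lambda j: residuals[i][j])
    strongest[country] = sizes[j_best]
    print(f"{country}: most over-represented size is {sizes[j_best]} "
          f"(residual {residuals[i][j_best]:.2f})")
```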