
Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

I am trying to run a multivariable logistic regression. The dependent variable has two categories, the first independent variable has five groups, and the second independent variable has two groups.

 

Once I run the model (using Fit Model), the Parameter Estimates section reports the coefficient (beta) for four of the five groups of the first independent variable and one of the two groups of the second independent variable. I assume it reports one fewer than the number of categories for each variable because it chooses one group as the comparison group?

 

My second question has to do with the odds ratios generated. There seems to be an odds ratio for each combination of the five (and two) groups with each other, for a total of 25 (and 4) odds ratios. JMP shows each as Level1/Level2. These odds ratios don't correspond to the coefficients under the Parameter Estimates section (they do not equal e^coefficient). What do these odds ratios represent?

dale_lehman
Level VII

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

 On the first question, one group is the comparison group so only 4 coefficients are shown (put in plain terms, if an observation is not in one of those 4 groups, then we know it must be in the 5th so that is redundant information - which mathematically results in problems).  If you don't like the default group which is used as the comparison, then you can set Value Ordering for that variable to change it (it won't change the actual results, only the way they appear).

 

If I understand your second question, you are asking how the odds ratios are derived from the coefficients. The coefficients are part of a linear equation that determines the log of the odds, which in turn determines the probability of each classification. The odds ratio carries out that calculation for each of two groups and takes the ratio of the two resulting odds. I don't think there is an easy way to intuitively describe the relationship between the coefficients and the odds ratio; a number of steps are required to go from one set of numbers to the other. I've tried to find good ways to explain this when I teach it, but I've always ended up having to show a series of equations that take you from one to the other (mathematically clear, but hardly intuitive).
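
For readers who want the equations, here is a sketch of the standard textbook relationship, not JMP's specific output. The symbols p, f, beta_j, and x_j are notation introduced here, and how the x_j encode a categorical variable depends on the coding the software uses.

```latex
\begin{aligned}
\log\frac{p(x)}{1-p(x)} &= f(x) = \beta_0 + \sum_j \beta_j x_j
  && \text{(linear predictor = log odds)} \\
\operatorname{odds}(x) &= \frac{p(x)}{1-p(x)} = e^{f(x)} \\
\operatorname{OR}(a, b) &= \frac{\operatorname{odds}(a)}{\operatorname{odds}(b)} = e^{f(a) - f(b)}
\end{aligned}
```

The odds ratio reduces to e^(beta_k) for a single coefficient only when the two groups differ by exactly one unit in x_k and in nothing else, as with 0/1 indicator coding against a reference level. Under other codings, f(a) - f(b) can involve more than one coefficient, or a multiple of one.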

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

Would you be able to show me the equations? To simplify and better understand the test, I took out the variable with 5 categories and just ran the Fit Model with the dichotomous variable. So, I have a dichotomous dependent variable (with categories A and B) and a dichotomous independent variable. 

 

The coefficient estimate (under Parameter Estimates) that I got for the dichotomous dependent variable (category A) was -0.86355. Under the odds ratio section, the Level1/Level2 odds ratio for A/B is 0.177797 and for B/A is 5.6243923. I understand that B/A = 1/(A/B), but I thought e^(coefficient estimate) = e^(-0.86355) should equal the Level1/Level2 odds ratio for A/B (0.177797), and it does not.

 

I also ran the same logistic regression test in SPSS, and the beta/coefficient estimate I got was -1.727.

 

EDIT:

I retried the Fit Model test with the independent variable that has 5 categories (Category 1, 2, 3, 4, and 5, with Category 5 being the default group). Under the odds ratios, I understand how the Level1/Level2 odds ratio is calculated when Level1 and Level2 are both non-default groups, i.e. Categories 1 - 4. If Level1 = Category 1 and Level2 = Category 4, then the Level1/Level2 odds ratio is just e^(coefficient estimate for Category 1) / e^(coefficient estimate for Category 4).

 

However, I am confused when either Level1 or Level2 is the default group (Category 5). When Level1 is a non-default group and Level2 is the default group, JMP calculates a Level1/Level2 odds ratio that does not equal e^(coefficient estimate for the non-default group).

 

For example, here are my numbers

Parameter Estimates:

Category 1: 0.00630345
Category 2: -0.2234814
Category 3: -0.0617221
Category 4: 0.11781092

Odds Ratio (Level2 = Category 5, default group):

Category 1/Category 5: 0.9641408
Category 2/Category 5: 0.7662071
Category 3/Category 5: 0.9007356
Category 4/Category 5: 1.0778729

 

I may be wrong, but I thought the parameter estimate for Category 5, the default group, is 0, so the odds ratio for Category N/Category 5 should just equal e^(coefficient estimate for Category N). But as you can see, e^(0.00630345) ≠ 0.9641408.

 

As before, the parameter estimate values in SPSS are also very different. 

 

Thank you for your help!

 

 

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

I also tried the test again with the independent variable that has 5 categories (let's say Categories 1, 2, 3, 4, and 5, with Category 5 being the default group, so JMP calculates a coefficient estimate for Categories 1, 2, 3, and 4). Under the odds ratio section, I understand how the Level1/Level2 odds ratio is calculated when the two compared categories are non-default ones, i.e. the Level1/Level2 odds ratio for Category 1 and Category 4 is just e^(coefficient for Category 1) / e^(coefficient for Category 4).

 

However, I was wondering how the odds ratios are calculated when either Level1 or Level2 is the default group, i.e. Category 5. I may be wrong, but I thought the coefficient for the default group is set to 0, so shouldn't the Level1/Level2 odds ratio be e^(coefficient for Category n), where n = 1, 2, 3, or 4?

 

Here are my numbers:

Parameter Estimates:

Category 1: 0.00630345
Category 2: -0.2234814
Category 3: -0.0617221
Category 4: 0.11781092

Level1/Level2 odds ratio, where Level2 = Category 5 (default group):

Category 1/Category 5: 0.9641408
Category 2/Category 5: 0.7662071
Category 3/Category 5: 0.9007356
Category 4/Category 5: 1.0778729

 

As you can see, e^(0.00630345) ≠ 0.9641408.

 

Thanks for your help!


Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

The parameter for the last level is constrained by the first 4 parameters. JMP uses 'effect coding' for the parameterization of the logistic model. The parameter estimates are constrained to sum to zero, so sum the first 4 parameters, negate the sum, and that is the 'estimate' of the last parameter. You will get 0.16108913 for the last parameter in your case. Does that value fit with your expected odds?
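
The sum-to-zero arithmetic described here can be checked with a few lines of Python. This is only a sketch of the constraint as stated in this reply, using the four estimates posted earlier in the thread; the variable names are made up for illustration.

```python
# Effect-coded estimates reported in the thread for Categories 1-4.
estimates = {
    "Category 1": 0.00630345,
    "Category 2": -0.2234814,
    "Category 3": -0.0617221,
    "Category 4": 0.11781092,
}

# Sum-to-zero constraint: the last level's effect is the negated sum
# of the effects of all the other levels.
category_5 = -sum(estimates.values())

print(f"Category 5: {category_5:.8f}")
```

This prints 0.16108913, the value quoted in the reply.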

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

Hi Mark, thanks for your reply. That value doesn't fit with my expected odds. For example, JMP calculated the odds ratio for Category 1/Category 5 as 0.9641408. If my Category 1 estimate is 0.00630345 and my Category 5 estimate is 0.16108913, then e^(0.00630345) / e^(0.16108913) still does not equal the odds ratio of 0.9641408.

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

I should have thought of this answer at the beginning. Here is an example using Big Class data table.

 

dialog.PNG

 

I suggest that you click the red triangle at the top of the Nominal Logistic Regression platform and select Save Probability Formula.

 

menu.PNG

 

You will see more than one new data column.

 

table.PNG

 

This untidy result is actually great because you can look at the prediction in stages. The "lin[X]" column captures the linear predictor and the estimates. It predicts the logit or log( odds ).

 

lin formula.PNG

 

The "Prob[X]" column shows the back-transformation of the linear predictor for the logit response to a probability response.

 

prob formula.PNG

 

These probabilities can be used to compute odds and odds ratios. I hope that these formulas and results help answer your question.
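
For a two-level response, the back-transformation from the linear predictor to a probability, and from there to odds, can be sketched as follows. This is the standard inverse-logit formula rather than a copy of the saved JMP column formulas, which may be written differently; the function names are hypothetical.

```python
import math

def logit_to_probability(lin):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-lin))

def probability_to_odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)

# Example: a linear predictor of 0 corresponds to probability 0.5 and odds 1.
lin = 0.0
p = logit_to_probability(lin)
odds = probability_to_odds(p)
print(p, odds)
```

Dividing the odds computed for one row by the odds computed for another gives the odds ratio between the two sets of factor levels those rows represent.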

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

Hi Mark, thank you for your reply. I am familiar with the "Save Probability Formula" and indeed it is useful to see the predicted log odds ratio and probability for each data point.

 

However, I still do not know how JMP calculates the odds ratio when one of the groups is the reference group, as I mentioned above. Note, I am doing a multivariable logistic regression, so the formula I am using might be wrong. Also, the odds ratios I am referring to are not the ones for each individual data point, but the ones that show up when you click the red triangle and select "odds ratios". 

 

So to go back to my example, JMP calculated the odds ratio for Category 1/Category 5 as 0.9641408. My Category 1 parameter estimate is 0.00630345 and my Category 5 parameter estimate is 0.16108913. The formula I am using to reproduce JMP's odds ratio is e^(0.00630345) / e^(0.16108913), but this value still does not equal 0.9641408.

 

Thank you very much for your time. 

Re: Multiple Logistic Regression Interpretation for Dependent Variables with Multiple Levels

I am sure that JMP is correct. I suggest that you start at the bottom and build up to the odds ratio to check the calculations. So you have a linear predictor, f(X)? What is it? You can see the linear predictor in my example above in the fourth picture. Is yours a single 5-level categorical factor and the overall mean (intercept)?

 

The estimates are for the parameters in the linear predictor for the logit, or log odds, for the target. So e^f(X) should be the odds for the target, conditioned on the levels of the X vector. The ratio of two properly conditioned odds should be what you want. You need the whole linear predictor, not a single parameter. I think that omission might be the problem.
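
One way to follow this advice numerically is to try it on the simpler two-level model from earlier in the thread. This is a sketch under assumptions: it assumes effect coding with level A coded +1 and level B coded -1, and the intercept value is hypothetical (it cancels in the ratio). Only the estimate -0.86355 comes from the thread.

```python
import math

beta_a = -0.86355  # estimate for level A, as reported earlier in the thread
intercept = 0.5    # hypothetical value; it cancels out of the odds ratio

def linear_predictor(level):
    """Log odds for one level of a two-level, effect-coded factor."""
    code = 1 if level == "A" else -1  # assumed coding: A -> +1, B -> -1
    return intercept + beta_a * code

# Odds for each level come from the full linear predictor, not one parameter.
odds_a = math.exp(linear_predictor("A"))
odds_b = math.exp(linear_predictor("B"))

odds_ratio_ab = odds_a / odds_b
log_odds_ratio = math.log(odds_ratio_ab)

print(f"A/B odds ratio: {odds_ratio_ab:.4f}")
print(f"log odds ratio: {log_odds_ratio:.4f}")
```

Under these assumptions the ratio is e^(2 × -0.86355), roughly 0.1778, which is close to the A/B odds ratio of 0.177797 reported above. The log odds ratio of about -1.727 is also consistent with the SPSS coefficient, as would be expected if SPSS used 0/1 indicator coding for the same factor.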