Interpretation of Dummy Variables in Stepwise Regression with {0-1} Next to Variable Name

Just a quick question on interpreting my dummy variables in a stepwise regression.

 

I have two categorical variables with two categories in each (Lower vs Upper cluster, and Male vs Female) coded as: Male = 1, and Lower Cluster = 1.

 

I am confused by the {0-1} next to the categorical variable name under "Parameter" in the screenshot below. Take the variable gender: does this mean a value of 1 (male) has a coefficient of 0.947, or a value of 0 (female) has a coefficient of 0.947?

 

AlphaStarfish74_0-1700592728307.png

 

Accepted Solutions
Victor_G
Super User

Re: Interpretation of Dummy Variables in Stepwise Regression with {0-1} Next to Variable Name

Hello @AlphaStarfish74,

 

Welcome to the Community!

 

Depending on the modeling platform you use, nominal factors can be coded differently. You don't need to code the factors yourself: you could have left the levels "Male/Female" in the "Gender" column, or "Cluster1/Cluster2" in the "Cluster Lower" column.

In the Stepwise platform with the "Combine" rule, categorical variables are coded hierarchically. The values you see between braces show how the levels are grouped in the term so as to best separate the mean responses. In your case, since each of your categorical factors has only two levels, you only see {L1-L2}, where L1 and L2 are the level names of your factor.

As for the parameter estimate, it represents half the change in mean response when you go from level L2 to level L1 of the factor (with the notation {L1-L2}). So in your case, for "Gender" {0-1}, changing the level from 1 (male) to 0 (female) increases the mean response by twice the estimate, so approximately 1.89 (2 × 0.947).
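
To make the "half difference" reading concrete, here is a minimal Python sketch with made-up numbers (not the data from the screenshot). It assumes the {0-1} term is coded +1 for level 0 and -1 for level 1, which is what the interpretation above implies; the variable names and values are hypothetical.

```python
import numpy as np

# Hypothetical data: gender coded 0 = female, 1 = male, as in the question.
gender = np.array([0, 0, 0, 1, 1, 1, 1])
score = np.array([5.0, 6.0, 7.0, 3.0, 4.0, 4.5, 4.5])

# Assumed effect coding for a {0-1} term: level 0 -> +1, level 1 -> -1.
x = np.where(gender == 0, 1.0, -1.0)

# Design matrix with an intercept column and the coded term.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit.
intercept, estimate = np.linalg.lstsq(X, score, rcond=None)[0]

# Compare the estimate with half the difference between the group means.
mean_0 = score[gender == 0].mean()
mean_1 = score[gender == 1].mean()
half_diff = (mean_0 - mean_1) / 2

print(f"estimate = {estimate:.3f}, half difference = {half_diff:.3f}")
print(f"change from level 1 to level 0 = {2 * estimate:.3f}")
```

With these made-up values the group means are 6.0 (level 0) and 4.0 (level 1), so the estimate is 1.0 and moving from level 1 to level 0 raises the mean response by 2.0.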

 

You can find more information about the nominal coding of factors in the different platforms here: Models with Nominal and Ordinal Effects

And an example of a nominal factor in a model: Example of a Model with a Nominal Term

 

Hope this answer will help you,

 

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics

Re: Interpretation of Dummy Variables in Stepwise Regression with {0-1} Next to Variable Name

Hey thanks so much for this response, I really appreciate it! Links provided are great. 

 

I think what had me second-guessing was that when I ran the nominal variable alone, it looked like females were more negative, but when I added it to the final model, it looked like they were more positive. I'm not sure why the sign changed from "-" to "+".

 

AlphaStarfish74_0-1700852309244.png vs 

AlphaStarfish74_1-1700852460060.png

 

Any thoughts on why this could be?

 

Victor_G
Super User

Re: Interpretation of Dummy Variables in Stepwise Regression with {0-1} Next to Variable Name

Hi @AlphaStarfish74,

 

Glad the answer was helpful!

 

On your second question, there are two possible reasons why the parameter estimate for "Sex" is different:

  1. You're comparing two models that do not include the same effects (and that explain very different shares of the variability, judging by their very different R² values), so the estimates will differ. A model is not a fixed equation: it changes with the effects it includes (and with the type of modeling/analysis). In your first case, every response is modeled with a very simple model, Comp Score = Intercept + a1 × [Sex].
    In the second, you still have that coefficient to estimate, but also those for "Education", "Village Cluster" and the other factors. Because the model now accounts for the influence of these additional factors on the same response values, the estimate for "Sex" changes, as does the Intercept.
  2. Depending on the modeling platform, nominal factors are coded differently, which can also explain the difference in parameter estimates. You can find more details about how the "Stepwise" and "Fit Model" platforms code nominal factors here:
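
To illustrate the first point, here is a minimal Python sketch on made-up, noise-free data (not the data from the screenshots) in which the estimate for a two-level factor changes sign once a correlated covariate enters the model. The column names, the numeric "education" covariate and the +1/-1 coding of sex are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: sex coded +1 (female) / -1 (male), and a numeric
# education covariate that happens to be lower for females in this sample.
sex = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
education = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 3.0])

# Response built so that, at equal education, females score higher.
score = 3.0 * education + 1.0 * sex

ones = np.ones_like(sex)

# Model 1: score = intercept + a1 * sex
X_alone = np.column_stack([ones, sex])
alone_estimate = np.linalg.lstsq(X_alone, score, rcond=None)[0][1]

# Model 2: score = intercept + a1 * sex + a2 * education
X_full = np.column_stack([ones, sex, education])
full_estimate = np.linalg.lstsq(X_full, score, rcond=None)[0][1]

print(f"sex estimate alone:          {alone_estimate:+.3f}")
print(f"sex estimate with education: {full_estimate:+.3f}")
```

In this sketch the estimate for sex is -2.5 when it is the only term, because it absorbs the effect of the education gap, and +1.0 once education is in the model. Whether something similar is happening in the real data depends on how the other factors relate to sex.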

 

I hope this complementary answer will help you understand the differences between your models,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics