
Use consistent labels for categorical variables in regression output

The labels used for the "one-hot" coding (the "indicator parameterization") of categorical variables are not the same in the standard least squares and generalized regression personalities of Fit Model.

For example, the following regression has a continuous response and two predictors: FICO (a continuous variable) and Loan Type (a categorical variable with two levels, O and R). Fit a least squares regression with these two predictors using Fit Model. With the standard least squares personality and the indicator parameterization, the output looks like the following:

[Screenshot: standard least squares parameter estimates, indicator parameterization]

If instead you fit the same model using the generalized regression personality with a normal distribution for the errors (which gives the OLS fit), the output looks like this:

[Screenshot: generalized regression parameter estimates, normal distribution]

The estimates are the same, but the label of the dummy variable has changed.
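
To illustrate why only the label differs, here is a minimal Python sketch (not JMP code). It builds a single indicator column for Loan Type with O as the left-out group, fits OLS once, and then attaches two different names to the same coefficient. The data are made up, the helper functions are hypothetical, and the two label formats (`Loan Type[R]` and `Loan Type[R-O]`) are illustrative assumptions, not necessarily the exact strings JMP prints.

```python
from typing import List, Sequence


def indicator_column(values: Sequence[str], level: str) -> List[float]:
    """1.0 where the row is `level`, 0.0 otherwise (the left-out group gets 0)."""
    return [1.0 if v == level else 0.0 for v in values]


def label_level_only(factor: str, level: str, reference: str) -> str:
    # Hypothetical format: names only the included level; the left-out level is implicit.
    # `reference` is unused but kept so both labelers share one signature.
    return f"{factor}[{level}]"


def label_with_reference(factor: str, level: str, reference: str) -> str:
    # Hypothetical format: names the included level and the left-out (reference) level.
    return f"{factor}[{level}-{reference}]"


def ols(columns: List[List[float]], y: List[float]) -> List[float]:
    """Least squares coefficients from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y].
    a = [[sum(columns[i][r] * columns[j][r] for r in range(n)) for j in range(k)]
         + [sum(columns[i][r] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [x - f * z for x, z in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up data: response built exactly as 10 + 0.05*FICO + 3*(Loan Type == "R").
fico = [620.0, 650.0, 700.0, 720.0, 680.0, 740.0]
loan_type = ["O", "R", "O", "R", "R", "O"]
y = [41.0, 45.5, 45.0, 49.0, 47.0, 47.0]

reference = "O"  # left-out group
intercept = [1.0] * len(y)
coefs = ols([intercept, fico, indicator_column(loan_type, "R")], y)

# One fit, two labeling conventions for the same indicator coefficient.
for labeler in (label_level_only, label_with_reference):
    names = ["Intercept", "FICO", labeler("Loan Type", "R", reference)]
    print({name: round(b, 6) for name, b in zip(names, coefs)})
```

Both printed lines carry the same three estimates; only the name attached to the indicator term changes. The second format names the left-out group explicitly.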

I prefer the generalized regression labeling because it identifies the left-out group, but whichever convention is used, both personalities should use the same one. Students find the change in labeling confusing.