Use consistent labels for categorical variables in regression output

bob_stine · 03-13-2023

The labeling of categorical variables under the "one-hot" (indicator) parameterization is not the same in the standard least squares and generalized regression personalities of Fit Model.

For example, consider a regression with a continuous response and two predictors: FICO (a continuous variable) and Loan Type (two categories, O and R). Fit a least squares regression with these two predictors using Fit Model. With the standard least squares personality and the indicator parameterization, the output looks like the following:

If you instead fit the same model with the generalized regression personality and a normal distribution for the errors (which gives the same OLS fit), the output looks like this:

The estimates are the same, but the labeling for the dummy variable has changed.
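To make the point concrete outside of JMP, here is a minimal Python sketch. It assumes NumPy, uses made-up synthetic data, and the helper names `indicator_fit` and `label_terms` are hypothetical. The label strings are only illustrations of the two styles (one omits the left-out group, one names it); they are not copied from JMP output. The same coefficient vector is printed under both styles, so only the names differ.

```python
import numpy as np

def indicator_fit(fico, loan_type, y, reference):
    """OLS with a 0/1 indicator for each non-reference level of loan_type."""
    loan_type = np.asarray(loan_type)
    levels = sorted(set(loan_type) - {reference})
    columns = [np.ones(len(y)), np.asarray(fico, dtype=float)]
    columns += [(loan_type == lev).astype(float) for lev in levels]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return levels, coef

def label_terms(levels, reference, show_reference):
    """Name the terms, with or without the left-out (reference) level."""
    names = ["Intercept", "FICO"]
    for lev in levels:
        if show_reference:
            names.append(f"Loan Type[{lev}-{reference}]")
        else:
            names.append(f"Loan Type[{lev}]")
    return names

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
n = 200
fico = rng.uniform(550, 800, n)
loan_type = rng.choice(["O", "R"], n)
y = 20.0 - 0.02 * fico + 1.5 * (loan_type == "O") + rng.normal(0, 0.5, n)

# One fit, with R as the left-out group.
levels, coef = indicator_fit(fico, loan_type, y, reference="R")

# Print the same estimates under each labeling style.
for show_reference in (False, True):
    for name, est in zip(label_terms(levels, "R", show_reference), coef):
        print(f"{name:18s} {est: .4f}")
    print()
```

Both printed tables carry identical numbers. The second style tells the reader that the coefficient is the O-versus-R difference without having to look up which level was dropped.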

I prefer the labeling used by generalized regression because it indicates the left-out group, but whichever labeling is chosen should be the same in both personalities. Students find the change in labeling confusing.