## Categorical variables/New values for categorical variable?

Hi,

I'm super new to predictive modeling, and I'm hoping someone can help.

I have a data set that has 7 variables, 4 of which are categorical (2 nominal, 2 ordinal). I have partitioned my data into training and validation. I also have a "new" data set on which I want to run my model.

In my training/validation partitions I have ordinal variable x1, which has the following values: b, c, d, e, f. I would like to be able to account for variable x1 having the following values in the new data set: a, b, c, d, e, f, g, h, knowing a is better than b, and g and h are worse than f. What is the best approach for doing something like this? I thought perhaps I could create extra dummy variable columns to account for the new values that will be in the new data set, but it doesn't work very well.

Also, I have nominal variable x2. The only information I have is that value "J" commands a higher price than "none". Is there a way to build this into a model? Is a conditional formula the way to go?

Thanks in advance for any help!

Accepted Solutions
Staff

## Re: Categorical variables/New values for categorical variable?

If I understand your #1 correctly, X1 in your training and validation data doesn't have levels a, g and h, but your "new" data does. If so, you wouldn't be able to make predictions from X1 for those rows of the new data, simply because the model has no estimates for X1a, X1g and X1h. You could pool the two data sets and randomly redraw your training and validation partitions (this requires that the new data also include the response).
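Before scoring, it can help to check which levels in the new data the model has never seen. The snippet below is a minimal sketch of that check in plain Python, not JMP functionality; the names `train_x1`, `new_x1`, and `find_unseen_levels` are hypothetical and the values are made up to match the levels described in the question.

```python
def find_unseen_levels(train_values, new_values):
    """Return the levels that appear only in the new data, plus the
    indices of the new-data rows that carry one of those levels."""
    seen = set(train_values)
    unseen = sorted(set(new_values) - seen)
    rows = [i for i, v in enumerate(new_values) if v not in seen]
    return unseen, rows


# Hypothetical example data: training has b..f, new data adds a, g, h.
train_x1 = ["b", "c", "d", "e", "f", "c", "d"]
new_x1 = ["a", "b", "c", "g", "h", "f"]

unseen, rows = find_unseen_levels(train_x1, new_x1)
print("Levels with no estimate:", unseen)
print("Rows that cannot be scored on x1:", rows)
```

Rows flagged this way are the ones for which the fitted model has no X1 estimate, so they would need a refit on pooled data or some other explicit decision before prediction.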

Regarding #2, you could enter X2 into your model as an ordinal variable to account for the ordering, J vs. None. (Note: the difference between nominal coding and ordinal coding is just the interpretation of the estimates of that variable. It doesn't affect the parameter estimates of other variables, apart from the intercept, and it doesn't affect the goodness of fit.)
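The note about coding can be checked on a small example. The sketch below is an illustration under stated assumptions, not JMP's implementation: it assumes nominal coding is a sum-to-zero contrast (J = +1, none = -1) and ordinal coding is a 0/1 step (none = 0, J = 1), and it uses made-up data with a hypothetical numeric covariate `z`. It fits ordinary least squares both ways using exact fractions so the comparison is not blurred by rounding.

```python
from fractions import Fraction


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y exactly by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(Fraction(X[i][r]) * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(Fraction(X[i][r]) * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        pivot = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][p] for r in range(p)]


def fitted(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


# Made-up data: x2 level, a numeric covariate z, and price y.
x2 = ["none", "J", "none", "J", "J", "none"]
z = [1, 2, 3, 4, 5, 6]
y = [10, 15, 13, 19, 20, 15]

# Assumed nominal coding: J = +1, none = -1. Assumed ordinal coding: J = 1, none = 0.
X_nom = [[1, 1 if v == "J" else -1, zi] for v, zi in zip(x2, z)]
X_ord = [[1, 1 if v == "J" else 0, zi] for v, zi in zip(x2, z)]

b_nom = ols(X_nom, y)
b_ord = ols(X_ord, y)

print("slope on z identical:", b_nom[2] == b_ord[2])
print("fitted values identical:", fitted(X_nom, b_nom) == fitted(X_ord, b_ord))
print("intercepts:", b_nom[0], "vs", b_ord[0])
```

Under these assumptions the two codings describe the same model: only the intercept and the X2 estimate change (the 0/1 estimate is twice the sum-to-zero estimate), while the slope on `z` and the fitted values stay the same.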
