Annapurna20
Level I

Categorical variables/New values for categorical variable?

Hi,

I'm new to predictive modeling and hoping someone can help.

 

I have a data set with 7 variables, 4 of which are categorical (2 nominal, 2 ordinal). I have partitioned my data into training and validation sets. I also have a "new" data set that I want to run my model on.

In my training/validation partitions, ordinal variable x1 takes the values b, c, d, e, f. I would like to account for x1 taking the values a, b, c, d, e, f, g, h in the new data set, knowing that a is better than b, and that g and h are worse than f. What is the best approach for something like this? I thought perhaps I could create extra dummy variable columns for the new values that appear in the new data set, but it doesn't work very well.

 

Also, I have nominal variable x2. The only information I have is that value "J" commands a higher price than "none". Is there a way to build this into a model? Is a conditional formula the way to go?

 

Thanks in advance for any help!


Accepted Solutions
jiancao
Staff

Re: Categorical variables/New values for categorical variable?

If I understand your #1 correctly, x1 in your training and validation data doesn't have levels a, g and h, but your "new" data does. If so, you won't be able to make predictions from x1 for those rows of the new data, simply because the model has no estimates for X1a, X1g and X1h. You could combine the two data sets and then randomly redraw your training and validation partitions.
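
To make the point about missing estimates concrete, here is a minimal Python sketch (not JMP/JSL) that checks which levels of x1 show up in the new data but never in training. The helper name `unseen_levels` and the two lists are hypothetical; the lists just restate the levels from the question.

```python
def unseen_levels(train_values, new_values):
    """Return levels that occur in new_values but never in train_values."""
    return sorted(set(new_values) - set(train_values))


# Levels of x1 as described in the question (one entry per level).
train_x1 = ["b", "c", "d", "e", "f"]
new_x1 = ["a", "b", "c", "d", "e", "f", "g", "h"]

# A model fit on train_x1 has no estimate for any of these levels.
missing = unseen_levels(train_x1, new_x1)
print(missing)  # ['a', 'g', 'h']
```

Any row of the new data whose x1 value is in `missing` has no corresponding parameter in the fitted model, which is why adding empty dummy columns for those levels doesn't help.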

Regarding #2, you could enter x2 into your model as an ordinal variable to account for the ordering, J vs. none. (Note: the difference between nominal coding and ordinal coding is just the interpretation of the estimates for that variable; it doesn't affect the goodness of fit or the parameter estimates of other variables, apart from the intercept.)
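
The note about coding can be illustrated with a small Python sketch (again, not JMP/JSL). It assumes effect coding (+1/-1) for the nominal case and a 0/1 step column for the ordinal case, which is my understanding of the two parameterizations; the prices are made-up toy numbers and `fit_line` is a hypothetical helper for one-predictor least squares.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up toy data, for illustration only.
x2 = ["none", "none", "none", "J", "J", "J"]
price = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]

# Assumed nominal (effect) coding: J = +1, none = -1.
effect = [1.0 if v == "J" else -1.0 for v in x2]
# Assumed ordinal (step) coding: none = 0, J = 1.
step = [1.0 if v == "J" else 0.0 for v in x2]

b0_effect, b1_effect = fit_line(effect, price)
b0_step, b1_step = fit_line(step, price)

# The coefficients differ between codings, but the fitted values do not.
fitted_effect = [b0_effect + b1_effect * c for c in effect]
fitted_step = [b0_step + b1_step * c for c in step]

print(b0_effect, b1_effect)  # intercept = overall mean, slope = half the gap
print(b0_step, b1_step)      # intercept = mean of "none", slope = full gap
print(fitted_effect == fitted_step or
      all(abs(a - b) < 1e-9 for a, b in zip(fitted_effect, fitted_step)))
```

Under both codings the fitted price for each row is just its group mean, so the fit is the same; only the meaning of the intercept and the x2 estimate changes.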
