- Re: Categorical variables/New values for categorical variable?

Sep 28, 2017 8:49 AM
(3046 views)

Hi,

I'm super new to predictive modeling, and I'm hoping someone can help.

I have a data set with 7 variables, 4 of which are categorical (2 nominal, 2 ordinal). I have partitioned my data into training and validation sets. I also have a "new" data set on which I want to run my model.

In my training/validation partitions, the ordinal variable x1 has the following values: b, c, d, e, f. I would like to account for x1 having the following values in the new data set: a, b, c, d, e, f, g, h, knowing that a is better than b, and that g and h are worse than f. What is the best approach for doing something like this? I thought perhaps I could create extra dummy variable columns to account for the new values that are in the new data set, but it doesn't work very well.

Also, I have a nominal variable x2. The only information I have is that the value "J" commands a higher price than "none". Is there a way to build this into a model? Is a conditional formula the way to go?

Thanks in advance for any help!

Accepted Solutions

If I understand your #1 correctly, X1 in your training and validation data doesn't have levels a, g, and h, but your "new" data does. If so, you won't be able to make predictions from X1 in the new data, simply because the model has no estimates for X1a, X1g, and X1h. You could combine the two data sets and then randomly redraw your training and validation partitions.
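As a quick check before scoring, it can help to list which levels appear in the new data but were never seen in training. Here is a minimal Python sketch of that check (not JSL); the helper name `unseen_levels` and the two lists are made up for illustration, using the level labels from the question.

```python
def unseen_levels(train_values, new_values):
    """Return the levels that occur in the new data but not in training, sorted."""
    return sorted(set(new_values) - set(train_values))


# Levels of x1 as described in the question (illustrative values only).
train_x1 = ["b", "c", "d", "e", "f"]
new_x1 = ["a", "b", "c", "d", "e", "f", "g", "h"]

missing = unseen_levels(train_x1, new_x1)
print(missing)  # ['a', 'g', 'h']
```

Any level in that list has no parameter estimate in the fitted model, so rows carrying it cannot be scored until the model is refit on data that contains it.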

Regarding #2, you could enter X2 into your model as an ordinal variable to account for the ordering, J vs. none. (Note: the difference between nominal coding and ordinal coding is just the interpretation of the estimates for that variable; it affects neither the goodness of fit nor the parameter estimates of the other variables, apart from the intercept.)
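To illustrate the note about coding, here is a small Python sketch (not JSL) that fits the same least-squares model twice with a two-level X2 coded two ways. It assumes a 0/1 step code stands in for ordinal coding and a +1/-1 effect code stands in for nominal coding, which is my understanding of JMP's parameterization rather than something stated in this thread. The data, the extra predictor `x3`, and the helpers `solve` and `ols` are invented for the example.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares fit via the normal equations; returns (coefficients, SSE)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, sse


# Made-up data: x2 has two levels, x3 is some other continuous predictor.
x2 = ["none", "J", "none", "J", "J", "none"]
x3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
price = [10.0, 15.5, 13.8, 19.6, 21.1, 20.3]

# Design matrix columns: intercept, coded x2, x3.
step_rows = [[1.0, 1.0 if v == "J" else 0.0, x] for v, x in zip(x2, x3)]
effect_rows = [[1.0, 1.0 if v == "J" else -1.0, x] for v, x in zip(x2, x3)]

beta_step, sse_step = ols(step_rows, price)
beta_effect, sse_effect = ols(effect_rows, price)

print("x3 slope:", beta_step[2], beta_effect[2])      # same under both codings
print("SSE:", sse_step, sse_effect)                    # same fit
print("intercept:", beta_step[0], beta_effect[0])      # differs
print("x2 estimate:", beta_step[1], beta_effect[1])    # differs by a factor of 2
```

The x3 slope and the error sum of squares agree between the two fits, while the intercept and the X2 estimate change. Under the step code, the X2 estimate is the J-minus-none difference; under the effect code, it is half that difference.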
