- Categorical variables/New values for categorical v...

Sep 28, 2017 8:49 AM
(1611 views)

Hi,

I'm super new to predictive modeling, and I'm hoping someone can help.

I have a data set that has 7 variables, 4 of which are categorical (2 nominal, 2 ordinal). I have partitioned my data into training and validation. I also have a "new" data set on which I want to run my model.

In my training/validation partitions, the ordinal variable x1 takes the following values: b, c, d, e, f. I would like to be able to account for x1 taking the following values in the new data set: a, b, c, d, e, f, g, h, knowing that a is better than b, and that g and h are worse than f. What is the best approach for doing something like this? I thought perhaps I could create extra dummy variable columns to account for the new values that are in the new data set, but it doesn't work very well.

Also, I have a nominal variable x2. The only information I have is that the value "J" commands a higher price than "none". Is there a way to build this into a model? Is a conditional formula the way to go?

Thanks in advance for any help!


Sep 29, 2017 7:02 AM
(2556 views)

Solution

If I understand your #1 correctly, X1 in your training and validation data doesn't have levels a, g and h, but your "new" data does. If so, you wouldn't be able to make predictions from X1 in the new data, simply because the model has no estimates for X1a, X1g and X1h. You could combine the two data sets and then randomly redraw your training and validation partitions.
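To make the issue concrete, here is a minimal Python sketch (not JMP/JSL) that lists the levels a fitted model would have no estimate for, then pools the rows and redraws the partitions. The helper names `unseen_levels` and `redraw_split`, the 30% validation fraction, and the seed are hypothetical choices for illustration. Pooling also assumes the new rows come with an observed response, which the thread does not confirm.

```python
import random

def unseen_levels(train_values, new_values):
    """Return the levels found in new_values but never in train_values."""
    return sorted(set(new_values) - set(train_values))

def redraw_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle the pooled rows and split them into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Levels of x1 as described in the question.
train_x1 = ["b", "c", "d", "e", "f"]
new_x1 = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Levels the original model cannot score because it never saw them.
missing = unseen_levels(train_x1, new_x1)
print(missing)  # ['a', 'g', 'h']

# Pool both data sets, redraw the partitions, then re-check coverage.
pooled = train_x1 + new_x1
training, validation = redraw_split(pooled)
still_missing = unseen_levels(training, new_x1)
print(still_missing)
```

A random redraw does not guarantee that every level ends up in the training partition, so it is worth re-running the check after the split, as the last two lines do.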

Regarding #2, you could enter X2 into your model as an ordinal variable to account for the ordering, J vs. none. (Note: the difference between nominal coding and ordinal coding is just the interpretation of the estimates for that variable; it doesn't affect the goodness of fit or the parameter estimates of the other variables, only the intercept.)
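The note about coding can be checked on a small example. The Python sketch below fits the same least-squares model twice for a two-level X2: once with a ±1 effect coding (assumed here as a stand-in for nominal coding) and once with a 0/1 coding (assumed as a stand-in for ordinal coding). The data, the extra predictor `z`, and the helper names are made up for illustration, and the codings are not claimed to match JMP's internal ones exactly. Exact fractions are used so the comparisons are not blurred by rounding.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def fitted(design, coef):
    """Predicted response for each row of the design matrix."""
    return [sum(c * v for c, v in zip(coef, row)) for row in design]

# Made-up data: x2 is the two-level factor, z is another predictor.
x2 = ["none", "J", "none", "J", "J", "none"]
z = [1, 2, 3, 4, 5, 6]
price = [Fraction(p) for p in [10, 15, 13, 19, 20, 15]]

effect = {"none": -1, "J": 1}     # assumed stand-in for nominal coding
indicator = {"none": 0, "J": 1}   # assumed stand-in for ordinal coding

# Each design row is [intercept, coded x2, z].
design_effect = [[Fraction(1), Fraction(effect[v]), Fraction(t)] for v, t in zip(x2, z)]
design_indicator = [[Fraction(1), Fraction(indicator[v]), Fraction(t)] for v, t in zip(x2, z)]

coef_effect = ols(design_effect, price)
coef_indicator = ols(design_indicator, price)

# Same fit and same coefficient on z; only intercept and X2 estimate change.
print(fitted(design_effect, coef_effect) == fitted(design_indicator, coef_indicator))
print(coef_effect[2] == coef_indicator[2])
print(coef_effect[:2], coef_indicator[:2])
```

Because the 0/1 code is just a shifted and rescaled version of the ±1 code, the two fits describe the same model; only the intercept and the X2 estimate are re-expressed.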
