using categorical data with random forests in JMP

We are having a little trouble wrapping our heads around how JMP treats categorical data in random forests. We have created a small pilot data set and mapped the categorical data using a variety of techniques including many suggested in this forum. However, I don't really understand why we should see so much of a difference in performance when using these mappings. If I am mapping a discrete set of values to another discrete set of values (e.g., character strings to integers), why should it make so much of a difference in JMP?

We don't see this kind of variation with the random forest implementations in Python or MATLAB. In JMP, the error rates on both the held-out data and the training set differ significantly from one mapping to another.

We have read most of the posts on this topic, and can supply more specifics, including a trial data set, if necessary. But before we go down the rabbit hole of choosing a mapping that optimizes performance in JMP, I was hoping someone could briefly explain why JMP's implementation of random forests is so sensitive to how categorical data is mapped.

Thanks.

1 ACCEPTED SOLUTION

Re: using categorical data with random forests in JMP

Are the mapped integer values using the nominal modeling type, or the default continuous modeling type?
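
To illustrate why the modeling type could matter, here is a minimal Python sketch. It is not JMP's algorithm, and I am not claiming to know JMP's internals. It only assumes that a continuous column is split with a threshold (code <= t) while a nominal column is split by grouping levels. The data, the two mappings, and all function names are made up for illustration.

```python
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def weighted_impurity(left, right):
    """Size-weighted impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_continuous_split(codes, y):
    """Best single threshold split, treating integer codes as ordered numbers."""
    best = float("inf")
    for t in sorted(set(codes))[:-1]:
        left = [yi for c, yi in zip(codes, y) if c <= t]
        right = [yi for c, yi in zip(codes, y) if c > t]
        best = min(best, weighted_impurity(left, right))
    return best


def best_nominal_split(levels, y):
    """Best single split over groupings of levels, ignoring any ordering."""
    uniq = sorted(set(levels))
    best = float("inf")
    for k in range(1, len(uniq)):
        for subset in combinations(uniq, k):
            left = [yi for lv, yi in zip(levels, y) if lv in subset]
            right = [yi for lv, yi in zip(levels, y) if lv not in subset]
            best = min(best, weighted_impurity(left, right))
    return best


# Hypothetical data: the response is 1 for levels A and C, 0 for B and D.
levels = ["A", "B", "C", "D"] * 5
y = [1 if lv in ("A", "C") else 0 for lv in levels]

# Two arbitrary integer mappings of the same four levels.
mapping_1 = {"A": 0, "B": 1, "C": 2, "D": 3}
mapping_2 = {"A": 0, "C": 1, "B": 2, "D": 3}

imp_cont_1 = best_continuous_split([mapping_1[lv] for lv in levels], y)
imp_cont_2 = best_continuous_split([mapping_2[lv] for lv in levels], y)
imp_nominal = best_nominal_split(levels, y)

print("continuous, mapping 1:", imp_cont_1)
print("continuous, mapping 2:", imp_cont_2)
print("nominal (any mapping):", imp_nominal)
```

In this toy case a single threshold cannot separate {A, C} from {B, D} under the first mapping, because A and C are not adjacent in code order, but it can under the second. The grouping-based split never looks at the codes, so its result does not depend on the mapping.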

2 REPLIES

Re: using categorical data with random forests in JMP

Hi Mark,

Good point. My grad student doing this work said "oh!"

Thanks very much,

-Joe