
Decision trees: same model inputs, different results

owiuser

Community Trekker

Joined:

Sep 23, 2011

I just repeated a decision tree analysis that I originally ran yesterday. The input data and all the modeling options were identical in both runs, yet the results differed. I tried it again and got yet another set of results. Why does this happen? It will be disconcerting to report results that can't be independently replicated.

Accepted Solution

20% of your data is being held back for validation.  The validation data is sampled randomly so you will get a different set each time.
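To illustrate why an unseeded random hold-back changes from run to run, here is a minimal Python sketch. It is not JMP or JSL code and does not reproduce how the Partition platform samples internally; the function name `holdout_rows` and the row count of 100 are made up for the example.

```python
import random

def holdout_rows(n_rows, portion, rng):
    """Return the set of row indices held back for validation (hypothetical helper)."""
    n_holdout = round(n_rows * portion)
    return set(rng.sample(range(n_rows), n_holdout))

# Two unseeded generators stand in for two separate runs of the same analysis.
run_1 = holdout_rows(100, 0.2, random.Random())
run_2 = holdout_rows(100, 0.2, random.Random())

# Both runs hold back 20 rows, but usually not the same 20 rows.
print(len(run_1), len(run_2))
print("rows held back in both runs:", len(run_1 & run_2))
```

Because the training rows differ between runs, the best cut-off points for each split can differ too, even when the same X variables are chosen.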

If you have JMP Pro you can create a validation column and control the hold-back sample that way.  If you don't have Pro, you can try this:

1. Add a new column and initialize its data as a random indicator. By default, 20% of the rows will have the value 1 and 80% the value 0.

2. Exclude all the rows with value 1

3. Run the Partition platform with the validation portion set to 0.

It should automatically use the excluded rows for validation.
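The idea behind these steps, creating the indicator once and reusing it, can be sketched in Python. This is an illustration only, not JMP or JSL code: the function names, the seed value, and the 50-row table are assumptions for the example, and JMP's random indicator may assign rows differently (for instance, independently per row rather than as an exact 20% count).

```python
import random

def make_validation_indicator(n_rows, portion=0.2, seed=2011):
    """Build a 0/1 column once: 1 = hold back for validation, 0 = train.

    The seed is arbitrary; fixing it makes the column reproducible.
    """
    n_holdout = round(n_rows * portion)
    indicator = [1] * n_holdout + [0] * (n_rows - n_holdout)
    random.Random(seed).shuffle(indicator)
    return indicator

def split_rows(indicator):
    """Split row indices into (training, validation) using the stored column."""
    train = [i for i, flag in enumerate(indicator) if flag == 0]
    validation = [i for i, flag in enumerate(indicator) if flag == 1]
    return train, validation

# Create the column once and keep it with the data.
indicator = make_validation_indicator(n_rows=50)

# Every later analysis reads the same column, so the split never changes.
first_run = split_rows(indicator)
second_run = split_rows(indicator)
print(first_run == second_run)                # True
print(len(first_run[0]), len(first_run[1]))   # 40 10
```

Saving the indicator column with the data table is what makes the analysis repeatable: anyone rerunning it works from the same training and validation rows.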

-Dave
4 REPLIES
Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

What version of JMP are you using? What options are you specifying in the launch dialog?

-Jeff

owiuser


JMP 12.0.1

I checked the Informative Missing box, held back 0.2 for validation, and then manually performed two splits. The response variable is categorical. The same two X variables are selected in each run, but the cut-off points for the splits and the model fit results differ.

owiuser


Thank you David!  I should have figured that out myself.

Dave