Discussions

lucy_luo_conoco · Apr 6, 2018 09:18 AM

How can I identify which rows (observations) are my validation points when my target variable is categorical and I use a random split? An example is at https://www.jmp.com/support/help/14/example-of-profit-matrix-and-decision-matrix-rep.shtml#1276835. For a continuous target variable, I can select points in the predicted vs. actual plot to see which observations are labeled as validation or training data, but I could not find an equivalent for a categorical target variable. Thanks.

Peter_Bartell · Apr 6, 2018 10:46 AM

Off the top of my head, one visual and interactive way to see which rows fall into each of the three possible Validation column categories is to use the Distribution platform to create two distributions: one for your target variable, the other for the validation column. Then you can just click on each bar in the distribution plot of the Validation column and watch the selection change in the data table as well as in the target variable distribution. Other methods, such as Row Selection, could work as well...but I always prefer visual methods over non-visual ones.

lucy_luo_conoco · Apr 6, 2018 10:55 AM

Peter,

Thanks for your reply. But "the validation column" cannot be found in the data or in the results when a random split is used for a categorical variable. Row selection can only be used for numerical target variables, where the results show a plot of actual vs. predicted (by highlighting data points in the plot). For a categorical target variable, the results show the ROC curve, the confusion matrix, etc., and I could not identify which rows come from the validation part and which come from the training part...

Peter_Bartell · Apr 6, 2018 11:10 AM

I think I see what you are doing...but I'm not 100% sure. It sounds like you might be selecting the validation % in the modeling platform's launch dialog? What I am suggesting is that BEFORE modeling, you use JMP Pro's Make Validation Column capability to create a column in the data table whose row values will be training, validation, and, optionally, test. Then, in the model launch window, just place the validation column in the Validation role and run the platform as you normally would. See the attached example data table for what I'm suggesting...I didn't include predictor variable columns, but it shows what I was suggesting in my first reply.
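For readers who want to see the idea outside JMP, here is a minimal Python sketch of what a stored validation column gives you. It is not JMP's implementation or JSL; the function name `make_validation_column`, the 60/20/20 proportions, and the seed are all assumptions chosen for illustration. The point is only that once the labels live in a column, the validation rows can be listed directly.

```python
import random

def make_validation_column(n_rows, train=0.6, validation=0.2, seed=1234):
    """Return one label per row: 'Training', 'Validation', or 'Test'.

    Hypothetical helper for illustration; proportions and seed are assumptions.
    """
    n_train = round(n_rows * train)
    n_valid = round(n_rows * validation)
    # Whatever is left after training and validation becomes the test set.
    labels = (["Training"] * n_train
              + ["Validation"] * n_valid
              + ["Test"] * (n_rows - n_train - n_valid))
    # A seeded shuffle makes the assignment random but reproducible.
    random.Random(seed).shuffle(labels)
    return labels

labels = make_validation_column(10)

# Because the assignment is stored per row, the validation rows are easy to list.
validation_rows = [i for i, label in enumerate(labels) if label == "Validation"]
```

Because the split is saved with the data rather than drawn inside the modeling step, the same rows can be identified again no matter what type the target variable is.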

lucy_luo_conoco · Apr 6, 2018 02:33 PM

Peter and Jeff,

Thank you all for the answers and solutions.

I used and understand the features you mentioned.

I was just wondering how JMP splits the data randomly (in models such as random forest, boosted tree, etc.) and whether, from JMP's own random split, I could identify rows/observations as Training/Validation for a categorical target variable.

From Jeff's post, it looks like JMP doesn't have a way to identify them.

Jeff_Perkinson · Apr 6, 2018 11:44 AM

You're using the "Validation Portion" option in the Partition launch dialog.

Unfortunately, I don't see a way to identify which rows are used for validation with this method.

If you've got JMP Pro, I agree with @Peter_Bartell. You can create a Validation column to specify which rows are used for validation.

-Jeff

Peter_Bartell · Apr 6, 2018 12:03 PM

The other thing I'll suggest is that if you have a categorical response, a general best practice when creating the Validation column with JMP Pro's Make Validation Column capability is to select the Stratified Random option, then select your target variable. This forces the randomization % you choose to be applied within each level of the categorical variable...which helps guard against model bias.
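As a rough illustration of what stratifying on the target means, the following Python sketch applies the validation fraction separately within each level of a categorical target. It is an assumption-laden stand-in, not JMP's algorithm: the function name `stratified_validation_column`, the 25% fraction, the seed, and the toy `target` values are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_validation_column(target, validation=0.25, seed=1234):
    """Label rows 'Training' or 'Validation', splitting within each target level.

    Hypothetical helper for illustration; fraction and seed are assumptions.
    """
    rng = random.Random(seed)

    # Group row indices by the level of the categorical target.
    rows_by_level = defaultdict(list)
    for i, level in enumerate(target):
        rows_by_level[level].append(i)

    labels = ["Training"] * len(target)
    for rows in rows_by_level.values():
        # Shuffle within the level, then hold out the same fraction of each level.
        rng.shuffle(rows)
        n_valid = round(len(rows) * validation)
        for i in rows[:n_valid]:
            labels[i] = "Validation"
    return labels

# Toy target: 8 rows of "Yes" followed by 4 rows of "No".
target = ["Yes"] * 8 + ["No"] * 4
labels = stratified_validation_column(target)
```

With this toy target, each level contributes the same share of its rows to validation, so a rare level is not left out of the validation set by chance.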

how to know what rows are my validation dataset when I run model with random split (JMP PRO)
