tallman
Level I

Validation and weighting

I'm working with a file of approximately 8,500 records, of which approximately 100 have a response value of Y and the rest a value of N. I'm running a Partition model on this response and then adding in various X factors. Given the relatively low number of "Y" responses, I have two questions about this model.

1) Should I weight the response values using a conditional formula (adding a new column)?

2) What percentage should I use for the validation portion (20%, 30%, ...)?

cwillden
Super User (Alumni)

Re: Validation and weighting

For that total size, I would do a 70/30 or 60/40 split. You could try both. How many factors do you have?
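With only about 100 "Y" rows, the split percentage matters mostly through how many "Y" rows end up in the validation set. The sketch below is a generic stratified split in plain Python (not a JMP feature; the function name `stratified_split` is hypothetical) that holds out the same fraction of each class, so a 70/30 split leaves roughly 30 "Y" rows for validation and a 60/40 split roughly 40. The counts are the approximate ones from the question.

```python
import random

def stratified_split(labels, valid_frac, seed=1):
    """Return (train_idx, valid_idx), holding out valid_frac of EACH class
    so the rare "Y" proportion is the same in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, valid = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_valid = round(len(idx) * valid_frac)
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)

# Approximate counts from the question: ~100 "Y" out of ~8,500 records.
labels = ["Y"] * 100 + ["N"] * 8400
for frac in (0.3, 0.4):
    train, valid = stratified_split(labels, frac)
    n_y_valid = sum(labels[i] == "Y" for i in valid)
    print(f"{frac:.0%} validation: {len(valid)} rows, {n_y_valid} of them 'Y'")
```

Either way the validation set contains only a few dozen "Y" rows, which is why validation metrics for the rare class will be noisy.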

-- Cameron Willden
tallman
Level I

Re: Validation and weighting

Thanks Cameron,

I have 8 factors to start, but may narrow that down. Would you suggest weighting as well?

cwillden
Super User (Alumni)

Re: Validation and weighting

I probably would, though I guess it depends a bit on what a good classifier is for your situation. If you want a large penalty for missing the "Y" responses, to make sure you do an adequate job of predicting those, weighting would be a good idea. I'm not an expert in this area, but I would probably play around with different weights and observe the impact on the confusion matrix.
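One common starting point (an assumption here, not something prescribed in this thread) is to up-weight each "Y" row so that the two classes carry equal total weight: with about 100 "Y" and 8,400 "N" rows, each "Y" gets a weight of 8400 / 100 = 84 and each "N" a weight of 1. The hypothetical helper below computes such a weight column, with an `extra` multiplier for trying stronger or weaker penalties as suggested above. In JMP, the equivalent would be a new formula column along the lines of `If(:Response == "Y", 84, 1)` (column name assumed) placed in the Partition launch's Weight role.

```python
def balancing_weights(labels, rare="Y", extra=1.0):
    """Per-row weights: the rare class is up-weighted so its total weight
    matches the common class, then scaled by `extra` to try stronger or
    weaker penalties for missing it."""
    n_rare = labels.count(rare)
    n_common = len(labels) - n_rare
    w_rare = extra * n_common / n_rare
    return [w_rare if y == rare else 1.0 for y in labels]

labels = ["Y"] * 100 + ["N"] * 8400
for extra in (0.5, 1.0, 2.0):
    w = balancing_weights(labels, extra=extra)
    total_y = sum(x for x, y in zip(w, labels) if y == "Y")
    print(f"extra={extra}: weight per Y row={w[0]}, total Y weight={total_y}")
```

Refitting with a few values of `extra` and comparing the resulting confusion matrices is one way to "play around with different weights."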

-- Cameron Willden

Re: Validation and weighting

You might also want to include the profit matrix as a weighting scheme. Take a look at this link to learn a little more:

https://www.jmp.com/support/help/13-2/Specify_Profit_Matrix.shtml
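As a rough illustration of how a profit matrix turns predicted probabilities into decisions, the sketch below applies the generic expected-profit rule: for each possible decision, sum the payoffs weighted by the predicted class probabilities and pick the largest. The payoff values are made up (missing a true "Y" is assumed to cost 10 times as much as a false alarm), and the layout of `profit` is this sketch's own convention, not necessarily the one JMP's dialog uses.

```python
def best_decision(p_yes, profit):
    """Pick the decision with the highest expected profit.

    profit[actual][decision] is the payoff of making `decision` when the
    true response is `actual`; p_yes is the model's predicted P(Y)."""
    probs = {"Y": p_yes, "N": 1.0 - p_yes}
    expected = {
        d: sum(probs[a] * profit[a][d] for a in probs)
        for d in ("Y", "N")
    }
    return max(expected, key=expected.get), expected

# Hypothetical payoffs: missing a true "Y" costs 10x a false alarm.
profit = {
    "Y": {"Y": 0.0, "N": -10.0},  # actual Y: catching it costs 0, missing it costs 10
    "N": {"Y": -1.0, "N": 0.0},   # actual N: a false alarm costs 1
}
for p in (0.05, 0.08, 0.10, 0.30):
    decision, _ = best_decision(p, profit)
    print(p, decision)
```

With these payoffs, predicting "Y" wins whenever 10p > 1 - p, i.e. p > 1/11 (about 0.091), so rows are flagged as "Y" at a much lower predicted probability than the default 0.5 cutoff.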


Hope that helps.

Chris


Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
cwillden
Super User (Alumni)

Re: Validation and weighting

@Chris_Kirchberg, that is really cool and is exactly the kind of thing I had in mind. Learned something new today.

-- Cameron Willden
tallman
Level I

Re: Validation and weighting

Great info, I will incorporate that. Now that I've run various validation proportions, both weighted and unweighted, I do get slightly varying results, as would be expected. Which statistics would you suggest paying the most attention to when choosing the best results from the modeling?

cwillden
Super User (Alumni)

Re: Validation and weighting

Don't try to optimize too much or you'll overfit your validation set.  Just pick a setting that results in relative agreement between the training and validation set and go with it.

-- Cameron Willden
tallman
Level I

Re: Validation and weighting

Thanks. And when you say relative agreement, how would you best gauge that? Similar R² values for both training and validation?


Tom

cwillden
Super User (Alumni)

Re: Validation and weighting

Sensitivity and specificity are probably better metrics.
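Sensitivity (the fraction of true "Y" rows predicted as "Y") and specificity (the fraction of true "N" rows predicted as "N") look at each class separately, so the 8,400 "N" rows cannot mask poor performance on the 100 "Y" rows. The sketch below computes both for a training and a validation set and prints them side by side. The confusion counts are invented purely for illustration (chosen to match a 70/30 stratified split), and `labels_from_counts` is a hypothetical helper.

```python
def sens_spec(actual, predicted, positive="Y"):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def labels_from_counts(tp, fn, tn, fp):
    """Build (actual, predicted) lists that reproduce a 2x2 confusion matrix."""
    actual = ["Y"] * (tp + fn) + ["N"] * (tn + fp)
    predicted = ["Y"] * tp + ["N"] * fn + ["N"] * tn + ["Y"] * fp
    return actual, predicted

# Made-up counts for a 70/30 split of 100 "Y" / 8,400 "N" rows.
sets = {
    "Training": labels_from_counts(tp=60, fn=10, tn=5600, fp=280),
    "Validation": labels_from_counts(tp=20, fn=10, tn=2340, fp=180),
}
for name, (actual, predicted) in sets.items():
    sens, spec = sens_spec(actual, predicted)
    print(f"{name:10s} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this made-up case specificity agrees fairly well (0.952 vs 0.929) while sensitivity drops from 0.857 to 0.667, the kind of gap that would suggest overfitting. Keep in mind that with only 30 "Y" rows in validation, each additional missed "Y" moves validation sensitivity by about 3.3 percentage points, so small differences are mostly noise.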
-- Cameron Willden