tallman
Community Trekker

Validation and weighting

I'm working with a file of approximately 8500 records, of which approximately 100 have a response variable value of Y and the rest a value of N.  I'm running a Partition model on the response variable Y and then adding in various X factors.  Given the relatively low number of "responses", I have two questions about using this model.

1) Should I weight the response variable using conditional formatting (adding a new column)?

2) What percentage should I use for the validation portion (20%, 30%...)?

cwillden
Super User

Re: Validation and weighting

For that total size, I would do a 70/30 or 60/40 split.  You could try both.  How many factors do you have?
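
With only about 100 "Y" records, a purely random split can leave the validation set with noticeably more or fewer "Y" rows than expected. Here is a minimal Python sketch of a stratified 70/30 split that samples each class separately. The function name `stratified_split` and the synthetic labels are assumptions for illustration; this is not how JMP builds its validation column.

```python
import random

def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Return (train_idx, valid_idx), sampling each class separately
    so both sets keep roughly the same Y/N proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_valid = round(len(indices) * validation_fraction)
        valid_idx.extend(indices[:n_valid])
        train_idx.extend(indices[n_valid:])
    return sorted(train_idx), sorted(valid_idx)

# Synthetic labels matching the sizes in the question: 100 "Y" and 8400 "N".
labels = ["Y"] * 100 + ["N"] * 8400
train_idx, valid_idx = stratified_split(labels, validation_fraction=0.3)
n_valid_y = sum(1 for i in valid_idx if labels[i] == "Y")
print(len(train_idx), len(valid_idx), n_valid_y)
```

Changing `validation_fraction` to 0.4 gives the 60/40 variant, so both splits can be compared on the same footing.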

-- Cameron Willden
tallman
Community Trekker

Re: Validation and weighting

Thanks Cameron,

I have 8 factors to start, but may narrow that down.  Would you suggest weighting as well?

cwillden
Super User

Re: Validation and weighting

I probably would, though I guess it depends a bit on what a good classifier is for your situation.  If you want to make sure there's a huge penalty for missing the "Y" responses, so that you do an adequate job of predicting those, weighting would be a good idea.  I'm not an expert in this area, but I would probably play around with different weights and observe the impact on the confusion matrix.
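
To make the weighting idea concrete, here is a small Python sketch. It computes one possible starting weight per class (inverse class frequency, which is an assumption, not a recommendation from this thread) and tallies a confusion matrix. The example classifier that always predicts "N" is hypothetical; it shows why overall accuracy alone is misleading with about 100 "Y" rows in 8500.

```python
from collections import Counter

def balancing_weights(labels):
    """Weight each class by (largest class count / this class count)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}

def confusion_matrix(actual, predicted, positive="Y"):
    """Count true/false positives and negatives for a binary response."""
    cm = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            cm["TP" if p == positive else "FN"] += 1
        else:
            cm["FP" if p == positive else "TN"] += 1
    return cm

# Synthetic labels matching the sizes in the question.
actual = ["Y"] * 100 + ["N"] * 8400
weights = balancing_weights(actual)

# A hypothetical classifier that never predicts "Y".
predicted = ["N"] * len(actual)
cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(weights, cm, round(accuracy, 4))
```

The always-"N" classifier is right on about 98.8% of rows yet misses every "Y", which is the failure a larger weight on "Y" is meant to penalize.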

-- Cameron Willden

Re: Validation and weighting

You might also want to use a profit matrix as a weighting scheme. Take a look at this link to learn a little more:

https://www.jmp.com/support/help/13-2/Specify_Profit_Matrix.shtml
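
As a rough illustration of the general idea behind a profit matrix (a sketch of the concept only, not JMP's implementation), the Python below picks the decision with the highest expected profit given a predicted probability of "Y". The function name and all profit values are hypothetical.

```python
def most_profitable_decision(prob_y, profit):
    """profit[actual][decision] is the payoff of making `decision`
    when the true response is `actual`. Returns the decision with the
    highest expected profit, plus the expected profit of each decision."""
    expected = {}
    for decision in ("Y", "N"):
        expected[decision] = (
            prob_y * profit["Y"][decision]
            + (1.0 - prob_y) * profit["N"][decision]
        )
    best = max(expected, key=expected.get)
    return best, expected

# Hypothetical payoffs: missing a true "Y" is far more costly
# than raising a false alarm on an "N".
profit = {
    "Y": {"Y": 10.0, "N": -50.0},  # actual Y: caught vs. missed
    "N": {"Y": -1.0, "N": 0.0},    # actual N: false alarm vs. correct
}

decision_a, _ = most_profitable_decision(0.05, profit)
decision_b, _ = most_profitable_decision(0.01, profit)
print(decision_a, decision_b)
```

With these made-up payoffs, a row with only a 5% predicted probability of "Y" is still classified as "Y", because the cost of a miss outweighs the cost of a false alarm.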

 

Hope that helps.

Chris

 

 

Chris Kirchberg
Principal Systems Engineer, Life Sciences - JMP Global Technical Enablement
SAS Institute, Inc. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
JMP – A Division of SAS Institute | www.jmp.com
cwillden
Super User

Re: Validation and weighting

@chris_kirchberg, That is really cool and is exactly the kind of thing I had in mind.  Learned something new today.

-- Cameron Willden
tallman
Community Trekker

Re: Validation and weighting

Great info, will incorporate that.  Now that I've run various validation proportions, both weighted and unweighted, I do get slightly varying results, as would be expected.  Which statistics would you suggest paying the most attention to when choosing the best results from the modeling?

cwillden
Super User

Re: Validation and weighting

Don't try to optimize too much or you'll overfit your validation set.  Just pick a setting that results in relative agreement between the training and validation sets and go with it.

-- Cameron Willden
tallman
Community Trekker

Re: Validation and weighting

Thanks.  And when you say relative agreement, how best would you gauge that? Similar R2 values for both training and validation?

 

Tom

cwillden
Super User

Re: Validation and weighting

Sensitivity and specificity are probably better metrics.
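
For reference, here is a minimal Python sketch of computing both metrics from confusion-matrix counts and comparing training against validation. The counts are made up for illustration (they assume a 70/30 split of 100 "Y" and 8400 "N" rows) and are not results from this thread.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN): share of actual Y rows predicted Y.
    Specificity = TN / (TN + FP): share of actual N rows predicted N."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical confusion-matrix counts for each set.
train = {"tp": 56, "fn": 14, "fp": 294, "tn": 5586}   # 70 Y, 5880 N
valid = {"tp": 21, "fn": 9, "fp": 151, "tn": 2369}    # 30 Y, 2520 N

train_sens, train_spec = sensitivity_specificity(**train)
valid_sens, valid_spec = sensitivity_specificity(**valid)

# Small gaps between the two sets suggest "relative agreement".
print(f"sensitivity: train={train_sens:.3f} valid={valid_sens:.3f}")
print(f"specificity: train={train_spec:.3f} valid={valid_spec:.3f}")
```

Keep in mind that with roughly 30 "Y" rows in validation, each one moves sensitivity by about 3 percentage points, so modest gaps are expected.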
-- Cameron Willden