tallman

Community Trekker

Joined:

Aug 12, 2014

Validation and weighting

I'm working with a file of approximately 8,500 records, of which approximately 100 have a response value of Y and the rest a value of N. I'm running a Partition model on this response and then adding in various X factors. Given the relatively low number of Y responses, I have two questions about using this model.

1) Should I weight the responses using a conditional formula (i.e., by adding a new weight column)?

2) What percentage should I hold out for validation (20%, 30%, ...)?

cwillden

Community Trekker

Joined:

May 1, 2017

Re: Validation and weighting

For a data set of that size, I would do a 70/30 or 60/40 training/validation split. You could try both. How many factors do you have?

-- Cameron Willden
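
With only about 100 Y rows, the holdout fraction directly controls how many Y responses the validation set sees. A minimal sketch of a stratified holdout, done in plain Python outside JMP and assuming you want Y and N rows split separately so both sets keep the same share of Y; `stratified_holdout` is a hypothetical helper, not a JMP function:

```python
import random

def stratified_holdout(labels, validation_fraction, seed=1):
    """Assign each row 'Training' or 'Validation', splitting each response
    level separately so the rare Y rows are divided in the same proportion."""
    rng = random.Random(seed)
    roles = [None] * len(labels)
    for level in sorted(set(labels)):  # sorted for a reproducible shuffle order
        rows = [i for i, y in enumerate(labels) if y == level]
        rng.shuffle(rows)
        n_valid = round(len(rows) * validation_fraction)
        for k, i in enumerate(rows):
            roles[i] = "Validation" if k < n_valid else "Training"
    return roles

# Illustrative labels shaped like the question: 100 Y out of 8500 rows.
labels = ["Y"] * 100 + ["N"] * 8400
for frac in (0.30, 0.40):
    roles = stratified_holdout(labels, frac)
    n_valid_y = sum(1 for y, r in zip(labels, roles) if y == "Y" and r == "Validation")
    print(f"{frac:.0%} holdout: {n_valid_y} Y rows in validation")
```

A 70/30 split leaves roughly 30 Y rows for validation and a 60/40 split roughly 40, which is worth keeping in mind when judging how stable the validation statistics can be.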
tallman

Community Trekker

Joined:

Aug 12, 2014

Re: Validation and weighting

Thanks Cameron,

I have 8 factors to start but may narrow that down. Would you suggest weighting as well?

cwillden

Community Trekker

Joined:

May 1, 2017

Re: Validation and weighting

I probably would, though it depends a bit on what makes a good classifier in your situation. If you want a large penalty for missing the Y responses, so that the model does an adequate job of predicting them, weighting would be a good idea. I'm not an expert in this area, but I would try a few different weights and observe their impact on the confusion matrix.

-- Cameron Willden
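
A minimal sketch of the two pieces described above, in plain Python rather than JMP: a weight column that up-weights Y rows, and a confusion matrix tally for comparing settings. The helper names are hypothetical, and weighting Y by the N:Y ratio is only one common starting point (an assumption), not a rule. In JMP the equivalent would be a formula column used as the weight in the Partition launch.

```python
from collections import Counter

def weight_column(labels, y_weight):
    """Hypothetical weight column: Y rows get y_weight, N rows get 1.0
    (like an If(response == "Y", w, 1) style formula column)."""
    return [y_weight if y == "Y" else 1.0 for y in labels]

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair; ('Y', 'N') is a missed Y response."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts[(a, p)] for a in ("Y", "N") for p in ("Y", "N")}

# Illustrative labels shaped like the question (100 Y, 8400 N).
labels = ["Y"] * 100 + ["N"] * 8400
# Assumed starting weight: the N:Y ratio, so both classes carry equal total weight.
weights = weight_column(labels, y_weight=8400 / 100)
total_y = sum(w for y, w in zip(labels, weights) if y == "Y")
total_n = sum(w for y, w in zip(labels, weights) if y == "N")
print(total_y, total_n)  # 8400.0 8400.0
```

Trying a few values of `y_weight` around that ratio and tallying the confusion matrix on the validation rows for each is one way to follow the suggestion above.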
chris_kirchberg

Joined:

May 28, 2014

Re: Validation and weighting

You might also want to include the profit matrix as a weighting scheme. Take a look at this link to learn a little more:

https://www.jmp.com/support/help/13-2/Specify_Profit_Matrix.shtml


Hope that helps.

Chris
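
As I understand the linked page, a profit matrix assigns a payoff to each (actual, decision) pair and the model's decision becomes the level with the highest expected profit. A minimal Python sketch of that decision rule, assuming a two-level Y/N response; `best_decision` and the payoff values are hypothetical and only illustrate the idea:

```python
def best_decision(prob_y, profit):
    """Return the decision ('Y' or 'N') with the highest expected profit.

    profit[(actual, decision)] is the payoff for making `decision` when the
    true response is `actual`."""
    probs = {"Y": prob_y, "N": 1.0 - prob_y}
    expected = {
        d: sum(probs[a] * profit[(a, d)] for a in ("Y", "N"))
        for d in ("Y", "N")
    }
    return max(expected, key=expected.get), expected

# Hypothetical profit matrix: a missed Y costs ten times as much as a false alarm.
profit = {
    ("Y", "Y"): 10.0,   # correctly flag a Y
    ("Y", "N"): -10.0,  # miss a Y
    ("N", "Y"): -1.0,   # false alarm
    ("N", "N"): 0.0,    # correctly pass an N
}
for p in (0.02, 0.10):
    decision, _ = best_decision(p, profit)
    print(p, decision)
```

With these made-up payoffs the expected-profit difference is 21p - 1, so a row is called Y once its predicted probability exceeds 1/21 (about 0.048) rather than 0.5. That is how a profit matrix acts like a weighting scheme in favor of the rare Y class.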


cwillden

Community Trekker

Joined:

May 1, 2017

Re: Validation and weighting

@chris_kirchberg, That is really cool and is exactly the kind of thing I had in mind.  Learned something new today.

-- Cameron Willden
tallman

Community Trekker

Joined:

Aug 12, 2014

Re: Validation and weighting

Great info, I will incorporate that. Now that I've run various validation proportions, both weighted and unweighted, I do get slightly varying results, as would be expected. Which statistics would you suggest paying the most attention to when choosing the best model?

cwillden

Community Trekker

Joined:

May 1, 2017

Re: Validation and weighting

Don't try to optimize too much, or you'll overfit your validation set. Just pick a setting that gives relative agreement between the training and validation sets and go with it.

-- Cameron Willden
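
One way to make "relative agreement" concrete is to compute the same metrics on training and validation rows and look at the gap. A minimal Python sketch; `agreement_report`, the 0.05 tolerance, and the metric values are all hypothetical choices for illustration, not results from this thread:

```python
def agreement_report(train_metrics, valid_metrics, tolerance=0.05):
    """For each metric, return (train, validation, absolute gap, gap within tolerance)."""
    report = {}
    for name, train_value in train_metrics.items():
        valid_value = valid_metrics[name]
        gap = abs(train_value - valid_value)
        report[name] = (train_value, valid_value, gap, gap <= tolerance)
    return report

# Made-up numbers for illustration only.
train = {"sensitivity": 0.80, "specificity": 0.90}
valid = {"sensitivity": 0.60, "specificity": 0.89}
for name, (t, v, gap, ok) in agreement_report(train, valid).items():
    print(f"{name}: train={t:.2f} valid={v:.2f} gap={gap:.2f} agree={ok}")
```

A large training/validation gap on the Y-related metric, as in the made-up sensitivity figures here, would suggest that setting is overfitting; the point is to pick a reasonable setting where the gaps are small, not to chase the single best validation number.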
tallman

Community Trekker

Joined:

Aug 12, 2014

Re: Validation and weighting

Thanks. When you say relative agreement, how would you best gauge that? Similar R2 values for both training and validation?


Tom

cwillden

Community Trekker

Joined:

May 1, 2017

Re: Validation and weighting

Sensitivity and specificity are probably better metrics.

-- Cameron Willden
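
Sensitivity is the fraction of actual Y rows predicted Y, and specificity is the fraction of actual N rows predicted N. A minimal Python sketch (the helper name is hypothetical) of why they are more informative than a single overall figure with this imbalance: a model that predicts N for every row scores high overall accuracy while catching none of the Y responses. Accuracy is used here as the overall figure; this does not compute JMP's R2.

```python
def sensitivity_specificity(actual, predicted, positive="Y"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Illustrative labels shaped like the question, scored by an "always N" model.
actual = ["Y"] * 100 + ["N"] * 8400
always_n = ["N"] * len(actual)
accuracy = sum(a == p for a, p in zip(actual, always_n)) / len(actual)
sens, spec = sensitivity_specificity(actual, always_n)
print(f"accuracy={accuracy:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
# accuracy=0.988 sensitivity=0.00 specificity=1.00
```

Comparing sensitivity and specificity between the training and validation rows, for each weighting and holdout setting, gives a more direct read on how well the rare Y responses are being predicted.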