Validation and weighting
I'm working with a file of approximately 8,500 records, of which approximately 100 have a response variable value of Y and the rest a value of N. I'm running a Partition model on the response variable Y and then adding in various X, Factor columns. Given the relatively low number of "responses", I have two questions about using this model.
1) Should I weight the response variable using conditional formatting (adding a new column)?
2) What percentage should I use for the validation portion (20%, 30%...)?
Re: Validation and weighting
For that total size, I would do a 70/30 or 60/40 split. You could try both. How many factors do you have?
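As a rough illustration of what those splits mean for the rare class, here is a minimal Python sketch (not JMP code) that splits each class separately so the validation set keeps the same Y/N ratio. The function name `stratified_split` and the synthetic labels are assumptions made for the example; only the 100/8,400 counts come from the question.

```python
import random

def stratified_split(labels, validation_fraction, seed=0):
    """Return (train_idx, valid_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_valid = round(len(idx) * validation_fraction)
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)

# Synthetic labels matching the sizes in the question: 100 "Y" and 8400 "N".
labels = ["Y"] * 100 + ["N"] * 8400

for fraction in (0.3, 0.4):
    train_idx, valid_idx = stratified_split(labels, fraction)
    y_valid = sum(labels[i] == "Y" for i in valid_idx)
    print(f"{fraction:.0%} validation: {len(valid_idx)} rows, {y_valid} with Y")
```

With these counts, a 30% validation set holds only about 30 Y rows and a 40% set about 40, so validation statistics for the Y class will be noisy either way.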
Re: Validation and weighting
Thanks Cameron,
I have 8 factors to start but may narrow that down. Would you suggest weighting as well?
Re: Validation and weighting
I probably would, though I guess it depends a bit on what a good classifier is for your situation. If you want a huge penalty for missing the "Y" responses, so that the model does an adequate job of predicting those, weighting would be a good idea. I'm not an expert in this area, but I would probably play around with different weights and observe the impact on the confusion matrix.
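To make the effect of weights on the confusion matrix concrete, here is a small Python sketch (an illustration only, not how JMP computes anything). It uses inverse-frequency weights, which is one common choice and an assumption here, and scores a deliberately useless model that never predicts Y. The helper names are hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: every class ends up with equal total weight."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * k) for c, k in counts.items()}

def weighted_confusion(actual, predicted, weights):
    """Sum the row weights for every (actual, predicted) pair."""
    matrix = Counter()
    for a, p in zip(actual, predicted):
        matrix[(a, p)] += weights[a]
    return matrix

def accuracy(matrix):
    """Share of total weight that sits on the diagonal."""
    correct = sum(v for (a, p), v in matrix.items() if a == p)
    return correct / sum(matrix.values())

# Synthetic data with the sizes from the question.
actual = ["Y"] * 100 + ["N"] * 8400
predicted = ["N"] * 8500  # a trivial model that never predicts Y

weights = class_weights(actual)
unweighted = weighted_confusion(actual, predicted, {"Y": 1, "N": 1})
weighted = weighted_confusion(actual, predicted, weights)

print(weights)
print(accuracy(unweighted), accuracy(weighted))
```

Unweighted, the trivial model looks nearly 99% accurate because the N rows dominate. With the weights applied, its accuracy falls to about one half, which is why trying several weights and watching the confusion matrix is informative.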
Re: Validation and weighting
You might also want to include the profit matrix as a weighting scheme. Take a look at this link to learn a little more:
https://www.jmp.com/support/help/13-2/Specify_Profit_Matrix.shtml
Hope that helps.
Chris
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
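For readers unfamiliar with the idea, a profit matrix generally assigns a payoff to each combination of actual class and decision, and the decision with the highest expected profit is chosen. The Python sketch below illustrates that general rule only; it is not JMP's implementation, and the payoff values and the name `best_decision` are invented for the example.

```python
# Hypothetical payoffs, indexed as profit[actual][decision].
# Missing a true Y is assumed to be far more costly than a false alarm.
profit = {
    "Y": {"Y": 50.0, "N": -50.0},
    "N": {"Y": -1.0, "N": 0.0},
}

def best_decision(prob_y, profit):
    """Pick the decision with the highest expected profit for one row."""
    probs = {"Y": prob_y, "N": 1.0 - prob_y}
    expected = {
        decision: sum(probs[a] * profit[a][decision] for a in probs)
        for decision in ("Y", "N")
    }
    return max(expected, key=expected.get), expected

for p in (0.005, 0.05, 0.5):
    decision, expected = best_decision(p, profit)
    print(p, decision, expected)
```

With these made-up payoffs, a row is classified as Y once its predicted probability of Y exceeds roughly 1%, far below the usual 50% cutoff. That is how a profit matrix can favor catching rare Y responses without reweighting the rows.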
Re: Validation and weighting
@Chris_Kirchberg, That is really cool and is exactly the kind of thing I had in mind. Learned something new today.
Re: Validation and weighting
Great info, will incorporate that. Now that I've run various validation proportions, both weighted and unweighted, I get slightly varying results, as would be expected. Which statistics would you suggest paying the most attention to when choosing the best result from the modeling?
Re: Validation and weighting
Don't try to optimize too much or you'll overfit your validation set. Just pick a setting that results in relative agreement between the training and validation set and go with it.
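One possible way to look at agreement between the two sets is to compute the same handful of rates from each confusion matrix and compare them side by side. The Python sketch below assumes that approach; the counts are made up for illustration (sized like a 70/30 split of 100 Y and 8,400 N) and the function name `rates` is hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Basic rates from confusion-matrix counts, treating Y as the positive class."""
    total = tp + fn + fp + tn
    return {
        "misclassification": (fn + fp) / total,
        "recall_Y": tp / (tp + fn),
        "precision_Y": tp / (tp + fp),
    }

# Made-up counts, for illustration only.
train = rates(tp=50, fn=20, fp=100, tn=5780)
valid = rates(tp=15, fn=15, fp=60, tn=2460)

for name in train:
    gap = train[name] - valid[name]
    print(f"{name}: train={train[name]:.3f} valid={valid[name]:.3f} gap={gap:+.3f}")
```

A large gap on any rate, especially one tied to the rare Y class, suggests the model fits the training rows better than it generalizes. How large a gap is acceptable is a judgment call that the thread does not settle.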
Re: Validation and weighting
Thanks. And when you say relative agreement, how best would you gauge that? Similar R² values for both training and validation?
Tom