Validation and weighting

Apr 2, 2018 04:08 PM

I'm working with a file of approximately 8500 records, where approximately 100 have a response variable value of Y and the rest a value of N. I'm running a Partition model on the response variable Y and then adding in various X factors. Given the relatively low number of "responses", I have two questions when using this model.

1) Should I weight the response variable using conditional formatting (adding a new column)?

2) What percentage should I use for the validation portion (20%, 30%...)?

cwillden · Apr 2, 2018 05:00 PM

For that total size, I would do a 70/30 or 60/40 split. You could try both. How many factors do you have?

-- Cameron Willden
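
To make the split suggestion concrete, here is a minimal Python sketch of a 70/30 split that samples each class separately, so the rare Y records are not left out of the validation set by chance. It is not JMP's own validation mechanism. The function name `stratified_split`, the seed, and the exact counts (100 Y, 8400 N) are assumptions chosen to match the approximate numbers in the question.

```python
import random

def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Return (train_indices, validation_indices), sampling each class separately."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, validation = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Hold out the same fraction of every class.
        n_validation = round(len(shuffled) * validation_fraction)
        validation.extend(shuffled[:n_validation])
        train.extend(shuffled[n_validation:])
    return sorted(train), sorted(validation)

# Illustrative data only: 100 "Y" and 8400 "N" records.
labels = ["Y"] * 100 + ["N"] * 8400
train_idx, valid_idx = stratified_split(labels, validation_fraction=0.3)
n_y_valid = sum(1 for i in valid_idx if labels[i] == "Y")
print(len(train_idx), len(valid_idx), n_y_valid)
```

Changing `validation_fraction` to 0.4 gives the 60/40 variant, so both splits can be compared on the same data.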

tallman · Apr 2, 2018 05:12 PM

Thanks Cameron,

I have 8 factors to start but may narrow that down. Would you suggest weighting as well?

cwillden · Apr 2, 2018 05:28 PM

I probably would, though I guess it depends a bit on what a good classifier is for your situation. If you want a huge penalty for missing the "Y" responses, so that the model does an adequate job of predicting them, weighting would be a good idea. I'm not an expert in this area, but I would probably play around with different weights and observe the impact on the confusion matrix.

-- Cameron Willden
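
One common starting point for such a weight column is inverse-frequency weighting, where each class ends up with the same total weight. The Python sketch below only illustrates that arithmetic; the helper name `class_weights` and the exact counts are assumptions, and other weights are worth trying, as suggested above.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: every class gets the same total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {label: total / (n_classes * count) for label, count in counts.items()}

# Illustrative data only: 100 "Y" and 8400 "N" records.
labels = ["Y"] * 100 + ["N"] * 8400
weights = class_weights(labels)

# The new column: one weight per record, looked up from its response value.
weight_column = [weights[label] for label in labels]

total_y = sum(w for w, label in zip(weight_column, labels) if label == "Y")
total_n = sum(w for w, label in zip(weight_column, labels) if label == "N")
print(weights["Y"], weights["N"], total_y, total_n)
```

With these counts each Y record carries a weight of 42.5 and each N record about 0.51, so a missed Y costs roughly 84 times as much as a misclassified N.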

Chris_Kirchberg · Apr 2, 2018 06:07 PM

You might also want to include the profit matrix as a weighting scheme. Take a look at this link to learn a little more:

https://www.jmp.com/support/help/13-2/Specify_Profit_Matrix.shtml

Hope that helps.

Chris

Chris Kirchberg, M.S.²
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
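
For readers unfamiliar with the idea, a profit matrix assigns a value to each combination of actual and predicted class, and the prediction is the class with the highest expected profit. The Python sketch below illustrates that decision rule in general terms; it is not JMP's implementation, and the profit values and the name `best_decision` are hypothetical.

```python
# Hypothetical profit matrix: PROFIT[actual][decision].
# Missing a true "Y" is penalized far more than a false alarm on an "N".
PROFIT = {
    "Y": {"Y": 20.0, "N": -20.0},
    "N": {"Y": -1.0, "N": 0.0},
}

def expected_profit(prob_y, decision, profit=PROFIT):
    """Expected profit of a decision given the model's probability of "Y"."""
    return prob_y * profit["Y"][decision] + (1.0 - prob_y) * profit["N"][decision]

def best_decision(prob_y, profit=PROFIT):
    """Pick the decision with the highest expected profit."""
    return max(("Y", "N"), key=lambda decision: expected_profit(prob_y, decision, profit))

# A record with only a 10% chance of "Y" is still flagged as "Y" here,
# because a missed "Y" is so costly; a 1% chance is not.
print(best_decision(0.10), best_decision(0.01))
```

Compared with a plain 50% cutoff, the matrix effectively lowers the probability at which a record is classified as Y, which is why it can serve as an alternative to reweighting the rows.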

cwillden · Apr 2, 2018 06:34 PM

@Chris_Kirchberg, That is really cool and is exactly the kind of thing I had in mind. Learned something new today.

-- Cameron Willden

tallman · Apr 3, 2018 09:26 AM

Great info, will incorporate that. Now that I've run various validation proportions, both weighted and unweighted, I do get slightly varying results, as would be expected. What statistical factors would you suggest paying the most attention to when choosing the best results from the modeling?

cwillden · Apr 3, 2018 12:57 PM

Don't try to optimize too much or you'll overfit to your validation set. Just pick a setting that results in relative agreement between the training and validation sets and go with it.

-- Cameron Willden

tallman · Apr 3, 2018 02:51 PM

Thanks. And when you say relative agreement, how best would you gauge that? Similar R² values for both training and validation?

Tom

cwillden · Apr 3, 2018 05:12 PM

Sensitivity and specificity are probably better metrics.

-- Cameron Willden
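
As a rough illustration of checking agreement this way, the Python sketch below computes sensitivity (share of actual Y records predicted as Y) and specificity (share of actual N records predicted as N) for a training set and a validation set. The predictions are made-up counts, not results from this thread, and the helper name `sensitivity_specificity` is an assumption.

```python
def sensitivity_specificity(actual, predicted, positive="Y"):
    """Return (sensitivity, specificity) from paired actual/predicted labels."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical training results: 70 actual "Y" (56 caught), 5880 actual "N" (5700 correct).
train_actual = ["Y"] * 70 + ["N"] * 5880
train_pred = ["Y"] * 56 + ["N"] * 14 + ["N"] * 5700 + ["Y"] * 180

# Hypothetical validation results: 30 actual "Y" (23 caught), 2520 actual "N" (2440 correct).
valid_actual = ["Y"] * 30 + ["N"] * 2520
valid_pred = ["Y"] * 23 + ["N"] * 7 + ["N"] * 2440 + ["Y"] * 80

train_sens, train_spec = sensitivity_specificity(train_actual, train_pred)
valid_sens, valid_spec = sensitivity_specificity(valid_actual, valid_pred)
print(f"training:   sensitivity={train_sens:.3f} specificity={train_spec:.3f}")
print(f"validation: sensitivity={valid_sens:.3f} specificity={valid_spec:.3f}")
```

A large drop in sensitivity from training to validation would suggest the model is fitting noise in the few Y records. With only about 30 Y records in validation, each one shifts sensitivity by roughly 3 points, so small gaps should not be over-interpreted.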
