Predictive Modeling with asymmetric cost / imbalanced data set

Feb 27, 2017 2:05 AM
(5358 views)

Hello,

How can I instruct the predictive modeling platforms in JMP Pro (Partition, random forests, etc.) to assign a higher cost to misclassifying the minority class? I have a highly imbalanced dataset with a high cost for false negatives (readmission of acute patients). When I use the classifiers, the AUC is low and all positive cases are misclassified.

Please advise on the tools available to address this issue.

Thanks

Accepted Solutions

As has been pointed out, improving the AUC and incorporating asymmetric costs/benefits are two different things. The AUC has to do with the ability of the model to classify correctly, while the profit matrix steers the classification errors towards where they do the least damage. If the AUC measure is not bad but the classifications are making costly mistakes (like failing to classify any of the 1 values), then you should try a different cutoff probability for the classification - that is what the profit matrix will do automatically, but you can also change the cutoff probabilities by hand to alter the misclassifications. If you still don't like what you can get, then you want a better model - one that will produce a higher AUC. Nothing about the misclassification costs will raise the AUC - only a better model can raise the AUC. So, if you need a better model, then try other techniques, transforming your variables, changing the model settings, etc. but don't use the profit matrix to try to improve the model itself.
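
To make the distinction concrete, here is a minimal sketch in plain Python (not JSL, and not how JMP computes anything internally). The scores are made-up numbers standing in for a model's predicted probabilities on an imbalanced data set, and the function names are my own. It computes the AUC once from the ranking of the scores, then shows how the confusion counts move as the cutoff is lowered.

```python
# Hypothetical (true label, predicted probability of class 1) pairs.
# The values are invented for illustration; class 1 is the rare class.
data = [
    (0, 0.02), (0, 0.05), (0, 0.08), (0, 0.10), (0, 0.12),
    (0, 0.15), (0, 0.20), (0, 0.22), (0, 0.30), (0, 0.35),
    (1, 0.18), (1, 0.28), (1, 0.40),
]

def auc(rows):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in rows if y == 1]
    neg = [p for y, p in rows if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def confusion(rows, cutoff):
    """Return (TP, FP, FN, TN) when predicting class 1 for probability >= cutoff."""
    tp = sum(1 for y, p in rows if y == 1 and p >= cutoff)
    fp = sum(1 for y, p in rows if y == 0 and p >= cutoff)
    fn = sum(1 for y, p in rows if y == 1 and p < cutoff)
    tn = sum(1 for y, p in rows if y == 0 and p < cutoff)
    return tp, fp, fn, tn

# The AUC depends only on how the scores are ranked, so no cutoff appears here.
model_auc = auc(data)
print(f"AUC = {model_auc:.3f}")

# The confusion counts, by contrast, change with every cutoff.
for cutoff in (0.5, 0.25, 0.15):
    tp, fp, fn, tn = confusion(data, cutoff)
    print(f"cutoff {cutoff:.2f}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"sensitivity={tp / (tp + fn):.2f}")
```

With these numbers, the default 0.5 cutoff misses every positive case even though the ranking itself is reasonable, which is the situation described in the question. Lowering the cutoff trades false negatives for false positives without touching the AUC.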

Re: Predictive Modeling with asymmetric cost / imbalanced data set

Take a look at the profit matrix (and links off this).

I'm not sure I understand how it works. My understanding is that the profit matrix appears after I run the model. I want to be able to specify a higher cost for the minority class and make sure the model takes that into account when building the prediction.

Can you elaborate on the procedure to achieve that?

Thanks

Any help on this please?

The profit matrix can be defined as a column property before building any predictive model, and it should do what you are asking for. However, I would note that I have had difficulty filling in the profit matrix - the example in the JMP help guide is clear but often does not match a real situation. If your decision problem is such that you can specify exactly what the costs of false positives and false negatives are, then the profit matrix should work. For example, the JMP example is for an agent making airline reservations, where a wrong prediction carries a known additional cost to the agent. On the other hand, if you are predicting something like customer retention, then the costs of a false prediction are not so clear. If you wrongly predict that a customer will be lost, then you will presumably incur extra costs trying to keep the customer. For some customers this can turn them from lost to retained, but it will certainly not be 100% effective. It becomes complicated to figure out what to put in the profit matrix in such cases - and I would recommend not using that feature. Better to look at alternative cutoffs and confusion matrices.
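
As a rough illustration of looking at alternative cutoffs and confusion matrices, here is a small plain-Python sketch. It is not JMP output; the scored rows, the function names, and the 0.66 sensitivity target are all invented for the example. It tabulates the confusion counts at a few cutoffs and then finds the highest cutoff that still reaches a chosen sensitivity, so no costs need to be specified.

```python
# Made-up (true label, predicted probability) pairs; 1 = the rare class of interest.
scored = [
    (0, 0.03), (0, 0.06), (0, 0.09), (0, 0.11), (0, 0.14), (0, 0.19),
    (0, 0.24), (0, 0.31), (1, 0.16), (1, 0.27), (1, 0.45),
]

def confusion_at(rows, cutoff):
    """Confusion counts for the rule 'predict 1 when probability >= cutoff'."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in rows:
        predicted = 1 if p >= cutoff else 0
        if y == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts

def highest_cutoff_for_sensitivity(rows, target):
    """Largest observed score that, used as the cutoff, reaches the target sensitivity."""
    for cutoff in sorted({p for _, p in rows}, reverse=True):
        c = confusion_at(rows, cutoff)
        if c["TP"] / (c["TP"] + c["FN"]) >= target:
            return cutoff
    return None  # target not reachable (cannot happen for target <= 1 with positives present)

# Tabulate a few candidate cutoffs side by side.
for cutoff in (0.5, 0.3, 0.2, 0.1):
    c = confusion_at(scored, cutoff)
    sens = c["TP"] / (c["TP"] + c["FN"])
    spec = c["TN"] / (c["TN"] + c["FP"])
    print(f"cutoff {cutoff:.1f}: {c}  sensitivity={sens:.2f}  specificity={spec:.2f}")

# Hypothetical requirement: catch at least about two thirds of the positive cases.
chosen = highest_cutoff_for_sensitivity(scored, target=0.66)
print("highest cutoff with sensitivity >= 0.66:", chosen)
```

Reading the table this way lets you judge the trade-off directly, which can be easier than committing to dollar values you are not sure about.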

Thanks. I tried it but the AUC of my model is still the same under different methods. It did not force the model to assign a higher importance to the minority class.

Using the profit matrix will not change the AUC. It only changes the chosen probability cutoff - it chooses this according to the asymmetric profits/costs you specify. Essentially it is choosing a point on the ROC curve according to the profit matrix (I'm not sure, but I think it is an optimal point where the relative cost of the errors is equal to the slope of the ROC curve). When you save the predictions, there should be additional columns showing the profitability associated with the optimal classification - and that classification should differ from the default classification (assuming the costs are asymmetric).
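
Here is a sketch of the textbook expected-profit rule for a two-class problem, to show how a set of profits and costs translates into a cutoff. I am assuming this standard rule for illustration; I have not verified that it matches JMP's internal calculation, and the profit values and function names below are hypothetical. The model's predicted probabilities are taken as given, so nothing here changes the ranking of the cases.

```python
def expected_profit_cutoff(profit):
    """profit[actual][decision] for actual, decision in {0, 1}.

    Returns the probability of class 1 above which deciding 1 has the higher
    expected profit. Assumes correct decisions are worth more than wrong ones,
    so both gains below are positive.
    """
    gain_if_1 = profit[1][1] - profit[1][0]  # benefit of deciding 1 when the truth is 1
    gain_if_0 = profit[0][0] - profit[0][1]  # benefit of deciding 0 when the truth is 0
    return gain_if_0 / (gain_if_1 + gain_if_0)

def most_profitable_decision(p, profit):
    """Pick the decision with the larger expected profit, given P(class 1) = p."""
    ev0 = (1 - p) * profit[0][0] + p * profit[1][0]
    ev1 = (1 - p) * profit[0][1] + p * profit[1][1]
    return 1 if ev1 > ev0 else 0

# Hypothetical values: a missed positive costs 10, a false alarm costs 1.
profit = {0: {0: 0.0, 1: -1.0}, 1: {0: -10.0, 1: 0.0}}
symmetric = {0: {0: 0.0, 1: -1.0}, 1: {0: -1.0, 1: 0.0}}

cutoff = expected_profit_cutoff(profit)
print(f"implied cutoff with 10:1 costs = {cutoff:.4f}")
print(f"implied cutoff with 1:1 costs  = {expected_profit_cutoff(symmetric):.4f}")

for p in (0.05, 0.20, 0.60):
    print(f"p = {p:.2f} -> decide {most_profitable_decision(p, profit)}")
```

With symmetric costs the implied cutoff is the usual 0.5; with a 10:1 penalty on missed positives it drops to 1/11, so many more cases get classified as 1 even though the underlying probabilities, and therefore the AUC, are unchanged.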

How, then, can it help me increase the sensitivity of my model, and the AUC? What other ways are there to ensure the AUC takes the cost into account when classifying the outcomes? Thanks

Sounds like you are now really just trying to get an improved AUC. Up till now you've only talked about tree-based models. Have you tried neural network techniques?

I tried all of the models, but because of the imbalanced data set all zero cases are classified correctly and all one cases are misclassified. The cost bias should logically be able to optimize the AUC, but I don't understand why it is not affecting it. That's what is puzzling me.
