Solved: Predictive Modeling with asymmetric cost / imbalanced data set

Nayimoni · Feb 27, 2017 05:05 AM

Hello,

How can I instruct the predictive modeling platforms in JMP Pro (Partition, random forests, etc.) to assign a higher cost to misclassifying the minority class? My dataset is highly imbalanced, with a high cost for false negatives (readmission of acute patients). When I use the classifiers, the AUC is low and all positive cases are misclassified.

Please advise on the tools available to address this issue.

Thanks

dale_lehman · Mar 1, 2017 09:13 AM

As has been pointed out, improving the AUC and incorporating asymmetric costs/benefits are two different things. The AUC has to do with the ability of the model to separate the classes, while the profit matrix steers the classification errors towards where they do the least damage. If the AUC measure is not bad but the classifications are making costly mistakes (like failing to classify any of the 1 values correctly), then you should try a different cutoff probability for the classification - that is what the profit matrix will do automatically, but you can also change the cutoff probabilities by hand to alter the misclassifications. If you still don't like what you can get, then you want a better model - one that will produce a higher AUC. Nothing about the misclassification costs will raise the AUC - only a better model can raise the AUC. So, if you need a better model, then try other techniques, transforming your variables, changing the model settings, etc., but don't use the profit matrix to try to improve the model itself.
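To make the distinction concrete, here is a minimal Python sketch (not JMP/JSL) on synthetic data. The data, the score distribution, and the helper names `auc` and `confusion` are all made up for illustration. It holds one set of predicted probabilities fixed and moves only the cutoff: sensitivity changes, while the AUC, which depends only on how the scores rank the cases, stays the same.

```python
import random

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(labels, scores, cutoff):
    """Return (tp, fn, fp, tn) when scores >= cutoff are classified as 1."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    return tp, fn, fp, tn

# Synthetic, imbalanced data (about 5% positives). The scores stand in for
# predicted probabilities and are capped below 0.5, so the default cutoff
# of 0.5 never predicts a 1, as in the situation described in this thread.
rng = random.Random(0)
labels = [1 if rng.random() < 0.05 else 0 for _ in range(2000)]
scores = [min(0.45, max(0.0, rng.gauss(0.20 if y else 0.05, 0.07))) for y in labels]

# The AUC is computed once: it does not take a cutoff as input at all.
model_auc = auc(labels, scores)

for cutoff in (0.5, 0.2, 0.1):
    tp, fn, fp, tn = confusion(labels, scores, cutoff)
    sensitivity = tp / (tp + fn)
    print(f"cutoff={cutoff:.2f} TP={tp} FN={fn} FP={fp} TN={tn} "
          f"sensitivity={sensitivity:.2f} AUC={model_auc:.3f}")
```

Lowering the cutoff trades false negatives for false positives along the same ROC curve; a different curve requires a different model.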


ian_jmp · Feb 27, 2017 09:29 AM

Take a look at the profit matrix (and links off this).

Nayimoni · Feb 27, 2017 09:37 AM

I'm not sure I understand how it works. My understanding is that the profit matrix appears after I run the model. I want to be able to specify a higher cost for the minority class and make sure the model takes that into account when building the prediction.

Can you elaborate on the procedure to achieve that?

Thanks

Nayimoni · Feb 28, 2017 01:13 AM

Any help on this please?

dale_lehman · Feb 28, 2017 07:22 AM

The profit matrix can be defined as a column property prior to building any predictive model and should do what you are asking for. However, I would note that I have had difficulty filling in the profit matrix - the example in the JMP help guide is clear but often does not match a real situation. If your decision problem is such that you can specify exactly what the costs of false positives and false negatives are, then the profit matrix should work. For example, the JMP example is for an agent making airline reservations, where a wrong prediction carries a known additional cost to the agent. On the other hand, if you are predicting something like customer retention, then the costs of a false prediction are not so clear. If you wrongly predict a customer will be lost, then you presumably will incur extra costs to try to keep the customer. For some customers this can turn them from lost to retained, but certainly this will not be 100% effective. It becomes complicated to figure out what to put in the profit matrix in such cases - and I would recommend not using that feature. Better to look at alternative cutoffs and confusion matrices.

Nayimoni · Feb 28, 2017 11:48 AM

Thanks. I tried it but the AUC of my model is still the same under different methods. It did not force the model to assign a higher importance to the minority class.

dale_lehman · Feb 28, 2017 11:53 AM

Using the profit matrix will not change the AUC. It only changes the chosen probability cutoff - it chooses this according to the asymmetric profits/costs you specify. Essentially it is choosing a point on the ROC curve (I'm not sure, but I think it is an optimal point where the relative cost of errors is equal to the slope of the ROC curve) according to the profit matrix. When you save the predictions, there should be additional columns showing the profitability associated with the optimal classification - and that classification should differ from the default classification (assuming the costs are asymmetric).
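As a rough illustration of how a profit matrix turns into a cutoff, here is a Python sketch under the assumption that the rule is "predict the class with the highest expected profit". This is not taken from JMP's documentation. The `profit[actual][predicted]` layout, the function names, and the cost figures are hypothetical, and the formula assumes a correct decision always pays more than a wrong one.

```python
def profit_cutoff(profit):
    """Probability cutoff above which predicting 1 has the higher expected profit.

    profit[actual][predicted] is the payoff of each outcome (hypothetical layout).
    Assumes correct decisions pay more than wrong ones, so both gains are positive.
    """
    gain_if_neg = profit[0][0] - profit[0][1]  # what a false positive forfeits
    gain_if_pos = profit[1][1] - profit[1][0]  # what a false negative forfeits
    return gain_if_neg / (gain_if_neg + gain_if_pos)

def decide(p, profit):
    """Pick the class with the larger expected profit, given P(actual = 1) = p."""
    expected = [
        (1 - p) * profit[0][k] + p * profit[1][k]  # expected profit of predicting k
        for k in (0, 1)
    ]
    return 1 if expected[1] >= expected[0] else 0

# Made-up numbers: both errors cost 1, versus a miss costing 10x a false alarm.
symmetric = [[0, -1], [-1, 0]]
asymmetric = [[0, -1], [-10, 0]]

for name, profit in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(name, "cutoff:", round(profit_cutoff(profit), 3),
          "decision at p=0.2:", decide(0.2, profit))
```

With these numbers the cutoff moves from 0.5 to 1/11 (about 0.09), so a case with a predicted probability of 0.2 flips from class 0 to class 1 while the predicted probabilities, and therefore the AUC, are untouched. For well-calibrated probabilities, the same rule corresponds to the point on the ROC curve whose slope equals the ratio of false-positive to false-negative cost multiplied by the ratio of negatives to positives.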

Nayimoni · Mar 1, 2017 07:19 AM

How, then, can it help me increase the sensitivity of my model, and the AUC? What are the other ways to ensure the AUC takes the cost into account when classifying the outcomes? Thanks

Peter_Bartell · Mar 1, 2017 07:46 AM

Sounds like now you are really just trying to get an improved AUC. Up till now you've only talked about tree-based models. Have you tried neural network techniques?

Nayimoni · Mar 1, 2017 07:56 AM

I tried all of the models, but because of this imbalanced data set, all zero cases are classified correctly and all one cases are misclassified. The cost bias should logically be able to optimize the AUC, but I don't understand why it is not impacting it. That's what is puzzling me.
