How can I instruct the predictive modeling platforms in JMP Pro (Partition, random forests, etc.) to assign a higher cost to misclassifying the minority class? I have a highly imbalanced dataset with a high cost for false negatives (readmission rate of acute patients). When I use the classifiers, the AUC is low and all positive cases are misclassified.
Please suggest any tools available to address this issue.
As has been pointed out, improving the AUC and incorporating asymmetric costs/benefits are two different things. The AUC has to do with the ability of the model to classify correctly, while the profit matrix steers the classification errors towards where they do the least damage. If the AUC measure is not bad but the classifications are making costly mistakes (like failing to classify any of the 1 values), then you should try a different cutoff probability for the classification - that is what the profit matrix will do automatically, but you can also change the cutoff probabilities by hand to alter the misclassifications. If you still don't like what you can get, then you want a better model - one that will produce a higher AUC. Nothing about the misclassification costs will raise the AUC - only a better model can raise the AUC. So, if you need a better model, then try other techniques, transforming your variables, changing the model settings, etc. but don't use the profit matrix to try to improve the model itself.
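To make the distinction concrete, here is a minimal sketch in plain Python (not JMP output). The labels and predicted probabilities are made-up values chosen only for illustration, and the helper functions `auc` and `confusion` are hypothetical names, not part of any JMP platform. It shows that moving the cutoff changes the confusion matrix while the AUC, which depends only on how the model ranks the cases, stays the same.

```python
# Hypothetical data: 1 = minority (positive) class, 0 = majority class.
# These probabilities are invented for illustration, not produced by JMP.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.28, 0.40]


def auc(labels, probs):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels, probs, cutoff):
    """Return (TP, FN, FP, TN) when predicting 1 for prob >= cutoff."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p < cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p >= cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p < cutoff)
    return tp, fn, fp, tn


# The AUC is computed from the probabilities alone; no cutoff is involved.
model_auc = auc(labels, probs)

# The same model, classified at two different cutoffs.
at_default = confusion(labels, probs, 0.5)
at_lower = confusion(labels, probs, 0.25)

print("AUC:", model_auc)
print("cutoff 0.50 (TP, FN, FP, TN):", at_default)
print("cutoff 0.25 (TP, FN, FP, TN):", at_lower)
```

With these invented numbers, the 0.5 cutoff misses every positive case, while the lower cutoff catches both at the price of two false positives. The AUC is identical in both situations because the function that computes it never sees the cutoff.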
I'm not sure I understand how it works. My understanding is that the profit matrix appears after I run the model. I want to be able to specify a higher cost for the minority class and make sure the model takes that into account when building the prediction.
Can you elaborate on the procedure to achieve that?
The profit matrix can be defined as a column property prior to building any predictive model and should do what you are asking for. However, I would note that I have had difficulty filling in the profit matrix - the example in the JMP help guide is clear but often does not match a real situation. If your decision problem is such that you can specify exactly what the costs of false positives and false negatives are, then the profit matrix should work. For example, the JMP example is for an agent making airline reservations, where a wrong prediction carries a known additional cost to the agent. On the other hand, if you are predicting something like customer retention, then the costs of a false prediction are not so clear. If you wrongly predict that a customer will be lost, then you will presumably incur extra costs trying to keep the customer. For some customers this can turn them from lost to retained, but it certainly will not be 100% effective. It becomes complicated to figure out what to put in the profit matrix in such cases - and I would recommend not using that feature. Better to look at alternative cutoffs and confusion matrices.
Thanks. I tried it but the AUC of my model is still the same under different methods. It did not force the model to assign a higher importance to the minority class.
Using the profit matrix will not change the AUC. It only changes the chosen probability cutoff - it chooses this according to the asymmetric profits/costs you specify. Essentially it is choosing a point on the ROC curve according to the profit matrix (I'm not sure, but I think it is an optimal point where the relative cost of the errors is equal to the slope of the ROC curve). When you save the predictions, there should be additional columns showing the profitability associated with the optimal classification - and that classification should differ from the default classification (assuming the costs are asymmetric).
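As a rough illustration of how a profit matrix turns into a cutoff, here is a minimal Python sketch. It assumes the decision rule is simply "pick the class with the higher expected profit"; the profit values and the function names (`expected_profit`, `best_decision`, `implied_cutoff`) are hypothetical and are not taken from JMP or from the data in this thread.

```python
# Hypothetical profit matrix, indexed as profit[actual][decision].
# Assumed values: a missed positive (false negative) costs 10,
# a false alarm (false positive) costs 1, correct calls cost nothing.
profit = {
    0: {0: 0.0, 1: -1.0},
    1: {0: -10.0, 1: 0.0},
}


def expected_profit(p, decision):
    """Expected profit of a decision when P(actual = 1) is p."""
    return p * profit[1][decision] + (1 - p) * profit[0][decision]


def best_decision(p):
    """Choose the class with the higher expected profit."""
    return 1 if expected_profit(p, 1) > expected_profit(p, 0) else 0


def implied_cutoff():
    """Probability at which both decisions have equal expected profit."""
    gain_if_negative = profit[0][0] - profit[0][1]  # benefit of saying 0 when actual is 0
    gain_if_positive = profit[1][1] - profit[1][0]  # benefit of saying 1 when actual is 1
    return gain_if_negative / (gain_if_negative + gain_if_positive)


cutoff = implied_cutoff()
decisions = {p: best_decision(p) for p in (0.05, 0.20, 0.60)}

print("implied cutoff:", cutoff)
print("decisions by probability:", decisions)
```

Under these assumed costs the break-even probability works out to 1/11, so a case with a predicted probability of 0.20 is classified as positive even though it is far below 0.5. The predicted probabilities themselves are untouched, which is why the AUC does not move.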
How, then, can it help me increase the sensitivity of my model? And the AUC? What other ways are there to ensure the AUC takes the cost into account when classifying the outcomes? Thanks.
Sounds like now you are really just trying to get an improved AUC. Up till now you've only talked about tree-based models. Have you tried neural network techniques?
I tried all of the models, but because the dataset is imbalanced, all zero cases are classified correctly and all one cases are misclassified. Logically, the cost bias should be able to optimize the AUC, but I don't understand why it is not affecting it. That's what is puzzling me.