JMPer Cable

KristenBradford · Apr 10, 2023 10:00 AM

Have you ever been tasked with predicting who will cancel their services with your company? Or which patients will discontinue their medication? Or perhaps which individuals will be the early adopters of your new product? You might answer these business questions with the same method, carefully preparing data and building different models to compare and choose the best based on specific statistical criteria. Piece of cake, right? Maybe.

But what if you need to pinpoint the main contributing factor to a prediction for a single person? Not every modeling algorithm answers this easily. Traditional regression models handle this kind of question well because the prediction is an equation with an estimated coefficient for each term. What about tree-based algorithms, or even more complicated models like neural networks? With these, an individual prediction is much harder to explain. And even though one of these more complex algorithms often predicts the outcome more reliably, stakeholders may scrap your hard work if you cannot explain it.

What if you could get the best of both worlds? The best performing model AND a way to easily explain the individual predicted probabilities to your stakeholders so that your work has a better chance to be used as a business tool. Meet Shapley Values—a new feature in JMP 17.

Shapley Values provide the solution we have all been looking for in this kind of situation. These values quantify the contribution of each factor in your model down to the row level. Using the average prediction as a baseline, the prediction for each row of data can be broken down into a sum of contributions, one per factor. This means you can say that Sean has a 79% chance of discontinuing his medication, and compared to the average patient, the primary reason for his probable discontinuation is his age.
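To make the decomposition concrete, here is a minimal Python sketch of the general Shapley calculation. It is not JMP's implementation. The scoring rule `predict`, the factor names (`age`, `refills`, `copay`), and the four background rows are all made up for illustration. The sketch enumerates every subset of factors, which is only practical for a handful of them.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, row, background):
    """Exact Shapley values for one row by enumerating every factor subset."""
    names = list(row)
    n = len(names)

    def value(subset):
        # Average prediction when the factors in `subset` take the row's
        # values and every other factor takes a background row's value.
        total = 0.0
        for bg in background:
            mixed = {k: (row[k] if k in subset else bg[k]) for k in names}
            total += predict(mixed)
        return total / len(background)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contribution = 0.0
        for size in range(n):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {name}) - value(s))
        phi[name] = contribution
    # value(set()) is the average prediction, used as the baseline.
    return value(set()), phi


def predict(p):
    # Hypothetical scoring rule (assumption), standing in for a fitted model.
    score = 0.10
    if p["age"] > 60:
        score += 0.40
    if p["refills"] < 2:
        score += 0.20
    if p["age"] > 60 and p["copay"] > 30:
        score += 0.15
    return score


# Made-up background data and patient, for illustration only.
background = [
    {"age": 45, "refills": 3, "copay": 10},
    {"age": 70, "refills": 1, "copay": 40},
    {"age": 30, "refills": 4, "copay": 50},
    {"age": 65, "refills": 2, "copay": 20},
]
patient = {"age": 72, "refills": 1, "copay": 35}

baseline, phi = shapley_values(predict, patient, background)
top_factor = max(phi, key=lambda k: abs(phi[k]))
print(baseline, phi, top_factor)
```

With these made-up numbers, the baseline is 0.3875 and the patient's prediction is 0.85. The three contributions (about 0.256 for age, 0.150 for refills, and 0.056 for copay) add up to the difference between the two, so age is the primary driver for this patient.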

Why is this such a big deal? Without Shapley Values, we could speak more generally about the driving factors for an outcome in the population of interest, but comparing across individuals with similar characteristics and understanding why the prediction differs between them is not so straightforward. In tree-based algorithms, the prediction formula is created based on individual leaves or trees. And let’s be honest, they can get quite messy. Shapley Values solve this issue by breaking down the prediction into individual contributions for each factor, looking across the entire model instead of leaf by leaf, tree by tree, or node by node.

[Image: Combined Infographic.png]

How can you get started exploring this new feature? It’s simple. Add the Profiler to your model results and select the option for Save Shapley Values. This will save individual columns to your data table for each factor/response level combination.

[Image: 3 Final.png]
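Once the Shapley columns are saved, finding the primary driver for each row is a matter of comparing those columns. The Python sketch below assumes the table has been exported and loaded as a list of dictionaries. The `shap_` column prefix, the patient labels, and all the numbers are hypothetical and do not reflect JMP's actual column naming.

```python
# Hypothetical export of saved Shapley columns. The names and numbers are
# made up for illustration and do not reflect JMP's actual column naming.
rows = [
    {"patient": "A", "shap_age": 0.21, "shap_refills": 0.05, "shap_copay": -0.02},
    {"patient": "B", "shap_age": -0.10, "shap_refills": 0.18, "shap_copay": 0.04},
    {"patient": "C", "shap_age": -0.03, "shap_refills": -0.08, "shap_copay": -0.01},
]


def primary_driver(row, prefix="shap_"):
    """Return (factor, contribution) for the largest push toward the outcome."""
    contributions = {
        key[len(prefix):]: value
        for key, value in row.items()
        if key.startswith(prefix)
    }
    factor = max(contributions, key=contributions.get)
    return factor, contributions[factor]


drivers = {r["patient"]: primary_driver(r) for r in rows}
for patient_id, (factor, contribution) in drivers.items():
    print(patient_id, factor, contribution)
```

This picks the largest signed contribution, so a row where every factor pulls the prediction below the baseline (patient C here) reports the least negative factor. Switching to the largest absolute value would instead report the strongest influence in either direction.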

So the next time your stakeholders ask if you can pinpoint which patients are most likely to discontinue a medication based on a factor that can be influenced by a pharmacist’s actions or a marketing tactic, you can feel confident that you can say yes, no matter what type of model you’re creating.

jpol · 04-11-2023

Thanks Kristen for this article on Shapley Values.

Could you please post the data table used here in order to run the same example on my own laptop?

Thanks,

Philip

KristenBradford · 04-18-2023

@jpol Certainly! This example was made with mock/sample data to illustrate the feature in a clear and concise manner. Row 734 in the attached data table corresponds to Sean in my screenshots. Please let me know if you have more questions about this feature. It's truly one of the most exciting recent additions to the software (in my opinion!).

Kind Regards,

Kristen

jpol · 04-19-2023

Thanks Kristen for sharing.

BR.

Philip

PatrickGiuliano · 05-03-2023

@KristenBradford Wonderfully done, thanks for sharing this practical example!

dmmdiego · 09-01-2023

This is a great option, and I love that it's being made available for many models (in prior versions, only the XGBoost add-in had the ability to create SHAP values).

However, the option right now is very slow. I have tried it even on smaller datasets, and JMP often becomes unresponsive. I suggest focusing on parallelizing and optimizing the code that runs behind the scenes to make it faster. Right now, with such slow speeds, it's more likely that the software will just crash.

PatrickGiuliano · 04-11-2024

@dmmdiego Thank you for your feedback on the performance. Would you mind sharing this with JMP Technical Support (please send an email to support@jmp.com) and providing an example where the performance is slow or unresponsive? This will prompt us to engage with development so we can tangibly make it better in future releases. In your email, please copy this post and mention that we had some dialog about it here.

Cheers,

@PatrickGiuliano (JMP Technical Support)

lala · 04-11-2024

Can this example directly give the final analysis result using the decision tree model?

Thanks!

lala · 04-12-2024

I found the steps for getting the result in the picture.

But can this be done directly with JSL?

Thanks Experts!