Welcome to the talk Explainable AI: Unboxing the Black Box.
Let's introduce ourselves and let's start with Laura.
Hello, I'm Laura Lancaster, and I'm a Statistical Developer in JMP
and I'm located in the Cary office.
Thanks.
What about you, Russ?
Hey, everyone. Russ Wolfinger.
I'm the Director of Life Sciences R&D in the JMP Group
and a Research fellow as well.
Looking forward to the talk today.
And Pete?
My name's Peter Hersch.
I'm part of the Global Technical Enablement Team
and I'm located in Denver, Colorado.
Great. And my name is Florian Vogt.
I'm a Systems Engineer for the Chemical Team in Europe
and I'm located in beautiful Heidelberg, Germany.
Welcome to the talk.
AI is a hot topic at the moment and a lot of people want to do it.
But what does that mean for the industries?
Does it mean that scientists and engineers need to become coders,
or that processes in the future will be run by data scientists?
A recent publication called Industrial Data Science - a review
of Machine Learning Applications for Chemical and Process Industries
explains industrial data science fundamentals,
reviews industrial applications using
state of the art machine learning techniques
and it points out some important aspects of industrial AI.
These are the accessibility of AI, the understandability of AI,
and the consumability of AI, in particular of its output.
We'll show you some of the features that we think
are contributing to this topic very well in JMP.
Before we get into today's program, let's briefly review
what AI encompasses and where our focus lies today.
I've picked a source that actually separates it into four groups.
Those groups are: first, Supporting AI, also called Reactive Machines,
which aims at decision support.
The second group is called Augmenting AI,
or Limited Memory, and that focuses on process optimization,
and the third group is Automating AI,
or Theory of Mind, which, as the name suggests, aims at automation,
and the fourth is called Autonomous AI,
or Self-Aware AI, which encompasses autonomous optimization.
Today's focus is really on the first and the second topic.
We had a brief discussion before.
Russ, what are your thoughts on these also with respect to what JMP can cover?
Well, certainly the term AI gets thrown around a lot.
It's used with many different nuanced meanings.
I tend to prefer meanings that are
definitely more tangible and usable and more focused,
like the ones we're going to zoom in on today with some specific examples.
The terminology can get a little confusing though.
I guess I just tend to keep a fairly broad, open mind
whenever anyone uses the term AI and try to infer its meaning from the context.
Right. That's it in terms of the introduction;
now we'll get a little bit more into the details and specifically into
why it is important to actually understand your AI models.
Over to you, Pete.
Perfect. Thanks, Florian.
I think what Russ was hitting on there, and in Florian's introduction, is that
we oftentimes don't know what an AI model is telling us or what's under the hood.
When we're thinking about how well a model performs,
we think about how well that fits the data.
If we look here, we're looking at a neural network diagram
and as you can see, these can get pretty complex.
These AI models are becoming more and more prevalent
and relied upon for decision making.
Really, understanding why an AI model is making a certain decision,
what criteria it's basing that decision on,
is imperative to taking full advantage of these models.
When a model changes or updates,
especially with that autonomous AI or the automating AI,
we need to understand why.
We need to confirm that this model is not, say,
extrapolating or basing its decisions on a few points outside of our normal operating range.
Hold on.
Let me steal the screen here from Florian,
and I'm going to go ahead and walk through a case study here.
All right, so this case study
is based on directional drilling from wells near Fort Worth, Texas.
The idea with this type of drilling is that,
unlike conventional wells, where you would just go vertically,
you go down to a certain depth, and then you start going horizontal.
The idea is that these are much more efficient than traditional wells.
You have these areas of trapped oil and gas that you can
get at with some special completion parameters.
We're looking at the data here from these wells,
and we're trying to figure out what are the most important factors,
including the geology, the location, and the completion factors,
and can we optimize these factors to increase or optimize our well production?
To give you an idea, here's a map of that basin,
so like I mentioned, this is Fort Worth, Texas.
You can see we have wells all around this.
We have certain areas where our yearly production is higher,
others where it's lower.
We wanted to ask a few questions looking at this data.
What factors have the biggest influence on production?
If we know certain levels for a new well,
can we predict what our production will be?
Is there a way to alter our factors, maybe some of the completion parameters
and impact our production?
We're going to go ahead and answer some questions with a model.
But before we get into that, I wanted to ask Russ,
since he's got lots of experience with this.
When you're starting to dig into data, Russ,
what's the best place to start and why?
Well, I guess maybe I'm biased, but I love JMP
for this type of application, Pete,
just because it's so well suited for quick exploratory data analysis.
You want to get a feel for what target you're trying to predict
and the predictors you're about to use, looking at their distributions,
checking for outliers or any unusual patterns in the data.
You may even want to do some quick pattern discovery clustering
or PCA type analysis just to get a feel for any structure that's in the data.
Then also be thinking carefully about
what performance metric would make the most sense for the application at hand.
Typically the common one for a continuous response would be
something like root mean square error, but there could be cases where
maybe that's not quite appropriate, especially if there's
direct cost involved; sometimes absolute error is more relevant
for a true profit-loss type decision.
These are all things that you want to start thinking about
as well as how you're going to validate your model.
I'm a big fan of k-fold cross validation,
where you split your data into distinct subsets and hold one out,
being very careful about not allowing leakage and also careful about overfitting.
These are all concerns that tend
to come top of mind for me when I start out with a new problem.
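(As a rough illustration of this k-fold idea outside of JMP, here is a minimal Python/scikit-learn sketch on synthetic placeholder data, comparing root mean square error and mean absolute error across folds; the data and model are stand-ins, not the case-study setup.)

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; in practice this would be the well table.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
model = GradientBoostingRegressor(random_state=0)

# k-fold: split the data into distinct subsets and hold one out each time.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
mae = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")

print("RMSE per fold:", rmse.round(2))
print("MAE per fold:", mae.round(2))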
Perfect. Thanks, Russ.
I'm going to walk through, in JMP,
some of the tools we can use to start looking at our problem.
Then we're going to cover some of the things that help us determine
which of these factors are having the biggest impact on well production.
I'm going to show Variable Importance and then Shapley Values
and we'll have Laura talk to that and how we do that.
But first, let's go ahead and look at this data inside of JMP.
Like we mentioned here, I have my production from these wells.
I have some location parameters so where it is latitude and longitude,
I have some geologic parameters.
This is about the rock formation we're drilling through.
Then I have some completion parameters
and these are factors that we can change while we're drilling as well;
some of these we can have influence on.
This dataset only has 5,000 rows.
When talking to Russ while starting to prep this talk, he said to just go ahead
and run some model screening and see what type of model fits this data best.
To do that, we're going to go ahead and go under the Analyze menu,
go to Predictive Modeling, and hit Model Screening.
I'm going to put my response, which is that production,
take all of our factors, location, geology and completion parameters,
put those into the X and grab my Validation and put it into Validation.
Down here we have all sorts of different options on types of models we can run.
We can pick and choose which ones maybe make sense
or don't make sense for this type of data.
We can pick out some different modeling options for our linear models.
Even, like Russ mentioned, if we don't have enough data
to hold back and do our validation,
we can utilize k-fold cross validation in here.
Now to save some time, I've gone ahead and run this already so you don't have
to watch JMP create all these models.
Here's the results.
For this data, you can see
that these tree-based methods: Boosted Tree, Bootstrap Forest and XGBoost
all did very well at fitting the data, compared to some of the other techniques.
We could go through and run several of these,
but for this one, I'm going to just pick
the Boosted Tree since it had the best RSquare and root average squared error
for this dataset.
We'll go ahead and run that.
After we've run the screening, we're going to go ahead and pick a model
or a couple of models that fit well and just run them.
All right, so here's the overall fit in this case.
Depending on what type of data you're looking at,
maybe an RSquare of 0.5 is great, maybe an RSquare of 0.5 is not so great.
Just depending on what type of data you have,
you can judge if this is a good enough model or not.
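(For readers who want the flavor of model screening outside of JMP, here is a minimal Python/scikit-learn sketch that fits a few candidate model families on one training/validation split and compares holdout RSquare and RMSE; the data is synthetic, whereas in the case study it would be the well table with 12-month production as the response.)

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the well data.
X, y = make_regression(n_samples=5000, n_features=12, noise=20.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

candidates = {
    "Linear": LinearRegression(),
    "Bootstrap Forest (random forest)": RandomForestRegressor(random_state=0),
    "Boosted Tree (gradient boosting)": GradientBoostingRegressor(random_state=0),
}
for name, m in candidates.items():
    pred = m.fit(X_train, y_train).predict(X_val)
    rmse = mean_squared_error(y_val, pred) ** 0.5
    print(f"{name:35s} R2={r2_score(y_val, pred):.3f}  RMSE={rmse:.1f}")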
Now that I have this, I want to answer that first question.
Knowing a few parameters going in,
what can I expect my production level to be?
An easy way to do that inside of JMP with any type of model is with the profiler.
Okay, so we have the profiler here,
we have all of the factors that were included in the model,
and we have what we expect our 12-month production to be.
Here I can adjust things: if I know my location,
I know the latitude and longitude going in,
and maybe I know some of these geologic parameters.
Then I can adjust several of these factors,
figure out the completion parameters,
and figure out a way to optimize this.
But I think here with a lot of factors, this can be complex.
Let's talk about the second question,
where we were wondering which one of these factors was having the biggest influence.
You can see, based on which of these lines are flatter
or have more shape to them, what has the biggest influence.
But let's let JMP do that for us.
Under Assess Variable Importance,
I'm going to just let JMP go through
and pick the factors that are most important.
Here you can see it's ordered the most important
down to the ones that are less important.
I like this feature, Colorize Profiler.
Now it's highlighted the most important factors
down through the least important factors.
Again, I can adjust these and see, oh, it looks like
maybe adjusting the depth of the well,
adding some more [inaudible 00:15:51], might improve my production.
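(JMP's Assess Variable Importance has its own resampling-based method; as a rough, model-agnostic analog outside of JMP, permutation importance can be sketched in Python/scikit-learn as below, again on synthetic placeholder data.)

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle one predictor at a time and measure how much the validation fit degrades.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")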
That is one way we could do this, but we have a new way of looking
at the impact of each one of these factors on a certain well.
We can launch that under the red triangle in JMP Pro 17: Shapley Values.
I can set my options or save out the Shapley values.
Once I do that, it will create new columns in my data table
that save out the contributions from each one of those factors.
This is where I'm going to let Laura talk to some Shapley values.
I just wanted to talk briefly about what Shapley values are
and how we use them.
Shapley values are a model agnostic method for explaining model predictions
and they are really helpful for Black box models
that are really hard to interpret or explain.
The method comes from cooperative game theory
and I don't have time to talk about the background
or the math behind the computations, but we have a reference
at the bottom of the slide and if you Google it, you should be able to find
a lot of references to learn more if you're interested.
What these Shapley values do,
they tell you how much each input variable
is contributing to an individual prediction for a model.
That is, how far away it is from the average predicted value
across the input dataset,
and your input dataset is going to come from your training data.
Shapley values are additive,
which makes them really nice and easy to interpret and understand.
Every prediction can be written as a sum of your Shapley values
plus that average predicted value,
which we refer to as the Shapley intercept in JMP.
They can be computationally intensive
to compute if you have a lot of input variables in your model
or if you're trying to create Shapley values for a lot of predictions.
We try to give some options for helping to reduce that time in JMP.
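(The additive property Laura describes can be checked with the open-source shap Python package, which is not JMP's implementation but illustrates the same idea: the Shapley intercept, or base value, plus a row's Shapley values reproduces that row's prediction. Data here is synthetic.)

import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # one contribution per row and input variable

reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(model.predict(X), reconstructed))   # True: the contributions add up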
These Shapley values, as Peter mentioned,
were added to the prediction profiler for quite a few of the models in JMP Pro 17
and they're also available in the graph profiler.
They're available for Fit Least Squares, Nominal Logistic, Ordinal Logistic,
Neural, Gen Reg, Partition, Bootstrap Forest, and Boosted Tree.
They're also available if you have the XGBoost Add-In,
except in that Add-In, they're available from the model menu
and not from the prediction profiler.
Okay, next slide.
In this slide I want to just look at some of the predictions from Peter's model.
This is from a model using five input variables.
These are stacked bar charts of the first three predictions,
coming from the first three rows of his data.
On the left you see a stacked bar chart of the Shapley values for the first row.
That first prediction is 11.184 production barrels
in hundreds of thousands.
Each color in that bar graph is divided out by the different input variables.
Inside the bars are the Shapley values.
If you add up all of those values, plus the Shapley intercept
that I have in the middle of the graph, you get that prediction value.
This shows you, first of all, that all of these are making
positive contributions to the production, and it shows you how much. From the size,
I can see that longitude and proppant
are contributing the most for this particular prediction.
Then if I look to the right side, to the third prediction,
which is 2.916 production barrels in hundreds of thousands,
I can see that two of my input variables
are contributing positively to my production
and three of them are having negative contributions,
the bottom three here.
You can use graphs like this to help visualize your Shapley values.
That helps you really understand
these individual predictions.
Next slide.
This is just one of many types of graphs
you can create.
The Shapley values get saved into your data table.
You can manipulate them
and create all kinds of graphs in Graph Builder in JMP.
This graph is just a graph of all the Shapley values
from over 5,000 rows of the data split out by each input variable.
It just gives you an idea of the contributions of those variables,
both positive and negative, to the predictions.
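(A rough analog of this kind of graph with the shap package is its summary plot, which shows every Shapley value at once, split out by input variable; this reuses shap_values and X from the earlier shap sketch.)

import shap

shap.summary_plot(shap_values, X)                     # one dot per row and input variable
shap.summary_plot(shap_values, X, plot_type="bar")    # mean |Shapley value| per variable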
Now I'm going to hand it back over to Peter.
Great. Thanks, Laura.
I think now we'll go ahead and transition to our second case study
that Florian is going to do.
Should I pass the screen share back to you?
Yeah, that would be great.
Make the transition.
Thanks for this first case study and thanks for the contributions.
Really interesting.
I hope we can bring some light onto a different kind of application
with our second case study.
I have given it the subtitle "Was it Torque?"
because that's a question we'll hopefully have answered
by the end of the second case study presentation.
This second case study is about predictive maintenance
and the particular aspects of why it is important to understand
your models in this scenario.
Most likely everybody can imagine that it's very important
to have a sense for when machines require maintenance.
Because if machines fail, then that's a lot of trouble,
a lot of costs, when plants have to shut down and so on.
It's really necessary to do preventative maintenance
to keep systems running.
A major task in this is to determine
when the maintenance should be performed:
not too early, and not too late, certainly.
Therefore, it's a task to find a balance
which limits failures and also saves costs on maintenance.
There's a case study that we're using to highlight some functions and features,
and it's actually a synthetic dataset
which comes from a published study.
The source is down there at the bottom.
You can find it.
It was published in the AI for Industry event in 2020.
The basic content of this dataset
is that it has six different features or process settings:
product, or product type,
which denotes different quality variants,
then air temperature,
process temperature, rotational speed, torque, and tool wear.
We have one main response and that is whether the machine fails or not.
When we think of questions that we could answer
with a model, or models, or data in general, there are several that come to mind.
There's several that come to mind.
Now, the most obvious question
is probably how we can explain and interpret settings,
which likely lead to machine failure.
This is something that [inaudible 00:24:38]
to create and compare multiple models
and then choose the one that's most suitable.
Now, in this particular setting where we want to predict
whether a machine fails or not,
we also have to account for misclassifications,
that is either a false positive or a false negative prediction.
With JMP's decision threshold graphs
and the profit matrix, we can actually specify an emphasis
or importance on which outcome is less desirable.
For example,
it is typically less desirable to actually have a failure
when the model didn't predict one,
compared to the opposite misclassification.
Then besides the binary classification, of course,
you'd be also interested in understanding what drives failure typically.
There are certainly several ways to deal with this question.
I think visualization is always a part of it.
But when we're using models, we can consider,
for example, self-explaining models
like decision trees, or we can use built-in functionality
like the prediction profiler and the variable importance feature.
The last point here:
when we investigate and rate which factors are most important
for the predicted outcome,
we assume that there is an underlying behavior,
that the most important factor is XYZ,
but we do not know which factor actually
contributed, and to what extent, to an individual prediction.
Again, Shapley values are a very helpful
addition that allows us to understand the contribution
of each of the factors to an individual prediction.
That was the general level;
now, let's take a look at three specific questions
and how we can answer them with the software.
The first one is: how do we adjust a predictive model
with respect to the high importance of avoiding false negative predictions?
This assumes a little bit that we've already done a first step
because we've already seen model screening and how we can get there.
I'm starting one step ahead.
Let's move into JMP to actually take a look at this.
We see the dataset, we can see it's fairly small,
not too many columns.
It looks very simple.
We only have these few predictors and there's some more columns.
There's also a validation column that I've added,
but it's not shown here.
As for the first question,
let's assume we have already done the model screening.
Again, this is accessible
under Analyze, Predictive Modeling, Model Screening,
where we now specify what we want to predict
and the factors that we want to investigate.
Again, I have already prepared this.
We have an outcome that looks like this.
It looks a little bit different than in the first use case
because now we have this binary outcome
and so we have some different measures
that we can use to compare.
But again, what's important is that we have an overview
of which of the methods are performing better than other ones.
As we said, in order to now improve
the model and put the emphasis on avoiding these
false negative predictions,
let's just pick one and see what we can do here.
Let's maybe even pick the first three here,
so we can just do that by holding the control key.
Another feature that will help us here
is called Decision Threshold,
and it's located in the red triangle menu: Decision Threshold.
The Decision Threshold report gives us several things.
We have these graphs here, which show the actual data points.
We have this confusion matrix,
and we have some additional graphs and metrics,
but we will focus on the upper part here.
Let's actually take a look at the test portion of the set.
When we take a look at this,
we can see that we have different types of outcomes.
The default of this probability threshold
is in the middle, which would be here at 0.5.
We have now several options to see and optimize this model
and how effective it is with respect to the confusion matrix.
In the confusion matrix, we can see the predicted value
and whether that actually was true or not.
If we look at when no failure is predicted,
we can see that here, with this setting,
we actually have quite a high number of failures,
even though none were predicted.
Now we can interactively explore how adjusting this threshold
actually affects the accuracy of the model or the misclassification rates.
Or in some cases, we can also put
an emphasis on which misclassification is really worse than the other.
We can do this with the so-called profit matrix.
If we go here, we can set a value
on which of the misclassifications is actually worse than the other one.
In this case, we really do not want
to have a prediction of no failure
when there actually is a failure happening.
We would put something like 20 times
more importance on not getting this misclassification,
and we set it and hit OK,
and then it will automatically update the graph
and then we can see that the values for the misclassification
have dropped now in each of the models
and we can use this as an additional tool
to select a model that's maybe most appropriate.
That's it for the first question of how we can adjust a predictive model
with respect to the higher importance of avoiding false negative predictions.
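(A minimal sketch of the same idea outside of JMP: sweep the probability cutoff and weight a false negative 20 times a false positive, then keep the cutoff with the lowest total cost. The data and classifier below are synthetic placeholders, not the maintenance dataset.)

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in: class 1 plays the role of "machine failure".
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]            # predicted probability of failure
best = None
for cutoff in np.linspace(0.05, 0.95, 19):
    pred = (proba >= cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    cost = 20 * fn + 1 * fp                         # missed failures cost 20x false alarms
    if best is None or cost < best[1]:
        best = (cutoff, cost)
print("chosen cutoff:", round(best[0], 2), "total cost:", best[1])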
Now, another question here, when we think of maintenance
and where we put our maintenance efforts,
is: how can we identify and communicate
the overall importance of predictors?
What factors are driving the system, and the failures?
Let's go back to the data table. To say it first,
I personally like visual and simple approaches.
One that I like to use is the parallel plot,
because it gives a really nice overview summarizing
where the failures group, at which parameter settings, and so on.
On the modeling and machine learning side,
there's a few other options that we can actually use.
One that I like, because it's very crisp and clear,
is Predictor Screening.
Predictor Screening gives us a very compact output
about what is important, and it's very easy to do;
it's under Analyze, Screening,
Predictor Screening.
All we need to do is say what we want to understand
and then specify the parameters
that we want to use for this.
Click OK, and then it calculates
and we have this output.
For me, it's a really nice thing
because, as I said, it's crisp, clear, and consumable.
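(Predictor Screening is, as Russ notes a little later, essentially a Bootstrap Forest ranking; a rough analog in Python/scikit-learn is a random forest's feature importances. The failure rule below is simulated only to make the sketch runnable, and the column names follow the features listed earlier.)

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "air_temperature": rng.normal(300, 2, n),
    "process_temperature": rng.normal(310, 1, n),
    "rotational_speed": rng.normal(1500, 150, n),
    "torque": rng.normal(40, 10, n),
    "tool_wear": rng.uniform(0, 250, n),
})
y = ((X["torque"] > 55) | (X["tool_wear"] > 220)).astype(int)   # simulated failure rule

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))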
But we've talked about this before,
and Russ, when we're working with models particularly,
do you have any other suggestions, or anything to add
to my approach to understanding
which predictors are important?
Yes, it is a good thing to try.
As I mentioned earlier, you've got to be really careful
about overfitting.
I tend to work with a lot of these wide problems,
say from Genomics and other applications,
where you might even have
many more predictors than you have observations.
In such a case, if you were to run predictor screening,
say maybe pick the top 10
and then turn right around and fit a new model
with those 10 only, you've actually just set yourself up
for overfitting if you did the predictor screening on the entire data set.
That's the scenario I'm concerned about.
It's an easy trap to fall into,
because you think you're just filtering things down,
but you've actually reused the same data twice.
The danger would be, if you were to then apply that model
to some new data, it likely won't do nearly as well.
If you're in the game where you want to reduce predictors,
I tend to prefer to do it within each fold of a k-fold.
The drawback of that is you'll get
a different set every time, but you can aggregate those things.
If you've got a certain predictor
that's just showing up consistently across folds,
that's very good evidence that it's an important one.
I expect that's what would happen in this case with, say, torque.
Even if you were to do this exercise,
say 10 times with 10 different folds, you'd likely get a pretty similar ranking,
but it's more of a subtlety.
But again, it's a danger that you have to watch out for.
JMP can make it a little bit easier to fall into that trap, just because
things are so quick and clean, like you mentioned, if you're not careful.
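(A minimal sketch of the per-fold screening Russ recommends, reusing the simulated X and y from the previous sketch: rank predictors inside each training fold only, then count how often each one lands near the top across folds.)

import pandas as pd
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

top_counts = Counter()
for train_idx, _ in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X.iloc[train_idx], y.iloc[train_idx])       # screen on this fold's training rows only
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    top_counts.update(importances.nlargest(2).index)   # top 2 predictors in this fold

print(top_counts)   # a predictor showing up in every fold is consistently important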
Yeah, that's a very valuable addition to this approach.
Just to accompany this additional information,
there's also the other option that we have,
particularly when we have already
gone through the process of creating a model where we can then
actually again, use the prediction profiler and the variable importance.
It's another way where we can assess
which of the variables have the higher importance.
Russ, do you want to say a word on that, also
in contrast maybe to the predictor screening?
Yeah. Honestly, Florian, I like the variable importance a little better.
Just dive right into the modeling.
Again, I would prefer with K-fold.
Then you can just use the variable importance measures,
which are often really informative directly.
They're very similar. In fact, predictor screening,
I believe, is just calling Bootstrap Forest in the background
and collecting the most important variables.
It's basically the same thing.
Then following up with the profiler, which can be excellent for seeing exactly
how certain variables are marginally affecting the response,
and then drilling that even further with Shapley
to be able to break down individual predictions
into their components.
To me, it's a very compelling and interesting way
to dive into a predictive model
and understand what's really going on with it,
kind of unpacking the black box
and letting you see what's really happening.
Yeah, thanks.
I think that's the whole point,
making it understandable and making it consumable
besides, of course, actually getting to the results,
which is understanding which factors are influencing the outcome.
Thanks.
Now, I have one more question, and you've already mentioned it.
When we score new data, in particular,
what can we do to identify
which predictors have actually influenced the model outcome?
Now, with what we have done so far,
we have gained a good understanding of the system and know
which of the factors are the most dominant, and we can even derive operating ranges.
But what if the system changes, and a different factor actually drives a failure?
Then, as you would expect in this case, and as we have heard from Laura beforehand,
Shapley values are again a great addition that will help us to interpret;
we've seen how we can generate them,
and you've learned on which platforms they're available.
Now, with the output that you get when you save out Shapley values,
you can, for example, also make a graph that shows the contributions per row.
In this case,
we have 10,000 rows in the data table, so we have 10,000 stacked bar charts,
and we can already see that besides the common pattern,
there are also times when other influencing factors
actually drive the decision of the model.
It's really a helpful tool, not only to rate an individual prediction,
but also, to add on to what Russ just said,
to build understanding of the system and which factors contribute.
When we move on a little bit
along this explainability, or exploratory, path,
we can use these Shapley values in different ways.
What I personally liked was the suggestion
to actually plot the Shapley values by their actual parameter settings,
because that allows us to identify areas of settings.
For example,
if we take rotational speed here, we can see that there are actually
areas of this parameter that tend to contribute a lot
in terms of the model outcome, but also in terms of the actual failure.
That also helps us in getting more understanding
with respect to the actual problem of machine failure and what's causing it,
and also with respect to why the model predicts something.
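(The plot Florian describes, each row's Shapley value for one predictor against that predictor's actual setting, can be sketched in Python as below; it assumes a matrix shap_values of Shapley contributions aligned with a predictor table X, for example the columns JMP saves out, exported for plotting, and the column name is a placeholder.)

import matplotlib.pyplot as plt

col = "rotational_speed"                          # placeholder predictor name
j = list(X.columns).index(col)
plt.scatter(X[col], shap_values[:, j], s=8)       # one point per row of the data
plt.xlabel(col)
plt.ylabel("Shapley value (contribution to prediction)")
plt.show()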
Now, finally, I'd like to answer the question.
When we take these graphs of Shapley values,
and we have seen it before on several occasions,
torque is certainly a dominant factor.
But from all of these, I've just picked a few predictions,
and we can see that sometimes it's torque, sometimes it's not.
With the Shapley values, we have a really great way
of interpreting a specific prediction by the model.
All right, so those were the things we wanted to show.
I hope this gives some great insight
into how we can make AI models more explainable,
more understandable,
easier to digest and to work with, because that's the intention here.
Yeah, I'd like to summarize a little bit.
Pete, maybe you want to come in and help me here.
I think what we're hoping to show is that
as these AI models become more and more prevalent
and are relied upon for decision making, that understanding, interpreting,
and being able to communicate those models is very important.
We hope that with these Shapley values,
with the variable importance, and with the profiler,
we've shown you a couple of ways that you can share those results
and have them be easily understandable.
That was the take-home there: between that and being able to utilize
model screening and things like that,
hopefully you found a few techniques
that will make this more understandable and less of a black box.
Yeah, I absolutely agree.
Just to summarize, I'd really like to thank Russ and Laura
for contributing here with your expertise.
Thanks, Pete. It was a pleasure.
Thanks, everybody for listening.
We're looking forward to having discussions and questions to answer.