Explainable AI: Unboxing the Blackbox (2022-US-45MP-1147)

Peter Hersh, JMP Senior Systems Engineer, SAS
Florian Vogt, Systems Engineer, JMP
Russ Wolfinger, Director of Scientific Discovery and Genomics, JMP
Laura Lancaster, Principal Research Statistician Developer, JMP

 

Artificial intelligence algorithms are useful for gaining insight into complex problems. One of the drawbacks of these algorithms is they are often difficult to interpret. The lack of interpretability can make models generated using these AI algorithms less trustworthy and less useful. This talk will show how utilizing a few features in JMP can make AI more understandable. The presentation will feature performing “what if” hypothesis testing using the prediction profiler, testing for model robustness utilizing Monte Carlo Simulations, and analyzing Shapley values, a new feature in JMP Pro 17, to explore contrastive explanations.

 

 

Welcome to the talk Explainable AI: Unboxing the Blackbox. Let's introduce ourselves, and let's start with Laura.

Hello, I'm Laura Lancaster, and I'm a statistical developer at JMP, located in the Cary office. Thanks.

What about you, Russ?

Hey, everyone. Russ Wolfinger. I'm the Director of Life Sciences R&D in the JMP group and a Research Fellow as well. Looking forward to the talk today.

And Pete?

My name's Peter Hersh. I'm part of the Global Technical Enablement team, and I'm located in Denver, Colorado.

Great, and my name is Florian Vogt. I'm a systems engineer for the chemical team in Europe, and I'm located in beautiful Heidelberg, Germany.

Welcome to the talk. AI is a hot topic at the moment, and a lot of people want to do it. But what does that mean for the industries? Does it mean that scientists and engineers need to become coders, or will processes in the future be run by data scientists?

A recent publication called Industrial Data Science - a Review of Machine Learning Applications for Chemical and Process Industries explains industrial data science fundamentals, reviews industrial applications using state-of-the-art machine learning techniques, and points out some important aspects of industrial AI. These are the accessibility of AI, the understandability of AI, and the consumability of AI, in particular of its output. We'll show you some of the features in JMP that we think contribute to this topic very well.

Before we start into the program of today, let's briefly review what AI encompasses and where our focus today lies.

I've picked a source that separates it into four groups. Those groups are: first, supporting AI, also called reactive machines, which aims at decision support. The second group is called augmenting AI, or limited memory, and that focuses on process optimization. The third group is automating AI, or theory of mind, which, as the name suggests, aims at automation. And the fourth is called autonomous AI, or self-aware AI, which encompasses autonomous optimization. Today's focus is really on the first and the second group.

We had a brief discussion before. Russ, what are your thoughts on these, also with respect to what JMP can cover?

Well, certainly the term AI gets thrown around a lot. It's used with many different nuanced meanings. I tend to prefer meanings that are more tangible, usable, and focused, like the ones we're going to zoom in on today with some specific examples. The terminology can get a little confusing, though. I guess I just tend to keep a fairly broad, open mind whenever anyone uses the term AI and try to infer its meaning from the context.

Right. That's it in terms of introduction. Now we'll get a little bit more into the details, and specifically into why it is important to actually understand your AI models. Over to you, Pete.

Perfect. Thanks, Florian.

I think what Russ was hitting on there, and Florian's introduction, is that we often don't know what an AI model is telling us and what's under the hood. When we think about how well a model performs, we think about how well it fits the data. If we look here, we're looking at a neural network diagram, and as you can see, these can get pretty complex.

These AI models are becoming more and more prevalent and relied upon for decision making. Really, understanding why an AI model is making a certain decision, and what criteria it's basing that decision on, is imperative to taking full advantage of these models. When a model changes or updates, especially with that autonomous AI or the automating AI, we need to understand why. We need to confirm that the model is not, say, extrapolating or basing its decision on a few points outside of our normal operating range.

Hold on. Let me steal the screen here from Florian, and I'm going to go ahead and walk through a case study.

All right, so this case study is based on directional drilling from wells near Fort Worth, Texas. The idea with this type of drilling is that unlike conventional wells, where you would just go vertically, you go down a certain depth and then you start going horizontal. The idea is that these are much more efficient than traditional wells: you have these areas of trapped oil and gas that you can get at with some special completion parameters.

We're looking at the data from these wells, and we're trying to figure out which are the most important factors, including the geology, the location, and the completion factors, and whether we can adjust these factors to increase or optimize our well production.

To give you an idea, here's a map of that basin; like I mentioned, this is Fort Worth, Texas. You can see we have wells all around it. We have certain areas where our yearly production is higher, others where it's lower.

We wanted to ask a few questions looking at this data. What factors have the biggest influence on production? If we know certain levels for a new well, can we predict what our production will be? Is there a way to alter our factors, maybe some of the completion parameters, and impact our production?

We're going to go ahead and answer some of those questions with a model. But before we get into that, I wanted to ask Russ, since he's got lots of experience with this. When you're starting to dig into data, Russ, what's the best place to start and why?

Well, I guess maybe I'm biased, but I love JMP for this type of application, Pete, just because it's so well suited for quick exploratory data analysis. You want to get a feel for the target you're trying to predict and the predictors you're about to use, looking at their distributions and checking for outliers or any unusual patterns in the data. You may even want to do some quick pattern discovery, clustering, or PCA-type analysis, just to get a feel for any structure that's in the data.

Then also be thinking carefully about what performance metric would make the most sense for the application at hand. The common one for a continuous response would be root mean square error, but there could be cases where that's not quite appropriate; especially if there are direct costs involved, sometimes absolute error is more relevant for a true profit-loss type decision.

These are all things you want to start thinking about, as well as how you're going to validate your model. I'm a big fan of k-fold cross validation, where you split your data into distinct subsets and hold one out, being very careful about not allowing leakage and also careful about overfitting. These are all concerns that tend to come top of mind for me when I start out with a new problem.
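For readers who want to try Russ's checklist outside of JMP, here is a minimal scikit-learn sketch of that validation setup: one k-fold plan scored with both root mean square error and mean absolute error, since the right metric depends on the application. The data here are synthetic stand-ins, not the well dataset.

```python
# Minimal sketch of the validation setup Russ describes (synthetic data,
# not the well dataset): k-fold cross validation scored with two metrics.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=1)

model = GradientBoostingRegressor(random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)  # distinct subsets, hold one out

# Compare two performance metrics; which is "right" depends on the application.
rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
mae = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
print("RMSE per fold:", rmse.round(1))
print("MAE per fold: ", mae.round(1))
```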

Perfect. Thanks, Russ.

I'm going to walk through some of the tools in JMP we can use to start looking at our problem. Then we're going to cover some of the things that help us determine which of these factors have the biggest impact on well production. I'm going to show variable importance and then Shapley values, and we'll have Laura talk to those and how we compute them.

But first, let's go ahead and look at this data inside of JMP. Like we mentioned, I have my production from these wells. I have some location parameters, so where the well is, latitude and longitude. I have some geologic parameters; these are about the rock formation we're drilling through. Then I have some completion parameters, and these are factors that we can change while we're drilling, so some of these we have influence on.

This dataset only has 5,000 rows. When talking to Russ while starting to prep this talk, he said to just go ahead and run some model screening and see what type of model fits this data best. To do that, we're going to go under the Analyze menu, go to Predictive Modeling, and hit Model Screening. I'm going to put my response, which is that production, take all of our factors, location, geology, and completion parameters, put those into the X role, and grab my validation column and put it into Validation.

Down here we have all sorts of options for the types of models we can run. We can pick and choose which ones make sense or don't make sense for this type of data, and we can pick out some different modeling options for our linear models. Even, like Russ mentioned, if we don't have enough data to hold back for validation, we can utilize k-fold cross validation in here.
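Conceptually, Model Screening automates something like the sketch below: fit several candidate model families under one cross-validation plan and compare their fit statistics side by side. This is a scikit-learn illustration of the idea on synthetic data, not JMP's implementation.

```python
# Conceptual sketch of what Model Screening automates: several model families,
# one cross-validation plan, one comparison table. (Illustration only.)
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=5000, n_features=10, noise=25.0, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

candidates = {
    "Linear Regression": LinearRegression(),
    "Bootstrap Forest (random forest)": RandomForestRegressor(random_state=1),
    "Boosted Tree": GradientBoostingRegressor(random_state=1),
}
for name, model in candidates.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name:32s} mean CV RSquare = {r2.mean():.3f}")
```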

Now, to save some time, I've gone ahead and run this already, so you don't have to watch JMP create all these models. Here are the results. For this data, you can see that the tree-based methods, Boosted Tree, Bootstrap Forest, and XGBoost, all did very well at fitting the data compared to some of the other techniques. We could go through and run several of these, but for this one I'm going to just pick the Boosted Tree, since it had the best RSquare and root average squared error for this dataset. We'll go ahead and run that.

After we've run the screening, we pick a model, or a couple of models, that fit well and just run them. All right, so here's the overall fit in this case. Depending on what type of data you're looking at, maybe an RSquare of .5 is great, maybe an RSquare of .5 is not so great. Depending on what type of data you have, you can judge whether this is a good enough model or not.

Now that I have this, I want to answer that first question: knowing a few parameters going in, what can I expect my production level to be? An easy way to do that inside of JMP, with any type of model, is with the profiler.

Okay, so we have the profiler here, we have all of the factors that were included in the model, and we have what we expect our 12-month production to be. Here I can adjust for a certain location if I know the latitude and longitude going in, and maybe I know some of these geologic parameters. I can then adjust several of the completion parameters and figure out a way to optimize this.
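The profiler's "what if" questioning boils down to scoring the model at chosen factor settings. Here is a rough sketch of that idea, with hypothetical factor names and made-up data standing in for the well dataset:

```python
# "What if" analog of the profiler: fix most factors, sweep one, and watch
# the prediction respond. Factor names and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
wells = pd.DataFrame({
    "latitude": rng.uniform(32.0, 33.5, 1000),
    "longitude": rng.uniform(-98.5, -97.0, 1000),
    "depth": rng.uniform(5000, 9000, 1000),
})
wells["production"] = (0.002 * wells["depth"]
                       - 3 * (wells["latitude"] - 32.8).abs()
                       + rng.normal(0, 1, 1000))

features = ["latitude", "longitude", "depth"]
model = GradientBoostingRegressor(random_state=1).fit(wells[features], wells["production"])

# One scenario: a well at a known location, sweeping depth.
scenario = pd.DataFrame({"latitude": 32.8, "longitude": -97.6,
                         "depth": np.linspace(5000, 9000, 5)})
scenario["predicted production"] = model.predict(scenario[features])
print(scenario)
```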

But with a lot of factors, this can be complex. Let's talk about the second question, where we were wondering which of these factors has the biggest influence. You can see, based on which of these lines are flatter or have more shape to them, what has the biggest influence. But let's let JMP do that for us. Under Assess Variable Importance, I'm going to just let JMP go through and pick the factors that are most important. Here you can see it has ordered them from the most important down to the ones that are less important.

I like this feature: Colorize Profiler. Now it has highlighted the most important factors, shading down to the least important ones. Again, I can adjust these and see that, for example, adjusting the depth of the well and adding some more [inaudible 00:15:51] might improve my production.
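Assess Variable Importance uses JMP's own resampling scheme; a related and widely used idea is permutation importance, sketched below in scikit-learn on synthetic data. Shuffle one input at a time and measure how much the model's accuracy degrades.

```python
# Permutation importance: a cousin of Assess Variable Importance. Shuffle one
# input at a time; the bigger the drop in accuracy, the more the model relies
# on that input. (Synthetic data; illustration only.)
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=6, n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = GradientBoostingRegressor(random_state=1).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
for j in result.importances_mean.argsort()[::-1]:
    print(f"factor {j}: importance {result.importances_mean[j]:.3f}")
```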

That is one way we could do this, but we have a new way of looking at the impact of each one of these factors on a certain well. We can launch that under the red triangle in JMP Pro 17: Shapley values. I can set my options and save out the Shapley values. Once I do that, it will create new columns in my data table that hold the contributions from each one of those factors. This is where I'm going to let Laura talk to Shapley values.

I'm going to talk briefly about what Shapley values are and how we use them. Shapley values are a model-agnostic method for explaining model predictions, and they are really helpful for black-box models that are hard to interpret or explain. The method comes from cooperative game theory. I don't have time to talk about the background or the math behind the computations, but we have a reference at the bottom of the slide, and if you Google it, you should be able to find a lot of references to learn more if you're interested.

What these Shapley values do is tell you how much each input variable contributes to an individual prediction from a model, that is, how far it moves that prediction away from the average predicted value across the input dataset, and your input dataset is going to come from your training data. Shapley values are additive, which makes them really nice and easy to interpret and understand. Every prediction can be written as the sum of your Shapley values plus that average predicted value, which we refer to as the Shapley intercept in JMP.
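In symbols, that additivity property reads as follows, writing f-hat for the model, phi_j(x) for the Shapley value of input j at prediction point x, and phi_0 for the Shapley intercept, the average prediction over the n training rows:

```latex
\hat{f}(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x),
\qquad
\phi_0 = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_i)
```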

They can be computationally intensive to compute if you have a lot of input variables in your model, or if you're trying to create Shapley values for a lot of predictions, so we try to give some options in JMP for helping to reduce that time.

These Shapley values, as Peter mentioned, were added to the prediction profiler for quite a few of the models in JMP Pro 17, and they're also available in the profiler under the Graph menu. They're available for Fit Least Squares, Nominal Logistic, Ordinal Logistic, Neural, Gen Reg, Partition, Bootstrap Forest, and Boosted Tree. They're also available if you have the XGBoost add-in, except in that add-in they're available from the model menu and not from the prediction profiler.
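As a rough outside-of-JMP illustration of the same idea, the open-source shap package computes these quantities for tree-based models. The sketch below fits a boosted tree on synthetic data and verifies the additivity Laura described; it illustrates the concept, not JMP Pro's implementation.

```python
# Shapley values with the open-source shap package (concept illustration,
# not JMP Pro's implementation). Checks additivity: intercept + values = prediction.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=1)
model = GradientBoostingRegressor(random_state=1).fit(X, y)

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X)                       # one row of contributions per prediction
base = float(np.ravel(explainer.expected_value)[0])  # average prediction: the "Shapley intercept"

pred = model.predict(X)
assert np.allclose(base + phi.sum(axis=1), pred)
print(f"prediction {pred[0]:.2f} = intercept {base:.2f} + contributions {phi[0].sum():.2f}")
```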

Okay, next slide. In this slide I want to look at some of the predictions from Peter's model. This is from a model using five input variables. These are stacked bar charts of the first three predictions, coming from the first three rows of his data.

On the left you see a stacked bar chart of the Shapley values for the first row. That first prediction is 11.184 production barrels, in hundreds of thousands. Each color in that bar graph is divided out by the different input variables, and inside the bars are the Shapley values. If you add up all of those values, plus the Shapley intercept that I have in the middle of the graph, you get that prediction value. This shows you, first of all, that all of these variables are making positive contributions to the production, and it shows you how much. From the sizes, I can see that longitude and proppant are contributing the most for this particular prediction.

Then if I look at the right side, at the third prediction, which is 2.916 production barrels, in hundreds of thousands, I can see that two of my input variables are contributing positively to my production and three of them, the bottom three here, are having negative contributions. You can use graphs like this to help visualize your Shapley values, and that helps you really understand these individual predictions.
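Once the Shapley values are saved out as columns, a chart like this can be rebuilt in any plotting tool. Here is a matplotlib sketch that stacks positive contributions above zero and negative ones below, reusing `phi` from the snippet above, with hypothetical factor names:

```python
# Stacked bar chart of Shapley values for the first three predictions,
# positives stacked above zero, negatives below (matplotlib sketch; `phi`
# comes from the previous snippet).
import matplotlib.pyplot as plt

rows = [0, 1, 2]
names = [f"factor {j}" for j in range(phi.shape[1])]

fig, ax = plt.subplots()
for k, i in enumerate(rows):
    top, bottom = 0.0, 0.0
    for j, name in enumerate(names):
        v = phi[i, j]
        ax.bar(k, v, bottom=top if v >= 0 else bottom, label=name if k == 0 else None)
        if v >= 0:
            top += v
        else:
            bottom += v
ax.axhline(0, color="black", linewidth=0.5)
ax.set_xticks(range(len(rows)), [f"row {i + 1}" for i in rows])
ax.set_ylabel("Shapley value (contribution to prediction)")
ax.legend()
plt.show()
```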

Next slide. This is just one of many types of graphs you can create. The Shapley values get saved into your data table, so you can manipulate them and create all kinds of graphs in Graph Builder in JMP. This graph shows all the Shapley values from over 5,000 rows of the data, split out by each input variable. It gives you an idea of the contributions of those variables, both positive and negative, to the predictions.

Now I'm going to hand it back over to Peter.

Great. Thanks, Laura. I think now we'll go ahead and transition to our second case study, which Florian is going to do. Should I pass the screen share back to you?

Yeah, that would be great. Make the transition. Thanks for this first case study and thanks for the contributions. Really interesting. I hope we can bring some light onto a different kind of application with our second case study. I have given it the subtitle Was it Torque?, because that's a question we'll hopefully have answered by the end of this second case study presentation.

This second case study is about predictive maintenance and the particular reasons why it is important to understand your models in this scenario. Most likely everybody can see that it's very important to have a sense for when machines require maintenance, because if machines fail, that's a lot of trouble and a lot of cost when plants have to shut down and so on. It's really necessary to do preventative maintenance to keep systems running. A major task in this is to determine when the maintenance should be performed, not too early, not too late, certainly. Therefore, the task is to find a balance which limits failures and also saves costs on maintenance.

There's a case study that we're using to highlight some functions and features, and it's actually a synthetic dataset which comes from a published study. The source is down there at the bottom; you can find it. It was published at the AI for Industry event in 2020. The basic content of this dataset is that it has six different features of process settings: product type, which denotes different quality variants, then air temperature, process temperature, rotational speed, torque, and tool wear.

We have one main response, and that is whether the machine fails or not.

When we think of questions that we could answer with a model, or models, or generally with data, there are several that come to mind. The most obvious question is probably how we can explain and interpret settings which likely lead to machine failure. This is something that [inaudible 00:24:38] to create and compare multiple models and then choose the one that's most suitable.

Now, in this particular setting, where we want to predict whether a machine fails or not, we also have to account for misclassifications, that is, either a false positive or a false negative prediction. With JMP's decision threshold graphs and the profit matrix, we can actually specify an emphasis, or importance, on which outcome is less desirable. For example, it is typically less desirable to actually have a failure when the model didn't predict one, compared to the opposite misclassification.

Then besides the binary classification, of course, you'd also be interested in understanding what typically drives failure. There are certainly several ways to deal with this question. I think visualization is always a part of it. But when we're using models, we can consider, for example, self-explaining models like decision trees, or we can use built-in functionality like the prediction profiler and the variable importance feature.

The last point here: when we investigate and rate which factors are most important for the predicted outcome, we assume that there is an underlying behavior, that the most important factor is XYZ, but we do not know which factor actually contributed to what extent to an individual prediction. Again, Shapley values are a very helpful addition that can allow us to understand the contribution of each of the factors to an individual prediction.

That's the general level; now let's take a look into three specific questions and how we can answer those with the software.

The first one is: how do we adjust a predictive model with respect to the high importance of avoiding false negative predictions? This assumes a little bit that we've already done a first step, because we've already seen model screening and how we can get there, so I'm starting one step ahead. Let's move into JMP to actually take a look at this.

We see the dataset; it's fairly small, not too many columns. It looks very simple. We only have these few predictors and some more columns. There's also a validation column that I've added, but it's not shown here.

As for the first question, let's assume we have already done the model screening. Again, this is accessible under Analyze > Predictive Modeling > Model Screening, where we specify what we want to predict and the factors that we want to investigate. I have already prepared this, and we have an outcome that looks like this. It looks a little different than in the first use case, because now we have a binary outcome, and so we have some different measures that we can use to compare. But again, what's important is that we have an overview of which of the methods are performing better than the others.

As we said, in order to now improve the model and put emphasis on avoiding these false negative predictions, let's just pick one and see what we can do here. Let's maybe even pick the first three, which we can do by holding the Control key. Another feature that will help us here is called Decision Threshold, and it's located in the red triangle menu.

The decision threshold report gives us several pieces of output. We have these graphs here, which show the actual data points; we have this confusion matrix; and we have some additional graphs and metrics, but we will focus on the upper part here. Let's actually take a look at the test portion of the set. When we look at this, we can see that we have different types of outcomes. The default probability threshold is in the middle, which would be here at .5. We now have several options to see and optimize how effective this model is with respect to the confusion matrix.

In the confusion matrix, we can see the predicted value and whether that actually was true or not. If we look at where no failure is predicted, we can see that here, with this setting, we actually have quite a high number of failures, even though none were predicted. Now we can interactively explore how adjusting this threshold affects the accuracy of the model or the misclassification rates.

Or, in some cases, we can put an emphasis on which misclassification is really worse than the other. We can do this with the so-called profit matrix. If we go here, we can set a value for which of the misclassifications is actually worse than the other one. In this case, we really do not want to have a prediction of no failure when there actually is a failure happening, so we would put something like 20 times more weight on not getting this misclassification. We set it and hit OK, and then it will automatically update the graph. We can see that the values for that misclassification have dropped now in each of the models, and we can use this as an additional tool to select the model that's most appropriate.

That's the first question, how we can adjust a predictive model with respect to the high importance of avoiding false negative predictions. Now, another question, when we think of maintenance and where we put our maintenance efforts, is: how can we identify and communicate the overall importance of predictors? What factors are driving the system, the failures?

Let's go back to the data table. To start, I personally like visual and simple approaches. One that I like to use is the parallel plot, because it gives a really nice overview summarizing where the failures group, at which parameter settings, and so on. On the modeling and machine learning side, there are a few other options we can use. One that I like, because it's very crisp and clear, is predictor screening. Predictor screening gives us a very compact output about what is important, and it's very easy to do. It's under Analyze > Screening > Predictor Screening. All we need to do is say what we want to understand and then specify the parameters that we want to use for this. Click OK, and then it calculates, and we have this output. For me, it's a really nice thing because, as I said, it's crisp, clear, and consumable.
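As Russ notes a bit later, Predictor Screening essentially runs a bootstrap forest behind the scenes and ranks the inputs by their contribution to the model. Here is a rough scikit-learn equivalent, with the case study's feature names attached to synthetic data for illustration:

```python
# Predictor Screening analog: fit a random ("bootstrap") forest and rank the
# inputs by importance. Feature names from the case study; data are synthetic.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

names = ["product type", "air temperature", "process temperature",
         "rotational speed", "torque", "tool wear"]
X, y = make_classification(n_samples=10000, n_features=6, n_informative=3,
                           random_state=1)

forest = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
ranking = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(ranking)
```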

But we've talked about this before, and Russ, when we're working with models particularly, do you have any other suggestions, or anything to add to my approach to understanding which predictors are important?

Yes, it is a good thing to try. As I mentioned earlier, you've got to be really careful about overfitting. I tend to work with a lot of these wide problems, say from genomics and other applications, where you might even have many more predictors than you have observations. In such a case, if you were to run predictor screening, say maybe pick the top 10, and then turn right around and fit a new model with those 10 only, you've actually just set yourself up for overfitting if you did the predictor screening on the entire dataset. That's the scenario I'm concerned about. It's an easy trap to fall into, because you think you're just filtering things down, but you've used the same data twice. The danger is that if you then apply that model to some new data, it likely won't do nearly as well.

If you're in the game where you want to reduce predictors, I prefer to do it within each fold of a k-fold. The drawback of that is you'll get a different set every time, but you can aggregate those results: if you've got a certain predictor that's showing up consistently across folds, that's very good evidence that it's an important one. I expect that's what would happen in this case with, say, torque. Even if you were to do this exercise, say, 10 times with 10 different folds, you'd likely get a pretty similar ranking. It's more of a subtlety, but again, a danger that you have to watch out for. JMP can make it a little easier to fall into that trap if you're not careful, just because things are so quick and clean, like you mentioned.

Yeah, that's a very valuable addition to this approach. To accompany this, there's also the other option that we have, particularly when we have already gone through the process of creating a model: we can then again use the prediction profiler and the variable importance feature. It's another way to assess which of the variables have the higher importance. Russ, do you want to say a word on that as well, maybe in contrast to the predictor screening?

Yeah. Honestly, Florian, I like the variable importance route a little better: just dive right into the modeling, again, I would prefer with k-fold, and then you can use the variable importance measures directly, which are often really informative. They're very similar; in fact, predictor screening, I believe, is just calling Bootstrap Forest in the background and collecting the most important variables, so it's basically the same thing. Then follow up with the profiler, which can be excellent for seeing exactly how certain variables are marginally affecting the response, and then drill down even further with Shapley values, to be able to break individual predictions down into their components. To me, it's a very compelling and interesting way to dive into a predictive model and understand what's really going on with it, kind of unpacking the black box and letting you see what's really happening.

Yeah, thanks. I think that's the whole point, making it understandable and making it consumable, besides, of course, actually getting to the results, which is understanding which factors are influencing the outcome. Thanks.

Now, I have one more question, and you've already mentioned it. When we score new data in particular, what can we do to identify which predictors have actually influenced the model outcome? With what we have done so far, we have gained a good understanding of the system, we know which of the factors are the most dominant, and we can even derive operating ranges. But if the system changes, what if a different factor actually drives a failure?

Then, as would be expected in this case, and we talked to Laura beforehand, Shapley values again are a great addition that will help us interpret. We've seen how we can generate them, and you've learned which platforms they're available in. The output that you get when you save out Shapley values includes, for example, a graph that shows the contributions per row. In this case, we have 10,000 rows in the data table, so we have 10,000 stacked bar charts, and we can already see that, besides the common pattern, there are also times when other influencing factors drive the decision of the model. It's really a helpful tool to not only rate an individual prediction, but also, adding on to what Russ just said, to build understanding of the system and which factors contribute.

Moving a little further along this explainability path, we can use these Shapley values in different ways. What I personally liked was the suggestion to plot the Shapley values against their actual parameter settings, because that allows us to identify areas of settings. For example, if we take rotational speed here, we can see that there are areas of this parameter that tend to contribute a lot, both in terms of the model outcome and in terms of the actual failures. That also helps us gain more understanding with respect to the actual problem of machine failure and what's causing it, and also with respect to why the model predicts something.
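That kind of plot is simple to reproduce once the Shapley values are saved out: scatter each row's Shapley value for a factor against the row's actual setting of that factor. Here's a matplotlib sketch, reusing `X` and `phi` from the earlier shap snippet, with one column standing in for rotational speed:

```python
# Shapley value vs. actual setting for one factor: regions of the setting
# that push predictions up or down stand out immediately. (`X` and `phi`
# come from the earlier shap snippet; column 3 stands in for rotational speed.)
import matplotlib.pyplot as plt

j = 3
plt.scatter(X[:, j], phi[:, j], s=8, alpha=0.5)
plt.axhline(0, color="black", linewidth=0.5)
plt.xlabel("rotational speed (actual setting)")
plt.ylabel("Shapley value for rotational speed")
plt.show()
```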

Now, finally, I'd like to answer the question. When we take these graphs of Shapley values, and we have seen it before on several occasions, torque is certainly a dominant factor. But from all of these, I've just picked a few predictions, and we can see that sometimes it's torque, sometimes it's not. With the Shapley values, we really have a great way of interpreting a specific prediction by the model.

All right, those were the things we wanted to show. I hope this gives some good insight into how we can make AI models more explainable, more understandable, and easier to digest and work with, because that's the intention here.

Yeah, I'd like to summarize a little bit. Pete, maybe you want to come in and help me here.

I think what we're hoping to show is that as these AI models become more and more prevalent and are relied upon for decision making, understanding, interpreting, and being able to communicate those models is very important. We hope that with these Shapley values, with the variable importance, and with the profiler, we've shown you a couple of ways that you can share those results and have them be easily understandable. That was the take-home there, between that and being able to utilize model screening and things like that: hopefully you found a few techniques that will make this more understandable and less of a black box.

Yeah, I absolutely agree. Just to summarize, I'd really like to thank Russ and Laura for contributing here with your expertise. Thanks, Pete.

It was a pleasure.

Thanks, everybody, for listening. We're looking forward to having discussions and questions to answer.