
SVEM: Machine Learning Applied to DOE (2021-US-45MP-847)

Level: Intermediate

 

Peter Hersh, JMP Senior Systems Engineer, SAS
Trent Lemkus, Doctoral Student, University of New Hampshire
Phil Kay, Learning Manager, JMP

 

We design experiments to understand our processes and systems: to know what input factors we need to control and how we need to control them to consistently achieve our desired outcomes. Statistical design of experiments (DOE) is the most effective way to build models that yield this understanding. With DOE we can maximize our understanding from our experimental effort.

But to use these methods, we often need to accept certain assumptions (including effect sparsity, effect heredity, and active/inactive effects) that don’t always sit comfortably with our knowledge and experience of the domain. Can machine learning approaches overcome some of these challenges and provide a richer understanding from experiments without extra cost? Could this be a revolution in the design and analysis of experiments?

This talk explores the innovative method of self-validated ensemble modeling (SVEM). With animated visuals, you can see how fractional weighting of small data sets enables self-validation in model fitting for designed experiments. You can also see the results of rigorous simulation studies that demonstrate the predictive robustness of ensemble models built using this validation method. Case study examples then demonstrate how scientists and engineers can gain improved understanding from their experiments, without extra effort, using SVEM.

 

 

Auto-generated transcript...

 

Speaker

Transcript

Phil Kay Okay, well, thank you for joining us. I'm Phil Kay. I'm the learning manager at JMP and I'm going to kick off this presentation with an interactive example of SVEM, which is self-validated ensemble modeling.
I'm going to be going into what SVEM is and the mechanism of how it works, and then Pete and Trent will be sharing examples,
case studies and simulations, which really explain why you would want to do this. I'm just going to be presenting this on a
very simple case study, so it may not initially be clear why you'd want to do this, but hopefully you'll understand the mechanism after I've explained it.
Okay, so I'm going to stop my video now and just focus on the software demonstration here.
We've got a JMP journal here, which we're sharing with you, so you can access this from the post in the JMP community at Community.JMP.com.
And the example we will use is a classic DOE example. This is the bounce data, which you can get from the help files in JMP.
So it's three factors, 15 runs. It's a Box-Behnken design and we've got a response there, which is stretch, so really quite straightforward, standard,
Traditional, old-fashioned DOE.
And there's no real challenge with modeling this data; we see, if we fit a response surface model, that there are curvilinear behaviors, quadratic effects, and interactions between factors that turn out to be important.
So let's see how we might model this using the SVEM approach.
So if we're going to model this using SVEM, the first thing that you'll notice is that we have 30 rows of data, so we've actually duplicated each row of data, which may seem a little bit crazy. It may seem like cheating, but this is a key trick in the SVEM methodology.
You'll see we've got some additional columns here, and the key one is this validation weight. So let's just plot that so you can see what it means.
So you can see that, for each of our rows of data, each of our runs, we've actually got pairs of runs, so run number one, you can see here.
We're actually weighting each of these runs into either the training set (coded as zero) or the validation set (coded as one). And you can see that for each pair
those weightings are inversely proportional. So, for example, run number seven here, you can see that it's got a heavy weighting in the training set and correspondingly a low weighting in the validation set.
Now those have been randomly assigned and we can...
by clicking on here, we can redraw those, so we can reassign the weights. And again if a row has a heavy weight in the training set, so for example, row number six here has a heavy weight in the training set, it will have a correspondingly low weighting in the validation set.
So we're doubling up our rows of data here.
And we're going to use this for self-validation. So what we can actually do is use a variant on hold-back validation, where
instead of holding back a complete row of data into the validation set, what we're going to do is hold back portions of each row of data, such that each row can actually be within both the training set and the validation set. And, importantly, we can reassign those weights
repeatedly, which we will do when it comes to the ensemble modeling.
Okay, so just to summarize: we've basically duplicated each row of data, and we're assigning weights to those such that some rows sit more in the training set and some sit more in the validation set.
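For readers who want to see the mechanics outside of JMP, here is a minimal sketch of this weighting scheme in Python. It is not JMP's implementation, and the uniform draw is an illustrative assumption; the essential property is only the one described above, that a heavy training weight for a run is paired with a light validation weight for its duplicate.

```python
# Minimal sketch (not JMP's implementation) of the duplicated-rows table with
# paired training/validation weights. The uniform draw is an illustrative
# assumption; what matters is that the two copies of a run get anti-correlated
# fractional weights.
import numpy as np
import pandas as pd

def make_autovalidation_table(runs: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Duplicate each run and attach anti-correlated fractional weights."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(runs))              # one draw per original run
    train = runs.copy()
    train["Validation"] = 0                      # 0 = training copy
    train["Weight"] = u
    valid = runs.copy()
    valid["Validation"] = 1                      # 1 = validation copy
    valid["Weight"] = 1.0 - u                    # inversely related weight
    return pd.concat([train, valid], ignore_index=True)

def redraw_weights(table: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Reassign the paired weights, as the 'redraw' step does in the demo."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(table) // 2)
    table.loc[table["Validation"] == 0, "Weight"] = u
    table.loc[table["Validation"] == 1, "Weight"] = 1.0 - u
    return table
```

Because the two copies of each run share one random draw, a run that is weighted heavily toward training is automatically weighted lightly toward validation, and vice versa.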
So now we can fit a model using validation.
So we're using GenReg here, and you can see that we're using a validation set (and just ignore the error message there).
We're using the Lasso and we're using validation for model selection. And, in this case, it's actually not selected any active effects in the model. So it's fit a response surface model, we've used the validation as the method of selection, using the Lasso,
and we've determined that no effects are actually active, so we have a model with effectively no active effects.
What we can then do is redraw those weights again, so we can reweight our rows.
And we can see, this time we get a different model. So actually some of the effects now become active: silica is active, as is the interaction between silica and sulfur.
Let's save that model.
So now, with our
data table, you can see we've got a saved model formula column there.
Let's redraw those weights again. So again we're redrawing them, such that some of the rows are more heavily in the training, some of them are more heavily in the validation.
And we got a different model again, so let's save that now. So we now have
two columns with saved prediction formulae.
And we can do that again. So I'm going to redraw the weights and save the prediction formula. I've just got a script that does that for us. And actually then creates this model average, so we've actually averaged our three saved models.
So each individual model used a different validation weighting.
Each model was different, some were better than others. The key point is that when we average of those together,
the average of those is the best model so far. So you can see this is actually looking quite close to the original model that we fitted using standard methods on
the original data set.
And we can keep doing this. We can do this 10 times over, so we're going to repeat that. And you can see redrawing the weights; each time we get a slightly different model.
And we can save each of those different models.
And ultimately end up with an average that is probably the best model.
And, in fact, if we compare that against the original model that we saw from the standard analysis of this data,
you can see the two models are actually in pretty close agreement.
So in this case, using this self-validated ensemble modeling has really arrived at the same model as we would get using standard
linear regression.
So I want to go over the key points in what you've seen there. And as I say, we're going to go on to Pete and Trent, who are going to explain the motivation for wanting to use this novel modeling technique.
What we did was we duplicated the original rows of data.
Then what that means is, we can apply a random weighting to say how much of each row pair is in training versus validation.
This enables us to use validation, which we wouldn't otherwise be able to do for such a small data set. It wouldn't make sense to hold out a complete row of data, but what we can do with this duplicated set
is to have weighted validation. So we fit a model using Lasso selection in this case, and that Lasso selection made use of this weighted validation column.
Then we redraw those weights. We randomly assign different weights again to those row pairs and we refit the model. And, in our case, we repeated this 10-15 times, but in practice to get a good self-validated ensemble model, we would want to repeat this process 100 or more times.
Each individual model at each step of that iterative process is not particularly useful on its own, but the model average, the ensemble, is, as we saw with the final model we achieved as the average of all of those iterations.
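To make that summary concrete, here is a minimal Python sketch of the same loop, using scikit-learn's Lasso as a stand-in for the GenReg Lasso fit shown in the demo. It illustrates the idea rather than JMP's implementation; the weight scheme and alpha grid are assumptions, and sample_weight support in Lasso.fit requires scikit-learn 0.23 or later.

```python
# Conceptual SVEM loop: fractionally weighted Lasso fits, each selected by
# weighted validation error, averaged into one ensemble prediction. A Python
# stand-in for the GenReg workflow, not JMP's implementation.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

def svem_predict(X, y, X_new, n_boot=100, alphas=None, seed=0):
    """Average n_boot fractionally weighted Lasso fits into one SVEM prediction."""
    rng = np.random.default_rng(seed)
    alphas = np.logspace(-3, 1, 30) if alphas is None else alphas
    expand = PolynomialFeatures(degree=2, include_bias=False)   # full RSM terms
    Z, Z_new = expand.fit_transform(X), expand.transform(X_new)
    preds = np.zeros((n_boot, len(Z_new)))
    for b in range(n_boot):
        u = rng.uniform(size=len(y))
        w_train, w_valid = u, 1.0 - u            # anti-correlated weight pair
        best_sse, best_model = np.inf, None
        for a in alphas:                         # selection by weighted validation SSE
            model = Lasso(alpha=a, max_iter=50_000).fit(Z, y, sample_weight=w_train)
            sse = np.sum(w_valid * (y - model.predict(Z)) ** 2)
            if sse < best_sse:
                best_sse, best_model = sse, model
        preds[b] = best_model.predict(Z_new)     # one (possibly poor) member model
    return preds.mean(axis=0)                    # the ensemble (SVEM) prediction
```

Any single pass through the outer loop may select nothing or fit poorly, just as in the demo; it is the average over the draws that gives the stable prediction.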
So, now I want to hand it over to Pete and he's going to talk through a case study now, which actually illustrates why you would want to use this unusual method for modeling data like this.
Peter Hersh Great thanks, Phil. I'm gonna go ahead and share my screen here.
And let's walk through an example about when we might
use this method. So here I'm
looking at the results first, so we're looking at a few designs that are very small DOEs: a six-run, an eight-run, and a 10-run DOE, with three factors.
And these are response surface designs that are supersaturated to saturated in nature, and they're all I-optimal designs, so these are very small numbers of runs
to accomplish our goal here. So this is where we're going. So let's back up and talk about the design and then how we got here.
So.
This is a case study where we're trying to maximize the growth rate of a microbe by adjusting different nutrient combinations. And like I mentioned before, we're using a supersaturated I-optimal design and we're adjusting the number of runs.
And this is simulated data, where all of the main effects are active, as well as all the quadratics.
So how this was set up: we started with our three factors and I designed my model. If you haven't designed a saturated or supersaturated design in JMP before,
the way to do that is to set the estimability of the effects that aren't main effects to If Possible, and that allows you to have fewer runs than, or the same number of runs as, you have model effects. And so then I generated my designs there.
And went through this process that Phil so beautifully described there of resampling the data and getting those inversely proportional weights, which are key to this process.
We have a couple of talks that have been done and I just have QR codes to those.
This technique was developed by Chris Gotwalt and Phil Ramsey and they have introduced the technique in a few earlier talks. And this last
QR code was for an add-in that sets up this validation column in older versions of JMP. Starting in JMP 16, you can do that natively in the software, but if you have JMP 15, you would need to use this add-in to set it up.
And like Phil said, so an
individual model might not fit the data very well, some will fit okay, some won't have any terms.
But this is just walking through the example here, where you can see each individual model: some are just an intercept, some are pretty poor fits, and the occasional model does pretty well.
And so this is like Phil described, each of these has a different recalculated weight. And so we're getting a slightly different model and we're going to go ahead and save those out. So that's the first step there.
And then the next step is we averaged those models. And as you can see, as you average more and more models, the model fit gets better and better. So the individual models aren't great, but the overall average model gives us that good fit.
And the last step which I wanted to talk about was that these models are biased by nature. As you saw with the first model Phil made, there were no active effects in there. So
when you're using some model or variable selection technique, you'll occasionally get a model that a specific variable doesn't enter.
So, if we look over here at these distributions, you can see
all of these are active effects, and they have a distribution around 5, which is the effect size, but they also have this bar down here at 0.
So we're not going to overestimate the effect often enough to counterbalance the models where it comes out as 0, so the estimate is going to be biased toward a lower value, so we're going to do a
little trick to debias these models, where we
treat this model as a predictor in just a least squares model, and that will remove that bias, and I'll walk through this process to show you how that's done.
But in the end we get that very nice result that I shared, with a very small number of runs. So that's the motivation for why you'd use SVEM: the ability to predict and make a decent model with fewer runs than you could without using this technique.
Alright, so let's hop over to JMP here
and
walk through that process.
Okay, so here's the six run DOE of that example that I showed, where we have our three factors and our growth rate. And this is after we've built the model.
This is what it looks like after we've gone through and run it. So we have our six runs with our six results. Our next step is to create that validation column, and to do that inside of JMP, we will just go to Analyze, Predictive Modeling,
Make Validation Column, and then instead of a validation column, we're going to make an auto-validation table.
Okay, and so that's how I've done this. I've already gone through and done that, so I won't repeat the process.
But here's that data with its resampled rows and that inversely correlated weight, like Phil demonstrated: you have a high weight coupled with a low weight for the repeated run. After we've done that, we go through and make our model.
In this case, I've used a GenReg model, and you want to select one of the variable selection techniques. So if I go ahead and run this, I want to do something like forward selection, the Lasso,
or one of those selection techniques. And Trent's going to go into more detail about those selection techniques and which ones they've investigated.
So after I've run the model, you see that JMP's gone through and selected some effects that are active. It's caught some of them, but you can see here
the biasing that we're seeing, where both of these main effects should be active, but they've come out as 0. So when I save this to the data table,
I'll create a new column like Phil showed and then I'll recalculate that weight and do it over and over again. And here's a simple script to do this 100 times, and
we'll be happy to share this. And it's very easy to just copy and paste your model script into here and be able to reproduce what we've done here.
Okay, so here's the end result, where we've gone through with our DOE, our results, and then we have
the 100 models, which I have saved
and hidden here. And then the final model, which is
an average of all 100 of those models. And the last step that we want to do is that debiasing step. So all we're going to do is treat this
average model as a predictor for growth rate.
So here, you can see the original model we have
has a much steeper slope, so it's got a decent R squared, but it's not as good as we would hope. I mean, even with six runs this is a pretty good fit, but
if we can remove that bias, maybe we can do a little better. So real simple setup. We just take the growth rate. Our response is our Y. We add the
ensemble model here, the average of all of those
individual models, as our predictor. We run this and we get a result, and you can see it's a much better actual-by-predicted plot. And if we run a comparison there, you can see that removing that bias has a big effect in the positive direction.
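The debiasing step described here is just a calibration fit: regress the observed response on the ensemble prediction with ordinary least squares, then pass new predictions through the fitted line. Here is a minimal sketch, a Python stand-in for the least squares fit shown in JMP, with hypothetical variable names.

```python
# Minimal sketch of the debiasing step: calibrate the shrunken ensemble
# predictions by regressing the observed response on them with ordinary
# least squares. A Python stand-in for the least squares fit shown in JMP.
import numpy as np

def debias(y_observed, ensemble_pred_observed, ensemble_pred_new):
    """Fit y = b0 + b1 * ensemble_pred, then rescale new predictions through it."""
    A = np.column_stack([np.ones_like(ensemble_pred_observed), ensemble_pred_observed])
    (b0, b1), *rest = np.linalg.lstsq(A, y_observed, rcond=None)
    return b0 + b1 * np.asarray(ensemble_pred_new)   # debiased predictions
```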
Right and so before I hand it back to Trent, I wanted to show one more
slide here, which is basically just a summary:
we highly encourage you to use
and check out this technique, but it's not a magic technique that will always work in every situation. But when you're checking this out on screening designs and things like this, it could be a very nice technique that will
help you accomplish things that you might not have been able to with standard modeling techniques. And with that, I'm going to pass it to Trent, who has
a little bit more detail about how this is done and the
information there. So Trent, you want to take it away?
Trent Lemkus Yes, thank you, Peter. I appreciate that.
So
I am sharing my
PDF. I just want to make sure that it is in fact sharing and everyone can see it.
Peter Hersh Yep.
Trent Lemkus Okay fantastic. So
as you saw, Phil and Peter presented case studies, which I think are phenomenal anecdotes that show the power of SVEM and when and why to use it.
What we did for our research paper, which is currently under peer review, was to conduct a large number of simulation studies to see empirically how SVEM would perform in a predictive setting. So
next I have a series of slides that I'm going to show. I know it may come across as a little boring and dry, but I think it's imperative just to discuss the details
before I go into something a little more entertaining, which is an example of
how we used JMP and JSL and how I ran the simulations, to show you what some of the output looks like, as well as the results.
So, first and foremost, the outline and the goals of our simulation studies were primarily these:
we wanted to create a real-world simulation architecture. So we wanted our simulations to mimic real-world industrial DOE as closely as humanly possible.
And secondly, we wanted to see how contemporary model building or contemporary model selection techniques, as well as SVEM perform when primarily focused on pure prediction,
plain and simple. So we wanted to see when we fit a model to a design, how does it predict on a truth or a holdout set of data, or excuse my accent, data.
And from this, if we can simulate these unique runs as many times as is computationally feasible, we want to infer real-world outcomes as closely as possible, so that we can
prescribe settings for where and how to use SVEM, since SVEM is so nascent.
So, first and foremost, what designs did we use in our simulations? We used definitive screening designs
and we used Box-Behnken designs, where the underlying true generating model in our designs was a full quadratic model, more colloquially known as a second-order model. And the reason we chose these designs is because, A) for the definitive screening designs, the
more main effects you have, the more saturated and then supersaturated the designs become, meaning that you have far fewer degrees of freedom the more main effects there are. Whereas with your Box-Behnken designs, with the main effects we used,
you don't run into that scenario. So we have a juxtaposition between two scenarios, where on one hand we have
adequate degrees of freedom with our Box-Behnken designs to estimate our coefficients when applying a model selection technique, and in the definitive screening design case we have,
in the majority of our cases, supersaturated designs, so we had an inadequate number of degrees of freedom. So, in other words, simply put,
we couldn't estimate all of the effects when using contemporary model building techniques, like your forward selections, your Lassos, your Dantzig selectors, and so on,
which leads us to what actual modeling algorithms we used for our simulations. So for the SVEM models, keeping in mind that
if you don't have a good understanding of what SVEM is, it's very important that you go back to some of the previous presentations that Peter was referring to.
But we have to use SVEM with an underlying model building technique, like forward selection, pruned forward selection, and the Lasso, which is what we used for our simulation studies.
In our non-SVEM model cases, we used those same model building algorithms, but we also incorporated the Dantzig selector, which is quite popular in DOE,
as well as the Fit Definitive Screening approach, which is germane exclusively to definitive screening designs; it isn't used on other designs.
Furthermore, we used an information criterion called the AICc, or corrected AIC, which is very common in DOE and is actually quite notorious for selecting very parsimonious models.
So, as you can see, we're trying to incorporate contemporary techniques that are very much current in the use of industrial DOE, which is sticking to the core point that I showed on the initial slide, which is
that we want to mimic the real world as closely as humanly possible with our simulation studies,
so we can empirically assess how these different model building algorithms perform predictively. And hopefully we can show that SVEM matches or outperforms its competition empirically.
So for the actual simulation structure, we ran 1,000 simulations for each model building algorithm. So for instance, if we were fitting a forward model with AICc,
for every single one of those 1,000 runs we would generate a true model, the truth, and we would then add noise to it,
Gaussian noise (which is very common in academia and when performing research),
thereby creating a training set, if you will, and a truth set, so that, once we fit a model, we could see how that model predicts against the truth. And we rerun and regenerate those data sets randomly 1,000 times.
The SVEM hyperparameter, which I won't go into in detail, was an nBoot of 200. Simply put, we were basically fitting 200 models and averaging those models to yield our final SVEM model.
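As a rough illustration of what one cell of such a study looks like, here is a stripped-down Python sketch. The random full-quadratic "truth", the coefficient scale, the noise level, and the RMSE metric are placeholders rather than the paper's settings, and svem_predict refers to the earlier sketch, not the authors' actual JSL harness.

```python
# Stripped-down sketch of one simulation cell: draw a true second-order model,
# add Gaussian noise on the design to create training data, fit SVEM, and score
# the predictions against the noiseless truth on a holdout set. All specific
# settings here are placeholders; svem_predict is the earlier conceptual sketch.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def run_simulation_cell(design, holdout, n_sims=1000, n_boot=200, sigma=1.0, seed=0):
    """Average holdout RMSE of SVEM over n_sims randomly generated truths."""
    rng = np.random.default_rng(seed)
    expand = PolynomialFeatures(degree=2, include_bias=True)    # second-order truth
    Z_design, Z_holdout = expand.fit_transform(design), expand.transform(holdout)
    rmses = []
    for _ in range(n_sims):
        beta = rng.normal(scale=2.0, size=Z_design.shape[1])    # random "true" model
        y_train = Z_design @ beta + rng.normal(scale=sigma, size=len(Z_design))
        truth = Z_holdout @ beta                                # noiseless holdout response
        pred = svem_predict(design, y_train, holdout, n_boot=n_boot,
                            seed=rng.integers(1 << 31))
        rmses.append(np.sqrt(np.mean((pred - truth) ** 2)))
    return float(np.mean(rmses))
```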
The designs were also iterated over several features, such as the number of main effects:
for definitive screening designs, we had four main effects and eight main effects. For our Box-Behnken designs, we had
three main effects and five main effects. What that means, just to give you an example, is that for our DSD where K is eight, you have eight main effects. If you have a full quadratic model,