Phil Kay 
Okay, so we are going to talk about rethinking the design and analysis of experiments. I'm Phil Kay. I'm the learning manager in the global technical enablement team and I'm joined by Pete Hersh. 
Peter Hersh 
Yeah, I'm a member of the global technical enablement team as well, and excited to talk about DOE and some of the new techniques we're exploring today. 
Phil Kay 
So the first thing to say is, we're big fans of DOE. It's awesome. It's had huge value in science and engineering for a very long time. 

And having said that, there are some assumptions that we have to be okay with in order to use DOE a lot of the time. 

And they don't always feel hugely comfortable as a scientist or an engineer. So things like effect sparsity, the idea that not everything you're looking at turns out to be important 

and actually only a few of the things that you're experimenting on turn out to be important. Or effect heredity is another assumption. 

So that means that we only expect to see complex behaviors or higher order effects for factors that are active in their simpler forms. 

And just this idea of active and inactive effects: commonly, the sequential process of design of experiments is to screen out the active effects from the inactive effects. 

And that just feels sometimes like too much of a binary decision. It seems a bit crazy, I think, a lot of the time, this idea that some of the things we are experimenting on are completely inactive, 

when really, I think we know that everything is going to have some effect. It might be less important but 

it's still going to have some effect. And these are largely assumptions that we use in order to 

get around some of the challenges of designing experiments when you can't afford to do huge numbers of runs. Pete, I don't know if you want to comment on that from your perspective as well. 
Peter Hersh 
Yeah, I completely agree. I think that thinking of something like temperature as being inactive is... it's hard to imagine that temperature has no effect on an experiment. 
Phil Kay 
Yeah it's kind of absurd, isn't it? So, 

yeah, if that's in your experiment, then it's always going to be active in some way, but maybe not as important as other things. 

So I'm just going to skip right to the results of 

what we've looked at. So we've been looking at this autovalidation technique that Chris Gotwalt and Phil Ramsey essentially invented 

and using that in the analysis of designed experiments, and it's really provided results that are just crazy. We just didn't think they were possible. 

So, first of all, I looked at a system with 13 active effects, analyzing a 13 run definitive screening design, and from that I was able to identify that all 13 effects were active, which is what we call a saturated situation. 

Commonly, we've talked about definitive screening designs as being effective in identifying the active effects when we have effect sparsity, when there's only a small number of effects that are actually important or active. But in this case I managed to identify all 13 active effects. 

And not only that, I was actually able to build a model with all 13 of those active effects from this 13 run definitive screening design. 

So, again, that's kind of incredible; we don't expect to be able to have a model with as many active effects as we have rows of data that we're building it from. 

And Pete, you looked at some other things and got some other really interesting results. 
Peter Hersh 
Yeah, absolutely, and Phil's results are very, very impressive. I think 

the next step that we tried was making a supersaturated design, which has more active effects than runs, and we tried this with 

very small DOEs. So a six run DOE with seven active effects, which, if we used standard DOE techniques, there'd be no way to analyze 

that properly. And we looked at comparing that to eight and 10 run DOEs and how much that bought us. So we got fairly useful models, even from a six run DOE, which was 

better than I expected. 
Phil Kay 
Yeah, it's better than you've got any right to expect really, isn't it? 

And so we've got these really impressive results and the ability to identify a huge number of active effects from a small 

definitive screening design and actually build that model with all those active effects. And in Pete's case, he's been able to build a model with seven active effects from really small 

designed experiments. 

So, how did we do this? How does the autovalidation methodology work? Well, it's taking ideas from machine learning, and one of the really useful tools from machine learning is validation. So holdout validation is a really nice way of ensuring that you build 

the most useful model. So it's a model that's robust. So we hold out a part of the data, we use that to test different models that we build, and 

basically, the model that makes the best prediction of this data that we've held out is the model that we go with, and that's just really tried and tested. 

It's actually pretty much the gold standard for building the most useful models, but with DOE that's a bigger challenge, isn't it, Pete? It doesn't really obviously lend itself to that. 
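To make that holdout idea concrete, here is a minimal sketch in Python rather than JMP; the data, candidate models, and split fraction are made up purely for illustration.

```python
# Minimal sketch of holdout validation (illustrative only; the data and the
# candidate models below are hypothetical, and in the talk this is done in JMP).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 6))                          # hypothetical factor settings
y = 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(0, 0.5, 100)   # hypothetical response

# Hold out part of the data...
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=1)

# ...fit several candidate models on the rest...
candidates = {"ols": LinearRegression(), "ridge": Ridge(alpha=1.0), "lasso": Lasso(alpha=0.1)}
errors = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    errors[name] = mean_squared_error(y_hold, model.predict(X_hold))

# ...and keep whichever model predicts the held-out data best.
best = min(errors, key=errors.get)
print(best, errors)
```
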
Peter Hersh 
Yeah, yeah, the whole idea behind DOE is exploring the design space as efficiently as possible. And if we start holding out runs or holding out 

analysis of runs, then we're going to miss part of that design space and 

we really can't do that with a lot of these DOE techniques like definitive screening designs. 
Phil Kay 
Right, right, so it'd be nice if there were some trick where we could get the benefits of this holdout validation and not suffer from holding out critical data. So that brings us to this autovalidation 

idea, and, Pete, do you want to describe a bit about how this works? 
Peter Hersh 
Absolutely, so this was a really clever idea developed by Chris and Phil Ramsey, and they essentially take our original data from a DOE and they resample it, so you 

repeat the results. So if you look at the table here at the bottom of the slide, the runs in gray are the same results as the runs in white. They're just repeated. 

And the way they get away with this is by making this weighting column that is paired. So basically, if one has a high weight, 

the repeated run of that has a low weight, and so on and so forth. And this enables us to use the data with this validation and the weighting, and we'll go into a little bit more detail about how that's done. 
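As a rough illustration of that paired-weighting setup, here is a sketch in Python rather than JMP; the -log(u) / -log(1 - u) pairing is one assumed fractional-weighting scheme used here for illustration, not necessarily the exact scheme JMP applies.

```python
# Sketch of building an autovalidation copy of a DOE table (illustrative;
# in JMP this is done by the add-in or the Make Validation Column platform).
# Each run appears twice, once labelled Training and once Validation, with
# paired weights: when the training copy gets a high weight, its validation
# copy gets a low one.
import numpy as np
import pandas as pd

def make_autovalidation(doe: pd.DataFrame, rng=None) -> pd.DataFrame:
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.uniform(size=len(doe))        # one uniform draw per original run
    train = doe.copy()
    train["Set"] = "Training"
    train["Weight"] = -np.log(u)          # large exactly when u is small
    valid = doe.copy()
    valid["Set"] = "Validation"
    valid["Weight"] = -np.log(1.0 - u)    # small exactly when the training weight is large
    return pd.concat([train, valid], ignore_index=True)

# Usage with a hypothetical design table: av_table = make_autovalidation(dsd_table)
```
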
Phil Kay 
Yeah, you'll kind of see how it happens when we go through the demos. So we've basically got two case studies of simulated examples that we use to illustrate this methodology. So this first case study I'm going to talk through, 

I emphasize it's a simulated example. And in some ways, it's kind of an unrealistic example, but I think it does a really nice job of demonstrating the power of this methodology. 

We've got six factors and, to make it seem a bit more real, we've chosen some real factors from a case study here where they were trying to grow some biological organism and optimize the nutrients that they feed into the system to optimize the growth rate. 

So we've got these six different nutrients and those are our factors. We can add in different amounts of those, so I designed a 13 run definitive screening design to explore those factors with this growth rate response. 

And the response data was completely simulated by me, and it was simulated such that there were 13 strongly active effects. So 

I simulated it so that all of the main effects, all six of the main effects, are active. 

And then, for each of those factors, the quadratic effects are active as well, so we've got six quadratic effects. 

Plus we've got an intercept that we always need to estimate, so there are 13 effects in total that are active, that are important in understanding the system. 

These effects all have a strong signal to noise, but that's still a real challenge to model with standard methodology, and we'll come to that in the demo. 
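As a rough sketch of what a simulation like that could look like (in Python, with hypothetical factor settings, coefficients, and noise level rather than the actual values used here):

```python
# Rough sketch of simulating a response where all six main effects, all six
# quadratics, and the intercept are strongly active. The coefficients and
# noise level are hypothetical placeholders, not the actual simulation.
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for the 13-run definitive screening design settings on 6 factors;
# a real DSD table would be used here instead of random levels.
X = rng.choice([-1.0, 0.0, 1.0], size=(13, 6))

intercept = 10.0
main_coefs = np.full(6, 3.0)    # six strongly active main effects
quad_coefs = np.full(6, 3.0)    # six strongly active quadratic effects

growth_rate = (intercept
               + X @ main_coefs
               + (X ** 2) @ quad_coefs
               + rng.normal(0, 0.5, size=13))   # small noise, so strong signal to noise
```
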

So, really, the question is, can we identify all those important effects and, and if we can, then can we build a model with all those important effects as well? Because as I said, that would be really quite remarkable 

versus what we can do with standard methodology. 

And then case study #2, Pete? 
Peter Hersh 
Yeah, absolutely. Very, very similar to Phil's case study. Same idea, where we're feeding different nutrients at different levels to an organism and checking its growth rate. In this case I simplified what Phil had done and broke it down to just three nutrient factors. And this is 

building a different type of design, so an I-optimal supersaturated design where we're looking at a full response surface 

in a supersaturated manner and we looked at six, eight and 10 run 

designs. And so same idea. 

The effects had a very, very high signal to noise ratio, so we really wanted to be able to pick out those effects if they were active. And just like Phil's, I kept the main effects and the quadratics active, as well as the intercept, and we're trying to pick those out. 

And same idea, so how many runs would we need to see these active effects and how accurate of a model can we make from these very small designs? 
Phil Kay 
Yeah because you know, like I said, you've really got no right to expect to be able to build a good model from such a small design. 
Peter Hersh 
Yeah, exactly. Okay. 
Phil Kay 
So I'll go into a demo now of case study #1. 

And I'm presenting this through a JMP project, so that's a really nice way to present your results. I'd recommend trying this out. 

And that's our 

design, so this is our 13 run definitive screening design, where we vary these nutrient factors, and we have the simulated growth rate response. As I said, that's been simulated such that 

the main effects and the quadratics of all of these factors are strongly active, plus we've got to estimate this intercept. 

Now, with a definitive screening design, I'd generally recommend you use Fit Definitive Screening as a way of looking at the results, as one of the analyses that you can do. 

It works really well when we have this effect sparsity principle being true. So as long as only a few of the effects are active and the rest of them are unimportant, 

then it will find those...the few important effects and separate them from the unimportant ones. 

But in this case I wasn't expecting it to work well and it doesn't work well. It does not identify that all six factors are active. In fact it only identifies one of the factors as being active here. 

So that's not a big surprise, this is too difficult, too challenging a situation for this type of analysis. 

If somehow we knew that all of these effects are active and we try and fit a model with all six main effects, all six quadratics and the intercept, 

then that's a saturated model. We've got as many parameters to estimate as we have rows of data, so we can just about fit that model, but we don't get any statistics. 

And in any case, you know, aside from the fact of I've simulated this data, in a real life situation, we wouldn't know which ones are active, so we wouldn't even know which model to fit. 

Now, using the autovalidation method, I was able to actually very convincingly identify the active effects, and I'll talk through how we did this. 

And this is just a visualization of my results here. You don't necessarily need to visualize it in this way. This is for presentation purposes. 

I was able to identify that first of all, the intercept was active. I've got all my six main effects, 

and my quadratic effects, and then my two factor interactions, which I simulated to have zero effect. You can see they are well down 

versus the other ones. And there's actually a null factor here that we use, so a dummy factor. So anything less than the null factor we can declare as being unimportant or inactive, if you like. 

And the metric we're looking at here is something called proportion nonzero, and I'll explain what that means as we go through this. That's the metric we're using here to identify the strength of an effect, the importance of an effect. 

So a bit about how I went through this. I took my original 13 run definitive screening design and then I set it up for autovalidation, so we've now got 26 rows, where we've duplicated each run. 

And there's an add-in for doing this; one of our colleagues, Mike Anderson, created an add-in that you can use to do this in JMP 15. 

In JMP 16 they're actually adding the capability into the predictive modeling tools, in the Make Validation Column platform. 

And what that does is, we get this duplicate set of our data, and then we get this weighting. And as Pete said, each row is in the training set and in the validation set. 

In the training set, if it has a low weighting, it'll have a high weighting in the validation. So if it has a high weighting in the training set, it'll have a low weighting in the validation set. 

And these weightings have basically been randomly assigned. 

We reassign those, and we're able to iterate over this hundreds of times, fitting the model each time and then looking at the aggregated results over many simulation runs. So what you would do 

is to fit the model and I'm using GenReg here in JMP Pro. 

And you'll need JMP Pro anyway, because you need to be able to specify this validation role, so the train-validation column goes into Validation. 

And the weighting goes into Frequency, and then we set up everything else as we normally would with our response. And then I've got a model, which is the response surface model here with all these effects in it, and then I would click Run. 

And it will fit a model, and we can use forward selection or the Lasso. Here, I've used the Lasso. 

It's not hugely important in this case. 

And what's actually happened is we've identified only the intercept as being important in this case, so we've only actually got the intercept in the model. 

But if we change the weighting, if we go back to our data table and resimulate these weightings, we will likely get a different result from the model. 

When we weight different rows of data, different runs in the experiment, that changes the model that's fit. So we're going to do that hundreds of times over, and what I'm going to do is actually use the Simulate function in JMP Pro. 

And what we do is we switch out the weighting column and switch in a recalculated version of the weighting column. And you can do that a few hundred times. I actually did it 250 times in this case. I'm not going to actually let that run, because that will take a minute or two. 
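For readers who want a feel for what that repeated refitting amounts to, here is a hedged sketch in Python; the weighting scheme, the penalty grid, and the use of scikit-learn's Lasso are assumptions for illustration, and in JMP Pro the equivalent steps are GenReg plus Simulate.

```python
# Sketch of the resampled-weight refitting loop. Each iteration redraws the
# paired fractional weights, fits a Lasso over a grid of penalties using the
# training weights, scores each fit with the validation weights on the same
# rows, and keeps the winning fit's coefficients.
import numpy as np
from sklearn.linear_model import Lasso

def one_autovalidated_fit(X, y, rng, alphas=np.logspace(-3, 1, 30)):
    u = rng.uniform(size=len(y))
    w_train, w_valid = -np.log(u), -np.log(1.0 - u)    # paired, anti-correlated weights
    best_err, best_coefs = np.inf, None
    for a in alphas:
        model = Lasso(alpha=a, max_iter=10_000)
        model.fit(X, y, sample_weight=w_train)          # fit with the training weights
        err = np.average((y - model.predict(X)) ** 2, weights=w_valid)  # validation-weighted error
        if err < best_err:
            best_err, best_coefs = err, np.r_[model.intercept_, model.coef_]
    return best_coefs

def refit_ensemble(X, y, n_iter=250, seed=0):
    rng = np.random.default_rng(seed)
    return np.array([one_autovalidated_fit(X, y, rng) for _ in range(n_iter)])

# Usage with hypothetical names: coefs = refit_ensemble(model_matrix, growth_rate)
# gives one row of parameter estimates per simulated refit, as in the table below.
```
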

Once you've done that, what you'll get is a table that looks like this. 

So now I've got the parameter estimate for every one of those 250 models for each of these effects. So 

in my first model in this run that I did, this was the parameter estimate for this citric acid main effect. In the next model when we resampled the weighting, 

the citric acid main effect did not enter the model, so it was zero in that case. 

And you can actually run distributions on all of these parameter estimates. And one of the things you can do is to 

customize the statistics, the summary statistics, to look at the proportion nonzero. So you can see the intercept here, 

the estimates that we've had of the intercept. You can see with citric acid, a lot of the time it's been estimated as being zero, so in those models the 

citric acid main effect was not in the model, and then a lot of the time it's been estimated as around about 3, which is what I'd simulated it to be. 

So what we look at is the proportion of times that it is nonzero, and we can make a combined data table out of those. And I've already done that, and just done a little bit of... 

a little bit of additional augmentation here. I've just added a column for whether it's a main effect or whatnot, and then that was how I created 

this visualization here. So what you're looking at is the proportion of times each of those effects is nonzero, so the proportion of times that each of the effects is in our model over all those 

250 simulation runs we've done, where we've resimulated the fractional weighting. And that's what we use to identify 

the active effects, and it's done a remarkable job. It's been able to do what our standard methods would not be able to do. It's identified 13 active effects from a 13 run 

definitive screening design. 
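As a small sketch of that proportion-nonzero summary (in Python, assuming a coefficient matrix like the one produced by the loop sketch above; in JMP this comes from running Distribution on the simulated estimates and customizing the summary statistics):

```python
# Sketch of the "proportion nonzero" summary over the ensemble of refits.
import numpy as np

def proportion_nonzero(coefs, tol=1e-8):
    """coefs: n_refits x n_effects array of parameter estimates, one row per refit."""
    return np.mean(np.abs(coefs) > tol, axis=0)

# prop = proportion_nonzero(coefs)
# Effects whose proportion falls below that of the null (dummy) factor would be
# treated as inactive; the rest are declared active.
```
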

Now, what would you want to do next? We maybe want to actually fit that model with all those effects and I've been able to do that. And I'm comparing the model that I've fit here 

versus the true simulated response, and you can see how closely they match up. So I've been able to build a model with all these main effects, all these quadratics and the intercept. 

So I've got a 13 parameter model here that I've been able to fit to this 13 run definitive screening design, which again is just remarkable. 

And I'm not going to talk through exactly how I got to that part. I'll hand over to Pete now. He's going to talk a bit more about this idea of self validated ensemble models. 
Peter Hersh 
Absolutely. Thank you, Phil. Let's see. I'm going to share my screen here and we'll just take a look at this project. So 

you can see here, in the same flow as Phil, we're looking at a project, and I have started with that six run supersaturated DOE. And here you can see I have three factors, 

what my actual underlying model growth rate is, and then what the simulated growth rate was, and then, like Phil mentioned, 

I create this autovalidation column, which can be done with an add-in in JMP 15 that Mike Anderson developed. Or in JMP 16, it's built right into the software, and you can access that under Analyze, Predictive Modeling, Make Validation Column. 

So just like Phil showed, he showed an excellent example of how we can find which factors are active, so factor screening. And that is oftentimes our main goal with DOE, but if we want to take it a step further and build a model out of that, 

we'd go through the same process, right? So we build our DOE, we get an autovalidation column added to that DOE, we build our model, just like Phil showed, using generalized regression and one of the 

variable selection techniques. So Phil Ramsey and Chris Gotwalt have looked at many of these different techniques and they all seem to work fairly well. So whether you're using a Lasso or even a two stage forward selection, they all seem to have similar results and 

work fairly well. So once you set this up and launch it, you get a model, like Phil had shown, and you know some of these models will have 