Speaker

Transcript

Philip J. Ramsey 
Hello, this is Phil Ramsey from the University of New Hampshire and Predictum. Along with my co-presenters, Wayne Levin of Predictum and Marie Gaudard, Emeritus Professor from the University of New Hampshire and Predictum, we're going to present a talk on modern mixture designs. 

And the agenda for the talk is we're going to talk about machine learning and design of experiments, briefly introduce what are mixture designs, 

talk about a new machine learning method for DOE called autovalidation, which is combined with ensemble modeling to create something we call self-validating ensemble modeling methodology. 

And we'll then talk about SVEM, which is self-validating ensemble modeling, and we'll do a demonstration using a mixture experiment and then we're going to talk about 

the SVEM method and space-filling designs. And Wayne Levin will be talking about an add-in that Predictum has developed to do SVEM and he'll also be doing the demonstration of the urea experiment. 

And I want to point out that recently, in fact this year alone, we've already had three, and that will soon be four, papers published in journals on the topic of 

DOE and machine learning, so people have discovered that machine learning methods traditionally applied to big data 

may have a lot of applicability to DOE, and we certainly agree. And we're going to show how they can be used effectively with mixture experiments. But one of the traditional limitations in the application of so-called machine learning 

to smaller data sets like DOE is the belief that one had to have very large data sets. This is actually not entirely true. 

And as you're going to see, the SVEM method that has been developed by a number of people, including myself and Dr Chris Gotwalt from JMP, 

actually makes it quite feasible to apply these traditional big data methods to the much smaller data sets we find with design of experiments. 

And also machine learning is really all about prediction. We divide statistical modeling up into two approaches. One is called explanatory, which is the classic approach; it focuses on hypothesis testing, p-values, and parameters. 

And that usually does not lead to good predictive models, but machine learning focuses entirely on prediction. 

So that is our point of view, and that is what we're going to use today and apply it to mixture experiments. We're going to use machine learning 

for prediction. Our focus is not on explanation and, as you will see, we are going to be able to apply these predictive machine learning methods to the small design of experiments 

data sets. Well, traditionally the way we evaluate predictive models is by having a portion of the data used to train, so we call it the training set, a very common method. 

And then we have another set that in some way we've held back. There are a number of strategies. We'll only talk about one, called hold back. 

We set it aside, we fit the model to the training set and then the hold back set, or validation set, allows us to evaluate the prediction capability. In other words, we're predicting data that's not being used to fit the model. 
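As a rough illustration of the holdback strategy just described, the Python sketch below randomly sets aside a fraction of the rows for validation. The function name `holdback_split`, the 25% holdback fraction, and the 1,000-row data set are assumptions made for this example; they are not details from the talk.

```python
import random

def holdback_split(n_rows, holdback_fraction=0.25, seed=0):
    """Randomly split row indices into a training set and a holdback (validation) set."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_holdback = max(1, round(n_rows * holdback_fraction))
    validation = sorted(indices[:n_holdback])   # rows never used to fit the model
    training = sorted(indices[n_holdback:])     # rows used to fit the model
    return training, validation

# Hypothetical "big data" case: plenty of rows to spare for validation.
training_rows, validation_rows = holdback_split(n_rows=1000)
```

The model would be fit on `training_rows` only and scored on `validation_rows`, which works comfortably when there are many rows to spare.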

However, this strategy won't work with design of experiments. They're very efficient and information centric, 

and there typically are not sufficient resources or budget to develop validation sets. Rarely can people do additional runs that could be used for validation. 

So, this would appear to be a barrier to using machine learning for predictive modeling but, as we'll show, this SVEM method gets us around this constraint. 

And I just, before we get into SVEM, want to quickly give you a discussion of mixture designs. I know some of you are familiar with them, some are not. 

These are a special type of experimental design. They're not as common as they should be, because people don't know about them, but it refers to the experimentation with a formulation or recipe, and the components of these formulations can literally be liquids, gases or solids. 

And one of the important concepts is that the impact on the responses depends only on the relative proportions of the components. 

So we're very much concerned about the proportions of our components, as you would, say, in a recipe. We are not focused on amounts. If the total amount present were important, this leads to another type of experiment called a mixture amount, which is outside of today's scope. 

But because these are components of a formulation, there are constraints on these factors. In other words, the total amount has to be constrained; sometimes that happens naturally. 

But also the proportions have to be constrained. So in a mixture experiment, unlike a factorial experiment, you cannot independently change the factors. 

So in traditional DOE that many people are familiar with, there are no constraints on the joint settings of the factors. In some cases there are, but typically there are not. 

In mixture experiments, on the other hand, since the factors are ingredients of some formulation, the proportions have to sum to 1 or 100%. So if I had q factors, then the settings of those q experimental factors for every trial have to sum to 1. Furthermore, 

the setting for any one individual component has to be somewhere between 0 and 1. I'll also point out, and we will not have time to discuss this today, 

an important variant on mixture designs is called mixture process experiments, in which we combine mixture factors with process factors. 

We do this because of the behavior of mixtures. For instance, think of something like a three-part adhesive: the behavior changes dramatically with the processing conditions. So we might look at things like pressure and temperature for curing an adhesive, and the optimal 

recipe typically changes as a function of those process factors. That is out of scope for today's talk, but it is an important aspect of mixtures. 
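The two mixture constraints mentioned a moment ago (every proportion between 0 and 1, and all proportions summing to 1) can be checked with a few lines of Python. This is only a sketch; the function name `is_valid_mixture`, the tolerance, and the example runs are hypothetical.

```python
def is_valid_mixture(proportions, tol=1e-9):
    """Return True if every proportion lies in [0, 1] and the proportions sum to 1."""
    in_range = all(-tol <= p <= 1.0 + tol for p in proportions)
    sums_to_one = abs(sum(proportions) - 1.0) <= tol
    return in_range and sums_to_one

# Hypothetical three-component runs.
valid_run = is_valid_mixture([0.5, 0.3, 0.2])      # proportions sum to 1
invalid_run = is_valid_mixture([0.5, 0.3, 0.3])    # proportions sum to 1.1
```

Note that once two of the three proportions are chosen, the third is fixed, which is why the factors cannot be varied independently.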

So, historically, when people analyzed mixture experiments, and even when they created them, the focus was on explanatory goals, that is, 

hypothesis testing on parameters, p-values, and confidence intervals. However, for mixtures in general, this has always been problematic because mixtures inherently have a lot of 

multicollinearity, or correlation among the factors and effects. They have to, because the total amount is constrained: the proportions have to sum to 1, so they cannot be independent by design. However, 

we disagree with this traditional focus and with a lot of experience that we have with mixture experiments, the goal is invariably, frankly, prediction. 

It's not explanation. We're trying to predict performance of mixtures or recipes and that's what the scientists or engineers are interested in. 
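To make the multicollinearity point concrete, here is a small Python sketch. It draws random three-component mixtures, shows that two component columns are negatively correlated, and confirms that the components always add up to a constant (so they are linearly dependent on an intercept column). The sampling scheme, the seed, and the helper names are assumptions for illustration only.

```python
import math
import random

def random_mixture(rng, q):
    """Draw q proportions uniformly from the simplex (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(q)]
    total = sum(draws)
    return [d / total for d in draws]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
runs = [random_mixture(rng, 3) for _ in range(200)]

# Raising one component forces the others down, so the columns are negatively correlated.
corr_12 = pearson([r[0] for r in runs], [r[1] for r in runs])

# Exact linear dependence: x1 + x2 + x3 equals 1 in every run.
max_dependence_error = max(abs(sum(r) - 1.0) for r in runs)
```

Because of that exact dependence, the usual intercept-plus-main-effects model cannot be estimated for mixture factors, and tests on individual coefficients are hard to interpret.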

So there has been a big disconnect between what the scientists need and how we've analyzed mixtures. Also a lot of the traditional designs are what we would call boundary point designs. 

If you're familiar with the term D-optimal, they tend to be very D-optimal, that is, 

most of the points are assigned around the boundaries, we have little interior information. There are very few points, sometimes none 

on the interior of the design region. Why is this important? Well, mixture systems, especially in chemical and biological applications, 

tend to be very complex. There are very complex kinetics going on, and these response surfaces have very complicated shapes. 

And if we don't have a good deal of information on the interior of the design region, then very likely we are not going to come up with models that really represent the behavior of the system. 

And this has actually happened quite frequently in the history of the use of mixture designs. Also again, we won't have time to talk about it today, 

a classic approach to analysis or modeling in the past uses what are called Scheffé polynomials, 

and these are a type of polynomial designed to be used with mixtures. There are many variants of them, but these mock polynomials are frequently, especially in their classic forms, inadequate to model these response surfaces. 
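For readers who have not seen one, the quadratic Scheffé model has no intercept: it uses one linear blending term per component plus one cross-product term for each pair of components. The Python sketch below builds those terms for a single run; the function name and the example proportions are hypothetical.

```python
from itertools import combinations

def scheffe_quadratic_terms(proportions):
    """Model terms for a quadratic Scheffe polynomial (no intercept).

    Returns the linear blending terms x_i followed by the pairwise
    blending terms x_i * x_j for i < j.
    """
    linear = list(proportions)
    cross = [a * b for a, b in combinations(proportions, 2)]
    return linear + cross

# A three-component run yields 3 linear terms and 3 cross-product terms.
terms = scheffe_quadratic_terms([0.5, 0.3, 0.2])
```

With only six terms for three components, a model like this can describe gentle curvature but not the complicated response surfaces described above.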

So how do we approach the problem? Well, we believe the way we need to approach mixture designs is from the point of view of space filling designs. 

And why space filling designs? Again that could be a talk by itself, but these are designs that are created to cover a design space as best one can for a given number of design points. 

And the number of design points 

is an option for the user. This can actually allow us to use smaller designs than 

classic designs, and they still cover the design region. This, as a result, can give us more accurate and useful predictive models. We have better information over the whole design region. 
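One simple way to picture a space-filling mixture design is a greedy maximin rule: generate many random candidate mixtures, then repeatedly add the candidate that is farthest from the points already chosen. The Python sketch below does that. It is an illustration only, not the algorithm JMP uses, and it ignores component bounds such as those in the urea example; all names and settings are assumptions.

```python
import math
import random

def random_mixture(rng, q):
    """Draw q proportions uniformly from the simplex (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(q)]
    total = sum(draws)
    return [d / total for d in draws]

def maximin_design(n_runs, q, n_candidates=500, seed=0):
    """Greedily pick n_runs candidate mixtures that stay far apart from each other."""
    rng = random.Random(seed)
    candidates = [random_mixture(rng, q) for _ in range(n_candidates)]
    design = [candidates.pop(0)]  # arbitrary starting point
    while len(design) < n_runs:
        # Choose the candidate whose nearest already-selected point is farthest away.
        best = max(candidates, key=lambda c: min(math.dist(c, d) for d in design))
        candidates.remove(best)
        design.append(best)
    return design

# The user chooses the number of runs; 15 three-component runs here.
design = maximin_design(n_runs=15, q=3)
```

The number of runs is an input, which reflects the point that the design size is an option for the user. A greedy rule like this still places some points near the edges, so treat it as a picture of the idea rather than a recommended design generator.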

And our approach to mixtures is all based upon the use of space-filling designs, and this is a trend we've noticed in industry in recent years. 

I now want to shift gears and talk about...well, given we've created the mixture, we've run it, well, how do we analyze it? 

And I mentioned earlier, one of the problems with machine learning applied to DOE is we don't really have a natural validation set. Well, Gotwalt and Ramsey, that would be me and Dr Chris Gotwalt, actually proposed a method of validation that we've referred to as autovalidation. And 

the first talk on this was actually given at a Discovery Europe conference some years ago. 

And this will not seem intuitive, but hear me out. We can use the original data, the original experiment for training and for validation. In other words, we take the original data and create a copy or clone of it, 

and we refer to this as the autovalidation set. So every observation in the training set has a twin in the autovalidation set itself. 

Now by itself, this would seem nonsensical. Since the autovalidation set has the same observations as the training set, how does that supply an independent assessment of prediction capability? Well, it can't by itself. 

The key to the idea of autovalidation is that we apply a weighting scheme to the observations, and the scheme is based upon generating weights from a gamma distribution. 

By the way, the gamma distribution is commonly used for weighting in statistical applications. The gamma weights have a number of nice properties that, again, we will not get into today. 

And what we do is we generate a set of weights, but we do it in a special way. If an observation in the training set gets a high value, a high weight, 

the observation in the autovalidation set, the twin, gets a low weight. Why do we do this? This drives anticorrelation between the two data sets. 

In other words, we're trying to uncorrelate the sets using a weighting scheme, and we now have a good deal of research that has been done on this in the last few years. It indicates this approach is very effective. 

And there will be a paper coming out shortly in which all these results will be made available. So the idea is to use this weighting scheme, so we take the 

training set, we assign the gamma weights, we have the autovalidation set, and we assign corresponding weights to each of the twins, so if the training set has a high weight, 

the twin in the autovalidation set gets a low weight, and this drives this anticorrelation behavior. And then we simply repeat this strategy some number of times to be specified by the user. So in the next slide, I show an example of a simple experiment. We have three factors. Notice 

runs 1 to 7 are the training set, the original. Runs 8 to 14 are the clone or the copy. Notice the observations are exactly the same values for the two sets and then notice the column called 

Paired FWB Weights. I'll explain FWB in a moment. 

And notice that if an observation, like observation 1, gets a high weight, its twin, observation 8, gets a low weight. Observation 8 is in the validation set. 
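One way to generate paired weights of this kind is to draw a single uniform random number per run and transform it two ways, so that a large training weight automatically comes with a small validation weight. The Python sketch below uses exponential weights, which are gamma weights with shape 1. The talk says only that the weights come from a gamma distribution, so the exact recipe here is an assumption, as are the function and variable names.

```python
import math
import random

def paired_fractional_weights(n_runs, rng):
    """Return (training_weights, validation_weights), one anticorrelated pair per run."""
    training, validation = [], []
    for _ in range(n_runs):
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep both logarithms finite
        training.append(-math.log(u))          # large when u is near 0
        validation.append(-math.log(1.0 - u))  # small when u is near 0
    return training, validation

# Seven runs, matching the size of the small example on the slide.
train_w, valid_w = paired_fractional_weights(7, random.Random(0))
```

Each weight on its own follows an exponential distribution, but within a pair the two weights move in opposite directions: whenever one is above ln 2, the other is below it.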

So this idea of fractional weighting is important, so the analysis is done applying the fractional weights to the response. We then repeat this strategy 

some number of times (and the number of times will depend upon the user and the scenario) and we fit the model each time. In other words, we're basically doing what we call fractionally weighted bootstrapping. 

And what is fractionally weighted bootstrapping? It's basically a form of bootstrapping, also known as generalized bootstrapping, 

in which we do not bootstrap the observations. We bootstrap weights applied to observations. This is a well known methodology and we're applying it. 

What's different is, we apply this repetitively, as I said, it could be hundreds of times, thousands of times, it's really up to the user. 

Each time we fit a model, and then at the end, what we do is we take an average, or ensemble. And this is where we get the term 

self-validating ensemble modeling: autovalidation is the self-validation part. We're taking averages 

and ensemble modeling, by the way, is very common in machine learning and deep learning because it tends to lead to more stable and better predictive modeling. 

So, SVEM does fractionally weighted bootstrapping to generate a family of models and then uses model averaging to come up with a final predictive model. 
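The loop just described can be sketched in plain Python. To keep the example self-contained, the base learner is a weighted ridge regression on quadratic Scheffé terms rather than the neural networks used in this talk, the paired weights are exponential (gamma with shape 1), and the design and response values are made up. All function names, penalties, and data are assumptions for illustration; this is not the Predictum add-in.

```python
import math
import random
from itertools import combinations

def scheffe_terms(x):
    """Quadratic Scheffe terms: the proportions plus all pairwise products."""
    return list(x) + [a * b for a, b in combinations(x, 2)]

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def weighted_ridge(rows, y, weights, penalty):
    """Weighted ridge regression: solve (X'WX + penalty * I) b = X'Wy."""
    p = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             + (penalty if i == j else 0.0) for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights)) for i in range(p)]
    return solve(xtwx, xtwy)

def predict(coefs, mixture):
    return sum(c * t for c, t in zip(coefs, scheffe_terms(mixture)))

def svem_fit(mixtures, y, n_boot=50, penalties=(1e-6, 1e-4, 1e-2), seed=0):
    """Fit one model per set of paired weights, then average the chosen models."""
    rng = random.Random(seed)
    rows = [scheffe_terms(x) for x in mixtures]
    chosen = []
    for _ in range(n_boot):
        u = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in y]
        w_train = [-math.log(ui) for ui in u]        # weights on the training copy
        w_valid = [-math.log(1.0 - ui) for ui in u]  # weights on the autovalidation copy
        best_sse, best_coefs = None, None
        for penalty in penalties:
            coefs = weighted_ridge(rows, y, w_train, penalty)
            sse = sum(w * (yi - sum(c * t for c, t in zip(coefs, r))) ** 2
                      for r, yi, w in zip(rows, y, w_valid))
            if best_sse is None or sse < best_sse:   # keep the best validated fit
                best_sse, best_coefs = sse, coefs
        chosen.append(best_coefs)
    # Ensemble step: averaging coefficients equals averaging predictions for a linear model.
    return [sum(f[k] for f in chosen) / n_boot for k in range(len(rows[0]))]

# Hypothetical 10-run, three-component design with a made-up, noise-free response.
design = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5),
          (1/3, 1/3, 1/3), (2/3, 1/6, 1/6), (1/6, 2/3, 1/6), (1/6, 1/6, 2/3)]
response = [80 * a + 100 * b + 120 * c + 40 * a * b for a, b, c in design]
ensemble_coefs = svem_fit(design, response)
centroid_prediction = predict(ensemble_coefs, (1/3, 1/3, 1/3))
```

The structure is the point here: the same rows serve as training and validation data, only the weights differ, the validation weights pick the tuning setting in each pass, and the final model is the average over passes.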

So, how does the method work? Well, you pick an algorithm for your predictive modeling. In our case, we're going to focus on neural networks for 

mixture designs, but you can pick other methods. SVEM is rather agnostic about what algorithm it is used with, and it's very amenable to many algorithms. So we create the training and autovalidation sets. We assign our fractional weights. 

And then we fit the model, and then we bootstrap this some number of times, as defined by the user, and then we take an average. Okay, so at this point I'm going to turn it over to my colleague, Wayne Levin, who's going to give a demonstration of SVEM and talk about an add-in to do it. 
Wayne 
I'm just doing the switch here 

and share screen. 

And just going to switch over the PowerPoint. 

Thanks very much, Phil. We're going to use this example here, this urea data, so the purpose of the experiment is to optimize a solution for a cleansing agent. 

And the three components of this mixture are water, alcohol and urea. And so those are these three columns over here. And we're just going to focus on one response, which is viscosity. The target we're trying to hit is 100 but, as we can see down here, 

we want to be between 95 and 105 in terms of viscosity. So with that, I'm going to switch over to JMP. 

Okay, so here is our urea data. I just opened it up here, and just note that there are only 15 runs, and so I'm just going to go to the next step. And the next step in the 

process is to create the autovalidation table, so when I click on that you'll see a couple of things happen. First, we have a new table, and we have the first 15 runs, which is the exact same as what we had here. And then for each of the runs...so if I focus, for example, on this first row (52, 29, 19), 

you can see it's repeated down below here, so 52, 29, 19. 

And likewise the second row will be 57, 23, 19. I'm obviously not going to repeat through all of these, but I think you get the idea. It's simply repeating the combinations in this... 

in the second set down here, and the first set is the training...what's used for training and then the second, as indicated here, is what we get for validation. Now in between is the paired fractional weighting column that Phil was talking about. So 

we tend to see...when we tend to see high numbers up here, we'll get corresponding lower numbers down below and it's very much following the... 

well, the gamma distribution that Phil was talking about. So why don't I just show that. I'm just going to plot the weights over here, just kind of 

position that on the screen. So every time we go through the iterations, we refresh the weights. So just have a look at the weights over here and just watch the 

column over here, this column right here. So I'm just going to run it so you can see. So each iteration, we get a fresh set of 

fractionally weighted pairs across both the training and the validation sets. Okay. So next I'm going to 

do a fit model. So I'm going to do, if you will, one iteration, you might say. 

And so viscosity is the response. Just notice down here that the frequency role has this paired fractionally weighted bootstrapping weight, so that's how it gets factored into the modeling. So there we go and here is, if you will, one iteration, 

okay, based on these particular weights. Well, 

if we want to have an ensemble model, we've got to do that repeatedly. So what I'm going to do now is I'm going to run this with new weights and, if you would, 

keep an eye...you have to keep an eye on kind of three different places at once, which could be a bit of a challenge. I'm just going to try and position these things, alrighty. 

So what's going to happen is, as I click here to run with new weights, you're going to see the weights here change, a model will be created and the predictive model will be saved 

up here. I'm going to do one here. Let's give it a go. Bang, so it changed the weights, it produced the model, 

and it saved the model here. This column is also added; this is the ensemble that Phil was talking about, this is the average. Of course I've only done one iteration so the average is just 

the same. So I'm just going to do another iteration here and we get a new set of weights, a new model, and a new model has been saved, and now the ensemble model over here is the average of these two 

over here. So I'm just going to do it, oh I don't know, a few more times. It's not hard to do, and we can see these things 

change as we're going along. So I've got six of them now, so why don't we have a quick look at these six 

models...individual models, as well as the average model, the ensemble, and that's shown here in the profiler. And this rather shows the instability that 

Phil was alluding to earlier, because if you were to only do this once...now I know there's the fractional weighting that's going on in here, but as Phil mentioned, you know, a movement of any one point can really 

disturb the model, if you will, and you'll end up with features that don't really... 

perhaps don't really belong there. They're just artifacts of particular design points. But just have a look here, you can see how 

each of these individual models looks...well, they have some different features here, but down below, the average model is in a sense...well, it's the average of all of the models above. 

And you can more or less see that, that big kink over here kind of gets smoothed out. You see a little bit of it down here. 

Now this is only with six iterations, and with the add-in, we recommend actually doing 50 iterations. 

But what I'll do as well, let's have a look at the actual versus predicted. Now this one came out pretty... 

pretty consistent. The lines all have pretty much a slope of one anyway. Often when we do this, we do see more variation across this, but what you will note in the 

average model, it's a little hard to see here in this particular example, but it is a solid line, basically going through the mass of the other lines there. So I'm just going to move these aside and what I'll do next is 

I'm going to actually do a bunch of runs here. I think we'll do in this demonstration, it's going to do 25 iterations. You can see the model change, you can see the weights 

changing over here. It's not adding anything new up here. So we're just going through. Bang, it's done and now we'll have a look at the report. So I'm just going to open that up. 

Initially we end up with an actual versus predicted over here and a residual plot. It might be nice to look at over here the 

profiler. 

And I love the profiler. It's 

very appropriate for mixtures, because you can see, as I changed the water, as I increased it, the other two go down. That's the 

inherent nature of mixtures, of course, they all have to add up to 1. Okay, so this is, if you will, the average model. 
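The behavior seen in the profiler, where raising one component pushes the others down, can be sketched in Python. This version assumes the other components shrink proportionally so that their ratios to one another are preserved; that convention, the function name, and the choice to ignore any component bounds are assumptions for the example. The starting run (52, 29, 19) is the first row shown in the demonstration, expressed as proportions.

```python
def set_component(proportions, index, new_value):
    """Set one component to new_value and rescale the rest so the total stays 1."""
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("a proportion must lie between 0 and 1")
    others_total = sum(p for i, p in enumerate(proportions) if i != index)
    if others_total == 0.0:
        raise ValueError("cannot rescale: all other components are zero")
    scale = (1.0 - new_value) / others_total
    return [new_value if i == index else p * scale
            for i, p in enumerate(proportions)]

# Raise the first component from 0.52 to 0.60; the other two must drop.
moved = set_component([0.52, 0.29, 0.19], index=0, new_value=0.60)
```

The other two proportions fall but keep the same ratio to each other, and the three values still sum to 1.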

The ensemble model across the 25 fits that have taken place over here, alright. So if I may, I'm going to 

now just have a quick look at the mixture profiler. 

We get this look up here. 

Actually, if I may, what you can see from this is... 

notice the points. Do you see how the points are fitting in very nicely? This is the space filling nature of the design, and you'll also notice that this design actually has constraints in it. 

So we're not covering by any means the entire space of water from 0 to 1 or urea from 0 to 1, and so on, so we have these constraints up here. 

And what I'm going to do now is just throw in the limits, just so I can visualize that 95 to 105. So that's showing me in the white space here just 

what combinations, if you will, satisfy those conditions, the spec limits up here. 
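The same question the mixture profiler answers graphically (which combinations keep predicted viscosity between 95 and 105) can be asked of any prediction function by scanning a grid of candidate mixtures. In the Python sketch below, `placeholder_viscosity` is a made-up stand-in, not the ensemble model fitted in the demonstration, and the grid ignores the component constraints of the actual design. All names and coefficients are hypothetical.

```python
def simplex_grid(step_count):
    """All three-component mixtures whose proportions are multiples of 1/step_count."""
    grid = []
    for i in range(step_count + 1):
        for j in range(step_count + 1 - i):
            k = step_count - i - j
            grid.append((i / step_count, j / step_count, k / step_count))
    return grid

def placeholder_viscosity(water, alcohol, urea):
    """Hypothetical prediction function; NOT the fitted model from the demo."""
    return 60.0 * water + 90.0 * alcohol + 180.0 * urea + 50.0 * water * urea

def within_spec(predict, candidates, low=95.0, high=105.0):
    """Keep the candidate mixtures whose predicted response falls inside the spec limits."""
    return [c for c in candidates if low <= predict(*c) <= high]

candidates = simplex_grid(20)   # proportions in steps of 0.05
feasible = within_spec(placeholder_viscosity, candidates)
```

In practice the saved ensemble prediction formula would take the place of the placeholder function, and the candidate grid would be restricted to the constrained design region.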

Okay, so that's the demonstration of the SVEM method, the methodology, if you will. 

So, SVEM and space-filling designs. Again, one of the big motivations for this is that these models are inherently unstable. 

Small changes in the data can lead to big changes in the predictions, and a single predictive model, of course, can be unstable as a result. You know, you end up with those kinks in it, and that's just, you know, 

usually quite disturbing to look at, because that's often just not the way the kinetics happen. 

So we need stability, and the ensembling of multiple models, based on the fractionally weighted bootstrapping, gives us that. 

The other big thing about this is the multicollinearity that's inherently present in these situations. Mixtures are complex systems. 

They're not the sum of the components' behavior. They are really the product of the interactions, and that's why, again, I love the profiler, because it depicts 

those interactions beautifully. Those dynamics just come across really nicely. So because we've got all this multicollinearity, SVEM kind of mitigates that and gives stable, more accurate predictive models. We've been running this now... 

actually it reaches back a few years but 

since we created the SVEM add-in, which I'll talk about in just a moment, we've run it numerous times now, since earlier this year, and it's done a really nice job. 

Our clients are very pleased with it. So like I said, the model averaging using SVEM is not automated in JMP standard or JMP Pro, but we do have an add-in, and if you go to our website at Predictum.com, you can learn more about it and how to get it. 

So yeah there's a particular page on SVEM. We also have a modern mixture design course and you can see that that's with Marie and Phil and myself. 

And that, you can see on the training page on the Predictum site. We do cover a section on the classical mixture designs, but two thirds to three quarters of the course features SVEM. 

And the SVEM add-in is available as part of the course. So in this talk we've seen how building a predictive model from data is 

limited by the feasibility of conducting validation trials to control overfitting. You know, in a classical designed experiment, we just don't have the runs to do 

both training and validation, so we suggest using space filling designs, especially for mixture DOEs, but actually more generally for other DOE situations as well. And we described SVEM, which is based on autovalidation and fractionally weighted bootstrapping. 

And we discussed how SVEM enables predictive modeling for DOE data without, again, a separate set of validation runs. 

And we demonstrated SVEM using the urea experiment data. 

And I do want you to know that if you download the slides, you'll find a whole bunch of references. So here's a few here and a few more, and one of the things I love about this is that 

it actually reaches all the way back to 1996. So this is something that has been brewing for some time, and I would encourage you to have a look at these references. 

And with that we'll take questions. 