SVEM: A Paradigm Shift in Design and Analysis of Experiments (2021-EU-45MP-779)

Level: Intermediate


Philip Ramsey, Professor, North Haven Group and University of New Hampshire
Wayne Levin, President and CEO, Predictum
Trent Lemkus, Ph.D. Candidate, University of New Hampshire
Christopher Gotwalt, JMP Director of Statistical R&D, SAS


DOE methods have evolved over the years, as have the needs and expectations of experimenters. Historically, the focus emphasized separating effects to reduce bias in effect estimates and maximizing hypotheses testing power, which are largely a reflection of the methodological and computational tools of their time. Often DOE in industry is done to predict product or process behavior under possible changes. We introduce Self-Validating Ensemble Models (SVEM), an inherently predictive algorithmic approach to the analysis of DOEs, generalizing the fractional bootstrap to make machine learning and bagging possible for small datasets common in DOE. Compared to classical DOE methods, SVEM offers predictive accuracy that is up to orders of magnitude higher. We have remarkable and profound results demonstrating SVEM capably fits models with more active parameters than rows, potentially forever changing the role of classical statistical concepts like degrees of freedom. We use case studies to explain and demonstrate SVEM and describe our research simulations comparing its performance to classical DOE methods. We also demonstrate the new Candidate Set Designer in JMP, which makes it easy to evaluate the performance of SVEM on your own data. With SVEM one obtains accurate models with smaller datasets than ever before, accelerating time-to-market and reducing the cost of innovation.



Auto-generated transcript...




Wayne Levin self-validating ensemble model, as we view it as a paradigm shift in design of experiments and so.
  Just gonna...there we go. Our agenda here: first, a quick introduction of who's talking.
  Then the what, why and how of SVEM. We'll have a quick overview of DOE and machine learning, describe blending them into SVEM, analyze some real-world SVEM experiments, and we're going to
  review some current research and demonstrate JMP's new candidate set designer, new in JMP 16, and then we'll end with the usual next steps and Q&A and so forth.
  SVEM is a remarkable new method to extract more insights with fewer experimental cycles and build more accurate predictive models from small sets of data, including DOEs.
  Using SVEM, then, we anticipate, we promise, you're going to have less cost, you'll be faster to market, and a lot faster at problem solving. We're going to explore all that as we go forward in the session.
  I'm Wayne Levin and I'm the president of Predictum, and joining me is Chris Gotwalt, who's the chief data scientist at JMP, and also joining us is Phil Ramsey, who's a senior data scientist here at Predictum and associate professor at the University of New Hampshire.
  So let's just
  take stock here. JMP's contributions to data science are huge, as we all know, and in the DOE space they go back over 20 years now to JMP 4, with the coordinate exchange algorithm.
  And then about eight years ago, nine years ago, we saw definitive screening designs coming out. And we'd like to think that SVEM, as we call it for short, self-validating ensemble modeling (try and say that fast three times), is another contribution to
  machine learning and design of experiments.
  It overcomes limitations of limited amounts of data, right. With small amounts of data, you can't do machine learning, you normally have to have large amounts of data to make that happen.
  Now we've been trying SVEM in the field and we've had a number of companies who approached us on this and we've made a trial version of SVEM available. We're, at
  Predictum, the first to create a product in this area, bring a product to market, and I just want to share with you some of the things we've learned so far.
  And one is that SVEM works exceptionally well. I mean, when you have more parameters than number of runs, it works very well with
  higher order models. It does give more accurate predictions, it also helps recover from broken DOEs. So, for example, if you have definitive screening designs that for
  whatever reason can't be run in fit definitive, we've had some historical DOEs sent to us that
  lack power. They just, you know, didn't find anything, but with SVEM, it actually did. And we've had cases where there were some missing or questionable runs.
  Something else we've learned is that it's not going to help you with fractional factorials and things like that. We're interested in predictive models here
  and fractional factorials really don't give predictive models; they're not designed for that. So it's not going to give you something that the data is not structured to deliver,
  at least not to that extent. The other thing we've learned is that we've not yet tested the full potential of SVEM. To do that, we really need
  to design experiments with SVEM in mind to begin with, and that means look at more factors. We don't want to just look at three or four factors. How about 10, 15 or even more?
  We understand from the research that Chris and Phil will be talking about that Bayesian I-optimal designs are the best approach for use with SVEM, and also that mixture designs
  were particularly useful for SVEM and I'll have a little something more to say about that later. And as far as SVEM goes
  if you'd like to try it out, we'll have some information at the end, after Chris and Phil, to talk about that. And so with that said, I'm going to throw it over to you, Chris, who will
  go into the what and the why and the how of SVEM. So off to you, Chris.
Chris Gotwalt Alright, great well, thank you, Wayne. Oh wait, am I sharing my screen?
Wayne Levin Not yet.
  There we go.
Chris Gotwalt Okay, all right, well, thank you, Wayne.
  So now I'm going to introduce the idea of self-validating ensemble models or SVEM.
  I'm going to keep it fairly basic. I'll assume that you have some experience of design and analysis of experiments.
  Most importantly, you should understand that the statistical modeling process boils down to fitting functions to data
  that predict a response variable as a function of input variables. At a high level, SVEM is a bridge between machine learning algorithms used in big data applications
  and design of experiments, which is generally seen as a small-data analysis problem. Machine learning and DOE each have a long history, but until now, they were relatively separate traditions.
  Our research indicates that you can get more accurate models using fewer observations using SVEM.
  This means that product and process improvement can happen in less time using fewer resources at lower cost. So if you're in a competitive industry,
  where being first to market is critical, then SVEM should be of great interest to you.
  In particular, we believe that SVEM has tremendous promise in the pharma and biotech industries,
  high technology like semiconductor manufacturing, and in the development of consumer products.
  In my work here at JMP, I've had something of a bird's eye view of how data is handled in a lot of industries and areas of application.
  I've developed algorithms for constructing optimal designed experiments as well as related analysis procedures like mixed models and generalized linear models.
  On the other hand, I'm also the developer of JMP Pro's neural network algorithms, as well as some of the other machine learning techniques that are in the product.
  Over the last 20 years I've seen machine learning become more prominent when modeling large observational data sets and have seen many new
  algorithms that have been developed. At the same time, the analysis of designed experiments has also changed, but more incrementally over the last 20 years.
  The overall statistical methodology of industrial experimentation would be recognizable to anyone that read Box, Hunter and Hunter in the late 1970s.
  Machine learning and design of experiments are generally applied to industry to solve problems that are predictive in nature.
  We use these tools to answer questions, such as is the sensor indicating a faulty batch or how do we adjust the process to maximize yield while minimizing costs.
  Although they are both predictive in nature, machine learning and design of experiments have generally been applied to rather different types of data.
  What I'm going to do now is give a quick overview of both and then describe how we blended them into SVEM.
  I'll analyze some data from a real experiment, to show how it works and why we think it's doing very well overall.
  I will go over results of simulations done by Trent Lemkus, a PhD student at the University of New Hampshire, which have given us confidence that the approach is very promising.
  Then I'll go over an example that shows how you can use the new candidate set designs in JMP 16 to find optimal subsets of existing data sets so you can try out SVEM for yourself.
  I'll highlight some of our observations, I mentioned some future questions that we hope to answer, and then I'll hand it over to Phil, who will go through a more in-depth case study and a demo of the SVEM add-in.
  Consider the simple data on the screen. It's data from a metallurgical process, and we want to predict the amount of shrinkage at the end of the processing steps.
  We have three variables on the left that we want to use to predict shrinkage. Modeling amounts to finding a function of pressure, time, and temperature that predicts the response (shrinkage) here.
  Hopefully this model will generalize to new data in the future.
  In a machine learning context, the standard way to do this is to partition the data into two disjoint sets of rows. One is called the training set and it's used for fitting models. To make sure that we don't overfit the data,
  which leads to models that are inaccurate on new observations, we have another subset of the data, called the validation set, that is used to test out the various models
  that have been fit to the training set. There's a trade-off here, where functions that are too simple will fit poorly on the training and validation sets because they don't explain enough variation.
  Whereas models that are too complex can have an almost perfect fit on the training set but will be very inaccurate on the validation set.
  We use measures like validation R squared to find the goldilocks model that is neither too simple nor too complicated.
  This goldilocks model will be the one whose validation R squared is the highest on the validation set.
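A minimal sketch of hold-out model selection as just described, not taken from the talk. It assumes a single input, polynomial models of increasing degree as the "simple to complex" family, and synthetic data; the function names are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Ordinary R squared of predictions y_hat against observations y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def select_degree_by_holdout(x, y, train_idx, valid_idx, degrees):
    """Fit each polynomial degree on the training rows only and score it
    on the validation rows; return the degree with the highest validation R squared."""
    scores = {}
    for d in degrees:
        coefs = np.polyfit(x[train_idx], y[train_idx], d)
        scores[d] = r_squared(y[valid_idx], np.polyval(coefs, x[valid_idx]))
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic data (an assumption for illustration): a quadratic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(scale=0.2, size=x.size)

# Two disjoint sets of rows: 140 for training, 60 for validation.
perm = rng.permutation(x.size)
train_idx, valid_idx = perm[:140], perm[140:]

best_degree, scores = select_degree_by_holdout(x, y, train_idx, valid_idx, range(0, 9))
print("degree chosen on the validation set:", best_degree)
```

Degrees 0 and 1 are too simple and score poorly on both sets. Very high degrees can only improve the training fit, so the validation score is what stops the search from drifting toward them.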
  This hold out style model selection is a very good way to proceed when you have a quote unquote reasonably large number of rows,
  often in the hundreds of millions of rows or more. The statement reasonably large is intentionally ambiguous and depends on the task at hand. That said, the 12 rows you see here is really too small and is used here just for illustration.
  In DOEs, we're usually in situations where there are serious constraints on time and/or resources.
  A core idea of designed experiments, particularly the optimal designed experiments that are a key JMP capability, is around obtaining the highest quality information packed into the very smallest number of rows possible.
  Many brilliant people over many years have created statistical tools like F tests, half-normal plots and model selection criteria, like the AIC and BIC,
  that help us to make decisions about models for DOE data. These are all guides that help us identify which model we should use to base our scientific, engineering and manufacturing decisions on.
  One thing that isn't done often is applying machine learning model selection to designed experiments. Now why would that be?
  One important reason is that information is packed so tightly into designs that removing even a few observations can cause the main effects and interactions to collapse
  so that we're no longer able to separate them into uniquely estimable effects. If you try to do it anyway, you'll likely see software report warnings, like lost degrees of freedom or singularity details added to the model report.
  Now we're going to conduct a little thought experiment. What if we tried to cheat a little bit?
  What if we copied all the rows from the original table, the Xs and Ys, and labeled one copy "training" and labeled the other copy "validation"? Now we have enough data, right? Wrong. We're not fooling anybody with this crazy scheme.
  It's crazy because if you tried to model using this approach and looked at any plot or index of goodness of fit, you'd see that making the models more complicated leads to a better fit on the training set, and because the training and validation
  sets are the same in this case, the approach will always lead to overfit.
  So let's back up and recast machine learning model selection as a tale of two weighted sums of squares calculated on the same data.
  Instead of thinking about model selection in terms of two data partitions, we can think of it as two columns of weights.
  There's a training weight that is equal to one for rows in the training set and zero otherwise, and a validation weight
  that is equal to zero for the training rows and equal to one for the validation rows.
  So for machine learning model selection, we can think of each row as having its own pair of weight values, where both weights are binary zeros and ones.
  Having them set up in this way is what gives us independent assessment of model fit.
  Models are fit by finding parameter values that minimize the sum of squared errors weighted using the training weights, and the generalization performance is measured using the sum of squared errors weighted using the validation weights.
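A minimal sketch of the "two columns of weights" view, not taken from the talk. It assumes a linear model fit by least squares and made-up data for 12 rows; the helper names are hypothetical. It checks that fitting with a 0/1 training weight column gives the same coefficients as fitting on the training partition alone.

```python
import numpy as np

def weighted_sse(y, y_hat, w):
    """Sum of squared errors with one weight per row."""
    return float(np.sum(w * (y - y_hat) ** 2))

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i' beta)^2 by scaling each row by sqrt(w_i)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Made-up data: 12 rows, an intercept and three inputs.
rng = np.random.default_rng(1)
n = 12
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 3))])
y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Binary weight pairs: the first 8 rows are "training", the last 4 are "validation".
w_train = np.array([1.0] * 8 + [0.0] * 4)
w_valid = 1.0 - w_train

beta_weighted = weighted_least_squares(X, y, w_train)
beta_subset, *_ = np.linalg.lstsq(X[:8], y[:8], rcond=None)

# Generalization is measured with the validation weights on the same table.
validation_sse = weighted_sse(y, X @ beta_weighted, w_valid)
print(np.allclose(beta_weighted, beta_subset), validation_sse)
```

Rows with a training weight of zero contribute nothing to the fit, which is why the weighted fit and the partitioned fit agree.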
  If we plot these weight pairs in a scatter plot, we see, of course, that the training and validation weights are perfectly anti correlated.
  Now, what if, instead of forcing ourselves to use perfectly anti correlated pairs of binary values, we relax the requirement that the values can take on only the values of zero and one
  and allow the weights to take on any strictly positive value?
  To do that, we have to give something up, in particular, to my knowledge, for this to work mathematically, we can no longer have perfectly anti correlated weight pairs, but we can do this in a way that we still have a substantial anti correlation.
  There are many ways that this could be done, but one way to do this would be to create weights using the same scheme as the fractionally weighted bootstrap, where we use an exponential distribution with a mean of one
  to create the weights. And there's a little trick using the properties of the uniform distribution that we can use to create exponentially distributed, but highly anti correlated weight pairs.
  When using this as a two column weighted validation scheme, we call this autovalidation. Phil and I have given several Discovery papers on this very idea.
  Using this approach is like having a training column of weights that has a mean and variance of one and a validation column with the same properties.
  Under this autovalidation scheme, if a row contributes more than average weight to the training sums of squares, that row will contribute less to the validation sums of squares and vice versa.
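The talk does not spell out the uniform-distribution trick, so the following is a sketch under one common assumption: draw a single uniform value u per row and apply the exponential inverse CDF to both u and 1 - u. Each column is then exponential with mean one, and a row with a large training weight gets a small validation weight. The function name is hypothetical.

```python
import numpy as np

def autovalidation_weights(n_rows, rng):
    """Return (training, validation) weight columns.

    Assumed construction: u ~ Uniform(0, 1), training weight = -log(u),
    validation weight = -log(1 - u). Both are Exponential(1) and negatively correlated.
    """
    # Lower bound slightly above zero so that -log(u) stays finite.
    u = rng.uniform(1e-12, 1.0, size=n_rows)
    w_train = -np.log(u)
    w_valid = -np.log1p(-u)
    return w_train, w_valid

w_train, w_valid = autovalidation_weights(100_000, np.random.default_rng(0))
print("means:", w_train.mean(), w_valid.mean())
print("variances:", w_train.var(), w_valid.var())
print("correlation:", np.corrcoef(w_train, w_valid)[0, 1])
```

Under this construction the theoretical correlation is 1 - π²/6, roughly -0.645, which is substantial but not the perfect -1 of the binary partition weights.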
  I want to point out that there's an add-in that Mike Anderson created that sets data sets up in this kind of format so that JMP Pro's modeling platforms can consume them.
  Now recently Phil and I have taken on Trent as our PhD student.
  Over the spring and fall, Trent did a lot of simulations trying out many different approaches to fitting models to DOE data, including an entire zoo of autovalidation based approaches. I'll show some of his results
  here in a little bit. Suffice to say, the approach that worked consistently the best, in terms of minimum average error on an independent test set, was to apply autovalidation
  in combination with a variable selection procedure such as forward selection, but instead of doing this once, we repeat the process dozens or hundreds of times,
  saving the prediction formula each time and ultimately averaging the models across all these autovalidated iterations. And we call this general procedure self-validated ensemble models or SVEM.
  So, to make things a little more concrete I'm going to give a quick example that illustrates this. We begin by initializing the autovalidation weights.
  We apply a model selection procedure, such as the Lasso or forward selection, then save the prediction formula back to the table.
  Here's the first prediction formula. In this iteration, just two linear main effects have been selected.
  Then we reinitialize the weights, fit the model again using the new set of weights, and save that prediction formula.
  Here's the second autovalidated prediction formula. Note that this time, a different set of main effects and interactions were chosen and their regression coefficients are different this time.
  We repeat this process a bunch of times.
  And at the end, the SVEM model is simply the average across all the prediction formulas.
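The loop just described can be sketched as follows. This is an illustration under stated assumptions, not the JMP or Predictum implementation: the weights use the -log(u), -log(1 - u) construction, the selection procedure is a plain forward selection that keeps the step with the lowest validation-weighted error, and the data are synthetic. Because every iteration's model is linear in the same candidate terms, averaging coefficients (with zeros for unselected terms) is the same as averaging the prediction formulas. All function names are hypothetical.

```python
import itertools
import numpy as np

def wls(X, y, w):
    """Weighted least squares via row scaling by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def wsse(X, y, beta, w):
    """Weighted sum of squared errors."""
    return float(np.sum(w * (y - X @ beta) ** 2))

def forward_selection(X, y, w_train, w_valid, max_terms):
    """Add one term at a time by training-weighted error; return the
    coefficient vector from the step with the lowest validation-weighted error.
    Column 0 of X is the intercept and is always in the model."""
    p = X.shape[1]
    active = [0]
    beta = np.zeros(p)
    beta[active] = wls(X[:, active], y, w_train)
    best_beta, best_val = beta.copy(), wsse(X, y, beta, w_valid)
    for _ in range(max_terms):
        remaining = [j for j in range(p) if j not in active]
        if not remaining:
            break
        trials = []
        for j in remaining:
            cols = active + [j]
            b = np.zeros(p)
            b[cols] = wls(X[:, cols], y, w_train)
            trials.append((wsse(X, y, b, w_train), j, b))
        _, j_best, beta = min(trials, key=lambda t: t[0])
        active.append(j_best)
        val = wsse(X, y, beta, w_valid)
        if val < best_val:
            best_beta, best_val = beta.copy(), val
    return best_beta

def svem_fit(X, y, n_iter=200, max_terms=None, seed=0):
    """Average the selected coefficient vectors over many reweighted fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if max_terms is None:
        max_terms = min(p - 1, n - 2)  # keep each fit smaller than the row count
    betas = np.zeros((n_iter, p))
    for i in range(n_iter):
        u = rng.uniform(1e-12, 1.0, size=n)  # reinitialize the weights
        betas[i] = forward_selection(X, y, -np.log(u), -np.log1p(-u), max_terms)
    return betas.mean(axis=0)

def quadratic_model_matrix(F):
    """Intercept, main effects, two-factor interactions, squared terms."""
    k = F.shape[1]
    cols = [np.ones(F.shape[0])] + [F[:, i] for i in range(k)]
    cols += [F[:, i] * F[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [F[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

# Synthetic 16-run example with three factors (an assumption for illustration).
rng = np.random.default_rng(42)
F = rng.uniform(-1, 1, size=(16, 3))
X = quadratic_model_matrix(F)
y = 5 + 3 * F[:, 0] - 2 * F[:, 1] + 1.5 * F[:, 0] * F[:, 1] + rng.normal(scale=0.1, size=16)

beta_svem = svem_fit(X, y, n_iter=50, seed=0)
print(np.round(beta_svem, 2))
```

Swapping the forward selection for the Lasso or another selection procedure would leave `svem_fit` unchanged, since it only needs one coefficient vector per iteration.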
  Here's a succinct diagram that illustrates the SVEM algorithm. So the idea is combining bootstrap procedures with model averaging. The fitting
  happens over and over again, each time after we reinitialize the weights, and we save the prediction formula from that iteration.
  The illustration is showing the parameters being saved, but it's the same thing with the formulas. Just save them all out and then at the end of the day, the final model is simply the average across all the iterations.
  Once we're done, we can use that SVEM model in the graph profiler with expand intermediate formulas turned on so that we can visualize the resulting model and do optimization of our process and so forth.
  Now I'm going to go over the results from Trent's dissertation research.
  The base designs were Box-Behnkens and DSDs in four and eight factors. Each simulation consisted of 1,000 simulation replications and each simulation had its own set of true parameter values for our quadratic response surface model.
  The active parameters were doubly exponentially distributed and we explored different percentages of active effects from 50% to 100%.
  For each of the 1,000 simulation reps, the SVEM prediction was evaluated relative to the true model on an independent test set that consisted of a 10,000 run space filling design over the factor space.
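A rough sketch of one piece of such a simulation, not the actual study code: drawing doubly exponential (Laplace) true coefficients with a chosen fraction of active effects, and scoring a fitted model against the true model on a large test set. The Laplace scale, the uniform random test points standing in for a space-filling design, and all function names are assumptions.

```python
import itertools
import numpy as np

def quadratic_effects(F):
    """Non-intercept quadratic RSM terms: main effects, two-factor interactions, squares."""
    k = F.shape[1]
    cols = [F[:, i] for i in range(k)]
    cols += [F[:, i] * F[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [F[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def draw_true_coefficients(n_terms, fraction_active, rng, scale=1.0):
    """Set a random subset of effects to Laplace draws; the rest stay exactly zero."""
    beta = np.zeros(n_terms)
    n_active = int(round(fraction_active * n_terms))
    active = rng.choice(n_terms, size=n_active, replace=False)
    beta[active] = rng.laplace(loc=0.0, scale=scale, size=n_active)
    return beta

def rmse_against_truth(X_test, beta_true, beta_hat):
    """Root mean squared difference between fitted and true predictions."""
    return float(np.sqrt(np.mean((X_test @ (beta_true - beta_hat)) ** 2)))

rng = np.random.default_rng(0)
k = 8
X_test = quadratic_effects(rng.uniform(-1, 1, size=(10_000, k)))  # stand-in test set
n_effects = X_test.shape[1]

beta_true = draw_true_coefficients(n_effects, fraction_active=0.5, rng=rng)
print(n_effects, np.count_nonzero(beta_true))
print(rmse_against_truth(X_test, beta_true, np.zeros(n_effects)))
```

With eight factors there are 44 non-intercept quadratic effects, so a 50% active fraction gives 22 nonzero effects.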
  We looked at a large number of different classical single shot modeling approaches, as well as a number of variations on autovalidation.
  But it's easier just to look at the most interesting of the results since there was too much to compare. The story was very consistent across all the situations that we investigated.
  This is one of those simulations where the base design was a definitive screening design, with eight factors and 21 runs. In this case 50% of the effects were non zero.
  This means that the model had as many non zero parameters as there were runs, just about.
  On the left are the box plots of the log root mean squared error for the Lasso, Dantzig selector, forward selection and pruned forward selection, all tuned using the AICc.
  The best performing of the methods from the Fit Definitive platform is next, followed by SVEM applied to several different modeling procedures over on the right.
  There is a dramatic reduction in independent test set root mean squared error when we compare the test set RMSEs of the classical methods to those of the SVEM predictions.
  And note that this is on a log scale, so this is a pretty dramatic difference in predictive performance.
  Here's an even more interesting result. This was the plot that really blew me away. This is the same simulation setup, except here all of the effects are non zero.
  So the true models are supersaturated, not just in the design matrix but in the parameters themselves.
  In the past we have had to assume that the number of active effects was a small subset of all the effects and certainly smaller than the number of rows.
  Here we are getting reasonably accurate models despite the number of active parameters being about two times the number of rows, which is truly remarkable.
  Now I'm going to go through a quick illustration
  of SVEM to show how you can apply it to your own historical DOEs. I'm going to use a real example of a five factor DOE from a fermentation process, where the response of interest was product yield as measured by pDNA titer.
  The great advantage with SVEM is that we can get more accurate models with less runs. This means we can take existing data sets,
  use the new candidate set designer in JMP 16 to identify the subset of rows that form the quote unquote predictive heart of the original design.
  I'll take the original data which had 36 rows, load the five columns in as covariates and tell the custom designer to give me the 16-row subset of the original design that gives the best predictive performance at that size.
  Then I'll hide and exclude all the rows that are not in that subset design, so that they can be later used as an independent test set.
  I'll run the SVEM model and compare the performance of the SVEM model to what you get if you apply a standard procedure like forward selection plus AICc fit to the same 16 runs.
  I do want to say that the new candidate set designer in JMP 16 is also a remarkable contribution and I just want to call out how incredible Brad's team has been in the creation of this new tool which is going to be profoundly useful in many ways, including with SVEM.
  So to do this, we can take our data set, go to the custom designer.
  We select covariate rows, so this is a new outline node in JMP 16.
  Then we select the five input factors.
  Click continue.
  And we're going to press the RSM button, which will automatically convert the optimality criteria to I-optimal or predictive type design.
  We select all the non intercept effects and right click in the model
  list here and set all the non intercept effects as If Possible effects. This is what allows us to have models
  with more parameters in them than there are runs in our base design, as the design criterion will now be Bayesian I-optimality. Now we can set the number of runs equal to 16.
  Click make design.
  And once we've done that, the optimizer works its magic, and all the rows that are in the Bayesian I-optimal subset design are now selected in the original table.
  We can go to the table and then use an indicator or transform column to record the current row selection.
  And I went ahead and gave the column a meaningful name for later.
  We can select the points not in the subset design by selecting all the rows where the new indicator column is equal to zero.
  And then we can hide and exclude those rows so that they will not be included in our model fitting process.
  Phil will be demoing the SVEM add-in, so I'll skip the modeling steps and go right to a model comparison of forward selection plus AICc and the SVEM model on the rows not included in the Bayesian I-optimal subset design. You can see that comparison here
  in the red outlined part of the report. We see that SVEM has an R square on the
  independent test set of .5, and the classical procedure has an R square of .22, so we're getting basically twice the amount of variation explained with SVEM than with the classical procedure.
  We can also compare the profile traces of the two models when we apply SVEM to all 46 observations.
  It's clear that we're getting basically the same model when we apply SVEM to all 46 runs as we're getting with just 16 runs under SVEM.
  But the forward selection based model is missing curvature terms, not to mention there are a lot of interactions that are missing also. This is a fairly simple procedure that you can use to test SVEM out on your own historical data.
  So overall SVEM
  raises a lot of questions. Many of them are centered around the role of degrees of freedom in design experiments as we are now able to fit models where there are more parameters than rows, meaning that there would effectively be negative degrees of freedom, if there were such a thing.
  I think this will cause us to reconsider the role of P-value based thinking in industrial experimentation.
  We're going to have to work to establish new methods for uncertainty analysis, like confidence intervals on predictions.
  Phil and I are doing some work on trying to understand what is the best family of base models that we can work from. So this could be quadratic RSMs, and we're also looking at these partial cubic models that were proposed a long time ago, but that we now believe are worthy of reconsidering.
  What kinds of designs should we use? What sample sizes are appropriate as a function of the number of factors in the base model that we're using? What is the role of screening experiments? And one big unknown is what is the role of blocking and split plot experimentation
  in this framework?
  So now I'm going to hand it over to Phil. He's going to do a more in-depth case study and also demo Predictum's SVEM add-in. Take it away, Phil.
Philip Ramsey Okay, thank you, Chris. And what I'm going to do is discuss a case study, so let me put this in
  slideshow and I'll do some
  illustration, as Chris said, of the Predictum
  add-in that actually automates a lot of the SVEM analysis. And what we're trying to look at is an analytical method used in the biotech industry.
  And this one is for characterizing the glycoprofiling of therapeutic proteins, basically proteins to become antibodies.
  And many of you who work in that industry know glycoproteins are really a very rich source of therapeutics, and you also know that if you work to cGMP, you have to demonstrate the reliability of the measurement systems. And actually for glycoprofiling,
  fast, easy to use analytical methods are not really fully developed. So the idea of this experiment was to come up with a fairly quick and easy method that people could use that would give them an accurate assessment of the different
  (I am not a chemist here) sugars that have been
  attached to the base protein post transcription.
  And to give you an idea, the idea is using chromatography, and I'm going to assume you have some familiarity with it. Basically it's a method where you
  have some solution, you run it through a column, and then, as the different chemical species go through the column,
  they tend to separate and come out of the column at different times. So basically they form what is called a chromatogram, where the peak
  is a function of concentration and the time at which the peak occurs is actually important to identifying what the species actually was. So in this case, we're going to look at a number of
  sugars. I'm simply going to call them glycoforms, and what is going on here is the scientists who did this work developed a calibration solution.
  And we know exactly what's in the calibration solution. We know exactly what peaks should elute and roughly where.
  And then we can compare an actual human antibody sample to the calibration sample. And some of these glycoforms are charged. They're difficult to get them through the column.
  They tend to stick to it, so we're using what is sometimes called a gradient elution procedure.
  And I won't get into the details. And what we're doing here, we're using something called high performance anion exchange chromatography. I'm not an expert on it, but the scientist I've worked with has done
  a very good job of developing this calibration, and the reason we need a calibration solution is that historically
  the sugars that elute from a human antibody sample are not entirely predictable as to where they're going to come out of a column, so we need something that we can use for calibration.
  The person who did the work, and I'm going to mention her in a moment, designed two separate experiments. One is a 16-run three-level design.
  And then later she came back and did a bigger 28-run, three-level design. And so one could be used as a training set and the other is a validation set, but to demonstrate the covariate selector and Bayesian I-optimal strategy that Chris talked about,
  both designs were combined into a single 44-run design. We then
  took that design into custom design with the covariate selector and then created various Bayesian I-optimal designs.
  And we could have done more combinations, but we only have so much time to talk about this. So there are three designs. One has 10 runs, remember the five factors and we're doing 10 runs.
  A 13-run design and a 16-run design and again, we could do far more. So, what we want to do is see how these
  designs perform after we use SVEM to fit models and, by the way, as Chris mentioned, in each one of these designs, the runs that are not selected from the 44 are then used as a validation or test set to see how well the model performed.
  Okay, so these are the initial factors, the initial amount of sodium acetate, the initial amount of sodium hydroxide. And then there are three separate gradient elution steps that take place
  over the process time, which runs to roughly 40 minutes. So these are the settings that are being used and manipulated in the experiment.
  There are actually, in this experiment, 44 different responses one could look at. So, given the time we have available, I'm going to look at two.
  One is the retention time for glycoform 3, and this is the key, this is what anchors the calibration chromatogram.
  And we'd like it to come out at about eight and a half minutes because it aligns nicely with human antibody samples.
  It's actually fairly easy to model. The second response is the retention time of glycoform 10. It's a charged glycoform. It elutes very late and tends to be bunched up with other charged glycoforms and is harder to distinguish. OK, so those are the two responses I'll look at.
  And for those who aren't familiar with chromatography, here are a couple of what are called chromatograms. These are essentially the responses,
  pictures of what the responses look like and, in this case, if you take a look at the picture, we're going to look at the retention time, that is, when did
  Peak 3 come out? At what time and then, at what time did Peak 10 come out and show up in the chromatogram? And, by the way, notice between the two
  chromatograms I'm showing, how different the profiles are. So in this experiment when we manipulated those experimental factors
  (and by the way, we have 44 of these chromatograms; I'm showing you two of them), you see a lot of differences in the shapes and the retention times and so forth of these peaks and in resolution. So whatever we're doing, we are clearly manipulating
  the chromatograms, and for those of you who are curious, yes, we are thinking about this in the future as functional data and how functional data analysis might be used.
  But that is off in the future, and it's definitely something that we're very interested in. Chromatograms really are functional data in the end. But I'm just going to extract two features and try to model them. Okay. And I also want to give credit to
  Dr Eliza Yeung of Cytovance Biologics, a good friend and excellent scientist. And she did all the experimental work and she was the one who came up with the idea of constructing the...
  what she calls the glucose ladder, the calibration solution. And Eliza, besides being a nice person, she also loves JMP. So we like Eliza in many ways, okay. Excellent scientist. Okay, so
  before I get into
  SVEM, I just want to mention...Chris mentioned the full quadratic model that's been used since, what, 1950 as the basis of optimization.
  Basically, all the main effects, two factor interactions and squared terms. In point of fact, for a lot of complex physical systems, the kinetics are actually too complex for that model.
  And Cornell and Montgomery, in a really nice paper in 1998, pointed this out and suggested that in many cases,
  what they call the interaction model, we call it the partial cubic model, may work better. And in my experience, using SVEM and applying this model to a lot of case studies, they are right. It does give actually better predictive models, but there's a problem.
  These partial cubic models can be huge and they are a big challenge to traditional modeling strategies where supersaturated models, as Chris mentioned, are a big problem.
  How many potential terms are there? Well, take K square, two times that and add one for the intercept. So for five factors, the full
  partial cubic model would have 51 terms. By the way, I'm going to use these models, but I'm only going to use 40 of the terms that are usually most important.
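As a quick check of the arithmetic above, here is a small Python sketch of the term counts. The full quadratic count follows from the description given in the talk (intercept, main effects, two-factor interactions, squared terms). The partial cubic count simply applies the speaker's stated rule of two times K squared plus one, taken as given for five factors rather than as a general formula. The function names are illustrative.

```python
from math import comb


def full_quadratic_terms(k: int) -> int:
    """Intercept + k main effects + C(k, 2) two-factor interactions + k squared terms."""
    return 1 + k + comb(k, 2) + k


def partial_cubic_terms_as_stated(k: int) -> int:
    """The count quoted in the talk: two times k squared, plus one for the intercept."""
    return 2 * k**2 + 1


k = 5  # number of factors in the experiment
n_quadratic = full_quadratic_terms(k)
n_partial_cubic = partial_cubic_terms_as_stated(k)
terms_used = 40  # the subset of partial cubic terms the speaker chooses to fit

print(f"full quadratic terms for k={k}: {n_quadratic}")
print(f"partial cubic terms for k={k}: {n_partial_cubic}")
print(f"partial cubic terms left out: {n_partial_cubic - terms_used}")
```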
  And then we're going to use self-validating ensemble modeling to fit the models and, as Chris said, SVEM has no problem with supersaturated models.
  And by the way, in the machine learning literature, supersaturated models are fit all the time.
  And using the right machine learning techniques, they can actually show that you get better prediction performance.
  This is largely unknown in traditional statistics, okay. So we're going to use the SVEM add-in and I mentioned if you'd like to learn about it, you can contact Wayne Levin, who's already spoken, at Predictum and I'm sure Wayne would love to talk to you.
  So let's go over to JMP briefly.
  And I'll show you how the add-in actually works, so let me bring over JMP.
  So I've installed the add-in so it has its own tab. So I click on Predictum. By the way, this is one of the Bayesian I-optimal designs we created, and this one has 16 runs. Select self-validating ensemble modeling.
  So you see what looks like your typical fit model launch dialogue. I'm going to select my five factors.
  And I'm just going to do a response surface model at this point for illustration, and I want to model
  retention time for glycoform 3. Notice in the background, it set up the autovalidation table, so there's a weighting function.
  And there's an autovalidation portion. This is all in the background; you don't need to see it. The add-in created it and then basically hides it. You can look at it if you want, but it's hidden because it's not terribly important to see it.
  So now we open up GenReg, so this is the GenReg
  control panel. And for illustration, by the way, SVEM is really agnostic in terms of the method you use it with for model building. So, in fact, you could use methods that aren't even in GenReg, but that's our primary focus today. So we're going to do forward selection.
  And because we only have so much time I'm only going to do 10 reps.
  Click go and we'll create the model.
  So here is the output for only 10 reps, and you get an actual by predicted plot. But what's really nice about this, is it actually creates the ensemble model average for you.
  So I click on self-validating ensemble model, save prediction formula. So I'm going to go ahead and close this display.
  Come over.
  And there's my model.
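The workflow just demonstrated (autovalidation weights, forward selection, a number of reps, then an averaged model) can be sketched in Python. This is a minimal illustration under stated assumptions, not the add-in's code: the paired weights -ln(U) for training and -ln(1-U) for validation are an assumed form of the fractionally weighted bootstrap, the greedy forward selection is a plain stand-in for GenReg's, and all function names and the toy data are made up.

```python
import numpy as np


def weighted_lstsq(X, y, w):
    """Weighted least squares, done by scaling each row by sqrt(weight)."""
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return beta


def forward_path(X, y, w_train, max_terms):
    """Greedy forward selection on the training weights.

    Column 0 (the intercept) is always in the model. Returns a list of
    (column indices, coefficients), one entry per step of the path.
    """
    active = [0]
    path = [(list(active), weighted_lstsq(X[:, active], y, w_train))]
    remaining = set(range(1, X.shape[1]))
    while remaining and len(active) < max_terms:
        best = None
        for j in sorted(remaining):
            cols = active + [j]
            b = weighted_lstsq(X[:, cols], y, w_train)
            sse = np.sum(w_train * (y - X[:, cols] @ b) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, b)
        active.append(best[1])
        remaining.remove(best[1])
        path.append((list(active), best[2]))
    return path


def svem_fit(X, y, n_reps=10, seed=0):
    """Average the validation-selected model over n_reps reweightings."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    max_terms = min(p, n - 1)
    coefs = np.zeros((n_reps, p))
    for r in range(n_reps):
        u = rng.uniform(1e-12, 1.0, size=n)
        # Assumed weighting: a run with a large training weight gets a
        # small validation weight, and vice versa.
        w_train, w_valid = -np.log(u), -np.log1p(-u)
        best_err = np.inf
        for cols, b in forward_path(X, y, w_train, max_terms):
            err = np.sum(w_valid * (y - X[:, cols] @ b) ** 2)
            if err < best_err:
                best_err = err
                coefs[r] = 0.0
                coefs[r, cols] = b
    return coefs.mean(axis=0), coefs


# Toy data (made up): 16 runs, 5 factors, intercept + linear + squared terms.
rng = np.random.default_rng(1)
F = rng.uniform(-1, 1, size=(16, 5))
X = np.column_stack([np.ones(16), F, F**2])
y = 8.5 + 2.0 * F[:, 0] - 1.0 * F[:, 1] + rng.normal(0, 0.05, 16)

beta, all_coefs = svem_fit(X, y, n_reps=10, seed=0)
print(np.round(beta, 2))
```

Each rep uses all 16 runs for both fitting and validation, only with different weights, which is why no rows have to be set aside. The ensemble prediction is `X_new @ beta`.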
  I can now take this model, and I can go ahead and apply it to the validation data. So we've got 16 runs that were selected, so there's another 28 available that can be used as a validation set to see how we actually did. So at this point I'm now going to go back to the
  presentation and, by the way, without the SVEM add-in, it can be a little bit difficult, especially if you aren't particularly proficient in JMP scripting,
  to actually create these models. So that add-in, it may not look like it, but it did an awful lot of work in the background, okay. So how did we do? So let me put this back on slideshow.
  So what I did, there are three designs, a 16-run design, 13-run, and then I picked a 10 run. We really push the envelope.
  And there are two responses, as I said, glycoform 10 elutes late and tends to be noisy and can be hard to resolve. So for the 16-run design, I actually fit a 40-term partial cubic model. Keep that in mind, 16 runs. And I fit a 40-term model.
  And then the key to this is the root average square error on the validation set, okay. So
  I fit my 40 term model, and then the
  root average square error on the validation set was low and the validation R squared was actually close to .98, so it fit very, very well.
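For readers who want to reproduce the two validation metrics named here, a minimal Python sketch follows. It assumes the usual definitions: root average square error as the square root of the mean squared prediction error, and R squared as one minus the ratio of error sum of squares to total sum of squares about the validation-set mean. The retention times below are made-up numbers, not values from the experiment.

```python
import numpy as np


def rase(y, y_hat):
    """Root average square error: sqrt of the mean squared prediction error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    """1 - SSE/SST, with SST taken about the mean of the supplied y values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(1.0 - sse / sst)


# Made-up validation retention times (minutes) and model predictions.
y_valid = [8.1, 8.6, 9.0, 8.4]
y_pred = [8.0, 8.7, 8.9, 8.5]

print(f"RASE = {rase(y_valid, y_pred):.3f}")
print(f"validation R squared = {r_squared(y_valid, y_pred):.3f}")
```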
  Notice, even for the much noisier
  glycoform 10, we still got a pretty good model. I'll show you some actual by predicted plots. So we did this for 13 runs and, once again, we got very good results.
  And then finally for 10. By the way for 10, I just fit the standard full quadratic model, I felt like I was really pushing my luck.
  But I could have gone ahead and tried the partial cubic. And notice, once again, even with 10 runs, modeling glycoform 3 retention time, I got an R square of .94.
  To me, that is really impressive, and again my model had twice as many predictors as there were observations. And even with the difficult-to-model glycoform 10, we did quite well.
  So, here are some actual by predicted plots and again I pick glycoform 10 because it's really hard to model.
  So I didn't want to game the system too much. I know from experience it's hard to model. So there is an actual by predicted plot. There's another actual by predicted plot for the 10-run design, it still does pretty good
  when you consider what we're trying to do here.
  And then finally, how did we do overall?
  So this is a little table I put together and I took the 16-run design as the baseline, and then what I'm doing is comparing what happened when we went to the 10-run
  design and the 13-run design for retention time for G10.
  You'll notice that we got a rather large error for 13 and a much smaller one for 10. I don't know why. I chose to just show what the results were. It's a bit of a mystery, and I have not gone back to explore it.
  But for retention time three, notice I go from 16 down to 10. Yes, I got some increase in the root average square error, but you know if you look at the actual by predicted plot,
  it still does a good job of predicting on the validation set. So the point of this being, and this is, I know this is, I work a lot with biotech companies, the
  actual efficiencies you could potentially achieve in your experimental designs and I know for many of you these experiments are costly.
  And even more important, they take a lot of time. This really can shorten up your experimental time and reduce your experimental budget and get even more information.
  So just kind of a quick conclusion to this. This SVEM procedure, as Chris showed you, we've investigated in simulations and case studies.
  SVEM is great for supersaturated models. I won't get into it, but from machine learning practice supersaturated models are known to perform very well. They're used all the time, in deep learning as an example.
  Basically, as Chris says, SVEM is combining machine learning with design of experiments.
  And even more important, once you start going down the pathway of SVEM, that, in turn, informs how you think about experimenting.
  So basically using SVEM, we can start thinking about highly efficient experiments that can really speed up the pace of innovation and actually reduce the time and costs of experimentation. And with time to develop products and processes
  becoming...the lead times getting shorter and shorter, this is not a trivial point, and I know many of you in biotech are also faced with serious budget constraints.
  And then (by the way, most people are) and then finally one promising area is Bayesian I-optimal designs.
  And again, I'll mention that Brad Jones and his group have done a great job introducing these in JMP. It's the only software package I know of that actually does Bayesian I-optimal designs.
  And we think SVEM is going to open up the window to a lot of possibilities. So basically, that is my story and I'm going to turn it over to Wayne Levin.
Wayne Levin Thanks, very much for that,
  Phil and Chris. I hope everybody can see that there's an awful lot of work, a lot of research has gone into this, including some
  work in the trenches, you might say, and I've been in this game for over 30 years now. I don't mind telling you I'm very excited about what I'm seeing here.
  I think this can be really transformative, really change how industrial experiments are conducted. I was previously very excited by supersaturated designs alone, and that was
  facilitated with a custom designer; that's on the design side of things.
  Now, when you bring SVEM on the analysis side, I mean those are two great tastes
  that go great together, you might say. So anyway, just to conclude, I'm just going to slide in here that
  Phil, myself and Marie Goddard, we're putting together an on-demand course (it should be available next quarter, like in April)
  on mixture designed experiments, and we're going to focus a good amount of time on SVEM in that course. So if that's of interest to you, please let me know or we can let you know or follow us on LinkedIn or one of those things.
  And for the SVEM add-in, again, if I may, I'm delighted we were first to market with this.
  We've been working hard on it for a number of months. We do a new release about every month, so it is evolving as we're getting feedback from the field, and I want to thank Phil very much
  for leading this. And he's been partnering with one of our developers (???) and it's really been a terrific effort, a lot of hard work
  to get to this point, and we want to make that available to you. And so as Phil mentioned earlier, there's really two ways for you to try this. One is just try the add-in itself,
  it works with 14 and 15. And with 16 coming out, I think we're going to have it working for 16 as well. It does require JMP Pro, so if you don't have JMP Pro, there's a link here, where you can apply to get a JMP Pro trial license.
  And you can get this link, of course, by downloading the slides for this presentation. So that's how you can get it, so this is one way you can work with it to try it yourself.
  Another way is just contact us and we can work with you on a proof of concept basis, all right.
  And we could do some comparative analyses and so on, and of course, review that analysis with you. So you know we'll put the necessary paperwork in place, and
  we don't mind trying it out. We've done that for a good number of companies and that may be the easiest way for you, whatever works for you. Okay, and for now, then what I'd like to do is
  open it up for any questions or comments at this time. I'll say on behalf of Chris and Phil, thanks for joining us. So okay, questions and comments, please.

By developing this area with a consistent effort, Gotwalt and Ramsey have been of tremendous service to the area of biomedical and biotechnology research, which is often plagued with small samples. Even though machine learning approaches have been applied to these areas, this effort will hopefully evolve as a game-changer. Thanks for getting this done and let us know of the exciting developments at JMP and Predictum. Intriguingly, this also represents a major academia-industry collaborative success in the data analysis area.


Very interesting!

Two questions:

1) The design with 12 runs used in the example that starts at 9:51 includes four pairs of replicates. Can the autovalidation algorithm take the pairs of replicates into account? For example, the pairs of replicates could be given (0,1), (1,0) weights, or (a, b), (b, a) weights. Would it make any difference? Of course, the "predictive heart" design with 16 rows in the pDNA example probably doesn't include replicates, although the whole design with 36 rows did.

2) In addition to finding the SVEM model, could we also exploit all these calculations to identify whether certain runs have a large influence on model selection or model parameters or prediction? We could, for example, run the process again at these conditions. It's natural to consider this option if an initial design with a small number of runs is used. More broadly, we may want to augment the initial design, but that is a separate issue.

thanks - looking forward to the discussion next Tuesday!


Nice innovative research!   The gold standard in the data science community for validating models is k-fold cross validation.   It would be interesting to see some fair comparisons between SVEM and k-fold in terms of out-of-sample performance.    Models from one or more k-fold runs can also be ensembled, and in fact this is what is done in the XGBoost Add-In for JMP Pro.   I’m imagining something along the following lines for the pDNA data:


  1. The outer loop consists of running DOE Custom Designer to extract a small targeted training subset, e.g. 16 out of the 46 as you did in the talk.    My understanding is this subset is stochastic and so an outer loop like this makes sense and also avoids potential biases due to a single holdout.
  2. For each iteration, completely set aside the non-training observations using Hide/Exclude or create a new response with missing values for the holdout data.   Run SVEM and k-fold ensembling on the training data and save important results like the prediction formulas and estimated generalization performance based on the training data alone.
  3. Apply formulas to the holdout set and collect performance stats, e.g. for continuous responses RSquare, root mean square error, and Pearson correlation.


An interesting side question is if any of the k-fold individual fits break down or perform poorly because of reduced size in the small training set—this kind of occurrence would strengthen the case for SVEM.   I’m also curious about how the in-sample estimates of validation performance compare to the out-of-sample ones obtained empirically as above. 


Another connection is that Custom DOE with covariates can also be used to create k-folds, but using the Y variable as the covariate instead of the Xs.  The JMP Pro XGBoost add-in comes with a supplemental routine Make K-Fold Columns for doing this.    This is a great way to optimize information gain across a set of k-fold partitions and it would be a recommended approach in the experiment above. 


Finally, I’m interested in trying the fractional weighting SVEM approach with XGBoost as the base model.   Looks like this is readily doable in a loop by the clever trick of reevaluating the formula in the column that defines the weights.

Great work and looking forward to further developments and applications of SVEM.


Hi Russ,

BTW, I love the XGBoost add-in.  I think SVEM may outperform k-fold because it gives you potentially hundreds or thousands of models to ensemble.  In our case we are focused on DoE, so it was not possible to use techniques like leave one out (actually always a bad idea) or k-fold.  For future work, I think you are correct that a comparison needs to be made and your basic outline of the process makes sense. For the pDNA data, if we combine it then possibly we could do what you are suggesting. Perhaps we should talk more about this, and hopefully Chris Gotwalt will join the discussion.

@eharoon Thanks - we are very excited about the results so far, and there is still a lot forthcoming from this industry-academia interaction. We hope to submit our paper for publication in the next week, it should be the first of three that we hope to get out by September.

Thanks for your questions and interest. Here are my responses:

1)  Can the autovalidation algorithm take the pairs of replicates into account? We've not explored how to incorporate replicates, yet. It would be easy to do so, though. You would simply adjust the weighting scheme so that when one replicate is 'in the training set' the other is left out. There are so many interesting extensions and modifications that it has been hard in the research to maintain focus, strip it down to its essence, and get it submitted for publication.


2) Could we also exploit all these calculations to identify whether certain runs have a large influence on model selection or model parameters or prediction? Yes, an interesting diagnostic would be to look at scatterplots of the regression coefficients vs. the weights for a particular observation. If the correlation between a coefficient and the weight is large in absolute value, then that run is highly influential for that effect. This is another great idea we haven't had the time to pursue, yet!
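The diagnostic suggested in this reply can be sketched in Python. This is only an illustration: it assumes the per-rep weights and the per-rep regression coefficients have been saved as two arrays, and the function name and synthetic data are made up. The data are constructed so that run 2 drives the first coefficient.

```python
import numpy as np


def weight_coefficient_correlations(weights, coefs):
    """Pearson correlation of each run's weight with each coefficient across reps.

    weights: (n_reps, n_runs) array of per-rep weights for each run.
    coefs:   (n_reps, n_terms) array of per-rep fitted coefficients.
    Returns an (n_runs, n_terms) array; constant columns give 0.
    """
    W = np.asarray(weights, float)
    B = np.asarray(coefs, float)
    Wc = W - W.mean(axis=0)
    Bc = B - B.mean(axis=0)
    cov = Wc.T @ Bc
    denom = np.outer(np.sqrt((Wc**2).sum(axis=0)), np.sqrt((Bc**2).sum(axis=0)))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, cov / denom, 0.0)


# Synthetic stand-in for saved SVEM output: 200 reps, 6 runs, 2 coefficients.
rng = np.random.default_rng(0)
W = rng.exponential(1.0, size=(200, 6))
B = np.column_stack([
    1.0 + 0.5 * W[:, 2] + rng.normal(0, 0.05, 200),  # driven by run 2
    rng.normal(0, 0.05, 200),                        # unrelated to any run
])

corr = weight_coefficient_correlations(W, B)
most_influential = int(np.argmax(np.abs(corr[:, 0])))
print(f"run with largest |correlation| for coefficient 0: {most_influential}")
```

A run whose weight correlates strongly with a coefficient, in absolute value, is a candidate for a repeat run or for closer inspection.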

There was a question about the role of p-values. Our sense is that a large portion of industrial experimentation is predictive in nature: maximizing yield, minimizing waste, making products that stay within specifications, etc. The tools that are used often include a lot about hypothesis testing to determine active factors, and assess modeling assumptions like normality and non-autocorrelated errors. We see SVEM as going directly from the data to the predictive model. Since we are no longer bound by degrees of freedom, determining 'significant' effects (in the hypothesis testing sense) may be less important in the years to come. The question shifts to "Which factors are important and which factors are ignorable?", allowing people to sidestep a lot of awkward statistical machinery from the mid-20th century, and focus directly on solving the problem at hand. 

Hi @philramsey ,   Thanks for the reply.  Sorry I did not make it clear, I was referring to repeated k-fold cross validation, which is easily implemented in a DOE context and can also provide hundreds to thousands of models to ensemble depending on the design size and choice of k.   In most cases I've encountered a small number of repeats of k-fold works great, with typically less than a hundred total model runs.  


One initial easy test with current SVEM JSL code is to change the fractional weighting formula to convert the weights to either 0 or 1 based on some threshold that controls the holdout size.    This strictly speaking is no longer full k-fold, but is instead a series of random single holdouts.  I would guess this to still work okay and provide reasonable results.   More efficient would be to construct a collection of k-fold partitions set up to be orthogonal as is available in the Make K-Fold Columns utility that comes with the XGBoost add-in.   
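The thresholding test proposed above might look like the following in Python. This is a sketch, not the SVEM JSL code: it assumes the weights come from a uniform draw per run, and that thresholding that draw at `holdout_fraction` (a made-up parameter name) decides which runs are held out.

```python
import numpy as np


def binary_holdout_weights(n_runs, holdout_fraction, rng):
    """Turn one uniform draw per run into 0/1 training and validation weights.

    Runs whose draw falls below holdout_fraction are held out (validation
    weight 1, training weight 0); all other runs are used for training.
    """
    u = rng.uniform(size=n_runs)
    valid = (u < holdout_fraction).astype(float)
    train = 1.0 - valid
    return train, valid


rng = np.random.default_rng(0)
train, valid = binary_holdout_weights(16, 0.25, rng)
print("training weights:  ", train.astype(int))
print("validation weights:", valid.astype(int))
```

Because the holdout size is random, a single draw can leave the validation set empty or very small when the design has few runs, so a loop using this scheme would need to redraw in that case.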

As a side note, wanted to mention that the XGBoost add-in handles its Validation column differently from other JMP Pro platforms in order to easily accommodate repeated k-fold.  It also automatically ensembles all individual k-fold models when saving its prediction formula, just like the averaging of all of the prediction columns at the end of SVEM.  No duplication of rows is required.  


Repeated k-fold is well established in the literature and implemented in places like Python's sklearn.model_selection.RepeatedKFold, the R package caret, and now also in the XGBoost add-in.  To be honest, I feel a fair and compelling comparison of SVEM to repeated k-fold is really needed if you are going to convince serious users to adopt it.
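For comparison purposes, repeated k-fold ensembling of the kind described here can be sketched with NumPy alone. This is an illustrative sketch, not the XGBoost add-in's method: the base model is ordinary least squares, the function names are made up, and the data are synthetic. scikit-learn's RepeatedKFold could supply the splits instead of the helper below.

```python
import numpy as np


def repeated_kfold_indices(n, k, n_repeats, seed=0):
    """Yield (train, test) index arrays for n_repeats random k-fold partitions."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        for fold in np.array_split(perm, k):
            yield np.setdiff1d(perm, fold), fold


def kfold_ensemble(X, y, k=4, n_repeats=5, seed=0):
    """Fit least squares on every training fold and average the coefficients.

    Also returns the cross-validated RMSE pooled over all held-out folds.
    """
    betas, sq_errs = [], []
    for train, test in repeated_kfold_indices(len(y), k, n_repeats, seed):
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        betas.append(beta)
        sq_errs.append((y[test] - X[test] @ beta) ** 2)
    cv_rmse = float(np.sqrt(np.mean(np.concatenate(sq_errs))))
    return np.mean(betas, axis=0), cv_rmse


# Synthetic 16-run example with an intercept and two factors.
rng = np.random.default_rng(2)
F = rng.uniform(-1, 1, size=(16, 2))
X = np.column_stack([np.ones(16), F])
y = 8.5 + 2.0 * F[:, 0] - 1.0 * F[:, 1] + rng.normal(0, 0.05, 16)

beta_ens, cv_rmse = kfold_ensemble(X, y, k=4, n_repeats=5, seed=0)
print(np.round(beta_ens, 2), round(cv_rmse, 3))
```

With k = 4 and 5 repeats this averages 20 fitted models, each trained on 12 of the 16 runs.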


My current interpretation of SVEM is as a smoothed version of repeated k-fold.   Or as mentioned above, this can be flipped around to view k-fold as a discrete binning of the fractional weights.   My guess is that honestly-assessed out-of-sample performance will be similar in most cases, but would definitely like to see the comparisons.   


Hi @philramsey and @chris_gotwalt1. I recently installed and tried the Add-in for auto-validation from here which is useful for variable selection thru the use of the null factor idea. The Add-in presented here in this link seems to be more complete or comprehensive because of the model averaging. Is it a free Add-in?


Regardless, were the simulated parameters estimates from the auto-validation Add-in used for model averaging? If so, how can I get rid of the null factor parameter in the final averaged model?




Please can I try this add-in out?

@ahmedmohamed ,  Are you referring to the set-up add-in?  That can be found here. Add-in To Support Auto-Validation Workflow 

 I mean SVEM add-in please?

@ahmedmohamed,  There is no SVEM add-in developed.  We do have a simple script you can use from our Discovery talk that you can modify that will help with the SVEM analysis.  Re-Thinking the Design and Analysis of Experiments? (2021-EU-30MP-776) 


Anyone looking to download the SVEM add-in need only visit this page:


You'll see there a link to request a trial license.