Experiments on Experiments, Models of Models

(NOTE: This is part three of a three-part series on stochastic optimization.)

Over the last two weeks, I introduced robust process engineering and stochastic optimization – the effort to achieve good product in the face of variation among the factors. Last week, I gave a cooking example. This week, I present a solution to the optimization problem.

In-Silico Surrogate

The inspiration for the solution comes from the world of computer experiments, also called in-silico science. Suppose you want to build the optimal passenger jet. You have factors like wing length, wing pitch, engine size and body composition, and you have responses like fuel economy, passenger volume, noise and speed. You create an experimental design with 64 runs, and you are ready to go. No problem. Each plane will cost around \$85 million, so that makes the experiment cost around \$5.44 billion. Oops. Your experimental budget is just \$40,000. What do you do?

You don’t build those planes. You create computer models of them and run those computer models to determine the performance characteristics. The planes are flown in-silico. If the models are good, they will report responses that are reasonably close to the real values. So you could, theoretically, optimize the characteristics using these models.

But those in-silico models are expensive, too. Each run for the computational fluid dynamics could take hours on a supercomputer. So you develop what is called a surrogate model, or meta model. You make a space-filling experimental design that samples 100 or more factor combinations in the factor space. Then you carry out the runs on the supercomputer. Then you fit an interpolation model to those points, and now, instead of taking hours for each point in the visualization or optimization, the interpolation model takes a fraction of a second to evaluate.

We now have a three-stage model for a computer experiment:

Real World << Expensive Computer Model << Cheap Surrogate Model

So let's return to the stochastic optimization problem.

'Cooking' a Chemical Process

We needed to determine whether we should “cook” a chemical process “hot and fast” or “warm and slow.” If the factors could be fixed, the hot and fast settings would be best. But because the factors are subject to variation, we already know that “hot and fast” is not a good setting; the variation will cause about 4 percent of the batches to yield below the minimum of .55, and those batches will have to be discarded. So now we develop a surrogate model of the defect rate:

Chemical Process << Model + Variation Simulation << Surrogate Model for Defects

Though our model of chemical yield is cheap and easy, the model of the defect rate in the presence of variation is not cheap and easy, and it is obtained through Monte Carlo simulation. We generate 10,000 runs of random factor data at given factor center settings; we then calculate the yield and then develop a defect rate based on the portion of the simulations that fall below the lower specification limit.
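The Monte Carlo step described above can be sketched in a few lines of Python. This is only an illustration: `toy_yield` is a made-up stand-in, not the fitted yield model from this series, and the sketch assumes the factor variation is normally distributed, which the text does not state. The standard deviations (1 and .03), the lower specification limit (.55) and the run count (10,000) are the ones given in this post.

```python
import random


def toy_yield(temperature, time):
    """Hypothetical yield surface, used only so the sketch can run.

    It is NOT the yield model from this series.
    """
    return 0.60 - 0.0004 * (temperature - 535.0) ** 2 - 8.0 * (time - 0.16) ** 2


def estimate_defect_rate(temp_center, time_center, temp_sd=1.0, time_sd=0.03,
                         lower_spec=0.55, n_runs=10_000, seed=None):
    """Estimate the share of simulated batches that yield below lower_spec."""
    rng = random.Random(seed)
    defects = 0
    for _ in range(n_runs):
        # Assumption: factor variation is normal around the chosen centers.
        temperature = rng.gauss(temp_center, temp_sd)
        time = rng.gauss(time_center, time_sd)
        if toy_yield(temperature, time) < lower_spec:
            defects += 1
    return defects / n_runs


if __name__ == "__main__":
    print(estimate_defect_rate(535.0, 0.16, seed=1))
```

Each call plays the role of one "expensive" run: it takes a pair of factor centers and returns one estimated defect rate.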

We already know defect rates for two points:

| Temperature Center | Time Center | Defect Rate |
|--------------------|-------------|-------------|
| 539.95             | .1158       | .04224      |
| 535                | .1568       | .01897      |

Remember that these are not fixed factor settings, but centers of the distribution of the factor settings, which have underlying variation with standard deviations of 1 and .03, respectively.

Those defect rates are just estimates based on simulation. If you do a new Monte Carlo simulation, you will get slightly different values.
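To get a feel for how much such estimates move from one simulation to the next, one can use the usual binomial standard error, sqrt(p(1 - p)/n), treating the 10,000 simulated batches as independent. The helper name below is hypothetical; the rates and the run count are the ones quoted above.

```python
import math


def defect_rate_standard_error(rate, n_runs):
    """Binomial standard error of an estimated proportion."""
    return math.sqrt(rate * (1.0 - rate) / n_runs)


for rate in (0.04224, 0.01897):
    se = defect_rate_standard_error(rate, 10_000)
    print(f"rate = {rate:.5f}, standard error = {se:.5f}")
```

For the 4 percent point, the standard error comes out at about 0.002, so repeated simulations would typically differ in the third decimal place.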

The Surrogate Experiment

So now we need to systematically vary the centers of the temperature and time distributions according to a space-filling experimental design. We use space-filling experimental designs because we expect a complex surface, and we can afford to investigate that surface.

The workhorse space-filling design is the Latin Hypercube. These are easy to make. You just make an evenly spaced set of values for each factor and scramble them individually. The result will have a uniform distribution across each factor and at least a random joint distribution. The JMP Design of Experiments platform actually optimizes the scrambles to fill the space better. If the runs are computer experiments, you don’t have to worry about randomization and replication because there are no outside factors to randomize against.
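The basic recipe (evenly spaced values per factor, scrambled individually) might look like the following sketch. It does not include the space-filling optimization that JMP applies to the scrambles, and the factor ranges in the usage line are illustrative assumptions, not the ranges used in this example.

```python
import random


def latin_hypercube(n_runs, ranges, seed=None):
    """Basic Latin Hypercube: one evenly spaced level per run, shuffled per factor.

    ranges is a list of (low, high) pairs, one per factor.
    """
    if n_runs < 2:
        raise ValueError("n_runs must be at least 2")
    rng = random.Random(seed)
    columns = []
    for low, high in ranges:
        step = (high - low) / (n_runs - 1)
        levels = [low + i * step for i in range(n_runs)]
        rng.shuffle(levels)  # scramble each factor independently
        columns.append(levels)
    # Row i of the design pairs the i-th scrambled level of every factor.
    return list(zip(*columns))


# Illustrative ranges for the temperature and time centers (assumed).
design = latin_hypercube(80, [(520.0, 545.0), (0.10, 0.30)], seed=1)
```

Every factor level appears exactly once, which is what gives the uniform coverage along each axis.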

The Profiler Simulator in JMP has a built-in feature called “Simulation Experiment” that makes all this very easy. It prompts you to enter the number of runs and to identify the portion of the factor space you want to investigate (around current settings), and it performs the simulations and estimates the defect rates. In our case, we will ask it to run a computer experiment in 80 runs across the whole factor space.

This is a lot of work. For each of the 80 defect rate estimates, the software does 10,000 runs, for 800,000 simulated batches in all. Fortunately, computers are fast, so this takes less than half a minute.

Here is how the space-filling design arranges the points and what the defect rates are at each point.

Now we need to model this defect surface. The emerging standard fitting technique for computer models is the Gaussian Process model. This model essentially calculates a weighted average of the neighboring points to predict each point on the surface. (Kriging and radial basis function neural nets are close relatives of Gaussian Process models.)
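The "weighted average of neighboring points" idea can be made concrete with a stripped-down Gaussian Process interpolator. This is a minimal sketch under strong assumptions, not what JMP does internally: the length scales are fixed by hand (real software estimates them from the data), the mean is taken as the sample mean, and the design points and log10 defect rates in the usage lines are made up for illustration.

```python
import math


def rbf_kernel(a, b, length_scales):
    """Gaussian correlation: close to 1 for nearby points, close to 0 for distant ones."""
    sq = sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, length_scales))
    return math.exp(-0.5 * sq)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_gp(points, values, length_scales, nugget=1e-8):
    """Return a predictor that interpolates values at points."""
    mean = sum(values) / len(values)
    # Correlation matrix of the design points; the nugget keeps it well conditioned.
    corr = [[rbf_kernel(p, q, length_scales) + (nugget if i == j else 0.0)
             for j, q in enumerate(points)] for i, p in enumerate(points)]
    weights = solve(corr, [v - mean for v in values])

    def predict(x):
        # Nearby design points get large kernel values and dominate the sum.
        return mean + sum(w * rbf_kernel(x, p, length_scales)
                          for w, p in zip(weights, points))

    return predict


# Made-up design points (temperature, time) and log10 defect rates.
demo_points = [(526.0, 0.29), (530.0, 0.22), (535.0, 0.16), (540.0, 0.12)]
demo_log10_rates = [-3.2, -2.5, -1.7, -1.4]
surrogate = fit_gp(demo_points, demo_log10_rates, length_scales=(5.0, 0.05))
print(surrogate((528.0, 0.25)))
```

Once fitted, the predictor is cheap to evaluate anywhere in the factor space, which is what makes it usable inside an optimizer.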

After we fit the surface, we call the optimizer to find the minimum on this surface. Now we know that to minimize defects, we cook it warm (526 degrees) and slow (.287) -- the opposite of the optimum for fixed-factor settings, which was hot and fast. The predicted log10 defect rate is -3.206, which corresponds to a defect rate of 10^-3.206, or 0.000622, clearly much smaller than the 4 percent defect rate at the fixed-factor optimum.

A cross-section of the surface at the minimum looks like this:

[Figure: cross-section of the defect rate surface at the minimum]

Now let’s use simulation again to see whether the defect rate holds up to this prediction. The actual rate in this simulation is .0007. We have dropped our defect rate from 4 percent to .07 percent, which is one-sixtieth of the defects from the previous settings. How about the average yield? Before, the average yield was .602; now it is .595, a small price to pay for the decreased variation.
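The arithmetic behind these comparisons is easy to check. The variable names below are just labels for the numbers quoted in this post.

```python
# Predicted defect rate from the predicted log10 defect rate.
predicted_rate = 10 ** -3.206

# Defect rate at the fixed-factor optimum versus the rate seen in the
# confirmation simulation at the new settings.
improvement_factor = 0.04224 / 0.0007

# Average yield given up by moving to the robust settings.
yield_cost = 0.602 - 0.595

print(f"predicted defect rate: {predicted_rate:.6f}")
print(f"defects reduced by a factor of about {improvement_factor:.0f}")
print(f"average yield given up: {yield_cost:.3f}")
```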

Conclusion

This new technique worked when previous techniques -- which involved finding the flats -- didn’t work. Not only did it work, but it also enabled us to build an understanding of the defect rate behavior as a separate response surface that can be visualized, as well as optimized. What about the older techniques?

If the variation is small relative to the curvature in the response surface, then local methods using the derivatives still work well.

If the variation is large enough to be affected by the curvature (second derivative) of the response surface, then you need to switch to simulation experiments.
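One way to see the difference between these two regimes is through the standard small-variation (delta-method) approximations. The notation here is an assumption introduced for illustration: a single factor $X$ with mean $\mu$ and standard deviation $\sigma$, and a response $Y = f(X)$.

$$
\operatorname{Var}[Y] \approx f'(\mu)^2 \, \sigma^2,
\qquad
E[Y] \approx f(\mu) + \tfrac{1}{2} f''(\mu) \, \sigma^2
$$

Finding the flats amounts to choosing $\mu$ where $f'(\mu)$ is close to zero, which makes the first-order transmitted variance small. When $\sigma$ is large enough that the $f''$ term and higher-order terms are no longer negligible, these approximations stop being reliable, and simulating the distribution directly is the safer route.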

With surrogate models, we now have a great new way to do stochastic optimization. Now we can tune our processes to be robust to variation in the factors, improving quality and reducing waste.

1 Comment

Anonymous wrote:

Your series is fascinating but appears entirely focussed on conventional linear models. I have attempted to find the profiling features in JMP 7 within the GLM and survival analysis platforms without success. In frustration, I am turning to R and its local regression packages. Have I missed anything?

Your generation of a defect "rate" surface suggests that I might have. Your target parameter is probably more accurately called "log defect proportion". In epidemiologic parlance, a true rate would need to have "dimensions" of counts per cases_at_risk per time_under_observation. Nonetheless, I wonder if there might be possibilities of examining the surface of estimated rates versus 2 chosen predictors (such as serum albumin and serum globulin) within regressions that are simultaneously controlling for other predictors (such as age and gender)?

I am also wondering if these methods allow estimation within a non-rectangular space defined as the region that has sufficient observations for estimation. Even with large data sets, the "corners" of regions are often sparsely populated and events may be missing.

David Winsemius, MD, MPH

Heritage Labs