Transcript
Hello, Chris Gotwalt here. Today we're going to be constructing the history of graphic paradoxes and... oh wait, wrong topic. Actually, we're going to be talking about candidate set designs: tailoring DOE constraints to the problem.
Industrial experimentation for product and process improvement has a long history with many threads, of which I admit I only know a tiny sliver. The idea of using observation for product and process innovation is as old as humanity itself. It received renewed focus during the Renaissance and the Scientific Revolution. During the subsequent Industrial Revolution, science and industry began to operate more and more in lockstep. In the early 20th century, Edison's lab pursued industrial innovation on a factory scale, but it operated, to my knowledge, outside of modern experimental traditions.
Not long after R. A. Fisher introduced concepts like blocking and randomization, his associate and later son-in-law, George Box, developed what is now probably the dominant paradigm in design of experiments, the most popular book being Statistics for Experimenters by Box, Hunter, and Hunter. The methods described in Box, Hunter, and Hunter are what I call the taxonomical approach to design.
So suppose you have a product or process you want to improve. You think through the things you can change, the knobs you can turn, like temperature, pressure, time, and the ingredients or processing methods you can use. These things become your factors. Then you think about whether they are continuous or nominal: if they are nominal, how many levels they take, and if they are continuous, the range over which you're willing to vary them. Next you figure out the name of the design that most easily matches up to the problem and fits your budget. That design will have a name like a Box-Behnken design, a fractional factorial, a central composite design, or possibly something like a Taguchi array. There will be restrictions on the numbers of runs, the numbers of levels of categorical factors, and so on, so there will be some shoehorning of the problem at hand into the design that you can find. For example, factors in the BHH (Box, Hunter, and Hunter) approach often need to be whittled down to two or three unique values or levels.
Despite its limitations, the taxonomical approach has been fantastically successful. Over time, of course, some people have asked if we could still do better. And by better, we mean to ask ourselves: how do we design our study to obtain the highest quality information pertinent to the goals of the improvement project? This line of questioning led ultimately to optimal design. Optimal design is an academic research area. It started in parallel with the Box school in the '50s and '60s but, for various reasons, remained out of the mainstream of industrial experimentation until the custom designer in JMP.
The philosophy of the custom designer is that you describe the problem to the software, and it returns the best design for your budgeted number of runs. You start out by declaring your responses along with their goals, like minimize, maximize, or match target, and then you describe the kinds of factors you have: continuous, categorical, mixture, etc. Categorical factors can have any number of levels. You give it a model that you want to fit to the resulting data. The model assumes a least squares analysis and consists of main effects, interactions, and polynomial terms. The custom designer makes some default assumptions about the nature of your goal, such as whether you're interested in screening or prediction, which is reflected in the optimality criterion that is used. The defaults can be overridden with a red triangle menu option if you want to do something different from what the software intends. The workflow in most applications is to set up the model.
Then you choose your budget and click Make Design. Once that happens, JMP uses a mixed continuous and categorical optimization algorithm, solving for the number of factors times the number of rows terms. Then you get your design data table with everything you need except the response data. This is a great workflow when the factors can be varied independently of one another.
What if you can't? What if there are constraints? What if the values of some factors determine the possible ranges of other factors? Well, then you can define some factor constraints or use the disallowed combinations filter. Unfortunately, while these are powerful tools for constraining experimental regions, it can still be very difficult to characterize constraints using them.
Brad Jones' DOE team of Ryan Lekivetz, Joseph Morgan, and Caleb King has added an extraordinarily useful new feature that makes handling constraints vastly easier in JMP 16. These are called candidate or covariate runs. What you can do is, off on your own, create a table of all possible combinations of factor settings that you want the custom designer to consider. Then load them up here, and those will be the only combinations of factor settings that the designer will look at. The original table, which I call a candidate table, is like a menu of factor settings for the custom designer.
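To build intuition for what the designer does with that menu, here is a minimal sketch, in Python rather than JMP, of greedy D-optimal subset selection from a candidate list. The ridge term, the greedy strategy, and the one-factor quadratic example are my own simplifications for illustration; they are not JMP's actual algorithm.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def xtx(rows, ridge=1e-6):
    """X'X for a list of model rows, plus a tiny ridge so the greedy
    search has something to compare before the design is full rank."""
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    for i in range(k):
        m[i][i] += ridge
    return m

def greedy_d_optimal(candidates, n_runs):
    """Grow the design one row at a time, always adding the candidate
    row that most increases det(X'X); rows may repeat (replicates)."""
    design = []
    for _ in range(n_runs):
        best = max(candidates, key=lambda r: det(xtx(design + [r])))
        design.append(best)
    return design

# candidate "menu": model rows [1, x, x^2] for a quadratic in one factor
candidates = [[1.0, x, x * x] for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
design = greedy_d_optimal(candidates, 6)
```

With six runs and a three-term model, the greedy search concentrates runs at the extremes and the center and replicates some of them, which mirrors how the custom designer behaves when candidate rows may be chosen more than once.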
This gives JMP users an incredible level of control over their designs. What I'm going to do today is go over several examples to show how you can use this to make the custom designer fulfill its potential as a tool that tailors the design to the problem at hand.
Before I do that, I'm going to get off topic for a moment and point out that in the JMP Pro version of the custom designer, there's now a capability that allows you to declare limits of detection at design time. If you enter non-missing values for the limits here, the custom designer will add a column property that informs the generalized regression platform of the detection limits, and it will then automatically get the analysis correct. This leads to dramatically higher power to detect effects and much lower bias in predictions, but that's a topic for another talk.
Here are a bunch of applications that I can think of for the candidate set designer. The simplest is when the range of a continuous factor depends on the level of one or more categorical factors. Another example is when we can't control the ranges of factors completely independently, but the constraints are hard to write down. There are two methods we can use for this. One is using historical process data as a candidate set, and the other is what I call filter designs, where you create a giant initial data set using random numbers or a space filling design and then use row selections in scatterplots to pick off the points that don't satisfy the constraints.
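The filter-design idea is easy to prototype outside of JMP as well. Here is a hedged Python sketch, with a made-up feasibility rule standing in for whatever constraint is hard to write down as linear inequalities:

```python
import random

random.seed(1)

# Hypothetical feasibility rule: any condition you can code (or select
# visually with scatterplots in JMP) works, even one that would be
# awkward to express as linear constraints or disallowed combinations.
def feasible(temp, pressure):
    # the pressure ceiling tightens quadratically as temperature rises
    return pressure <= 120 - 0.02 * (temp - 140) ** 2

# giant initial data set of random candidate points over the factor ranges
pool = [(random.uniform(140, 180), random.uniform(80, 120))
        for _ in range(5000)]

# keep only the feasible points; this filtered table is the candidate set
candidates = [p for p in pool if feasible(*p)]
```

The surviving rows play the role of the filtered table you would then load into the custom designer as covariate factors.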
There's also the ability to highly customize mixture problems, especially situations where you've got multilayer mixtures. That isn't something I'm going to be able to talk about today, but in the future it is something you should be able to do with this candidate set designer. You can also handle nonlinear constraints with the filtering method, the same way you can handle other kinds of constraints. It's very simple, and I'll have a quick example at the very end illustrating this.
So let's consider our first example. Suppose you want to match a target response in an investigation of two factors. One is an equipment supplier, of which there are two levels, and the other is the temperature of the device. The two suppliers have different ranges of operating temperatures. Supplier A's is the narrower of the two, going from 150 to 170 degrees Celsius, but it's controllable to a finer level of resolution, about 5 degrees. Supplier B has a wider operating range, going from 140 to 180 degrees Celsius, but is only controllable to within 10 degrees Celsius. Suppose we want to do a 12-run design to find the optimal combination of these two factors. We enumerate all possible combinations of the two factors in 10 runs in the table here, just creating this manually ourselves.
So here are the five possible values of machine type A's temperature settings, and down here are the five possible values of type B's temperature settings. We want the best design in 12 runs, which exceeds the number of rows in the candidate table. This isn't a problem in theory, but I recommend appending a copy of the candidate set, just in case, so that the number of rows in your candidate table exceeds the number of runs you're looking for in the design.
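For a table this small you would just type it in, but the enumeration is also trivial to script. A sketch in Python, using the supplier names and step sizes from the example:

```python
# supplier A: 150-170 °C in 5 °C steps; supplier B: 140-180 °C in 10 °C steps
candidates = ([("A", t) for t in range(150, 171, 5)] +
              [("B", t) for t in range(140, 181, 10)])
```

That gives the ten (supplier, temperature) rows of the candidate table, five per supplier.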
Then we go to the custom designer, push the Select Covariate Factors button, and select the columns that we want loaded as candidate design factors. Now the candidate design is loaded and shown. Let's add the interaction effect, as well as the quadratic effect of temperature. Now we're at the final step before creating the design. I want to explain the two options you see in the Design Generation outline node.
The first one forces in all the rows that are selected in the original table or in the listing of the candidates in the custom designer. So if you have checkpoints that are unlikely to be favored by the optimality criterion and want to force them into the design, you can use this option. It's a little like taking those same rows and creating an augmented design based on just them, except that you are controlling the possible combinations of the factors in the additional rows.
The second option, which I'm checking here on purpose, allows the candidate rows to be chosen more than once. This will give you optimally chosen replications and is probably a good idea if you're about to run a physical experiment. If, on the other hand, you are using an optimal subset of rows to try in a fancy new machine learning algorithm like SVEM, a topic of one of my other talks at the March Discovery Conference, you would not want to check this option. Basically, if you don't have all of your response values already, I would check this box; if you already have the response values, then don't.
Reset the sample size to 12 and click Make Design. The candidate design, in all its glory, will appear just like any other design made by the custom designer. As we see in the middle JMP window, JMP also selects the rows in the original table chosen by the candidate design algorithm. Note that 10, not 12, rows were selected. On the right we see the new design table; the rightmost column in the table indicates the row of origin for each run. Notice that original rows 11 and 15 were chosen twice and are replicates.
Here is a histogram view of the design. You can see that different values of temperature were chosen by the candidate set algorithm for the different machine types. Overall, this design is nicely balanced, but we don't have three levels of temperature for machine type A. Fortunately, we can select the rows we want forced into the design to ensure that we have three levels of temperature for both machine types. Just select the rows you want forced into the design in the covariate table and check the option to include all selected covariate rows in the design. If you go through all of that, you will see that now both machine types have at least three levels of temperature in the design. The first design we created is on the left, and the new design, forcing there to be three levels of machine type A's temperature settings, is over here on the right.
My second example is based on a real data set from a metallurgical manufacturing process. The company wants to control the amount of shrinkage during the sintering step. They have a lot of historical data and have applied machine learning models to predict shrinkage, and so have some idea what the key factors are. However, to actually optimize the process, you should really do a designed experiment.
As Laura Castro-Schilo once told me, causality is a property not of the data, but of the data-generating mechanism. And as George Box says on the inside cover of Statistics for Experimenters, to find out what happens when you change something, it is necessary to change it.
Now, although we can't use the historical data to prove causality, it contains essential information about what combinations of factors are possible, and we can use that in the design. We first have to separate the columns in the table that represent controllable factors from the ones that are passive sensor measurements or derived quantities that cannot be controlled directly.
A glance at the scatterplot of the potential continuous factors indicates that there are implicit constraints that could be difficult to characterize as linear constraints or disallowed combinations. However, these rows represent a sample of the possible combinations, and that sample can be used with the candidate designer quite easily.
To do this, we bring up the custom designer and set up the response. We'd like to load up some covariate factors, so select the columns that we can control as DOE factors and click OK. Now we've got them loaded. Let's set up a quadratic response surface model as our base model. Then select all of the model terms except the intercept, Ctrl + right-click, and convert all those terms into If Possible effects. This, in combination with the response surface model chosen, means that we will be creating a Bayesian I-optimal candidate set design.
Check the box that allows for optimally chosen replicates and enter the sample size. It then creates the design for us. If we look at the distribution of the factors, we see that it has tried hard to pursue balance.
On the left, we have a scatterplot matrix of the continuous factors from the original data, and on the right is the 100-row design. We can see that in the sintering temperature we have some potential outliers at 1220. One would want to make sure that those are real values. In general, you're going to need to make sure that the input candidate set is clear of outliers and missing values before using it as a candidate set design. In my talk with Ron Kenett at the March 2021 Discovery conference, I briefly demo how you can use the outlier and missing value screening platforms to remove the outliers and replace the missing values so that you can use the data at a subsequent stage like this.
Now suppose we have a problem similar to the first example, where there are two machine types, but now we have temperature and pressure as factors, and we know that temperature and pressure cannot vary independently and that the nature of that dependence changes between machine types. We can create an initial space filling design and use the data filter to remove the infeasible combinations of factor settings separately for each machine type. Then we can use the candidate set designer to find the most efficient design for this situation.
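Before the demo, here is a rough Python sketch of that generate-then-filter step, with invented inequalities standing in for the regions I will draw with the lasso tool:

```python
import random

random.seed(7)

# Hypothetical machine-dependent feasible regions; in the demo these are
# carved out interactively with the lasso tool rather than written down.
def feasible(machine, temp, pressure):
    if machine == "B":                  # machine B: wider operating window
        return temp + pressure <= 280
    return 155 <= temp <= 175 and pressure <= 100   # machine A: narrower

# 1,000-point space-filling stand-in: uniform random points per machine
pool = [(m, random.uniform(140, 180), random.uniform(60, 120))
        for m in ("A", "B") for _ in range(500)]

# filtering per machine type leaves a candidate set whose
# constraints differ with the categorical factor
candidates = [p for p in pool if feasible(*p)]
```

The filtered table is exactly what the candidate set designer needs: a constraint that changes with a categorical factor, captured without ever writing the inequalities into the designer itself.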
So now I've created my space filling design. It's got 1,000 runs, and I can bring up the global data filter on it and use it to shave off different combinations of temperature and pressure so that we can have separate constraints by machine type. I use the lasso tool to cut off a corner in machine B, and then I go back and cut off another corner; machine B is the machine with the wider operating region in temperature and pressure. Then we switch over to machine A and use the lasso tool to shave off the points that are outside its operating region. We see that its operating region is a lot narrower than machine B's.
And here's our combined design. From there, we can load it back up into the custom designer, put an RSM model there, set our number of runs to 32, and allow covariate rows to be repeated. It'll crank through, and once it's done, it selects all the points that were chosen by the candidate set designer. Here we can see the points that were chosen: they've been highlighted, and the original candidate points that were not selected are gray.
We can bring up the new design in Fit Y by X and see a scatterplot where the machine A design points are in red; they're in the interior of the space. The type B runs are in blue; it had the wider operating region, and that's why we see its points further out. So we have quickly achieved a design with linear constraints that change with a categorical factor, without going through the annoying process of deriving the linear combination coefficients. We've simply used basic JMP 101 visualization and filtering tools. This idea generalizes to nonlinear constraints and other complex situations fairly easily.
So now we're going to use filtering and the Multivariate platform to set up a very unusual new type of design that I assure you you have never seen before. Go to the lasso tool; we're going to cut out a very unusual constraint. Then we invert the selection and delete those rows. We can speed this up a little bit, going through and doing the same thing for other combinations of X1 and the other variables, carving out a very unusually shaped candidate set.
We can load this up into the custom designer, same as before. Bring our columns in as covariates and set up a design with all high-order interactions made If Possible, with a hundred runs. And now we see our design for this very unusual constrained region, a design that is optimal given these constraints.
So I'll leave you with this image. I'm very excited to hear what you are able to do with the new candidate set designer. Hats off to the DOE team for adding this surprisingly useful and flexible new feature. Thank you.