of Design of Experiments.
Really the idea of this is to take full advantage
of the wealth of different tools that are under the DOE platform in JMP.
I'll walk through what we should be thinking about
in those early stages of an experiment.
If you look at the DOE platform listing,
what you'll see is that there's a lot of different choices.
Within each choice, there's many more choices.
Within some of those, there's nested possibilities.
If you're an expert in design of experiments,
this wealth of possibilities really feels like such a wonderful set of tools.
I love all of the options that are available in JMP that allow me
to create the design that I really want for a particular experiment.
But if you're just getting started, then I think this set of possibilities
can feel a little bit intimidating and sometimes a bit overwhelming.
It may be a little bit like going
to a new kind of restaurant that you've never been to before.
Someone who's a seasoned visitor to those kinds of restaurants
loves all the possibilities and the wealth of options on a big menu.
But if you're there for the first time,
it would be nice if someone guided you to the right set of choices
so that you could make a good decision
for that first visit and have it be successful.
Here's what I'm planning on talking about today.
First, I think the key to a good experimental outcome is to really have
a clear sense of what the goal of the experiment is.
I'll talk through some different possibilities of common goals
for experiments that really help us hone in on what we're trying to accomplish
and what will indicate a success for that experiment.
Then I'll do a quick walk-through of some
of the more common design choices in JMP,
and then I'll return to how we interact with the dialog boxes we get
once we've chosen a design: what factors to choose, the responses,
and the relationship between the inputs and the outputs.
That's where we're headed through all of this,
and I will say that the first and the third steps really need
a tremendous amount of subject matter expertise.
If you're going to be successful designing an experiment,
you really need to know as much as possible
about the framework under which you're doing that design.
We want to in fact incorporate subject matter expertise wherever possible
to make sure that we're in fact setting up the experiment to the best of our ability.
What are we trying to do?
I've listed here six common experimental objectives.
I think that sort of gives you
a checklist, if you like, of different options of things
that you might be thinking of accomplishing with your experiment.
We might start with a pilot study,
so we're just interested in making sure
that we're going to get data of sufficient quality
for the experiments and answering the questions that we want to have.
We might be interested in exploration or screening.
We have a long list of factors,
and we want to figure out which ones seem
to make a difference for our responses of interest,
and which ones don't seem particularly important.
We also might want to do some modeling.
Actually formalizing that relationship that we're seeing
between inputs and responses, and capturing it in a functional form.
Sometimes we don't get the level
of precision that we need, and so we need to do model refinement,
and so that might be a second experiment.
Then once we have a model, we want to use that to actually optimize.
How do we get our system to perform to the best of its capability for our needs?
Then lastly, there's a confirmation experiment
where we make that transition
from the controlled design of experiments environment
that we're often doing our preliminary data collection in
to production, and making sure we can translate
what we've seen in that first experiment into a production setting.
You can see from this progression that I've outlined here,
that we may actually have a series of small experiments that we want to connect.
We may start off with a pilot study to get the data quality right,
then we'll figure out which factors are important, then we'll want to model those,
then we'll want to use that model to optimize,
and then lastly translate those results
into the final implementation in production.
We can think of this sequentially
or for an individual experiment just tackling one of these objectives.
Now that we have some framework for what
the goals of the experiment are and how to think about that,
we'll now transition to looking
at what some of the common choices are in JMP
and how they connect with different goals.
I'll open up the DOE tab in JMP, and you can see that we've got
the list of possibilities here where we've got the nested options
tucked underneath some of the main menu items that we have here.
The talk is only half an hour, and so I won't be able to cover all of the tabs.
I've given a brief description of some
of the tabs that I won't have time to talk about.
Design Diagnostics is all about having a design or maybe several designs
and comparing and understanding the performance.
Sample Size Explorer is all about how big
should the experiment be and some tools to evaluate that.
Consumer Studies and Reliability Designs are really kind of specialized ones.
I'm setting those aside
for you to do a little research on your own about that.
In Consumer Studies, we're usually asking questions
of consumers, about what their priorities are, what features they like.
That tends to be a comparison between two options
and how they value those choices.
Reliability is all about how long our product will last.
That's a little bit different
than things that I'll talk about in the rest of the talk.
I'll start off with some of the Classical Designs
or the general designs that we have that have been developed.
Then I'll finish with some of the JMP specific tools
that are much more flexible and adaptable to a broader range of situations.
I'll start with that bottom portion of the tab.
Here we are in JMP in the DOE tab, and I'm going to start with Classical.
You'll see that I'm tackling this
in a little bit different order than the list is presented by JMP.
I think those ones are presented by JMP
in their order of popularity, and I'm choosing to tackle them
more from principles about how they were developed.
In Classical Designs, a Full Factorial design
is looking at all combinations of all factors at all levels.
That works nicely
if we have a small-ish number of factors,
but it can in fact get a little bit out of control
if we have a large number of factors, but it's exploring
the entire set of possibilities very extensively.
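To make "all combinations of all factors at all levels" concrete, here's a small Python sketch. It's purely illustrative, not how JMP builds the table, and the coded -1/+1 levels are just a common convention:

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Enumerate every combination of every factor at every level."""
    return list(product(*levels_per_factor))

# Hypothetical example: three two-level factors, coded low = -1, high = +1.
runs = full_factorial([[-1, 1], [-1, 1], [-1, 1]])
print(len(runs))  # 2**3 = 8 runs
```

Notice how quickly this gets out of control as factors are added: five factors at three levels is already 3**5 = 243 runs.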
The next one that I'll talk about is a Two-Level Screening design,
and essentially, what that's doing is it's choosing
a subset of the two-level factorial possibilities,
and it's a strategic subset
that allows us to explore the space, but keep the design size more manageable.
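The "strategic subset" idea can be sketched the same way. This is a hypothetical half fraction of a 2^4 factorial, built with the standard defining relation I = ABCD (a textbook construction, not JMP's internal code): we keep only the runs where the product of the coded levels is +1.

```python
from itertools import product

# Full 2^4 factorial: 16 runs over four coded factors A, B, C, D.
full = list(product([-1, 1], repeat=4))

# Strategic half: keep the runs satisfying the defining relation I = ABCD,
# i.e. the product of the four coded levels equals +1.
half_fraction = [run for run in full if run[0] * run[1] * run[2] * run[3] == 1]
print(len(half_fraction))  # 8 of the 16 runs
```

The fraction still spreads runs across the whole space, but the design size is halved, which is the screening trade-off described above.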
You'll notice that those first two possibilities
I've shown at two levels, and that's typical for screening designs.
Usually, we just want to get a simple picture
of what's happening between the input and the responses.
When we want to start modeling,
then a Response Surface Design typically allows for exploring curvature.
When we're modeling, three levels or sometimes more than three levels
can be a good way to understand curvature and also understand interactions between
the factors and how they impact the response.
Alright. That's three of the items under the Classical tab.
The other ones are Mixture Design.
Typically in all the other possibilities,
what we have is that we can vary
the individual factors separately from each other.
But in a Mixture Design where we're talking about the composition
or the proportion of the ingredients, they're interdependent.
If I increase the amount of one ingredient,
it probably reduces the proportion
of the other ingredients that are in that overall mixture.
A bit of a specialized one
when we're looking at putting together ingredients
into an overall mixture.
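The interdependence of mixture proportions can be sketched in a few lines of Python. This hypothetical example enumerates candidate blends of three ingredients on a coarse 0/50/100% grid and keeps only the valid mixtures, the ones whose proportions sum to one:

```python
from itertools import product

# Candidate proportions for each of three ingredients.
grid = [0.0, 0.5, 1.0]

# A valid mixture point must have its proportions sum to exactly 1:
# increasing one ingredient necessarily squeezes out the others.
blends = [p for p in product(grid, repeat=3) if abs(sum(p) - 1.0) < 1e-9]
print(len(blends))  # 6: three pure blends plus three 50/50 binary blends
```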
Taguchi Arrays, I've listed here as a kind of optimization,
and the optimization that they're interested in
is making our process robust.
Typically when we're in a production environment, we might have noise factors.
These are, in fact, factors that we can control
in our experiment,
but when we get to production, we're not able to control them.
Then we have a set of factors
that we can control both in the experiment and in production.
The goal of Taguchi Arrays is to look for a combination
of the controllable factors that gets us nice stable predictable
performance across the range of the noise factors.
You can see C1 here has a pretty horizontal line
which means it doesn't matter which level we are at
for the noise factor, we'll get a pretty consistent response.
Those are the classical options.
The next of the items on this JMP design tab
that I'll talk about are Definitive Screening Designs.
These are specialized designs that were developed at JMP,
and they are a blend of an exploration or screening design,
so a focus on a lot of two-level factors,
and modeling.
You can see with the blue dots,
we have some third levels, so a middle value for the factors
that allows us to get some curvature estimated as well.
It's a nice compact design
that's primarily about exploration and screening,
but it does give us the option
for an all in one chance to do some modeling as well.
That's very popular in a lot of different design scenarios.
The next tab is Special Purpose, and you can see there's quite a long list
of possibilities there,
and I'll hit some of the more popular ones
that I think show up in a lot of specialized situations.
A Covering Array is often used when we're trying to do testing of software.
A lot of times what causes problems
in software is when we have the combinations of factors.
This is a pretty small design
that's typical for Covering Arrays, so 13 runs,
and we're trying to understand things about 10 different factors.
What's nice about these Covering Arrays is that it gives us a way
to see all possibilities of, in this case, three different factors.
If I take two levels of each factor, a zero and a one,
there's eight different combinations for how I can combine those three factors.
All zeros, all ones, and then a mixture of zeros and ones.
I've highlighted those eight combinations with underlining.
What's really nice about these Covering Arrays
is whichever three factors I choose,
I will be able to find all eight of those combinations.
There are 10-choose-3 different combinations of those three factors
that I might be interested in,
and all of them have all of those possibilities represented.
That's a very small design
that allows us not so much estimation, but to check possibilities for problems
that we might encounter particularly in software.
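The coverage property is easy to check in code. Here's a sketch of the check itself, demonstrated on a classic four-run, strength-2 covering array, smaller than the 13-run, strength-3 example from the slide, but the same idea:

```python
from itertools import combinations

def covers(array, strength):
    """True if every choice of `strength` columns shows every 0/1 combination."""
    n_cols = len(array[0])
    for cols in combinations(range(n_cols), strength):
        seen = {tuple(row[c] for c in cols) for row in array}
        if len(seen) < 2 ** strength:
            return False
    return True

# Four runs cover all four 0/1 pairs for every pair of the three factors.
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(covers(ca, 2))  # True
```

Dropping any run breaks the coverage, which shows how tightly packed these designs are.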
Next, a very important category of Space Filling Designs.
Compared to the other options that I've talked about,
which are model-based,
this one just says, I maybe don't know what to expect in my input space.
Let me give even coverage throughout
the space that I've declared and just see what happens.
You can see that I have many more levels of each of the factors.
There's a lot of specialized choices in here, but they all have this same feel
of nice, even coverage throughout the input space.
I think these are often used
in computer experiments or in physical experiments
where we're just not sure what the response will look like.
I'll talk a little bit more about that
when we get to the decision making portion in Step 3 of the talk.
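One common space-filling construction is a Latin hypercube, sketched here in plain Python. This is a generic textbook version, not JMP's implementation: each factor gets exactly one value from each of n equal-width bins, so the points spread evenly without assuming any model.

```python
import random

def latin_hypercube(n_runs, n_factors, seed=0):
    """Even coverage of [0, 1): one value per equal-width bin for every factor."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        # One value drawn from each of the n_runs bins, then shuffled so
        # the factors aren't correlated with each other.
        levels = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(levels)
        columns.append(levels)
    return list(zip(*columns))

design = latin_hypercube(8, 2)
print(len(design))  # 8 runs; each factor visits all 8 bins exactly once
```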
Next is MSA Design, or Measurement System Analysis,
and this typically is associated with the Pilot Study.
Before I dive in and really start
to model things or do some screening, it's helpful to understand some basics
about the process and the quality of the data that I'm getting.
Here, I can divide the variability that I'm seeing
in the responses and attribute it to the operator,
the measurement device or gage, and the parts themselves,
and so understand the breakdown of what's contributing to what I'm seeing.
That's very helpful before I launch into a more detailed study.
Finally, Group Orthogonal Supersaturated Designs
are in fact, really compact designs.
In this example, we have six runs,
and we're trying to understand what's happening with seven different factors.
That may seem a little bit magical,
but it's a very aggressive screening tool that allows us to understand
what's happening with a lot of factors in a very small experiment.
It's important with these designs to not have a lot of active factors.
If all seven factors are doing something,
and I only have six runs, I'll end up quite confused at the end.
But if I think two or three of them may be active,
this may be a very efficient way
to explore what's going on without spending too many resources.
Those are the "start here" ones that I've talked through a little bit.
Now I'm going to finish with these wonderful tools in JMP
that are more general and more flexible for different scenarios.
Custom Design, I think is just an amazing tool for its flexibility.
What's really nice in Custom Design
is that I have this wealth of different possibilities
for the kinds of factors that I can include.
Continuous factors, and maybe I'll add in
Discrete Numeric ones, and then also Categorical factors.
I have a lot of different choices so I can put together the pieces,
and if I'm not sure what the design should look like
in that bottom portion of the list,
this gives JMP some control to help guide me to a good choice.
On the next page, I have the option of
whether I'm just interested in Main Effects,
whether I want to add some two factor interactions,
and whether I want to build a Response Surface Model,
so more the modeling goal of the experiment.
This is an easy way to build a design,
and I have flexibility here to specify whatever design size
I feel would be helpful and is within my budget;
the expertise of the JMP design team
will guide me to a sensible choice.
This is a great option if you're not sure about how to proceed,
but you're still making some key decisions
about what the goal of the experiment should look like.
Next, the Augment tab.
If you think back to what I've talked about for the Experimental Objectives,
you see that there's this connection between the stages.
Maybe I've done some exploring or screening,
and then I'd like to transition to modeling.
Well, this allows me to take an experiment
that I've already run and collected data for,
and then connect it to the Augment Design, assign the roles of what's a response
and what's the factor, and then add in some additional runs.
There's some specialized ones here,
but if I choose the Augment portion,
that allows me to specify a new set of factors,
perhaps a subset of what I have or an additional factor
and then also what model I would now like to design for.
This is a flexible tool for connecting several sets of data together.
Lastly, Easy DOE is a great way to get started for your very first experiment.
It allows you to build sequentially
and it guides you through the seven different steps
of the entire experiment.
It'll allow us to design and define,
and so that's figuring out what the factors are,
what the levels are,
their general nature, then we can select what kind of model
makes the most sense for what we're trying to accomplish,
then progress all the way to actually running the experiment, entering the data,
doing the analysis and then generating results.
This is a wonderful progression that walks you all the way
from "what am I trying to do?"
to having some final results to be able to look at.
What I will say is that this is designed for a model- based approach.
What you'll see is that all of these
look like they're going to choose a polynomial form of the model.
That needs to make sense as a starting point.
But if that does make sense, and it does in a lot of situations,
then this is a wonderful option.
Just to finish things up here: now that I have a goal
and know a particular choice that I want to use in JMP,
what are some of the other key questions before I actually generate that design?
A whole category is about the factors.
We need to use our subject matter expertise
to figure out which factors we should be looking at.
If we have too long of a laundry list of factors,
then the experiment necessarily needs
to be quite large in order to understand all of them.
That's going to have an impact on how expensive our experiment will be.
If we have too few factors,
then we run the possibility of missing something important.
What type are they going to be?
We need to think about getting the right subset.
As I showed you in Custom Design, we have quite a wide variety of different
types of roles for the different factors that we're looking at.
That's another set of choices.
How much can we manipulate the factors?
Are they naturally categorical, or are they continuous?
Then we need to think about the ranges or the values for each of those.
Let's go to DOE and Custom Design.
Then I'll just start off
and I'll have three different continuous factors.
What you can see is I can give a name
to each of the factors, but I also get to declare the range
that I want to experiment in for each of those factors.
As you can imagine,
this has a critical role in the space that I'm actually going to explore.
I need to hone in on what's possible
and what I'm interested in to get those ranges right.
If I make the range too big,
then I may actually have a lot going on across the range of the input
and I may not be able to fully capture what's going on.
If I make the range too small, then I may miss the target location
and I may get a distorted view of the importance of that factor.
Here, this input actually has a lot going on for that response,
but if I sample in a very narrow range, it looks like it's not doing anything.
Lastly, if I'm in the wrong location, I may miss some features
and not be able to optimize the process for what I'm doing.
Again, the choice of which factors and the ranges,
relies a lot on having some fundamental understanding
about what we're trying to do and where we need to explore.
The next piece to talk about is
the relationship between inputs and responses.
I will say that one of the common mistakes
that I often see is that we run an experiment,
and then after the fact, people realize, oh, we should have collected this.
In textbooks, a lot of times,
it looks like there's a single response that we're interested in
and we run the experiment to just collect for that response.
In practice, I think most experiments have multiple responses
and so this is a key decision, is to make sure
before we collect that first data point
that we actually include the right set of responses
so that we can answer all of the questions from that one experiment.
Then we need to think about what we know about the relationship.
Is it likely to be smooth?
Is it going to be continuous in the range that we've selected?
How complicated are we expecting it to be?
All of these have an impact on the design that we're going to have.
A couple of common mistakes about the relationship is,
one, being a little too confident,
so we assume that we know too much about what's going to happen,
and then don't build in some protection against surprises.
Then also if we have multiple responses,
not designing for the most complicated relationship.
If for one of them we're interested in Main Effects
and for the other one we think there might be curvature,
we need to build the design so that it can estimate the curvature
because that's the more complicated relationship.
A first key decision that I think
is a little bit hidden in JMP is that we have to decide between model- based,
and that's usually sensible if we're confident
that our responses will be smooth and continuous,
and that we're not investigating too big of a region,
or should we do space filling?
Space filling can be a good safety net if we're not sure what to expect,
if we're exploring a large region,
or if we want to protect against surprises.
On the last slide, I point to a paper with more details,
one
that I wrote with a colleague, Dr. Lu Lu
at the University of South Florida,
where we talk about the implications
of that first fork in the road, how do we choose between model- based
and space filling, and what are the repercussions?
Then lastly, we need to think a little bit about constraints.
If we've declared some ranges for the different inputs,
our input region naturally seems like
a square or a rectangle.
But in that region,
there may be some portions where we can't
get a response or we just don't care about what the responses look like.
Imagine if I am doing an experiment about baking and I'm varying the time
that the cookies are in the oven and the temperature of the oven.
I might know that the coolest temperature
for the shortest amount of time won't produce a baked cookie.
It'll still be raw, or it might be the hottest temperature
for the longest time will overcook the cookies.
I want to sort of chop off regions
of that space that aren't of interest or won't give me a reasonable result.
In JMP, there's easy ways to specify constraints
to make the shape of that region match what you want.
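The cookie example can be sketched as a simple filter over candidate settings. The grid values and the corner cutoffs below are all made up for illustration; JMP lets you express this kind of thing as linear constraints on the factors.

```python
from itertools import product

# Hypothetical baking experiment: time in the oven (minutes) and temperature (F).
times = [8, 10, 12, 14]
temps = [325, 350, 375, 400]
candidates = list(product(times, temps))

# Chop off the regions we know aren't of interest:
# coolest + shortest gives raw cookies; hottest + longest burns them.
feasible = [(t, f) for (t, f) in candidates
            if not (t <= 8 and f <= 325)      # raw corner
            and not (t >= 14 and f >= 400)]   # burnt corner
print(len(feasible))  # 16 grid points minus the two excluded corners = 14
```

The remaining region is no longer a rectangle, which is exactly the shape-matching idea described above.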
The last thing is all about budget,
how big should my experiment be, and that's a function of the time
that I have available and the cost of the experiment.
In JMP, we jump to here,
and maybe I specify a response surface model,
you'll see that there's a new feature called Design Explorer,
which when I activate that,
it allows me with a single click of a button to generate multiple designs.
I can optimize for good estimation, so D- or A-Optimality,
or good prediction of the responses with I-Optimality.
I can vary the size of the experiment and center points and replicates.
If I click Generate All Designs, it will generate a dozen or so designs,
which then I can compare and consider
and figure out which one makes the most sense.
I think understanding the budget, thinking of that as a constraint,
is an important consideration that we need to have.
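To give a feel for what D-Optimality measures, here's a toy Python comparison for a straight-line model y = b0 + b1*x. It's a simplified illustration of the criterion, not what Design Explorer computes for real models: det(X'X) rewards designs that spread their points out, which is why the criterion favors the ends of the range when estimating a slope.

```python
def det_xtx(xs):
    """det(X'X) for the model y = b0 + b1*x, where X has columns [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # X'X = [[n, sx], [sx, sxx]], a 2x2 matrix.
    return n * sxx - sx * sx

spread_out = [-1.0, -1.0, 1.0, 1.0]   # points at the ends of the range
bunched_up = [-0.5, -0.5, 0.5, 0.5]   # same size, narrower spread
print(det_xtx(spread_out), det_xtx(bunched_up))  # 16.0 vs 4.0
```

Same budget, same model, but the spread-out design estimates the slope far more precisely, and the D-criterion makes that trade-off visible when comparing candidate designs.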
To wrap things up, just a few helpful resources.
The first one is a JMP web page
that talks in a little more detail about the different kinds of designs.
It fills in a lot of the details that I wasn't able to talk about today
about those individual choices on the DOE tab.
The Model-Based versus Space-Filling one, that's the paper I referenced earlier,
where we need to understand
the implications of choosing a model- based design
or doing space- filling, which is a little more general
and a little more protective if we are expecting some surprises.
Then the last two things are two White Papers that I wrote:
the first one talks about how you can use Design Explorer
to consider different design sizes
and different optimality criteria and then choose between
the different choices by looking at the compare design option in JMP.
Then lastly, everything I've talked about here
is dependent on subject matter expertise.
The second one, on the why and how of asking good questions,
gives some strategies for how to interact with our subject matter experts
to be able to target those conversations and make them as productive as possible.
I hope this has been helpful,
and will help you have a successful first experiment using JMP software.
Thanks.