A Surprising Use of the Fit Definitive Screening Platform (2022-EU-45MP-1066)

Invited Paper Winner

Bradley Jones, JMP Distinguished Research Fellow, JMP

As is evident from its name, the original intended use of the Fit Definitive Screening platform was to analyze Definitive Screening Designs (DSDs). The surprise is that this platform can analyze a much broader class of designs than just DSDs. It turns out DSDs are a very special kind of Foldover Design, a standard textbook design used for factor screening. All that's needed for the Fit Definitive Screening platform to do its innovative analysis is a Foldover Design. This talk demonstrates how to make Foldover Designs using the Custom Design tool and then analyze them using the Fit Definitive Screening platform. Several examples illustrate this two-step procedure, and the analytical results are compared with more standard approaches that ignore the structure of the design.

Hello. My name is Bradley Jones.

I'm the manager of the JMP DOE Group, and what I want to talk to you about today is

a surprising use of the Fit Definitive Screening Platform.

If you haven't done a Definitive Screening Design

and analyzed it using this platform,

then you wouldn't know exactly where the platform was.

I'll show you that.

But what I'm going to show you is that you don't have to use

the Fit Definitive Screening platform just to fit definitive screening designs.

It can fit other stuff as well, and that's a surprise.

I'll start out by talking about what the main idea of this presentation is,

and then I'll review how Fit Definitive Screening works.

Of course,

I'm going to show you by hand, but the platform does all the work for you,

so you don't really have to ever do all this tedious stuff.

And then I'll have a couple of examples of using Fit Definitive Screening

to analyze other designs that are not definitive screening designs.

And I'll make some recommendations at the end.

To start out, here's a definitive screening design.

And if you look at the first pair of runs at the top, you can see that

each value here is plus or minus one, and each value here is minus or plus one.

What it's trying to show there is that whatever value the top number has,

the bottom number has the opposite value.

If this is plus one, then that will be minus one.

If this is minus one, then this will be plus one.

And the fact that all six pairs of runs in this example

are mirror images like this

means that the Definitive Screening Design is a Foldover design.
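
To make the mirror-image idea concrete, here is a minimal Python sketch (not JMP code, and not how the platform itself is implemented) that checks whether every run in a coded design has its sign-flipped twin. The function name `is_foldover` and the small four-factor design, built by stacking a 4x4 conference matrix, its negative, and a center run, are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def is_foldover(design, decimals=8):
    """Return True if flipping the sign of every run gives back the same set of runs."""
    X = np.round(np.asarray(design, dtype=float), decimals)
    runs = Counter(map(tuple, X.tolist()))
    mirrored = Counter(map(tuple, (-X).tolist()))
    # A center run (all zeros) is its own mirror image, so it passes too.
    return runs == mirrored


# Hypothetical example: a 4x4 conference matrix C (C'C = 3I), folded, plus a center run.
C = np.array([
    [0, 1, 1, 1],
    [-1, 0, -1, 1],
    [-1, 1, 0, -1],
    [-1, -1, 1, 0],
], dtype=float)
design = np.vstack([C, -C, np.zeros((1, 4))])

print(is_foldover(design))       # full nine-run design
print(is_foldover(design[:-2]))  # same design with one mirrored run (and the center run) dropped
```

The first call reports a Foldover; the second does not, because one run has lost its mirror image.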

Let's think about what Foldover designs are

and what it means to have a Foldover design in terms of properties.

For any Foldover design,

the main effects and two-factor interactions are uncorrelated,

and that means that they're statistically independent.

Orthogonal Foldover designs exist for every multiple of eight runs.

However, orthogonal main effects are not as important

as the orthogonality of main effects and two-factor interactions.

You may choose to allow for some non-orthogonality in main effects

in order to get this nice property that main effects

are not correlated with two-factor interactions,

which means that if you have active two-factor interactions,

they won't bias the estimates of any main effects.

I want to show you how to make,

or talk about how to make a Foldover design in JMP,

and you can do this in the Custom Designer.

You open the Custom Designer.

You add two-level Categorical or Continuous factors.

By default, Continuous factors always have two levels.

And then you choose a model that only has main effects, which is, again, the default.

And then you can choose a number of runs that's a multiple of two

where the multiple of two has to be at least as big as

the number of factors times two.

Then in the red triangle menu on the Custom Designer,

you go to Optimality Criterion, open the submenu,

and choose Make Alias Optimal Design.

Then you're done. All you have to do is click Make Design.

And then after you see the design, you can check that the Alias Matrix

contains only zeros for the main effects and two-factor interactions.

If the number of runs in your design is not a multiple of eight,

then you may see some correlations between two-factor interactions and the intercept.

But that intercept estimate isn't really important for screening.
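
Outside JMP, the same check can be sketched in Python. This is only an illustration under stated assumptions: it folds a random two-level half design instead of running JMP's Alias Optimal search, and the helper name `alias_matrix` is hypothetical. It computes the usual alias matrix A = (X1'X1)^-1 X1'X2, where X1 holds the intercept and main effects and X2 holds the two-factor interactions.

```python
from itertools import combinations

import numpy as np


def alias_matrix(design):
    """Alias matrix of the intercept + main-effects model against all two-factor interactions."""
    D = np.asarray(design, dtype=float)
    n, k = D.shape
    X1 = np.column_stack([np.ones(n), D])  # fitted model: intercept and main effects
    X2 = np.column_stack([D[:, i] * D[:, j] for i, j in combinations(range(k), 2)])
    # Least-squares solution of X1 @ A = X2; equals (X1'X1)^-1 X1'X2 when X1 has full rank.
    A, *_ = np.linalg.lstsq(X1, X2, rcond=None)
    return A


# Assumption: a random half design, folded. Six factors, 14 runs (even, and at least 2 x 6).
rng = np.random.default_rng(1)
half = rng.choice([-1.0, 1.0], size=(7, 6))
foldover = np.vstack([half, -half])

A = alias_matrix(foldover)
print("largest |entry| in main-effect rows:", np.abs(A[1:]).max())
print("smallest |entry| in intercept row:  ", np.abs(A[0]).min())
```

With 14 runs, which is not a multiple of eight, the main-effect rows come out as zeros up to rounding while the intercept row does not, which is the pattern described above.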

Let me give you a JMP demo of that process.

The first example here is how to make a Foldover design.

And I'm going to create a six-factor custom design.

And in this design, I have factors A through F.

And you can see that the A through F factors are the only things in the model;

that's the default.

Now I'm going to the red triangle menu

and I choose Alias Optimal, which is the last choice here.

And then if I say, "Well, let's do 16 runs instead of 12," and Make Design,

it's going off and it's going to compute

the Alias Optimal design for six factors and 16 runs.

Now I'm going to check by looking at the Alias matrix,

and you can see that everything in the Alias matrix is zero.

Another thing I can do is look at the color map and the correlations,

and I can see that the main effects are all white, and the rectangular area

showing main effects and two-factor interactions is also all white,

which means that this design is, in fact, a Foldover design.

And I could use it to do a screening experiment.

Let me go back to my slides.

All that the Fit Definitive Screening platform does

is check that the design is a Foldover.

It doesn't actually require that what you have in the current table

is actually a DSD.

You can use Fit DSD to analyze any Foldover design.

And that's the surprise.

And the main idea of this talk is to show you, first,

that you can create Foldover designs very simply using the Custom Designer,

and second, that if you have a Foldover design,

you can use Fit Definitive Screening to analyze the data.

It turns out that since main effects and two-factor interactions

are orthogonal to each other in a Foldover design,

you can split the response that you observed into two new responses.

One response you use for identifying main effects,

and you could call it YME.

And the other response you can use to identify two-factor interactions,

and you could call that Y2FI.

And it turns out, because

the main effects are orthogonal to the two-factor interactions,

the two columns that you create doing this will be orthogonal to each other.

The way you would do that is to fit the main effects model with no intercept

and save the predicted values of that model.

You can call that column YME.

Then the next thing you do is save the residuals from that fit, which you can call Y2FI;

those residuals are in the space of the two-factor interactions.

Now, you don't need to do these two steps yourself,

but if you wanted to know what's behind the screen,

then this allows you to carry out all these steps on your own.

But, of course, you can use the Fit Definitive Screening platform,

and it's doing this behind the scenes.
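
If you did want to try the split by hand outside JMP, a minimal Python sketch might look like the following. Everything specific here is an assumption made for illustration: the folded random design, the simulated response with made-up coefficients, and the variable names `y_me` and `y_2fi`. It is not the platform's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a 16-run, six-factor Foldover made by folding a random two-level half design.
half = rng.choice([-1.0, 1.0], size=(8, 6))
X = np.vstack([half, -half])

# Hypothetical response: two active main effects, their interaction, and noise.
y = (10.0 + 3.0 * X[:, 2] - 2.0 * X[:, 3]
     + 1.5 * X[:, 2] * X[:, 3]
     + rng.normal(scale=0.5, size=X.shape[0]))

# Step 1: fit the main-effects model with NO intercept and save the predictions (YME).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_me = X @ beta

# Step 2: save the residuals from that fit (Y2FI).
y_2fi = y - y_me

print("YME . Y2FI =", y_me @ y_2fi)                         # essentially zero
print("max |X' Y2FI| =", np.abs(X.T @ y_2fi).max())         # essentially zero
```

The two new columns add back up to the original response and are orthogonal to each other, so main effects can be identified from YME and interactions from Y2FI separately.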

Let me make a small digression.

I think it's valuable to use the Model Heredity Assumption,

which is that, generally speaking,

two-factor interactions are much more likely to occur

if both of the main effects that make up that interaction are active themselves.

For example,

if factor A and factor B main effects are both active,

then you might want to consider fitting the two-factor interaction AB.

Now this is not a physical law.

Nothing makes it absolutely necessary that this hold.

And yet empirical evidence has shown that

such models are much more likely than having

interactions be active when the main effects are not active.

Now, everybody who's done a lot of experiments has counterexamples to this.

All I'm saying is those counterexamples are comparatively rare.

Now why would you make this assumption?

Well, here's the reason.

If you use the Heredity Assumption,

the set of possible models is much smaller

than if you don't make this assumption.

In the example that I showed you earlier where I had factors A through F,

let's suppose that it turned out that only factors C, D, and F were active,

then you would only consider the three two-factor interactions, CD, CF, and DF.

And since there are three of these interactions,

there are two to the third possible models,

one of which has no interactions,

and you have three with one interaction,

three with two interactions, and one with all three interactions.

However, if we wanted to look at all the two-factor interactions

among all the factors A through F,

then there are six choose two, or 15, possible two-factor interactions,

which means that there are two to the 15th, or more than 32,000, possible models.

And sifting through all of those models is a much harder model selection problem.

If you can rely on the Heredity Assumption,

you can save yourself a lot of work

and also a lot of ambiguity in making your model selections.
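
The counting argument is easy to verify with a few lines of Python. This is just arithmetic on the A through F example, with C, D, and F taken as the active factors; the variable names are my own.

```python
from itertools import combinations
from math import comb

factors = "ABCDEF"
active = "CDF"  # the main effects assumed active in the example

# Interactions allowed under heredity versus all possible two-factor interactions.
heredity_terms = ["".join(pair) for pair in combinations(active, 2)]
all_terms = ["".join(pair) for pair in combinations(factors, 2)]

# Every subset of candidate interactions is a possible model.
n_heredity_models = 2 ** len(heredity_terms)
n_all_models = 2 ** len(all_terms)

# Heredity models broken down by how many interactions they contain.
by_size = [comb(len(heredity_terms), r) for r in range(len(heredity_terms) + 1)]

print(heredity_terms)                   # ['CD', 'CF', 'DF']
print(by_size)                          # [1, 3, 3, 1]
print(n_heredity_models, n_all_models)  # 8 32768
```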

Now going back to how you do this,

you can form the two-factor interactions involving the active main effects

and then do stepwise regression up to the point where

the mean squared error of the model that you have is relatively small.

If you have an estimate of sigma squared

and the two are roughly comparable, then that is the time to stop.

But, of course, this still is not necessary to do by hand

because Fit Definitive Screening is going to do it for you.

Let me show you a couple of examples of this process

and I'll start out with an example from Doug Montgomery's

Design and Analysis of Experiments textbook, the eighth edition.

In it, he first runs a resolution three fractional factorial

with seven factors and eight runs.

Let me show you that design and how that is analyzed.

Here's a resolution three design,

and if I do just Fit Screening,

what you see here is that B, D, and A are the active effects

and maybe G is marginally active but small compared to the effects of B, D, and A.

But let's evaluate this design.

I just click the Evaluate Design script in the table.

And if I look at the Alias matrix,

you can see that factor A is confounded with the BD interaction.

You could also learn the same thing by looking at the color map.

And you can see that the correlation between A and the BD interaction is one,

which means that I don't know whether what I'm seeing

is actually the main effect of A

or the two-factor interaction of B and D or any linear combination of those.

What I have is an ambiguity

and I need to make some more runs in order to resolve that ambiguity.

Going back to my example,

what then happens in the textbook is that the design is folded over.

Now instead of eight runs, there are 16 runs.

And let me show you that example.

Here's the folded over design.

If I do Evaluate Design and then look at the Alias matrix,

I can see that the Alias matrix is identically zero in every entry.

And I can learn the same thing by going to the color map on correlations.

And I see that the main effects are orthogonal

and all the main effects are orthogonal to all the two-factor interactions.

I now know that I have a Foldover design and that

my main effects are not going to be biased by two-factor interactions.
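
The before-and-after picture can be reproduced with a short Python sketch. One loud assumption: I am taking the generators of the seven-factor, eight-run design to be D = AB, E = AC, F = BC, and G = ABC, which is a common textbook choice; the talk itself does not state them. The helper name `corr_a_bd` is hypothetical.

```python
from itertools import product

import numpy as np

# Full 2^3 factorial in A, B, C, then the assumed generators for D, E, F, G.
base = np.array(list(product([-1, 1], repeat=3)), dtype=float)
A, B, C = base.T
frac = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])  # columns A..G, 8 runs

# Fold over: append the mirror image of every run, giving 16 runs.
folded = np.vstack([frac, -frac])


def corr_a_bd(design):
    """Correlation between the main effect of A and the B x D interaction column."""
    a, b, d = design[:, 0], design[:, 1], design[:, 3]
    return np.corrcoef(a, b * d)[0, 1]


print("8-run design:   corr(A, BD) =", corr_a_bd(frac))
print("16-run Foldover: corr(A, BD) =", corr_a_bd(folded))
```

Under that assumption, A and BD are perfectly correlated in the eight-run design and uncorrelated after folding, which is why the extra eight runs resolve the ambiguity.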

I have data for the time it takes for the eye to focus.

And if I click on this script, I see the result of having done the Foldover.

And what I see is, first, that B and D are the two main effects, as before,

except that A is no longer there because, guess what?

The BD interaction is massively significant.

And the true model is B, D, and the BD interaction.

And now we can run this model and we can see that

first our actual-by-predicted plot and our residuals all look good.

And then playing with the profiler, we can see that

as I move B from one end to the other,

the slope of the line, the prediction line of the effect of D on time changes.

And that's the nature of interactions.

When you have an interaction,

the slope of one factor depends on the value of the other factor.
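
Here is a tiny Python sketch of that statement. The coefficients are made up purely for illustration and are not the estimates from the eye focus data; the function names are hypothetical too. For a model with B, D, and the BD interaction, the slope in D is the D coefficient plus the interaction coefficient times the setting of B.

```python
def predicted_time(x_b, x_d, b0=2.0, b_b=1.0, b_d=0.5, b_bd=0.25):
    """Prediction from a model with B, D, and the BD interaction (hypothetical coefficients)."""
    return b0 + b_b * x_b + b_d * x_d + b_bd * x_b * x_d


def slope_in_d(x_b, b_d=0.5, b_bd=0.25):
    """Slope of the prediction line in D at a given setting of B."""
    return b_d + b_bd * x_b


for x_b in (-1.0, 0.0, 1.0):
    print(f"B = {x_b:+.0f}: slope of D = {slope_in_d(x_b)}")
```

If the interaction coefficient were zero, the slope would be the same at every setting of B, and the lines in the profiler would stay parallel.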

And so now,

this is the setting that you would use if

you wanted to maximize the time that it takes the eye to focus.

Generally, you would want to minimize the time.

This would be the setting that you would use to minimize the eye focus time.

That's the end of that example.

Let me now go back to my slides for just one second

and introduce the Peanut Solids example.

The Peanut Solids example was an example

that my friend Chris Nachtsheim actually did in a consulting environment,

and we have the Peanut Solids experiment

in the sample data library as a definitive screening design.

But what I did instead was create a two-level Foldover design

and used the same model to create data for that.

And so let me show you that data,

which is my second example here, the peanut example.

And notice that I have pH, water temperature, extraction time,

ratio, agitation speed, and two categorical factors,

whether you hydrolyze the peanuts first and whether you presoak the peanuts,

and then what are being measured are the peanut solids.

Notice also that the number of runs here is 22,

which is not a multiple of eight.

And therefore, when I look at Evaluate Design,

I don't expect that this design will be orthogonal for the main effects.

And when I look at the Alias matrix,

you can see that the intercept is aliased,

by a very small amount, with any active two-factor interaction.

And again, if I look at the correlation color map cell plot,

you can see that the main effects are not orthogonal to each other,

but their correlations,

their absolute correlations are very small, like one over eleven.

And again, because this area is white,

main effects are orthogonal to two-factor interactions.

And this is what we wanted.

This is what we have to have in order to use the Fit Definitive Screening platform.

Now I'm going to run that platform.

And what I see is that I have four active main effects, and that means that I have

as many as four choose two, or six, two-factor interactions to check.

And it looks like ratio times agitation speed

is a term that I don't really need; its estimate is very small.

But the true model that generated the data

involves these four two-factor interactions here.

Ratio times agitation speed turns out to be a Type I error,

but we would probably get rid of it anyway because its estimate is very small.

Here we've used the Fit Definitive Screening platform

and also this assumption of heredity

of the main effects and two-factor interactions

to find not only all the main effects but four active two-factor interactions

among the two-factor interactions that are related to the active main effects.

We found the actual data-generating model, the correct model.

Okay, so what do I recommend that you do?

First, I've shown you how to use the Alias Optimal design criterion

in the Custom Designer to create Foldover designs.

You can do that relatively simply,

and you don't necessarily have to create orthogonal Foldover designs.

The Fit Definitive Screening platform doesn't care

whether the design is orthogonal or not;

it will still analyze the data as long as the design is a Foldover design.

And then once you have this Foldover design and the data,

you can use the Fit Definitive Screening platform to analyze the data.

And so, in the words of Nike, "Just do it."

Here are some references.

The first two are referencing the original paper

on Definitive Screening experiments and then the paper by Xiao and co-authors

shows how we create Definitive Screening Designs nowadays

without involving optimizations, just by direct construction

using conference matrices.

Then the second paper, by Miller and Sitter,

has the basic idea that I've introduced here to analyze Foldover designs.

This was in Technometrics way back in 2005,

but we're using slightly more current model selection techniques

than Miller and Sitter.

And finally, the last reference there

is a paper, again in Technometrics, that Chris Nachtsheim and I wrote to

basically explain how to use this Foldover technique

and the analysis to do

model selection for Definitive Screening Designs in the two-step method

that I talked about at the very beginning of this talk.

Thank you very much for your attention

and I'll be at the talk when it's finally delivered to answer questions.

dctrindade · ‎03-30-2022

Very interesting presentation and application of the FIT DSD platform for the analysis.