With the numerous options available in JMP's Custom Design and DOE menus, choosing the right design can be overwhelming. In this session, we guide you through the process of evaluating and comparing different designs using the Compare Designs platform.

We also explore the Design Explorer feature in Custom Design, highlighting its capabilities in JMP 19 to consider a variety of screening designs – all in one spot. Whether you're new to DOE or looking to refine your design selection process, this session offers practical insights and strategies to streamline your workflow and enhance your experimental outcomes.

 

 

Thank you, everybody, for joining me here. My talk is Many Designs, Little Time. I'm Ryan Lekivetz, Director of Advanced Analytics, R&D at JMP. I actually lead the DOE and Reliability team here at JMP. I thought this would be an interesting topic because in the newest version of JMP, we've actually done some additional work on the Design Explorer, which, if you haven't seen it yet, you'll see by the end of this talk.

But before doing that, you see a little bit of the outline here, we're going to take a bit of a detour to talk a little about the idea behind design of experiments. Part of this was that I was looking back at some of the previous Discovery talks, and I realized that we haven't had much discussion on this since Compare Designs was introduced back in JMP 13. I think that was almost 10 years ago at this point. I feel like it wouldn't be a Discovery conference if you didn't see this quote, right? "All models are wrong, but some are useful."

Admittedly, when we're designing an experiment, one of the criticisms I hear is that these models are too simple, that they're never really going to explain my process. So why has design of experiments worked for so long? At the end of the day, we're going to be able to build models that are useful for understanding and improving our systems. We'll fully admit they're not going to be perfect, but at least we can gain some better idea as to which factors or which effects are driving that system.

The way I often like to look at designing an experiment is through this idea of a variance-bias trade-off. We're going to see that in what comes out of these different designed experiments, and it's often what I look at when I'm trying to choose a design. It's the question of what model effects I have: the model terms that I'm telling JMP I want, versus the terms that maybe I didn't include but might be part of that useful model. If I had some more runs, if I collected some more data, it might turn out that some of those effects or terms were important.

Let's just look at a simple example here. I have a simple designed experiment with three factors, X1, X2, and X3, and I've gone through and collected some response. When I think of that variance-bias trade-off, we can look at something like the estimation efficiency. This tells me the standard error of each effect estimate, in this case X1, X2, X3, relative to the error standard deviation.

If I have a lot of noise in this Y, maybe I'm not going to be able to tell much about that. But if we actually take a look, here I've just generated some data. Let's just run that. As you see, the RMSE was right around 1. If we look at the parameter estimates, that standard error is directly related to that relative standard error, 0.2.

What that really tells me is that if I have big main effect estimates, ideally, this design should be able to detect them. That's what we see here with the estimation efficiency: the fractional increase in confidence interval length shows I couldn't do any better than that. If this is the design that I'm using, I couldn't do anything better.
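To make that relative standard error concrete, here's a minimal Python sketch outside of JMP, with a hypothetical 2^3 full factorial standing in for the demo design, so the numbers differ from the 0.2 shown in the talk. The relative standard error of each estimate is just the square root of the corresponding diagonal entry of (X'X)^-1, the standard error you would see if the RMSE were exactly 1.

```python
import numpy as np
from itertools import product

# A hypothetical 2^3 full factorial in coded units (-1, +1),
# standing in for the three-factor design in the demo
X = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(len(X)), X])          # add the intercept

# Relative standard error of each estimate: sqrt of the diagonal of
# (X'X)^-1, i.e. the standard error you'd see if the RMSE were exactly 1
rel_se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
print(rel_se)   # ~0.354 for every term: the design is orthogonal
```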

But now the real question is, "What if I was wrong?" This design here, I designed it. If we look at the DOE dialog, I just assumed that I had main effects. I had my X1, X2, and X3, and I said that all I had was main effects. Well, what if it turned out that I actually had an interaction between X1 and X2? Maybe it's too late, I've gone through, I've already collected the data, but what would happen? For the same design, if I look at Evaluate Design, what if I just added that particular interaction, X1 by X2?

We see here (I think I've switched to a different design here) that we get an increase for X2, X3, and this interaction. All this is telling me is that the variance on X2 and X3 has jumped up a little bit from adding the X1 by X2 interaction, because I didn't design for that. I'll say this often also comes through in the alias matrix, where, if you've ever looked at that, that's where you get those negative one-thirds.

Where does that actually matter? If we think of this Evaluate Design, this is actually the same design here. What I've done is explicitly add that X1 by X2 interaction; all that's saying is that I'm going to take X1 times X2. Let's think about that here. This is where the estimation comes in.

Now, if we look at the color map on correlations, which you see us use often, this is where you get those larger pieces here. But why does it matter? Even with that X1 by X2 interaction present, I could still model my main effects. What is it going to matter if I model something like this?

Let's look at our parameter estimates. Well, now you notice things weren't necessarily significant, but what does that really matter? Everything was okay, so where does something like this come into play? Let's say I did this over and over again. You see here I have this Y simulation; let's take a look at the formula behind it. Let's say this was my true model: I have these main effects, X1, X2, X3 are all doing something, as well as that interaction for X1 and X2.

But what if I missed that term when I was modeling? One thing we can do in JMP is run a simulation to do that over and over again. Let's take a look at what happens here. I'm just going to keep drawing from that distribution, adding a little bit of random noise each time. We see that the mean for X1 was right on 2, which is what I would have wanted. X2 was 3, which is what we would have expected. But you see X3 here was at 1.

Why is that? That has to do with the alias matrix. Because I missed that X1 by X2 interaction, X3 is going to be biased by a third of that interaction's parameter estimate. By not having that term in the model, we're going to mess up our estimates. This is something we always have to think about when we're designing an experiment: we're doing our best to give JMP the model that we think is correct. But this is why you see us bring up that color map on correlations so often when we're looking at diagnostics: we're comparing the terms we have in the model against the terms that were in the set of alias terms, and asking what might happen if those turned out to be important.
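Here's a small Python sketch of that bias mechanism. It assumes a classic 12-run Plackett-Burman design rather than the exact design in the demo, and the coefficients are made up, but it shows both the alias matrix calculation and the simulated bias on the X3 estimate:

```python
import numpy as np

# Hypothetical 12-run Plackett-Burman design, 3 factors taken from
# the classic 11-column construction (cyclic generator + all -1 row)
g = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
pb = np.vstack([np.roll(g, i) for i in range(11)] + [-np.ones(11, int)])
D = pb[:, :3].astype(float)                  # columns for X1, X2, X3

X1 = np.column_stack([np.ones(12), D])       # assumed model: intercept + mains
X2 = (D[:, 0] * D[:, 1]).reshape(-1, 1)      # omitted term: X1*X2 interaction

# Alias matrix A = (X1'X1)^-1 X1'X2: each entry is the multiple of the
# omitted term's coefficient that biases the corresponding estimate
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
print(A.ravel())                             # the +-1/3 shows up on X3

# Confirm by simulation: true model has mains (2, 3, 1.5) plus a 3.0
# interaction that the fitted main-effects model leaves out
rng = np.random.default_rng(1)
beta_hats = []
for _ in range(2000):
    y = 1 + D @ [2, 3, 1.5] + 3.0 * X2.ravel() + rng.normal(0, 1, 12)
    beta_hats.append(np.linalg.lstsq(X1, y, rcond=None)[0])
print(np.mean(beta_hats, axis=0))            # X3's mean is 1.5 + A[3]*3.0
```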

This is where custom design comes into play. If you haven't done a lot of designed experiments in JMP, Custom Design is the first option we have: it creates an optimal design for your specific experimental settings. In here, we're going to assume that we have some useful model. Again, all models are wrong, but some are useful. We're going to assume we have enough understanding of the system to say, "Okay, if I look at these main effects or maybe some interactions, can I make a design according to this useful model?"

Now, one of the things with Custom Design, even in that hover text, is this idea of an optimal design. The idea of efficiencies in an optimal design is really relative to what we're actually trying to get out of the experiment. In particular, if we launch Custom Design, we have different optimality criteria; as you see, there are all these different letters, D, I, and A, plus Recommended. What does this boil down to?

Well, the first basic set is just if I want to estimate model terms. If we assume that we have this useful model, maybe I'm interested in making sure that I can estimate those really well.

Whether it be my main effects or my interactions; I don't want to get into the mathematical details of this. With something like D-optimality, you're minimizing the volume of the overall confidence ellipsoid for those parameter estimates; it considers all of them at one time. Whereas A-optimality is more like an average: you're minimizing the average variance of those parameter estimates.

Those are great if I'm just saying, "I really want to make sure I'm estimating those terms well." But is this the useful model? Can I figure out what the most important things in that useful model are?

Now, where is that coming from? In Custom Design, this is all based on the model that I've told JMP I'm assuming. By default, maybe it just has those main effects. The alias terms are the ones that may be important but that I haven't included in the model. Whereas if I told JMP I want these two-factor interactions, then that's the model the different efficiencies are based on. That's the idea behind custom design.

Now, let's say instead that I actually think I already have a useful model. Maybe I'm more interested in doing a better job at predicting the response. This is where the other option, I-optimality, comes in. There I'm trying to minimize the average prediction variance: if I think of the entire design region, I just want to make sure I'm doing a generally good job. That's when I would use I-optimality.

G-optimality, which shows up in some of those different efficiencies, just says, "Minimize that worst case. Make sure I never do too badly." You also see Alias optimality in there; this is more like a compromise. It says, "Protect me from the bias from those alias terms, the terms I maybe didn't have in the model but that might be important, while still giving me a reasonable design overall."
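As a rough summary of those criteria, here's a Python sketch that scores a model matrix on D, A, I, and G at once. The formulas are the textbook versions, not JMP's exact implementation, and the set of points used for the prediction-based criteria is illustrative (here just the design points themselves):

```python
import numpy as np
from itertools import product

def criteria(X, region):
    """Common optimality criteria for a model matrix X; the prediction-
    based ones are averaged/maximized over `region`, a model-expanded
    set of points covering the design space (a sketch)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    pred_var = np.einsum('ij,jk,ik->i', region, XtX_inv, region)
    return {
        'D': np.linalg.det(X.T @ X),  # maximize: shrinks the confidence ellipsoid
        'A': np.trace(XtX_inv),       # minimize: average estimate variance
        'I': pred_var.mean(),         # minimize: average prediction variance
        'G': pred_var.max(),          # minimize: worst-case prediction variance
    }

# Example: a 2^3 full factorial, main-effects model, design points as region
X = np.column_stack([np.ones(8), list(product([-1, 1], repeat=3))])
print(criteria(X, X))
```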

You also see something called If Possible terms. If I go back to that DOE dialog: what are If Possible terms? This is where I can tell JMP, "Maybe these interactions are important, maybe not. I can set them to If Possible." All that's telling JMP is: "Give me those Necessary terms; you have to make sure you can estimate those. For the If Possible terms, if you get the chance to estimate them, do it, but they're not as critical as the Necessary terms."
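Numerically, If Possible terms can be thought of along the lines of Bayesian D-optimality, where the optional terms get a prior instead of a hard estimability requirement. This Python sketch captures only the spirit of the idea; JMP's exact weighting is an implementation detail, and the k value here is purely illustrative:

```python
import numpy as np

def bayesian_d(X_necessary, X_if_possible, k=1.0):
    """Score a design where X_necessary holds the required model columns
    and X_if_possible the optional ones. The prior precision k keeps the
    optional terms estimable 'if possible' without forcing extra runs
    (a sketch in the spirit of Bayesian D-optimality)."""
    X = np.column_stack([X_necessary, X_if_possible])
    K2 = np.diag([0.0] * X_necessary.shape[1] +
                 [k ** 2] * X_if_possible.shape[1])
    # Maximize |X'X + K^2|: necessary terms carry no prior, optional do
    return np.linalg.det(X.T @ X + K2)
```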

How do those all come into play? I can tell JMP, "This is what I want. I have my model, my alias terms, my different optimality criteria." Now, one of the biggest problems actually comes when you look at the design evaluation. I've created my design; I've told JMP this is what I want to do. It turns out some of those diagnostics are relative to the sample size, and some of them are not.

If you think about something like the power analysis, the power isn't expressed relative to your sample size. Of course, the bigger your sample size, the higher the power you're going to have for your effect estimates, but the number itself is an absolute calculation, not one scaled to the run size.

The same goes for the standard error of an estimate: it's just a calculation. Yet the fractional increase in confidence interval length is relative to the sample size.
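To see the power side of that concretely, here's a hedged Python sketch of a two-sided t-test power calculation for a single coded effect. The relative standard error comes from the design, and the anticipated coefficient (in sigma units) is an assumption you supply; this mirrors the usual noncentral-t calculation rather than JMP's exact implementation:

```python
import numpy as np
from scipy import stats

def power_for_effect(rel_se, n, p, delta_over_sigma, alpha=0.05):
    """Power sketch for one coded effect: rel_se is the design's relative
    standard error, n runs, p model terms, delta_over_sigma the anticipated
    coefficient in sigma units (all inputs illustrative assumptions)."""
    df = n - p                            # residual degrees of freedom
    ncp = delta_over_sigma / rel_se       # noncentrality of the t statistic
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)
            + stats.nct.cdf(-t_crit, df, ncp))

# e.g. a 12-run orthogonal main-effects design: rel_se = 1/sqrt(12);
# a 16-run one (rel_se = 1/sqrt(16)) gives higher power for the same effect
print(power_for_effect(1 / np.sqrt(12), n=12, p=4, delta_over_sigma=1.0))
print(power_for_effect(1 / np.sqrt(16), n=16, p=4, delta_over_sigma=1.0))
```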

Likewise, the color map on correlations is just a calculation, but the different efficiency metrics that we get at the end are relative to that sample size. Here's the 12-run design that I've created, main effects only. As I said, some of these, like the power, are just straightforward calculations. Then in the Design Diagnostics, you see it saying, "Well, this is 100% D-efficient." The reason is that it's a 12-run design, and it hits the best efficiency that you can get for a 12-run design.

But at the same time, I could do the same thing for a 16-run design. Now if we look at the power analysis, not surprisingly, it's higher; we have a bigger sample size. But in the Design Diagnostics, the D, G, and A efficiencies are again all saying 100%, because these efficiencies are relative to that sample size.

This is telling me that for a 16-run design, I can't do any better than this. Likewise, for the 12-run case, it says that for 12 runs, I can't do any better than that. Does that mean I should be using the 12-run design over the 16? Well, that depends. If I could afford 16 runs, there are perhaps other things I would rather be looking at to decide whether that's what I want to do.
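One common way to express that sample-size relativity is per-run D-efficiency, |X'X/n|^(1/p). Here's a Python sketch in which a 12-run and a 16-run orthogonal design both score 100%, even though the larger one still has smaller absolute standard errors (both designs are hypothetical stand-ins for the ones in the demo):

```python
import numpy as np
from itertools import product

def d_efficiency(X):
    """Per-run D-efficiency, |X'X / n|^(1/p): 1.0 means 100%, the best
    any n-run design can do for this model (coded units assumed)."""
    n, p = X.shape
    return np.linalg.det(X.T @ X / n) ** (1 / p)

# 16-run design: a replicated 2^3 full factorial, main-effects model
ff = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X16 = np.column_stack([np.ones(16), np.vstack([ff, ff])])

# 12-run design: first three columns of a Plackett-Burman construction
g = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
pb = np.vstack([np.roll(g, i) for i in range(11)] + [-np.ones(11, int)])
X12 = np.column_stack([np.ones(12), pb[:, :3]])

print(d_efficiency(X12), d_efficiency(X16))   # both 1.0: each is "100%"
# yet absolute standard errors still shrink with n, as sigma/sqrt(n)
```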

This is where it can actually become problematic at times: if I want to start thinking about which of these designs to compare, that's where we get into Compare Designs. But before I take a detour there, I do want to talk about definitive screening designs. If you've been to some JMP Discovery conferences, you'll often hear this idea of definitive screening designs come up. In fact, we have them right at the top of our menu.

Again, I don't want to get too much into the mathematical details here. Definitive screening designs are special because of this idea of a foldover of a conference matrix. I really like to look at definitive screening designs as compromise plans. They were actually discovered as an alias-optimal design, if you remember back to those different optimality criteria. But they turn out to be really efficient designs where we keep zeros on the diagonal and add a center point. One of the reasons definitive screening designs have become so popular is that, when we think about the variance-bias trade-off, these designs are orthogonal.

The main effects, back to that estimation, are estimated really well. But if it turns out we have interactions or quadratic effects, the main effects are going to be protected from them. We have the bias protection: if the useful model has quadratic effects or interactions, then by constructing the design in this way, we get that built-in main effects bias protection.
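For the construction itself, here's a Python sketch of the foldover idea: take a conference matrix C (zero diagonal, plus and minus ones elsewhere, orthogonal columns), stack C on top of -C, and add a center run. The order-6 matrix below is one known example; JMP's internal construction may differ:

```python
import numpy as np

# An order-6 conference matrix: zero diagonal, +-1 elsewhere, C'C = 5I
C = np.array([[ 0,  1,  1,  1,  1,  1],
              [ 1,  0,  1, -1, -1,  1],
              [ 1,  1,  0,  1, -1, -1],
              [ 1, -1,  1,  0,  1, -1],
              [ 1, -1, -1,  1,  0,  1],
              [ 1,  1, -1, -1,  1,  0]])

# Definitive screening design = foldover of C plus an overall center run:
# 13 runs for 6 factors (drop a column for a 5-factor, 13-run DSD)
dsd = np.vstack([C, -C, np.zeros((1, 6), dtype=int)])

# Main effects are orthogonal to each other...
print(np.array_equal(dsd.T @ dsd, 10 * np.eye(6)))   # True
# ...and to every quadratic effect: the foldover makes each squared
# column orthogonal to every main-effect column
print(np.abs(dsd.T @ dsd**2).max())                  # 0
```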

Now, if you think of those other two designs that I had, why would we want to use this one? I'd say one of the really nice things is that by having those zeros in there, we can detect a nonlinear effect. If something is happening in the middle of our design space, the traditional approach of just using plus and minus one is going to miss it. I like to think about baking a cake: maybe I'm supposed to be using 350 degrees. Well, if I use 325 or 375, maybe it's not quite right.

If I was telling JMP that I want my range to be between 325 and 375, I've never actually seen anything at 350, whereas that may be exactly the setting I want to make sure I try.

The other thing I just want to mention on the side here, one thing you may not have noticed: if you've taken a design of experiments class, they often talk about the classic fractional factorial designs, which show up in our Screening Design platform. But hidden within there are also things called near-orthogonal designs and orthogonal mixed-level designs, which are actually like a specialized case of definitive screening. They've been hidden under that classical platform. One of the things that you're going to see in the newest version of JMP is that we've made those a lot easier to find.

The other thing about the definitive screening design is that it turns out I can't actually get it from Custom Design. I said it was this compromise plan from alias optimality, but it requires forcing those zeros in just the right way. If you've ever tried to make a definitive screening design in Custom Design, it's possible to do so, but it takes a lot of work. Whereas all I need to do is go to this platform, and it's automatic: it's constructed for me without trouble.

But let's say I'm talking to some colleagues, and we're debating between three different designs. Somebody has gone through, and I have a 12-run design for five factors that came from Custom Design, this definitive screening design with five factors, and this 16-run design as well. Now, the 16-run design is going to be more expensive. So what do I want to do? Well, this is where Compare Designs comes into play.

Now I'm able to compare multiple designs all within this one platform. I don't have to start launching individual Evaluate Designs and trying to figure out which ones make sense to compare and which don't. As I mentioned, you can match column names yourself or let JMP try to do it, and it allows us to consider multiple models.

You'll find Compare Designs under DOE, Design Diagnostics. You can see that right now my active data table is the 16-run one. Let's compare that to the 12-run design and the 13-run definitive screening design. At the bottom here, I can tell JMP how I want to match the columns, but let's just try the automatic matching.

JMP is going to try to find how it can do that. We see here that it actually found the right matching. Even if the designs have different factor names, JMP will try to do that matching for you. What's the nice thing with Compare Designs? Now I can start looking at things like the power analysis. I can go through with my colleagues and start to decide among these three designs. Of course, the 16-run is going to cost me more, but of course, I get higher power for it.

I can look at other things, too. This fraction of design space plot has to do with the prediction variance: across the design region, how well does the design do? The nice thing is that because we're in Compare Designs, we can use relative efficiencies. I don't have to worry about comparing to a specific run size; I can make these comparisons directly.
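For reference, here's a minimal Python sketch of what a fraction of design space curve computes, assuming a main-effects model in coded units: sample points in the region, evaluate the relative prediction variance at each, and sort, so plotting against a 0-to-1 fraction gives the curve.

```python
import numpy as np

def fds_curve(X, n_points=5000, seed=0):
    """Fraction of design space sketch for a main-effects model in coded
    units: sample the cube, compute the relative prediction variance at
    each point, and sort; plot vs. an equally spaced 0-1 fraction."""
    rng = np.random.default_rng(seed)
    k = X.shape[1] - 1                      # factors (X includes intercept)
    pts = rng.uniform(-1, 1, (n_points, k))
    F = np.column_stack([np.ones(n_points), pts])
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.sort(np.einsum('ij,jk,ik->i', F, XtX_inv, F))

# Comparing designs: a uniformly lower curve predicts better everywhere;
# curves that cross are where the "that depends" trade-offs show up
```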

Now you'd say, "Well, according to this, maybe I would only ever want to use the 16 or the 12. Why would I ever use the DSD?" This goes back to those different correlations, the model versus alias terms. If it turned out that I have something like a two-factor interaction, this is the nice thing with Compare Designs: I can play the what-if game. We actually see that, in this case, one of those designs couldn't even fit that particular interaction because of the way it was set up.

Now I might say, "Well, that's problematic. Let me just try to add something else." You can see that this particular design is not doing much for me. As I start trying to add interactions, it looks like that 16-run design is having trouble. Let me try to add them all. JMP is actually telling me, "Well, that design is struggling to fit interactions. Because of the way it was constructed, without factoring interactions into the design construction, it can't do a good job now that I'm trying to add them."

This goes back to that idea: if I'm worried about those alias terms, and here the alias terms are those two-factor interactions, then maybe I'd rather go with something like that definitive screening design. That's where Compare Designs is going to help me figure that out.

Okay, I'd say Compare Designs has been a fantastic tool. Since we built it, I'll say even within JMP, we use it on a regular basis. The one thing with Compare Designs, though, is that, as you noticed, I needed to create each of those individual data tables, and I really had to have a sense of which designs I wanted to compare.

But sometimes, earlier on, I'd rather consider more designs before doing that full-on design comparison. That's where Design Explorer comes into play. If you've never seen Design Explorer, I'm just going to launch Custom Design again here, and let's do the same thing. We're going to add five different factors, and I click Continue.

Let's say we're going to try this main effects model. If you notice, right under that Make Design button, when it's available, you'll see Design Explorer: "Using the current factors and model, explore different design choices to help choose a design." Let's click on Design Explorer and see what we get. Here it's telling me I have that main effects model.

What we see is: select options for a single design or a combination of specified options. Maybe now I'm talking to somebody, and we say, "Well, let's try a D- and an A-optimal design, and maybe go from 12 to 20 runs." If I let Design Explorer do that, you see it has now created the 12-, 16-, and 20-run D-optimal designs and the 12-, 16-, and 20-run A-optimal designs.

Now, in previous versions of Design Explorer, all I could generate were designs that Custom Design itself could make. In the newest version of JMP, it turns out I can actually bring in a definitive screening design, so maybe I want to compare what would happen if I were to use one.

Now I can start to compare some of these different efficiencies and see what happens. You notice here it's going to tell me that the main effects are orthogonal to the quadratic effects and two-factor interactions. This is a hint that this design can do something a little extra, which I get by being able to select it. Now I can compare how these different designs do relative to the other choices that I have.

Likewise, here are designs you may not have seen before, like near-orthogonal arrays. These are main effect screening designs; all they're trying to do is balance each pair of columns together. And there's this orthogonal mixed-level design, another type of design that was hidden under that Screening platform in the past.

Now I can quickly generate all of these different designs and start to look at which one I might want to use. At the same time, I might be starting to get overwhelmed. I might say, "Okay, I don't even know where I want to begin. Why don't I send this to a colleague?" As you see, we have this enhanced data table: I have efficiencies versus runs and a scatter plot of efficiencies. The other nice thing is that I can now send this to a colleague, with scripts that are ready for them to run and create these designs.

Likewise, if I'm doing this on my own, I may say, "Okay, this 17-run DSD looks like it might be good. Let me evaluate it and see what that looks like." I can still run the design evaluation directly from in here. Once I'm ready, I can go ahead and create the selected design. There's my 17-run definitive screening design, ready to go.

That's the Design Explorer. If you're at Discovery, I'll be around and happy to talk about it. If not, for this online version, you can leave some comments down below.

Some final thoughts. I'll say, even with that Design Explorer, it may still seem like you have too many designs. I like this quote: "Discoveries made by accident are often celebrated, but those made by design are more reliable."

Okay, with that, all models are wrong. The other quote I like is, "The best time to plan an experiment is after you've done it." For every experiment that I've run, there's always a "Well, I wish I would have done this," or, after the fact, a "Well, this was obvious had I just thought about it ahead of time." But really, if you're thinking about doing a designed experiment, you're already ahead of the game.

Even with those failed experiments, or what I'll call failed experiments, you're always learning something about what you would do the next time. You really should view design of experiments as a sequential process; each experiment helps you figure out what to do in the next one.

The other thing I do want to mention, though, is that how you model does matter. At JMP, we spend our days thinking about how to put the user in the best position when they run an experiment: trying to protect them with things like definitive screening designs, protect them from that bias, and so on.

But at the end of the day, how you go about modeling does matter. If you think about things like definitive screening designs and If Possible terms for all these interactions: when I design an experiment, I often try different modeling techniques to make sure I haven't missed some of those terms that maybe I hadn't factored into the original model, and to make sure that the model I think is useful is the most useful one I could have found.

I'll say this is also very true when you look at things like the power analysis. If I have four factors with interactions, and I look at the power analysis for the main effects and two-factor interactions, it's very rare that I expect every single one of those terms to be in my final model.

Oftentimes, when I think about things like power analysis, I'm running simulations to play the what-if game. What if two of these were significant? Or what if three of these were significant? It's very rare that my final model is going to include all of them. That's something else to keep in the back of your mind when you're designing these experiments: how deep am I going to get into model selection, and how will I play that what-if game?

With that, thank you very much for your time. I hope you have found this talk useful. Again, leave any comments or questions below. Thank you.

Presented At Discovery Summit 2025
