Analyzing a 4-factor definitive screening design with diecast cars data

When I created the four-factor definitive screening design discussed in my previous blog post, I was excited to try out the new technique that Bradley Jones presented at the JMP Discovery Summit. Looking at the dyed cars, I noticed some promising results and a wide array of colors.

The new technique involves fitting a main effects model and using the main effects of “fake” factors to provide an estimate of the pure error. Fake factors are columns that are used to create the design but are not actual changeable factors (in this case there were two, since the design was based on a six-factor definitive screening design). A more detailed analysis using this method will be saved for another day, since I had one significant effect: heat setting. However, based on my past experiments, it seemed like there was too much noise, and the vinegar effect that was so promising in the last experiment was no longer there. I took a quick look with Graph Builder and wasn’t surprised that there was no main effect for vinegar:

This becomes even more pronounced when looking at the residuals (from fitting the heat setting and block) vs. vinegar:

Typically, when I’m fitting a model from a designed experiment, I follow the principle of effect heredity, which means I won’t add a second-order term unless the main effect component(s) is significant. However, that doesn’t seem to hold for this data. If you look at the rating vs. heat setting, and overlay vinegar, there also appears to be an interaction with heat setting and vinegar amount:

I also investigated other effects, particularly time. With the heat setting being so significant (and suggesting high heat), it made sense that time should be factored in. After fitting a number of models, I found that time was significant in the final model. I also kept the quadratic effects for time and vinegar, as they were marginally significant:

You’ll notice that time shows up as significant in the final model. The estimate itself doesn’t change from the main-effects-only model, since in a definitive screening design main effects are orthogonal to all other main effects and to second-order effects. However, the standard error has been reduced, because variation in rating that was previously counted as error is now accounted for by the second-order terms in the model.
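To make the orthogonality claim concrete, here is a minimal sketch in Python (standard library only). It assumes an unblocked 13-run, six-factor definitive screening design built from a Paley conference matrix, which is not necessarily the exact design or run order used for the cars, and all function names are made up for illustration. It checks that every main-effect column has a zero dot product with the intercept, with every other main effect, and with every quadratic and two-factor interaction column.

```python
from itertools import combinations


def paley_conference_6():
    """Symmetric 6x6 conference matrix (Paley construction with q = 5)."""
    q = 5
    residues = {(x * x) % q for x in range(1, q)}  # nonzero squares mod 5

    def chi(a):
        a %= q
        if a == 0:
            return 0
        return 1 if a in residues else -1

    rows = [[0] + [1] * q]
    for a in range(q):
        rows.append([1] + [chi(a - b) for b in range(q)])
    return rows


def dsd_from_conference(C):
    """Stack C, its foldover -C, and one center run (2m + 1 runs)."""
    fold = [[-x for x in row] for row in C]
    return [list(row) for row in C] + fold + [[0] * len(C[0])]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


D = dsd_from_conference(paley_conference_6())
k = len(D[0])
main = [[row[j] for row in D] for j in range(k)]

# Intercept, pure quadratics, and two-factor interactions
others = [[1] * len(D)]
others += [[a * a for a in main[j]] for j in range(k)]
others += [[a * b for a, b in zip(main[i], main[j])]
           for i, j in combinations(range(k), 2)]

worst_vs_second_order = max(abs(dot(m, s)) for m in main for s in others)
worst_vs_main = max(abs(dot(main[i], main[j]))
                    for i, j in combinations(range(k), 2))
print(worst_vs_second_order, worst_vs_main)  # both are 0
```

Because each main-effect column x is orthogonal to everything else that could enter the model, its least-squares estimate is x'y / x'x no matter which second-order terms are included. Only the residual error, and therefore the standard error, changes.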

Confirming the experiment

Now that I had the final model, I needed to see how well it works. Using the Maximize Desirability option from the red triangle menu in the Profiler, it looks like 50% vinegar for 23 minutes on heat setting 3 is best.

It turns out I can dye multiple cars within a batch of liquid (hmm … that sounds like the makings of a split-plot design in the future). I used 50% vinegar with heat setting 3 and the lowest dye amount for three cars, taking them out at 10, 20 and 30 minutes. The thought was that 20 minutes should be close to ideal. From left to right, the cars here are undyed, 30 minutes, 20 minutes and 10 minutes:

I was expecting the 20- and 30-minute cars to look good, and I was happy with the results. In fact, I prefer the 20-minute car, as 30 minutes on high heat made the plastic in the car start to melt.

Final thoughts

While the principle of effect heredity is based on empirical studies, sometimes it doesn’t hold, and you have to start investigating if you’ve missed some second-order effects. The definitive screening design worked out nicely in this case. That's because, unlike just using center points, not only could I detect the possibility of quadratic effects, but I could also estimate them.

If you compare the results for the red cars in the first experiment, it’s incredible to see the difference after two follow-up experiments. I have a much better sense of getting other colors as well, especially with the Profiler. For example, if I want medium colors, I can use no vinegar and heat, while the lighter colors involve no heat with some vinegar. (If you missed any of these experiments, check out the whole series on dyeing diecast cars.)

I still want a definitive screening design to try out the new analysis technique. Any suggestions? Thanks for reading!

Visitor

Richard Steger wrote:

Dr. Lekivetz - I've read your recent blog posts on DSDs using fake factors. I'm curious to know whether you have made progress on, or published anything about, the more detailed analysis you mentioned: "A more detailed analysis using this method will be saved for another day, since I had one significant effect: heat setting."

I'm working on a DSD and think that using fake factors would be useful. We are experimenting with asphalt additives at different dose levels and measuring the effects using standard asphalt mixture tests. I find that I can get a higher D Efficiency (~78.6) when using 3 continuous factors, 2 fake factors and a block. This adds up to 14 runs. However, I'd like to find more information on how to analyze this design using fake factors before embarking on the experiment.

I attended the Jones DSD workshop in Houston at the technical conference in Oct 2015 and have tried to read as much as possible about this design approach. I've also watched the video posted from the Summit in San Diego.

Thanks,

Richard

Staff

Ryan Lekivetz wrote:

Hi Richard,

The paper describing the analysis from Bradley Jones and Christopher Nachtsheim is forthcoming.

The issue I ended up having was that the analysis assumes effect heredity - it looks for important main effects in the first step, and the second step looks for 2nd order effects involving only the important main effects. In this example, vinegar and time had quadratic effects, but their main effects were not large enough to be picked up in the first step. In fact, the main effect for vinegar is almost 0.
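As a rough illustration of why heredity caused trouble here, the following Python sketch lists which 2nd order terms a heredity rule would allow into the second step. The factor names are shorthand taken from the post, and the function is a hypothetical helper, not the add-in's actual logic.

```python
from itertools import combinations


def heredity_candidates(factors, active, strong=True):
    """2nd order terms allowed given the active main effects.

    Strong heredity: an interaction needs both parents active.
    Weak heredity: an interaction needs at least one parent active.
    A quadratic term always needs its single parent to be active.
    """
    active = set(active)
    terms = [f"{f}*{f}" for f in factors if f in active]
    for a, b in combinations(factors, 2):
        n_active = (a in active) + (b in active)
        if n_active == 2 or (not strong and n_active == 1):
            terms.append(f"{a}*{b}")
    return terms


# Shorthand names for the four factors in the post (an assumption)
factors = ["heat", "vinegar", "time", "dye"]

# Only heat setting was picked up in the first step
strong = heredity_candidates(factors, ["heat"], strong=True)
weak = heredity_candidates(factors, ["heat"], strong=False)
print(strong)  # ['heat*heat']
print(weak)    # ['heat*heat', 'heat*vinegar', 'heat*time', 'heat*dye']
```

Under either rule, vinegar*vinegar and time*time never become candidates when heat is the only main effect found, which is exactly the pair of quadratic effects that mattered in this experiment.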

For 3 continuous factors and a block, you should be able to use 3 fake factors and still remain at 14 runs. Those 3 fake factors provide 3 df to estimate the error in the first step.
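Here is a minimal Python sketch of that first step on simulated data, to show where the 3 df come from. Everything in it is an assumption for illustration: the 14-run layout (a Paley conference matrix, its foldover, and one center run per block), the block assignment, the simulated coefficients, and the helper names. It is not the add-in's implementation.

```python
import math
import random


def paley_conference_6():
    """Symmetric 6x6 conference matrix (Paley construction with q = 5)."""
    q = 5
    residues = {(x * x) % q for x in range(1, q)}

    def chi(a):
        a %= q
        if a == 0:
            return 0
        return 1 if a in residues else -1

    return [[0] + [1] * q] + [[1] + [chi(a - b) for b in range(q)]
                              for a in range(q)]


def blocked_dsd_14():
    """12 foldover runs plus one center run per block (assumed layout)."""
    C = paley_conference_6()
    fold = [[-x for x in row] for row in C]
    D = C + fold + [[0] * 6, [0] * 6]
    # Foldover pairs (run i and run i + 6) share a block
    block = [1, 1, 1, -1, -1, -1] * 2 + [1, -1]
    return D, block


def main_effect_estimates(D, y):
    """Per-column estimate x'y / x'x, valid because columns are orthogonal."""
    est = []
    for j in range(len(D[0])):
        x = [row[j] for row in D]
        est.append(sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x))
    return est


def fake_factor_mse(D, est, fake_cols):
    """Error mean square from the fake-factor contrasts (1 df each)."""
    ss = sum(sum(row[j] ** 2 for row in D) * est[j] ** 2 for j in fake_cols)
    return ss / len(fake_cols)


D, block = blocked_dsd_14()
real_cols, fake_cols = [0, 1, 2], [3, 4, 5]

# Simulated response: one main effect, one quadratic, a block shift, noise
rng = random.Random(1)
y = [10 + 2.0 * r[0] + 1.5 * r[1] ** 2 + 0.8 * b + rng.gauss(0, 0.5)
     for r, b in zip(D, block)]

est = main_effect_estimates(D, y)
mse = fake_factor_mse(D, est, fake_cols)       # 3 df
sxx = sum(row[0] ** 2 for row in D)            # same for every column
se = math.sqrt(mse / sxx)
t_ratios = [est[j] / se for j in real_cols]    # compare to a t with 3 df
print([round(t, 2) for t in t_ratios])
```

The quadratic term and the block shift in the simulated response do not leak into the fake-factor contrasts, so the error estimate reflects only noise even when 2nd order effects are active.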

From the videos, you've noticed the idea of partitioning the response. For your case, the main effects of the 3 continuous factors (and the 3 fake factors) go into the odd space, while the block and second order effects go into the even space. The first step uses an error estimate with the 3 df from the fake factors to look for main effects. Any main effects not selected in the first step are added to the error (estimate and degrees of freedom) used in the second step.
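The partition can be sketched in a few lines of Python. This assumes the same kind of 14-run layout as your case (12 foldover runs plus one center run in each of two blocks, with foldover pairs kept in the same block); the run order, block assignment, simulated response, and function names are illustrative assumptions rather than what the add-in does internally.

```python
import random


def paley_conference_6():
    """Symmetric 6x6 conference matrix (Paley construction with q = 5)."""
    q = 5
    residues = {(x * x) % q for x in range(1, q)}

    def chi(a):
        a %= q
        if a == 0:
            return 0
        return 1 if a in residues else -1

    return [[0] + [1] * q] + [[1] + [chi(a - b) for b in range(q)]
                              for a in range(q)]


def blocked_dsd_14():
    """Runs 0-5 are C, runs 6-11 are -C, runs 12-13 are center runs."""
    C = paley_conference_6()
    fold = [[-x for x in row] for row in C]
    D = C + fold + [[0] * 6, [0] * 6]
    block = [1, 1, 1, -1, -1, -1] * 2 + [1, -1]
    return D, block


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split_odd_even(D, y):
    """Project y onto the main-effect columns (odd part); the rest is even."""
    y_odd = [0.0] * len(y)
    for j in range(len(D[0])):
        x = [row[j] for row in D]
        b = dot(x, y) / dot(x, x)
        y_odd = [o + b * xi for o, xi in zip(y_odd, x)]
    y_even = [yi - oi for yi, oi in zip(y, y_odd)]
    return y_odd, y_even


D, block = blocked_dsd_14()
rng = random.Random(2)
y = [10 + 2.0 * r[0] + 1.5 * r[1] ** 2 + 0.8 * b + rng.gauss(0, 0.5)
     for r, b in zip(D, block)]

y_odd, y_even = split_odd_even(D, y)

# With 6 columns and 6 foldover pairs, the odd part is just half the
# difference within each foldover pair, and it is zero at the center runs.
half_diff = [(y[i] - y[i + 6]) / 2 for i in range(6)]
print(max(abs(a - b) for a, b in zip(y_odd[:6], half_diff)))
```

In this layout the odd space has 6 df (3 real and 3 fake main effects) and the even space has the remaining 8 df, which is where the intercept, block, and second order effects live.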

We've been working on fixing up the add-in, but if you have a version from one of the events, all you should need to specify is the 3 continuous factors and the block. I would recommend simulating some data by creating a formula column and trying out a few models in the add-in. The paper I mentioned above shows excellent results with regard to power, compared to other model selection methods, when all factors are continuous. We're working on more simulations for cases with categorical and blocking factors, but everything does look good.

Hope that helps!

Ryan