Analyzing factorial experiment with missing data

michelle

Community Trekker

Joined:

Feb 4, 2015

I will preface my question by saying that I am new to DOE in general. After trying to learn the basics, I designed a 2^5 full factorial experiment with 5 center points (to obtain an estimate of process error). The process produces samples, which we characterize in different ways, so we are looking at MANY responses. The problem is that some samples were not viable for certain types of characterization, so we could not measure every response on every sample. I am wondering how to analyze the results with a few missing data points.

Right now I am trying to do something really simple and just calculate main effects, interaction effects, etc. I produced a spreadsheet to hand-calculate the results, and also ran the data through the software. For the responses where I had all the data, the effects I calculate are the same as those calculated by the software. For the responses where I am missing a few data points, the effects are DIFFERENT. They are close, but not the same. All I changed in my calculation was omitting the data points I didn't have (so if I needed to average eight values, I would average seven, for example).

What I would really like to know is how the software calculates the effects when data points are missing. How can I reproduce this calculation? I can't for the life of me find information on what to do in general with missing data points. I believe there is still a lot of value in mining the results, since I am only missing a few data points out of 32 runs...

Accepted Solution

Michelle,

Very good question! When there is no missing data, the parameter estimates are easily calculated by hand and agree with the software output. That is because of all the wonderful properties that orthogonal designs display. As soon as you have missing data, however, the design becomes unbalanced and correlations arise among the model terms, which can bias the parameter estimates. As you mention, your hand calculations are close but do not agree with the software output. The software fits all of the model terms together by least squares using only the rows that have a response, so its parameter estimates include the bias that results from the missing row(s) of data; once the columns are correlated, that is no longer the same calculation as a simple difference of averages. If you exclude the rows with missing values in the data table, then use the Evaluate Design capability in JMP and look at the color map on correlations, you will see the correlations that feed into the parameter estimates.

I hope this helps.
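To make the point above concrete, here is a minimal sketch in plain Python (standard library only). It assumes a small 2^3 design with a main-effects model and made-up response values, none of which come from the original experiment, and it uses an ordinary least squares fit as a stand-in for what the software does. The helper names `solve`, `least_squares`, and `half_effect` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def least_squares(X, y):
    """Return the coefficients b that solve the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def half_effect(X, y, col):
    """Hand calculation: (mean at +1 minus mean at -1) / 2 over the rows present."""
    hi = [yi for row, yi in zip(X, y) if row[col] == 1]
    lo = [yi for row, yi in zip(X, y) if row[col] == -1]
    return (sum(hi) / len(hi) - sum(lo) / len(lo)) / 2

# Coded 2^3 design; columns are intercept, A, B, C.
X = [[1, a, b, c] for a, b, c in product((-1, 1), repeat=3)]
# Hypothetical responses, for illustration only.
y = [Fraction(v) for v in (12, 15, 11, 18, 20, 27, 19, 30)]

# All 8 runs present: least squares and the hand calculation agree.
b_full = least_squares(X, y)
hand_full = [half_effect(X, y, col) for col in (1, 2, 3)]

# Last run missing: the two calculations no longer agree.
X_miss, y_miss = X[:-1], y[:-1]
b_miss = least_squares(X_miss, y_miss)
hand_miss = [half_effect(X_miss, y_miss, col) for col in (1, 2, 3)]

print("full design   :", [str(v) for v in b_full[1:]], [str(v) for v in hand_full])
print("one run missing:", [str(v) for v in b_miss[1:]], [str(v) for v in hand_miss])
```

With these made-up numbers, the two approaches give the same coefficient for A (5) on the full design, but with the last run removed least squares gives 9/2 while averaging the remaining values gives 4. Note that a coefficient here is half of the classical "effect" (the difference between the two averages).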

4 REPLIES

michelle

Community Trekker

Joined:

Feb 4, 2015

Thanks Lou - super helpful! Would it be fair to say, then, that:

(a) it is nontrivial to hand-calculate the effects in the same way as the software (i.e., there is no (relatively) simple formula for this)

(b) the effects calculated by the software are better (more accurate) than those that I hand calculate by simply omitting the missing data - so I should use those

(c) that the results from the software are still OKAY even with the missing data (especially if the majority of the data is there...)

It really bugs me that I cannot calculate the effects by hand; I would like to have a little more control over the analysis, and use the software more as a tool for throughput rather than outsourcing understanding how the calculations work.

louv

Staff

Joined:

Jun 23, 2011

Michelle,

I'm sure you could calculate it by hand; however, it would involve some matrix calculations. Practically speaking, because you ran a full factorial design, there are certainly enough degrees of freedom to fit a model that describes your responses despite the missing values. The parameter estimates may be somewhat biased, but your conclusions about whether or not an effect is significant should hold up just fine.

Lou
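For reference, the matrix calculation mentioned above is presumably the standard least squares estimate. The notation below is an assumption, not taken from the thread: X is the model matrix with coded levels of -1 and +1 (one row per run that has a response, center points ignored), y is the response vector, and n is the number of runs.

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y

% Complete, orthogonal two-level design: X^T X is diagonal, so each
% coefficient reduces to a simple contrast of averages.
X^{\top} X = n I
\;\Longrightarrow\;
\hat{\beta}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}\, y_i
             = \tfrac{1}{2} \left( \bar{y}_{j+} - \bar{y}_{j-} \right)
```

When rows are missing, X^T X is no longer diagonal, so the full matrix inverse is needed and the shortcut on the last line stops matching the least squares result.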

louv

Staff

Joined:

Jun 23, 2011

Here is an example of a simple 3-factor full factorial design. In the first case, all 8 runs are used for the design evaluation; in the second case, one row is excluded because of a missing response. As you can see, the properties are quite different, and this would impact the parameter estimates in the second case.

[Two screenshots: design evaluation for the full 8-run design, and for the same design with one row excluded]
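The comparison above can be approximated outside JMP with a short plain-Python sketch. It assumes the color map reflects the absolute Pearson correlation between model-term columns, which is an assumption about the display rather than a documented detail, and it uses main effects plus two-factor interactions as the model. The function names are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

def correlation(u, v):
    """Pearson correlation between two equal-length columns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def model_columns(runs):
    """Build columns for A, B, C and the two-factor interactions AB, AC, BC."""
    cols = {name: [r[i] for r in runs] for i, name in enumerate("ABC")}
    for p, q in combinations("ABC", 2):
        cols[p + q] = [a * b for a, b in zip(cols[p], cols[q])]
    return cols

def max_abs_correlation(runs):
    """Largest absolute correlation between any two model-term columns."""
    cols = model_columns(runs)
    return max(abs(correlation(cols[p], cols[q])) for p, q in combinations(cols, 2))

full = list(product((-1, 1), repeat=3))  # all 8 runs of the 2^3 design
reduced = full[:-1]                      # one row excluded (missing response)

print("all 8 runs     :", max_abs_correlation(full))
print("one row dropped:", round(max_abs_correlation(reduced), 4))
```

With all 8 runs every pair of terms is uncorrelated; with one row removed, every pair of terms picks up a correlation of magnitude 1/6, which is why the estimates can no longer be computed one term at a time.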