In this International Year of Statistics, we at JMP are celebrating famous statisticians on a monthly basis. This month is my turn, and early this year I chose Professor George E.P. Box as the subject of my celebration. I was looking forward to writing this piece because I knew George personally and have been an admirer of his since the beginning of my career.
Sadly, George passed away in late March, and I wrote a remembrance of him for the JMP Blog at that time. That blog post expresses what I would have written in a post celebrating him. So, instead of speaking in general about his life and accomplishments, in this post I will focus on one of his many great papers. My plan is to write several such blog posts this month, each emphasizing a different one of his wonderful publications. One of the benefits for me is that I get to reread these papers.
In this post, I want to focus on the first part of his two-part paper with J. Stuart Hunter on the family of regular two-level fractional factorial designs, published in Technometrics in 1961.
This seminal paper is 40 pages long, and one thing I found notable about it was that the mathematical content did not go past arithmetic and a little algebra! Even so, the paper contains many fundamental results, all stated in natural language without formal proofs. That was refreshing.
How does the paper begin?
The paper starts with a brief exposition of the two-level full factorial design in k factors. It shows how these designs can estimate interactions of all orders up to the k-factor interaction. This provides the motivation and background for introducing a half fraction of the full factorial design. The authors illustrate the construction method using the 2^(4-1) design, showing how one starts with the full factorial design in three factors and then adds a fourth factor by computing the elementwise product of the first three factors.
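For readers who want to see the construction outside of JMP, here is a minimal Python sketch of it. The variable names (`base`, `design`) and the choice to list runs in standard order (first factor changing fastest) are my own assumptions, not something taken from the paper.

```python
from itertools import product

# Full factorial in three factors, levels coded -1 and +1 (8 runs).
# Unpacking as (c, b, a) makes the first factor vary fastest (standard order).
base = [(a, b, c) for c, b, a in product([-1, 1], repeat=3)]

# The fourth factor is the elementwise product of the first three.
design = [(a, b, c, a * b * c) for (a, b, c) in base]

for run in design:
    print(run)
```

Each row is one run of the half fraction. Because the fourth column is the product of the other three, the product of all four columns is +1 in every run.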
Can you show this in JMP?
To reconstruct their example in JMP the way they did it, we start with the Full Factorial designer. We call our factors 1, 2 and 3 and our response Y. We compute the fourth column using the formula editor. The Y column in our Table 1 below has the same values as the ones they use in their Table 3.
Of course, we could also just use the Screening Designer in JMP to enter 4 factors. The design we want is the first in the list. :-)
What happens next?
They now have a design with 8 runs, just half as many as in the full factorial design with four factors. With the full factorial design, you can estimate 16 effects – the overall average, 4 main effects, 6 two-factor interactions, 4 three-factor interactions and 1 four-factor interaction (16 = 1 + 4 + 6 + 4 + 1). Now with 8 runs, you can only estimate 8 effects. It turns out that the construction the authors use confounds the 16 effects of the full factorial into 8 pairs of effects. The average is confounded with the four-factor interaction. The 4 main effects are each confounded with one of the 4 three-factor interactions. Finally, the 6 two-factor interactions are confounded in 3 pairs (8 = 1 + 4 + 3).
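The pairing can be checked with a few lines of Python. This is only a sketch under my own conventions: each effect is represented as a set of factor numbers, the average is labeled "I", and the defining word 1234 follows from setting factor 4 equal to the product 123. Multiplying two effects cancels any factor that appears twice (a ±1 column squared is all +1), which is a symmetric difference of sets.

```python
from itertools import combinations

factors = (1, 2, 3, 4)
defining_word = frozenset(factors)  # I = 1234, because column 4 = 1*2*3

# All 16 effects of the full factorial, from the average (empty set)
# up to the four-factor interaction.
effects = [frozenset(c) for r in range(5) for c in combinations(factors, r)]

def label(effect):
    """Readable name for an effect; the empty set is the average, 'I'."""
    return "".join(str(f) for f in sorted(effect)) or "I"

# Pair each effect with its alias: effect times the defining word.
pairs = []
seen = set()
for e in effects:
    partner = e ^ defining_word  # symmetric difference = columnwise product
    if e not in seen:
        pairs.append((label(e), label(partner)))
        seen.update({e, partner})

for left, right in pairs:
    print(f"{left} is confounded with {right}")
```

The loop produces 8 pairs: the average with 1234, each main effect with a three-factor interaction, and the two-factor interactions in three pairs.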
Figure 1 below shows the analysis from the JMP Screening platform. The values JMP reports are half of the quantities Box and Hunter report, because they define their effects as being the difference in the response when changing from one level of the factor to the other. JMP defines an effect as the change in the response due to a one-unit change in the factor. Since one level of the factor is coded -1 and the other is coded +1, each factor changes by two units going from its low to its high level. Thus, the effect of a one-unit change is half the effect of going from low to high.
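The factor of two is easy to verify numerically. In the sketch below the response values are invented purely for illustration (they are not the values from Box and Hunter's Table 3), and the function names are hypothetical. I am also assuming that the one-unit-change estimate is the ordinary least-squares slope, which for a balanced, orthogonal ±1 column reduces to sum(x*y)/n.

```python
from itertools import product

# The 8-run half fraction: three base factors plus their product.
base = [(a, b, c) for c, b, a in product([-1, 1], repeat=3)]
design = [(a, b, c, a * b * c) for (a, b, c) in base]

# Made-up responses, for illustration only.
y = [10.0, 14.0, 11.0, 17.0, 9.0, 15.0, 12.0, 16.0]

def low_to_high_effect(column, y):
    """Average response at +1 minus average response at -1."""
    high = [yi for x, yi in zip(column, y) if x == 1]
    low = [yi for x, yi in zip(column, y) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def one_unit_estimate(column, y):
    """Least-squares slope for a balanced +/-1 column: sum(x*y) / n."""
    return sum(x * yi for x, yi in zip(column, y)) / len(y)

for j in range(4):
    column = [run[j] for run in design]
    effect = low_to_high_effect(column, y)
    slope = one_unit_estimate(column, y)
    print(f"factor {j + 1}: low-to-high effect = {effect}, one-unit estimate = {slope}")
```

For every factor the second number is exactly half of the first, whatever responses you plug in.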
How does the rest of the paper go?
Of course, the paper is much too long for me to cover everything Box and Hunter introduce – especially not in this level of detail. Here are some of the big concepts:
Where has design for screening gone in the 50+ years since then?
It is a tribute to the combined power and simplicity of this approach that the regular two-level fractional factorial designs are still in frequent use today. The construction and analysis of these designs do not require a computer, which made them popular when computers were rare. Of course, the calculations can be a bit tedious, so having a computer do them for you makes for fewer errors and more free time.
In the same year as the publication of this paper, Hall published 5 different orthogonal arrays for 15 factors in 16 runs. The saturated design in Box and Hunter’s paper was one of the 5. Hall’s paper was also fundamental, as it turns out that all the orthogonal arrays with 16 runs for fewer factors are projections of the Hall arrays.
Forty years later, Sun, et al. (2002) catalogued all the orthogonal 16 run designs for 5 to 14 factors. For 9 to 14 factors, the 16 run designs of Box and Hunter are all of resolution III, which means that main effects are confounded with two-factor interactions. Sun, et al. found designs in these cases where none of the two-factor interactions confounds a main effect. Instead, some two-factor interactions may be correlated either plus or minus one-half with a main effect. The benefit of these designs is that main effects can be identified without the built-in ambiguity that resolution III designs entail.
Box, G. E. P. and Hunter, J. S. (1961), "The 2^(k-p) Fractional Factorial Designs Part I," Technometrics, Vol. 3, No. 3, 311–351.
Hall, M. Jr. (1961), "Hadamard matrix of order 16," Jet Propulsion Laboratory Research Summary, 1, 21–26.
Sun, D. X., Li, W., and Ye, K. Q. (2002), “An Algorithm for Sequentially Constructing Non-Isomorphic Orthogonal Designs and Its Applications,” Technical Report SUNYSB-AMS-02-13, State University of New York at Stony Brook, Dept. of Applied Mathematics and Statistics.