Replication is one of the four basic principles of experiment design introduced by R. A. Fisher. The other three are the factorial principle, randomization and blocking. The value of replication is that it provides an estimate of the run-to-run variability in the response that remains valid even if the fitted model is incorrect.
You could already replicate designs in JMP 9. How is individual run replication different?
In previous releases of JMP, to use replication you had to re-run every factor combination in the design a specified number of times. So, if a design required 12 unique factor combinations and you specified that you wanted one replicate, then the resulting JMP table had 24 rows and every factor combination appeared twice. The interface for specifying replicates was available only after you created a custom design.
In JMP 10, instead of specifying how many times you want to repeat the entire design, you specify how many individual runs you want to replicate. You also make your request for replicates before rather than after design creation. For example, if you wanted six replicated runs in the above scenario, you would enter 6 in the edit box labeled Number of Replicate Runs. Then, you click the radio button labeled User Specified and enter 18 as the total number of runs. The resulting design has 18 runs, six of which are replicate observations.
How does this design compare to a design without replication?
To answer this question, let us consider a specific example. Suppose we have two categorical factors, each having three levels, and two continuous factors. Our model is the main effects model in all four factors, so we only need to consider the low and high ends of the range of each continuous factor. The full factorial design requires 36 runs, but our budget allows for only 18 runs.
Table 1 displays the factor settings for a design with 12 unique factor combinations where six of these combinations appear twice. Runs 3 and 4, 5 and 6, 7 and 8, 9 and 10, 13 and 14, and 17 and 18 are the six pairs of replicate runs.
Note that there is a good deal of symmetry in this design. Each of the nine possible combinations of the two three-level categorical factors appears twice. Also, the factor X4 is balanced, with nine runs each at its low and high settings. However, the factor X3 has only eight runs at its low setting and ten runs at its high setting.
Despite this slight lack of balance, the D-efficiency of this design is 99.1%. I created this design using the custom designer in JMP 10. When the investigator specifies a number of replicates, the custom designer creates a design that is optimal subject to the constraint of supplying the requested number of replicates. To create this design, I specified the D-optimality criterion.
For comparison, Table 2 shows the 18-run D-optimal design generated with no replicates specified. This table has the same pattern for the two three-level categorical factors. However, both of the continuous factors have nine runs each at their low and high settings.
The D-efficiency of this design is 99.8% relative to a theoretically ideal design, which in this case does not actually exist. So, the design with replication is only about 1 percent less D-efficient than the optimal design.
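To see where such percentages come from, note that the D-criterion compares determinants of the information matrices of two designs for the same model. Here is a minimal sketch, using a deliberately tiny, hypothetical two-factor design rather than the 18-run designs above:

```python
import numpy as np

def relative_d_efficiency(X1, X2):
    """Relative D-efficiency of design model matrix X1 versus X2.

    Both matrices must encode the same p-parameter model; the ratio
    (|X1'X1| / |X2'X2|)^(1/p) exceeds 1 when X1 is the better design.
    """
    p = X1.shape[1]
    d1 = np.linalg.det(X1.T @ X1)
    d2 = np.linalg.det(X2.T @ X2)
    return (d1 / d2) ** (1.0 / p)

# Toy illustration: intercept plus two continuous main effects, coded +/-1.
X_balanced = np.array([[1, -1, -1],
                       [1, -1,  1],
                       [1,  1, -1],
                       [1,  1,  1]], dtype=float)
X_replicated = np.array([[1, -1, -1],
                         [1, -1, -1],   # one corner run replicated
                         [1,  1, -1],
                         [1,  1,  1]], dtype=float)
eff = relative_d_efficiency(X_replicated, X_balanced)
```

In this toy case the replicated design is about 79% as D-efficient as the balanced one; the 18-run designs in the article lose far less because only six of 18 runs are replicates.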
What are the implications of individual run replication for analysis?
I added simulated data to both of the tables. The a priori model for each design was the main effects model but I added a term for the two-factor interaction involving X3 and X4 to the simulated data along with normal random error with a standard deviation of one. Here is the equation of the simulation model:
Y = 80 + 3X3 + 4X4 - 3X3X4 + e, where e ~ Normal(0, 1)
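A sketch of this simulation in Python, using coded +/-1 factor settings and an arbitrary seed (this generates illustrative data, not the actual values in my JMP tables):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only

def simulate_response(X3, X4, sigma=1.0):
    """Simulate Y = 80 + 3*X3 + 4*X4 - 3*X3*X4 + Normal(0, sigma) noise.

    X3 and X4 are arrays of coded factor settings (-1 or +1). The
    X3*X4 interaction is present in the data but will deliberately
    be left out of the fitted main-effects model.
    """
    X3 = np.asarray(X3, dtype=float)
    X4 = np.asarray(X4, dtype=float)
    noise = rng.normal(0.0, sigma, size=X3.shape)
    return 80.0 + 3.0 * X3 + 4.0 * X4 - 3.0 * X3 * X4 + noise

# One simulated response per row of a hypothetical 18-run design:
X3 = rng.choice([-1.0, 1.0], size=18)
X4 = rng.choice([-1.0, 1.0], size=18)
Y = simulate_response(X3, X4)
```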
Because there are six replicated runs in Table 1, the JMP Fit Model platform automatically does a Lack-of-Fit test (see Figure 1). Because the two-factor interaction is not in the model, the test is highly significant indicating that the model is inadequate. Armed with this information, the savvy investigator can add two-factor interactions to the fitted model in an effort to find the missing effect.
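JMP performs this test automatically. To see roughly what it computes, here is a sketch that partitions the residual sum of squares into pure error (from replicated settings) and lack of fit, using a small hypothetical example; this is an illustration of the standard test, not JMP's implementation:

```python
import numpy as np

def lack_of_fit_test(X, y, groups):
    """Lack-of-fit F test from replicate runs.

    X: model matrix (n x p); y: responses; groups: labels that are
    equal for rows with identical factor settings (replicates).
    Returns the F statistic and its numerator/denominator df.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_resid = float(resid @ resid)

    # Pure-error SS: variation within groups of replicated settings.
    ss_pe, df_pe = 0.0, 0
    for g in np.unique(groups):
        yg = y[groups == g]
        if len(yg) > 1:
            ss_pe += float(((yg - yg.mean()) ** 2).sum())
            df_pe += len(yg) - 1

    ss_lof = ss_resid - ss_pe
    df_lof = (n - p) - df_pe
    F = (ss_lof / df_lof) / (ss_pe / df_pe)
    return F, df_lof, df_pe

# Hypothetical fully replicated 2x2 example: main-effects fit to data
# that actually contains a -3*X3*X4 interaction.
rng = np.random.default_rng(7)
X3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
X4 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = 80 + 3 * X3 + 4 * X4 - 3 * X3 * X4 + rng.normal(0, 1, 8)
X = np.column_stack([np.ones(8), X3, X4])   # main effects only
groups = 2 * X3 + X4                        # same settings -> same label
F, df_lof, df_pe = lack_of_fit_test(X, y, groups)
```

Because the fitted model omits the large interaction, the lack-of-fit sum of squares dwarfs the pure-error estimate and F is far above 1, just as in Figure 1.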
For the unreplicated design, there is no test for lack of fit because there are no pure error degrees of freedom. The RMSE of the fitted model is 3.65, more than three times the true error standard deviation of one. That is because the active two-factor interaction is not in the model, so its effect inflates the estimate of the error standard deviation. This, in turn, means that tests of the model effects have lower power.
To be fair, it is not too difficult to detect the lack of fit using the data from the unreplicated design. Figure 2 shows a graph of the residuals from the fit of the unreplicated data plotted against X3 and overlaying the two values of X4 with different colors and associated lines. The interaction is clear because the overlaid lines cross.
Producing this plot requires some additional effort and expertise on the part of the data analyst. By contrast, the lack-of-fit test is automatic.
What is the bottom line?
There is safety in replication. That safety comes at a price: for a given run budget, replicating runs means that you will not be able to estimate as many terms. Also, individual run replication generally results in some increase in the variance of your coefficient estimates.
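That variance increase follows directly from the least-squares formula Var(beta-hat) = sigma^2 (X'X)^-1. A toy sketch with a hypothetical four-run, two-factor design:

```python
import numpy as np

def coef_variances(X, sigma=1.0):
    """Variance of each least-squares coefficient: sigma^2 * diag((X'X)^-1)."""
    return sigma ** 2 * np.diag(np.linalg.inv(X.T @ X))

# Intercept plus two main effects, coded +/-1.
X_balanced = np.array([[1, -1, -1],
                       [1, -1,  1],
                       [1,  1, -1],
                       [1,  1,  1]], dtype=float)
X_replicated = np.array([[1, -1, -1],
                         [1, -1, -1],   # one corner run replicated
                         [1,  1, -1],
                         [1,  1,  1]], dtype=float)
v_bal = coef_variances(X_balanced)      # 0.25 for every coefficient
v_rep = coef_variances(X_replicated)    # every variance at least as large
```

In this extreme toy case the replicate costs a quarter to a half of the balanced design's precision on some coefficients; with only six replicates in 18 runs, as above, the increase is much smaller.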
What individual run replication offers is more flexibility in the specification of your design, and providing flexibility is the whole point of Custom Design in JMP.