Welcome to the finale of our space filling DOE series! After establishing our framework and evaluation methodology in the previous posts, we now present the comprehensive results of our comparative study. The findings provide clear guidance on when and how to select the most appropriate space filling design for your specific application.
Our analysis across four design types, three dimensionality settings, and three sample-size conditions per factor reveals several interesting insights into design performance and computation time.
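For readers who want to reproduce the structure of the study, here is a minimal Python sketch of the scenario grid. The level values are hypothetical stand-ins except where the series mentions them, and "sphere packing" is only a placeholder for the fourth design type compared:

```python
# Hypothetical sketch of the full factorial study grid; levels are
# placeholders except where the series names them explicitly.
from itertools import product

design_types = [
    "Latin hypercube", "uniform", "fast flexible filling",
    "sphere packing",  # placeholder for the fourth design type
]
n_factors_levels = [3, 6, 9]            # hypothetical; 9 factors appears in the study
runs_per_factor_levels = [30, 70, 100]  # hypothetical; 100 runs/factor appears in the study

scenarios = list(product(design_types, n_factors_levels, runs_per_factor_levels))
print(len(scenarios), "design/scenario combinations")  # 4 x 3 x 3 = 36
```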
As mentioned in the previous post in this series, the results are analyzed through two different methodologies: a ranking of design types across the different dimensionality and sample size settings (Figure 1), and a modeling approach in which the discrepancy, MaxPro, and time responses are modeled as functions of the factors (Figure 2).
Figure 1: Ranking results for all responses.
Figure 2: Prediction profiler and variable importance for all responses.
As expected, there is no universal space filling design winner across the configurations tested. The context determines the optimal choice; in particular, the dimensionality and the number of points per factor have significant impacts on design performance and on the required computing time.
Latin hypercube is the best design for projection properties (MaxPro criterion).
Uniform is the best design for space coverage/uniformity (discrepancy criterion), but it requires the longest generation time, which can quickly become prohibitive.
Interestingly, design performance depends heavily on the interaction between sample size and dimensionality: the relative performance of the designs can change with the total number of runs for a given number of factors.
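For readers who want to compute these two metrics outside JMP, here is a minimal Python sketch using scipy's qmc module. The MaxPro formula below is the common formulation from the literature, in which lower values mean better projection properties; JMP's reported MaxPro metric may be scaled or oriented differently:

```python
# Minimal sketch of the two quality metrics on a Latin hypercube sample.
import math
import numpy as np
from scipy.stats import qmc

def maxpro_criterion(X: np.ndarray) -> float:
    """Common MaxPro formulation: lower values = better projections."""
    n, d = X.shape
    total = sum(
        1.0 / np.prod((X[i] - X[j]) ** 2)
        for i in range(n) for j in range(i + 1, n)
    )
    return (total / math.comb(n, 2)) ** (1.0 / d)

X = qmc.LatinHypercube(d=5, seed=0).random(n=50)  # 50 runs, 5 factors
print("centered L2 discrepancy:", qmc.discrepancy(X, method="CD"))
print("MaxPro criterion:", maxpro_criterion(X))
```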
Using the scripts presented in the second post of this series, the continuous responses can be transformed into rankings, where the performances of the four designs in each dimensionality and sample size scenario are ranked from 1 (best design) to 4 (worst design).
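As a complement to those scripts, the same ranking step can be sketched in Python with pandas. The column names below are hypothetical; discrepancy and generation time are ranked ascending (lower is better), while MaxPro is ranked descending, matching the orientation used in this series:

```python
# Hypothetical sketch of the per-scenario ranking step.
import pandas as pd

def add_rankings(results: pd.DataFrame) -> pd.DataFrame:
    out = results.copy()
    scenario = ["n_factors", "runs_per_factor"]
    # Rank the four designs within each scenario: 1 = best, 4 = worst.
    out["discrepancy_rank"] = out.groupby(scenario)["discrepancy"].rank("min")
    out["maxpro_rank"] = out.groupby(scenario)["maxpro"].rank("min", ascending=False)
    out["time_rank"] = out.groupby(scenario)["time"].rank("min")
    return out
```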
Using the modified model script presented in the second post of this series, models are fit for the three responses: discrepancy, MaxPro, and time. For each of these responses, the initial model fit does not appear adequate (Figure 6).
Figure 6: Actual vs. Predicted plot for initial response discrepancy model.
Moreover, a non-normal and non-random pattern can be seen in the residuals (Figure 7).
Figure 7: Residuals visualizations showing non-random and non-normal patterns.
The residual values are supposed to scatter randomly about zero, so that the variation of the response is fully explained by the effects included in the model. If there is a non-random pattern, its nature can indicate potential issues with the model. In this case, a curvature effect, non-normality of the residuals, and heteroskedasticity (non-homogeneous variance) suggest that a data transformation is needed. As the data spans very different scales (Figure 8), a log transformation (or Box-Cox transformation with λ=0) seems appropriate.
Figure 8: Results of a fitted lognormal distribution on discrepancy response.
This diagnosis is confirmed by the Box-Cox assessment provided by JMP, which recommends a value of λ very close to 0 (Figure 9).
Figure 9: Box-Cox Transformations panel result for discrepancy response.
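For those working outside JMP, the λ estimate can be reproduced with scipy; here is a minimal sketch on stand-in lognormal data (the real discrepancy values would replace `y`):

```python
# Minimal sketch: estimate the Box-Cox lambda for a positive response.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = stats.lognorm.rvs(s=1.0, size=200, random_state=rng)  # stand-in data

lam = stats.boxcox_normmax(y, method="mle")
print(f"estimated lambda: {lam:.3f}")  # a value near 0 supports a log transform
```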
In the Box-Cox Transformations panel (available under the red triangle next to Response […], in Factor Profiling > Box Cox Y Transformation), we select Replace with Transform with a lambda value of 0 to automatically refit the model with the log transformation applied to our data. The models now appear adequate for analyzing the data from the experiments.
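Outside JMP, the equivalent refit is simply a model on the log-transformed response. Here is a minimal sketch with statsmodels, reusing the hypothetical column names from the ranking sketch above; the three-way interaction structure is an assumption, not necessarily the exact model used in the series:

```python
# Hypothetical sketch: refit the discrepancy model on a log scale.
import numpy as np
import statsmodels.formula.api as smf

model = smf.ols(
    "np.log(discrepancy) ~ C(design_type) * n_factors * runs_per_factor",
    data=results,  # the results DataFrame from the ranking sketch
).fit()
print(model.summary())
```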
The impact of the design type choice on the discrepancy and MaxPro values can be visualized simply by plotting each metric against the ratio exp/factors for each number-of-factors scenario analyzed in the DOE (Figure 14).
The impact of the design type choice on the design generation time can be visualized in the same way (Figure 13).
Figure 13: Evolution of design generation time depending on design type, number of factors, and ratio exp/factors.
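As a simple illustration of how generation time can be measured, here is a minimal sketch timing scipy's Latin hypercube sampler as a stand-in for the designs generated in JMP:

```python
# Minimal sketch: time the generation of a large space filling sample.
import time
from scipy.stats import qmc

n_factors, runs_per_factor = 9, 100
sampler = qmc.LatinHypercube(d=n_factors, seed=1)

t0 = time.perf_counter()
design = sampler.random(n=n_factors * runs_per_factor)
elapsed = time.perf_counter() - t0
print(f"generated {design.shape[0]} runs in {elapsed:.4f} s")
```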
Design-specific strengths become more pronounced with larger sample sizes. Interestingly, the models make it possible to identify an area where discrepancy is minimized and MaxPro is maximized, depending on the dimensionality of the design space. A ratio of roughly 60 to 80 experiments per factor seems optimal for improving both the discrepancy and MaxPro metrics; for a five-factor design, for example, this corresponds to roughly 300 to 400 runs (Figure 14).
In the MaxPro model, a change can be noted in how the Latin hypercube and fast flexible filling designs behave with dimensionality (number of factors): around the optimal ratio of roughly 70 experiments per factor, both designs have MaxPro values that increase with the number of factors. When the ratio of experiments per factor is above this optimal value, the MaxPro values of the two designs decrease slightly with the number of factors (Figure 14).
Figure 14: Evolution of discrepancy and MaxPro values depending on design factors.
Uniform designs become computationally prohibitive for high dimensionality and/or large sample sizes, and the performance-versus-speed trade-off favors different designs in different scenarios. To summarize the strengths and weaknesses of each design, a table of results is provided (Table 1).
Step 1: Assess your experimental context
Step 2: Apply selection rules
Table 1: Summary of results by space filling design type.
Note that one treatment of the full factorial study (the uniform design with nine factors and 100 runs per factor) couldn't be completed due to its high computation time (Figure 15):
Figure 15: JMP message displayed for the uniform design with nine factors and 100 runs per factor.
The design computation was aborted before completion for this configuration. Even with this experiment missing from the full factorial study, the analysis could be conducted without any problems.
Study limitations
Future research
Our comprehensive evaluation provides clear evidence that space filling design selection should be driven by specific experimental characteristics rather than default choices.
Key takeaways
Final recommendation: Use our decision framework as a starting point but always validate design choice with pilot studies when possible.
This completes our three-part series on space filling designs of experiments.
Thank you for following this series! I hope these insights help you make more informed decisions about space filling designs in your own work.