Welcome to the finale of our space filling DOE series! After establishing our framework and evaluation methodology in the previous posts, we now present the comprehensive results of our comparative study. The findings provide clear guidance on when and how to select the most appropriate space filling design for your specific application.
Our analysis across four design types, three dimensionality settings, and three sample-size conditions per factor reveals several interesting insights into design performance and computation time.
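For readers who want to reproduce the structure of the study, here is a minimal Python sketch of the scenario grid. The level values are hypothetical stand-ins except where the series mentions them, and "sphere packing" is only a placeholder for the fourth design type compared:

```python
# Hypothetical sketch of the full factorial study grid; levels are
# placeholders except where the series names them explicitly.
from itertools import product

design_types = [
    "Latin hypercube", "uniform", "fast flexible filling",
    "sphere packing",  # placeholder for the fourth design type
]
n_factors_levels = [3, 6, 9]            # hypothetical; 9 factors appears in the study
runs_per_factor_levels = [30, 70, 100]  # hypothetical; 100 runs/factor appears in the study

scenarios = list(product(design_types, n_factors_levels, runs_per_factor_levels))
print(len(scenarios), "design/scenario combinations")  # 4 x 3 x 3 = 36
```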
As mentioned in the previous post in this series, the results are analyzed through two different methodologies: a ranking of design types across the different dimensionality and sample size settings (Figure 1), and a modeling approach in which the discrepancy, MaxPro, and time responses are modeled as functions of the factors (Figure 2).
Figure 1: Ranking results for all responses.
Figure 2: Prediction profiler and variable importance for all responses.
As expected, there is no universal space filling design winner across the configurations tested. The context determines the optimal choice; in particular, the dimensionality and the number of points per factor have significant impacts on design performance and on the required computing time.
Latin hypercube is the best design for projection properties (MaxPro criterion).
Uniform is the best design for space coverage/uniformity (discrepancy criterion), but it requires the longest generation time, which can quickly become prohibitive.
Interestingly, design performance depends heavily on the interaction between sample size and dimensionality: the relative performance of the designs can change with the total number of runs for a given number of factors.
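For readers who want to compute these two metrics outside JMP, here is a minimal Python sketch using scipy's qmc module. The MaxPro formula below is the common formulation from the literature, in which lower values mean better projection properties; JMP's reported MaxPro metric may be scaled or oriented differently:

```python
# Minimal sketch of the two quality metrics on a Latin hypercube sample.
import math
import numpy as np
from scipy.stats import qmc

def maxpro_criterion(X: np.ndarray) -> float:
    """Common MaxPro formulation: lower values = better projections."""
    n, d = X.shape
    total = sum(
        1.0 / np.prod((X[i] - X[j]) ** 2)
        for i in range(n) for j in range(i + 1, n)
    )
    return (total / math.comb(n, 2)) ** (1.0 / d)

X = qmc.LatinHypercube(d=5, seed=0).random(n=50)  # 50 runs, 5 factors
print("centered L2 discrepancy:", qmc.discrepancy(X, method="CD"))
print("MaxPro criterion:", maxpro_criterion(X))
```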
Using the scripts presented in the second post of this series, the continuous responses can be transformed into rankings, where the performances of the four designs in each dimensionality and sample size scenario are ranked from 1 (best design) to 4 (worst design).
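As a complement to those scripts, the same ranking step can be sketched in Python with pandas. The column names below are hypothetical; discrepancy and generation time are ranked ascending (lower is better), while MaxPro is ranked descending, matching the orientation used in this series:

```python
# Hypothetical sketch of the per-scenario ranking step.
import pandas as pd

def add_rankings(results: pd.DataFrame) -> pd.DataFrame:
    out = results.copy()
    scenario = ["n_factors", "runs_per_factor"]
    # Rank the four designs within each scenario: 1 = best, 4 = worst.
    out["discrepancy_rank"] = out.groupby(scenario)["discrepancy"].rank("min")
    out["maxpro_rank"] = out.groupby(scenario)["maxpro"].rank("min", ascending=False)
    out["time_rank"] = out.groupby(scenario)["time"].rank("min")
    return out
```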
Using the modified model script presented in the second post of this series, models are fit for the three responses: discrepancy, MaxPro, and time. For each of these responses, the initial model fit does not appear adequate (Figure 6).
Figure 6: Actual vs. Predicted plot for initial response discrepancy model.
Moreover, a non-normal and non-random pattern can be seen in the residuals (Figure 7).
Figure 7: Residuals visualizations showing non-random and non-normal patterns.
The residual values are supposed to scatter randomly about zero, so that the variation of the response is fully explained by the effects included in the model. If there is a non-random pattern, its nature can indicate potential issues with the model. In this case, a curvature effect, non-normality of the residuals, and heteroskedasticity (non-homogeneous variance) suggest that a data transformation is needed. As the data spans very different scales (Figure 8), a log transformation (or Box-Cox transformation with λ=0) seems appropriate.
Figure 8: Results of a fitted lognormal distribution on discrepancy response.
This diagnosis is confirmed by the Box-Cox assessment provided by JMP, which recommends a value of λ very close to 0 (Figure 9).
Figure 9: Box-Cox Transformations panel result for discrepancy response.
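For those working outside JMP, the λ estimate can be reproduced with scipy; here is a minimal sketch on stand-in lognormal data (the real discrepancy values would replace `y`):

```python
# Minimal sketch: estimate the Box-Cox lambda for a positive response.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = stats.lognorm.rvs(s=1.0, size=200, random_state=rng)  # stand-in data

lam = stats.boxcox_normmax(y, method="mle")
print(f"estimated lambda: {lam:.3f}")  # a value near 0 supports a log transform
```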
In the Box-Cox Transformations panel (available under the red triangle next to Response […], in Factor Profiling > Box Cox Y Transformation), we select Replace with Transform with a lambda value of 0 to automatically refit the model with the log transformation applied to our data. The models now appear adequate for analyzing the data from the experiments.
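Outside JMP, the equivalent refit is simply a model on the log-transformed response. Here is a minimal sketch with statsmodels, reusing the hypothetical column names from the ranking sketch above; the three-way interaction structure is an assumption, not necessarily the exact model used in the series:

```python
# Hypothetical sketch: refit the discrepancy model on a log scale.
import numpy as np
import statsmodels.formula.api as smf

model = smf.ols(
    "np.log(discrepancy) ~ C(design_type) * n_factors * runs_per_factor",
    data=results,  # the results DataFrame from the ranking sketch
).fit()
print(model.summary())
```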
The impact of the design type choice on the discrepancy and MaxPro values can be visualized simply by plotting each metric against the ratio exp/factors for each number-of-factors scenario analyzed in the DOE (Figure 14).
The impact of the design type choice on the design generation time can be visualized in the same way (Figure 13).
Figure 13: Evolution of design generation time depending on design type, number of factors, and ratio exp/factors.
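As a simple illustration of how generation time can be measured, here is a minimal sketch timing scipy's Latin hypercube sampler as a stand-in for the designs generated in JMP:

```python
# Minimal sketch: time the generation of a large space filling sample.
import time
from scipy.stats import qmc

n_factors, runs_per_factor = 9, 100
sampler = qmc.LatinHypercube(d=n_factors, seed=1)

t0 = time.perf_counter()
design = sampler.random(n=n_factors * runs_per_factor)
elapsed = time.perf_counter() - t0
print(f"generated {design.shape[0]} runs in {elapsed:.4f} s")
```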
Design-specific strengths become more pronounced with larger sample sizes. Interestingly, the models make it possible to identify an area where discrepancy is minimized and MaxPro is maximized, depending on the dimensionality of the design space. A ratio of roughly 60 to 80 experiments per factor seems optimal for improving both the discrepancy and MaxPro metrics; for a five-factor design, for example, this corresponds to roughly 300 to 400 runs (Figure 14).
In the MaxPro model, a change can be noted in how the Latin hypercube and fast flexible filling designs behave with dimensionality (number of factors): around the optimal ratio of roughly 70 experiments per factor, both designs have MaxPro values that increase with the number of factors. When the ratio of experiments per factor is above this optimal value, the MaxPro values of the two designs decrease slightly with the number of factors (Figure 14).
Figure 14: Evolution of discrepancy and MaxPro values depending on design factors.
Uniform designs become computationally prohibitive for high dimensionality and/or large sample sizes, and the performance-versus-speed trade-off favors different designs in different scenarios. To summarize the strengths and weaknesses of each design, a table of results is provided (Table 1).
Step 1: Assess your experimental context
Step 2: Apply selection rules
Table 1: Summary of results by space filling design type.
Note that one treatment of the full factorial study (the uniform design with nine factors and 100 runs per factor) couldn't be completed due to its high computation time (Figure 15):
Figure 15: JMP message displayed for the uniform design with nine factors and 100 runs per factor.
The design computation was aborted before completion for this configuration. Even with this experiment missing from the full factorial study, the analysis could be conducted without any problems.
Study limitations
Future research
Our comprehensive evaluation provides clear evidence that space filling design selection should be driven by specific experimental characteristics rather than default choices.
Key takeaways
Final recommendation: Use our decision framework as a starting point but always validate design choice with pilot studies when possible.
This completes our three-part series on space filling designs of experiments.
Thank you for following this series! I hope these insights help you make more informed decisions about space filling designs in your own work.