Xena
Level II

How to design experiments with well-distributed continuous inputs ?

Hello everyone! I have a question regarding facile experimental designs. I am working on experimental designs with 4 inputs and 4 outputs. All the inputs are continuous values: the first two range from 0 to 5, and the other two from 0 to 10.

 

When I create experimental designs, I notice that the only values suggested by the design are 0, half of the maximum, and the maximum. However, the influence of these inputs is not really linear, so my design is not very accurate due to the lack of diversity in these input values.

 

Do you know how to create a design with better-distributed points to improve prediction accuracy?

 

Thank you in advance!

 

Xena

1 ACCEPTED SOLUTION

Victor_G
Super User

Re: How to design experiments with well-distributed continuous inputs ?

Hi @Xena,

Ok, if you're dealing with computer experiments, Space-Filling designs are indeed a good way to distribute points homogeneously in your experimental space, and approximate your simulation responses with a useful model, without any assumptions on the equation/model form.

 

By default, if you launch the Space Filling Design platform (menu DOE > Special Purpose > Space Filling Design) with your 4 continuous factors, JMP recommends 40 runs (close to your acceptable number of runs), and all design types are available. Since the numbers of factors and runs are quite low, and since you are interested in predictive performance, a Uniform or Latin Hypercube design may be an appropriate choice:

Victor_G_0-1756133182089.png
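
For readers who want to see what such a design looks like outside JMP, here is a minimal Python sketch of a basic (random) Latin Hypercube for the four factors described in the question. It is only an illustration: JMP's Space Filling platform uses its own algorithms, and the function name `latin_hypercube`, the seed, and the plain random stratification are assumptions of this sketch.

```python
import random

def latin_hypercube(n_runs, bounds, seed=0):
    """Basic random Latin Hypercube (hypothetical helper, not JMP's algorithm).

    Each factor's range is cut into n_runs equal slices and every slice
    receives exactly one point; slices are paired at random across factors.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each slice, expressed on [0, 1).
        unit = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(unit)  # random pairing with the other factors
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

# Factor ranges taken from the question: two factors 0-5, two factors 0-10.
bounds = [(0, 5), (0, 5), (0, 10), (0, 10)]
design = latin_hypercube(40, bounds)

for run in design[:3]:
    print([round(x, 2) for x in run])
```

With 40 runs, each factor takes 40 distinct values spread over its whole range, instead of only the three levels (0, half of the maximum, maximum) mentioned in the question.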

 

Concerning the model used to fit the responses: by default, when you make the data table, it includes a script that launches a Gaussian Process model on your response:

Victor_G_1-1756133266543.png

This is a good default benchmark model, but you could also try other Machine Learning models using the Model Screening platform with an appropriate validation strategy (to avoid overfitting). Random or stratified k-fold cross-validation is often chosen in this type of situation, since each run is used for training in some folds and for validation in another, so no point is permanently excluded from training.
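
To make the k-fold idea concrete, here is a small Python sketch of how 40 runs could be split into 5 random folds. It is not what the Model Screening platform does internally; the helper name `k_fold_indices`, the choice of k = 5, and the seed are assumptions for illustration.

```python
import random

def k_fold_indices(n_runs, k, seed=0):
    """Return k (train, validation) pairs of run indices (hypothetical helper)."""
    indices = list(range(n_runs))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    splits = []
    for i, validation in enumerate(folds):
        # Training set = all runs that are not in the current validation fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(validation)))
    return splits

splits = k_fold_indices(n_runs=40, k=5)
for train, validation in splits:
    print(len(train), "training runs,", len(validation), "validation runs")
```

Every run appears in exactly one validation fold and in the training set of the other four folds.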

Once a model with good performance is found, you can use the models trained on each fold and average their predictions, as mentioned here: cross validation using k-fold fit quality
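
Averaging the fold models can be sketched in a few lines of Python. The five "models" below are hypothetical stand-ins (simple lines with slightly different slopes), not fitted JMP models; in practice each one would be the prediction formula saved from one fold.

```python
def average_prediction(fold_models, x):
    """Average the predictions of the models trained on each fold."""
    predictions = [model(x) for model in fold_models]
    return sum(predictions) / len(predictions)

# Hypothetical stand-ins for five fold models with slightly different slopes.
fold_models = [lambda x, b=b: 1.0 + b * x for b in (1.8, 1.9, 2.0, 2.1, 2.2)]

print(average_prediction(fold_models, 3.0))
```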

If you can afford 50 runs in total, you could launch a DoE with 40 runs and train/validate the best model, then use the Augment Design platform (with a Space-Filling strategy) to add 10 new runs to your dataset. These 10 runs would serve as a test set (final validation) for your predictive model.
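
As a rough picture of what space-filling augmentation does, the Python sketch below adds 10 points to an existing 40-run design with a greedy maximin heuristic: each new point is the random candidate lying farthest from all points already in the design. This is a swapped-in technique for illustration, not the algorithm used by Augment Design. The 40 "existing" runs are random placeholders, and `augment_maximin`, the candidate count, and the seeds are assumptions.

```python
import math
import random

def augment_maximin(existing, bounds, n_new, n_candidates=500, seed=1):
    """Greedily add n_new points that stay far from the current design."""
    rng = random.Random(seed)
    spans = [high - low for low, high in bounds]

    def scaled_distance(p, q):
        # Distance after scaling each factor to [0, 1], so all factors weigh equally.
        return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, spans)))

    design = list(existing)
    added = []
    for _ in range(n_new):
        candidates = [tuple(rng.uniform(low, high) for low, high in bounds)
                      for _ in range(n_candidates)]
        # Keep the candidate whose nearest design point is farthest away.
        best = max(candidates,
                   key=lambda c: min(scaled_distance(c, p) for p in design))
        design.append(best)
        added.append(best)
    return added

bounds = [(0, 5), (0, 5), (0, 10), (0, 10)]
placeholder_rng = random.Random(0)
# Placeholder for the 40 runs of the original design (random, for illustration only).
existing = [tuple(placeholder_rng.uniform(low, high) for low, high in bounds)
            for _ in range(40)]

test_runs = augment_maximin(existing, bounds, n_new=10)
print(len(existing) + len(test_runs), "runs in total")
```

The 10 added runs are kept out of model fitting and used only to check predictions at the end.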

 

Hope the suggestions make sense,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)


8 REPLIES
statman
Super User

Re: How to design experiments with well-distributed continuous inputs ?

I am unfamiliar with the term "facile" as it relates to experimental design. A search indicates it refers to "simple, straightforward designs that are easy to implement with minimal resources." Such designs are useful for exploratory research, pilot studies, or when true randomization is not feasible. However, this simplicity often comes at the cost of internal validity, making it harder to establish a true cause-and-effect relationship.

Your query has me guessing at the objective of your experiment.  Are you interested in understanding causal structure or "picking a winner"?  Is this the first experiment you are running in this investigation or have you already done screening designs?  What is the model you want to investigate?  If, for example, you are suspicious of a non-linear polynomial, but you are still trying to determine the appropriate design space, you might run center points.

It seems you are interested in testing the factors at 3 levels due to your suspicions about non-linear relationships. Typically to estimate quadratic effects, you only need 3 levels and those will be evenly distributed through the factor space.  This simplifies the analysis. This would be a reasonable place for your next iteration.  Then you can augment the space with future experiments.  The "accuracy" of your model (perhaps precision is a better word) is better understood by analysis of residuals (and learning about the effects of noise).
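
The three-level point can be checked numerically with a small Python sketch. Three levels determine exactly one quadratic per factor, so a truly quadratic response is recovered exactly, while any other shape between the levels goes unseen. Both response functions below are hypothetical and chosen only for illustration.

```python
import math

def quadratic_through(points):
    """Return the unique quadratic passing through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def predict(x):
        # Lagrange form of the interpolating quadratic.
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))

    return predict

levels = [0.0, 2.5, 5.0]  # low, middle, high for a factor ranging from 0 to 5

def quadratic_response(x):   # hypothetical response that really is quadratic
    return 1.0 + 2.0 * x - 0.3 * x ** 2

def saturating_response(x):  # hypothetical response that is not quadratic
    return math.exp(-x)

for truth in (quadratic_response, saturating_response):
    model = quadratic_through([(x, truth(x)) for x in levels])
    error = abs(model(1.25) - truth(1.25))
    print(f"{truth.__name__}: error at x = 1.25 is {error:.3f}")
```

The quadratic case shows essentially zero error between the levels. The saturating case shows a clear error, which is the kind of lack of fit that residual analysis or additional levels would reveal.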

"All models are wrong, some are useful" G.E.P. Box
Xena
Level II

Re: How to design experiments with well-distributed continuous inputs ?

Hello Statman, 

 

Also thank you for your quick reply !

 

Actually, in JMP there is a category called “Easy DOE,” where JMP uses natural language to ask what you have and what you are trying to do. Indeed, my goal is to identify a combination that could meet my needs (with some outputs to minimize, others to maximize, or to target a specific objective).

 

I did not fully understand your point regarding factors with levels, because in my case the variables are continuous, and I would like to explore their entire range.

statman
Super User

Re: How to design experiments with well-distributed continuous inputs ?

I do not understand your situation well enough to provide specific advice.  There is indeed a platform called "Easy DOE". IMHO, that platform is intended for those who have little to no experience experimenting.  It is meant to encourage the use of the methodology (and perhaps market the software). However, there are many nuances the platform does not cover.

 

In any case, diagnosing the situation is important to selecting the appropriate design.  Are you doing explanatory or discovery work?  Have you developed hypotheses (e.g., why would a factor have or not have an effect on the responses)?  How did you end up with the 4 factors?  What are your strategies to handle all of the other factors?  Have the measurement systems been studied?...

 

There is no one "right way" to design experiments, but typically you start your experimentation by screening the many possible factors to develop an appropriate design space.  This is most efficiently done by running some fraction of n-dimensional space.  Since the studies are short-term in nature (i.e., you don't have time series to expose inherent variation of factor effects), you must exaggerate effects.  This is accomplished by setting factor levels bold (extremes of reasonableness) and experimenting over a large number of factors.  At the same time, you are trying to create as wide of an inference space as possible, so you similarly exaggerate the effect of noise (factors you are not willing to manage in the future) often with complete or incomplete blocks.  Once you have a reasonable and justifiable design space, then you augment that space to estimate a useful model.  
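
As an illustration of "running some fraction of n-dimensional space" with bold levels, here is a Python sketch of a two-level half-fraction for four factors (8 runs instead of 16). The generator D = ABC and the factor names X1 to X4 are assumptions made for this example; the ranges are those given in the original question.

```python
from itertools import product

# Hypothetical factor names with the ranges from the question.
bounds = {"X1": (0, 5), "X2": (0, 5), "X3": (0, 10), "X4": (0, 10)}

# Full 2^3 factorial in coded units (-1 / +1) for the first three factors;
# the fourth column is generated as the product of the other three (D = ABC).
coded_runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def to_actual(coded, low, high):
    """Map a coded level (-1 or +1) to the low or high end of the range."""
    return low + (coded + 1) / 2 * (high - low)

runs = [
    {name: to_actual(level, *bounds[name]) for name, level in zip(bounds, run)}
    for run in coded_runs
]

for run in runs:
    print(run)
```

Every factor is set only at the extremes of its range, and each column is balanced and orthogonal to the others, which is what makes this small fraction efficient for screening main effects.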

We typically build models following a Taylor series.  That is, we start with first order (accomplished with 2-level factor settings) and augment with higher-order terms, both factorial (interactions) and polynomial (non-linear). The objective isn't to find the most complex model that describes everything, but to find a useful model that assists in predicting future performance. Models with 3rd-order interactions or cubic and higher non-linear terms are extremely difficult to manage and are often not useful.
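
The build-up described above can be written out for four factors with a short Python sketch. The factor names X1 to X4 are placeholders; the point is only to show which terms each step adds and how many parameters a full second-order model needs.

```python
from itertools import combinations

factors = ["X1", "X2", "X3", "X4"]  # placeholder names

first_order = list(factors)                                         # main effects
interactions = [f"{a}*{b}" for a, b in combinations(factors, 2)]    # two-factor interactions
quadratics = [f"{a}^2" for a in factors]                            # pure quadratic terms

# Intercept + first order + interactions + quadratics.
n_terms = 1 + len(first_order) + len(interactions) + len(quadratics)

print("First order: ", first_order)
print("Interactions:", interactions)
print("Quadratics:  ", quadratics)
print("Total terms including intercept:", n_terms)  # 15
```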

Here are some suggested references for introductory experimental design:

Box, George E. P., and Bisgaard, S. (1987) “The Scientific Context of Quality Improvement.”  Quality Progress June

Czitrom, Veronica, (1999) “One-Factor-at-a-Time Versus Designed Experiments”, The American Statistician, May, Vol. 53, No. 2

Box, G.E.P., Patrick Liu (1999) “Statistics as a Catalyst to Learning by Scientific Method Part I – An Example”, Journal of Quality Technology, Vol. 31, No. 1, January

Hahn, Gerald (1977) “Some Things Engineers Should Know About Experimental Design”, Journal of Quality Technology, January, Vol. 9, No. 1

Cochran, William “The Philosophy Underlying the Design of Experiments”, Johns Hopkins University

Montgomery, Douglas, Coleman, D., (1993). “A Systematic Approach to Planning for a Designed Industrial Experiment”, Technometrics, February 1993, Vol. 35, No. 1

"All models are wrong, some are useful" G.E.P. Box
Victor_G
Super User

Re: How to design experiments with well-distributed continuous inputs ?

Hi @Xena,

 

Welcome to the Community!

 

The situation you describe seems to indicate that you might be interested in Space-Filling designs.

Space-Filling designs are model-agnostic experimental designs with an emphasis on predictive performance. They are designed (mostly) for continuous factors and low-noise responses (typically computer experiments and simulations), where highly non-linear and complex response surfaces are expected.

 

If you can provide more information about the objective (predictive modeling, metamodeling, ...?), any factor constraints, the type of responses (measurements? simulations? variability/reproducibility? ...), and the maximum acceptable number of runs, we might help you choose an appropriate Space-Filling design type (if this option seems valid for your topic).
My first guess, based on your information and objective, is that you don't have any factor constraints and are using only 4 continuous factors for predictive purposes (with a possible low/medium sample size of fewer than 100 experiments), so a Uniform or Latin Hypercube design may be an appropriate choice.

 

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Xena
Level II

Re: How to design experiments with well-distributed continuous inputs ?

Hello Victor, thank you for your quick reply!

 

I am not familiar with what space-filling designs are... but from what you mentioned, it does seem like they could be appropriate. I am indeed aiming to make predictions. My results come from computer simulations, which are therefore fully repeatable. What I am looking for is an output that allows me to analyze my data in a way that helps me identify the right combination, sometimes trying to maximize certain outputs while minimizing others.

 

An acceptable number of samples would be around 50 (or a little more if it is worth it), since I do not have an API that would allow me to automatically retrieve the simulation data and feed it into JMP.

 

Thanks for your help

Xena
Level II

Re: How to design experiments with well-distributed continuous inputs ?

Hello Victor, 

 

I will try your solution. 

 

Thank you so much !

 

 

Craige_Hales
Super User

Re: How to design experiments with well-distributed continuous inputs ?

@statman @Victor_G  - I get a little more understanding each time, thanks!

Craige
