ZhouKexin
Level I

Why is validation not included in DOE?

Hi everyone, I'm taking the online course now. In the predictive modeling part, validation is used to evaluate the performance of the model on new data and to avoid choosing an overfit model.

 

Validation requires extra data. However, in the DOE part, this extra data is not included. All of the data collected is used to train the model. Even the center points are used to evaluate quadratic effects in the model.

 

Take a response surface DOE as an example. The goal of this DOE is to find the optimal point, which is usually "new data" (compared to the points in the DOE). Why is a validation set not included in this case? Is it not important?

 

 

 

1 ACCEPTED SOLUTION


Re: Why is validation not included in DOE?

Validating the model is an important step. Use the selected model to find factor levels of interest (e.g., settings predicted to give a desirable response and settings predicted to give an undesirable one), then set up a few runs (5-6) at those settings and compare the average response of the new runs to the predicted mean response. I recommend using a test of equivalence, not one of difference.

 

This data, validated on its own, may be combined with the original experiment, if you want to improve the estimates of the model terms.
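
As a rough illustration of the equivalence test suggested above, here is a minimal Python sketch of a two one-sided tests (TOST) check at the 5% level. It uses the fact that TOST at level 0.05 is the same as checking whether the 90% confidence interval for the mean of the confirmation runs falls entirely inside the equivalence band around the predicted mean. The function name `tost_equivalent`, the example responses, and the equivalence margin are all hypothetical, and the sketch treats the predicted mean as a fixed number, ignoring the uncertainty of the model prediction itself.

```python
from math import sqrt
from statistics import mean, stdev

# One-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only small samples are covered, since confirmation sets are typically tiny.
T_CRIT_95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
             6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def tost_equivalent(runs, predicted_mean, margin):
    """Return (is_equivalent, (ci_low, ci_high)) for a 5%-level TOST.

    Equivalence is declared when the 90% confidence interval for the mean
    of `runs` lies strictly inside predicted_mean +/- margin.
    """
    df = len(runs) - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only handles 2 to 11 confirmation runs")
    center = mean(runs)
    half_width = T_CRIT_95[df] * stdev(runs) / sqrt(len(runs))
    ci_low, ci_high = center - half_width, center + half_width
    inside = (predicted_mean - margin) < ci_low and ci_high < (predicted_mean + margin)
    return inside, (ci_low, ci_high)


# Hypothetical example: six confirmation runs at the predicted optimum.
confirmation_runs = [50.2, 49.8, 50.5, 49.9, 50.1, 50.3]
equivalent, interval = tost_equivalent(confirmation_runs, predicted_mean=50.0, margin=1.0)
print(equivalent, interval)
```

The margin has to come from subject-matter knowledge (how far from the prediction is still practically the same), which is the main reason an equivalence test answers the confirmation question better than a test of difference.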


5 REPLIES


P_Bartell
Level VIII

Re: Why is validation not included in DOE?

To add a bit to @Mark_Bailey's advice, which I agree with: philosophically, DOE as a problem-solving method has efficiency of knowledge gathering as a fundamental principle. One of my DOE instructors from the 1980s defined 'efficiency in DOE' as the desire to obtain the required information for the least expenditure of resources. Nowadays we can say that, with model-driven optimal DOE tactics, the model is our 'required information'. Optimal DOE methods deliver that efficiency by making every single treatment combination necessary. No additional runs are required.

 

But as Mark suggests, and as sound DOE practitioners usually do, once we have the model we conduct additional runs to verify that it is in fact useful for the practical problem at hand. So the DOE process does include validation (some call it confirmation or verification) as a required step. It just happens at a different time, following the initial experimentation.

statman
Super User

Re: Why is validation not included in DOE?

Agreeing with both Mark's and Pete's comments: the results of the experiment, and the model created from the analysis (whatever statistics or practical judgments you use to do this), are a function of the inference space of the experiment. If the experiment does not represent future conditions, then the model, no matter how "statistically significant" in analysis, will not produce useful, repeatable results.

 

“Unfortunately, future experiments (future trials, tomorrow’s production) will be affected by environmental conditions (temperature, materials, people) different from those that affect this experiment…It is only by knowledge of the subject matter, possibly aided by further experiments (italics added) to cover a wider range of conditions, that one may decide, with a risk of being wrong, whether the environmental conditions of the future will be near enough the same as those of today to permit use of results in hand.”

Dr. Deming

"All models are wrong, some are useful" G.E.P. Box
Byron_JMP
Staff

Re: Why is validation not included in DOE?

Maybe it's important to verify what everyone means by "validate". 

Generally, in modeling ad hoc data, validating the model using a holdout data set is a really good idea. The fit statistics of the holdout data are used to assess the quality of the model built from the training data. In modeling ad hoc data, we are trying to build estimates about the underlying population, and working with small data sets undermines the ability to train and evaluate the model. Maximizing the number of observations about the system we are modeling is important for better estimates.

 

The data for a DOE is very carefully planned and collected before the analysis. This design supports the inclusion/use of model terms that are specified a priori in contrast to modeling happenstance data where we may not know which terms can be included in the model until they are evaluated.

 

In the case of a DOE, none of the data is held out for model validation because, by design, all of the data is necessary for estimating the model coefficients. A key goal of the DOE is to generate the least amount of data necessary to model physical phenomena. The term "validation" in the case of a DOE implies that additional runs will be completed after the experiment is complete. These "validation" runs are compared to the predicted values to assess the quality of the DOE model's predictions. Often the number of validation runs is small, maybe only 1-3. These runs provide evidence to support changes to a physical system that will then generate many runs/observations at the same input settings. The model will be used to tune the physical system/process in the future.
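
To make the "all of the data is necessary" point concrete, here is a small Python sketch that simply counts degrees of freedom. It assumes a three-factor Box-Behnken design with three center points and a full quadratic model. Neither choice comes from this thread, and the function names are illustrative only.

```python
from itertools import combinations, product


def quadratic_term_count(k):
    """Terms in a full quadratic model in k factors: intercept, k main
    effects, k squared terms, and k*(k-1)/2 two-factor interactions."""
    return 1 + k + k + k * (k - 1) // 2


def box_behnken_runs(k, n_center):
    """Pairwise Box-Behnken construction in coded units: for each pair of
    factors, a 2x2 factorial at +/-1 with the remaining factors at 0,
    followed by n_center center points. For k = 3 this gives the standard
    12 edge-midpoint runs plus the center points."""
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * k] * n_center)
    return runs


design = box_behnken_runs(k=3, n_center=3)
n_terms = quadratic_term_count(3)

# Degrees of freedom left for estimating error when every run is used.
residual_df_full = len(design) - n_terms

# Hypothetical holdout of one third of the runs, as a validation split would do.
n_holdout = 5
residual_df_after_holdout = (len(design) - n_holdout) - n_terms

print(len(design), n_terms, residual_df_full, residual_df_after_holdout)
```

Under these assumed numbers, holding out a third of the runs leaves exactly as many runs as model terms, so nothing remains for estimating error, and depending on which runs are removed some terms may no longer be estimable at all. That is why confirmation runs are added after the experiment rather than carved out of it.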

 

DOE World uses methods similar to those of Modeling World, but the approach is very different.

 

JMP Systems Engineer, Health and Life Sciences (Pharma)
statman
Super User

Re: Why is validation not included in DOE?

Well put, Byron. I neglected to differentiate the two meanings of validation as you did so well.