Solved: Re: Help for analyzing data DOE Response Surface

Cedrick1 · Oct 4, 2024 11:11 AM

I hope you're doing well.

This is my data from my study. It's on the DOE response surface.

How do you analyze and interpret the results?

What indicators should I take into account?

How to interpret and validate them, and choose the best possible formulations?

I need help

statman · Oct 4, 2024 12:39 PM

Here are my initial thoughts. Realize you have provided limited context for performing analysis, so all I can do is ask questions:

1. What are the response variables? (Are they R1-R13?). How much of a change in those values is of practical importance? If you can find a factor that would change those values, what is the smallest increment of change you would be interested in?

2. Should the response variables correlate? Positively or negatively? Why or why not?

3. Under what "conditions" was the experiment run? How likely are those conditions to be constant in the future? (By conditions I mean inference space. How were the variables not manipulated in your experiment handled during the experiment?)

4. How were your levels selected?

5. What previous experiments or sampling plans have been run prior to this experiment? (Typically, response surface designs are optimization designs. This means, as examples, you already have a good approximation of measurement error, the effect of noise, and the stability of the system)

6. Why a Box-Behnken design? These designs exclude the vertices of the design space. Are you concerned with the extremes of the design space? Why?

7. What are the scientific or engineering mechanisms you are trying to understand?

8. Have you predicted the results a priori? How do the results look compared to what you expected or predicted?

Are there any unusual data points?
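To make question 6 concrete, here is a minimal sketch of a three-factor Box-Behnken design in coded units. The number of center points (3) is an assumption, since the thread does not state it, and the pairwise construction used here only applies to small designs such as the three-factor case. The `label` helper is hypothetical and simply produces treatment names in the "--0" / "000" style.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=3):
    """Return coded runs (-1, 0, +1) of a small Box-Behnken design."""
    runs = []
    # For every pair of factors, run a 2x2 factorial while the others stay at 0.
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated center points (count is an assumption, not from the thread).
    runs.extend([(0,) * n_factors] * n_center)
    return runs

def label(run):
    """Treatment label such as '--0' or '000' (hypothetical helper)."""
    return "".join({-1: "-", 0: "0", 1: "+"}[x] for x in run)

design = box_behnken(3)
labels = [label(r) for r in design]

# The eight corners of the cube: every factor at an extreme simultaneously.
vertices = set(product((-1, 1), repeat=3))

print(labels)
print("vertices present in design:", vertices & set(design))
```

Every non-center run sits on the midpoint of an edge of the cube, so no run ever sets all three factors to an extreme at once. Any prediction at a corner is therefore an extrapolation from the fitted surface.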

"All models are wrong, some are useful" G.E.P. Box

Cedrick1 · Oct 18, 2024 02:11 PM

Thank you for getting back to me. Here is my reply.
Q1)
A change of 5 in R1 is of considerable consequence and should be kept to a minimum in the vicinity of 20.
A change of 5 in R2 is significant and should be minimised to a value close to 30.
A change of 0.02 in R3 is significant and should be minimised to a value close to 0.15.
A change of 0.5 in R4 is significant and should be close to 2.
A change of 1E-10 in R5 is significant and must be maximised close to 5E-10.
A change of 5 in R6 is significant and must be maximised to a value close to 70.
A change of one unit in R7 is significant and should be minimised.
A change of 3 in R8 is significant and should be maximised to a value of approximately 10.
A change of 2 in R9 is significant and should be minimised.
A change of 5 in R10 is significant and should be maximised near 60.
A change of 5 in R11 is significant and should be minimised near 10.
A change of 1 in R12 is significant and should be maximised near 5.
A change of 5 in R13 is significant and should be maximised near 50.

Q2)
There is only a positive correlation between R3 and R5.

Q3)
It is crucial to note that the experiment was conducted under strictly controlled conditions, including a constant temperature, humidity, and lighting environment. These conditions were maintained with the utmost precision to ensure the reliability of the results.

Q4)
The selection of these levels could have been informed by an analysis of existing literature and a preliminary assessment. The experiment involved the testing of three levels of each factor, designated as low, medium, and high. These levels were selected on the basis of their practical and scientific relevance, as well as their suitability for the purposes of this study.

Q5)
A pilot study (pre-test) was conducted to identify the most significant factors.

Q6)
The presence of extreme levels of factors can result in product degradation and elevated costs. Therefore, utilising a Box-Behnken design can assist in maintaining a safer and more practical operational environment.

Q7)
The objective is to comprehend the manner in which fluctuations in variables impact responses, and to construct a model that elucidates the interrelationship between variables and responses.

Q8)

The anticipated outcomes are as follows:
It can be reasonably deduced that X1 has a significant effect on responses R2, R5, R12 and R13, which can be described as a positive correlation.
Similarly, it can be reasonably deduced that X2 has a significant effect on responses R2, R5, R12 and R13, which can be described as a positive correlation.
I don't know if there are unusual data points.

statman · Oct 19, 2024 7:23 AM

I looked at the data, but to really analyze it, I would need to sit with the SME and discuss the findings. I added a couple of analysis scripts to your data table (attached here). It appears that all of your responses varied enough practically except for R8. The results are mixed: for some responses the model is insignificant, while other responses are interesting. There also appear to be some unusual data points (e.g., treatment --0 for R2, treatment 000 for R7, R9 and R10). You might also want to look at residuals.

Your answer to Q3 is interesting and dare I say, misguided. The purpose of developing a model for prediction is to make sure the conditions of the future are represented in the experiment. If those conditions you held constant change in the future, none of your models may be useful. Holding factors constant during the experiment does the exact opposite of "ensuring the reliability of your results".

"The exact standardization of experimental conditions, which is often thoughtlessly advocated as a panacea, always carries with it the real disadvantage that a highly standardized experiment supplies direct information only in respect to the narrow range of conditions achieved by the standardization. Standardization, therefore, weakens rather than strengthens our ground for inferring a like result, when, as is invariably the case in practice, these conditions are somewhat varied."

R. A. Fisher (1935), The Design of Experiments (pp. 99-100)

“Unfortunately, future experiments (future trials, tomorrow’s production) will be affected by environmental conditions (temperature, materials, people) different from those that affect this experiment…It is only by knowledge of the subject matter, possibly aided by further experiments to cover a wider range of conditions, that one may decide, with a risk of being wrong, whether the environmental conditions of the future will be near enough the same as those of today to permit use of results in hand.”

Dr. Deming

How was the "pre-test" conducted?

You must be very careful in assessing significance. The estimate for the MSE is based on replicates at the center point (000) treatment (and some higher order terms left out of the model). It appears the center point is unusual for some responses. If this treatment was unusual, all bets are off. Typically when doing optimization type designs, I wouldn't be as concerned about statistical significance, but I do not have enough insight into your iterations.
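As a rough illustration of why an unusual center point matters, the sketch below computes only the pure-error part of the MSE from center-point replicates (it leaves out the contribution from omitted higher order terms). The replicate values are made up for illustration and are not taken from the attached data table.

```python
from statistics import variance

# Hypothetical center-point (000) replicates for one response.
center_reps = [20.1, 19.6, 20.6]

# Pure-error variance: sample variance of the replicates, n - 1 degrees of freedom.
pure_error_var = variance(center_reps)
df_pure_error = len(center_reps) - 1

# The same replicates with one unusual run (again, invented numbers).
center_reps_odd = [20.1, 19.6, 26.0]
inflated_var = variance(center_reps_odd)

print("pure-error variance:", round(pure_error_var, 3), "on", df_pure_error, "df")
print("with one unusual replicate:", round(inflated_var, 3))
```

Every F-ratio and p-value in the report divides by this noise estimate. With only a few degrees of freedom, a single odd replicate can change the denominator by an order of magnitude and make real effects look insignificant.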

Cedrick1 · Oct 21, 2024 10:43 AM

Thank you very much for your answers, your contributions and your willingness to help me with this analysis.
I read them very carefully.
I would also like to ask you about the methodology or the steps to follow in order to analyse and interpret each response, the impact of the factors and the model as a whole with JMP.

Also, could you please clarify the following points for me:
1- "the model is insignificant"
2- "some answers are interesting"
3- "Presence of unusual data points"
4- "examine the residuals"

What does this involve? And how important is it?

5-"assessing significance"

What does this involve? How important is it? And what is its real implication for my experiment?

6-"Information about your iterations"

How can I get this information and share it with you so that you have more details?

7-The pre-test took place in the laboratory with several factors. The aim was to see the impact of these factors on just two responses, R1 and R12. There was no experimental design at this level. These were combinations that we chose at random. We then selected these three factors for this experimental design.

8-Should I repeat all the analyses or completely redesign the plan?

I must say that I am available to discuss the results.

Thank you in advance

With kind regards,

statman · Oct 21, 2024 11:36 AM

Cedrick, I don't have any idea how familiar you are with experimental design. It is not a casual discussion. There are many elements to designing and analyzing experiments, but I would say, if you have designed the experiment well, analysis is quite straightforward. By the way, in some instances directed sampling is more efficient (e.g., components of variation studies) and should be done prior to experimentation.

"If your result needs a statistician, then you should design a better experiment."

Baron E. Rutherford

I will briefly address your questions in general, but suggest you find an avenue to develop your understanding of the methodology (e.g., self-study, take some classes). I typically teach the methodology over 6 months of intensive training, but it takes years of practical application to fully unleash the power of experimentation.

The first thing to remember is that learning is iterative. This is true of your experimental plans as well. The first experiment is intended to design a better experiment.

All experiments should begin with design. Is this an investigation that is explanatory or are you developing a predictive model? What questions are you trying to answer? Where are you in the knowledge continuum? What knowledge are you trying to gain? What hypotheses do you want insight into? How will those hypotheses be represented by factors? At what levels can/should the factors be set (e.g., if you are low in the knowledge continuum, be bold but reasonable)? How will noise be handled (i.e., how will you handle factors that you are not willing to control in the future)? What are the appropriate response variables? Is the opportunity to learn about central tendency, variation or both (if variation, do you have a response variable in the form of variation)? Are the measurement systems adequate? etc.

I recommend designing multiple experiments. Compare and contrast them for potential knowledge gained (e.g., what effects can be estimated, what effects will be confounded, what effects are not in the study) against the resources required. Predict all possible outcomes of each plan and weigh this against the resources. Then pick one and run it. Be prepared to iterate.

Analysis with multiple responses may start with multivariate methods. The purpose is twofold:

1. Assess correlation between the multiple responses (responses that correlate strongly will have similar models)

2. Look for multivariate outliers (e.g., Mahalanobis)

Looking for outliers in DOE data sets is paramount. This is because you have very small data sets and therefore singular data points can have an influential impact on the analysis.
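The two multivariate checks above can be sketched outside JMP as follows. This is only an illustration under stated assumptions: the data are simulated (15 runs, 3 responses, with one point planted to break the correlation pattern), and NumPy is used in place of JMP's Multivariate platform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 15 runs, 3 responses; the first two are built to correlate.
base = rng.normal(size=15)
Y = np.column_stack([
    base + 0.2 * rng.normal(size=15),
    2.0 * base + 0.2 * rng.normal(size=15),
    rng.normal(size=15),
])
Y[7] = [2.0, -4.0, 0.0]  # planted point that goes against the correlation

# 1. Correlation between responses (columns are variables).
corr = np.corrcoef(Y, rowvar=False)

# 2. Mahalanobis distance of each run from the centroid of the responses.
center = Y.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))
diff = Y - center
dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

suspect = int(np.argmax(dist))
print(np.round(corr, 2))
print("largest Mahalanobis distance at run index", suspect)
```

Note that the planted point is not extreme on any single response; it only stands out when the responses are considered jointly, which is what the Mahalanobis distance captures. With as many responses as in this study relative to the number of runs, the covariance matrix may be close to singular, so in practice the check is usually run on a subset of responses.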

To analyze each response, I always follow a simple sequence: Practical, graphical, quantitative. In that order.

First, did the response variable vary enough over the design space to support further analysis (i.e., is there a practically significant change in the response over the design space)? How did the responses compare to your predicted values (this assumes you predicted the results a priori)? Does it make sense? Are there obvious patterns (I use ANOG) or unusual data points? How does the data relate to your hypotheses?
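A minimal sketch of the practical check: compare the observed range of each response with the smallest change that matters. The thresholds for R1 and R8 come from the Q1 answers earlier in the thread; the observed values are invented for illustration, and "range greater than the threshold" is just one simple criterion, not a rule from JMP.

```python
# Smallest change of practical importance, from the Q1 answers in this thread.
practical_delta = {"R1": 5, "R8": 3}

# Hypothetical observed values over the design space (not the poster's data).
observed = {
    "R1": [18, 24, 31, 22, 27, 20, 35, 26],
    "R8": [9.5, 10.2, 9.8, 10.4, 9.9, 10.1, 9.7, 10.0],
}

def varied_enough(values, delta):
    """True if the observed range exceeds the practically important change."""
    return (max(values) - min(values)) > delta

verdict = {name: varied_enough(vals, practical_delta[name])
           for name, vals in observed.items()}
print(verdict)
```

A response that fails this check gives the model nothing of practical size to explain, so fitting and interpreting a model for it is unlikely to be worth the effort.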

Then use graphical analysis. Plot the data (for each response). Plot in ANOG order, plot in run order. Normal plots, Pareto plots, etc.

"Results of a well planned experiment are often evident using simple graphical analysis. However the world’s best statistical analysis cannot rescue a poorly planned experimental program."

Gerry Hahn

For quantitative analysis, I suggest using a subtractive approach to model building. That is, I recommend starting with a saturated model and removing insignificant terms from the model. When designing experiments, you should recognize your design is a function of the model you are hypothesizing. As you simplify/reduce the model, use statistics to help: the delta between R-square and R-square adjusted, RMSE, p-values, CV, etc., along with residuals analysis.
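The subtractive approach can be sketched as a simple backward elimination. Several things here are assumptions for illustration only: the response is simulated, the full quadratic model stands in for the saturated model, a t-ratio cutoff of 2 replaces proper p-values, and model hierarchy is ignored (a main effect may be dropped while its square stays). JMP's own stepwise tools handle these points more carefully.

```python
import numpy as np
from itertools import combinations, product

def quadratic_terms(X):
    """Full second-order model matrix for three coded factors."""
    x1, x2, x3 = X.T
    names = ["1", "x1", "x2", "x3", "x1*x2", "x1*x3", "x2*x3",
             "x1^2", "x2^2", "x3^2"]
    cols = [np.ones(len(X)), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3,
            x1 ** 2, x2 ** 2, x3 ** 2]
    return names, np.column_stack(cols)

def fit(M, y):
    """Least-squares fit; returns coefficients, t-ratios and summary stats."""
    n, p = M.shape
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    mse = sse / (n - p)
    se = np.sqrt(mse * np.diag(np.linalg.inv(M.T @ M)))
    stats = {"R2": 1 - sse / sst,
             "R2adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
             "RMSE": float(np.sqrt(mse))}
    return beta, beta / se, stats

def backward_eliminate(names, M, y, t_cut=2.0):
    """Drop the weakest non-intercept term until all |t| >= t_cut."""
    keep = list(range(M.shape[1]))
    while True:
        beta, t, stats = fit(M[:, keep], y)
        cand = [(abs(t[k]), k) for k in range(1, len(keep))]
        if not cand or min(cand)[0] >= t_cut:
            return [names[i] for i in keep], beta, stats
        keep.pop(min(cand)[1])

# Coded 3-factor Box-Behnken runs with 3 center points (an assumed layout).
runs = []
for i, j in combinations(range(3), 2):
    for a, b in product((-1.0, 1.0), repeat=2):
        r = [0.0, 0.0, 0.0]
        r[i], r[j] = a, b
        runs.append(r)
runs += [[0.0, 0.0, 0.0]] * 3
X = np.array(runs)

# Simulated response: only x1 and x2^2 are truly active.
rng = np.random.default_rng(1)
y = 20 + 4 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=len(X))

names, M = quadratic_terms(X)
_, _, full_stats = fit(M, y)
kept, beta, reduced_stats = backward_eliminate(names, M, y)
print("kept terms:", kept)
print("full   :", {k: round(v, 3) for k, v in full_stats.items()})
print("reduced:", {k: round(v, 3) for k, v in reduced_stats.items()})
```

Comparing the two summaries shows the idea behind the R-square versus R-square adjusted delta: when the removed terms were only fitting noise, the gap between the two statistics shrinks in the reduced model and RMSE changes little. The residuals of the reduced model should still be plotted before trusting it.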

"A good model is an approximation, preferably easy to use, that captures the essential features of the studied phenomenon and produces procedures that are robust to likely deviations from assumptions."

G.E.P. Box

Some important points to keep in mind:

Statistical significance is a conditional statement! For experimental design, you are comparing the variation due to factors being manipulated to the noise that changes during the experiment under the inference of the conditions that are not changing. If any of these change, so may statistical significance.

Extrapolation of experimental results is a managerial or engineering decision, not a statistical one.
