Discussions

YanivD · Jun 8, 2023 2:04 PM

Hi,

I ran a definitive screening design (DSD) experiment as part of my research.

I would now like your help understanding how to analyze the data.

Could you please advise where I can find a short explanation of this, and how to read the "actual by predicted plot"?

Thanks

Mark_Bailey · Jun 4, 2021 7:44 AM

The actual by predicted plot is a scatter plot. The predicted response (Y-hat) is used for the abscissa. The observed response (Y) is used for the ordinate.

This example from Fit Least Squares is also a 'leverage plot.' It adds a horizontal blue line for the null hypothesis (that the response is independent of the factors) and a slanted red line for the alternative hypothesis (that the response depends on the factors). It is a 'whole model' test: it compares a model with only an intercept (the null hypothesis) to the full model with all the factors added to the intercept (the alternative hypothesis). A red 95% confidence region is plotted around the slanted red line. If the horizontal blue line is contained within the red region, then the whole model test is not significant at the alpha = 0.05 level. If the blue line is not contained within the red region, then the whole model test is significant at the same level. This visual evaluation is equivalent to applying a significance level of 0.05 to the F-test presented in the Analysis of Variance table.
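For readers who want to see the arithmetic behind the whole model test, here is a minimal Python sketch. It is not JMP output: the data are made up, there is a single coded factor, and the function names are hypothetical. It computes the (predicted, actual) pairs that the plot displays, the mean response (the horizontal line), and the F ratio that compares the intercept-only model with the fitted model.

```python
# Minimal sketch of the whole-model comparison, using made-up data.
# Assumption: one factor coded as -1 / 0 / +1; names are illustrative only.

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def whole_model_f(x, y):
    """Return predicted values, mean response, and the whole-model F ratio."""
    n = len(y)
    b0, b1 = fit_line(x, y)
    predicted = [b0 + b1 * xi for xi in x]
    y_bar = sum(y) / n
    # Error sum of squares of the null (intercept-only) model.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    # Error sum of squares of the full model.
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_model = ss_total - ss_error
    df_model, df_error = 1, n - 2
    f_ratio = (ss_model / df_model) / (ss_error / df_error)
    return predicted, y_bar, f_ratio

x = [-1, -1, -1, 0, 0, 1, 1, 1]
y = [4.1, 3.8, 4.3, 6.0, 6.2, 8.1, 7.7, 8.2]
predicted, y_bar, f_ratio = whole_model_f(x, y)

# Each (predicted, actual) pair is one point on the actual by predicted plot.
for p, a in zip(predicted, y):
    print(f"predicted={p:.3f}  actual={a:.3f}")
print(f"mean response (horizontal line) = {y_bar:.3f}")
print(f"whole-model F ratio = {f_ratio:.1f}")
```

A large F ratio corresponds to the case where the horizontal line falls outside the confidence region. Converting the F ratio to a p-value needs the F distribution, which this sketch leaves out.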

The plot can also be used to visually evaluate the possibility of 'lack of fit.' An unbiased prediction should produce predicted values that agree with the observed values on average. The red line should go through the middle of the data points. If the model is biased, then the data points will deviate from the line. For example, if the response to changing the factors is non-linear but the model includes only terms for linear effects, then the model will be biased. This example would exhibit curvature in the actual by predicted plot. Your example exhibits such curvature, so your model is biased.
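The same bias can be checked numerically. The sketch below is only an illustration with made-up data (not the data from this thread) and hypothetical function names: it fits a straight line to a response that actually curves, then averages the residuals at each factor level. Residuals that are mostly positive at the center and mostly negative at the ends are the numerical counterpart of the curvature seen in the plot.

```python
# Minimal sketch: residuals from a linear-only fit reveal curvature.
# Assumption: made-up data with temperature coded as -1 / 0 / +1.
from collections import defaultdict

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def mean_residual_by_level(x, y):
    """Average residual (actual minus predicted) at each factor level."""
    b0, b1 = fit_line(x, y)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi - (b0 + b1 * xi))
    return {level: sum(r) / len(r) for level, r in sorted(groups.items())}

temperature = [-1, -1, -1, 0, 0, 1, 1, 1]
response = [5.0, 5.2, 4.9, 8.1, 7.9, 7.0, 7.2, 6.9]

residual_means = mean_residual_by_level(temperature, response)
for level, mean_res in residual_means.items():
    print(f"temperature={level:+d}  mean residual={mean_res:+.3f}")
```

An unbiased model would give mean residuals near zero at every level.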

You observed the response many times at the low and at the high temperatures. How many observations do you have at the center temperature?

You might try to add a Temperature squared term to the model. JMP represents powers by multiplication. So temperature squared is added as temperature*temperature.
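Outside JMP, the same idea amounts to adding a column equal to temperature times temperature to the model matrix. The following Python sketch shows this under stated assumptions: the data are made up, temperature is coded as -1 / 0 / +1, and the helper names (`solve`, `least_squares`, `sse`) are hypothetical. It compares the error sum of squares with and without the squared term.

```python
# Minimal sketch: adding a temperature*temperature column to a linear model.
# Assumption: made-up data; this is not how JMP fits the model internally.

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def least_squares(columns, y):
    """Fit y on the model columns via the normal equations X'X b = X'y."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * yi for u, yi in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

def sse(columns, coef, y):
    """Error sum of squares for the fitted coefficients."""
    pred = [sum(c * col[i] for c, col in zip(coef, columns))
            for i in range(len(y))]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))

temperature = [-1, -1, -1, 0, 0, 1, 1, 1]
response = [5.0, 5.2, 4.9, 8.1, 7.9, 7.0, 7.2, 6.9]

intercept = [1.0] * len(temperature)
temperature_sq = [t * t for t in temperature]  # temperature*temperature

linear_cols = [intercept, temperature]
quad_cols = [intercept, temperature, temperature_sq]

linear_coef = least_squares(linear_cols, response)
quad_coef = least_squares(quad_cols, response)
sse_linear = sse(linear_cols, linear_coef, response)
sse_quad = sse(quad_cols, quad_coef, response)

print("linear coefficients:   ", [round(c, 4) for c in linear_coef])
print("quadratic coefficients:", [round(c, 4) for c in quad_coef])
print(f"SSE linear = {sse_linear:.4f}, SSE with squared term = {sse_quad:.4f}")
```

The squared term can only be estimated if there are runs at a third (center) level, which is why the number of center observations matters.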

How many factors in total were included in the definitive screening design that you used?

YanivD · Jun 5, 2021 07:23 AM

First of all, I would like to thank you for your reply and detailed explanation. I am attaching the report from fitting the DSD. If possible, could you please advise how to analyze the results? This is my first time using a DSD, and I would appreciate your support and assistance with the analysis.

thanks a lot

P_Bartell · Jun 4, 2021 12:06 PM

@YanivD: There are many different methods one can use to analyze data and results from designed experiments. I suggest you enroll in the SAS "Statistical Thinking for Industrial Problem Solving" course and focus your attention on the DOE- and modeling-oriented modules. The entry portal to the course can be found here:

Statistical Thinking for Industrial Problem Solving

statman · Jun 4, 2021 03:58 PM

First, welcome to the community. You did not describe the experiment (factors and levels), only that you ran a DSD. This is not enough information to assist you in your analysis. We need to know how the data were acquired, what models make sense, and what models you have already tried. As Mark indicates, the actual by predicted plot and the leverage plots are good graphical analysis tools that help you understand the model. These, along with other residual plots, help determine whether the model assumptions are being met and, more importantly, whether the model will be useful. It does appear that the center point of temperature is "outside the expected region" of a linear model. It also appears that your variance is non-constant across temperature levels. Both are indicators that your model is not particularly "good".
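One quick way to look at non-constant variance is to compare the spread of the response at each temperature level. The sketch below is only an illustration: the data are made up, temperature is coded as -1 / +1, and `variance_by_level` is a hypothetical helper name.

```python
# Minimal sketch: sample variance of the response at each temperature level.
# Assumption: made-up replicated data; levels with one run are skipped.
from collections import defaultdict
from statistics import variance

def variance_by_level(levels, response):
    """Group the response by factor level and return each sample variance."""
    groups = defaultdict(list)
    for level, value in zip(levels, response):
        groups[level].append(value)
    return {level: variance(values)
            for level, values in sorted(groups.items())
            if len(values) > 1}

temperature = [-1, -1, -1, -1, 1, 1, 1, 1]
response = [5.0, 5.1, 4.9, 5.0, 7.0, 8.5, 6.0, 9.1]

spread = variance_by_level(temperature, response)
for level, var in spread.items():
    print(f"temperature={level:+d}  variance={var:.4f}")
```

A large difference between levels, as in this made-up example, suggests the constant-variance assumption deserves a closer look in the residual plots.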

As Pete suggests, there are many ways to analyze data sets from an experiment. I would first suggest looking at the data and determining whether the response varies enough to be of practical value. Identify any abnormalities or unusual data points (there are multiple techniques to do this). Then look for patterns in the response variable. Pay particular attention to cases where patterns in the response variable match patterns in the factors or interactions. The steps should always be Practical > Graphical > Quantitative.

When building models, there are also multiple strategies and multiple statistics to guide them (e.g., the difference between RSquare and RSquare Adj, RMSE, p-values). For example, subtractive model building starts with a saturated model and then removes terms that appear insignificant. Additive model building, by contrast, starts with one factor and then adds terms to the model (e.g., stepwise regression). Realize that the model statistics are conditional. The statistics for any term in the model depend on the other terms that are in the model and on the conditions under which the experiment was run (the inference space). Should any of those conditions change, so may the statistics.
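As a rough illustration of the statistics mentioned above, the Python sketch below computes RSquare, RSquare Adj, and RMSE for two candidate models of the same made-up data. Everything here is an assumption for illustration: the data, the hard-coded linear coefficients (which are the least squares values for these particular numbers), and the helper name `fit_summary`. With three coded temperature levels, a model with a squared term predicts the mean response at each level, so those means are used as its predictions.

```python
# Minimal sketch: RSquare, RSquare Adj, and RMSE for two candidate models.
# Assumption: made-up data; n_params counts coefficients incl. the intercept.
from math import sqrt

def fit_summary(actual, predicted, n_params):
    """Return RSquare, RSquare Adj, and RMSE for one fitted model."""
    n = len(actual)
    mean = sum(actual) / n
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mse = ss_error / (n - n_params)
    return {
        "rsquare": 1 - ss_error / ss_total,
        "rsquare_adj": 1 - mse / (ss_total / (n - 1)),
        "rmse": sqrt(mse),
    }

temperature = [-1, -1, -1, 0, 0, 1, 1, 1]
response = [5.0, 5.2, 4.9, 8.1, 7.9, 7.0, 7.2, 6.9]

# Linear-only model: least squares line for these made-up data.
linear_pred = [6.525 + 1.0 * t for t in temperature]

# Model with a squared term: predictions equal the mean at each level.
level_means = {}
for level in set(temperature):
    values = [r for t, r in zip(temperature, response) if t == level]
    level_means[level] = sum(values) / len(values)
quad_pred = [level_means[t] for t in temperature]

linear = fit_summary(response, linear_pred, n_params=2)
quadratic = fit_summary(response, quad_pred, n_params=3)

for name, stats in (("linear", linear), ("with squared term", quadratic)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Note how the statistics for the same data change with the terms in the model, which is the sense in which they are conditional.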

"All models are wrong, some are useful" G.E.P. Box

YanivD · Jun 5, 2021 07:25 AM

Thanks a lot for the welcome and the assistance. I really appreciate it.
