Re: Model Selection-stepwise

_Luci_ · Jun 8, 2023 5:41 PM

Hello,

I have screened a series of biological parameters for their effect on a biological response using a definitive screening design (DSD). I am running forward stepwise regression using AICc. Looking at the p-values, only one or at most two of my parameters are significant below 0.05, but at a looser level (around 0.1 or 0.2) several others appear significant, and if I remove them, the R-squared value decreases. For some models, mainly the last ones the software shows after fitting, no Cp or R-squared values are reported. So my questions are:
1- Is it correct to exclude the models that have no Cp and R-squared values?
2- The literature lacks information about the relationships between these parameters and my response, but there might be some that I would like to understand with the model. I have checked for multicollinearity, and the VIF values are around 1 to 2 (not high), but I am not clear how to interpret my results. Should I restrict my model to terms with p-values below 0.05, or also keep terms above that?

As a beginner, I'd appreciate any help!
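For readers less familiar with the procedure described above, here is a minimal sketch of forward selection by AICc in Python with NumPy. It assumes ordinary least squares with Gaussian errors and counts the error variance as a parameter (k = number of coefficients + 1); JMP's exact bookkeeping may differ. The names `aicc` and `forward_aicc` are hypothetical. Unlike R-squared, which never decreases when a term is added, AICc charges a penalty for every extra term, and its correction term 2k(k+1)/(n - k - 1) becomes undefined once the model gets too close to the number of runs; this sketch simply scores such models as infinitely bad.

```python
import numpy as np

def aicc(y, X):
    """AICc of an OLS fit of y on X (X must already contain the intercept column).

    Assumption: Gaussian errors, k = regression coefficients + error variance.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    k = p + 1
    if n - k - 1 <= 0 or sse <= 0:
        # Correction term undefined (or perfect fit): treat the model as unusable.
        return float("inf")
    loglik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)  # maximized log-likelihood
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_aicc(y, candidates):
    """Greedy forward selection: add the term that lowers AICc most; stop when none does.

    candidates: dict mapping a term name to its 1-D model column.
    """
    n = len(y)
    X = np.ones((n, 1))  # start from the intercept-only model
    best = aicc(y, X)
    remaining = dict(candidates)
    selected = []
    while remaining:
        scores = {name: aicc(y, np.column_stack([X, col])) for name, col in remaining.items()}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no remaining term improves AICc
        best = scores[name]
        X = np.column_stack([X, remaining.pop(name)])
        selected.append(name)
    return selected
```

A selection rule like this answers "which terms are worth their cost in degrees of freedom", which is a different question from "which terms pass p < 0.05".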

Mark_Bailey · Oct 28, 2021 09:27 AM

How many factors do you have?

Did you try selecting a model with DOE > Definitive Screening > Fit Definitive Screening, or by running the Fit Definitive Screening table script?

What is the estimated RMSE for your response with the best model so far? How does that value compare to the expected change in the response from a real effect?

_Luci_ · Oct 29, 2021 05:40 AM

Hello @Mark_Bailey. Thanks for replying. I have 5 factors, and I used 'Fit Definitive Screening' and looked at the profiler and residual plots. But, regardless of those results, I also ran this stepwise modelling on a full factorial model because I thought it would give me a better understanding. The RMSEs range from 4 to 6. My maximum desired value was initially set at 40.

Sorry, I did not understand the question "How does that value compare to the expected change in the response from a real effect?". But in general, compared with a control run that did not include any of these factors, all 17 of my runs gave lower values. Unfortunately, because all the materials were expensive, I could not run a scoping design beforehand.

Mark_Bailey · Oct 29, 2021 10:23 AM

The Fit Definitive Screening method is based on the unique fold-over structure of the treatments. It also relies on the key principles of screening to work well. It is possible to test for all first-order, second-order, and two-factor interaction effects, but not at the same time (key principle: sparsity of effects). The DSD is too small to fit a model with terms for all these effects at once. It uses an ingenious approach to tease out the active effects before exhausting the data (degrees of freedom).
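To illustrate what the fold-over structure buys, here is a minimal NumPy sketch of one known way to build a DSD: take a conference matrix C (here the Paley construction of order 6), stack C, its mirror image -C, and one center run, and keep as many columns as there are factors. This is an assumption-laden illustration, not necessarily how JMP generates its designs (JMP may choose columns, run order, and extra runs differently), and the function names are hypothetical. Because every run has a mirror-image partner, any main-effect column is orthogonal to every squared term and every two-factor interaction, which is what lets the main effects be separated from the second-order effects.

```python
import numpy as np

def jacobsthal(q):
    """Jacobsthal matrix for a prime q: Q[a, b] = quadratic character of (a - b) mod q."""
    residues = {(x * x) % q for x in range(1, q)}
    chi = lambda v: 0 if v % q == 0 else (1 if v % q in residues else -1)
    return np.array([[chi(a - b) for b in range(q)] for a in range(q)])

def conference_matrix(q):
    """Paley conference matrix of order q + 1 (assumes q is prime and q % 4 == 1)."""
    ones = np.ones(q, dtype=int)
    top = np.concatenate(([0], ones))
    bottom = np.column_stack((ones, jacobsthal(q)))
    return np.vstack((top, bottom))

def dsd_from_conference(C, n_factors):
    """Fold-over construction: rows of C, their mirror images -C, plus one center run."""
    C = C[:, :n_factors]
    return np.vstack((C, -C, np.zeros((1, n_factors), dtype=int)))

D = dsd_from_conference(conference_matrix(5), n_factors=5)  # 13 runs x 5 factors

# Main effects are mutually orthogonal ...
print(D.T @ D)
# ... and orthogonal to every x_j * x_k term (squares when j == k, interactions otherwise).
third_moments = np.einsum("ri,rj,rk->ijk", D, D, D)
print(np.abs(third_moments).max())
```

With 5 factors this base design has 13 runs, while the full quadratic model (intercept, 5 main effects, 5 squares, 10 interactions) has 21 terms, which is why the terms cannot all be fitted at once.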

The full factorial model does not include quadratic (second-order) terms for non-linear effects. Could non-linear effects appear in your response? Did you get a lack-of-fit test to check for bias (missing effects)?
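A lack-of-fit test needs replicated factor settings (for example, repeated center runs). It splits the residual sum of squares into pure error, the scatter of replicates around their own mean, and lack of fit, the part the model's shape cannot explain. Below is a minimal NumPy sketch assuming the model matrix has full column rank and at least one setting is replicated; the function name `lack_of_fit_f` is hypothetical, and the p-value step is left out because it needs an F-distribution CDF (for example from SciPy). The toy data are synthetic: a straight-line model fitted to a curved response, which mimics a model with no quadratic terms missing a non-linear effect.

```python
import numpy as np
from collections import defaultdict

def lack_of_fit_f(X, settings, y):
    """Return (F, df_lof, df_pe) for the lack-of-fit test of an OLS model.

    X        : n x p model matrix including the intercept column (full column rank assumed)
    settings : n x m factor settings, used only to find replicated runs
    y        : n responses
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))

    # Pure error: variation of replicates around their own group mean.
    groups = defaultdict(list)
    for row, yi in zip(map(tuple, np.asarray(settings)), y):
        groups[row].append(yi)
    ss_pe = sum(float(np.sum((np.array(v) - np.mean(v)) ** 2)) for v in groups.values())

    df_pe = n - len(groups)   # requires replicated settings
    df_lof = len(groups) - p  # requires more distinct settings than model terms
    if df_pe <= 0 or df_lof <= 0:
        raise ValueError("lack-of-fit test is not available for this design and model")
    ss_lof = sse - ss_pe
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe

# Synthetic illustration: three levels, three replicates each, true response is curved.
rng = np.random.default_rng(1)
x = np.repeat([-1.0, 0.0, 1.0], 3)
y = 4 * x**2 + rng.normal(0, 0.1, x.size)
X_linear = np.column_stack((np.ones_like(x), x))  # intercept + linear term only
f_stat, df_lof, df_pe = lack_of_fit_f(X_linear, x.reshape(-1, 1), y)
print(f_stat, df_lof, df_pe)
```

A large F here flags that the straight line is biased, even though nothing in its own coefficient table would say so.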

The question that confused you is about the relative size of the real effects you hope to find versus the random variation in the response. Your estimated RMSE is 4-6, so you are likely to detect only real effects that are at least 2-3 times as large. Did you include extra runs? The recommended number of extra runs dramatically improves the statistical power of the design. Is a real change of 12-18 in the response possible? Would it be meaningful?
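To make the signal-to-noise question concrete, the sketch below gives a rough, normal-approximation power for detecting one main effect of a given size. Assumptions (not from the thread): factors are coded -1/+1, "effect" means the change in the mean response from the low to the high setting (so the coefficient is effect/2), the design is orthogonal so the coefficient's standard error is RMSE/sqrt(sum of x^2), and the t distribution is replaced by the normal. That last simplification overstates power when there are few error degrees of freedom, as is typical for a DSD. The function `approx_power` and the value `sum_sq_x=10` are hypothetical.

```python
from statistics import NormalDist

def approx_power(effect, rmse, sum_sq_x, alpha=0.05):
    """Normal-approximation power of a two-sided test for one main effect.

    effect   : change in the mean response from the -1 to the +1 setting (assumed definition)
    rmse     : estimated standard deviation of the random error
    sum_sq_x : sum of squared coded settings in that factor's column
    """
    std_normal = NormalDist()
    beta = effect / 2.0            # coefficient for a -1/+1 coded factor
    se = rmse / sum_sq_x ** 0.5    # standard error of beta, orthogonal design assumed
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = beta / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical comparison: RMSE of 6, a factor set to +/-1 in 10 runs.
for effect in (6, 12, 18):
    print(effect, round(approx_power(effect, rmse=6, sum_sq_x=10), 2))
```

Power rises quickly between one and three times the RMSE, which is why the ratio of a meaningful effect to the RMSE matters more than any single p-value cutoff.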

P_Bartell · Oct 28, 2021 09:56 AM

To add a bit to @Mark_Bailey's input, remember you ran a screening design. These types of designs are usually part of a larger strategic pathway to learning or making a decision. One of the primary goals of a screening design is to identify active or interesting effects that are worthy of further study. Emphasis here on 'further study'. I try not to get too hung up on things like critical p-values or statistical model diagnostics when making decisions beyond "Should I include these factors in further study?" So at this stage of your larger problem-solving effort, you may want to keep including factors or effects that are near the thresholds of the p-values or regression diagnostics you are using. Keep these factors in the study and continue to investigate them with DOE methods. Let the additional experiments and analysis guide the overall investigation.

statman · Oct 28, 2021 01:46 PM

Both Mark and Pete have great questions and great advice. I will only add some practical notes to your investigation. Does the data make sense? Did the response variable change by any practical amount? Have you graphed the data?

"All models are wrong, but some are useful." G.E.P. Box

_Luci_ · Oct 29, 2021 05:42 AM

Hello @statman. Thanks. As I mentioned in my reply to Mark, all 17 of my runs gave lower values than my control.
