Hi @Victor_G, I really appreciate you taking the time to write such a thorough response!
I attached an anonymized dataset.
I ran the various models you mentioned (attached), except "Two-Stage Forward Selection," which was not an option (it was grayed out). The stepwise approach added one variable at p = 0.12. According to the prediction profiler it may influence the outcome, but biologically, in this specific experiment, I wouldn't expect it to be significant (I will be repeating this experiment with different bacteria, and I expect it to be significant in those tests). "Pruned Forward Selection" with AICc and Best Subset with AICc bumped one variable to p = 0.0564 (just above 0.05), but it would make biological sense for that variable to have a significant effect on the system. "Pruned Forward Selection" with BIC and Best Subset with BIC were similar to stepwise, including the variable at p = 0.12. I also deleted the variables <0.05, but doing so did not change the significance of the other factors.
Overall, I'm not surprised by the variables the models selected, except for X_7; I was sure that one would have a significant effect.
Do you have a good resource that explains these models at a beginner level? I found some resources from JMP, but I only learned the stepwise approach in college, and I don't really know how to interpret the other models you mentioned beyond a general understanding of p-values.
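To check my own understanding of how AICc and BIC rank candidate models, here is a minimal Python sketch of the textbook formulas for a least-squares fit with n runs and k estimated parameters. This is my assumption of the standard definitions, not JMP's exact implementation; conventions differ on whether k includes the error variance, so the absolute values may not match the JMP report. Lower is better for both criteria, and BIC charges ln(n) per parameter instead of 2, so with 8 or more runs it penalizes extra terms more heavily than AIC does.

```python
import math

def neg2_loglik(sse: float, n: int) -> float:
    """-2 log-likelihood of a least-squares fit, using the ML variance estimate SSE/n."""
    return n * (math.log(2 * math.pi) + math.log(sse / n) + 1)

def aicc(sse: float, n: int, k: int) -> float:
    """Corrected AIC: AIC plus a small-sample penalty; k counts every estimated parameter."""
    aic = neg2_loglik(sse, n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def bic(sse: float, n: int, k: int) -> float:
    """BIC: the penalty per parameter is ln(n) instead of 2."""
    return neg2_loglik(sse, n) + k * math.log(n)

# Hypothetical numbers: 17 runs, model A (4 terms) vs. model B (5 terms).
# k = terms + intercept + error variance (one common convention).
n = 17
models = {"A": (12.0, 4 + 1 + 1), "B": (10.5, 5 + 1 + 1)}  # name -> (SSE, k)
for name, (sse, k) in models.items():
    print(name, round(aicc(sse, n, k), 2), round(bic(sse, n, k), 2))
```

The extra term in model B only "pays for itself" if it lowers SSE enough to offset the added penalty, which is the trade-off both criteria formalize.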
1. I created an augmented design. I eliminated two variables (X_4 and X_5) because biologically they should not be significant in this system (they may be in a future test, though). I left all other factors the same except the two significant factors that appear to benefit from testing a higher range (X_6 and X_8). I did not change the default number of runs, but I did block them. Should I change the number of runs? The video you sent (Using Definitive Screening Designs to Get More Information from Fewer Trials - JMP User Community) states at ~35:56 that a weakness of DSDs is: "Factor range for screening may not include optimum so, follow on design will be over different ranges - really can't augment." But, as you suggested, I can augment it with different values... is this going to be an issue? Do I need to do anything differently when I analyze the data?
2. I can rerun those two points. When I enter the new results, should I just replace those points, or should I create a new block to include the new values? If I need to create a new block, how do I do it? I can't explain why they appear to be erroneous; the point on the left could have extra variability due to the low concentration of bacteria (a Poisson distribution issue). The one on the right is more perplexing to me, but I am working with bacteria, so an occasional outlier isn't abnormal.
Regarding RMSE, I'm a bit confused. I understand that a low RMSE is good and reflects a model that predicts outcomes well, but I can't find any literature on anyone doing something similar to what I'm doing, so I don't know what to compare it to. Based on what I know about the system, I would not expect much "noise" in this particular experiment: this bacterium and system give me pretty consistent results (I have more variability with a different bacterium, but that is a problem for next week).
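On the low-concentration point in item 2: if the counts behave like a Poisson process (an assumption on my part), the variance equals the mean, so the relative scatter (coefficient of variation) is 1/sqrt(mean) and grows as counts drop. A quick Python sketch with hypothetical expected counts:

```python
import math

def poisson_cv(expected_count: float) -> float:
    """Coefficient of variation (SD / mean) of a Poisson count: sqrt(lam) / lam = 1 / sqrt(lam)."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)

# Hypothetical expected colony counts, from a low to a high concentration.
for lam in (10, 100, 1000):
    print(f"expected count {lam:>5}: CV = {poisson_cv(lam):.1%}")
```

With these numbers the CV is roughly 32% at 10 counts versus about 3% at 1000, which would be consistent with the low-concentration point being noisier.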
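For my own reference on RMSE: in a least-squares fit it is sqrt(SSE / error degrees of freedom) and is in the same units as the response. One way to judge whether it is "low" (my assumption, not something I found in the literature) is to compare it with the standard deviation of runs repeated at identical settings, such as the DSD center points. The function names and numbers below are hypothetical.

```python
import math
import statistics

def rmse(residuals: list[float], n_params: int) -> float:
    """Least-squares RMSE: sqrt(SSE / DFE), where DFE = runs - estimated coefficients."""
    sse = sum(r * r for r in residuals)
    dfe = len(residuals) - n_params
    if dfe <= 0:
        raise ValueError("need more runs than estimated coefficients")
    return math.sqrt(sse / dfe)

def pure_error_sd(replicates: list[float]) -> float:
    """Sample SD of repeated runs at identical settings (e.g., center points)."""
    return statistics.stdev(replicates)

# Hypothetical residuals from a 3-coefficient fit and three center-point responses.
residuals = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]
center_points = [5.1, 5.4, 4.9]
print("RMSE:", round(rmse(residuals, 3), 3))
print("replicate SD:", round(pure_error_sd(center_points), 3))
```

If RMSE is much larger than the replicate SD, that could point to missing terms or outliers; if the two are similar, the model may already explain about as much as the run-to-run noise allows.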
3. I plan to rerun the two "strange" points. I added the regression assumption checks to the attachment "Fit Least Squares." Does the residual-by-predicted plot show some clustering?
I agree that the block should be a random effect. I found in another discussion post that you can set this when you initially create the design, but I can't find the box to check (I assumed random was the default, but I guess not). I made it a random effect, and it did affect the even-order effects. The "Lack of Fit" box is grayed out, though; I'm not sure why or what that means. In simple terms, when there is an even-order effect such as X_2*X_2, what does it mean? I understand that a term like X_2*X_3 means there is an interaction between the two variables, but I don't understand how a variable interacts with itself.
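Trying to answer my own X_2*X_2 question: as I understand it, a squared term lets the effect of X_2 depend on the level of X_2 itself, which is curvature rather than an interaction with a different factor. A minimal Python sketch with made-up coefficients in coded units (-1 to +1); the model and numbers are hypothetical, not from my data:

```python
# Hypothetical one-factor model in coded units: y = B0 + B2*x2 + B22*x2**2
B0, B2, B22 = 10.0, 2.0, -3.0

def predict(x2: float) -> float:
    """Predicted response including the squared (X_2*X_2) term."""
    return B0 + B2 * x2 + B22 * x2 ** 2

def slope(x2: float) -> float:
    """Derivative dy/dx2 = B2 + 2*B22*x2: the local effect of X_2 changes with X_2 itself."""
    return B2 + 2 * B22 * x2

for x in (-1.0, 0.0, 1.0):
    print(f"x2={x:+.0f}  y={predict(x):.1f}  local slope={slope(x):+.1f}")
```

Here the slope is positive at the low end and negative at the high end, so the response rises and then falls, with a peak at x2 = -B2 / (2*B22), about 0.33 in coded units. Without the squared term the slope would be the same everywhere.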
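Regarding the wider ranges for X_6 and X_8 in point 1: as I understand it, the design software codes each continuous factor so its low and high settings map to -1 and +1. If the range changes, the same physical setting gets a different coded value, which may be part of why the video says the follow-on design "really can't augment." A minimal sketch with hypothetical ranges:

```python
def to_coded(value: float, low: float, high: float) -> float:
    """Map a physical setting to coded units, where low -> -1 and high -> +1."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

# Hypothetical X_6 ranges: original screening range vs. a wider follow-up range.
original = (10.0, 50.0)
follow_up = (10.0, 90.0)
for setting in (10.0, 50.0, 90.0):
    print(setting, to_coded(setting, *original), to_coded(setting, *follow_up))
```

In this example a setting of 50 is +1 in the original coding but 0 in the follow-up coding, and 90 lands at +3 in the original coding, well outside the screened region. So if I combine both sets of runs, they would presumably need to share one common coding (or actual units) in the analysis.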
Thank you for the resources; they were extremely helpful, especially the help pages. I'm still overwhelmed by the level of statistics, but I appreciate your patience in teaching me and helping me become a better scientist. This will be my first of many DSDs, and I'm glad I started with the "simplest" one!