Feline614
Level I

Central Composite Design Output Question

Good afternoon,

 

I ran two different experiments using a CCD and got confusing results for both. The two experiments cover two different materials; they are not replicates of each other.

 

Each experiment was a face-centered CCD with 2 blocks, each block being a replicate. Each block does have variation within the samples. I assume the block is not significant in the model because there's enough within-block variance to make the between-block variance negligible (I'm working with bacteria, and the inoculum varies no matter what I do; it follows a Poisson distribution). The outcome is presence (1)/absence (0). The goal is 100% presence.
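A quick way to see why a Poisson-distributed inoculum can drown out a block effect is to simulate it. This is a generic sketch with a hypothetical mean of 20 CFU per sample (not the poster's data), using Knuth's multiplication method since the Python standard library has no Poisson sampler:

```python
import math
import random
from statistics import mean, variance

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(42)   # seeded for repeatability
mean_cfu = 20             # hypothetical average inoculum (CFU per sample)
block1 = [poisson_sample(mean_cfu, rng) for _ in range(15)]
block2 = [poisson_sample(mean_cfu, rng) for _ in range(15)]

# For Poisson counts the variance equals the mean, so replicate-to-replicate
# scatter within a block can easily swamp a small block-to-block shift.
print("block 1: mean=%.1f var=%.1f" % (mean(block1), variance(block1)))
print("block 2: mean=%.1f var=%.1f" % (mean(block2), variance(block2)))
```

With both blocks drawn from the same distribution, the two block means differ by roughly the same order as the within-block scatter, which matches the intuition that the block term tests as non-significant.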

 

When I run Fit Nominal Logistic and keep all variables in the model, the Whole Model Test p-value for both experiments is <0.05 and the Lack of Fit p-value is >0.05. However, when I start removing insignificant effects, the Lack of Fit p-value becomes <0.05.
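For intuition on what that Lack of Fit flip means: the test compares the fitted model against a saturated model that matches the observed presence proportion at every design point. A minimal sketch with made-up counts and fitted probabilities (not the poster's data) shows how an over-pruned model inflates the deviance:

```python
import math

def binomial_deviance(observed, trials, fitted_p):
    """Deviance of a fitted model vs. the saturated model for grouped
    binary (presence/absence) data at each design point."""
    dev = 0.0
    for y, n, p in zip(observed, trials, fitted_p):
        if y > 0:
            dev += 2 * y * math.log(y / (n * p))
        if y < n:
            dev += 2 * (n - y) * math.log((n - y) / (n * (1 - p)))
    return dev

# Hypothetical design points: presences out of n replicate runs, with each
# model's fitted presence probability at that point.
obs    = [2, 5, 9, 10]
n_runs = [10, 10, 10, 10]
p_full    = [0.20, 0.50, 0.90, 0.99]  # richer model tracks the data closely
p_reduced = [0.45, 0.55, 0.65, 0.75]  # over-pruned model smooths the trend away

print(binomial_deviance(obs, n_runs, p_full))     # small: little lack of fit
print(binomial_deviance(obs, n_runs, p_reduced))  # larger: reduced model underfits
```

A larger deviance relative to its degrees of freedom is what drives the Lack of Fit p-value below 0.05 after effects are dropped.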

 

I can’t figure out the best way to move forward. Should I run a third replicate of each experiment? Augment with axial points (make the design rotatable)? Leave all variables in? Use the profiler “as is” after removing variables (with Lack of Fit <0.05) and see if it works?
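On the axial-point option: a face-centered CCD already has axial points, just placed on the faces at α = 1; making the design rotatable means pushing them out to α = (2^k)^(1/4), which for k = 2 factors is √2 ≈ 1.414 in coded units. A small sketch of the coded points involved (assuming 2 factors; replicated blocks and extra centre runs omitted):

```python
from itertools import product

def ccd_points(k, alpha):
    """Coded design points for a central composite design in k factors:
    2**k factorial corners, 2*k axial (star) points at distance alpha,
    and one centre point."""
    factorial = [list(c) for c in product([-1.0, 1.0], repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            pt = [0.0] * k
            pt[i] = sign * alpha
            axial.append(pt)
    return factorial + axial + [[0.0] * k]

k = 2
face_centered = ccd_points(k, alpha=1.0)               # the design already run
rotatable     = ccd_points(k, alpha=(2 ** k) ** 0.25)  # alpha = sqrt(2) for k = 2
# Only the axial rows differ; the factorial corners and centre point are
# shared, so the runs already completed remain usable after augmentation.
```

One practical caveat: the rotatable axial points sit outside the face levels, so this only works if both factors can physically be run beyond their current low/high settings.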

 

Thank you in advance for the help!

2 REPLIES
wzm
Level I

Re: Central Composite Design Output Question

[Screenshots of a JMP analysis: wzm_0-1725603562111.png, wzm_1-1725603639135.png]

Is it okay to analyze it this way?

 

Victor_G
Super User

Re: Central Composite Design Output Question

Hi @Feline614,

 

Given the limited information about your context and objectives, here is a non-exhaustive list of (many) questions that will hopefully help your thinking and guide you:

  • Objective(s): What are your objective(s)?
    • Understand the key factors and effects behind the presence of bacteria?
    • Optimize conditions for the presence of bacteria? Globally, or for each material specifically?
  • Response: Are you limited to a binary (Yes/No or Absence/Presence) response measurement? Is there any option to count the bacteria population (manually or with the help of computer-vision tools)?
    A continuous response would help you analyze the results from your factors, by assessing the variability/noise from the measurement system and the experimental preparation, and by enabling a precise estimation of the impact of the factors.
  • Experimental setup and DoE choice: Why did you choose to run the two experiments separately? Are the factors, ranges, and units for X_1 and X_2 the same in both experimental setups?
    If the factors and units are the same between the two designs, why not analyze the whole dataset? You could concatenate the two datasets, using a "Source column" in the concatenation to identify the source of the data and the material used, and create a similar model by adding to the terms list the material main effect and the 2-factor interactions between the material factor and X1/X2. This could help generalize your findings, increase your inference space, and improve your understanding of the relationships between your factors and response.
  • Blocking factor: What is your blocking factor? Do you suspect the blocking factor causes a change in the mean response (fixed effect, design role "Blocking"), or a change in the response variance (random effect in the model creation, using a Generalized Linear Mixed Model)?
    If your blocking factor is linked to your experimental capacity (number of experiments per day, for example), it should probably be considered a random effect.
  • Model selection criteria: What are your criteria for model selection? It seems you based your analysis on the statistical significance of effects and model p-values, but would like to refine your model further? Why do you want to remove some effects (statistical significance but no practical significance/importance? Not in agreement with domain expertise? Something else?)?
    Some comments: the Lack-of-Fit test helps you assess whether the model fits the data well. A small p-value (<0.05) indicates a significant lack of fit (so in your case, removing insignificant effects seems to make the model underfit: it becomes too simple to fit the data well).
    P-values are not the only criteria for model evaluation and selection, and the metrics should be chosen in accordance with your objective(s). For example, information criteria (AICc, BIC) can also help assess the complexity/accuracy trade-off and the adequacy of your model.
  • Model choice: You have used a nominal logistic model. Depending on whether you have JMP or JMP Pro, other estimation methods can be tried through the Generalized Regression personality. It may also be possible to use simple machine-learning models, such as tree-based methods (Decision Tree/Random Forest) or Support Vector Machines. To avoid overfitting, you may have to limit the tree depth (only 2-3 splits) for tree-based methods, or the number of support vectors for SVM, but these other modeling options can offer different insights.
  • Use of the model: What do you want to do with the learnings from your model? How/why are the results "confusing"? Have you already learned something from the analysis you have done? Are you able to put these learnings into practice, by testing the conclusions from your model(s) to validate them? Are the findings/conclusions in accordance with domain expertise?
    What is the motivation behind the replicates? Behind augmentation with axial points? How would you like to continue with this topic?
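To make the information-criterion point above concrete: AICc is simple arithmetic once a model's log-likelihood is known. A sketch with hypothetical log-likelihoods and run counts (illustrative numbers only, not fitted to any real data):

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion: AIC plus a small-sample
    penalty. k = number of estimated parameters, n = number of runs."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical comparison for a 26-run design (two blocks of 13):
full    = aicc(log_likelihood=-8.0, k=6, n=26)   # all quadratic-model terms
reduced = aicc(log_likelihood=-14.0, k=3, n=26)  # after dropping effects
# Lower AICc wins: with these numbers the full model's better fit outweighs
# its heavier small-sample penalty.
print(full, reduced)
```

Because the penalty term grows quickly when n is small relative to k, AICc naturally guards against keeping more terms than a 26-run design can support.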

I hope these questions may help you,

Victor GUILLER
L'Oréal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)