I want to conduct a screening experiment to identify which factors affect my response (Y) the most. From past research, I am aware that these variables have interactions, so I need to have interaction terms in my model. The factors I identified as relevant are the following:
To be specific, X7 is a type of chemical: a fatty acid. Fatty acids can be saturated, monounsaturated or polyunsaturated, longer chain or shorter chain, etc., so I selected 10 of them that span this range of properties. Because X7 has more than 3 levels, I am forced to use the Custom Design platform in JMP, and once interaction terms are included in the model, this results in about 42 experiments. Is there a more efficient way to do this? Specifically, is there a more efficient way to set up a DOE to study categorical variables like X7?
Thinking out loud, at first I thought that converting X7 from a discrete to a continuous variable (perhaps by focusing on one important feature of fatty acids) could be helpful, because then I could simply pick a "high" and a "low" setting. But chemicals are multidimensional, i.e. they cannot be boiled down to one feature/number, so I do not believe this is the way forward.
So is there a more efficient way to design a study where you have a categorical variable like X7?
Hi @AsymptoticRules,
Welcome to the Community!
Using chemical structures as categorical factors comes with several drawbacks:
Since you are in a screening phase, it would be worth reducing the number of fatty acid candidates. Fewer candidates mean fewer experiments and fewer interactions to estimate. Keep only the molecules that together cover the widest chemical variability, so that significant effects and interactions can still be detected. In a second step, you can augment the design into an optimization/predictive design, possibly with other molecule candidates.
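To see why fewer levels help, here is a rough Python sketch that counts model parameters for an intercept, all main effects and all two-factor interactions. It assumes six other factors (X1 to X6) that are each continuous or two-level, which the thread does not confirm, and `model_terms` is a hypothetical helper name. It is not a reconstruction of the 42-run custom design; it only illustrates how a k-level categorical factor contributes k-1 parameters to its main effect and to each interaction it takes part in.

```python
from math import comb


def model_terms(n_continuous, categorical_levels):
    """Count parameters for intercept + main effects + all two-factor interactions.

    Assumes one categorical factor plus n_continuous continuous (or two-level)
    factors. This is an illustrative assumption, not the poster's actual model.
    """
    dummy = categorical_levels - 1           # a k-level factor needs k-1 parameters
    main = n_continuous + dummy              # main effects
    cont_by_cont = comb(n_continuous, 2)     # continuous x continuous interactions
    cont_by_cat = n_continuous * dummy       # continuous x categorical interactions
    return 1 + main + cont_by_cont + cont_by_cat


# Hypothetical comparison: 6 other factors, X7 with 10, 6 or 4 levels.
counts = {levels: model_terms(6, levels) for levels in (10, 6, 4)}
for levels, n_params in counts.items():
    print(f"X7 with {levels} levels: {n_params} parameters")
```

Under these assumptions the count drops from 85 parameters with 10 levels to 43 with 4 levels. Since a design needs at least as many runs as parameters to estimate them all, trimming the candidate list is where most of the savings come from.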
To reduce the number of fatty acid candidates, I would analyze the chemical properties/molecular descriptors of the initial 10 fatty acids you plan to screen. Here is how I would do it:
You can then use the selected molecules as levels of your categorical factor in the design, or directly use the principal components as continuous factors/covariates in the design.
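As a minimal sketch of this idea outside JMP, the Python below standardizes a small descriptor table, extracts the first principal component, and greedily picks a spread-out subset of molecules. The ten fatty acids listed are illustrative only and are not the poster's actual candidates. The two descriptors (carbon chain length and number of C=C double bonds) are a deliberately small stand-in for a real descriptor set, and the function names and the greedy maximin selection rule are assumptions of this example rather than the method from the linked presentation.

```python
import math

# Illustrative fatty acids (NOT the poster's list): (carbon atoms, C=C double bonds).
FATTY_ACIDS = {
    "lauric": (12, 0), "myristic": (14, 0), "palmitic": (16, 0),
    "stearic": (18, 0), "oleic": (18, 1), "linoleic": (18, 2),
    "alpha-linolenic": (18, 3), "arachidonic": (20, 4),
    "EPA": (20, 5), "DHA": (22, 6),
}


def standardize(rows):
    """Center each descriptor column and scale it to unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def first_component(z, iterations=200):
    """Leading principal axis of standardized data, found by power iteration."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [float(i + 1) for i in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pick_diverse(z, k):
    """Greedy maximin pick: start with the point farthest from the centroid,
    then keep adding the point whose nearest chosen neighbor is farthest away."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    origin = [0.0] * len(z[0])  # centroid of standardized data
    chosen = [max(range(len(z)), key=lambda i: dist(z[i], origin))]
    while len(chosen) < k:
        rest = [i for i in range(len(z)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(dist(z[i], z[j]) for j in chosen)))
    return chosen


names = list(FATTY_ACIDS)
z = standardize(list(FATTY_ACIDS.values()))
pc1 = first_component(z)
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]  # PC1 score per molecule
selected = [names[i] for i in pick_diverse(z, 4)]
print("PC1 loadings:", pc1)
print("Selected levels for X7:", selected)
```

The `selected` names could serve as the reduced set of categorical levels, while `scores` shows the alternative of replacing the categorical factor with a continuous covariate. With a realistic descriptor table you would keep more than one component and check how much variance each one explains before deciding.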
On this topic, you might find this presentation interesting: https://community.jmp.com/t5/Discovery-Summit-Europe-2017/Increase-Efficiency-and-Model-Applicabilit...
This is only one possible option; I'm sure other members of this forum have different experiences with molecules as factors. I personally always try to transform categorical information into continuous information whenever possible with this type of approach.
I hope this will help you,
Thank you for the perspective, Victor! That does make sense to me! I will try to update this post when I give it a shot!
I agree wholeheartedly with @Victor_G's thoughts. The only other thing I can think of to reduce the number of levels for X7 in the experiment is to ask: "Do you have historic observational data (maybe from production runs) where the X7 factor varies across a wide range of values for the underlying chemical properties?" If so, you could apply variable identification modeling methods such as any of the tree-based methods or, if you have JMP Pro, some of the Generalized Regression platforms that are adept at variable identification. One nice feature of many of these methods is that they are somewhat robust to correlations among the factors, compared to techniques like ordinary least squares regression or its kissing cousin for variable identification, stepwise regression. You might be able to screen out some of the X7 categorical levels this way.
Do you think this would be feasible: first run a preliminary experiment that screens X7 on its own under a specific set of conditions, and then run a second round?