I'm currently struggling to combine the advantages of the DSD platform with the Custom Design platform and haven't found a workable solution.
The task is to investigate about 30 continuous variables and 8 categorical variables with as few runs as possible (currently around 90) AND to discard combinations that make no sense based on process knowledge. On top of that, we are mainly interested in interactions, as we already know that many main effects exist (on their own).
Any suggestions are welcome.
Given what you say, I would expect that a Custom Design would be the best route. But you also imply that you are not happy with the design you arrived at by this method, so perhaps there is more to say. Certainly if the 'process knowledge' that you mention can be expressed via a series of linear constraints, and if you have prior technological knowledge that some joint effects need to be estimated, and some just don't make sense, then a Custom Design is the right approach.
Manually eliminating runs in a DSD, or adjusting levels in existing runs, will destroy the fold-over structure upon which the unique analysis depends. At that point, the resulting design is no longer a DSD and no longer provides the benefits of a DSD. I agree with @ian_jmp that we should be able to find a custom design that will work.
I do not like to suggest the following approach because I believe that the most realistic data comes from an experiment with all the relevant factors, but that approach is not always possible. Another way would be to divide the factors into two or more experiments. This approach runs the risk of separating factors that interact. You won't be able to model those interactions, and they might be a key characteristic of the process behavior. It requires enough process knowledge about interactions to make low risk decisions about which factors to separate.
The DSD is somewhat rigid and does not provide a solution for every case. That situation is when you should use custom design. The rest of my reply is based on custom design.
Have you screened the 38 factors already? Is the data available? Might it be possible to augment the existing data to fit the new model and test the interactions? Augmentation is a great way to learn more with new evidence while leveraging existing data for economy.
When you say, "make no sense," do you mean that these treatments are physically impossible or do you mean that they are likely to produce bad outcomes (undesirable responses)? A test is a kind of guessing game used to "pick the winner." An experiment is a designed plan to collect data to fit a model, regardless of the desirability of the responses. There is a big difference between these two approaches and the mindsets behind them. If your case is the former, then there are ways to eliminate nonsense runs. If your case is the latter, then I would not eliminate any runs based on prejudice.
I would add all two-factor interactions. Do you have process knowledge about the interaction effects? If you know that an interaction is not possible, remove this term. If you know that an interaction is active, keep it (primary effect). For the rest of them, change their Estimability from Necessary to If Possible (potential effect). This change will produce a Bayesian optimal design. It resets the minimum number of runs, which is always equal to the number of parameters that are necessary to estimate. This change will help the economy of the design to meet your budget. We recommend using more than the minimum number of runs. A good approach is to consider the potential interactions and decide how many will actually be active. Of course, you do not know this number, but make an educated guess. Add 3-4 runs for every one of these effects. For example, if I have 30 potential interactions but I expect only 4 of them to be active, then I would add 12 to 16 runs to the minimum.
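The run-budget arithmetic above can be sketched in a few lines. This is only a back-of-envelope illustration, not output from JMP's Custom Design platform; the counts below (intercept plus 38 main effects, 30 "If Possible" interactions, 4 expected active) are assumptions taken from the example in the text, and each effect is assumed to cost a single parameter (i.e., two-level factors).

```python
# Hypothetical run budget for a Bayesian optimal design (illustrative only).
# Assumes every effect costs one parameter (two-level factors); categorical
# factors with more levels would raise the necessary parameter count.
n_necessary = 1 + 38           # intercept + main effects marked "Necessary"
n_potential = 30               # interactions marked "If Possible"
expected_active = 4            # educated guess at truly active interactions
extra_per_active = (3, 4)      # add 3-4 runs per expected active effect

min_runs = n_necessary         # minimum equals the necessary parameter count
budget = (min_runs + expected_active * extra_per_active[0],
          min_runs + expected_active * extra_per_active[1])
print(min_runs, budget)        # 39 (51, 55)
```

So with these assumed numbers, the minimum is 39 runs and a sensible budget is roughly 51 to 55 runs.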
Is this study primarily to screen effects? You stated that you already know that the 30+8 factors are active, so you would not need to screen factors. If so, then leave the recommended criterion: D-optimal. If your primary goal is model selection followed by optimization, then change the criterion to I-optimal. If your goal is a combination of screening effects and optimization, then change the criterion to A-optimal.
You want to balance the need for economy with the need for good estimates of the model parameters or the response. The large number of terms in the model leads to a large number of runs but not necessarily a large number of error degrees of freedom, which will be used to establish confidence intervals and hypothesis tests for your decisions. So you generally hope to have effects that are 3-4 times the size of the response standard deviation. Make sure that you widen the ranges of the continuous factors in order to produce the largest effects. Do not narrow the ranges around the level that you expect to be optimal. That is a testing approach. The wider range supports an experimenting approach.
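A toy calculation shows why widening a factor's range helps. Assuming a simple linear response y = beta * x with a hypothetical slope and noise level (both values below are made up for illustration), the observable low-to-high effect grows in direct proportion to the range, so a wider range makes the same underlying slope easier to detect against the response standard deviation:

```python
# Toy illustration: the effect of a factor, in units of the response
# standard deviation, scales with the width of the factor's range.
beta = 2.0    # assumed true slope of the response in the factor
sigma = 1.0   # assumed response standard deviation
for half_range in (0.5, 1.5):           # narrow vs wide coded range
    effect = beta * (2 * half_range)    # change in y from low to high
    print(half_range, effect / sigma)   # 0.5 -> 2.0, 1.5 -> 6.0
```

With the narrow range the effect is only 2 standard deviations; the wider range pushes it to 6, comfortably past the 3-4x guideline.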
Just adding to Mark's excellent post. I must say I'm a bit confused. You already know there are 38 factors that have an effect? And now you want to investigate 2nd order terms? This is very unusual. It does not follow one of the principles of experimentation: Sparsity of Effects. Usually when there are that many factors to investigate, I would suggest some sort of directed sampling (Components of Variation) study to reduce the number to a more manageable size for experimentation. I can't imagine having a prediction model with that many terms in it. It would be virtually impossible for that model to be applicable in the "real world". The other thought I have is that you don't discuss noise at all, from short-term elements of noise (e.g., measurement error) to long-term elements (e.g., ambient conditions, lot-to-lot variation of input variables). I must say you've piqued my interest about what you are studying.
The order of the replies is not chronological. It depends on which Reply button you click. I hope that this reply is at the end!
A colleague, @P_Bartell , raised two very important issues. The first issue is how impractical randomization will be with so many factors. That issue means that you must use a custom split-plot design, not a DSD. The second issue is how well you can control every run in an experiment with so many factors.
I hope that Peter adds his in-depth insight about these two issues to this thread.
1. From a purely logistical point of view, how confident are you that you can actually set all 38 factors at their required levels as dictated in each treatment combination? You said you want all main effects and some unspecified collection of interactions...at unspecified effect order (2-way, 3-way or more?). So let's just assume you want half of all possible two-factor interactions and a few degrees of freedom left over for effect estimation purposes...so you're looking at a minimum of A LOT OF TREATMENT combinations. With 38 factors! My experience is that even with fewer than 10 factors, when I was on teams executing designed experiments, things got messed up no matter how hard we tried. Somebody set a level incorrectly, accidentally duplicated a treatment combination, and/or didn't execute one that was required...which will have huge implications for effect estimation. Or maybe put the combination in the wrong block...or some such other experimental execution failure mode. And worse, we often didn't know we made the mistake! Speaking of blocking...on to a second thought.
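To put a number on "A LOT OF TREATMENT combinations": counting the parameters for the scenario Peter sketches (38 factors, all main effects, half of all possible two-factor interactions) gives a rough lower bound on the run count. This assumes two-level factors so that each effect costs one parameter; the 8 categorical factors would cost more if they have more than two levels.

```python
from math import comb

# Rough parameter count for 38 factors with all main effects and
# half of all possible two-factor interactions (illustrative assumption:
# every effect costs exactly one parameter, i.e., two-level factors).
n_factors = 38
n_2fi_all = comb(n_factors, 2)          # 703 possible two-factor interactions
n_2fi_half = n_2fi_all // 2             # "half of all possible" -> 351
n_params = 1 + n_factors + n_2fi_half   # intercept + mains + interactions
print(n_2fi_all, n_params)              # 703 390
```

So even under these generous assumptions you would need at least about 390 runs before adding any degrees of freedom for error, which makes the logistical concern concrete.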
2. With a large experiment I can envision scenarios where blocking might be called for. For example, if the experiment has to be executed over more than one day, and people are worried there might be a day to day effect, then day should become a blocking factor. Or what about a scenario where to minimize the timeline, somebody decides to run the experiment on two, three or more, say reaction vessels? Another blocking factor. And since blocking is a form of restricted randomization...on to my third thought.
3. As @markbailey suggests, can you completely randomize the experimental execution order? If not some sort of split plot design is called for so you are definitely in JMP's Custom Design wheelhouse. Now for my last thought.
I wish you well...if the experiment works out, and you are willing to share, maybe a JMP Blog entry would be a great story!
Good morning everyone,
thanks a lot for your input, which will be discussed internally with my colleagues on this project. I'll let you know about the outcome.