DexRick
Level II

Creating a DoE with Two Boolean and One Discrete Numerical Factor

Hello,

I need to create a Design of Experiments (DoE) with two Boolean factors (B1 and B2) and one discrete numerical factor (X1). B1 and B2 can be either Yes or No, meaning the steps involving B1/B2 will either be performed or not. X1 can take on four values: 0, 0.25, 0.5, and 1.

I have four response variables: three ordinal and one continuous numeric.

Ideally, I want to obtain at least the 2nd order effect of X1 for the four possible cases of B1 and B2. I could perform 16 experiments in total, running each of the B1/B2 settings (Yes, No), (Yes, Yes), (No, Yes), and (No, No) at all four levels of X1 (from 0 to 1). Is this the best approach, or should I treat the Boolean factors as categorical when setting up a custom DoE in JMP to reduce the number of experiments?

Additionally, I have the following constraints:

  1. Only five experiments can be performed per day (I assume I will need to use blocking here).
  2. I want to estimate the uncertainty in the responses due to day-to-day variation, which means I need to repeat one experiment each day. Since JMP does not provide this capability directly, I am considering adding the same run on each day.

Any comments or advice would be helpful. Thanks in advance.

 

1 ACCEPTED SOLUTION (Victor_G's reply to DexRick's follow-up questions, which appears in full later in the thread)

6 REPLIES
statman
Super User

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

First, welcome to the community. There is not enough context to provide specific advice, but here are my initial thoughts and questions.

What is the objective of the experiment? Is it to "pick a winner" (e.g., do you plan on iterating?), to explain phenomena that already exist, or to develop a predictive model (i.e., understand causal structure)? I'm not exactly sure what you mean by "Boolean" factors, as this term typically refers to the two truth values in logic (true or false), but a step that is either performed or not is usually handled as a categorical factor in experimental design. The 4 possibilities are the same regardless (2^2). Have you studied the response variables, and have their measurement system uncertainties been assessed? Are you interested in understanding factor effects on the mean or on the variation? How do you know there is day-to-day variation? How has this been studied? Why is there day-to-day variation? Do you want to be robust to this variation? There are several options to learn about the day-to-day variation or "handle" it during the experiment (e.g., sampling, RCBD, BIB, replication). Do you really need to understand the 4 levels of the numerical factor (e.g., are you interested in a quartic or cubic term in the model?)?

"All models are wrong, some are useful" G.E.P. Box
DexRick
Level II

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

Hi, thank you for your response.

1) The objective of the experiment is to "pick a winner" that maximizes Response 1 (R1) and matches the targets for R2 to R4.

2) Yes, that is exactly what I meant: each step will either be performed or not. For lack of a better word, I used "Boolean" here because I was not sure whether it should be handled as categorical.

3) No. I do not have any idea about the measurement system uncertainties, and I know how this sounds. That is why I wanted to include replicates and repetitions in the DoE, to provide some clue for a measurement system analysis (MSA).

4) I know day-to-day variation would exist due to the nature of the experiment.

5) Yes, we know from historical data that the effect takes a somewhat cubic shape.

 

I hope this clarifies a few of the ambiguities in my original post. I do feel like I am trying to do too much with one DoE, as it should not only provide an MSA but also help me either "pick a winner" or give me a model that can be used to find the optimal settings.

This means I need to tweak the design that JMP provides a little. Let's assume I go with the 16 experiments mentioned in my original post. How do I randomize them properly across different days? And then, should I add the same replicate run on every day, or a different replicate run on each day?

Victor_G
Super User

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

Hi @DexRick,

 

Welcome to the Community!

Concerning your topic, I have some remarks and questions to better understand your problem statement:

  1. Your two Boolean factors look like two 2-level categorical factors.
  2. What is your primary goal behind this experimentation? Understanding the relative influence of the factors, or predicting/finding the best combination of factor levels?
  3. I don't understand your constraint of 16 experiments in total if you can only do 5 runs per day. It seems your total number of runs should be a multiple of 5, so either 15 or 20.

Here are some suggestions for designing your DoE:

I would set up the factors like this (the level names may have to be modified):

[Screenshot: factor setup for B1, B2 and X1]

Given your needs, I would set up a model containing the main effects and two-factor interactions. Because X1 is a discrete numeric factor, its quadratic and cubic effects are added automatically with the estimability "If Possible", but you can change their estimability to "Necessary" if you would like the runs to be more balanced across the 4 levels of X1, by selecting the two terms and clicking on their estimability:

[Screenshot: model terms, with the estimability of the X1 power terms set to "Necessary"]

To respect your constraint of 5 runs per day, you can check the option "Group runs into random blocks of size:" and specify a size of 5:

[Screenshot: "Group runs into random blocks of size" option set to 5]

The idea of a random block is to estimate whether the random factor (here, the day of experimentation) contributes to the variance of the response (uncertainty). To estimate both within-block and between-block variability, you can specify a number of replicate runs, which help estimate the error within each block.
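A minimal sketch of the within-day vs. between-day split, assuming a hypothetical balanced layout in which each day contains n ≥ 2 measurements of the same condition (for example, a duplicated reference run). It uses the classical method-of-moments estimates for a one-way random-effects model; JMP fits the random block by REML on the full model, which this sketch does not reproduce. The function name `variance_components` is illustrative only.

```python
from statistics import fmean


def variance_components(days):
    """Method-of-moments estimates for a balanced one-way random-effects model.

    days: list of lists; days[i] holds n >= 2 repeated measurements of the
    same condition made on day i (hypothetical layout).
    Returns (within-day variance, between-day variance).
    """
    a = len(days)
    n = len(days[0])
    if a < 2 or n < 2 or any(len(d) != n for d in days):
        raise ValueError("need >= 2 days, each with the same n >= 2 measurements")

    day_means = [fmean(d) for d in days]
    grand_mean = fmean(day_means)  # equals the overall mean for balanced data

    # Mean squares between days and within days
    ms_between = n * sum((m - grand_mean) ** 2 for m in day_means) / (a - 1)
    ms_within = sum(
        (y - m) ** 2 for d, m in zip(days, day_means) for y in d
    ) / (a * (n - 1))

    # E[MS_between] = sigma2_within + n * sigma2_between; clip negatives to 0
    sigma2_between = max((ms_between - ms_within) / n, 0.0)
    return ms_within, sigma2_between
```

With the made-up values `[[10.1, 9.9], [10.6, 10.4], [9.6, 9.4], [10.3, 10.1]]` (four days, two measurements each), the function returns a within-day variance of about 0.02 and a between-day variance of about 0.167, i.e. most of the spread in this toy example comes from the day.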

 

As an example, I attached a design created with your 3 factors: 20 runs in total in 4 groups of 5 runs each, with 4 replicate runs to estimate within-block variability (1 replicate run per day). The estimability of the quadratic and cubic effects of X1 has been set to "Necessary" to balance the use of the different levels in the design (useful if you are more interested in "picking a winner", i.e., finding/predicting the best combination of factor levels). You can reduce the total number of runs to 15 if needed (the minimum is 14 in this configuration).

 

Hope this answer may help you, 

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
DexRick
Level II

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

Hi @Victor_G 

 

Thank you so much for this detailed design. The primary goal of the experiment is to pick the setting that maximizes "R1" and matches the targets for "R2", "R3" and "R4".


I have a few questions about this design, if you have some time to help a beginner:

 

1) For the design you proposed, the color map on correlations shows confounding between X1 and X1^3. It does make sense to me that confounding would be there, as we have only 4 unique points to fit a cubic equation for each of the categorical settings. This means we would have to add one more level for X1. However, given what I want from the DoE, is it good practice to leave this confounding in place and fit the cubic equation anyway?

2) Not all of the unique points are tested in this design. For example, two unique points, (B1 = Yes, B2 = No, X1 = 0.5) and (B1 = Yes, B2 = Yes, X1 = 0), are missing. This always happens when I use the Custom Design dialog: it skips a few points. I think a better approach would be to include all 16 unique settings plus 4 extra runs as replicates to reach 20 experiments, but I do not know how to randomize these experiments properly into blocks.

Victor_G
Super User

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

Hi @DexRick,

 

Regarding your questions:

  1. Yes, there is almost complete aliasing (confounding) between X1 and X1^3. This is not very surprising: you can expect very strong correlation among odd-power terms (X1, X1^3, X1^5, ...) and among even-power terms (X1^2, X1^4, X1^6, ...). As you mention, it depends on the number of distinct levels available (you need k+1 different levels of a factor to estimate terms up to order k for that factor), on the design choice, and on these "natural" correlations between odd-order or even-order terms (powers of the same factor are inherently correlated). The higher the order of the term you want to estimate, the more data you need to collect to estimate its coefficient precisely.
    As you seem to have limited prior knowledge of your system, estimating terms up to 2nd order could be enough for this first design. In case of strong lack of fit and/or unexplained curvature, you can augment the design to add higher-order terms for X1.

  2. You can compare both designs using the Compare Designs platform. However, as you mention, it may be hard to fit all 16 unique combinations plus 4 replicate runs into a 20-run design with 4 random blocks. One way to do it:
    • Generate the 16 unique combinations: create a full factorial design for your two Boolean factors (2^2 = 4 runs), then do a Cartesian join with a table containing the 4 levels of X1, so that each of the 4 B1/B2 combinations meets all 4 levels of X1. You end up with a table of 16 unique runs.
    • Use these 16 unique combinations as a candidate set to force these runs into the design: with this table as the candidate set in the Custom Design platform, you can add a blocking factor (5 runs per day) and specify that all 16 runs must be included in the design, with replicate runs bringing the total to 20 (check the two options "Include all selected covariates in the design" and "Allow covariate rows to be repeated"; see the attached table). You can also use this method to create 4 blocks of 4 runs (all 16 combinations, without replicate runs) and then manually add the run you want to repeat to each random block by copy-pasting it.

      This is quite tedious, but you can end up with the design you had in mind, except that the same replicate run won't be repeated across blocks/days (unless you add it manually as explained). However, I'm not sure I would recommend this, as you have to make a lot of assumptions about which run to repeat each day: since there is no middle level for any factor, favoring one specific level of a Boolean factor and one specific level of the discrete numeric factor might not be a good idea, as it could unbalance the design (and affect your analysis and the estimation of effects).
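To make point 1 concrete, here is a small sketch (Python with NumPy, which is an assumption; JMP computes its own color map). It codes the four X1 levels to [-1, 1], assumed to mirror JMP's coding for a discrete numeric factor, and uses each level once. It shows that X1 and X1^3 are strongly correlated while X1 and X1^2 are not, and that 4 distinct levels give a full-rank cubic model matrix but leave no room for a quartic term. Because the real design repeats levels unevenly and JMP may scale terms differently, the exact values in the color map will differ.

```python
import numpy as np

# X1 levels from the thread, each used once (assumption: equal weighting)
levels = np.array([0.0, 0.25, 0.5, 1.0])
# Code to [-1, 1]: low -> -1, high -> +1 (assumed to mirror JMP's coding)
coded = 2 * (levels - levels.min()) / (levels.max() - levels.min()) - 1


def corr(a, b):
    """Pearson correlation between two model-term columns."""
    return float(np.corrcoef(a, b)[0, 1])


r_x_x3 = corr(coded, coded**3)  # odd vs. odd power: strongly correlated
r_x_x2 = corr(coded, coded**2)  # odd vs. even power: weakly correlated

# Columns 1, x, x^2, x^3: with 4 distinct levels this matrix has full rank 4,
# so terms up to cubic order are estimable (k + 1 levels -> order k).
cubic = np.vander(coded, 4, increasing=True)
# Adding an x^4 column cannot raise the rank above the number of levels.
quartic = np.vander(coded, 5, increasing=True)

print(f"corr(X1, X1^3) = {r_x_x3:.2f}, corr(X1, X1^2) = {r_x_x2:.2f}")
print("rank cubic:", np.linalg.matrix_rank(cubic),
      "rank quartic:", np.linalg.matrix_rank(quartic))
```

With these levels the two correlations come out at roughly 0.98 and 0.12, and both ranks are 4: the cubic term is estimable in principle, but only with a coefficient that is very hard to separate from the linear one.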

 

The design attached here and the previous one perform similarly, but this one is slightly better at minimizing prediction variance and maximizing power to detect effects.

 

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
DexRick
Level II

Re: Creating a DoE with Two Boolean and One Discrete Numerical Factor

This is such a great idea. The reason for including the same run (Reference) as a repeat on each day is to have a reference against which to compare one of the KPIs. This way, each comparison is made against the reference trial produced on the same day, and this reference trial would also act as a "check" on day-to-day variation.
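A sketch of how the plan discussed in this thread could be laid out by hand: the 16 unique runs come from the Cartesian join of the 2^2 B1/B2 combinations with the 4 X1 levels, they are split into 4 days so that each day sees every B1/B2 combination and every X1 level once (a cyclic Latin-square assignment, which is a swapped-in balancing technique and not the JMP Custom Design approach described above), the same reference run is added to each day, and the run order is randomized within each day. The function name, the dictionary layout and the choice of reference setting are hypothetical. This layout is not optimized, interactions are not guaranteed to be orthogonal to the day blocks, and repeating one setting adds replicates only at that point (the imbalance concern raised above), so it is worth comparing it against the JMP design in Compare Designs before use.

```python
import itertools
import random

B_LEVELS = ["No", "Yes"]
X1_LEVELS = [0, 0.25, 0.5, 1]


def build_schedule(reference, seed=None):
    """Hypothetical helper: 4 days x (4 unique runs + 1 reference run) = 20 runs.

    Each (B1, B2, X1) combination appears exactly once among the non-reference
    runs; each day contains every (B1, B2) pair and every X1 level once.
    """
    rng = random.Random(seed)
    # Cartesian join: 2^2 = 4 (B1, B2) combinations, shuffled for randomization
    combos = list(itertools.product(B_LEVELS, B_LEVELS))
    rng.shuffle(combos)
    # A random permutation of cyclic shifts keeps the Latin-square property
    shifts = list(range(len(X1_LEVELS)))
    rng.shuffle(shifts)

    schedule = []
    for day, shift in enumerate(shifts, start=1):
        runs = [
            {"B1": b1, "B2": b2,
             "X1": X1_LEVELS[(i + shift) % len(X1_LEVELS)], "Reference": False}
            for i, (b1, b2) in enumerate(combos)
        ]
        runs.append({**reference, "Reference": True})  # same reference run every day
        rng.shuffle(runs)  # randomize run order within the day
        for order, run in enumerate(runs, start=1):
            schedule.append({"Day": day, "Order": order, **run})
    return schedule


# Usage with a hypothetical reference setting
for row in build_schedule({"B1": "No", "B2": "No", "X1": 0}, seed=42):
    print(row)
```

Each combination is paired with a different cyclic shift of the X1 levels on each day, so across the 4 days every B1/B2 combination meets all 4 X1 levels exactly once.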