BenGengenbach
Level III

Covariates in defined order in custom design

Hi, 

 

In our experiment we have 3-4 continuous controllable factors, plus the fill height of a reservoir, which decreases gradually with each run, as an uncontrollable factor.

It is not an option to adjust the fill height for each run.

The reservoir cannot be filled to less than 100% at the start of the experiment.

If blocking is used the fill height of the reservoir will decrease within each block.

The reservoir can be refilled to 100% after it is depleted and further runs can be performed.

 

I hoped to load the fill height as a covariate table with a defined order (full -> empty); however, that does not seem to be possible because the covariate input is randomized.

Any other suggestions on how best to tackle this problem are very welcome.

 

cheers ben

 

 

1 ACCEPTED SOLUTION

Victor_G
Super User

Re: Covariates in defined order in custom design

Hi @BenGengenbach,

 

Yes, you already thought about this in your original post with this "defined order". Please find attached the data table used in my example, in case you want to look at or evaluate some parts in more detail.


I did the DoE creation in a very "conventional" way at the beginning: for 3 factors and a model including main effects and two-factor interactions, the number of runs recommended by JMP is 12. So I created a one-column table for "Run Order" or "Time" to be used as a covariate factor, with 14 rows: the 12 rows recommended for the 3-factor model plus 2 additional runs, 1 to estimate the main effect of time and 1 to estimate the quadratic effect of time (recommended by default in the book "Optimal Design of Experiments: A Case Study Approach" by Peter GOOS and Bradley JONES). These 14 rows represent my "row order covariate" (a numeric continuous covariate factor), or time variable.
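As a rough sketch of the bookkeeping described above (not the JMP workflow itself), the run count and the one-column covariate table could be put together like this in Python. The column name `Time` and the helper `covariate_csv` are placeholders chosen for illustration; the resulting text would still have to be saved and loaded into Custom Design as a covariate table.

```python
import csv
import io

base_runs = 12   # runs JMP recommended for 3 factors with main effects + 2FI
extra_runs = 2   # one for the linear and one for the quadratic time effect
n_runs = base_runs + extra_runs

# The covariate is simply the run order 1..n_runs.
time_values = list(range(1, n_runs + 1))

# Terms in the model: intercept, 3 main effects, 3 two-factor
# interactions, plus the linear and quadratic time terms.
n_factors = 3
n_two_factor = n_factors * (n_factors - 1) // 2
n_terms = 1 + n_factors + n_two_factor + 2

def covariate_csv(values, column="Time"):
    """Return a one-column CSV text (header + one row per run).

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([column])
    for value in values:
        writer.writerow([value])
    return buf.getvalue()

table_text = covariate_csv(time_values)
print(n_runs, "runs for", n_terms, "model terms")
```

With 14 runs and 9 model terms, a few degrees of freedom remain for error, which is the reason for not simply using the minimum number of runs.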


In the DoE creation, I create the 3 factors and then add the time covariate factor from the covariate table created earlier. In the model, I specify main effects and two-factor interactions for the three X factors, and the main effect and quadratic effect for the "Time" covariate. I set the number of runs to 14 (as explained above) and click "Make Design". Since "Time" (or "order") has been entered as a factor in the design, you can right-click this factor and sort by the Time column in ascending order (or do it in your data table).

 

Since your design already includes this "Time" factor constraint in the model (through its main effect and quadratic term), the restriction on randomization is taken into account and you can sort by this "Time" column to carry out your runs. You can check in your table that runs with factor levels in common are well "dispersed" with respect to this Time constraint: there is a high number of changes from one experiment to the next, and if you have replicates like in my example, you'll see they are allocated at the beginning and end of the experiment (rows 2 and 12, 4 and 14, 6 and 10, 1 and 11, 3 and 13, 5 and 9), while the "original"/unique rows 7 and 8 are in the middle of the experiment.
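A quick way to look at this "dispersion" is to take the row pairs quoted above and check where each pair sits in the run sequence. The sketch below uses only the row numbers from this post (the attached table is not reproduced here), and treating each pair as replicates follows the description above rather than anything verified independently.

```python
# Row pairs with factor levels in common, as listed in the post.
pairs = [(2, 12), (4, 14), (6, 10), (1, 11), (3, 13), (5, 9)]
n_runs = 14

# Centre of the run sequence 1..14.
centre = (1 + n_runs) / 2

# Midpoint in time of each pair, and how far apart its two runs are.
midpoints = [(a + b) / 2 for a, b in pairs]
gaps = [abs(a - b) for a, b in pairs]

mean_midpoint = sum(midpoints) / len(midpoints)

print("midpoints:", midpoints)
print("gaps:", gaps)
print("mean midpoint:", mean_midpoint, "centre:", centre)
```

The pair midpoints average out at the centre of the sequence and every pair is at least 4 runs apart, which is consistent with the pairs straddling the middle of the experiment rather than clustering at one end.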

 

I hope this answer is helpful for you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics


11 REPLIES

Re: Covariates in defined order in custom design

How about including Fill Height as an Uncontrolled factor? During your experiment, you'd simply record the level as it changes, and JMP would include it as part of the analysis. I've attached a sample DOE table.

 

Jed_Campbell_0-1675204058141.png

 

 

BenGengenbach
Level III

Re: Covariates in defined order in custom design

Hi Jed,

 

Thanks for your suggestion. We also thought about that, but hoped there would be an option to have the fill height factor settings in the model during setup, so that power and FDS can be estimated properly.

 

cheers ben

statman
Super User

Re: Covariates in defined order in custom design

First, welcome to the community.  I don't have enough information or context to provide advice, so perhaps you can describe the situation with more detail?  Is this experiment on processing or design?  What is the product?  What are the response variables? I don't know what the reservoir is, nor why it can't be purposely adjusted, at least for the duration of the experiment?  Apparently there is some sort of solution in the reservoir.  Is the solution formulation part of the experiment?

 

Do you have hypotheses about the effect of reservoir fill?  Why would the fill of the reservoir have any effect on the response variables?  Must the reservoir be emptied before it can be re-filled?

 

Treating it as a covariate has issues... What number will you record (you can only have one value for each treatment)?  The fill level at the start of the treatment run or at the end or an average, or...?

 

In any case, Jed's suggestion is a good one.  For every treatment, record the value of the fill level of the reservoir (see issues above).  Add a column to your experiment manually to record the value you decide on.  When doing the analysis, do the typical covariate analysis (use sequential tests, Type 1, for the covariate and partial tests, Type 3, for the other factors in your model).  Also plot the data in run order and don't forget the VIFs.
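To illustrate why the VIFs matter here, below is a minimal Python sketch for the simplest case of one factor and one recorded covariate, where the VIF reduces to 1 / (1 - r^2) with r the correlation between the two columns. The fill heights and the two run orders are invented for illustration, and JMP reports VIFs for the full model directly, so this is only meant to show the idea.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical recorded fill heights (%), decreasing run by run.
fill_height = [100, 88, 76, 64, 52, 40, 28, 16]

# Hypothetical coded settings of one factor in two different run orders.
x1_mixed = [-1, 1, 1, -1, 1, -1, -1, 1]    # levels alternate over the depletion
x1_sorted = [-1, -1, -1, -1, 1, 1, 1, 1]   # all low runs first, then all high

vif_mixed = vif_two_predictors(fill_height, x1_mixed)
vif_sorted = vif_two_predictors(fill_height, x1_sorted)

print("VIF, mixed order: ", vif_mixed)
print("VIF, sorted order:", vif_sorted)
```

In the mixed order the factor is uncorrelated with fill height (VIF of 1), while in the sorted order the two are strongly correlated and the VIF rises above 4, so the factor effect and the fill-height effect become hard to separate.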

"All models are wrong, some are useful" G.E.P. Box
BenGengenbach
Level III

Re: Covariates in defined order in custom design

Hi statman,

 

and thanks for the warm welcome ;).

I didn't want to distract from the problem with too much context, but to be more precise:

The experiment is about finding the best settings for agitating a particle suspension and then transferring it from the reservoir to smaller vessels.
Responses are i) particle target amount, ii) particle CV (n=6 transfers per run) and finally iii) the yield of a process that uses these particles.

We assume (but do not hope) that there will be some interaction between the transfer and agitation factors and the fill height.

So we might need to change the settings in production while the reservoir gets depleted.

 

The whole issue about not refilling is that it will not be possible in the production system. That means a particle suspension at 50% fill height was once at 100% fill height, and unfortunately the particles are likely to "degrade" or become less performant during the depletion.

 

We found a workaround that might help us:

We reduced the consumption of particle suspension per run and used blocking for the fill height (and some other factors). Within these blocks the depletion will be minimal, and we decided to neglect it. At the same time, we provide additional reservoirs that are depleted outside of the actual experiment, so we still capture the degrading effect. Between blocks we switch to the reservoirs that have been depleted to the target (and now randomized) fill height.

 

Please share your thoughts

 

cheers ben

 

statman
Super User

Re: Covariates in defined order in custom design

I understand it is difficult to know what information needs to be provided to get the appropriate advice.  I have found the more I understand the situation, the better the advice. I'm not sure an in-depth discussion of your options is appropriate for this forum, but here are my thoughts:

If I summarize, you are concerned with homogeneity of the "solution".  This is assessed with some measures:

1. "particle target amount".  Not sure what this is?  Is it particle "density"? Are you measuring the distribution of particle sizes?  How much does this vary in production?  How "good" is the measurement system? Is there more variation within a "batch" (over the course of depleting a reservoir) or batch-to-batch?  Is this consistent? How much of a change in this metric will impact product performance (or yields). This is important to know prior to experimentation.

2. "particle CV".  My guess is this is the coefficient of variation of the particle size distribution.  How is this measured? Again, has the measurement system been studied? Also how much of a change in this will impact product performance?

3. "yield of the process that uses the solution"  I'm not a fan of yields as a measure (particularly for causal structure understanding) as it can be too aggregate to get precise information about the physical/chemical mechanisms at work, but I understand it is used by management and ultimately you want to increase yield (this might be at a higher level in the hierarchy of metrics).  Are the production yields consistent now?  How much do yields vary?   Have you listed all of your hypotheses about what might affect yields? The consistency of particle size/distribution being one of those.  Of course, be aware you could greatly improve the consistency of your solution and have yields drop because of some other factor effects.

 

Your idea of blocking is quite good.  I would recommend treating the block as a fixed effect (vs. random) and including all block-by-factor interactions in your model.  This will provide insight into which factors are robust to reservoir height and which factors may need to be adjusted to compensate for height (this will appear as significant block-by-factor interactions).  Once discovered, you have multiple options to deal with managing the effect (e.g., constantly adjust factors, change the process to continuously keep the reservoir level constant, etc.).
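As a small illustration of what a block-by-factor interaction looks like in the data, the Python sketch below computes the effect of one factor separately within two fill-height blocks and compares them. The block labels, factor settings and responses are all invented for illustration; in practice the full model with fixed blocks and interaction terms would be fitted in JMP.

```python
from statistics import mean

# Hypothetical runs: (fill-height block, coded factor level, response).
runs = [
    ("full", -1, 10.0), ("full", -1, 12.0), ("full", +1, 20.0), ("full", +1, 22.0),
    ("low", -1, 11.0), ("low", -1, 13.0), ("low", +1, 15.0), ("low", +1, 17.0),
]

def factor_effect(rows):
    """Mean response at the high level minus mean response at the low level."""
    high = mean(y for _, x, y in rows if x > 0)
    low = mean(y for _, x, y in rows if x < 0)
    return high - low

# Effect of the factor within each block.
effects = {
    block: factor_effect([r for r in runs if r[0] == block])
    for block in ("full", "low")
}

# A non-zero difference between the per-block effects is what shows up
# as a block-by-factor interaction in the model.
interaction_contrast = effects["full"] - effects["low"]

print(effects, interaction_contrast)
```

In this made-up data the factor effect is 10 in the full block but only 4 in the low block, so the setting that works at full height would need to be adjusted as the reservoir empties.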

 

As a note, while JMP is excellent at creating design matrices, quite often I manipulate the design matrix to fit the situation. Once the design has been run, there are a host of methods to analyze the data.

 

"All models are wrong, some are useful" G.E.P. Box
BenGengenbach
Level III

Re: Covariates in defined order in custom design

 

"1. "particle target amount".  Not sure what this is?  Is it particle "density"? Are you measuring the distribution of particle sizes?  How much does this vary in production?  How "good" is the measurement system? Is there more variation within a "batch" (over the course of depleting a reservoir) or batch-to-batch?  Is this consistent? How much of a change in this metric will impact product performance (or yields).

This is important to know prior to experimentation."

It's particles per volume, a concentration or density that we measure. Shifting particle size distributions are not so much of a concern for us. If particles "break" they won't be recognized by the counting device, same goes for agglomeration. The measurement device has much higher precision and accuracy than the particle transfer so let's consider it "good". 
We don't know about the (possibly depletion-caused) variation within a block. Unsuitable particle agitation or transfer parameters could result in accumulation of particles in the reservoir. Variation between blocks should be minimal, as all blocks are fed from the same starting particle suspension, which can be maintained at "ideal" conditions.

The acceptance corridor for "good" particle density is about 2xSD and we would aim for 1xSD prediction variance.

 

"2. "particle CV".  My guess is this is the coefficient of variation of the particle size distribution.  How is this measured? Again, has the measurement system been studied? Also how much of a change in this will impact product performance?"

Coefficient of variation of particle density (concentration), 6 transfer replicates per run. To make it a bit more complicated the "replicates" are taken from different positions within the reservoir. Therefore the CV will tell us about homogeneity of particle dispersion.

Target is to minimize CV.
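For reference, the CV per run described here can be computed as below. This is a minimal Python sketch: the six particle concentrations are invented for illustration, and using the sample standard deviation (n-1 denominator) is an assumption, since the thread does not say which one is used.

```python
from statistics import mean, stdev

# Hypothetical particle concentrations from the 6 transfers of one run
# (particles per volume; the values are made up).
transfers = [980, 1010, 1005, 995, 1020, 990]

def cv_percent(values):
    """Coefficient of variation in percent, using the sample SD (n-1)."""
    return 100.0 * stdev(values) / mean(values)

run_cv = cv_percent(transfers)
print(f"CV = {run_cv:.2f} %")
```

Because the six transfers come from different positions in the reservoir, a low value here indicates a homogeneous dispersion, not just a repeatable transfer.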

 

"3. "yield of the process that uses the solution"  I'm not a fan of yields as a measure (particularly for causal structure understanding) as it can be too aggregate to get precise information about the physical/chemical mechanisms at work, but I understand it is used by management and ultimately you want to increase yield (this might be at a higher level in the hierarchy of metrics).  Are the production yields consistent now?  How much do yields vary?   Have you listed all of your hypotheses about what might affect yields? The consistency of particle size/distribution being one of those.  Of course, be aware you could greatly improve the consistency of your solution and have yields drop because of some other factor effects."

Yield or product concentration is rather a backup response, we are at risk of finding good transfer settings (at target concentration, minimal CV) that are yet unfavorable for the subsequent production step. Unfortunately we have a pretty long production process that simply cannot be fully covered in a single design.

Therefore we already separated the experiments based on minimal anticipated interaction between factors.

 

Thanks for the tip about the fixed block effect. But if I already have fill height in the model as hard to change resulting in the appropriate blocking why would I need to do that? 

 

statman
Super User

Re: Covariates in defined order in custom design

Just a response to this:

Thanks for the tip about the fixed block effect. But if I already have fill height in the model as hard to change resulting in the appropriate blocking why would I need to do that? 

Most often, block effects are treated as random effects.  While this has the advantage of increasing inference space and increasing precision, the effects are unassignable. If you are careful and able to assign/confound specific noise variables to the block, so that they are consistent within a block and purposely changed between blocks, you can treat the block as a fixed effect and also add block-by-factor interactions to the model.

 

"All models are wrong, some are useful" G.E.P. Box
Victor_G
Super User

Re: Covariates in defined order in custom design

Hi @BenGengenbach,

 

Welcome to the Community !

 

  1. The first option proposed by @Jed_Campbell works, but it is not optimal in terms of the design's robustness to a time trend. You could have correlations between your time trend and your factors, and not be able to tell which of the two is responsible for the response(s).
  2. A second option would be to include time as a covariate (with both linear and quadratic effects) in the design: it is a special case of DoE with covariates ("time-trend robust designs") that you can find in "Optimal Design of Experiments: A Case Study Approach" (chapter 9) by Peter GOOS and Bradley JONES.

 

A key point from this chapter that addresses your question about randomization with a run order factor:

"Using a systematic run order or systematic assignment to experimental units does not necessarily imply that there is no randomization at all".

 

There is already some literature on this topic:

Making Experimental Designs Robust Against Time Trend : Paper5.pdf (ssca.org.in)

Experimental Designs Optimally Balanced for Trend on JSTOR

+ Joiner and Campbell (1976), Daniel and Wilcoxon (1966), ...

 

Here is an example of a time-trend-robust design with 3 continuous factors, estimating main effects (X1, X2, X3 and time), two-factor interactions (between X1, X2 and X3) and the quadratic effect for time. You can see in the graph below that including time as a covariate doesn't inflate the standard errors of the estimates for the main effects (X1, X2, X3) and their two-factor interactions (comparison of the relative Std Error of Estimates for a design including the time covariate and a design without it), which means you won't lose precision in estimating these terms by including time as a covariate.

[Image: Victor_G_0-1675242449980.png, comparison of relative Std Error of Estimates with and without the time covariate]

You can also check that there is (almost) no aliasing/correlation between your time covariate and your other factors for main effects and interactions:

[Image: Victor_G_2-1675243671670.png, correlations between the time covariate and the other model terms]

The design here is not completely trend-robust, but it is D-optimal with this time covariate constraint.

 

Hope this answer will help you,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
BenGengenbach
Level III

Re: Covariates in defined order in custom design

Hi Victor,

 

thanks for your reply.

This sounds and looks exactly like what I was planning to do, but how did you force the constraint of the fixed order on the covariate factor?

I checked the literature, which explains the principle very nicely, but how do I force it into JMP?

Whenever I load the fill height (or time, to be closer to your example) in the design dialog, the resulting design is randomized for the covariate.

 

cheers ben