Solved: How to account for two fixed size random noise factors in one experiment

oerlikon · Jun 8, 2023 2:04 PM

We want to run a medium optimisation experiment in shake flasks but we have two noise factors that we need to compensate for:

We have four incubators, each of which can hold only four flasks (incubator noise)
An experiment takes 24 hours, and we have booked the incubators for two days (day/batch noise)

We want to get the maximum amount of information out of the experiment so we want to perform the following:

4 incubators × 4 flasks × 2 days = 32 runs

Ideally we'd like to test 4-5 continuous medium factors, but we'd settle for fewer if we can get more reliable results (some interactions and quadratic terms are likely). We don't care about the differences between the days or the incubators; we just want to account for the noise that they might bring. From experience, we know that both the incubator and the day have an impact on the outcome.

So our question is: how do we design this in JMP so that we can account for the noise while having a fixed 16 runs per day and 2 × 4 runs per incubator (days × incubator capacity)?

Is this a split-plot design? Can we add two random blocking factors somehow? When trying this out, we never seem to be able to get both limits right (8 runs per incubator, split over two days).

statman · Jun 4, 2021 05:15 PM

Welcome to the community, @oerlikon. I'm a bit confused by the situation, so here are my thoughts:

1. You suggest you want a "medium optimization experiment", what is this? I don't understand "medium". 4-5 variables is a fairly large number of variables for optimization. Do you already have a first order model? If you want 1st order, 2nd order and quadratic effects for 4-5 factors, you would need a lot more runs than you have allotted. What do you mean by "more reliable results"?

2. You suggest you want to account for the noise between incubators and between days, and you KNOW these have an impact? What about noise-by-factor interactions? How useful is an optimized model if it changes incubator-to-incubator and day-to-day?

3. What is changing incubator-to-incubator? Can this be measured? What is changing day-to-day? Can this be measured? If so, covariates might be useful.

4. What is your predicted rank order of model effects? Include main effects, 2nd order linear (two-factor interactions) and 2nd order quadratic effects. This will help determine the appropriate resolution and polynomial order for your design.

5. Blocking may be useful. You could first confound Day with Block (1 degree of freedom) and then run incubator blocks inside of day (3 DFs) to assign the block effects, but this would limit the number of design factors. You would have 27 DFs to "play with".

6. Split-plots partition noise by using a design factor, so this is not what you are suggesting. You are trying to partition the noise with noise. This is a blocking idea.
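Some rough arithmetic behind points 1 and 5, as a sketch. The first line counts the coefficients of a full second-order (response surface) model in k continuous factors: intercept, k main effects, k(k-1)/2 two-factor interactions and k quadratic terms. The remaining lines follow the DF count in point 5 (Day and incubator counted as fixed block effects, 1 + 3 DF) and assume the full 5-factor quadratic model, i.e. 20 terms besides the intercept:

```latex
\begin{aligned}
p(k) &= 1 + k + \binom{k}{2} + k = \frac{(k+1)(k+2)}{2}, \qquad p(4) = 15, \quad p(5) = 21 \\
(32 - 1) - \underbrace{1}_{\text{Day}} - \underbrace{3}_{\text{incubator}} &= 27 \\
27 - \underbrace{\bigl(p(5) - 1\bigr)}_{20} &= 7 \quad \text{(residual DF)}
\end{aligned}
```

If each incubator-day combination is instead counted as its own block (8 blocks, 7 DF), the same bookkeeping leaves 31 - 7 - 20 = 4 residual DF.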

"All models are wrong, some are useful" G.E.P. Box

oerlikon · Jun 7, 2021 02:52 AM

Thanks Hadley and Statman for your responses. We will try out the suggested approach but would like to give some extra context nonetheless.

I agree with Statman that the design is underpowered. This is definitely the start of a journey, and the next DOE we design is very unlikely to give us the final model, but we want to get closer while capturing as much data as possible besides the optimisation variables. Perhaps the DOE will turn out to be just a way to build up a larger dataset that we might analyse with more tailored models.

You suggest you want a "medium optimization experiment", what is this?

We are optimizing the recipe for a bacterial growth substrate

4-5 variables is a fairly large number of variables for optimization. Do you already have a first order model?

We are still screening for the most relevant factors, hence the large number of variables. We have performed a definitive screening design (DSD) with 11 factors, which allowed us to select the 5 variables with the largest effects and exclude some that are unlikely to be relevant. For quite a few factors, the optimum was closer to the 'center point' of the DSD. We are building on these observations to plan the next set of experiments with adjusted factor levels. For illustration, I have included the main residuals plot of the definitive screening design results below.

What do you mean by "more reliable results"?

Despite our best efforts to control variation in production, we see significant batch-to-batch noise in our process. We have fixed all non-recipe sources of variation that we can control and still don't get stability (biological systems are fun). Therefore we decided to change the components of the recipe to see if we can reach more stable running conditions, with hopefully some optimisation along the way as a happy coincidence.

You suggest you want to account for the noise between incubator and between days and you KNOW these have an impact? What about noise-by-factor interactions? How useful is an optimized model if it changes incubator-to-incubator and Day-to-day?

I agree that there might be interactions, but the situation is the same at the production level, so our scale-down model appears to be useful. For the statistical model, there is definitely a possibility that some of these noise factors will interact with the variables we can control.

What is changing incubator-to-incubator?

We know that temperature control is not perfect, so I was thinking of taking this into account as a covariate. For the day-to-day variation we are clueless.

What is your predicted rank order of model effects? Include main effects, 2nd order linear and 2nd order quadratic. This will help determine the appropriate resolution, and polynomial for your design.

We don’t know a priori which factors will have a quadratic effect. We have a strong suspicion for one based on biological knowledge, but we are hoping to get closer to the answer with this experiment. I am quite sure that we will not get all we need after this run to make a proper statistical model. We aim to get this done in an iterative manner.

Blocking may be useful

Would you be able to show exactly how this two-level blocking is done in JMP? Including the incubators and days as separate categorical factors feels wrong somehow. I tried setting up two blocks, but I don't understand how JMP calculates the number of runs in a block, and I can't seem to set a constraint that every incubator needs exactly 4 runs per day.

Split-plots partition noise by using a design factor, so this is not what you are suggesting.

Thanks for clearing this up!

Mark_Bailey · Jun 7, 2021 09:31 AM

You might consider modeling the logarithm of the standard deviation as a response, since batch reproducibility is an issue that you want to improve. This approach would require true replicates, and a large number of them, because variance estimates are not as efficient as estimates of the mean.
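A minimal sketch of how that response could be computed, assuming each design point has been run as several true replicates. The function name, point labels and numbers are hypothetical; the resulting ln(s) values would then be modeled as a response in their own right, one value per design point.

```python
import math
import statistics


def log_sd_by_point(replicates: dict[str, list[float]]) -> dict[str, float]:
    """Return ln(s) for each design point, where s is the sample standard
    deviation (n - 1 denominator) of that point's replicate responses."""
    result = {}
    for point, ys in replicates.items():
        if len(ys) < 2:
            # A standard deviation needs at least two true replicates
            raise ValueError(f"design point {point!r} needs at least 2 replicates")
        s = statistics.stdev(ys)
        if s == 0:
            # ln(0) is undefined; identical replicates need special handling
            raise ValueError(f"design point {point!r} has zero spread")
        result[point] = math.log(s)
    return result


# Hypothetical replicate yields for two design points
example = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0]}
print(log_sd_by_point(example))
```

The log scale is the usual choice because it keeps a model for the spread from predicting negative standard deviations.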

Mark_Bailey · Jun 7, 2021 09:51 AM

I might have missed the suggestion in the long thread, but are you aware of augmentation? JMP can take an existing design and augment it to provide more information (greater power, new effect types, et cetera). Augmentation is a powerful strategy to build your knowledge in a guided way without 'breaking the bank' at the beginning.

Mark_Bailey · Jun 7, 2021 09:26 AM

I just want to add a clarification about split-plot design (randomization) versus blocking runs (limited resource or external source of variation). Day is a classic case for blocking runs (optimally) and treating the effect as random: a day limits the number of runs you can do before you must move to a new day. Incubator might be treated as a block of runs, but it depends. I assume that the four flasks in an incubator represent four different treatments (combinations of easy-to-change factors) but share the same level of a factor like temperature (a hard-to-change factor). In that case, I would not include incubator as a blocking factor; I would simply treat temperature as a hard-to-change factor.

On the other hand, if your four flasks share the same treatment (replicates), then incubator would seem to be an appropriate blocking factor, because the four runs (flasks) are correlated.

If you have hard-to-change factors and you want to add a blocking factor, I think the best you can do is to add a categorical factor with as many levels as the number of blocks you need, and then add the Random attribute to that term in the Fit Model launch dialog. Custom Design does not allow hard-to-change factors and blocking factors in the same design.

HadleyMyers · Jun 5, 2021 04:26 AM

Hi,

Considering your description of the problem and the solution proposed by @statman in point #5, I created this design (table script below). I created a 32-run response surface design with 5 continuous factors using Custom Design, and then grouped the runs into random blocks of size 4 (corresponding to the number of flasks per incubator per day). In the table, I manually added a "Day" variable and adjusted "incubator" so that 1-4 repeat on both days. It would then be possible to account for the random variation by setting up the model as shown in the figure.

I'd be very interested to hear if @statman or anyone else has an opinion on this approach.

New Table( "day and incubator block design",
	Add Rows( 32 ),
	New Table Variable( "Design", "Custom Design" ),
	New Table Variable( "Criterion", "I Optimal" ),
	New Script(
		"Model",
		Fit Model(
			Effects(
				:incubator & Random, :X1 & RS, :X2 & RS, :X3 & RS, :X4 & RS,
				:X5 & RS, :X1 * :X1, :X1 * :X2, :X2 * :X2, :X1 * :X3, :X2 * :X3,
				:X3 * :X3, :X1 * :X4, :X2 * :X4, :X3 * :X4, :X4 * :X4, :X1 * :X5,
				:X2 * :X5, :X3 * :X5, :X4 * :X5, :X5 * :X5
			),
			Y( :Y )
		)
	),
	New Script(
		"Evaluate Design",
		DOE( Evaluate Design, X( Random Block, :X1, :X2, :X3, :X4, :X5 ) )
	),
	New Script(
		"Fit Mixed",
		Fit Model(
			Y( :Y ),
			Random Effects( :incubator ),
			Effects(
				:X1 & RS, :X2 & RS, :X3 & RS, :X4 & RS, :X5 & RS, :X1 * :X1,
				:X1 * :X2, :X2 * :X2, :X1 * :X3, :X2 * :X3, :X3 * :X3, :X1 * :X4,
				:X2 * :X4, :X3 * :X4, :X4 * :X4, :X1 * :X5, :X2 * :X5, :X3 * :X5,
				:X4 * :X5, :X5 * :X5
			),
			NoBounds( 1 ),
			Personality( "Mixed Model" ),
			Run
		)
	),
	New Script(
		"DOE Dialog",
		DOE(
			Custom Design,
			{Add Response( Maximize, "Y", ., ., . ),
			Add Factor( Continuous, -1, 1, "X1", 0 ),
			Add Factor( Continuous, -1, 1, "X2", 0 ),
			Add Factor( Continuous, -1, 1, "X3", 0 ),
			Add Factor( Continuous, -1, 1, "X4", 0 ),
			Add Factor( Continuous, -1, 1, "X5", 0 ), Set Random Seed( 1910031543 ),
			Number of Starts( 105 ), Add Term( {1, 0} ), Add Term( {1, 1} ),
			Add Term( {2, 1} ), Add Term( {3, 1} ), Add Term( {4, 1} ),
			Add Term( {5, 1} ), Add Term( {1, 2} ), Add Term( {1, 1}, {2, 1} ),
			Add Term( {2, 2} ), Add Term( {1, 1}, {3, 1} ),
			Add Term( {2, 1}, {3, 1} ), Add Term( {3, 2} ),
			Add Term( {1, 1}, {4, 1} ), Add Term( {2, 1}, {4, 1} ),
			Add Term( {3, 1}, {4, 1} ), Add Term( {4, 2} ),
			Add Term( {1, 1}, {5, 1} ), Add Term( {2, 1}, {5, 1} ),
			Add Term( {3, 1}, {5, 1} ), Add Term( {4, 1}, {5, 1} ),
			Add Term( {5, 2} ), Set Runs Per Random Block( 4 ),
			Set Sample Size( 32 ), Optimality Criterion( 2 ),
			Simulate Responses( 0 ), Save X Matrix( 0 ), Make Design}
		)
	),
	New Column( "Day",
		Character,
		"Nominal",
		Set Property( "Design Role", Design Role( "Random Block" ) ),
		Set Values(
			{"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
			"1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2",
			"2", "2", "2", "2"}
		)
	),
	New Column( "incubator",
		Character,
		"Nominal",
		Set Property( "Design Role", DesignRole( Random Block ) ),
		Set Property(
			"Value Order",
			{Custom Order( {"1", "2", "3", "4", "5", "6", "7", "8"} )}
		),
		Set Values(
			{"1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4",
			"4", "4", "1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3",
			"4", "4", "4", "4"}
		)
	),
	New Column( "X1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property( "Coding", {-1, 1} ),
		Set Property( "Design Role", DesignRole( Continuous ) ),
		Set Property( "Factor Changes", Easy ),
		Set Values(
			[-1, -1, 1, 0, 1, 0, -1, 0, 1, 0, -1, 1, -1, 0, 0, 1, 1, 0, -1, 1, -1, 1,
			-1, 1, -1, 1, 0, -1, 0, 1, 0, -1]
		)
	),
	New Column( "X2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property( "Coding", {-1, 1} ),
		Set Property( "Design Role", DesignRole( Continuous ) ),
		Set Property( "Factor Changes", Easy ),
		Set Values(
			[-1, 0, 1, 0, -1, 1, 1, 0, -1, 0, -1, 1, 1, -1, 0, 1, -1, 1, 0, -1, 0, 1,
			1, -1, -1, 0, 1, -1, 0, 1, -1, 1]
		)
	),
	New Column( "X3",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property( "Coding", {-1, 1} ),
		Set Property( "Design Role", DesignRole( Continuous ) ),
		Set Property( "Factor Changes", Easy ),
		Set Values(
			[-1, 0, 1, 0, -1, -1, 1, 0, 1, -1, 1, 0, -1, 0, 1, -1, 1, 1, -1, 0, 0, 1,
			-1, -1, 1, 1, 0, -1, 0, -1, 1, 1]
		)
	),
	New Column( "X4",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property( "Coding", {-1, 1} ),
		Set Property( "Design Role", DesignRole( Continuous ) ),
		Set Property( "Factor Changes", Easy ),
		Set Values(
			[1, -1, 1, 0, 1, 1, 0, 0, 1, -1, -1, 0, 1, 0, 1, 1, -1, -1, 0, -1, 1,
			-1, -1, 0, 1, 0, 0, -1, 1, -1, 0, 1]
		)
	),
	New Column( "X5",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property( "Coding", {-1, 1} ),
		Set Property( "Design Role", DesignRole( Continuous ) ),
		Set Property( "Factor Changes", Easy ),
		Set Values(
			[1, 0, -1, -1, -1, 0, -1, -1, 1, 1, 1, 0, -1, 0, 0, 1, -1, -1, 0, 1, 1,
			1, 1, 0, -1, 0, 1, -1, 0, -1, 1, 1]
		)
	),
	New Column( "Y",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Property(
			"Response Limits",
			{Goal( Maximize ), Lower( . ), Upper( . ), Importance( . )}
		),
		Set Values(
			[., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,
			., ., ., ., ., ., ., .]
		)
	)
)

HadleyMyers · Jun 7, 2021 01:52 AM

Hi again,

Thinking more about it, I'm wondering if the interaction effect of the batches is necessary. Assuming we aren't using it, the design allows us to estimate the effects of the factors while quantifying the variation of incubator and day.

Phil_Kay · Jun 7, 2021 05:31 AM

This is really interesting. You effectively have "nested" blocking factors.

Having said that, I think you can just keep it simple. Here is something that you can try...

Use Custom Design to create a 32-run RSM design for 5 continuous factors and an 8-level blocking factor (4 runs per block) for incubator. This will ensure that the blocks are minimally correlated with the effects that you really care about.

Then just run blocks 1-4 on day 1 in incubators 1-4, and blocks 5-8 on day 2 in incubators 1-4. You can manually add Day and Incubator columns to your data table in case you need them later.
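To make that bookkeeping explicit, here is a small sketch (Python, outside JMP, e.g. for printing a run sheet) of the block-to-day/incubator assignment just described. The function and constant names are hypothetical; it assumes the blocks are numbered 1-8 with 4 runs each, as produced by Custom Design with 4 runs per block.

```python
N_INCUBATORS = 4           # incubators available each day
FLASKS_PER_INCUBATOR = 4   # runs per block = flasks per incubator
N_DAYS = 2


def block_to_day_incubator(block: int) -> tuple[int, int]:
    """Map a 1-based block number to (day, incubator):
    blocks 1-4 -> day 1, incubators 1-4; blocks 5-8 -> day 2, incubators 1-4."""
    if not 1 <= block <= N_DAYS * N_INCUBATORS:
        raise ValueError(f"block must be in 1..{N_DAYS * N_INCUBATORS}, got {block}")
    day = (block - 1) // N_INCUBATORS + 1
    incubator = (block - 1) % N_INCUBATORS + 1
    return day, incubator


def run_sheet(blocks: list[int]) -> list[tuple[int, int, int]]:
    """Given the block label of every run (in any order), return
    (block, day, incubator) rows, checking that no block exceeds one incubator's capacity."""
    counts: dict[int, int] = {}
    rows = []
    for b in blocks:
        counts[b] = counts.get(b, 0) + 1
        if counts[b] > FLASKS_PER_INCUBATOR:
            raise ValueError(f"block {b} has more than {FLASKS_PER_INCUBATOR} runs")
        day, incubator = block_to_day_incubator(b)
        rows.append((b, day, incubator))
    return rows


if __name__ == "__main__":
    # 8 blocks x 4 runs, as in the 32-run design
    design_blocks = [b for b in range(1, 9) for _ in range(FLASKS_PER_INCUBATOR)]
    for row in run_sheet(design_blocks):
        print(row)
```

Because the mapping depends only on the block label, it gives every incubator exactly 4 runs per day regardless of the run order within the table.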

Once you have collected that data you can fit the RSM model for the 5 factors plus a random effect for the blocks. The advantage of using a random effect is that it does not consume all the degrees of freedom that a fixed factor effect does.
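One way to write the model described here, as a sketch in my own notation (not JMP output): i indexes the 8 incubator-day blocks, j the 4 runs within a block, \mathbf{x}_{ij} holds the RSM terms for the 5 factors and \boldsymbol{\beta} their coefficients.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),
\qquad i = 1, \dots, 8, \; j = 1, \dots, 4
```

Treated as a fixed factor, the blocks would add 7 coefficients (8 levels minus 1) to the fixed part of the model; as a random effect, they are summarized by the single variance component sigma_b^2.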

Now you could get fancy and fit the RSM model for 5 factors plus a random effect for Day and a random effect for Incubator nested within Day. Strictly speaking that is probably a more appropriate model.
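A sketch of that nested model in my own notation (an assumption, not JMP output): i = 1, 2 indexes days, j = 1, ..., 4 incubators within a day, k = 1, ..., 4 flasks within an incubator, and \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} is the RSM part for the 5 factors.

```latex
y_{ijk} = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + d_i + u_{j(i)} + \varepsilon_{ijk},
\qquad d_i \sim N(0, \sigma_d^2), \quad u_{j(i)} \sim N(0, \sigma_u^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

With only two days, sigma_d^2 rests on two levels (one degree of freedom), so it will be estimated very imprecisely. A single random block effect for the 8 incubator-day combinations absorbs d_i + u_{j(i)} together.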

However, I think this is probably overkill. I doubt that it will make any important difference to your conclusions. The simple model with a random effect for block should be able to adequately capture all the variance due to differences between incubators and days.

I have attached an example design with the 2 models saved. The Y response is just random.

I hope this helps.

Phil

Phil_Kay · Jun 7, 2021 07:59 AM

Just to repeat: while there are different approaches that you might take for analysis, the important thing in the design step is to include the blocking factor that will give you the 8 blocks of 4 runs.
