Re: Specifying Random Block factor in classic DOE designs

hxnduke · Apr 6, 2020 05:13 PM

Hi all,

When generating a classic DOE design in JMP, any block factor defaults to a fixed block factor. I know I can change it to a random block effect, and so use the REML method, by adding the "random effect" attribute to it in the Fit Model platform. However, I noticed that I can also change its design role to "Random block", yet it is then not automatically assigned the "random effect" attribute in Fit Model. So is there any difference between "Blocking" and "Random block" as design role options? The only difference I've noticed is that the block factor is shown in the Prediction Profiler when it's assigned "Random block" but not when it's "Blocking". Also, the Fit Definitive Screening platform only works with the "Blocking" design role, not "Random block".

Thank you.

cwillden · Apr 6, 2020 05:50 PM

Fixed blocks and random blocks are treated differently in the design optimization, and you should use the type of block that corresponds to how you will fit your model. They consume degrees of freedom differently, so the type of block you select affects how points are allocated in your design space. Notice the difference in minimum and recommended design sizes between the two approaches. Fixed blocks generally require more runs.

There are some reasons you might treat a block as fixed, but I find that random blocks make more sense in most contexts that I've seen, particularly if your objective is to account for the variation caused by the blocks and you don't need to make any inference about their effects. For example, if I include multiple lots of raw material in my design, I don't care that Lot A increases the average response by 1.5 and Lot B by -0.6, because once Lots A and B are consumed, they are gone forever and we'll never have another Lot A or Lot B. I only want the lots I use in the design to be representative of the typical variation in raw material lots I will encounter in the future.
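To make the lot example concrete, here is a minimal Python sketch (not JMP, and not REML itself) that estimates the lot-to-lot and within-lot variance using the classical ANOVA method-of-moments estimator for a balanced layout. The lot labels, the responses, and the function name `variance_components` are all made up for illustration. For balanced data the result should generally agree with a REML fit as long as the between-lot estimate is not negative.

```python
from statistics import mean

def variance_components(lots):
    """ANOVA (method-of-moments) estimates for a balanced one-way
    random-effects layout.

    lots: dict mapping a lot label to its list of responses (equal lengths).
    Returns (between_lot_variance, within_lot_variance).
    """
    groups = list(lots.values())
    k = len(groups)        # number of lots (blocks)
    n = len(groups[0])     # runs per lot
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced layout")
    grand = mean(y for g in groups for y in g)
    # Mean square between lots and mean square within lots.
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((y - mean(g)) ** 2 for g in groups for y in g) / (k * (n - 1))
    # The moment estimator can go negative by chance, so truncate at zero.
    between = max((ms_between - ms_within) / n, 0.0)
    return between, ms_within

# Hypothetical responses for three raw-material lots (made-up numbers).
lots = {"A": [10.0, 12.0], "B": [14.0, 16.0], "C": [9.0, 11.0]}
between, within = variance_components(lots)
print(between, within)
```

For this data the sketch gives a lot-to-lot variance of 6.0 and a within-lot variance of 2.0. The individual lot means are never reported, only the spread they represent.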

-- Cameron Willden

hxnduke · Apr 6, 2020 06:05 PM

Thanks for clarifying the difference between the two blocking effects. I agree that random block effect makes the most sense in most scenarios, at least those relevant to my studies.

What I'm curious about is how to assign the block factor as "random block" when generating a classic DOE design or a DSD. In the custom DOE platform, you can specify this pretty easily. However, in classic DOE/DSD, the block factor defaults to a fixed effect and you can only change its attribute after the design table is generated (at least as far as I'm aware).

Does it make sense to use a random block effect and the REML method for a classic DOE design, or is this something specific to custom design?

cwillden · Apr 6, 2020 06:59 PM

It's not really something specific to optimal DOEs, so you can totally analyze a blocked classical design with random blocks even if it defaults to fixed. There's not really any downside there. That being said, if you are doing a DSD, I highly recommend using Fit Definitive Screening for the analysis, where I don't think you'll be able to coerce it into treating Block as random.

-- Cameron Willden

hxnduke · Apr 6, 2020 09:21 PM

Since you mentioned that a fixed block consumes more DOF than a random block, would changing the block factor attribute to "random effect" in an existing data table generated via classical DOE increase the power for the parameter estimates if there are multiple levels of the block factor?

Mark_Bailey · Apr 7, 2020 07:58 AM

There are many cases for which the DSD is not appropriate. One of these cases is when you have random effects due to blocking or split-plot randomization.

See Bradley Jones's blog post about this issue.

statman · Apr 7, 2020 10:25 AM

I will offer a different perspective regarding blocks, block effects and block-by-factor interactions. The concept of blocking (replication) goes back to Sir Ronald Fisher. The idea of a block is to minimize noise within the block (hold it constant) and maximize the noise between the blocks (select blocks purposefully to make sure the noise that was held constant within the block changes between the blocks). Doing this increases the precision of detecting factor effects (e.g., reduces the MSE) while simultaneously increasing inference space (the study is replicated over changing noise).

When treating blocks as a fixed effect, you are able to assign the effect of the noise and, more importantly, determine whether the effects of the experimental factors repeat over changing noise. The latter is estimated by block-by-factor interactions. In industrial experimentation and product design, robust means the absence of noise-by-factor interactions (the effects of the factors/interactions are the same over changing noise). If not, you will run into problems over time. Using the example of lot-to-lot incoming materials: if there is an effect (factor or interaction) that depends on the lot of incoming material, you want to identify this early in design so you can design your product to be robust to incoming material. If you confound the raw material with the block and treat the block as a fixed effect, you can estimate the effect of the variables in the block and the block-by-factor interactions. Unfortunately, most software programs confound the block-by-factor interactions with the error term. You will need to specify those effects manually (write your own model).

I'll quote Dr. Box: "Block what you can, randomize what you cannot". My interpretation of this quote is that if you can identify the noise, you are better off "blocking it". Of course, if you can't identify the noise, you randomize. I have had several discussions with Brad Jones over this topic...
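As a rough illustration of what a block-by-factor interaction measures, here is a small Python sketch for a 2x2 factorial replicated in two blocks, with every effect computed as a simple contrast. The data and the helper name `effect` are invented for the example, and this is plain arithmetic rather than anything JMP does internally.

```python
from statistics import mean

# Each run: (block, A, B, response); coded levels are -1 / +1. Made-up data.
runs = [
    (-1, -1, -1, 10.0), (-1, +1, -1, 14.0), (-1, -1, +1, 11.0), (-1, +1, +1, 15.0),
    (+1, -1, -1, 12.0), (+1, +1, -1, 20.0), (+1, -1, +1, 13.0), (+1, +1, +1, 21.0),
]

def effect(contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    hi = [y for *x, y in runs if contrast(*x) > 0]
    lo = [y for *x, y in runs if contrast(*x) < 0]
    return mean(hi) - mean(lo)

a_effect = effect(lambda blk, a, b: a)          # main effect of factor A
block_effect = effect(lambda blk, a, b: blk)    # shift between the two blocks
block_by_a = effect(lambda blk, a, b: blk * a)  # does A's effect change by block?

# The same question asked directly: the effect of A inside each block.
a_in_block = {
    blk: mean(y for k, a, _, y in runs if k == blk and a > 0)
       - mean(y for k, a, _, y in runs if k == blk and a < 0)
    for blk in (-1, +1)
}
print(a_effect, block_effect, block_by_a, a_in_block)
```

With these numbers the effect of A is 4.0 in the first block and 8.0 in the second, so the block-by-A contrast is 2.0 (half the difference). If that term is left out of the model, the same variation ends up in the error term.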

"All models are wrong, some are useful" G.E.P. Box

hxnduke · Apr 7, 2020 10:58 AM

Thanks for providing the historical perspective and your thoughts on this matter. What are your thoughts on our specific situation, described below?

Just to clarify, in our planned studies, we have to divide the runs into 2 blocks due to a limited number of machines. We're planning to run the 2 blocks on the same set of machines but on separate days, so the blocking factor here would be day-to-day variability. I'm assuming that this should be a random effect, but I'm wondering whether it's worth blocking at all.

To add, we're also planning on including at least 1 centerpoint run in each block, which might help with estimating the block-to-block variance.

hxnduke · Apr 7, 2020 10:52 AM

In Bradley Jones's blog post, he mentions that a DSD is not appropriate for a split-plot design, but I don't really see him saying that random effects in general cannot be handled.
In this paper, Jones and Nachtsheim do provide methods for DSD blocking with random effects: add two centerpoint runs in the same block to estimate the residual variance, and another one in another block to estimate the random block variance.

Just to clarify, in our planned studies, we have to divide the runs into 2 blocks due to a limited number of machines. We're planning to run the 2 blocks on the same set of machines but on separate days, so the blocking factor here would be day-to-day variability. I'm assuming that this should be a random effect, but I'm wondering whether it's worth blocking at all.

statman · Apr 7, 2020 11:58 AM

hxnduke,

Some words of wisdom from one of my mentors,

"All models are wrong, some are useful" (G.E.P. Box)

"Two equally competent investigators presented with the same problem would typically begin from different starting points, proceed by different routes, and yet could reach the same answer. What is sought is not uniformity but convergence." (Box, Hunter and Hunter, Statistics for Experimenters)

Unfortunately, I don't understand your situation based on the explanations you have posted. I would need to understand what questions you are trying to answer, what response variables you are trying to model, what factors you are experimenting on, etc. In general, when I use blocks in industrial experimentation, I spend a good deal of effort identifying and understanding the noise (using critical thinking and process maps). The purpose of experimentation is to have the experiment be representative of the conditions we are trying to draw conclusions over (inference space). This presents a dilemma: the more representative the experiment, the greater the noise in the experiment and the more difficult it is to detect factor effects. Historically, folks have made choices to hold noise constant to improve the precision of the design (Box's word for detecting factor effects). Unfortunately, this has a hugely negative effect on inference space. So the challenge of making the experiment more representative of reality while increasing design precision is an extremely important piece of work. There are a number of strategies to accomplish this: Repeats (allow for estimating short-term noise such as measurement error), Replicates (Randomized and RCBD, BIB), Split-plots, etc. If you can identify the noise a priori, you have more efficient and effective methods at your disposal. If you cannot identify the noise, your only choice is to randomize, which appears to be your case.

Center point runs can be very efficient and useful, but the factors must be quantitative. They provide the ability to assess general curvature over the design space (a degree of freedom to assign the quadratic effect) and, if used properly, a look at stability over the design space. If you replicate the center points, some would say you have a reasonably good estimate of MSE (the thinking is that the center point might be current conditions, so you have a set of data at current conditions to test the factor effects against). If you run multiple center points randomly over the design space, you also might be able to assess stability over the design space.
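A minimal Python sketch of the usual textbook curvature check may help here: compare the average of the factorial runs with the average of the center points, and use the spread of the replicated center points as a pure-error estimate. The responses and the function name `curvature_check` are made up, and the sketch assumes all runs come from one block (it ignores any block shift).

```python
from statistics import mean, variance

def curvature_check(factorial_y, center_y):
    """Single-degree-of-freedom curvature check using replicated center points.

    Returns (difference, pure_error_variance, f_ratio), where
    difference = mean(factorial runs) - mean(center runs).
    """
    n_f, n_c = len(factorial_y), len(center_y)
    diff = mean(factorial_y) - mean(center_y)
    # Sum of squares for pure quadratic curvature (1 degree of freedom).
    ss_curvature = n_f * n_c * diff ** 2 / (n_f + n_c)
    # Sample variance of the center replicates: pure error with n_c - 1 df.
    pure_error = variance(center_y)
    return diff, pure_error, ss_curvature / pure_error

# Hypothetical responses: four factorial corners and three center replicates.
factorial_y = [10.0, 14.0, 11.0, 15.0]
center_y = [13.0, 14.0, 13.5]
diff, pure_error, f_ratio = curvature_check(factorial_y, center_y)
print(diff, pure_error, f_ratio)
```

The F ratio would be compared against an F distribution with 1 and n_c - 1 degrees of freedom. With only a few center points that reference distribution is very wide, so the check is crude.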

"All models are wrong, some are useful" G.E.P. Box
