QW
Level III

Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Hello,

 

I have been analyzing data for an experiment measuring growth of human cells with 3 factors: 2 of which are independent variables that were controlled for, and the 3rd variable being the donor that the human cells came from (of which there are two). I have an n = 8, and the design was a 2^3 factorial. However, I'm currently finding that there are 3 ways to approach analysis of the data with regard to the variable "Donor":

 

1. I can simply include it as a categorical 2 level factor, in which case it enters the model and also shows up in the prediction profiler. As it turns out, there are interactions between "Donor" and the other two factors, which are conclusions that can be easily visualized as well. However, having this in the prediction profiler isn't particularly helpful because donor variability is something I will always have to deal with - I can't just keep 'maximizing' by going back to that same donor over and over.

 

2. I can change the 'Design Role' to a blocking variable. However, I notice that when I fit a model, I get essentially the same model, except that "Donor" no longer shows up in the Prediction Profiler. This is nice because I can see what the other 2 factors (the actual, controllable independent variables that I'd like to manipulate) are contributing. I read in an earlier post that blocking variables can only appear as main effects, but that does not seem to be a hard restriction, since I am still able to model interactions. At this moment, I consider this the best approach.

 

3. Given that the 2 donors I chose are a subset of the infinitely large population of people present and future, I would probably consider this blocking variable to actually be "Random" and not "Fixed". However, based on some reading it seems like you do need around 5+ levels of the blocking variable for the estimation of variance to be accurate. As such, given that my dataset is small (n = 8, only 2 levels of the blocking variable), this may not be ideal.

 

Is my thinking on the right path?
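
As a rough illustration of the concern in point 3, here is a minimal sketch of how imprecise a variance estimate is when it rests on only a few levels. It assumes the donor effects are normally distributed and observed without additional error, in which case the sample variance from k donors has a relative standard deviation of sqrt(2 / (k - 1)). The function name `variance_estimate_cv` is made up for this example and is not a JMP feature.

```python
import math

def variance_estimate_cv(levels: int) -> float:
    """Relative SD (coefficient of variation) of the sample variance
    computed from `levels` independent normal observations.

    Assumption: s^2 follows sigma^2 * chi-square(k - 1) / (k - 1),
    which holds for normally distributed donor effects.
    """
    if levels < 2:
        raise ValueError("need at least 2 levels to estimate a variance")
    return math.sqrt(2.0 / (levels - 1))

# Compare a 2-donor study with larger hypothetical donor panels
for k in (2, 3, 5, 10, 20):
    print(f"{k:>2} donors: relative SD of variance estimate = "
          f"{variance_estimate_cv(k):.2f}")
```

With 2 donors the variance estimate has a single degree of freedom, so its relative uncertainty is larger than the quantity being estimated. That is consistent with the reluctance to fit "Donor" as a random effect here.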

 

16 REPLIES
Victor_G
Super User

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Hi @QW,

 

Your thinking is on the right path. A categorical factor and a blocking factor represent different things and serve different purposes: a categorical factor is a fixed effect that you can change independently in your experiments, whereas a blocking factor is typically a random effect with either a known number of levels (day of the week) or an unknown number of levels (donor characteristics).
I would suggest taking a look at earlier threads that dealt with this topic (with excellent responses from @Mark_Bailey and @Dan_Obermiller), as they may help you choose the most reasonable option in your case:

 

There is also a recorded webinar about blocking in DoE: Using Blocking When Designing Experiments - JMP User Community

Hope this first answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
QW
Level III

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Thanks, @Victor_G . I read through the answers given, but I am still confused about one thing regarding blocking vs. simply adding a categorical factor.

 

In the custom designer, if I have 8 runs and 2 variables, and choose to group them into blocks of size 4, I basically get a full factorial design where each block is orthogonal. If I instead choose to include my blocking variable as a categorical variable, I once again get a full factorial design, where each level of my categorical variable is orthogonal. In that sense, there appears to be no practical difference between assigning my nuisance variable as a blocking factor vs. just calling it a categorical.

 

To me, it then seems like the only point of blocking is to:

1. Remove block:factor interactions from the custom design dialog, since the idea is that the within-block factor effects should be independent of which block a run is in.

2. Make the prediction profiler simpler to understand, since you don't really care to find the 'best level' of your block.

 

 

Phil_Kay
Staff

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Hi @QW ,

There is lots of good advice in the responses so far.

"...there appears to be no practical difference between assigning my nuisance variable as a blocking factor vs. just calling it a categorical."

Correct. In this case there will be no difference in the design.

Whether you specify the donor factor as a blocking factor or a categorical factor, for 8 runs the 2^3 full factorial is the optimal solution with 0 correlation of the factor effects (orthogonal).
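
A minimal sketch of that orthogonality check, in plain Python rather than JMP. It builds the coded 2^3 full factorial and confirms that every pair of columns has a zero dot product. The column names are just labels for this example, and the third column can be read as either "Donor" or "Block" since the run table is the same.

```python
from itertools import product

# Coded 2^3 full factorial: each run is (X1, X2, Donor) at -1 / +1.
runs = list(product([-1, 1], repeat=3))
names = ["X1", "X2", "Donor"]

def column(j):
    """Return column j of the design as a list of -1 / +1 settings."""
    return [run[j] for run in runs]

def dot(a, b):
    """Dot product of two equal-length columns."""
    return sum(x * y for x, y in zip(a, b))

# Each column is balanced (sums to zero) ...
column_sums = {names[j]: sum(column(j)) for j in range(3)}

# ... and every pair of columns is orthogonal (zero dot product),
# which for balanced columns means zero correlation.
off_diagonal = {
    (names[i], names[j]): dot(column(i), column(j))
    for i in range(3)
    for j in range(i + 1, 3)
}

print(column_sums)
print(off_diagonal)
```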

I hope that helps.

Phil 

 

statman
Super User

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

I have a different "take" on the questions you pose. Blocking is one method to handle noise in an experiment. Noise consists of the factors you are not willing to manage in the future (either because you don't have the technology, the cost is prohibitive, or it is simply inconvenient to manage). Blocks are meant to aggregate large chunks of noise (you can have many noise variables confounded with the Block effect). Blocks increase the inference space while not decreasing the design precision (in fact, the blocks may increase the design precision). At the end of the day you might be interested in the size of the effect of the Block, but you cannot select a "best" level. This, of course, is not true for a categorical factor. You can choose one level over another.

With respect to Blocking, if you have identified what the noise is (this is an important step), you have the option of treating the noise as a fixed effect. With this strategy you can assess the size of the Block effect, with the added bonus of being able to assess whether the design factor effects are consistent over changing noise (as quantified by block-by-factor interactions). This is the true measure of design factor robustness. The question is how to represent this noise in the experiment. The same advice applies as for the factors under investigation. Since the experiment is done on a small scale (relatively narrow inference space), you need to exaggerate effects to increase the inference space. This is why in screening experiments you set factors to bold levels. This is true of the noise effects as well: exaggerate the noise in your design space. For example, you might run one block where ambient conditions are cold and dry and the second block where they are warm and humid. Why would donors be different? What physical characteristics of the donor might have an impact on the response variables? Why might your design factors interact with donor? The answers to these questions form the basis of the hypotheses that can assist in determining how to select donors.

Now, if you have not identified the noise, you are left with treating the noise as a random effect.  Since you have no idea of how representative the noise is of future conditions, you likely have to increase sample size to increase the confidence in extrapolating the results into the future.

 

"Block what you can, randomize what you cannot" G.E.P. Box. (Block the noise that has been identified, use randomization where you have not identified the noise)

"All models are wrong, some are useful" G.E.P. Box
QW
Level III

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Thanks, @statman . With respect to this sentence: "With this strategy you can assess the size of the Block effect, and the added bonus of being able to assess if the design factor effects are consistent over changing noise (As quantified with block-by-factor interactions)." Are you saying that while JMP doesn't consider block:factor interactions to be a characteristic of a blocking variable (i.e. this would be considered a characteristic of a categorical factor not a blocking variable), you should try to fit block:factor interactions anyway?

 

Secondly, regarding the point: "Now, if you have not identified the noise, you are left with treating the noise as a random effect.  Since you have no idea of how representative the noise is of future conditions, you likely have to increase sample size to increase the confidence in extrapolating the results into the future." There are so many possible sources of donor variability that I am seeking to simply confound them into one variable, rather than trying to determine what specifically it is about the donors that causes variability. In that case, it would be most logical to treat this as a random effect. However, since manpower usually limits the number of donors I can screen to 2 or 3, I probably need to model donors as fixed effects ("I find these specific donors interesting on a case by case basis"). Are there rules of thumb or statistics for determining how many donors I would need to screen before I can make confident predictions about the noise associated with donor variability and extrapolate them to the greater population?

 

Thanks.

statman
Super User

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Let me be as clear as I can.  It is not JMP that decides how analysis should proceed, it is the user.  Many enumerative statisticians consider block effects to be random effects.  I take a more analytical approach to understanding causal structure.  When I have done due diligence to identify the noise, there is, IMHO, a more effective means of understanding the noise and the ramifications of noise.  Yes, I would include block and all block-by-factor interactions in the saturated model for RCBD.

see: Doug Sanders, Mary G. Leitnaker & Robert A. McLean (2002) Randomized Complete Block Designs in Industrial Studies, Quality Engineering, 14:1, 1-8, DOI: 10.1081/QEN-100106880
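
To make the saturated model concrete, here is a minimal sketch in plain Python (not JMP output). It estimates every term of a 2^3 model, including the block-by-factor interactions, by contrasts. The responses come from the made-up function `fake_response`, with coefficients invented purely for illustration, and the third column stands for the block (Donor). With 8 runs and 8 terms there are no degrees of freedom left for error, so this only gives effect sizes, not significance tests.

```python
from itertools import combinations, product
from math import prod

names = ("X1", "X2", "Donor")
runs = list(product([-1, 1], repeat=3))  # coded (X1, X2, Donor)

def fake_response(x1, x2, donor):
    """Hypothetical response with invented coefficients (illustration only)."""
    return 10 + 2 * x1 + 1 * x2 + 3 * donor + 1.5 * x1 * donor

y = [fake_response(*run) for run in runs]

# All 8 terms of the saturated model: intercept, 3 main effects,
# 3 two-factor interactions and the three-factor interaction.
terms = [subset for size in range(4) for subset in combinations(range(3), size)]

coefficients = {}
for term in terms:
    # Contrast column = product of the coded factor columns in the term
    # (the empty product is 1, which gives the intercept column).
    contrast = [prod(run[i] for i in term) for run in runs]
    label = "*".join(names[i] for i in term) or "Intercept"
    # The design is orthogonal, so each coefficient is contrast . y / n.
    coefficients[label] = sum(c * yi for c, yi in zip(contrast, y)) / len(runs)

for label, value in coefficients.items():
    print(f"{label:<12} {value:6.2f}")
```

In this made-up data set the X1*Donor coefficient is nonzero, which is the kind of block-by-factor interaction that would signal the X1 effect is not consistent across donors.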

Regarding your second paragraph, it is unlikely that you will be able to infer over ALL donors by sampling just two of them. You might have hypotheses regarding why donors would affect the response (e.g., age, underlying conditions, sex, genetics...). If you can select donors that would capture the extremes of the donor conditions (like bold level setting), you might be able to increase the inference space sufficiently to have the results of your study be useful in the future. However, if this is not possible, you will need to capture the donor-to-donor variation over a much larger sample.

The rule of thumb:

“Unfortunately, future experiments (future trials, tomorrow’s production) will be affected by environmental conditions (temperature, materials, people) different from those that affect this experiment…It is only by knowledge of the subject matter, possibly aided by further experiments (italics added) to cover a wider range of conditions, that one may decide, with a risk of being wrong, whether the environmental conditions of the future will be near enough the same as those of today to permit use of results in hand.”

Dr. Deming


Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Since there are interactions, I'd be tempted to utilize your solution 1, and treat the donor as a factor, especially as the profiler can then help visualize these interactions. 

However, having this in the prediction profiler isn't particularly helpful because donor variability is something I will always have to deal with - I can't just keep 'maximizing' by going back to that same donor over and over.

Alt-clicking on the donor in the profiler can mitigate the maximizing problem if you select "Lock Factor Setting." This will make the profiler maximize the other factors, while keeping the donor from moving. Of course, I might be misreading what you wrote, and locking the factor setting may be exactly what you're trying to avoid.

 


 

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

Others addressed the issues well. I want to address only the idea of excluding the blocking factor from interaction effects. It is a bit of a philosophical decision. The design assigns a block of runs to one level of a noise factor, such as Day or Lot. Consider the block effect to be a local adjustment to the intercept. It is a change from the overall mean response to account for a shift in a block. We exclude interactions because they indicate that the fixed effects of factors change. They aren't supposed to do that. The relationship between the response and the factors should be the same in every block. If that is not the case, then there must be lurking, unknown factors that change between blocks and contribute more fixed effects. So it is not the math or the linear model; it is a scientific point of view behind the exclusion.
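
A minimal sketch of that point of view, using invented numbers rather than real data. If the block only shifts the intercept, the X1 effect computed within each donor is identical. Adding a donor-by-X1 term makes the within-donor effects differ. Both response functions and their coefficients are hypothetical.

```python
from itertools import product
from statistics import mean

runs = list(product([-1, 1], repeat=3))  # coded (X1, X2, Donor)

def x1_effect_by_donor(response):
    """X1 effect (mean at +1 minus mean at -1) computed within each donor."""
    effects = {}
    for donor in (-1, 1):
        high = [response(*r) for r in runs if r[2] == donor and r[0] == 1]
        low = [response(*r) for r in runs if r[2] == donor and r[0] == -1]
        effects[donor] = mean(high) - mean(low)
    return effects

def intercept_shift_only(x1, x2, donor):
    """Hypothetical response: the block only moves the intercept."""
    return 10 + 2 * x1 + 1 * x2 + 3 * donor

def with_donor_interaction(x1, x2, donor):
    """Hypothetical response: the X1 effect depends on the donor."""
    return intercept_shift_only(x1, x2, donor) + 1.5 * x1 * donor

print("intercept shift only:", x1_effect_by_donor(intercept_shift_only))
print("with interaction:    ", x1_effect_by_donor(with_donor_interaction))
```

Equal within-donor effects match the "local adjustment to the intercept" picture. Unequal ones are what a block-by-factor interaction term would capture.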

statman
Super User

Re: Are fixed blocking variables the same thing as a categorical variable, just that they don't appear in the prediction profiler?

I may not understand your point @Mark_Bailey?  Um, "they aren't supposed to do that"?  Guess what, this happens frequently in the real world.  This is an indication the product or process design is not robust.  In order to create robustness, you must experiment on the design factors while the noise is changing.

 

Are you suggesting that block-by-factor interactions cannot exist from a scientific point of view? If so, I must disagree. Let me give one example (from hundreds I have worked on). Design an ink for ball point pens. The delivery system for the ink is a function of gravity and capillary action. The user holds the pen to a substrate (e.g., paper) and the ball moves "into" the pen and allows ink to flow. The substrate absorbs the ink. There are a number of design factors associated with the product (e.g., geometries, dimensions, materials, chemistry). The following is a short list of noise associated with the operation. None of these can be controlled by the pen manufacturer.

Angle the pen is held at

Pressure applied to the pen

Absorption ability of the substrate

Ambient conditions 

 

Historically, these "factors" were held constant while performing the "Write Test" to quantify performance measures of the ink/pen assembly. Designs were being "optimized" for a situation that virtually never happens in the real world. What we discovered through experimentation is that the performance of the design factors depends on noise (this was discovered originally through RCBD and then optimized through Cross Product Arrays (split-plots)). Quite rational and scientifically reasonable. Customer complaints arose because the pen had not been tested under the conditions in which it would be used.

 
