What is a covariate in design of experiments?

Ryan_Lekivetz · Feb 25, 2021 09:28 PM

Throughout my time at JMP, I have had many design problems that needed the use of covariates, both for my own problems and to help customers find a solution to their design problem. What I have noticed is that many customers, even very experienced ones, are not aware of the covariate option in the Custom Design platform.

To start off, I want to point out it has lived right in the Add Factor dropdown in Custom Design since JMP 10, and you may not have even noticed until now:

What is a covariate?

Part of the confusion stems from the use of the term covariate. In some contexts, you will see it used as an effect to control for, but not of primary interest (ANCOVA), and you may even see it loosely used for any independent factor in defining a model.

In a designed experiment, a covariate is an input variable that we want to account for but cannot set to a value of our choosing, the way we can for other types of factors. However, if we can measure the values of such inputs ahead of time, we can account for them when designing the experiment.

When would I use a covariate?

If you think of a covariate from the standpoint of “uncontrolled, but observable ahead of time,” a few different use cases come up. A term you will often hear alongside covariates is “candidate set”: in Custom Design, the covariate factors are specified from a data table, and the rows of that table form the candidate set of runs for the Custom Designer to use. Very often, we also have controllable factors (which can take on any value in their range) whose settings we let the Custom Designer pick as it sees fit, the way we usually define factors.

Broadly speaking, I tend to break the use of covariates into two cases:

Using a subset of the rows.
Using all the rows.

Using a subset of rows

When using the Custom Designer to select a subset of the covariate set, the idea is to use the values of the covariates that we can measure ahead of time to pick the best runs from the candidate set according to the experimental goal. This ends up being much more efficient than simply taking a random sample.
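
To make the idea of picking the best rows concrete, here is a minimal Python sketch of a row-exchange search that selects a subset of a candidate set by maximizing det(X'X) for an intercept plus main-effects model. It is an illustration only, not JMP's implementation: the function names, the simulated covariate values, and the choice of D-optimality with a main-effects model are all assumptions made for the example.

```python
import random

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def d_criterion(rows):
    """det(X'X) for an intercept + main-effects model (assumed model)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    return det(xtx)

def select_rows(candidates, n_runs, seed=0):
    """Start from a random subset, then swap rows in while the criterion improves."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(candidates)), n_runs)
    best = d_criterion([candidates[i] for i in chosen])
    improved = True
    while improved:
        improved = False
        for pos in range(n_runs):
            for j in range(len(candidates)):
                if j in chosen:
                    continue
                trial = chosen[:pos] + [j] + chosen[pos + 1:]
                val = d_criterion([candidates[i] for i in trial])
                if val > best * (1 + 1e-9):
                    chosen, best, improved = trial, val, True
    return sorted(chosen), best

# Hypothetical candidate set: two covariates measured on 30 available units.
rng = random.Random(1)
candidates = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]

rows, value = select_rows(candidates, n_runs=6)
random_rows = random.Random(0).sample(range(30), 6)
print("selected rows:", rows, "criterion:", round(value, 3))
print("random sample criterion:",
      round(d_criterion([candidates[i] for i in random_rows]), 3))
```

Because the search only accepts swaps that increase the criterion, the selected subset is never worse than the random sample it starts from, which is the sense in which a designed subset beats a random one.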

For an excellent example, I like to point people to Chapter 9 of Optimal Design of Experiments: A Case Study Approach. If you do not have a copy of the book, you can read an outline of the idea in an earlier blog post from Bradley Jones.

Another common use case is to provide a candidate set that enforces some constraint on the design space. While this can be done using factor constraints from the Custom Designer, the candidate set approach is useful when the region is quite complex, or if you want the runs of an experiment to have a certain structure. For example, you might want continuous variables to take on only five distinct values or restrict the number of non-zero factors in any given experimental run.
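
As a rough sketch of how such a structured candidate set could be generated outside JMP, the Python below enumerates every run in which each of four continuous factors takes one of five values, then keeps only the runs with at most two non-zero factors. The factor names, the five levels, and the limit of two are assumptions chosen for illustration; the resulting CSV text could be saved and opened as a data table to load as covariates.

```python
import csv
import io
from itertools import product

# Assumed levels: each continuous factor may take only these five values.
LEVELS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def build_candidates(n_factors, max_nonzero, levels=LEVELS):
    """All level combinations with at most `max_nonzero` non-zero factors."""
    return [run for run in product(levels, repeat=n_factors)
            if sum(v != 0 for v in run) <= max_nonzero]

def to_csv_text(runs, names):
    """Render the candidate set as CSV text, one run per row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(names)
    writer.writerows(runs)
    return buf.getvalue()

candidates = build_candidates(n_factors=4, max_nonzero=2)
print(len(candidates))  # 113 of the 625 unrestricted combinations remain

csv_text = to_csv_text(candidates, ["X1", "X2", "X3", "X4"])
print(csv_text.splitlines()[0])  # X1,X2,X3,X4
```

The filter is an ordinary Python predicate, so any structure that is awkward to express as a factor constraint can be encoded by changing that single condition.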

Using all the rows

Like the subset case described above, this can occur when all our experimental units have already been chosen and we can measure some uncontrollable values before designing the experiment. The difference is that every row of the covariate table appears in the design.

A common use case that may be less obvious is to force a desired structure for a subset of the factors.

For instance, suppose you are designing a 12-run experiment with one two-level categorical factor, X1, with levels A and B, and four continuous factors, X2-X5, with the added restriction that 1/3 of the runs must be at level A of X1 and 2/3 at level B.

All we need to do is create a data table with 12 rows and a single column labeled with the name of the categorical factor. Set four rows to A and eight rows to B.

In Custom Design, choose Add Factor->Covariate, and then add the remaining four continuous factors. If we keep the number of runs at 12 before clicking Make Design, like this,

the resulting design will force the runs for X1 as specified by the candidate set (with the desired ratio) and design for the optimal settings for the continuous factors.
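
To illustrate what designing around a fixed covariate column means, here is a minimal Python sketch of a coordinate-exchange search in which the X1 column is held fixed at four A's and eight B's while the settings of X2-X5 are optimized. It is a toy illustration, not JMP's implementation: the effect coding (A = -1, B = +1), the main-effects model, the restriction of continuous factors to -1 and +1, and all function names are assumptions.

```python
import random

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def d_criterion(design):
    """det(X'X) for an intercept + main-effects model (assumed model)."""
    X = [[1.0] + list(run) for run in design]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    return det(xtx)

def make_design(x1_column, n_continuous=4, levels=(-1.0, 1.0),
                n_starts=5, seed=0):
    """Coordinate exchange over the continuous columns; X1 stays as given."""
    rng = random.Random(seed)
    best_design, best_val = None, -1.0
    for _ in range(n_starts):
        design = [[x1] + [rng.choice(levels) for _ in range(n_continuous)]
                  for x1 in x1_column]
        val = d_criterion(design)
        improved = True
        while improved:
            improved = False
            for run in design:
                # Column 0 holds the covariate X1 and is never modified.
                for col in range(1, 1 + n_continuous):
                    keep = run[col]
                    for lv in levels:
                        run[col] = lv
                        trial = d_criterion(design)
                        if trial > val * (1 + 1e-9):
                            val, keep, improved = trial, lv, True
                    run[col] = keep
        if val > best_val:
            best_design, best_val = [r[:] for r in design], val
    return best_design, best_val

# Covariate column from the data table: four runs at A, eight at B.
x1 = [-1.0] * 4 + [1.0] * 8
design, value = make_design(x1)
for run in design:
    print("A" if run[0] < 0 else "B", run[1:])
```

The 1/3 to 2/3 split survives because the search never touches column 0; only the continuous settings are exchanged.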

Other examples using this idea from previous blog posts include the creation of an experiment robust to a linear trend in the response over time or ensuring a definitive screening design structure for a subset of the factors, such as an experiment for hard-boiled eggs.

Anything else?

We have made some improvements to covariate handling in JMP 16. I've highlighted those new pieces in another post.

Byron_JMP · 02-26-2021

What does "Enforce use of selected covariate rows" do to the design?

Ryan_Lekivetz · 02-26-2021

@Byron_JMP Great question! I'll have some more details/examples in the next post, but effectively if you have rows selected in the data table when you load it in, it will force those rows in even if they're not an optimal choice.

Byron_JMP · 03-02-2021

@Ryan_Lekivetz Looking forward to seeing more. The Covariates role looks like it's going to be really useful.

MannyUy · 04-14-2021

Would be more useful with examples of factors that we cannot easily control such as relative humidity, time of the day, etc.

Peter_Hersh · 10-11-2023

@Ryan_Lekivetz, what optimality criterion is the candidate set using to make its selection? Is this based on the underlying DOE?

Ryan_Lekivetz · 10-13-2023

@Peter_Hersh - Correct, as of JMP 16, it's using the optimality criteria as defined by the Custom Design specification.

Jed_Campbell · 10-13-2023

Thanks @Ryan_Lekivetz, I really like the idea of using covariates as a way to force structure/constraints on the design--I hadn't thought of it that way before. Another potential use for covariate factors is to use curved data as inputs to an experiment, rather than the traditional use of curves as outputs.

Peter_Hersh · 10-20-2023

@Ryan_Lekivetz Thanks. Is there a way to use the MaxPro criterion from FFF space-filling designs for candidate selection? Which criterion will be used if you augment the design? It's a bit of a niche use case: the hope is to make a space-filling design, then augment it to catch corners and axial points, and use the MaxPro criterion to select from a candidate set.

MichaelR1 · 11-29-2023

Thanks @Ryan_Lekivetz for the explanation. We run an assembly machine with adhesives and control machine settings in a DOE. The covariate in this case is the weight of adhesive that is applied by the machine, since it changes continuously from run to run. We don't have a way to precisely control the weight, only an approximate dose control.

The question: can we add the measured weight as a covariate? This information is obtained when we perform the DOE run. How does this look in JMP 17?

Jed_Campbell · 11-29-2023

Hi @MichaelR1. If I'm understanding you correctly, the measured weight would be considered an "Uncontrolled" factor, which you'd select from the same menu (first screenshot on this page).

MichaelR1 · 12-06-2023

@Jed_Campbell that's correct. We try to control it with machine settings but it could be considered an uncontrolled factor. I wasn't aware that you could add an uncontrolled factor into the design. How does this work?

Jed_Campbell · 12-06-2023

@MichaelR1 Assuming you're using the Custom Designer, you'd select "Uncontrolled" in the "Add Factor" dropdown menu. During the experiment, record the weight for each run, and the model will account for it.

Victor_G · 12-06-2023

@MichaelR1 You can find explanations of the different factor types and their uses in the JMP Help section: Factors (jmp.com)