frankderuyck
Level VII

How to make a Covariate DOE with sampling in different levels of a categorical variable?

Attached is a data set with 500 combinations of X1 and X2, clustered in 17 categories.

With X1 and X2 as covariates, how can I generate a 5 x 17 = 85-run covariate DOE so that 5 samples are selected from each cluster/category?

frankderuyck
Level VII

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

I guess I need to make Cluster hard to change?

Victor_G
Super User

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

Hi @frankderuyck,

Is the 5 samples/cluster a necessary condition?

It seems your clusters can be separated easily based on their X1-X2 coordinates, and the number of samples per cluster varies from 16 to 60, so adaptive sampling based only on the continuous coordinates may be sufficient. The plot of the cluster points in the X1-X2 experimental space reminds me of the clustering process behind the Fast Flexible algorithm (which I described in this blog post):

Capture d'écran 2026-03-27 083645.png

So I tried a simpler approach, without the 5 samples/cluster condition, to sample experiments evenly in the X1-X2 covariate space using the Fast Flexible algorithm:

  1. Open the Space-Filling Designs platform.
  2. In the red triangle, click on "Load design" and choose your X1 and X2 covariates.
  3. Specify a number of runs equal to 85 and click on Fast Flexible Design.
  4. Once the design is created, use Join tables to get the cluster number of the selected samples in the design.
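For readers who want to reproduce the idea outside JMP, the steps above can be roughly sketched in Python. This is not JMP's Fast Flexible Filling algorithm: it swaps in a simple greedy maximin (farthest-point) selection from the candidate rows, which also tends to spread the picks evenly. The data are a hypothetical stand-in for the attached table, the function name is made up for illustration, and X1 and X2 are assumed to be on comparable scales.

```python
import math
import random
from collections import Counter


def farthest_point_sample(points, n_runs, start=0):
    """Greedy maximin selection: repeatedly add the candidate that is
    farthest from the points already selected. Returns row indices."""
    if not 0 < n_runs <= len(points):
        raise ValueError("n_runs must be between 1 and the number of candidates")
    selected = [start]
    # Distance from every candidate to its nearest selected point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_runs:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical stand-in for the attached table: 500 rows of (X1, X2, Cluster).
rng = random.Random(1)
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(17)]
rows = []
for _ in range(500):
    c = rng.randrange(17)
    rows.append((rng.gauss(centers[c][0], 0.4), rng.gauss(centers[c][1], 0.4), c + 1))

xy = [(x1, x2) for x1, x2, _ in rows]
picked = farthest_point_sample(xy, 85)

# Equivalent of the Join tables step: look up the cluster of each selected row.
per_cluster = Counter(rows[i][2] for i in picked)
print(sorted(per_cluster.items()))
```

As in the JMP result, nothing here forces an equal number of runs per cluster; the counts simply follow how much of the X1-X2 space each cluster occupies.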

You'll get a good coverage of your covariate space but with a different sampling than you imagined:

Capture d'écran 2026-03-27 085235.png

The number of samples per cluster ranges from 3 to 9, which could be acceptable depending on your objectives.
Here is what the selected points (diamonds) look like on the original full covariate set:

Capture d'écran 2026-03-27 085909.png

Please find the resulting design attached.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Victor_G
Super User

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

@frankderuyck I also thought about using hard-to-change factors to get exactly 5 samples per cluster. However, since the X1-X2 coordinates are directly linked to the clusters and sit in the same covariate table, either all or none of the covariate factors can be "Hard to change". That prevents the kind of hierarchical structure you wanted to create (5 samples per cluster, with a good distribution of X1-X2 coordinates inside each cluster). With the covariate factors set as hard to change, you'll get 5 times one sample per cluster.

Using the Custom Design platform, simply selecting the covariate factors, entering them as "Easy to change", and setting a main effects + interactions model with a run size of 85 gives a design close to your needs:

  • 85 runs in total
  • 4 to 6 samples per cluster
  • All 17 clusters are present in the design 

The distribution of the runs in the experimental space is less homogeneous than with the space-filling design:

Capture d'écran 2026-03-27 091012.png

Please find attached this other option.
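
If exactly 5 runs per cluster is a hard requirement, one workaround outside the DOE platforms is a plain stratified selection: handle each cluster separately and pick 5 well-spread points inside it. The sketch below is only an illustration of that idea, not something either JMP platform does; the data are hypothetical, the function names are made up, and the within-cluster spread uses a simple greedy maximin rule rather than an optimality criterion.

```python
import math
import random
from collections import Counter, defaultdict


def spread_pick(points, k):
    """Pick up to k indices from points by greedy maximin selection,
    starting from the point closest to the centroid."""
    k = min(k, len(points))
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    first = min(range(len(points)), key=lambda i: math.dist(points[i], (cx, cy)))
    chosen = [first]
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def stratified_design(rows, per_cluster=5):
    """rows: list of (x1, x2, cluster). Returns the selected row indices."""
    by_cluster = defaultdict(list)
    for idx, (_, _, cluster) in enumerate(rows):
        by_cluster[cluster].append(idx)
    design = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        pts = [rows[i][:2] for i in members]
        design.extend(members[j] for j in spread_pick(pts, per_cluster))
    return design


# Hypothetical stand-in for the attached table: 500 rows, 17 clusters.
rng = random.Random(7)
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(17)]
rows = []
for i in range(500):
    c = i % 17
    rows.append((rng.gauss(centers[c][0], 0.4), rng.gauss(centers[c][1], 0.4), c + 1))

design = stratified_design(rows, per_cluster=5)
counts = Counter(rows[i][2] for i in design)
print(len(design), sorted(counts.values()))
```

The trade-off is that each cluster is treated in isolation, so the overall design is not optimized for any model in X1 and X2 the way a Custom Design run would be.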

frankderuyck
Level VII

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

Thanks Victor! I will try both

frankderuyck
Level VII

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

Using hierarchical clustering I reduced the data to 8 clusters (17 is too many) and tried to sample 6 units per cluster: I am interested not only in between-cluster variation but also in within-cluster variation. The custom DOE method gives me a more homogeneous sampling across clusters. I specified a simple interaction model without X*cluster interactions.

Victor_G
Super User

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

This is a good idea. It avoids creating singular clusters with few samples, and should help balance the number of samples per cluster in the design.

The X*cluster interactions won't add any information to the model. The same goes for including both the X's and the cluster information, since that repeats the same information: the clusters already depend on the X's, so we can expect multicollinearity when modeling a response with a model that includes the X's, Cluster and X*Cluster interactions.
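
One quick way to see this redundancy is to check how much of the variance of a covariate is explained by cluster membership alone. The sketch below computes that share (eta squared, which equals the R² of regressing the covariate on the cluster indicator columns). The data are hypothetical, with 8 well-separated clusters along X1, and the function name is made up for illustration.

```python
import random
from collections import defaultdict


def between_cluster_share(values, labels):
    """Share of the variance of `values` explained by cluster membership
    (between-cluster sum of squares divided by total sum of squares)."""
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return ss_between / ss_total


# Hypothetical covariate: 8 clusters with centers 1.5 apart and a spread of 0.3.
rng = random.Random(3)
labels = [i % 8 for i in range(400)]
x1 = [rng.gauss(1.5 * c, 0.3) for c in labels]

share = between_cluster_share(x1, labels)
print(round(share, 3))
```

A share close to 1 means X1 is almost a linear combination of the cluster indicator columns, which is the kind of situation where a model containing both tends to show inflated standard errors.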

frankderuyck
Level VII

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

You have a point, Victor. So for covariate selection, you propose including only one factor in the DOE covariate model: X1, X2 or Cluster?

Victor_G
Super User

Re: How to make a Covariate DOE with sampling in different levels of a categorical variable?

In the covariate model, either use the coordinate system with factors X1 and X2, or use Cluster as a factor. They represent the same information, but at a different level of granularity, for different objectives and uses. Depending on your topic and objectives, you might already have an idea about which factor system (X1+X2 or Cluster) is the most interesting.

I would use X1+X2 as factors as a first choice. They are the most precise (continuous factors), and they require fewer degrees of freedom in the model than the 8-level categorical factor Cluster (7 DFs, versus 5 DFs for the main effects, interaction and quadratic effects of the continuous factors X1 and X2). You can also always go from coordinates to Cluster if you find an optimum or something interesting, or if you need the Cluster information for explanation purposes. The opposite is not true: if your model identifies one or several clusters of interest, you only get an area to investigate further with the coordinates.
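
The degrees-of-freedom comparison above is simple counting, spelled out here as a small sketch. The function names are made up for illustration, and the intercept is left out of both counts.

```python
from math import comb


def df_categorical(levels):
    """A categorical factor with `levels` levels uses levels - 1 degrees of freedom."""
    return levels - 1


def df_quadratic(n_factors):
    """Main effects + two-factor interactions + quadratic terms
    for `n_factors` continuous factors."""
    main_effects = n_factors
    interactions = comb(n_factors, 2)
    quadratics = n_factors
    return main_effects + interactions + quadratics


print(df_categorical(8))  # Cluster with 8 levels: 7
print(df_quadratic(2))    # X1, X2, X1*X2, X1^2, X2^2: 5
```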

Hope this answer will help you,

