How to make a Covariate DOE with sampling in different levels of a categorical variable?

frankderuyck · Mar 26, 2026 12:09 PM

Attached is a data set with 500 combinations of X1 and X2, clustered into 17 categories.

With X1 and X2 as covariates, how can I generate a 5 x 17 = 85-run covariate DOE so that 5 samples are selected from each cluster/category?

frankderuyck · Mar 26, 2026 12:21 PM

Guess I need to make Cluster hard to change?

Victor_G · Mar 27, 2026 04:02 AM

Hi @frankderuyck,

Is the 5 samples/cluster a necessary condition?

It seems your clusters can be separated easily based on their X1-X2 coordinates, and the number of samples per cluster varies from 16 to 60, so adaptive sampling based only on the continuous coordinates may be sufficient. The plot of the cluster points in the X1-X2 experimental space reminds me of the clustering process behind the Fast Flexible algorithm (that I described in this blog post):

So I tried a simpler way, without the 5 samples/cluster condition, to sample experiments evenly in the X1-X2 covariate space using the Fast Flexible algorithm:

Open the Space-Filling Designs platform.
In the red triangle, click on "Load design" and choose your X1 and X2 covariates.
Specify a number of runs equal to 85 and click on Fast Flexible Design.
Once the design is created, use Join tables to get the cluster number of the selected samples in the design.
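
The steps above can be sketched outside JMP as follows. This is only an illustration: it uses a greedy farthest-point (maximin) heuristic as a stand-in, not JMP's Fast Flexible Filling algorithm, and the data are synthetic because the attached table is not reproduced here. The names `maximin_subset`, `rows` and `runs_per_cluster` are hypothetical.

```python
import math
import random
from collections import Counter

def maximin_subset(points, n_runs, start=0):
    """Greedily pick n_runs indices so that each new point is as far as
    possible from the points already chosen (farthest-point heuristic)."""
    chosen = [start]
    # Distance from every candidate to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < n_runs:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen

# Synthetic stand-in for the attached table (assumption):
# 500 (X1, X2) rows spread over 17 clusters.
rng = random.Random(1)
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(17)]
rows = []
for _ in range(500):
    c = rng.randrange(17)
    rows.append({"X1": rng.gauss(centers[c][0], 0.4),
                 "X2": rng.gauss(centers[c][1], 0.4),
                 "Cluster": c + 1})

points = [(r["X1"], r["X2"]) for r in rows]
selected = maximin_subset(points, n_runs=85)

# Equivalent of the "Join tables" step: look up the cluster of each selected row.
runs_per_cluster = Counter(rows[i]["Cluster"] for i in selected)
print(sorted(runs_per_cluster.items()))
```

Because the selection only looks at the X1-X2 coordinates, the number of runs per cluster is an outcome of the sampling rather than a constraint.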

You'll get good coverage of your covariate space, but with a different sampling than you imagined:

The number of samples per cluster ranges from 3 to 9, which may be acceptable depending on your objectives.
Here is what the selected points (diamonds) look like on the original full covariate set:

Please find attached the design created.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Victor_G · Mar 27, 2026 2:40 AM

@frankderuyck I also thought about using hard-to-change factors to get exactly 5 samples per cluster. However, because the X1-X2 coordinates are directly linked to the clusters and sit in the same covariate table, either all or none of the covariate factors have to be set as "Hard to change". That prevents the kind of hierarchical structure you wanted to create (5 samples per cluster, with a good spread of X1-X2 coordinates inside each cluster). With the covariate factors set as hard to change, you'll get 5 times one sample per cluster.

Using the Custom Design platform, simply selecting the covariate factors, entering them as "Easy to change", and setting a main effects + interactions model with a run size of 85 gives a design close to your needs:

85 runs in total
4 to 6 samples per cluster
All 17 clusters are present in the design
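
To illustrate the idea of picking runs from a covariate table for a main effects + interaction model, here is a rough sketch. It assumes a D-optimality criterion and uses a simple greedy forward selection, which is not the algorithm JMP's Custom Design platform runs. The data are synthetic, and `model_row`, `mat_vec` and `greedy_d_optimal` are hypothetical helper names.

```python
import random
from collections import Counter

def model_row(x1, x2):
    """Model terms: intercept, two main effects, two-factor interaction."""
    return [1.0, x1, x2, x1 * x2]

def mat_vec(m, f):
    return [sum(mij * fj for mij, fj in zip(row, f)) for row in m]

def greedy_d_optimal(candidates, n_runs, ridge=1e-6):
    """Pick n_runs distinct candidate rows, each time adding the row that
    most increases det(X'X), using det(M + ff') = det(M) * (1 + f' M^-1 f)."""
    rows = [model_row(x1, x2) for x1, x2 in candidates]
    p = len(rows[0])
    # Start from a small ridge so the inverse exists before p rows are chosen.
    m_inv = [[1.0 / ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    chosen, free = [], list(range(len(rows)))
    for _ in range(n_runs):
        # The row with the largest f' M^-1 f gives the largest determinant.
        best = max(free, key=lambda i: sum(
            a * b for a, b in zip(rows[i], mat_vec(m_inv, rows[i]))))
        f = rows[best]
        v = mat_vec(m_inv, f)
        denom = 1.0 + sum(a * b for a, b in zip(f, v))
        # Sherman-Morrison update of the inverse after adding row f.
        m_inv = [[m_inv[a][b] - v[a] * v[b] / denom for b in range(p)]
                 for a in range(p)]
        chosen.append(best)
        free.remove(best)
    return chosen

# Synthetic stand-in for the covariate table (assumption): 500 rows, 17 clusters.
rng = random.Random(2)
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(17)]
clusters = [rng.randrange(17) for _ in range(500)]
candidates = [(rng.gauss(centers[c][0], 0.4), rng.gauss(centers[c][1], 0.4))
              for c in clusters]

design = greedy_d_optimal(candidates, n_runs=85)
runs_per_cluster = Counter(clusters[i] + 1 for i in design)
print(sorted(runs_per_cluster.items()))
```

Each candidate row can be used only once, which mirrors selecting existing samples from a covariate table rather than setting factor levels freely.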

The distribution of the runs in the experimental space is less homogeneous than with the space-filling design:

Please find attached this other option.

Victor GUILLER

frankderuyck · Mar 27, 2026 04:52 AM

Thanks Victor! I will try both

frankderuyck · Mar 27, 2026 06:02 AM

Using hierarchical clustering I reduced the number of clusters to 8 (17 is too many) and tried to sample 6 units per cluster: I am interested not only in between-cluster variation but also in within-cluster variation. The custom DOE method gives me a more homogeneous sampling across clusters. I specified a simple interaction model without X*cluster interactions.
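
If an exact number of units per cluster is wanted, one possible alternative to the custom DOE route is to stratify by cluster and spread the picks within each cluster. The sketch below is an assumption-laden illustration, not something used in this thread: it uses a farthest-point heuristic inside each cluster, synthetic data with 8 clusters, and hypothetical names (`spread_within`, `stratified_design`).

```python
import math
import random
from collections import Counter, defaultdict

def spread_within(points, k):
    """Farthest-point pick of up to k indices from one cluster's points."""
    k = min(k, len(points))
    chosen = [0]
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen

def stratified_design(rows, per_cluster):
    """rows holds (x1, x2, cluster) tuples; returns the selected row indices."""
    by_cluster = defaultdict(list)
    for idx, (_, _, cluster) in enumerate(rows):
        by_cluster[cluster].append(idx)
    design = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        local = spread_within([rows[i][:2] for i in members], per_cluster)
        design.extend(members[j] for j in local)
    return design

# Synthetic stand-in for the clustered table (assumption): 500 rows, 8 clusters.
rng = random.Random(3)
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(8)]
rows = [(rng.gauss(centers[i % 8][0], 0.5),
         rng.gauss(centers[i % 8][1], 0.5),
         i % 8 + 1) for i in range(500)]

design = stratified_design(rows, per_cluster=6)
counts = Counter(rows[i][2] for i in design)
print(sorted(counts.items()))
```

This guarantees the per-cluster count, but unlike a model-based design it does not optimize any criterion for a specified model.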

Victor_G · Mar 27, 2026 02:41 PM

This is a good idea. It avoids creating singular clusters with few samples, and should help balance the number of samples per cluster in the design.

The X*cluster interactions won't add any information to the model, because the clusters already depend on the X's. The same applies to including both the X's and the cluster information: it repeats information. We can therefore expect multicollinearity when modeling a response with a model that includes X's, clusters and X*cluster interactions.

Victor GUILLER

frankderuyck · Mar 28, 2026 05:15 AM

You have a point, Victor. So for covariate selection, you propose including only one factor in the DOE covariate model: X1, X2 or Cluster?

Victor_G · Mar 28, 2026 2:55 AM

In the covariate model, use either the coordinate system with factors X1 and X2, or Cluster as a factor. They carry the same information, but at a different level of detail, suited to different objectives and uses. Depending on your topic and objectives, you might already have an idea of which factor system (X1+X2 or Cluster) is the most interesting.

I would use X1+X2 as factors as a first choice. They are the most precise (continuous factors) and require fewer degrees of freedom in the model than the 8-level categorical factor Cluster: 7 DFs for Cluster vs. 5 DFs for the main effects, interaction and quadratic effects of the continuous factors X1 and X2. You can also always go from coordinates to Cluster if you find an optimum or something interesting, or if you need the Cluster information for explanation purposes. The opposite is not true: if your model identifies one or several clusters of interest, you only get an area to investigate further with the coordinates.
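
The degrees-of-freedom comparison can be checked with a short count. The function names below are hypothetical and the intercept is left out of both counts.

```python
from math import comb

def categorical_df(levels):
    """A categorical factor with k levels uses k - 1 degrees of freedom."""
    return levels - 1

def response_surface_df(n_factors):
    """Main effects + two-factor interactions + quadratic terms
    for n continuous factors (intercept not counted)."""
    main = n_factors
    interactions = comb(n_factors, 2)
    quadratic = n_factors
    return main + interactions + quadratic

print(categorical_df(8))       # 7 DFs for the 8-level Cluster factor
print(response_surface_df(2))  # 5 DFs: X1, X2, X1*X2, X1^2, X2^2
```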

Hope this answer will help you,

Victor GUILLER
