Discussions

Tiago · Apr 20, 2026 11:47 AM

Hey,
I have count data that I collected with a definitive screening design (DSD), so the usual DSD confounding applies.
I realize the Fit DSD module was probably not meant for this type of data. So I was wondering what would be the best workflow here for model selection, given the confounding within second order terms?

I have tested GLM with ZI Gamma and Forward AICc, but I can't help but consider the effect of the singularities on this.

Thank you!

Victor_G · Apr 21, 2026 12:40 AM

Hi @Tiago,

Welcome to the Community!

It may be hard to help you without more context or an anonymized dataset, but here are some remarks that may help you:

Concerning the distribution to specify for Generalized Regression models, based on your description I would recommend trying a Zero-Inflated Poisson distribution for non-negative count data.
If you have doubts about which distribution to choose, you can use the Distributions platform to fit several candidate distributions to the response and compare their goodness-of-fit tests to see which one seems most appropriate.
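As a rough illustration of the kind of check this involves, here is a minimal Python sketch (not JMP, and not the Distributions platform's goodness-of-fit test) that compares the observed fraction of zeros with the fraction a plain Poisson with the same mean would predict. The `counts` values are made up and `zero_inflation_check` is a hypothetical helper name.

```python
import math

def zero_inflation_check(counts):
    """Compare the observed zero fraction with the Poisson-implied one."""
    n = len(counts)
    lam = sum(counts) / n                      # Poisson MLE of the mean
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-lam)                  # P(Y = 0) under Poisson(lam)
    return {
        "mean": lam,
        "observed_zero_fraction": observed,
        "poisson_zero_fraction": expected,
    }

# Hypothetical raw counts, for illustration only
counts = [0, 0, 0, 0, 3, 4, 5, 2, 0, 6]
result = zero_inflation_check(counts)
print(result)
```

An observed zero fraction well above the Poisson-implied one hints that a zero-inflated distribution is worth trying. It is only a screening heuristic, so the formal goodness-of-fit comparison should still be done.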
Concerning the estimation method, I would recommend trying these two options (with AICc validation): Pruned Forward Selection with the effect heredity option enforced (checked), and Two Stage Forward Selection. The mechanism of Two Stage Forward Selection is very similar to what the Fit DSD platform does:
1. First, a main-effects model is built and estimated.
2. Second, the higher-order terms (interactions and quadratic effects) are considered for entry into the model.
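The two stages above, combined with effect heredity, can be sketched in Python as follows. This is only an illustration under stated assumptions, not JMP's implementation: `aicc` uses the standard small-sample correction of AIC, and `second_order_candidates` is a hypothetical helper that lists which second-order terms would be eligible in stage 2 once the active main effects from stage 1 are known. Factor names are made up.

```python
import math
from itertools import combinations_with_replacement

def aicc(log_likelihood, k, n):
    """AICc = -2*logL + 2k + 2k(k+1)/(n-k-1), k parameters, n runs."""
    if n - k - 1 <= 0:
        return math.inf  # correction undefined: too many terms for n runs
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def second_order_candidates(factors, active_mains, heredity="strong"):
    """Second-order terms allowed in stage 2 given the stage-1 main effects."""
    active = set(active_mains)
    terms = []
    for a, b in combinations_with_replacement(factors, 2):
        if a == b:
            allowed = a in active                    # quadratic needs its main effect
        elif heredity == "strong":
            allowed = a in active and b in active    # both parents required
        else:
            allowed = a in active or b in active     # weak: one parent suffices
        if allowed:
            terms.append(f"{a}*{b}")
    return terms

# Hypothetical example: stage 1 kept X1 and X3 out of four factors
factors = ["X1", "X2", "X3", "X4"]
strong = second_order_candidates(factors, ["X1", "X3"], heredity="strong")
weak = second_order_candidates(factors, ["X1", "X3"], heredity="weak")
print(strong)
print(weak)
```

Restricting stage 2 to terms whose parent main effects are active keeps the candidate set small, which matters in a DSD because the second-order terms are partially confounded with each other and the run count is low. With few runs, the AICc penalty also grows quickly as terms are added.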

Hope these first remarks may help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Tiago · Apr 21, 2026 03:57 AM

Thank you for the prompt and very helpful reply, @Victor_G

I think this does indeed solve most of my core questions. Just a small note, I can't use ZI Poisson, because I average technical replicates, so I don't have integers anymore, even though it is count data. But I assume this works just as well with Gamma, right?

Thanks again!

Victor_G · Apr 21, 2026 1:40 AM

Hi @Tiago,

Just for clarification, are you talking about replicates or repetitions?

Repetition means taking multiple measurements of the response(s) on the same experimental run (the same sample, without any resetting between measurements). Repetitions only reduce the variation coming from the measurement system (by using the average of the repeated measurements). Repetitions can be added manually to a data table as new columns, since you are repeating the measurement on the same experimental unit. You can then use these columns to calculate and model the average measurement, the variance, etc.
Replication means performing multiple independent, randomized experimental runs (multiple samples, with resetting between runs) for each treatment combination. Replicates reduce the total experimental variation (process + measurement), providing an estimate of pure error and reducing the prediction error (through more accurate parameter estimates). They are added automatically (after design generation or augmentation) to a data table as new rows, since they are independent experimental runs.

In the first case, you can model the average with an appropriate continuous distribution (Gamma may work in this setting, to be checked with the Distributions platform). In the second case, the ZI Poisson distribution may be more appropriate, since each run yields a raw count (non-negative integers only).
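To make the first case concrete, here is a small Python sketch with made-up numbers (the data and the `summarize_run` helper are hypothetical) showing what happens when the repetitions of each run are averaged: the response is generally no longer an integer, but it can still be exactly zero when every repetition is zero.

```python
from statistics import mean

# Hypothetical repeated counts: each inner list holds the repetitions
# measured on ONE experimental run (same sample, no reset in between).
repetitions_per_run = [
    [0, 0, 0],
    [2, 3, 2],
    [0, 1, 0],
    [5, 4, 6],
]

def summarize_run(counts):
    """Collapse the repetitions of one run into a single modeled response."""
    avg = mean(counts)
    return {
        "mean": avg,
        "is_integer": float(avg).is_integer(),  # False rules out Poisson-type fits
        "is_zero": avg == 0,                    # exact zeros survive averaging
    }

summaries = [summarize_run(c) for c in repetitions_per_run]
for s in summaries:
    print(s)
```

The averaged response is therefore continuous and non-negative with possible exact zeros, which is the situation where a zero-inflated continuous distribution such as ZI Gamma is a candidate rather than ZI Poisson.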

Hope this clarifies the situation,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Tiago · Apr 21, 2026 04:48 AM

Hey @Victor_G ,
You're right. Just a habit in the lab to refer to repeated measurements of the same sample as technical replicates. Sorry for the ambiguity and thank you for the clarification!

DSD analysis workflow for zero-inflated, positive continuous count data
