Tiago
Level II

DSD analysis workflow for zero-inflated, positive continuous count data

Hey,
I have count data that I collected using a definitive screening design (DSD).
I realize the Fit DSD platform was probably not meant for this type of data, so I was wondering what the best workflow for model selection would be here, given the confounding among second-order terms.

I have tested a GLM with a ZI Gamma distribution and Forward Selection with AICc, but I am concerned about how the singularities affect the result.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Victor_G
Super User

Re: DSD analysis workflow for zero-inflated, positive continuous count data

Hi @Tiago,

Welcome to the Community!

It may be hard to help you without more context or an anonymized dataset, but here are some remarks that may help you:

  • Concerning the distribution to specify for Generalized Regression models: based on your description, I would recommend trying a Zero-Inflated Poisson distribution for non-negative count data.
    If you have doubts about which distribution to choose, you can use the Distributions platform to fit candidate distributions and compare their Goodness of Fit.
  • Concerning the estimation method, I would recommend trying these two options (with AICc validation): Pruned Forward Selection with the effect heredity option enforced (checked), and Two Stage Forward Selection. The mechanism of the latter is very similar to what the Fit DSD platform does:
    1. first, a main effects model is built and its effects are estimated,
    2. second, the higher-order terms (interactions and quadratic effects) are considered for the model.
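To make the two-stage idea concrete, here is a minimal Python sketch. It is not JMP's algorithm and not what Fit DSD or Generalized Regression actually run: it uses ordinary least squares with a Gaussian AICc as a stand-in (no zero inflation), strong heredity for the second stage, and a 13-run, 6-factor DSD built by folding over a conference matrix. The function names (`aicc`, `forward`, `two_stage`) and the simulated response are all hypothetical.

```python
import itertools
import numpy as np

def aicc(y, X):
    """AICc of an ordinary least-squares fit (Gaussian stand-in, assumption)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the error variance
    if n - k - 1 <= 0:
        return np.inf  # too many parameters for AICc to be defined
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward(y, columns, start, candidates):
    """Greedy forward selection: keep adding the term that lowers AICc most."""
    chosen = list(start)

    def score(terms):
        X = np.column_stack([np.ones(len(y))] + [columns[t] for t in terms])
        return aicc(y, X)

    best = score(chosen)
    while True:
        trials = [(score(chosen + [c]), c) for c in candidates if c not in chosen]
        if not trials:
            return chosen
        trial_score, pick = min(trials)
        if trial_score >= best:
            return chosen  # no candidate improves AICc any further
        best = trial_score
        chosen.append(pick)

def two_stage(y, factors):
    """Stage 1: main effects only. Stage 2: second-order terms of the survivors."""
    columns = dict(factors)
    stage1 = forward(y, columns, [], list(factors))
    second = {f"{a}^2": factors[a] ** 2 for a in stage1}
    for a, b in itertools.combinations(stage1, 2):
        second[f"{a}*{b}"] = factors[a] * factors[b]
    columns.update(second)
    return forward(y, columns, stage1, list(second))

# 6-factor DSD: conference matrix C, its foldover -C, and one center run.
C = np.array([
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
], dtype=float)
design = np.vstack([C, -C, np.zeros((1, 6))])
factors = {name: design[:, i] for i, name in enumerate("ABCDEF")}

# Simulated response (hypothetical): two main effects and their interaction.
rng = np.random.default_rng(0)
y = (10 + 3 * factors["A"] - 2 * factors["B"]
     + 2 * factors["A"] * factors["B"] + rng.normal(0, 0.1, len(design)))

selected = two_stage(y, factors)
print(selected)
```

Because the second stage only builds quadratic and interaction terms from main effects that survived the first stage, the candidate set stays small, which is the point when second-order terms are partially confounded with each other.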

I hope these first remarks help,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Tiago
Level II

Re: DSD analysis workflow for zero-inflated, positive continuous count data

Thank you for the prompt and very helpful reply, @Victor_G

I think this does indeed answer most of my core questions. Just a small note: I can't use ZI Poisson, because I average technical replicates, so I no longer have integers, even though it is count data. But I assume this works just as well with Gamma, right?

Thanks again!
 

Victor_G
Super User

Re: DSD analysis workflow for zero-inflated, positive continuous count data

Hi @Tiago,

Just for clarification, are you talking about replicates or repetitions?

  • Repetition means taking multiple measurements of the response(s) on the same experimental run (the same sample, with no reset between measurements). Repetitions only reduce the variation coming from the measurement system, because you model the average of the repeated measurements. You can add repetitions manually to a data table as new columns, since you are repeating the measurement on the same experimental unit, and then use these columns to calculate and model the average, the variance, etc.
  • Replication means performing multiple independent, randomized experimental runs (multiple samples, with a reset between runs) for each treatment combination. Replicates reduce the total experimental variation (process + measurement), provide an estimate of pure error, and reduce the prediction error (through more accurate parameter estimates). They are added automatically (after design generation or augmentation) to the data table as new rows, since they are independent experimental runs.

In the first case, you can model the average with an appropriate continuous distribution (Gamma may work in this setting; check it with the Distributions platform). In the second case, the ZI Poisson distribution may be more appropriate, since each response is still a non-negative integer.
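As a small illustration of the difference in data layout, here is a Python sketch with made-up counts (all values and variable names are hypothetical). Repetitions sit in extra columns of the same run and are averaged into one response, which is generally no longer an integer; replicates are separate rows and keep their raw integer counts.

```python
import numpy as np

# Hypothetical counts: 4 experimental runs, 3 repeated measurements each.
# Repetitions are stored as extra columns on the same row (same sample).
repetitions = np.array([
    [0, 0, 0],
    [2, 3, 2],
    [0, 1, 0],
    [7, 5, 6],
])

# One averaged response per run: this is what gets modeled.
run_mean = repetitions.mean(axis=1)

# Averaging breaks the integer support that a (ZI) Poisson needs,
# but exact zeros can survive, hence a zero-inflated continuous model.
is_integer = np.isclose(run_mean, np.round(run_mean))
exact_zero = run_mean == 0

# Replicates are independent runs: each is its own row with a raw count.
replicates = np.array([0, 2, 0, 7, 0, 3, 1, 5])  # 2 replicates x 4 treatments
all_integer = bool(np.all(replicates == np.round(replicates)))

print(run_mean, is_integer, exact_zero, all_integer)
```

Here only the first and last runs still have integer-valued means, and only the first run is an exact zero, so the averaged column is a mix of zeros and positive non-integers.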

I hope this clarifies the situation,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Tiago
Level II

Re: DSD analysis workflow for zero-inflated, positive continuous count data

Hey @Victor_G ,
You're right. It's just a habit in the lab to refer to repeated measurements of the same sample as technical replicates. Sorry for the ambiguity, and thank you for the clarification!
