Building and validating zero-inflated negative binomial regression models

stat_ranger · Sep 4, 2024 04:57 AM

Dear JMP community,

I need to build a model for over-dispersed, zero-inflated count data, and I hope you can help me with this.

I have attached both training and validation datasets to this post. My objective is to build my model on the training dataset and validate it using the validation dataset. I have not worked with such count data before and therefore do not know how to validate count data models.

In the training dataset,

Columns x1 to x9 are my main effect predictors
Columns with a “_quad” suffix are my quadratic effect predictors
Columns with an “int_” prefix are my interaction terms.
Column Y is my count response.
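To make the size of the candidate term set concrete, here is a small Python sketch of a full second-order design built from nine main effects. The exact column names (`x1_quad`, `int_x1_x2`, ...), the assumption that every pairwise interaction is present, and the assumption that the quadratic and interaction columns are raw (uncentered) products are all hypothetical; the attached tables may differ. If the terms were centered before squaring or multiplying, the values, but not the term count, would change.

```python
from itertools import combinations


def second_order_terms(n_main=9):
    """Hypothetical column names: main effects, squares, all pairwise interactions."""
    main = [f"x{i}" for i in range(1, n_main + 1)]
    quad = [f"{name}_quad" for name in main]
    inter = [f"int_{a}_{b}" for a, b in combinations(main, 2)]
    return main, quad, inter


def expand_row(x):
    """Map one row of main-effect values to all second-order terms (uncentered)."""
    row = {f"x{i + 1}": v for i, v in enumerate(x)}
    row.update({f"x{i + 1}_quad": v * v for i, v in enumerate(x)})
    row.update({f"int_x{i + 1}_x{j + 1}": x[i] * x[j]
                for i, j in combinations(range(len(x)), 2)})
    return row


main, quad, inter = second_order_terms()
# 9 main effects + 9 squares + C(9, 2) = 36 interactions = 54 candidate terms
print(len(main), len(quad), len(inter), len(main) + len(quad) + len(inter))
```

With 54 candidate terms plus the dispersion and zero-inflation parameters, a penalized (lasso or elastic net) selection step is a natural fit for this design.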

I have run zero-inflated negative binomial (ZINB) regression based on the following estimation methods:

Lasso
Double lasso
Adaptive lasso
Adaptive double lasso
SVEM lasso
Elastic net
Adaptive elastic net
Ridge
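As a reference for which knobs these methods share, the sketch below writes out a generic penalized ZINB fit. The NB2 parameterization (mean $\mu_i$, dispersion $\phi$ with $\mathrm{Var} = \mu + \phi\mu^2$), a single zero-inflation probability $\pi$, and the glmnet-style scaling of the penalty are assumptions; JMP's internal scaling, and whether its zero-inflation part depends on covariates, may differ. Setting $\alpha = 1$ gives the lasso, $\alpha = 0$ gives ridge, and $0 < \alpha < 1$ gives the elastic net. The weights are $w_j = 1$ for the plain methods and $w_j = 1/\lvert\tilde\beta_j\rvert^{\gamma}$ for the adaptive variants, where $\tilde\beta_j$ is an initial estimate (commonly $\gamma = 1$). The tuning parameters are therefore $\lambda$, usually chosen along a path by a criterion such as AICc, BIC or a holdout set, plus $\alpha$ for the elastic net.

```latex
\begin{align*}
P(Y_i = 0) &= \pi + (1-\pi)\,(1+\phi\mu_i)^{-1/\phi} \\
P(Y_i = y) &= (1-\pi)\,\frac{\Gamma(y+1/\phi)}{\Gamma(1/\phi)\,y!}
  \left(\frac{1}{1+\phi\mu_i}\right)^{1/\phi}
  \left(\frac{\phi\mu_i}{1+\phi\mu_i}\right)^{y}, \qquad y \ge 1 \\
\log\mu_i &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
(\hat\beta_0, \hat{\boldsymbol{\beta}}, \hat\phi, \hat\pi)
  &= \arg\min \Big\{ -\ell(\beta_0, \boldsymbol{\beta}, \phi, \pi)
  + \lambda \sum_{j=1}^{p} w_j \Big[ \alpha\,\lvert\beta_j\rvert
  + \tfrac{1-\alpha}{2}\,\beta_j^{2} \Big] \Big\}
\end{align*}
```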

The lasso-based models can be obtained by running the “ZINB – Lasso selection” script, and the elastic-net-based models by running the “ZINB - Elastic net selection” script.

Predictions based on most of the models have been extracted into columns.

I would like your help on the following:

How do I test for the presence of zero inflation, and how do I interpret the test result?
How do I test for the presence of over-dispersion, and how do I interpret the result?
For the different estimation methods, which parameters should I tune to obtain a better-fitting model?
Using the validation dataset, how do I validate the above-mentioned predictions? I would like to validate predictions of zero counts as well as non-zero counts.
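The checks asked about above can be sketched outside JMP in plain Python, assuming the per-row NB mean `mu`, the dispersion `phi` and a single zero-inflation probability `pi` have been exported from each fitted model. How those values are exported from JMP is an assumption, and every function name here is hypothetical. `lr_boundary_test` is a likelihood-ratio test for a parameter on the boundary of its space: it can be applied to NB vs Poisson (H0: `phi` = 0, over-dispersion) and to ZINB vs NB (H0: `pi` = 0, zero inflation). In both cases the p-value is half the chi-square(1) tail, which is a standard approximation rather than an exact result. `pearson_dispersion` is computed on a Poisson fit, where values well above 1 point to over-dispersion; its residual degrees of freedom are only approximate for penalized fits. `validate_zinb` scores held-out rows on zeros (observed vs expected zero count, Brier score of P(Y = 0)), on non-zero rows (MAE of the predicted mean), and overall (mean negative log-likelihood). Calling it with `pi=0` on a plain NB fit shows how many zeros the NB alone expects.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-pmf of NB2 with mean mu > 0 and dispersion phi > 0 (Var = mu + phi*mu^2)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_pmf(y, mu, phi, pi):
    """ZINB pmf: structural zero with probability pi, otherwise NB2(mu, phi)."""
    nb = math.exp(nb_logpmf(y, mu, phi))
    return pi + (1.0 - pi) * nb if y == 0 else (1.0 - pi) * nb


def lr_boundary_test(loglik_full, loglik_reduced):
    """LR test when one extra parameter sits on the boundary (phi = 0 or pi = 0).

    Uses the approximate 50:50 mixture of chi2_0 and chi2_1, so the p-value
    is 0.5 * P(chi2_1 > LR), and P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    lr = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return lr, 0.5 * math.erfc(math.sqrt(lr / 2.0))


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square / residual df for a Poisson fit; >> 1 suggests over-dispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def validate_zinb(y, mu, phi, pi):
    """Score held-out counts y against per-row NB means mu and fitted phi, pi."""
    n = len(y)
    p0 = [zinb_pmf(0, m, phi, pi) for m in mu]          # predicted P(Y = 0)
    brier = sum((float(yi == 0) - p) ** 2 for yi, p in zip(y, p0)) / n
    mean_pred = [(1.0 - pi) * m for m in mu]             # ZINB mean = (1 - pi) * mu
    nonzero = [(yi, m) for yi, m in zip(y, mean_pred) if yi > 0]
    mae_nz = (sum(abs(yi - m) for yi, m in nonzero) / len(nonzero)
              if nonzero else float("nan"))
    nll = -sum(math.log(zinb_pmf(yi, m, phi, pi)) for yi, m in zip(y, mu)) / n
    return {
        "observed_zeros": sum(1 for yi in y if yi == 0),
        "expected_zeros": sum(p0),
        "zero_brier": brier,
        "mae_nonzero": mae_nz,
        "mean_nll": nll,
    }


# Toy inputs only, to show the call shape; these are not results from the attached data.
print(validate_zinb([0, 0, 3], [1.0, 1.0, 2.0], phi=0.5, pi=0.2))
```

When comparing the estimation methods on the validation table, a lower mean negative log-likelihood and an expected zero count close to the observed one are the most direct checks of the whole predicted distribution. The MAE on non-zero rows only checks the predicted mean.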

I am using JMP Pro 17.1.0

Please advise.

Thank you.