Dear JMP community,
I need to build a model for over-dispersed, zero-inflated count data, and I hope you can help me with this.
I have attached both the training and validation datasets to this post. My objective is to build the model on the training dataset and validate it using the validation dataset. I have not worked with count data like this before and therefore do not know how to validate count data models.
In the training dataset,
- Columns x1 to x9 are my main effect predictors.
- Columns with a “_quad” suffix are my quadratic effect predictors.
- Columns with an “int_” prefix are my interaction terms.
- Column Y is my count response.
I have run zero-inflated negative binomial (ZINB) regression based on the following estimation methods:
- Lasso
- Double lasso
- Adaptive lasso
- Adaptive double lasso
- SVEM lasso
- Elastic net
- Adaptive elastic net
- Ridge
The lasso-based models can be obtained by running the “ZINB – Lasso selection” script, and the elastic-net-based models can be obtained by running the “ZINB - Elastic net selection” script.
Predictions based on most of the models have been extracted into columns.
I would like your help on the following:
- How do I test for the presence of zero-inflation, and how do I interpret the test result?
- How do I test for the presence of over-dispersion, and how do I interpret the test result?
- For each of the estimation methods, which parameters must I tune to obtain a better model fit?
- Using the validation dataset, how do I validate the above-mentioned predictions? I would like to validate the predictions of zero counts as well as non-zero counts.
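To make the questions above more concrete, here is a rough sketch in Python (outside JMP) of the kind of checks I have in mind. It is only an illustration under my own assumptions: the function names and the toy counts are hypothetical and are not taken from the attached datasets, and I am not sure these simple diagnostics are the appropriate ones. The first function compares the variance of the counts to their mean, the second compares the observed share of zeros with the share a Poisson distribution with the same mean would produce, and the third summarises prediction error separately for zero and non-zero observed counts.

```python
import math


def dispersion_index(counts):
    """Sample variance divided by the mean.

    For Poisson data this ratio is close to 1; values well above 1
    hint at over-dispersion (a descriptive check, not a formal test).
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def zero_share(counts):
    """Return (observed share of zeros, share expected under Poisson).

    The Poisson probability of a zero is exp(-mean). An observed share
    far above it hints at zero-inflation (again only descriptive).
    """
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected


def _mean_or_none(values):
    # Helper: mean of a list, or None when the list is empty.
    return sum(values) / len(values) if values else None


def validation_summary(actual, predicted):
    """RMSE and MAE overall, plus MAE split by zero / non-zero actual counts."""
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return {
        "rmse": math.sqrt(sum(sq_err) / len(sq_err)),
        "mae": sum(abs_err) / len(abs_err),
        "mae_zero": _mean_or_none(
            [e for a, e in zip(actual, abs_err) if a == 0]
        ),
        "mae_nonzero": _mean_or_none(
            [e for a, e in zip(actual, abs_err) if a != 0]
        ),
    }


# Hypothetical toy counts, for illustration only.
toy_counts = [0, 0, 0, 0, 0, 1, 2, 3, 8, 16]
toy_predictions = [0.4, 0.2, 0.9, 0.1, 0.5, 1.5, 2.5, 2.0, 6.0, 12.0]

print("dispersion index:", dispersion_index(toy_counts))
print("zero share (observed, Poisson):", zero_share(toy_counts))
print("validation summary:", validation_summary(toy_counts, toy_predictions))
```

In practice, the predictions saved from each ZINB model would replace the toy predictions, and the validation dataset's Y column would replace the toy counts. What I do not know is whether summaries like these are adequate, or whether there are more suitable tests and measures for zero-inflated count models.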
I am using JMP Pro 17.1.0.
Please advise.
Thank you.