Hi
I wonder if anyone can help out with my question?
I have a dataset on one endoparasite in fish stomachs from a lake with four different fish phenotypes (Ab, Dw, Pi, Pl). The data fit a negative binomial distribution well (see graph).
I use a generalized regression with a negative binomial distribution to analyse the number of parasites (Y) in the four fish phenotypes (x1), with body length (x2) as a covariate.
In the results (see output below) both x1 and x2 are significant, and I want to extract "least squares means" for the four fish phenotypes after doing a Tukey multiple comparison (which contrasts the levels of x1 and corrects for x2). What I get are the incidence rate estimates (with std. errors). How can I "back calculate" to get real numbers (i.e. means) of parasites in the four fish phenotypes?
Number (Y) of endoparasites (X) in fish stomachs.
Output from analysis
Parameter estimates:
Tukey multiple test:
Hi @Kjartan ,
It seems that you are asking about a few different things.
First of all, are you sure that a negative binomial is the right distribution to use for your data? Sure, it might fit, but does it make sense to use it? According to the JMP online help, the negative binomial distribution is good for modeling the number of successes before a specified number of failures. Your data don't really sound like they follow this model. Based on your description, it sounds like a Poisson (or ZI Poisson) distribution would be more appropriate, since you have counts of the endoparasite within each fish phenotype, or even just overall.
As far as the model goes, it sounds like you have two factors, X1 (fish phenotype) and X2 (body length). Do you know whether there is a cross term, X1*X2, that is, whether phenotype and body length interact? If not, leave them as separate, individual factors.
Regarding the means per phenotype, you could just do an ANOVA (or MANOVA) of the response vs. phenotype and look at the means for each phenotype.
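On the "back calculate" part of the question, here is a rough sketch in Python rather than JMP. It assumes the model uses a log link (the usual choice for negative binomial and Poisson regression), so that estimates and standard errors are reported on the log scale. Under that assumption, exponentiating a log-scale least squares mean gives the expected parasite count for that phenotype at the covariate value used for the mean (typically the average body length), and exponentiating a difference between two phenotypes gives a ratio of counts rather than a difference. The function names and all numbers below are hypothetical placeholders, not values from the output above.

```python
import math


def back_transform(estimate, std_err, z=1.96):
    """Convert a log-scale mean and its standard error to the count scale.

    The interval is built on the log scale and then exponentiated, so it is
    asymmetric around the back-transformed mean.
    """
    mean = math.exp(estimate)
    lower = math.exp(estimate - z * std_err)
    upper = math.exp(estimate + z * std_err)
    return mean, lower, upper


def rate_ratio(log_difference):
    """A difference on the log scale corresponds to a ratio of expected counts."""
    return math.exp(log_difference)


# Hypothetical log-scale least squares means (estimate, std. error) per phenotype.
# Placeholder values only; substitute the estimates from your own report.
log_scale_means = {
    "Ab": (1.20, 0.15),
    "Dw": (0.40, 0.20),
    "Pi": (2.10, 0.12),
    "Pl": (1.75, 0.18),
}

for phenotype, (est, se) in log_scale_means.items():
    mean, lower, upper = back_transform(est, se)
    print(f"{phenotype}: {mean:.2f} parasites (95% CI {lower:.2f} to {upper:.2f})")

# Pairwise contrast, e.g. Pi vs. Dw: how many times more parasites are expected.
pi_vs_dw = rate_ratio(log_scale_means["Pi"][0] - log_scale_means["Dw"][0])
print(f"Pi vs. Dw ratio of expected counts: {pi_vs_dw:.2f}")
```

Note that the standard error itself should not simply be exponentiated; back-transforming the interval end points, as above, is the safer route.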
Hope this helps,
DS