I'd greatly appreciate advice about analyzing a set of proportion data, where I would like to include both random and fixed factors.
The data are from an experiment with fruit flies. Each of 3 "social" treatments was applied to 3 independent populations of flies, such that there are 9 populations in total. A high or low diet treatment was then applied to individual flies within each of the 9 populations. Each of these flies was placed in a separate vial and competed to sire offspring. Offspring sired by these experimental males had red eyes, whereas offspring not sired by them had white eyes, with the total number of red- and white-eyed offspring varying among vials.
I want to know whether the proportion of offspring sired by these flies differed depending on the social treatment, the diet treatment or the interaction between them. I have set up a model with the fixed factors of diet (with two levels), social treatment (three levels), and the diet by treatment interaction. The model also has the random effect of population nested within the social treatment. I have as the response variable the proportion of offspring with red eyes within a vial, weighted by the total number of offspring within the vial.
When I ran the model, the residuals were not normally distributed. I applied an arcsine square-root transformation, which helped, but the residuals were still not normally distributed. One option is to apply the logit transformation, which does make the residuals normally distributed.
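For reference, the two transformations mentioned above can be sketched as follows. This is only an illustration: the vial counts are made up, and the `eps` adjustment in the logit (added so that vials with 0% or 100% red-eyed offspring stay finite) is an assumption, not something stated in the post.

```python
import math

def arcsine_sqrt(red, total):
    """Arcsine square-root transform of the proportion red / total."""
    return math.asin(math.sqrt(red / total))

def empirical_logit(red, total, eps=0.5):
    """Logit of the proportion of red-eyed offspring.

    eps is added to both counts (an assumed adjustment) so that
    proportions of exactly 0 or 1 do not give infinite values.
    """
    return math.log((red + eps) / (total - red + eps))

# Hypothetical vials: (red-eyed offspring, total offspring)
vials = [(0, 40), (12, 48), (30, 60), (55, 55)]

for red, total in vials:
    print(red, total,
          round(arcsine_sqrt(red, total), 3),
          round(empirical_logit(red, total), 3))
```

Note that either transformation discards the vial's total count unless it is carried separately as a weight, which is one reason a binomial model on the counts themselves is often considered.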
However, I wonder if a better approach would be to use a generalised linear model with a specified error distribution that matches the data, perhaps with a logit link function. I'd greatly appreciate advice about this, especially about whether it would be better than using a logit transformation within a linear model, how to select an error distribution for a GLM, and how to set the model up in JMP.
You could use the binomial distribution for the model with the logit link function. Instead of a proportion, though, you can use a response column containing the values red and white, and another column containing the count of each for every condition. Assign the column with the counts to the Freq analysis role.
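To make concrete what a binomial model with a logit link estimates, here is a minimal sketch for the simplest possible case: diet as the only predictor, with no social treatment and no population effect. In that case the maximum-likelihood fitted probabilities are just the pooled proportions in each diet group, so the coefficients can be computed by hand. The counts are hypothetical, and this is not JMP's procedure, only an illustration of the quantities involved.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

# Hypothetical data: (diet, red-eyed offspring, total offspring) per vial
vials = [
    ("low", 10, 50),
    ("low", 20, 50),
    ("high", 30, 60),
    ("high", 30, 40),
]

def pooled_proportion(rows, diet):
    """Total red-eyed offspring over total offspring for one diet level."""
    red = sum(r for d, r, n in rows if d == diet)
    total = sum(n for d, r, n in rows if d == diet)
    return red / total

p_low = pooled_proportion(vials, "low")    # 30 / 100
p_high = pooled_proportion(vials, "high")  # 60 / 100

# With "low" as the reference level, the logit-scale coefficients are:
intercept = logit(p_low)
diet_effect = logit(p_high) - logit(p_low)
odds_ratio = math.exp(diet_effect)

print(round(intercept, 4), round(diet_effect, 4), round(odds_ratio, 4))
```

Pooling the counts means vials with more offspring carry more weight automatically, which is what the Freq column achieves; the unweighted mean of the per-vial proportions would generally give a different answer.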
Thanks very much for this suggestion.
I've set this up such that there are two rows per individual male, one for the number of red-eyed offspring and one for the number of white-eyed offspring.
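The two-rows-per-male layout described here can be sketched as a reshape from one record per vial to one record per eye colour. All column names (`male`, `population`, `social`, `diet`, `eye_colour`, `freq`) and values are hypothetical, chosen only to illustrate the structure.

```python
# Hypothetical "wide" records: one row per male (vial)
wide = [
    {"male": "M1", "population": "P1", "social": "A", "diet": "high",
     "red": 30, "white": 10},
    {"male": "M2", "population": "P1", "social": "A", "diet": "low",
     "red": 12, "white": 36},
]

def to_long(rows):
    """Return two rows per male: one per eye colour, with its count in 'freq'."""
    long_rows = []
    for row in rows:
        # Everything except the two count columns is copied to both rows
        shared = {k: v for k, v in row.items() if k not in ("red", "white")}
        for colour in ("red", "white"):
            long_rows.append({**shared, "eye_colour": colour, "freq": row[colour]})
    return long_rows

for record in to_long(wide):
    print(record)
```

In this layout `eye_colour` would be the response and `freq` the frequency column; the `male` identifier is kept only so the two rows from each vial can be matched up.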
In trying to run the model, I run into the difficulty that the GLM won't allow a random effect.
I have the two random effects of individual male, and population. If I turn them into fixed effects, the model loses degrees of freedom for testing the fixed effects I'm interested in.
Any ideas of how to handle this?
I don't see random effects in your study. Sex is a fixed effect. Population is a fixed effect. The individual subjects are the residual and do not need to be explicitly entered as a random effect.
Thanks for this. I thought of population as a random effect because I am not interested in estimates about each of the populations, and if I conducted the experiment again I'd set up new populations of flies. Would you suggest otherwise?
Also, it looks like I misunderstood your suggestion of how to set up the model. I set up a column for red and white values and another column with counts, such that each male has two rows, one with red counts and another with white counts. With this setup I need to link the two rows that are associated with each male, so I included individual in the model. Is there a different way to set it up without including individual?
More generally, is it the case that a random factor won't work with this kind of model?
Yes, JMP cannot fit a model for a binomial response with a random effect.