Hi JMP community,
I'm analyzing a dataset and would like to examine how the proportion of damage measured on fruits correlates with the proportion of ants and the number of fruit flies. The data are not normally distributed (see histogram), so I'm planning to fit a GLM with a binomial distribution. I've learned that correlation analysis of proportion data is a bit tricky, so I'm looking for advice on the best way to do this.
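For reference, here is a sketch of what a binomial GLM assumes for this kind of response. The notation is my own: y_i is the number of damaged fruits in square meter i, n_i is the total number of fruits there, and x_i holds the predictors for that square meter.

```latex
\begin{aligned}
y_i &\sim \operatorname{Binomial}(n_i,\, p_i), \\
\operatorname{logit}(p_i) &= \log\frac{p_i}{1 - p_i} = \mathbf{x}_i^\top \boldsymbol{\beta}, \\
\operatorname{Var}\!\left(\frac{y_i}{n_i}\right) &= \frac{p_i\,(1 - p_i)}{n_i}.
\end{aligned}
```

The last line is the reason proportions are tricky for ordinary correlation or least squares: the variance of each observed proportion depends on both its mean and the number of fruits behind it, so the model needs n_i and not just the ratio.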
Could you attach the JMP data set instead of a picture? You want to look for relationships between damage and two types of insects? How do you measure damage? Is there a gradation of damage, or is each fruit simply damaged or not? Can you differentiate damage caused by a fruit fly from damage caused by an ant?
Yes, of course; sorry for being so unclear. I have a measure of the proportion of fruits damaged by fruit flies, calculated as the number of damaged fruits divided by the total number of fruits within one square meter. This is my response variable. I want to test the correlation between this measure and five other variables, only two of which are numerical: the number of fruit flies caught in traps and the proportion of ants on trees. Damage to fruits is caused only by fruit flies; the ants are believed to reduce the amount of damage, and I want to investigate whether this holds up. The three categorical explanatory variables are "country", "mango variety" and "Treatment". Country is the country the data come from, and treatment is the kind of treatment applied in the orchards to reduce the number of fruit flies. Thus, the hypotheses are that ants reduce the proportion of damaged fruits, and that combining high ant proportions with other treatments strengthens this effect.
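Because a binomial model weights each proportion by how many fruits it is based on, it helps to keep the raw counts next to the ratio rather than only the ratio. A minimal Python sketch, assuming hypothetical per-square-meter records with `damaged` and `total` fruit counts (the class, field names and values are illustrative, not from the real data):

```python
from dataclasses import dataclass


@dataclass
class Plot:
    """One square-meter sample (hypothetical field names)."""
    damaged: int  # fruits with fruit-fly damage
    total: int    # all fruits counted in the square meter


def binomial_response(plots):
    """Return (damaged, undamaged, proportion) for each plot.

    Plots with no fruit are skipped: their proportion is undefined
    and they carry no binomial information.
    """
    rows = []
    for p in plots:
        if p.total <= 0:
            continue
        if not 0 <= p.damaged <= p.total:
            raise ValueError(f"damaged must be between 0 and total: {p}")
        rows.append((p.damaged, p.total - p.damaged, p.damaged / p.total))
    return rows


# Two plots with the same 0.5 proportion but very different evidence,
# plus an empty plot that gets dropped.
plots = [Plot(1, 2), Plot(50, 100), Plot(0, 0)]
for damaged, undamaged, prop in binomial_response(plots):
    print(damaged, undamaged, prop)
```

Both remaining plots show a proportion of 0.5, but the second is based on 50 times as many fruits. A binomial model given the counts treats them accordingly; the bare proportion column cannot.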
I cannot share the data, as I do not have the rights to do so. I hope this clears things up a bit.
Yes, that helps a bit. I still think your response variable could be improved, but I digress. A lot depends on how large the proportions are. If they are very small, it will be difficult to detect changes without large sample sizes (and distributional issues could have an impact). If they are large, distributional issues will have less impact.
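To put rough numbers on the point about small proportions: the standard error of an estimated proportion is sqrt(p(1 - p)/n), so relative to the proportion itself it grows quickly as p shrinks. A small stdlib-only Python sketch; the p and n values are arbitrary examples:

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion p estimated from n trials."""
    return math.sqrt(p * (1 - p) / n)


def relative_se(p, n):
    """Standard error expressed as a fraction of p itself."""
    return se_proportion(p, n) / p


n = 1000
for p in (0.5, 0.1, 0.01):
    print(f"p={p:<5} SE={se_proportion(p, n):.4f} relative SE={relative_se(p, n):.2f}")
```

With the same 1,000 fruits, the standard error is about 3% of the estimate at p = 0.5 but about 31% of it at p = 0.01, which is why small proportions need much larger samples before a change becomes detectable.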
I would start with Fit Model (Analyze > Fit Model). Enter your proportion damaged as Y and your five input variables as model effects.
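If Fit Model is run with the Generalized Linear Model personality, a binomial distribution and a logit link, as planned in the question, the model is roughly the one sketched below. The symbols are my own notation, the categorical inputs enter as indicator-coded effects, and the ants × treatment term δ is an assumption added to encode the second hypothesis; it is not part of the reply's suggestion.

```latex
\operatorname{logit}(p_i) = \beta_0
  + \beta_1\,\text{flies}_i
  + \beta_2\,\text{ants}_i
  + \alpha_{\text{country}[i]}
  + \gamma_{\text{variety}[i]}
  + \tau_{\text{treatment}[i]}
  + \delta_{\text{treatment}[i]}\,\text{ants}_i
```

Under the first hypothesis, β₂ would be negative (more ants, lower odds of damage). Under the second, δ for a given treatment would also be negative, making the ant slope steeper in orchards with that treatment.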
BTW, you can always send a similar but fake data set so we can show you options to navigate the analysis.
Thank you, I'll try that!
I have about 13,000 data points for each variable, so the sample size is quite large, but the variables are zero-inflated. Model-wise, I might try my luck with a zero-inflated beta-binomial distribution. I've never tried it before, so wish me luck!
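For anyone unfamiliar with it, a zero-inflated beta-binomial mixes a point mass at zero (probability π, for example square meters where damage structurally cannot occur) with a beta-binomial count, which allows more spread than a plain binomial. A stdlib-only Python sketch of its probability mass function; the parameter names `pi`, `alpha`, `beta` and the example values are chosen for illustration:

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b), computed via lgamma to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def zibb_pmf(k, n, pi, alpha, beta):
    """Zero-inflated beta-binomial: extra zeros with probability pi."""
    base = (1 - pi) * beta_binomial_pmf(k, n, alpha, beta)
    return pi + base if k == 0 else base


# Example: 20 fruits in a square meter, 30% structural zeros
n, pi, alpha, beta = 20, 0.3, 0.5, 4.0
probs = [zibb_pmf(k, n, pi, alpha, beta) for k in range(n + 1)]
print(f"P(0 damaged) = {probs[0]:.3f}, total = {sum(probs):.6f}")
```

This only describes the distribution; fitting it means maximizing the sum of log `zibb_pmf` over all square meters, with π and the beta-binomial mean linked to the predictors.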
If you know which of the two outcomes occurred in each case, then you could also use logistic regression or a binary generalized linear model. I agree with @statman's suggestions. I am just adding an alternative approach to the analysis.
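To make the logistic regression alternative concrete: with one row per fruit (damaged or not), the model is logit(p) = b0 + b1·x. A minimal pure-Python sketch with a single predictor, fit by Newton-Raphson; the ant proportions and outcomes are made up for illustration, and in practice you would let JMP fit this:

```python
import math


def fit_logistic(xs, ys, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient (g) and negative Hessian (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * step = g
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


# Hypothetical per-fruit data: ant proportion on the tree, 1 = damaged
ants = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
damaged = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
b0, b1 = fit_logistic(ants, damaged)
print(f"intercept={b0:.3f}, slope={b1:.3f}, odds ratio per +0.1 ants={math.exp(0.1 * b1):.3f}")
```

A negative slope means more ants go with lower odds of damage. Fitting per-fruit binary rows and fitting the aggregated damaged/total counts with a binomial GLM give the same coefficient estimates, so the choice between the two mostly depends on which form the data are recorded in.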