Correlation analysis of non-normal proportion data

Correlation analysis of non-normal proportion data

Created: May 1, 2020 12:38 AM | Last Modified: May 1, 2020 9:23 AM

Hi JMP community,

I'm analyzing a dataset and would like to find out how the proportion of damage measured on fruits correlates with the proportion of ants and the number of fruit flies. The data are not normally distributed (see histogram), and I'm planning to fit a GLM with a binomial distribution. I've learned that correlation analysis of proportion data is a bit tricky, so I'm looking for advice on how best to do this.
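A binomial GLM with a logit link treats the damaged count in each square meter as Binomial(total fruits, p), with logit(p) linear in the predictors. The minimal sketch below fits one numeric predictor (for example, ant proportion) by Newton-Raphson, which for the canonical logit link is the same iteration as IRLS. The function name, arguments and toy data are hypothetical; in JMP the Generalized Linear Model personality of Fit Model does this fitting for you.

```python
import math

def fit_binomial_logit(x, damaged, total, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to aggregated binomial counts.

    damaged[i] fruits out of total[i] in plot i; x[i] is one numeric predictor.
    Newton-Raphson here coincides with IRLS because the logit link is canonical.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information (negative Hessian)
        for xi, yi, ni in zip(x, damaged, total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p          # observed minus expected damaged count
            w = ni * p * (1.0 - p)       # binomial variance, the IRLS weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system [[h00, h01], [h01, h11]] @ step = [g0, g1]
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical toy data: more ants, fewer damaged fruits.
ants = [0.0, 0.2, 0.4, 0.6, 0.8]
damaged = [30, 24, 15, 10, 4]
total = [50, 50, 50, 50, 50]
b0, b1 = fit_binomial_logit(ants, damaged, total)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Keeping the counts (damaged, total) rather than the bare proportion lets plots with more fruit carry more weight, which is what the binomial variance term `w` does.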

6 replies

Could you attach the JMP data set instead of a picture? Do you want to look at the relationships between damage and two types of insects? How do you measure damage? Is there a gradation of damage, or is each fruit simply damaged or not? Can you tell damage caused by a fruit fly apart from damage caused by an ant?


Re: Correlation analysis of non-normal proportion data

Yes, of course; sorry for being so unclear. I have a measure of the proportion of fruits damaged by fruit flies: the number of damaged fruits divided by the total number of fruits within one square meter. This is my response variable. I want to test the correlation between this measure and five other variables, only two of which are numerical. The two numerical explanatory variables are the number of fruit flies caught in traps and the proportion of ants on trees. Damage to the fruits is caused only by fruit flies; the ants are believed to reduce the amount of damage, and I want to investigate whether this holds up. The three categorical explanatory variables are "country", "mango variety" and "Treatment". Country is the country the data come from, and treatment is the kind of treatment applied in the orchards to reduce the number of fruit flies. Thus, the hypotheses are: ants reduce the proportion of damaged fruits, and combining high ant proportions with other treatments increases this effect.
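Written out, one possible binomial GLM for these two hypotheses is sketched below. The symbols are illustrative, not the poster's notation: $y_i$ is the number of damaged fruits out of $n_i$ fruits in square meter $i$, and $c(i)$, $v(i)$, $t(i)$ are its country, mango variety and treatment. The ant-by-treatment interaction $\theta_t$ is the term that would test whether ants and treatments act together; one level of each factor (and of $\theta$) must be fixed, for example at zero, for the model to be identifiable.

$$
y_i \sim \operatorname{Binomial}(n_i, \pi_i), \qquad
\operatorname{logit}(\pi_i) = \beta_0 + \beta_1\,\text{flies}_i + \beta_2\,\text{ants}_i + \gamma_{c(i)} + \delta_{v(i)} + \tau_{t(i)} + \theta_{t(i)}\,\text{ants}_i
$$

Under this parameterization, "ants reduce damage" corresponds to $\beta_2 < 0$, and "the ant effect is stronger under treatment $t$" corresponds to $\theta_t < 0$, since the ant slope for that treatment is $\beta_2 + \theta_t$ relative to the reference treatment.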

I cannot share the data, as I do not have the rights to do so. I hope this clears things up a bit.


Re: Correlation analysis of non-normal proportion data

Yes, that helps a bit. I still suggest your response variable could be improved, but I digress. A lot depends on the proportions we are talking about. If the proportions are very small, then it will be difficult to detect changes without large sample sizes (and distributional issues could have an impact). If the proportions are large, then distributional issues will have less impact.
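To see why small proportions need large samples: the standard error of an observed proportion is sqrt(p(1-p)/n), which is large relative to p itself when p is small. The sketch below uses the common normal-approximation sample-size formula for comparing two proportions; the 5% significance level and 80% power are conventional assumptions, not values from this thread, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def relative_se(p, n):
    """Standard error of an observed proportion, divided by the proportion itself."""
    return sqrt(p * (1 - p) / n) / p

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-proportion z-test
    (normal approximation, unpooled variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

# Halving the damage rate, starting from a large and from a small proportion.
for p1, p2 in [(0.50, 0.25), (0.02, 0.01)]:
    print(f"{p1:.2f} -> {p2:.2f}: relative SE at n=100 = {relative_se(p1, 100):.2f}, "
          f"n per group = {n_per_group(p1, p2)}")
```

Under these assumptions, halving damage from 50% to 25% needs on the order of tens of fruits per group, whereas halving it from 2% to 1% needs on the order of thousands.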

I would start with Fit Model. Enter your proportion damaged as Y and add your five input variables as model effects.
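Three of the five inputs are categorical, so each becomes several model columns rather than one. The sketch below shows reference (0/1 indicator) coding, one common way to do this; JMP handles the coding itself when you add nominal columns to Fit Model, and its default parameterization may differ from the one shown. The column names and level labels are hypothetical.

```python
def design_matrix(rows, continuous, categorical):
    """Build a model matrix: an intercept, continuous inputs as-is, and 0/1
    indicators for every categorical level except the first (sorted) level,
    which serves as the reference."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    header = ["intercept"] + list(continuous)
    for c in categorical:
        header += [f"{c}[{lv}]" for lv in levels[c][1:]]
    matrix = []
    for r in rows:
        row = [1.0] + [float(r[c]) for c in continuous]
        for c in categorical:
            row += [1.0 if r[c] == lv else 0.0 for lv in levels[c][1:]]
        matrix.append(row)
    return header, matrix

# Hypothetical plots; the level labels are placeholders.
plots = [
    {"flies": 3, "ants": 0.2, "country": "X", "variety": "var2", "treatment": "control"},
    {"flies": 5, "ants": 0.1, "country": "Y", "variety": "var1", "treatment": "bait"},
]
header, X = design_matrix(plots, ["flies", "ants"], ["country", "variety", "treatment"])
print(header)
for row in X:
    print(row)
```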

BTW, you can always send a similar but fake data set so we can show you options to navigate the analysis.


Re: Correlation analysis of non-normal proportion data

Thank you, I'll try that!

I have about 13,000 data points for each variable, so the sample size is quite large, but the variables are zero-inflated. So model-wise I might try my luck with a zero-inflated beta-binomial distribution, but I've never tried it before, so wish me luck!
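A zero-inflated beta-binomial mixes a point mass at zero (probability pi0, "structural zeros") with a beta-binomial count, in which the damage probability p itself varies between plots as Beta(alpha, beta). The sketch below only evaluates that probability mass function; fitting the model would mean maximizing the sum of its logarithm over plots, with covariates linked to the parameters. All names and parameter values are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b) via log-gamma, stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def zibb_pmf(k, n, alpha, beta, pi0):
    """P(K = k) for a zero-inflated beta-binomial.

    With probability pi0 the plot is a structural zero; otherwise the damaged
    count K ~ BetaBinomial(n, alpha, beta), i.e. p ~ Beta(alpha, beta) and
    K | p ~ Binomial(n, p).
    """
    if not 0 <= k <= n:
        return 0.0
    bb = math.comb(n, k) * math.exp(
        log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))
    return (pi0 if k == 0 else 0.0) + (1.0 - pi0) * bb

# Hypothetical parameters: 20 fruits, mean damage alpha / (alpha + beta) = 0.2
# among non-structural plots, and 40% structural zeros.
probs = [zibb_pmf(k, 20, 1.0, 4.0, 0.4) for k in range(21)]
print(f"P(0 damaged) = {probs[0]:.3f}, total = {sum(probs):.6f}")
```

Two handy sanity checks: with pi0 = 0 the function reduces to the ordinary beta-binomial, and with alpha = beta = 1 that beta-binomial is uniform on 0..n.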


Re: Correlation analysis of non-normal proportion data

If you know which of the two outcomes occurred in each case, then you could also use logistic regression or a binary generalized linear model. I agree with @statman's suggestions. I am just adding an alternative approach to the analysis.
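If the status of each individual fruit (damaged or not) is recorded, the data can be laid out as one binary row per fruit and analysed with logistic regression. For the same predictors this gives the same coefficient estimates as a binomial model on the per-plot counts, because the two log-likelihoods differ only by the constant sum of log C(n, y), which does not depend on the parameters. The sketch below checks that identity; the function names and numbers are hypothetical.

```python
import math

def binomial_loglik(damaged, total, probs):
    """Log-likelihood of aggregated counts (damaged out of total per plot)."""
    return sum(math.log(math.comb(n, y)) + y * math.log(q) + (n - y) * math.log(1 - q)
               for y, n, q in zip(damaged, total, probs))

def expand_to_binary(damaged, total, probs):
    """One row per fruit: 1 = damaged, 0 = undamaged; each inherits its plot's probability."""
    outcomes, fruit_probs = [], []
    for y, n, q in zip(damaged, total, probs):
        outcomes += [1] * y + [0] * (n - y)
        fruit_probs += [q] * n
    return outcomes, fruit_probs

def bernoulli_loglik(outcomes, fruit_probs):
    """Log-likelihood of per-fruit binary outcomes (the logistic-regression likelihood)."""
    return sum(math.log(q) if o == 1 else math.log(1 - q)
               for o, q in zip(outcomes, fruit_probs))

damaged, total, probs = [3, 0, 7], [10, 5, 8], [0.3, 0.1, 0.8]
outcomes, fruit_probs = expand_to_binary(damaged, total, probs)
gap = binomial_loglik(damaged, total, probs) - bernoulli_loglik(outcomes, fruit_probs)
print(f"difference = {gap:.6f}")  # equals the sum of log C(n, y) for any probs
```

The per-fruit layout has one extra advantage: it can carry fruit-level covariates, which the per-plot counts cannot.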


Re: Correlation analysis of non-normal proportion data

Thank you! I've been advised to try a beta-binomial or zero-inflated beta-binomial model because of the zero-inflation in my data, but I've never fitted either before, so if it fails I'll try a "normal" binary GLM, which I'm much more familiar with.
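One common way to decide whether the extra complexity is needed is to fit the plain binomial GLM first and then check two things: overdispersion (Pearson chi-square divided by residual degrees of freedom; values near 1 are consistent with the binomial, values well above 1 suggest extra variation of the kind a beta-binomial absorbs) and excess zeros (observed zero-damage plots versus the binomial's expected number). The sketch below assumes you already have fitted probabilities from that model; the function names and data are hypothetical.

```python
def pearson_dispersion(damaged, total, fitted_p, n_params):
    """Pearson chi-square / residual df for a binomial fit on per-plot counts."""
    chi2 = 0.0
    for y, n, p in zip(damaged, total, fitted_p):
        chi2 += (y - n * p) ** 2 / (n * p * (1.0 - p))
    return chi2 / (len(damaged) - n_params)

def zero_check(damaged, total, fitted_p):
    """Observed zero-damage plots versus the number the binomial fit expects."""
    observed = sum(1 for y in damaged if y == 0)
    expected = sum((1.0 - p) ** n for n, p in zip(total, fitted_p))
    return observed, expected

# Hypothetical counts and fitted probabilities from a two-parameter model.
damaged = [0, 0, 0, 12, 1, 0, 9, 0]
total = [20, 20, 20, 20, 20, 20, 20, 20]
fitted = [0.12, 0.10, 0.15, 0.20, 0.08, 0.10, 0.18, 0.12]
print(f"dispersion = {pearson_dispersion(damaged, total, fitted, 2):.2f}")
print("zeros (observed, expected):", zero_check(damaged, total, fitted))
```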
