Maxpower1123
Level II

Correlation for binary data in different columns

Hi

 

I have a very large dataset in which every column contains only 1's and 0's. I am trying to see whether there is any correlation between particular columns.

I want to compare the data from inside an area with the data from outside that area, to see whether there are any significant differences between people inside and outside. I am doing this for many different areas. It is a survey with yes and no answers, so the columns look something like this, with "no" as 0 and "yes" as 1.

 

Inside area    Outside area
0              0
0              0
0              1
0              1
1              1
1              1
1              1
etc.           etc.

 

I am not sure how to go about this. I have tried the "Fit Y by X" option with column 1 as Y, Response and column 2 as X, Factor, but the results don't really look right.
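If the two columns hold answers from different people (an assumption; the post does not say whether the rows are paired), then the row-by-row alignment carries no meaning, and what matters is the number of yes and no answers in each group. A minimal Python sketch of that tally, using only the seven rows shown above (the names `inside`, `outside` and `yes_no_counts` are made up for illustration):

```python
from collections import Counter

# First seven rows of each column as shown in the post; the real columns are longer.
inside = [0, 0, 0, 0, 1, 1, 1]
outside = [0, 0, 1, 1, 1, 1, 1]


def yes_no_counts(column):
    """Return (number of 1s, number of 0s) for one 0/1 column."""
    counts = Counter(column)
    return counts[1], counts[0]


# One row per group: this is the 2x2 table a test of independence works on.
table = {
    "inside": yes_no_counts(inside),
    "outside": yes_no_counts(outside),
}

for group, (yes, no) in table.items():
    share = yes / (yes + no)
    print(f"{group:8s} yes={yes} no={no} share_yes={share:.2f}")
```

Laid out this way, the comparison is between the share of yes answers in each group rather than between matching rows.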

 

Can anyone help with this?

 

Thanks

10 REPLIES
Maxpower1123
Level II

Re: Correlation for binary data in different columns

Yes, I tried to use the method on a couple of different areas and it looks good!

 

I was looking at the number of 1's: there are 44 in the control group and 66 in the test group, so I was expecting that a difference larger than 22 would be needed for significance. But as you mention, one also needs to consider that the total sample size differs between the two groups, so I can see that now. The p-value of 0.0001 just threw me off a bit, as I was not expecting it to be that low.
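To illustrate why the raw counts alone do not settle it, here is a rough Python sketch of a Pearson chi-square test on a 2x2 table. The counts 44 and 66 come from this thread, but the group sizes are not given here, so the totals used below (400 and 300, then 300 and 300) are invented purely for illustration. It is also not necessarily the same statistic JMP reported.

```python
import math


def chi_square_2x2(yes_a, n_a, yes_b, n_b):
    """Pearson chi-square (no continuity correction) for two groups of 0/1 answers.

    Returns (chi2, p_value). With 1 degree of freedom the upper tail of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    a, b = yes_a, n_a - yes_a  # group A: yes, no
    c, d = yes_b, n_b - yes_b  # group B: yes, no
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical group sizes: the thread only gives the counts of 1's (44 and 66).
chi2_unequal, p_unequal = chi_square_2x2(44, 400, 66, 300)
chi2_equal, p_equal = chi_square_2x2(44, 300, 66, 300)

print(f"44/400 vs 66/300: chi2={chi2_unequal:.2f}, p={p_unequal:.5f}")
print(f"44/300 vs 66/300: chi2={chi2_equal:.2f}, p={p_equal:.5f}")
```

The same 44 versus 66 gives a much smaller p-value when the control group is larger, because the test compares proportions (44/400 against 66/300) rather than the difference of 22 in the counts.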

 

But thank you so much, Dan! You have been extremely helpful and I sincerely appreciate it!