## Correlations with binary data

Community Trekker

Joined:

Jan 6, 2016

Hi everyone,

I have a large data table, and each column has a range check property.  If the data is within the range, the value remains, and if it's not within range, the data is removed/missing.  I then copy the data table, and for every value missing, I replace it with a 0, and for data that is not missing, I replace it with a 1.  So basically, I have a large data table full of 1 and 0s (indicating pass or fail).
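For readers following along outside JMP, the pass/fail recoding described above can be sketched in plain Python. The column names and values are made up for illustration, and `None` stands in for a value removed by the range check.

```python
# Hypothetical data: None marks a value that the range check removed.
table = {
    "height": [1.2, None, 3.4, 2.2],
    "weight": [None, None, 5.0, 4.1],
}

def to_pass_fail(columns):
    """Return a copy where missing values become 0 and present values become 1."""
    return {
        name: [0 if value is None else 1 for value in values]
        for name, values in columns.items()
    }

# The original table is left untouched; indicators holds only 0s and 1s.
indicators = to_pass_fail(table)
```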

I tried the correlation table and I have a few questions.

1)  If two columns are entirely filled with 1s, why is the correlation 0?  From the equation for Pearson Product Moment Correlation, I would be dividing by 0.

2)  If a value in the correlation table is very close to 1 (or -1), does that mean that if a value in one column has passed, it is likely that it passed in the other column?

1 ACCEPTED SOLUTION

Super User

Joined:

Jun 22, 2012

Solution

Natalie,

I assume that the correlation just can't be calculated for 2 columns with all 1's, since there is no variance.
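To illustrate the zero-variance case, here is a minimal sketch of the Pearson formula in plain Python. The function name `pearson_r` and the choice to return `None` are assumptions made for this example. How a given tool reports the case (0, missing, or an error) is up to that tool.

```python
import math

def pearson_r(x, y):
    """Pearson correlation, or None when either column has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denominator = math.sqrt(sxx * syy)
    if denominator == 0:
        # A constant column makes the denominator 0, so r is undefined.
        return None
    return sxy / denominator

all_pass_a = [1, 1, 1, 1]
all_pass_b = [1, 1, 1, 1]
undefined = pearson_r(all_pass_a, all_pass_b)  # None: both columns are constant
```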

Concerning your second question, a value close to +1 would indicate that a 1 in one of the columns would predict a 1 in the other column.  If you square the Pearson r, you will get the proportion of variance in one column that is predicted by the other.  If you have a value close to -1, you would predict a zero if there is a one in the other column.
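For two 0/1 columns, the Pearson r can also be computed from the 2x2 table of pass/fail counts (often called the phi coefficient). The sketch below uses made-up data and hypothetical names to show r, its square, and the share of rows passing in the second column among those that passed in the first.

```python
import math

def phi_from_counts(a, b):
    """Pearson r for two 0/1 columns via the 2x2 count table (phi coefficient)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        return None  # one column is constant, so r is undefined
    return (n11 * n00 - n10 * n01) / denominator

# Hypothetical pass/fail columns that agree on all rows but one.
col_a = [1, 1, 1, 1, 0, 0, 0, 1]
col_b = [1, 1, 1, 1, 0, 0, 1, 1]

r = phi_from_counts(col_a, col_b)
r_squared = r ** 2  # proportion of variance in one column predicted by the other

# Among rows where col_a passed, the fraction where col_b also passed.
passed_a = [y for x, y in zip(col_a, col_b) if x == 1]
p_b_given_a = sum(passed_a) / len(passed_a)
```

Note that r summarizes agreement in both directions (passes with passes and fails with fails), so the conditional pass rate is worth checking alongside it.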

Jim
2 REPLIES

Community Trekker

Joined:

Jan 6, 2016

Thanks Jim!  I did some more research on correlations and came to this conclusion.