Solved: Re: Binary Data and Correlations - Page 2

Dec 2, 2016 02:35 PM

The other day, I was asking about binary data and correlations. I had been using the multivariate correlation tool, which outputs the correlation coefficients as a matrix. However, after further research, I don't think this tool is meaningful for categorical data because it uses the Pearson Product Moment Correlation (PPMC). In my data, each value is a 0 or 1 (pass or fail), so it is binary. I have read that I should use the Phi Coefficient to calculate the correlation between binary variables.

I have two questions:

1) Can I interpret the coefficient the same way as the PPMC? I know that the closer it is to +1, the stronger the positive correlation. If I square this coefficient, can I still use it as the coefficient of determination?

2) Can I calculate this using JMP? Is there a way to change the correlation matrix so it calculates the phi coefficient instead of the PPMC?

Thanks, Natalie
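
For two variables coded 0/1, the phi coefficient computed from the 2x2 table is numerically the same as the Pearson correlation of the coded columns, and its square equals the Pearson chi-square statistic divided by the sample size. A minimal JSL sketch of the calculation from cell counts is shown here; the four counts and all variable names are invented for illustration and do not come from the data discussed in this thread.

```jsl
Names Default To Here( 1 );

// Hypothetical 2x2 cell counts (made up for illustration).
// nXY = number of rows where the first variable is X and the second is Y.
n11 = 30;
n10 = 10;
n01 = 5;
n00 = 55;

// phi = (n11*n00 - n10*n01) / sqrt( product of the four marginal totals )
phi = (n11 * n00 - n10 * n01) /
	Sqrt( (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00) );

// phi squared, which also equals the Pearson chi-square divided by n.
phi sq = phi ^ 2;

Show( phi, phi sq );
```

Note that phi can only reach +1 or -1 when the two variables have matching marginal proportions, so a value below 1 does not by itself rule out the strongest association the margins allow.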

Mark_Bailey · Dec 3, 2016 10:35 AM

Here is a script that collects RSquare (U) and Kappa from all combinations of your binary columns:

Names Default To Here( 1 );

dt# = Current Data Table();
If( Is Empty( dt# ),
	Throw( "Data table missing" )
);

// user choices.
dlg# = Column Dialog(
	Title( "Binary Agreement" ),
	yCol# = Col List( "Binary Columns",
		Min Col( 2 )
	),
	"Select columns for agreement"
);

// check if user decides to quit.
If( dlg#["Button"] == -1,
	Throw( "User cancelled" );
);

// process information returned from dialog.
Remove From( dlg# ); Eval List( dlg# );

n cols# = N Items( yCol# );
r sqr u# = agree# = Identity( n cols# );
measure# = List();
// visit each unordered pair of columns once (col1 < col2).
For( col1# = 1, col1# < n cols#, col1#++,
	Insert Into( measure#, yCol#[col1#] << Get Name );
	For( col2# = col1# + 1, col2# <= n cols#, col2#++,
		ct# = dt# << Contingency(
			Y( yCol#[col1#] ),
			X( yCol#[col2#] ),
			Contingency Table( 0 ),
			Mosaic Plot( 0 ),
			Tests( 1 ),
			Agreement Statistic( 1 ),
			Invisible
		);
		ctr# = ct# << Report;
		r sqr u#[col1#,col2#] = r sqr u#[col2#,col1#] = ctr#["Tests"][TableBox(1)][NumberColBox(4)][1];
		agree#[col1#,col2#]   = agree#[col2#,col1#]   = ctr#["Kappa Coefficient"][TableBox(1)][NumberColBox(1)][1];
		ct# << Close Window;
	);
);
// the outer loop stops one short of the last column, so add its name here.
Insert Into( measure#, yCol#[col1#] << Get Name );

New Window( "Binary Agreement",
	Outline Box( "RSquare (U)",
		tb1 = Table Box(
			String Col Box( "Measure", measure# )
		)
	),
	Outline Box( "Kappa Coefficient",
		tb2 = Table Box(
			String Col Box( "Measure", measure# )
		)
	)
);

For( col# = 1, col# <= n cols#, col#++,
	tb1 << Append(
		Number Col Box( measure#[col#], r sqr u#[0,col#], << Set Format( 7, 4 ) )
	);
	tb2 << Append(
		Number Col Box( measure#[col#], agree#[0,col#], << Set Format( 7, 4 ) )
	);
);
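
As a cross-check on a single pair of columns, kappa can also be computed by hand from the 2x2 cell counts. This is only a sketch, assuming the Kappa Coefficient in the report is Cohen's kappa (observed agreement corrected for chance agreement); the counts and variable names are hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical 2x2 cell counts (made up for illustration).
// nXY = number of rows where the first column is X and the second is Y.
n11 = 30;
n10 = 10;
n01 = 5;
n00 = 55;
n = n11 + n10 + n01 + n00;

// observed agreement: proportion of rows where both columns match.
po = (n11 + n00) / n;

// agreement expected by chance, from the marginal totals.
pe = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / n ^ 2;

// Cohen's kappa: agreement beyond chance, scaled by its maximum.
kappa = (po - pe) / (1 - pe);

Show( po, pe, kappa );
```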

natalie_ · Dec 13, 2016 09:57 AM

Okay, so I found where to find the Kappa Coefficient; it's under the "Agreement Statistic" table. The "Agreement Statistic" table only appears when both X and Y variables have the same levels. What does that mean? Is there a way to check for this?
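
One reading of "the same levels" is that both columns contain exactly the same set of distinct values, for example both contain 0 and 1. A column in which every row is 1 has only one level, so it would not match a column that contains both 0 and 1. A minimal JSL sketch of such a check follows; the table, the column names, and the values are hypothetical, and it does not consider missing values.

```jsl
Names Default To Here( 1 );

// Hypothetical example table; names and values are made up.
dt = New Table( "Levels Example",
	New Column( "Test A", Numeric, Nominal, Set Values( [0, 1, 1, 0, 1] ) ),
	New Column( "Test B", Numeric, Nominal, Set Values( [1, 1, 1, 1, 1] ) )
);

// The keys of an associative array built from a column's values
// are that column's distinct values, returned in sorted order.
levels a = Associative Array( Column( dt, "Test A" ) << Get Values ) << Get Keys;
levels b = Associative Array( Column( dt, "Test B" ) << Get Values ) << Get Keys;

// Same levels means the same number of levels and matching values.
same levels = N Items( levels a ) == N Items( levels b );
If( same levels,
	For( i = 1, i <= N Items( levels a ), i++,
		If( levels a[i] != levels b[i], same levels = 0 )
	)
);

Show( levels a, levels b, same levels );
```

In this made-up table "Test B" has only the level 1, so the check reports that the two columns do not share the same levels.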

Mark_Bailey · Dec 13, 2016 09:01 PM
