Solved: Re: How to do Model Selection from Contingency Analysis

TCM · Jun 10, 2023 1:40 PM

My domain is Consumer Research. I have 7 explanatory variables and 1 response variable. All are categorical.

My objective is to replace the response variable with one or a combination of the explanatory variables.

My approach:

Perform contingency analyses of each of the explanatory variables with the response.
From the contingency analysis, select the explanatory variable with the highest R-square (U). Because the explanatory variables have different numbers of levels, I don’t think the Likelihood Ratio or Pearson chi-square values would be helpful for model selection.
From the Measures of Association table, use the Lambda and Uncertainty coefficient values. Choose the explanatory variable with the highest values.
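
For readers who want to check these measures outside JMP, here is a minimal Python sketch of the uncertainty coefficient and Goodman-Kruskal lambda for a two-way table. It assumes rows are the explanatory variable and columns are the response, so both functions return the "C|R" direction. The function names and the example counts are made up for illustration and do not come from the data in this post.

```python
import numpy as np

def uncertainty_coefficient(table):
    """U(column | row): share of the column variable's entropy explained by the row variable."""
    counts = np.asarray(table, dtype=float)
    p = counts / counts.sum()            # joint proportions
    p_row = p.sum(axis=1)                # marginal of the row (explanatory) variable
    p_col = p.sum(axis=0)                # marginal of the column (response) variable
    nonzero_col = p_col > 0
    entropy_col = -np.sum(p_col[nonzero_col] * np.log(p_col[nonzero_col]))
    expected = np.outer(p_row, p_col)    # joint proportions under independence
    nonzero = p > 0
    mutual_info = np.sum(p[nonzero] * np.log(p[nonzero] / expected[nonzero]))
    return mutual_info / entropy_col

def goodman_kruskal_lambda(table):
    """Lambda(column | row): proportional reduction in error when predicting the column from the row."""
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    correct_without_row = counts.sum(axis=0).max()  # always guess the modal column
    correct_with_row = counts.max(axis=1).sum()     # guess the modal column within each row
    return (correct_with_row - correct_without_row) / (n - correct_without_row)

# Hypothetical 3 x 3 counts, for illustration only.
example_counts = [[30, 10, 5],
                  [10, 30, 10],
                  [5, 10, 30]]
print(uncertainty_coefficient(example_counts))
print(goodman_kruskal_lambda(example_counts))
```

Both measures are 0 under independence and 1 when the row level fully determines the column level. Neither one penalizes an explanatory variable for having many levels, which is worth keeping in mind when comparing a 40-level variable with a 6-level one.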

Below is a summary describing the variables and results of contingency analysis.

Variable	# levels	R-square (U)	LR chi-square	Pearson chi-square	Lambda asymmetric (C|R, R|C)	Lambda symmetric	Uncertainty coef. (C|R, R|C)	Uncertainty coef. (symmetric)
A	4	.08	248	268	.08, .13	.1	.08, .08	.08
B	3	.02	76	75	.06, .02	.04	.02, .03	.03
C	10	.04	136	153	.05, .04	.045	.04, .03	.034
D	10	.31	961	1207	.33, .13	.22	.3, .2	.245
E	18	.34	1056	1498	.34, .1	.21	.34, .19	.24
F	40	.345	1084	1590	.34, .05	.17	.35, .15	.21
G	6	.32	1000	1286	.35, .28	.31	.32, .26	.29

The response variable has 6 levels.

The levels of variables D, E, F, and G are ordered by ascending intensity. It is assumed the low and high boundaries are similar.

Questions:

Is the approach as outlined valid?
Can I combine one of the variables D, E, F, G (I am inclined to select G) with one or more of A, B, C to get a better model (i.e., a better replacement for the response)? If so, how might one do this, and what metrics might be used to select the best model?

Mark_Bailey · Nov 12, 2020 08:00 AM

The two-way contingency table analysis is valid in its own right, but it is not sufficient for your purpose. Logistic regression using a linear predictor that combines all the variables will satisfy your need better.

Start here in the JMP on-line documentation to learn what logistic regression is, how to set up your data, how to launch the analysis platform, and the results that are available to answer your questions.
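
Outside JMP, the same idea can be sketched in Python: fit a nominal (multinomial) logistic model on one-hot encoded predictors, then compare candidate models by an entropy-based R-square and AIC. This is only an illustration under stated assumptions, not the JMP procedure. The helper `fit_nominal_logistic`, the column names, and the synthetic data are all hypothetical, and the demo response has 3 levels rather than the 6 in the question. scikit-learn applies an L2 penalty by default, so the sketch sets a very large `C` to approximate plain maximum likelihood.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_nominal_logistic(df, predictors, response):
    """Fit a multinomial logistic model and return simple fit summaries."""
    # One-hot encode the categorical predictors, dropping one reference level each.
    X = pd.get_dummies(df[predictors], drop_first=True).astype(float)
    y = df[response]
    # A very large C makes the default L2 penalty negligible (approximate ML fit).
    model = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
    probs = model.predict_proba(X)
    # Column position of each observed response level in model.classes_.
    position = pd.Series(range(len(model.classes_)), index=model.classes_)
    observed = probs[np.arange(len(y)), y.map(position).to_numpy()]
    neg_loglik = -np.log(observed).sum()
    # Intercept-only model: predicted probabilities are the marginal proportions.
    marginal = y.value_counts(normalize=True)
    neg_loglik_null = -np.log(y.map(marginal).to_numpy()).sum()
    n_params = (len(model.classes_) - 1) * (X.shape[1] + 1)
    return {
        "rsq_u": 1 - neg_loglik / neg_loglik_null,  # entropy-based R-square
        "aic": 2 * n_params + 2 * neg_loglik,
        "n_params": n_params,
    }

# Synthetic data, made up for illustration: the response copies G about 70%
# of the time and is random otherwise; A is unrelated to the response.
rng = np.random.default_rng(0)
n = 600
levels = ["low", "mid", "high"]
g = rng.choice(levels, size=n)
a = rng.choice(["x", "y"], size=n)
response = np.where(rng.random(n) < 0.7, g, rng.choice(levels, size=n))
demo = pd.DataFrame({"G": g, "A": a, "response": response})

candidates = {"G": ["G"], "A": ["A"], "G + A": ["G", "A"]}
results = {name: fit_nominal_logistic(demo, cols, "response")
           for name, cols in candidates.items()}
for name, summary in results.items():
    print(name, round(summary["rsq_u"], 3), round(summary["aic"], 1))
```

Adding a predictor can never lower the in-sample R-square of a nested model, so a criterion that charges for extra parameters, such as AIC, or a holdout set is a more reasonable basis for deciding whether a second variable earns its place.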

TCM · Nov 12, 2020 11:15 AM

Thank you, Mark!
I have used Logistic Regression in the past but only with binary responses (e.g., Stable/Unstable). This instance would be a great learning opportunity. Will get right to it!