Solved: Logistic regression: adjust numerical variables by subtracting sample mean before coefficients calculation?

noskaga · Dec 8, 2014 05:04 AM

I intend to develop a survival prediction model for outcome following major trauma.

Dependent variable: survival vs. death 30 days after injury. Independent variables: age (numeric), anatomical injury (numeric), physiological derangement on admission (numeric), and pre-injury comorbidity (categorical).

In a recent publication I have read the following statement: "For numerical variables, the statistical package Stata adjusts the variables by subtracting the sample mean before model coefficients are calculated".

I have performed the same logistic regression (same data set) in both JMP 11 and Stata and obtained the same odds ratios and the same coefficients, exactly as expected.

"...subtracting the sample mean before model coefficients are calculated": is that something that happens in both JMP and Stata (seeing that I get the same logistic regression results in both packages), regardless of whether the user knows about such adjustments? Can anyone help me understand the statement above? Do I have to take any extra step in JMP to adhere to this, or should I use the regression coefficients for numerical variables as presented in the output when presenting my new model?

I would appreciate it if anyone could help me understand.

Best wishes from noskaga

julian · Dec 8, 2014 05:08 PM

Hi noskaga,

The procedure you're describing, centering a variable, is the default option in JMP when you fit powers of variables or interactions between them. This option, called "Centering Polynomials" (found in the Fit Model dialog under the top red triangle), subtracts each variable's mean before forming powers or cross-products with other variables. As a result, the lower-order terms are less confounded with the higher-order terms and keep an easy interpretation: the "average" effect of a variable, assuming the other variables are held constant at their means.
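To make the interpretation point concrete, here is a short worked expansion for two predictors with one interaction. It assumes the centering is applied only inside the cross-product, and the symbols (beta for the centered fit, gamma for the uncentered fit, m_1 and m_2 for the sample means) are my own notation rather than anything from the JMP output.

```latex
\begin{aligned}
\operatorname{logit}(p)
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 - m_1)(x_2 - m_2) \\
  &= \underbrace{(\beta_0 + \beta_3 m_1 m_2)}_{\gamma_0}
   + \underbrace{(\beta_1 - \beta_3 m_2)}_{\gamma_1} x_1
   + \underbrace{(\beta_2 - \beta_3 m_1)}_{\gamma_2} x_2
   + \underbrace{\beta_3}_{\gamma_3} x_1 x_2
\end{aligned}
```

Both forms give the same fitted probabilities and the same interaction coefficient. The difference is in the lower-order terms: beta_1 is the effect of x_1 when x_2 is at its mean m_2, while gamma_1 is the effect of x_1 when x_2 equals 0, which is often outside the range of the data.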

For a model with a single variable, or one with no interactions or powers, centering has no effect on the coefficients except for the intercept. Odds ratios are thus unaffected. This should square with intuition: shifting a predictor by its mean does nothing to change the relationship that predictor has with the response; it only changes the "level" of the response at 0, which is the intercept.
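If you want to convince yourself numerically, here is a minimal sketch in plain Python, not JMP or Stata code. It fits a one-predictor logistic regression by Newton-Raphson on simulated data, once with raw age and once with centered age. The data, the true coefficients, and the function name `fit_logistic` are all made up for illustration.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated (hypothetical) data: age in years, outcome 1 = death, 0 = survival
random.seed(0)
age = [random.gauss(50, 15) for _ in range(500)]
death = [1 if random.random() < 1 / (1 + math.exp(-(-3 + 0.05 * a))) else 0
         for a in age]

mean_age = sum(age) / len(age)
age_centered = [a - mean_age for a in age]

b0_raw, b1_raw = fit_logistic(age, death)
b0_c, b1_c = fit_logistic(age_centered, death)

print("slope, raw age:     ", b1_raw)
print("slope, centered age:", b1_c)
print("odds ratio per year:", math.exp(b1_raw), math.exp(b1_c))
print("intercepts:         ", b0_raw, b0_c)
```

The two slopes (and so the odds ratios) should agree up to rounding, while the centered intercept should equal the raw intercept plus the slope times the mean age.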

To manually center numeric variables you can use Analyze > Distribution, specify one or more numeric variables, and hit OK. Then select the Red Triangle (next to the variable name) > Save > Centered. If you have many variables, hold down the Control key (or Command if you're on a Mac), click the Red Triangle (for any of the variables) > Save > Centered, then let go. This broadcasts the command, so you save centered versions of every column you have in Distribution.
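Outside JMP, the same operation is just "subtract the column mean and save the result as a new column". A minimal sketch in plain Python follows. The table values, the column names, and the helper `center_columns` are hypothetical, and the " centered" suffix is my own naming, not necessarily what JMP calls the saved column.

```python
from statistics import fmean

def center_columns(table, names):
    """Return a copy of `table` with a centered copy of each named column added."""
    out = dict(table)
    for name in names:
        mean = fmean(table[name])
        # New column holds each value minus that column's sample mean
        out[f"{name} centered"] = [v - mean for v in table[name]]
    return out

# Hypothetical data table with two numeric columns
trauma = {
    "age": [23.0, 41.0, 67.0, 35.0, 58.0],
    "injury_score": [9.0, 25.0, 34.0, 17.0, 29.0],
}
centered = center_columns(trauma, ["age", "injury_score"])
print(centered["age centered"])
```

The original columns are left untouched, so you can fit the model with either version and compare.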

I hope this helps!

Julian

noskaga · Dec 9, 2014 04:24 AM

Hello Julian,

Thank you for your answer, it really helped me understand the expression "centering of a variable" and when it is used by default in JMP.

Nils Oddvar Skaga

Logistic regression: adjust numerical variables by subtracting sample mean before coefficients calculation?
