Discussions

philc86 · Sep 3, 2024 04:52 AM

Dear All,

I have used JMP to assess the effect of age at entry (AAE) and smoking status (coded 1 or 0 for yes/no) on a continuous measure of mitochondrial function, using multiple regression.

I entered them all into Fit Model (Y = mitochondrial parameter; Construct Model Effects = AAE and smoking status) and ran a standard least squares model. I noticed that this output gave different parameter estimates from an identical analysis of the same data in SPSS.

I then went back to JMP and repeated the analysis using the Generalized Regression option, with standard least squares selected as my estimation method. This output differed from both the original least squares analysis (a different constant and different parameter estimates) and the SPSS model. The overall R2, adjusted R2, etc. were all the same.

Whilst I believe that entering an age and smoking status into any of the three models would give me the same estimate for the mitochondrial measure corrected for age and smoking status, I want to understand why the parameter estimates and constants differ across all three.

I have attached screenshots of the model outputs below.


Victor_G · Sep 3, 2024 05:39 AM

Hi @philc86,

You're comparing parameter estimates obtained in different ways.

Continuous factors are centered by their mean in the Fit Model platform, unless you have specific column properties like Mixture or Coding: Continuous Factors (jmp.com). Centering factors helps make sense of the regression equation and helps reduce multicollinearity when the model contains interaction or polynomial terms.
More information: gurnsey_onlineapp17.pdf (sagepub.com)
If you center your variables "manually" in your data table, using the "Center" formula column, the estimates from the Fit Model and Generalized Regression platforms will be the same (but not the intercept).

The predictors are not centered or scaled when using the Generalized Regression platform: Launch the Generalized Regression Personality (jmp.com)
The Fit Model and Generalized Regression platforms do not handle and code categorical factors in the same way; see https://community.jmp.com/t5/Discussions/Random-effect-test/m-p/659245/highlight/true#M84852
Nominal Factors (jmp.com)
Statistical Details for Nominal Effects Coding (jmp.com)

Your SPSS output shows that estimates are "unstandardized". If you center and standardize your original variables, you'll have the same outputs as in the Fit Model platform.
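
To illustrate the centering point above, here is a minimal sketch in Python with NumPy (not JMP or SPSS output). The data are synthetic and the names `age`, `smoker` and `ols` are made up for the example. It only shows the general least-squares behaviour: centering a continuous predictor shifts the intercept but leaves the slopes and the fitted values unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, not the poster's data set).
n = 200
age = rng.uniform(40, 80, size=n)
smoker = rng.integers(0, 2, size=n).astype(float)  # 1 = yes, 0 = no
y = 50.0 - 0.3 * age - 4.0 * smoker + rng.normal(0, 2, size=n)


def ols(*columns):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones(n), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


# Same model twice: raw age versus mean-centered age.
beta_raw, fit_raw = ols(age, smoker)
beta_ctr, fit_ctr = ols(age - age.mean(), smoker)

print("raw age      [intercept, age, smoker]:", beta_raw)
print("centered age [intercept, age, smoker]:", beta_ctr)
print("same fitted values:", np.allclose(fit_raw, fit_ctr))
```

The centered intercept equals the raw intercept plus the age slope times the mean age, which is why the two reports can show different constants for the same fitted line.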

I hope this will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

philc86 · Sep 3, 2024 09:05 AM

Hi Victor_G,

Thank you very much for such a comprehensive response. Given what you have said about centering, am I correct in thinking it is "best" to use the Fit Model approach rather than the Generalized Regression approach? My main goal is to use the regression line to predict values for given ages and smoking statuses.

Many thanks,

Phil

statman · Sep 3, 2024 10:11 AM

Regardless of how you analyzed the data "in hand", whether the model you get provides a reasonable prediction based on age and smoking depends completely on how representative your initial study was of future "conditions". All of your analysis statistics (e.g., p-values, R-squares, RMSE, etc.) are enumerative. They describe the data in hand, that is all.

"All models are wrong, some are useful" G.E.P. Box

Victor_G · Sep 3, 2024 7:41 AM

Hi @philc86,

Note that in Generalized Regression, you can still display Parameter Estimates for Centered and Scaled Predictors, so you can have parameter estimates with centering (and scaling).

There is no best option: the choice has no impact on prediction. It depends on how you view the variation of your response with respect to your continuous variable (age) and your nominal factor (smoking status), and on how you want to use and interpret the results:

In Generalized Regression, the estimate for "smoking status[No-Yes]" is an estimate of the difference between the mean response at that level [No] and the mean response at the last level [Yes] (see Launch the Generalized Regression Personality). So it directly indicates the magnitude and sign of the change between these two nominal levels.
In Fit Model, the estimate for "smoking status[Yes]" is an estimate of the difference between the mean response at that level [Yes] and the overall average response [Yes+No]. So you compare the influence of this factor level to the average response over both levels, which might not be useful in this case (you won't have an individual with no smoking status, or with both). With a two-level nominal/ordinal factor, you can still get an estimate of the change between the two levels from the estimates of both levels in the Fit Model report; for example, to calculate the change in response from a smoker to a non-smoker, subtract the estimate for level Yes from the estimate for level No. You will find the same value as the estimate from the Generalized Regression platform.
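
The relationship between the two codings can be sketched numerically. The following Python/NumPy example is illustrative only: the data are synthetic, the names are hypothetical, and the two codings are written out by hand on the assumption that they match the descriptions above (effect coding with Yes = +1 and No = -1 for Fit Model, and a No-versus-Yes indicator with Yes as the reference level for Generalized Regression).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (hypothetical, not the poster's data set).
n = 200
age = rng.uniform(40, 80, size=n)
smoker = rng.integers(0, 2, size=n)  # 1 = Yes, 0 = No
y = 50.0 - 0.3 * age - 4.0 * smoker + rng.normal(0, 2, size=n)


def ols(z):
    """Fit y ~ intercept + age + z by least squares; returns the coefficients."""
    X = np.column_stack([np.ones(n), age, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Assumed effect coding (Fit Model style): Yes -> +1, No -> -1.
effect = np.where(smoker == 1, 1.0, -1.0)
# Assumed indicator coding (No versus the last level Yes): No -> 1, Yes -> 0.
indicator = np.where(smoker == 0, 1.0, 0.0)

b_eff = ols(effect)
b_ind = ols(indicator)

est_yes = b_eff[2]       # estimate for level Yes under effect coding
est_no = -est_yes        # the two effect-coded estimates sum to zero
no_minus_yes = est_no - est_yes

print("effect-coded estimate for Yes:", est_yes)
print("No - Yes from effect coding:  ", no_minus_yes)
print("indicator-coded No-Yes:       ", b_ind[2])

# Predictions for a 60-year-old smoker agree under both codings.
pred_eff = b_eff[0] + b_eff[1] * 60 + b_eff[2] * 1.0  # Yes -> +1
pred_ind = b_ind[0] + b_ind[1] * 60 + b_ind[2] * 0.0  # Yes -> 0
print("predictions:", pred_eff, pred_ind)
```

Under these assumptions the No-Yes difference is minus twice the effect-coded estimate for Yes, while the age slope and the predictions are identical in both fits.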

You can follow the discussion on this topic, including a comparison of the platforms and their nominal/ordinal coding, here:

https://community.jmp.com/t5/Discussions/Random-effect-test/m-p/659523/highlight/true#M84878

Whichever model/platform you use, you will be able to predict the values for a given age and smoking status correctly, provided your model is adequate for the task, with sufficient precision and a representative sample.

Hope this will help you clarify the situation,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

dlehman1 · Sep 3, 2024 06:50 AM

Victor_G has provided a more thorough answer, but I want to add one point. The JMP and SPSS least squares outputs are equivalent in what you have provided. The smoking status variable is treated differently in the two: in JMP, the coefficient shown is for the displayed value (0) compared with the average of the two values. So the difference between smoking and not smoking is double the size of that coefficient, which matches exactly what SPSS is showing. There really isn't any difference between those two outputs.
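
The "double the size" relationship can be written out as a short derivation. The symbols b_0, b_1, g and z are introduced here for illustration only, and z = +1 / -1 is assumed to be how JMP codes the two levels in this report.

```latex
% Assumed JMP effect coding: z = +1 when smoking status is 0 (the level shown), z = -1 when it is 1.
\begin{align*}
\hat{y} &= b_0 + b_1\,\mathrm{AAE} + g\,z \\
\hat{y}\big|_{\text{status}=0} - \hat{y}\big|_{\text{status}=1}
  &= (b_0 + b_1\,\mathrm{AAE} + g) - (b_0 + b_1\,\mathrm{AAE} - g) = 2g
\end{align*}
```

With a 0/1 dummy variable, as in the SPSS fit, the smoking coefficient is the status-1 prediction minus the status-0 prediction, that is -2g, and the constant becomes b_0 + g. The fitted line itself is the same in both programs.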

philc86 · Sep 3, 2024 09:01 AM

Hi,

Yes, I have just noticed this. Many thanks!

Differences in parameter estimates using same multiple regression analysis.