philc86
Level I

Differences in parameter estimates using same multiple regression analysis.

Dear All,

 

I have used multiple regression in JMP to assess the effect of age at entry (AAE) and smoking status on a continuous measure of mitochondrial function. Smoking status is coded 1 or 0 (yes, no).

 

I entered them all into Fit Model (Y = mitochondrial parameter; Construct Model Effects = AAE and smoking status) and ran a standard least squares model. I noticed that the output gave different parameter estimates from an identical analysis of the same data in SPSS.

 

I then went back to JMP and repeated the analysis using the Generalized Regression option, with standard least squares selected as the estimation method. This output differed from both the original least squares analysis (a different constant and different parameter estimates) and the SPSS model. The overall R2, adjusted R2, etc. were all the same.

 

I believe that if I entered an age and smoking status into any of the three models, each would give me the same estimate of the mitochondrial measure corrected for age and smoking status. Even so, I want to understand why the parameter estimates and constants differ across all three.

 

I have attached screenshots of the model outputs below.  

Accepted Solution
Victor_G
Super User

Re: Differences in parameter estimates using same multiple regression analysis.

Hi @philc86,

 

Note that in Generalized Regression, you can still display Parameter Estimates for Centered and Scaled Predictors, so you can have parameter estimates with centering (and scaling).
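As a rough illustration of why centering and scaling changes the reported estimates but not the fitted values, here is a minimal sketch in Python with NumPy (not JMP). The data are simulated, and scaling by the sample standard deviation is an assumption made for the example; JMP's own scaling convention for this report may differ.

```python
import numpy as np

# Simulated, hypothetical data (not the poster's data set).
rng = np.random.default_rng(1)
n = 100
age = rng.uniform(40, 80, n)
y = 30.0 - 0.2 * age + rng.normal(0, 1.5, n)

def ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Fit on the raw predictor.
b_raw = ols(np.column_stack([ones, age]), y)

# Fit on the centered and scaled predictor.
# Assumption: scale by the sample standard deviation.
age_cs = (age - age.mean()) / age.std(ddof=1)
b_cs = ols(np.column_stack([ones, age_cs]), y)

# The coefficients differ, but the fitted values do not.
pred_raw = b_raw[0] + b_raw[1] * age
pred_cs = b_cs[0] + b_cs[1] * age_cs

print("raw estimates:            ", b_raw)
print("centered/scaled estimates:", b_cs)
print("same predictions:", np.allclose(pred_raw, pred_cs))
```

With a centered predictor the intercept becomes the mean response, and the slope is simply rescaled by the scaling factor, so the two parameterizations describe the same fitted line.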

 

There is no best option. The choice has no impact on prediction; it depends on how you think about the variation of your response with respect to your continuous variable (age) and your nominal factor (smoking status), and on how you want to use and interpret the results:

  • In Generalized Regression, the estimate for "smoking status[No-Yes]" is an estimate of the difference between the mean response at that level [No] and the mean response at the last level [Yes] (see Launch the Generalized Regression Personality). So it directly indicates the magnitude and sign of the change between these two nominal levels.
  • In Fit Model, the estimate for "smoking status[Yes]" is an estimate of the difference between the mean response at that level [Yes] and the average response across both levels [Yes and No]. So you compare the influence of this factor level to the average response for both levels, which might not be useful in this case (you won't have an individual with no smoking status, or with both). With a nominal/ordinal factor at 2 levels, you can still estimate the change between the two levels by taking the difference between the estimates of the two levels from the Fit Model report. For example, to calculate the change in response from a smoker to a non-smoker, subtract the estimate for level Yes from the estimate for level No (that is, add the negated Yes estimate to the No estimate). You will find the same value as the estimate from the Generalized Regression platform.
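The two coding schemes described in the bullets above can be sketched outside JMP. The following Python/NumPy example uses simulated data and hand-built design matrices; the variable names are hypothetical, and the codings are my reading of the descriptions in this thread (reference-level coding with Yes as the last level, effect coding with Yes = +1 and No = -1, and a plain 0/1 dummy as in the SPSS setup), not output from JMP or SPSS.

```python
import numpy as np

# Simulated, hypothetical data (not the poster's data set).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(40, 80, n)
smoker = rng.integers(0, 2, n)  # 1 = Yes, 0 = No
y = 50.0 - 0.3 * age - 4.0 * smoker + rng.normal(0, 2, n)

def ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Reference-level coding, Yes as the last (reference) level:
# the column is 1 for No and 0 for Yes, so its coefficient is No minus Yes.
ind_no = (smoker == 0).astype(float)
b_ind = ols(np.column_stack([ones, age, ind_no]), y)

# Effect coding: +1 for Yes, -1 for No, so the coefficient is
# Yes minus the average of the two levels.
eff_yes = np.where(smoker == 1, 1.0, -1.0)
b_eff = ols(np.column_stack([ones, age, eff_yes]), y)

# Plain 0/1 dummy entered as a numeric predictor (Yes = 1).
b_dummy = ols(np.column_stack([ones, age, smoker.astype(float)]), y)

# Predicted response for a hypothetical 60-year-old smoker under each coding.
p_ind = b_ind[0] + b_ind[1] * 60 + b_ind[2] * 0
p_eff = b_eff[0] + b_eff[1] * 60 + b_eff[2] * 1
p_dummy = b_dummy[0] + b_dummy[1] * 60 + b_dummy[2] * 1

print("reference-level coding:", b_ind)
print("effect coding:         ", b_eff)
print("0/1 dummy:             ", b_dummy)
print("predictions:", p_ind, p_eff, p_dummy)
```

In this sketch the age slope is identical in all three fits, the reference-level coefficient equals minus twice the effect-coded coefficient, and the three predictions agree. Only the intercept and the smoking coefficient are reported on different scales.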

 
You can follow a discussion on this topic, including a comparison of nominal/ordinal coding between the platforms, here:

https://community.jmp.com/t5/Discussions/Random-effect-test/m-p/659523/highlight/true#M84878

Whichever model/platform you use, you will be able to predict the values for a given age and smoking status correctly, provided your model is adequate for the task, sufficiently precise, and built on a representative sample.

 

Hope this will help you clarify the situation,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)


6 REPLIES
Victor_G
Super User

Re: Differences in parameter estimates using same multiple regression analysis.

Hi @philc86,

 

You're comparing parameter estimates obtained in different ways.

Your SPSS output shows that the estimates are "unstandardized". If you center and standardize your original variables, you'll get the same output as in the Fit Model platform.

 

I hope this will help you,

Victor GUILLER
philc86
Level I

Re: Differences in parameter estimates using same multiple regression analysis.

Hi Victor_G,

 

Thank you very much for such a comprehensive response. Given what you have said about centering, am I correct in thinking it is "best" to use the Fit Model approach rather than the Generalized Regression approach? My main goal is to use the regression line to predict values for given ages and smoking statuses.

 

Many thanks,

 

Phil

statman
Super User

Re: Differences in parameter estimates using same multiple regression analysis.

Regardless of how you analyzed the data "in hand", whether the model you get provides a reasonable prediction based on age and smoking depends completely on how representative your initial study was of future "conditions". All of your analysis statistics (e.g., p-values, R-squared, RMSE, etc.) are enumerative. They describe the data in hand, that is all.

"All models are wrong, some are useful" G.E.P. Box
dlehman1
Level V

Re: Differences in parameter estimates using same multiple regression analysis.

Victor_G has provided a more thorough answer, but I want to add one point. The JMP and SPSS least squares outputs you have provided are equivalent. The smoking status variable is simply coded differently in the two: in JMP, the coefficient shown is for the displayed level (0) compared with the average of the two levels. So the difference between smoking and not smoking is double the size of that coefficient, which matches exactly what SPSS shows. There really isn't any difference between those two outputs.
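The factor of two can be written out as a short derivation. This is a sketch under stated assumptions: the symbols x, z, b and c are introduced here for illustration, and it assumes JMP codes the displayed level (0) as +1 and the other level as -1, with SPSS using the 0/1 variable as entered.

```latex
\begin{align*}
\text{JMP (effect coding):}\quad \hat{y} &= b_0 + b_1\,\mathrm{AAE} + b_2\,x,
  \qquad x = +1 \text{ for status } 0,\; x = -1 \text{ for status } 1 \\
\text{SPSS (0/1 coding):}\quad \hat{y} &= c_0 + c_1\,\mathrm{AAE} + c_2\,z,
  \qquad z \in \{0, 1\} \\
\text{Since } z = \tfrac{1 - x}{2}:\quad
\hat{y} &= \Bigl(c_0 + \tfrac{c_2}{2}\Bigr) + c_1\,\mathrm{AAE} - \tfrac{c_2}{2}\,x \\
\Rightarrow\quad b_1 &= c_1, \qquad b_2 = -\tfrac{c_2}{2}, \qquad b_0 = c_0 + \tfrac{c_2}{2}
\end{align*}
```

So the SPSS coefficient for smoking is twice the JMP coefficient in magnitude (with the sign flipped under this assumed orientation), the age slope is unchanged, and the intercepts differ by exactly the JMP smoking coefficient.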

philc86
Level I

Re: Differences in parameter estimates using same multiple regression analysis.

Hi, 

Yes, I have just noticed this. Many thanks!