KJLM
Level I

Fit Model pValues change when removing statistically insignificant interactions, why?

JMP version 17

I have a question about Fit Model p-values.

When you put interactions into Fit Model and then run the "Stepwise" personality, you can remove extraneous interactions using a p-value threshold. See below:

[Image: Stepwise Regression Control settings]

Here I have the regression set to remove any interaction with a p-value greater than 0.1. However, when I build the model with the "Standard Least Squares" personality and "Effect Leverage" emphasis, I get p-values above 0.1 for the same interactions that I just saw in the Regression Control. Why? See below:

 

[Image: Standard Least Squares fit with Effect Leverage emphasis]
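To illustrate why a p-value read at one step need not match the p-value in a later fit, here is a minimal sketch of plain backward elimination with a p-value threshold (0.1, as above), written in Python with NumPy. It is not JMP's Stepwise implementation; the helper names (`t_two_sided_p`, `ols_t_pvalues`, `backward_eliminate`) and the simulated data are hypothetical. The point is in the loop: after every removal the model is refit and every remaining p-value is recomputed, because each p-value is conditional on which other terms are present. The two-sided Student t p-value is computed from the regularized incomplete beta function so that only NumPy is required.

```python
import math

import numpy as np


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for a Student t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def ols_t_pvalues(X, y):
    """Ordinary least squares fit; returns estimates, t-ratios and p-values."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - k                                   # error degrees of freedom
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    p = np.array([t_two_sided_p(float(ti), df) for ti in t])
    return beta, t, p


def backward_eliminate(X, y, names, prob_to_leave=0.1):
    """Repeatedly drop the term with the largest p-value and refit.

    Column 0 is assumed to be the intercept and is never dropped.
    """
    keep = list(range(X.shape[1]))
    while True:
        _, _, p = ols_t_pvalues(X[:, keep], y)   # p-values for the CURRENT model
        worst = max(range(1, len(keep)), key=lambda i: p[i], default=None)
        if worst is None or p[worst] <= prob_to_leave:
            return [names[j] for j in keep], p
        del keep[worst]                          # remove one term, then refit


# Hypothetical data: only x1 actually affects y.
rng = np.random.default_rng(1)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 3.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
kept, p_final = backward_eliminate(X, y, ["Intercept", "x1", "x2", "x3"])
print(kept, np.round(p_final, 4))
```

The p-values returned belong only to the final set of terms. If a later Standard Least Squares model does not contain exactly the terms that were present when a p-value was read, that p-value no longer applies.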

 

Additionally, when you remove the worst of the interactions (the one with the highest p-value), some of the other interactions get smaller p-values, to the point of becoming statistically significant when they were not before the deletion. I assume the removed interactions do have some effect and that this effect must then be redistributed to the other variables, but I don't understand why it makes such a big difference or what is really going on. See below (yes, matching colors indicate the same interactions in the two "Effect Summary" images).

[Image: Effect Summary reports before and after removing an interaction]
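The "redistribution" described above is easiest to see with two nearly redundant predictors. In this hypothetical simulation (Python/NumPy, made-up data), y depends only on x1, while x2 is almost a copy of x1. With both terms in the model, the fit cannot tell them apart, so the standard error of x1 is inflated and its t-ratio is small; once x2 is dropped, the t-ratio of x1 rises sharply. A larger |t| corresponds to a smaller p-value. The function name `t_ratios` is illustrative only.

```python
import numpy as np


def t_ratios(X, y):
    """OLS t-ratios (estimate / standard error) for each column of X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (n - k)            # error variance uses the residual df
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se


rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # x2 nearly duplicates x1
y = 1.0 + 1.0 * x1 + rng.normal(size=n)      # only x1 truly matters

full = np.column_stack([np.ones(n), x1, x2])
reduced = np.column_stack([np.ones(n), x1])

t_full = t_ratios(full, y)[1]                # x1 with x2 also in the model
t_reduced = t_ratios(reduced, y)[1]          # x1 after dropping x2
print(f"t-ratio of x1 with x2 in the model: {t_full:.2f}")
print(f"t-ratio of x1 after dropping x2:    {t_reduced:.2f}")
```

In an orthogonal designed experiment, removing a term leaves the other estimates unchanged; their p-values then shift only because the removed term's variation and degrees of freedom move into the error estimate. With correlated terms, as in observational data or a non-orthogonal design, the estimates themselves shift as well, which is why the changes can be large.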

 

I am a chemist and have only the most basic knowledge of statistics. So this may be a simple question for an experienced user, but I feel like I am running blind with these models. At what point do you actually know that your model is good? When do you know to remove certain variables? What is the reasoning behind removing certain interactions over others? Is it purely based on the statistics?

 

Any help would be appreciated. Thank you. 

 

Accepted Solution
statman
Super User

Re: Fit Model pValues change when removing statistically insignificant interactions, why?

The simple answer is that every model, and the significance values that come from it, is contingent on the terms in the model. Change the terms in the model and the statistics will likely change.

Now for your situation, I don't have any context (e.g., is this observational data or is this from a designed experiment?). If you are using observational data (not from a sampling plan), then there may be other issues. Have you tested for multicollinearity? (If you right-click on the Parameter Estimates table, you can select VIF to get a look at this issue.)

My advice for model building is to start with subject matter expertise (SME). What are the hypotheses that support terms being in the model? Design sampling plans/DOE to get insight into those hypotheses. Plan on iterating. I never rely solely on one statistic (e.g., p-values) to determine appropriate model effects. You need to assess multiple elements of the model (e.g., the difference between R-square and adjusted R-square, RMSE, residuals) and never turn off engineering/science.

"All models are wrong, some are useful" G.E.P. Box
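As a rough illustration of the checks mentioned in the reply, the sketch below computes variance inflation factors, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors, together with R-square, adjusted R-square, their difference, and RMSE. It is written in Python with NumPy on simulated data; it is not JMP code, and the function names (`r_squared`, `vif`, `fit_summary`) are hypothetical, although the formulas are the standard ones.

```python
import numpy as np


def r_squared(X, y):
    """R-square and SSE of an OLS fit of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst, sse


def vif(X):
    """VIF for each predictor; column 0 is assumed to be the intercept and skipped."""
    out = []
    for j in range(1, X.shape[1]):
        others = np.delete(X, j, axis=1)          # intercept plus the other predictors
        r2_j, _ = r_squared(others, X[:, j])
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)


def fit_summary(X, y):
    """R-square, adjusted R-square, their difference, and RMSE."""
    n, k = X.shape                                # k counts the intercept
    r2, sse = r_squared(X, y)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    rmse = np.sqrt(sse / (n - k))                 # uses the error degrees of freedom
    return {"rsq": r2, "rsq_adj": adj, "delta": r2 - adj, "rmse": rmse}


rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)        # strongly correlated with x1
x3 = rng.normal(size=n)                           # unrelated to x1 and x2
y = 1.0 + x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2, x3])
X_junk = np.column_stack([X, rng.normal(size=(n, 5))])   # add five pure-noise terms

print("VIF (x1, x2, x3):", np.round(vif(X), 1))
print("base model:      ", fit_summary(X, y))
print("with 5 junk terms:", fit_summary(X_junk, y))
```

Large VIFs flag terms whose estimates and p-values will be sensitive to which correlated terms are also in the model (a common rule of thumb treats a VIF above 5 or 10 as a concern). A widening gap between R-square and adjusted R-square suggests terms that add little beyond noise.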
