ShriHanuman
Level I

Mixed Stepwise Regression: P-Value Stopping Rules

Hi,

I am trying to perform Mixed Stepwise Regression:

Select Analyze > Fit Model.

Select the response column and click Y, Response.

Select the predictor columns and click Add, or use one of the Macros if you want to include higher-order effects.

Click the button next to Personality and select Stepwise.

Click Run.

Click Minimum BIC and select P-Value Threshold.

Click Forward and select Mixed.

Click Go.

How should I decide on the p-values to enter and to leave if I don't want to over-fit my model?

Accepted Solution

Re: Mixed Step Wise regression: P value

The stopping rules based on p-values were the first implementation of the Stepwise platform. The default values to enter and to leave are good in the absence of any prior knowledge about the data or the best model. (But then, if you knew the answer, you wouldn't need stepwise regression to search for the best fit.)
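
The p-value stopping rules described above can be sketched in plain Python to show what the Mixed direction does at each step. A forward step enters the most significant candidate if its p-value is below the entry threshold. A backward step then removes the least significant term if its p-value exceeds the leave threshold. The search stops when neither step changes the model. This is a minimal sketch, not JMP's implementation. The function names and the thresholds `p_enter=0.05` and `p_leave=0.10` are hypothetical. The p-values use a normal approximation instead of exact t or F tests, and the data are simulated.

```python
import math
import random


def invert(a):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the model is not estimable")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def coef_pvalues(X, y):
    """Least-squares fit of y on X; two-sided p-value for each coefficient.

    Assumption: a normal approximation replaces the t distribution, so the
    p-values are slightly too small when residual degrees of freedom are few.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2 for row, yi in zip(X, y))
    s2 = sse / (n - k)  # residual mean square
    # Two-sided p = erfc(|z| / sqrt(2)) with z = b / sqrt(s2 * inv_ii)
    return [math.erfc(abs(b[i]) / math.sqrt(2 * s2 * inv[i][i])) for i in range(k)]


def mixed_stepwise(columns, y, p_enter=0.05, p_leave=0.10, max_steps=100):
    """Hypothetical mixed stepwise selection with p-value stopping rules.

    columns: dict mapping candidate term name -> list of values.
    The intercept is always kept in the model.
    """
    if p_enter > p_leave:
        raise ValueError("use p_enter <= p_leave to avoid cycling")
    selected = []

    def pvalues(terms):
        X = [[1.0] + [columns[t][r] for t in terms] for r in range(len(y))]
        return dict(zip(["Intercept"] + terms, coef_pvalues(X, y)))

    for _ in range(max_steps):
        changed = False
        # Forward step: enter the most significant candidate if p < p_enter.
        trial = {t: pvalues(selected + [t])[t] for t in columns if t not in selected}
        if trial:
            best = min(trial, key=trial.get)
            if trial[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: remove the least significant term if p > p_leave.
        if selected:
            current = pvalues(selected)
            worst = max(selected, key=current.get)
            if current[worst] > p_leave:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Simulated data: x1 and x2 affect y, x3 is pure noise.
rng = random.Random(1)
n = 40
cols = {name: [rng.uniform(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 + 2 * a - 1.5 * b + rng.gauss(0, 0.3) for a, b in zip(cols["x1"], cols["x2"])]
print(mixed_stepwise(cols, y))
```

Requiring `p_enter <= p_leave` is a design choice of this sketch: a term that has just entered has a p-value below the leave threshold, so it cannot be removed on the very next step, and a term that has just left cannot immediately re-enter. Lowering both thresholds makes the search more conservative (fewer terms), which is one lever against over-fitting, but no threshold guarantees a good model.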

Stepwise does not guarantee that you won't under-fit or over-fit your data. It is a 'productivity' tool that saves you time examining inferior fits and gets you close to, if not at, the best fit.

Modern criteria include AICc and BIC. Both criteria attempt to minimize bias (under-fitting) while also minimizing variance (over-fitting). They do not require absolute thresholds like the p-value stopping rules. I recommend that you use one of these stopping rules.
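
For reference, the textbook forms of the two criteria for a least-squares fit with Gaussian errors are AICc = -2 log L + 2k + 2k(k + 1)/(n - k - 1) and BIC = -2 log L + k ln(n), where k is the number of estimated parameters. The sketch below computes them from the error sum of squares so you can see how each one penalizes an extra term. The function name and the numbers are hypothetical. JMP's reported values may differ from these by a constant or in how k is counted, so compare values only within one tool.

```python
import math


def aicc_bic(sse, n, k):
    """AICc and BIC for a least-squares fit assuming Gaussian errors.

    sse: error sum of squares, n: observations, k: estimated parameters.
    Uses the maximum-likelihood variance sse / n in the log-likelihood.
    """
    if sse <= 0 or n - k - 1 <= 0:
        raise ValueError("need sse > 0 and n - k - 1 > 0")
    neg2_loglik = n * math.log(2 * math.pi) + n * math.log(sse / n) + n
    aic = neg2_loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = neg2_loglik + k * math.log(n)          # penalty grows with log(n)
    return aicc, bic


# Hypothetical comparison with n = 30: adding one term (k from 4 to 5)
# must reduce SSE enough to pay for its penalty.
base = aicc_bic(sse=10.0, n=30, k=4)
small_gain = aicc_bic(sse=9.8, n=30, k=5)  # higher AICc and BIC: term not worth it
large_gain = aicc_bic(sse=7.0, n=30, k=5)  # lower AICc and BIC: term earns its place
print(base, small_gain, large_gain)
```

Neither criterion needs a threshold: the model with the smaller value wins. At n = 30, BIC charges ln(30), about 3.4, per term, which is more than the 2 per term in AIC, so BIC tends to select smaller models.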

The history allows you to examine key information about the different models encountered along the search path. You can select any of them using the radio buttons on the right side of this list. You can then return to the top and click Make Model (for the Fit Model dialog) or Run Model (for the Fit Least Squares platform), where you can make fine adjustments within the Effect Summary table.

lazzybug
Level III

Re: Mixed Step Wise regression: P value

Hi Mark_Bailey,

Thank you so much for your kind help in answering most of our questions.

Could you please answer one of my questions related to stepwise regression?

I have a custom design with 7 continuous factors and the full response surface model. The design has 48 runs in 2 blocks. After the first block was finished, I used the Stepwise platform to fit my data. Using the P-Value Threshold and Minimum BIC stopping rules, I got similar models. Both models gave me p-value < 0.05, R^2 = 0.999998, and RSquare Adj = 0.99996. I think R^2 = 1 means the model is overfit, while p-value < 0.05 means the model is significant. How can I interpret this model? Is it good enough that I do not need to run block two?
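
The design arithmetic in the question can be checked quickly. A full response-surface (quadratic) model in k continuous factors has an intercept, k main effects, C(k, 2) two-factor interactions, and k squared terms. The snippet below is a small sketch with a hypothetical helper name. The block size is an assumption, since the post only says 48 runs in 2 blocks.

```python
from math import comb


def full_rsm_terms(k):
    """Terms in a full quadratic response-surface model with k continuous factors:
    intercept + k main effects + C(k, 2) two-factor interactions + k squared terms."""
    return 1 + k + comb(k, 2) + k


terms = full_rsm_terms(7)     # 1 + 7 + 21 + 7 = 36
runs_block_one = 48 // 2      # assumes the two blocks are the same size
print(terms, runs_block_one)  # 36 24
```

If the blocks are equal, block one has 24 runs for a 36-term full model, so the full model is not estimable from block one alone and Stepwise must choose a subset. As the number of terms in the chosen model approaches the number of runs, R^2 is driven toward 1 regardless of how well the model predicts new runs (with as many terms as runs, R^2 is exactly 1), so the ratio of terms to runs is worth checking alongside R^2.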

Re: Mixed Step Wise regression: P value

First of all, understand that Stepwise regression is a productivity tool to quickly reject inferior models, but it cannot guarantee that the last model is the best one. It might be, or it might be close. You must intervene. Second, the result comes from a guided search in the forward or backward direction. The search path does not examine many of the possible models, so you might have missed the best model (that is, a local rather than a global optimum). Third, you must also examine the data and the estimated errors (residuals) to determine whether there are aberrant data or violations of the linear regression assumptions that could misdirect such a search.

The choice of a criterion for the Stepwise search is important. Each one has advantages and disadvantages. I don't use many 'rules of thumb,' so I don't consider a high R square by itself an indication of over-fitting. It happens. Over-fitting means that the model does not generalize to new data, which is important in your situation. The adjusted R square attempts to address over-fitting. Cross-validation is the best way to check for it, but you do not have enough data for the various CV techniques to work well. You might try the Bootstrap feature if you have JMP Pro.

Is the second block merely a duplicate of the first block, or does it contain unique treatments that are not present in the first block? Also, what is the cost (time, money, et cetera) of running the second block?
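
The adjusted R square and cross-validation points in this reply can be made concrete with a small pure-Python sketch. It computes R^2, adjusted R^2 as 1 - (1 - R^2)(n - 1)/(n - p) with p counting the intercept, and a predicted R^2 based on the PRESS statistic. PRESS is leave-one-out cross-validation for least squares, computed from a single fit. It is a swapped-in technique here, not the JMP Pro Bootstrap feature mentioned above. The function names and the pure-noise data are hypothetical.

```python
import random


def invert(a):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the model is not estimable")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_stats(X, y):
    """R^2, adjusted R^2 and PRESS-based predicted R^2 for a least-squares fit.

    X must include the intercept column; p = number of columns.
    PRESS sums the squared leave-one-out residuals e_i / (1 - h_ii).
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(bi * xi for bi, xi in zip(b, row)) for row, yi in zip(X, y)]
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum(e * e for e in resid)
    press = 0.0
    for row, e in zip(X, resid):
        # Leverage h_ii = x_i' (X'X)^-1 x_i
        h = sum(row[i] * inv[i][j] * row[j] for i in range(p) for j in range(p))
        press += (e / (1 - h)) ** 2  # undefined when h == 1 (saturated model)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    pred_r2 = 1 - press / sst
    return r2, adj_r2, pred_r2


# Hypothetical pure-noise data: 12 runs, 8 random predictors plus an intercept.
rng = random.Random(0)
X = [[1.0] + [rng.uniform(-1, 1) for _ in range(8)] for _ in range(12)]
y = [rng.gauss(0, 1) for _ in range(12)]
print(fit_stats(X, y))
```

On pure noise with 9 coefficients and 12 runs, R^2 is typically sizeable while predicted R^2 is much lower, often negative. A large gap between the two is a practical sign that a model will not generalize to new runs.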