Hello, I just wanted to know the difference between using Stepwise with the All Possible Models option and using the Generalized Regression Best Subset estimation method. It seems to me that they should produce the same results if the response is normally distributed. I have JMP 14 (32-bit) and found that, because of memory limits, I can specify a larger maximum number of effects with Best Subset than with All Possible Models.
Stepwise regression follows a path in the forward direction, the backward direction, or mixed directions. At each step it considers changing the current model by a single term, based on the current model and the remaining candidate terms. So there is a chance that the best model is never visited, and that chance increases with the number of candidate terms. The advantage is that this approach remains practical with many candidate terms in most situations.
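To make the point about a greedy path concrete, here is a minimal Python sketch (assuming NumPy) that contrasts a simplified forward stepwise search with an exhaustive search over all two-term models. It ranks candidates by residual sum of squares rather than by the p-value or information-criterion rules that JMP's Stepwise platform offers, and the tiny data set is hand-built for illustration: y is reproduced exactly by x1 and x2 together, but x3 is the best single predictor, so the greedy path commits to x3 and never visits {x1, x2}.

```python
import itertools
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the listed columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, n_terms):
    """Greedy forward selection: each step adds the one term that lowers SSE the most."""
    chosen = []
    for _ in range(n_terms):
        remaining = [c for c in range(X.shape[1]) if c not in chosen]
        chosen.append(min(remaining, key=lambda c: sse(X, y, chosen + [c])))
    return chosen

def best_subset(X, y, n_terms):
    """Exhaustive search: fit every model with exactly n_terms terms, keep the lowest SSE."""
    combos = itertools.combinations(range(X.shape[1]), n_terms)
    return list(min(combos, key=lambda s: sse(X, y, list(s))))

# Hand-built toy data: three orthogonal, zero-mean +/-1 patterns.
h1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
y = h1
# Columns 0, 1, 2 are x1 = h1 + 10*h2, x2 = 10*h2, x3 = h1 + h3, so y = x1 - x2 exactly.
X = np.column_stack([h1 + 10 * h2, 10 * h2, h1 + h3])

fwd = forward_stepwise(X, y, 2)
best = best_subset(X, y, 2)
print(fwd, round(sse(X, y, fwd), 2))    # [2, 0] 3.98
print(best, round(sse(X, y, best), 2))  # [0, 1] 0.0
```

Forward selection ends at x3 then x1 with an SSE of about 3.98, while the exhaustive search finds x1 and x2 with an SSE of essentially zero.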
The All Possible Models option, on the other hand, visits all possible models up to the complexity that you specified. Your available memory and time might limit the practical number of terms that can be considered.
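A quick way to see why memory and time become limiting: the number of models with at most m terms drawn from p candidate terms is the sum of C(p, k) for k = 1 to m. This small standard-library sketch counts non-empty models only; whether JMP also counts the intercept-only model does not change how fast the total grows.

```python
from math import comb

def n_models(p, max_terms):
    """Count distinct models with 1..max_terms terms chosen from p candidate terms."""
    return sum(comb(p, k) for k in range(1, max_terms + 1))

print(n_models(20, 5))   # 21699
print(n_models(20, 20))  # 1048575 (= 2**20 - 1)
for p in (10, 20, 40):
    # Same cap of 10 terms, increasingly many candidates.
    print(p, n_models(p, 10))
```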
The Generalized Regression option is somewhere in between. Here is the explanation from Help:
Best Subset: Computes parameter estimates by increasing the number of active effects in the model at each step. In each step, the model is chosen among all possible models with a number of effects given by the step number. The values on the horizontal axes of the Solution Path plots represent the number of active effects in the model. Step 0 corresponds to the intercept-only model. Step 1 corresponds to the best model of the ones that contain only one active effect. The steps continue up to the value of Max Number of Effects specified in the Advanced Controls in the Model Launch report. See “Advanced Controls” on page 309.
Tip: The Best Subset Estimation Method is computationally intensive. It is not recommended for large problems.
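The path described in the Help text can be sketched roughly as follows. This is a hypothetical illustration in Python with NumPy, using least squares with an intercept as the fitting criterion, not JMP's implementation: step k reports the best model among all models with exactly k effects, and step 0 is the intercept-only model. On this hand-built toy data, the best one-effect model ({x3}) is not contained in the best two-effect model ({x1, x2}), so the path is not nested the way a forward stepwise path is.

```python
import itertools
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the listed columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def best_subset_path(X, y, max_effects):
    """Step k: the lowest-SSE model among all models with exactly k effects (k = 0 is intercept-only)."""
    path = []
    for k in range(max_effects + 1):
        combos = itertools.combinations(range(X.shape[1]), k)
        cols = min(combos, key=lambda s: sse(X, y, s))
        path.append((k, cols, sse(X, y, cols)))
    return path

# Hand-built toy data; columns 0, 1, 2 play the roles of x1, x2, x3.
h1 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
y = h1
X = np.column_stack([h1 + 10 * h2, 10 * h2, h1 + h3])

for step, cols, err in best_subset_path(X, y, 2):
    print(step, cols, round(err, 2))
# 0 () 8.0
# 1 (2,) 4.0
# 2 (0, 1) 0.0
```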
Hope this explanation helps.
So you won't necessarily get the same model from all of these methods, and the response distribution has nothing to do with the difference, assuming that you use the same distribution in all cases.
Remember that these methods are aids to selecting a model. None of them guarantees that it will find the best model.
Thanks for the answer. I found this PDF that explained it well: GR Platform
I'll add a few additional thoughts to supplement @markbailey's wise advice. If you are fitting models with multiple algorithms, it's generally best practice to use some form of cross-validation. So results might differ for no other reason than the cross-validation scheme you choose. Some of these schemes have a random component, so the exact data used to fit a model may differ from one pass through the modeling method to the next.
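As an illustration of that random component, here is a minimal k-fold cross-validation sketch in Python with NumPy. It is a simplification under stated assumptions: fold labels come from a seeded shuffle, the model is plain OLS with an intercept, and the score is the average of the per-fold mean squared errors. Changing only the seed changes which rows land in each fold, so the cross-validated error estimate can change even though the data and the model form are identical.

```python
import random
import numpy as np

def kfold_labels(n, k, seed):
    """Assign each of n rows to one of k folds (sizes as equal as possible) via a seeded shuffle."""
    labels = [i % k for i in range(n)]
    random.Random(seed).shuffle(labels)
    return labels

def cv_mse(X, y, k, seed):
    """K-fold CV for OLS with an intercept; returns the average of the per-fold mean squared errors."""
    labels = np.array(kfold_labels(len(y), k, seed))
    design = np.column_stack([np.ones(len(y)), X])
    fold_mse = []
    for fold in range(k):
        held_out = labels == fold
        coef, *_ = np.linalg.lstsq(design[~held_out], y[~held_out], rcond=None)
        resid = y[held_out] - design[held_out] @ coef
        fold_mse.append(np.mean(resid ** 2))
    return float(np.mean(fold_mse))

# Simulated data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=30)
for seed in (1, 2):
    # Same data, same model form, different fold assignment.
    print(seed, round(cv_mse(X, y, 5, seed), 4))
```

With the same seed the folds, and therefore the estimate, are reproducible; with a different seed the estimate will typically shift somewhat.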
Also, within the Generalized Regression platform, there are several modeling personalities, and answers will differ a bit depending on the one you choose. Some of them allow what I'll call 'a little bit of bias in the parameter estimates' (compared to classic ordinary least squares regression), trading that bias for lower prediction variance. Thematically you should, in all likelihood, reach consistent practical conclusions... but if you are looking for an EXACT match in results, down to specific parameter estimates, prediction variances, etc., there will be differences.
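The bias-for-variance trade can be seen with a ridge penalty, one of the penalized methods available in Generalized Regression. The sketch below is plain ridge regression on centered data in Python with NumPy, not JMP's exact implementation (JMP may, for example, standardize predictors or pick the penalty by validation). As the hypothetical tuning value lam grows, the coefficients shrink toward zero, which introduces bias but reduces the variance of the estimates; with lam = 0 it reduces to ordinary least squares.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge fit on centered data: solve (Xc'Xc + lam*I) b = Xc'y; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ b, b

# Simulated data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(25, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=25)
for lam in (0.0, 1.0, 10.0, 100.0):
    intercept, b = ridge(X, y, lam)
    # Larger lam pulls the coefficients toward zero (more bias, less variance).
    print(lam, np.round(b, 3))
```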
In practice, when I compare many different modeling methods, both linear and nonlinear (think regression/classification trees), I look for reasonableness and consistency of practical conclusions across all of them. JMP Pro's Model Comparison platform is a very efficient one-stop hub for making these comparisons. That consistency in turn raises my confidence that the practical conclusions I'm reaching are the best I can do.