Stepwise Regression vs Multivariate Modelling - Why both?
Sep 10, 2020 6:12 AM
I apologise for a naive question, but if I have a list of continuous numerical variables and want to determine which contribute significantly to a binomial categorical outcome variable, I know that stepwise regression is probably the goto platform, but wouldn't an iterative multivariate modelling approach using a generalised binomial model do the same? Is there any benefit to running the stepwise approach before the multivariate one, especially if one is not looking at interactions? Or, stated differently, is the multivariate model platform even needed? Taken a step further, if I wanted to partition the variables, wouldn't that platform sort the wheat from the chaff anyway?
There are very many models to choose from today and many methods of selecting the best model within and between each type of model. You can learn a lot about your data, your response, and the relationship of the response to the predictors by using more than one selection method with more than one type of model.
I do not agree with the claim "that stepwise regression is probably the goto platform." It is a useful tool, but it has its weaknesses and disadvantages, like any method. Even in the best situation, stepwise regression is not guaranteed to find the best model. The user hopes that it saves a lot of manual effort (productivity) and arrives at or close to the best model. But the result of any search method depends on the path taken.
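To make the point about search paths concrete, here is a minimal sketch in plain Python. It is not how the JMP Stepwise platform works internally. It assumes simulated data, AIC as the selection criterion, and hypothetical helper names (`logistic_aic`, `forward_stepwise`, `best_subset`). It compares a greedy forward search with an exhaustive search over every subset of four predictors.

```python
import itertools
import math
import random

def prob(beta, row):
    """Logistic probability for one row; eta is clamped to avoid overflow."""
    eta = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, eta))))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def logistic_aic(X, y, subset, iters=15):
    """Fit a binomial GLM (logit link) on the chosen columns by Newton's method; return its AIC."""
    rows = [[1.0] + [r[j] for j in subset] for r in X]  # intercept + chosen predictors
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        # Tiny ridge on the diagonal keeps the Hessian invertible.
        hess = [[1e-8 if j == k else 0.0 for k in range(p)] for j in range(p)]
        for r, yi in zip(rows, y):
            mu = prob(beta, r)
            for j in range(p):
                grad[j] += (yi - mu) * r[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * r[j] * r[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = sum(math.log(prob(beta, r) if yi else 1.0 - prob(beta, r))
                 for r, yi in zip(rows, y))
    return 2 * p - 2 * loglik

def forward_stepwise(X, y, n_vars):
    """Greedy search: add the single predictor that lowers AIC most; stop when none does."""
    selected, best = (), logistic_aic(X, y, ())
    while len(selected) < n_vars:
        candidates = [tuple(sorted(selected + (j,)))
                      for j in range(n_vars) if j not in selected]
        scores = {c: logistic_aic(X, y, c) for c in candidates}
        winner = min(scores, key=scores.get)
        if scores[winner] >= best:
            break
        selected, best = winner, scores[winner]
    return selected, best

def best_subset(X, y, n_vars):
    """Exhaustive search over all 2**n_vars subsets (only feasible for few predictors)."""
    subsets = [c for k in range(n_vars + 1)
               for c in itertools.combinations(range(n_vars), k)]
    scores = {s: logistic_aic(X, y, s) for s in subsets}
    winner = min(scores, key=scores.get)
    return winner, scores[winner]

# Simulated data (an assumption for illustration): only columns 0 and 1 matter.
random.seed(1)
n, n_vars = 200, 4
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(2.0 * r[0] - 2.0 * r[1]))) else 0
     for r in X]

step_vars, step_aic = forward_stepwise(X, y, n_vars)
best_vars, best_aic = best_subset(X, y, n_vars)
print("forward stepwise:", step_vars, round(step_aic, 2))
print("best subset:     ", best_vars, round(best_aic, 2))
```

The exhaustive search can never do worse than the greedy one on the chosen criterion, because it evaluates every subset the greedy path could reach. On easy data like this the two often agree. With correlated predictors the greedy path can stop at a different model.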
Some modelers will try to identify the useful predictors prior to selecting the best model. Stepwise is primarily a model selection method.
It sounds as if you are using the same type of model (GLM with binomial response) but considering the difference between the stepwise search method and a manual search method. If that is true, then I would expect the same result.
Did you try it? If so, what did you learn? There is no superior selection method or type of model today, so the empirical performance is the final word.
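One hedged way to "try it" is to score each candidate model's predicted probabilities on data held out from fitting. The sketch below uses made-up outcomes and predictions purely for illustration (`p_model_a` and `p_model_b` are hypothetical, not results from any real model) and two common measures for a binary response: log loss and misclassification rate.

```python
import math

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood of the observed 0/1 outcomes (lower is better)."""
    return -sum(math.log(p if y else 1.0 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

def misclassification_rate(y_true, p_pred, cutoff=0.5):
    """Share of rows where the predicted class (p >= cutoff) differs from the outcome."""
    wrong = sum((p >= cutoff) != bool(y) for y, p in zip(y_true, p_pred))
    return wrong / len(y_true)

# Hypothetical holdout outcomes and predicted probabilities from two models.
y_holdout = [1, 0, 1, 1, 0, 0, 1, 0]
p_model_a = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]
p_model_b = [0.6, 0.5, 0.5, 0.6, 0.5, 0.4, 0.5, 0.5]

for name, p in [("A", p_model_a), ("B", p_model_b)]:
    print(name, round(log_loss(y_holdout, p), 3), misclassification_rate(y_holdout, p))
```

Whichever selection method produced each model, the comparison is made on the same held-out rows, so the ranking reflects predictive performance rather than the search path.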
Dr. Shmueli covers many different but important topics with respect to selecting modeling methods.
Lastly, if you have JMP Pro, you'll find the Model Comparison and Formula Depot platforms especially helpful for efficiently comparing the performance of multiple modeling outcomes. And you'll also have access to additional modeling platforms that are adept at handling categorical response data. For example, the penalized regression and PLS platforms may be very helpful.
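As a rough illustration of why penalized regression helps with variable selection, here is a minimal sketch of an L1-penalized (lasso) logistic regression fitted by proximal gradient descent. This is a generic textbook approach, not the algorithm JMP Pro uses. The simulated data, the penalty `lam`, the step size, and the function names are all assumptions chosen for the example.

```python
import math
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def lasso_logistic(X, y, lam, step=0.5, iters=300):
    """Minimize mean logistic loss + lam * sum(|beta_j|); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, grad = 0.0, [0.0] * p
        for r, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, r))
            mu = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, eta))))
            g0 += (mu - yi) / n
            for j in range(p):
                grad[j] += (mu - yi) * r[j] / n
        b0 -= step * g0
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta

# Simulated data (an assumption for illustration): columns 0 and 1 matter, 2 to 4 are noise.
random.seed(2)
n, p = 400, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(2.0 * r[0] - 2.0 * r[1]))) else 0
     for r in X]

b0, beta = lasso_logistic(X, y, lam=0.1)
print("intercept:", round(b0, 3))
print("coefficients:", [round(b, 3) for b in beta])
```

The penalty shrinks all coefficients and tends to set those of uninformative predictors to exactly zero, so selection and estimation happen in one fit rather than through a step-by-step search. In practice the penalty strength would be tuned, for example by validation, rather than fixed as it is here.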