Least squares regression - removing insignificant terms quickly?

Hi all-

I am doing least squares fits, up to cubic, with 15 X's. Only the first several terms ever turn out "significant", so those at the "bottom" with large p-values could be removed as long as R^2 doesn't go down. Doing this by hand would be a slow iterative process, but it would result in a much shorter equation. I know of one other tool that will do this.

Can JMP?


What it sounds like you want to do is fit a large model, and then remove insignificant terms one-by-one. You can do this in JMP using the Fit Model platform using Backward selection.

On the Fit Model dialog, enter the model terms and response, change the Personality to Stepwise, then click Run Model. A Stepwise dialog opens. Change the Direction to Backward, and change the Prob to Leave to 0.05 (or whatever significance level you want). Click the Enter All button.

At this point, you can either click the Go button or Step button. The Go button tells JMP to remove insignificant terms until all terms in the model are significant. If you want to watch each step of that process, use the Step button.

Once the process is finished, click the Make Model button. This opens the Fit Model dialog and populates it with the final model. Click Run Model to fit it.
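
For readers who want to see the idea outside JMP, the backward steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not JMP's implementation: the helper names (`solve`, `t_stats`, `backward_eliminate`) and the synthetic data are hypothetical, every term is treated as an independent candidate, and a cutoff of |t| < 2 is used as a rough stand-in for a Prob to Leave of 0.05.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def t_stats(X, y):
    """OLS t-statistics for each column of X (X already holds the intercept)."""
    n, k = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    # Diagonal of inverse(X'X), obtained by solving against each unit vector.
    diag = [solve(XtX, [1.0 if i == j else 0.0 for i in range(k)])[j]
            for j in range(k)]
    return [b / math.sqrt(s2 * d) for b, d in zip(beta, diag)]

def backward_eliminate(columns, y, t_cut=2.0):
    """Drop the weakest term one at a time until every |t| >= t_cut."""
    kept = list(columns)
    while kept:
        X = [[1.0] + [columns[name][i] for name in kept] for i in range(len(y))]
        t = t_stats(X, y)[1:]  # skip the intercept
        worst = min(range(len(kept)), key=lambda j: abs(t[j]))
        if abs(t[worst]) >= t_cut:
            break
        kept.pop(worst)
    return kept

# Synthetic data (assumption): only x1 and x2 actually drive y.
rng = random.Random(0)
n = 60
columns = {name: [rng.uniform(-1, 1) for _ in range(n)]
           for name in ("x1", "x2", "x3", "x4")}
y = [3 * columns["x1"][i] - 2 * columns["x2"][i] + rng.gauss(0, 0.1)
     for i in range(n)]

kept = backward_eliminate(columns, y)
print(kept)
```

The pure-noise columns x3 and x4 will usually be dropped, but not always; a noise term can survive the cutoff by chance, which is one reason the selected model should be checked against held-out data.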

Also of interest to you may be the All Possible Models and Model Averaging features found on the Stepwise dialog popup menu. See JMP Documentation for complete explanations of these items.

The Stepwise option seems really picky about missing terms or terms out of order. There was no way to rearrange them in the fit window, even if I knew what order it wanted, so I recreated my terms. After the backward selection was complete, that model gave a lower R^2 than the previous model with the same terms all left in (0.9867 vs. 0.9932). So I believe I am trading accuracy for a shorter equation, basically. Correct?

If you remove terms from a model, the R-squared goes down (or, at best, stays the same). It doesn't matter whether you use backward stepwise or any other method; that's what happens.

Now, about your question: "I believe I am trading accuracy for a shorter equation, basically. Correct?" You are the one who brought this up first, you wanted to remove insignificant terms from a model. You get a shorter equation, yes, at the expense of a lower R-squared. However, I wouldn't say you have less accuracy here when you remove terms like this ... you have reduced the precision (increased the variance) of the estimates of predicted value, within the level of noise.
You have answered all my questions perfectly, and I thank you. I do not know the correct way to assess the "appropriateness" of the equation. Since I have a possible 800+ terms and only ~300 points, I could actually force-fit it, but the result doesn't predict validation points well. So I must balance R^2 against the ability to predict validation points. Perhaps if removing the insignificant terms leaves me with a more "physics-based" regression formula, despite the reduction in R^2, it might better predict the left-out validation points.

So I wouldn't rule out the shortened equation.

Thanks for your patience with the "new guy" :)



Jun 23, 2011

With 800 possible terms and 300 data points it would be very easy to overfit. Remember that R^2 is not penalized for the number of terms in the model. The adjusted R^2 statistic is a better measure of fit in the sense that it is penalized for each new term in the model. Consider the following data, where X is just the row number and Y is a random uniform [0,1].
X Y
1 0.28248748
2 0.90870512
3 0.50182335
4 0.12685357
5 0.32946774
6 0.14868958
7 0.53642655
8 0.40279645
9 0.35250681
10 0.37013588

Fitting an ordinary least squares model of Y on X produces an R^2 of 0.04, indicating, as it should, that there is essentially no relationship between Y and X.
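
The R^2 quoted above can be checked by hand, along with the adjusted R^2 mentioned earlier. The sketch below uses the ten (X, Y) pairs from the post and the textbook formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors excluding the intercept. The variable names are only illustrative, and JMP's own report may format the values differently.

```python
# The ten (X, Y) pairs from the post: X is the row number.
y = [0.28248748, 0.90870512, 0.50182335, 0.12685357, 0.32946774,
     0.14868958, 0.53642655, 0.40279645, 0.35250681, 0.37013588]
x = list(range(1, len(y) + 1))

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

# Simple linear regression by the closed-form least squares solution.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - sse / sst
p = 1  # one predictor besides the intercept
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(r2, 2))  # about 0.04, matching the value in the post
print(adj_r2)        # negative here: the penalty outweighs the tiny R^2
```

A negative adjusted R^2 is a plain signal that the term adds less than the penalty for including it.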

Now consider the same data, but with 10 indicator variables, one for each row. Fitting a model with Y = X1 through X10 produces an R^2 of 1. Sounds like a perfect model! The problem is that the number of observations is the same as the number of predictors, and that will always generate an R^2 of 1. Of course, JMP will also report Singularity Details. In other words, there is no noise left once all the Xs are included.

Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
0.28248748 1 0 0 0 0 0 0 0 0 0
0.90870512 0 1 0 0 0 0 0 0 0 0
0.50182335 0 0 1 0 0 0 0 0 0 0
0.12685357 0 0 0 1 0 0 0 0 0 0
0.32946774 0 0 0 0 1 0 0 0 0 0
0.14868958 0 0 0 0 0 1 0 0 0 0
0.53642655 0 0 0 0 0 0 1 0 0 0
0.40279645 0 0 0 0 0 0 0 1 0 0
0.35250681 0 0 0 0 0 0 0 0 1 0
0.37013588 0 0 0 0 0 0 0 0 0 1
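
The saturated fit in the table above can be reproduced in a few lines. This sketch assumes a model with no intercept, so the design matrix is the 10 x 10 identity and the normal equations reduce to beta = X'y. Adding an intercept on top of the ten indicators would make the design linearly dependent, which is presumably what a singularity report is flagging.

```python
y = [0.28248748, 0.90870512, 0.50182335, 0.12685357, 0.32946774,
     0.14868958, 0.53642655, 0.40279645, 0.35250681, 0.37013588]
n = len(y)

# One indicator column per row: the design matrix is the identity.
X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Because X'X is the identity, the least squares solution is beta = X'y.
beta = [sum(X[i][j] * y[i] for i in range(n)) for j in range(n)]
fitted = [sum(X[i][j] * beta[j] for j in range(n)) for i in range(n)]

y_bar = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)

r2 = 1 - sse / sst
error_df = n - len(beta)  # degrees of freedom left to estimate noise

print(r2, error_df)
```

Each coefficient simply memorizes one observation, so the fit is perfect while the error degrees of freedom are zero. Adjusted R^2 cannot even be computed here, because its denominator is the error degrees of freedom.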
I am getting a lot of terms out of order, or missing terms errors when I do the step backwards regression. What are the "rules" for the term orders, and what cannot be left out?

> I am getting a lot of terms out of order, or missing terms errors when I do the step backwards regression. What are the "rules" for the term orders, and what cannot be left out?

You have discovered the difficulties of doing backwards stepwise modeling. This is one of the many reasons why many statisticians advise people to avoid backwards (and all forms of stepwise) modeling. See the comments regarding stepwise regression in this FAQ

Message was edited by: Paige

Model selection can indeed be a difficult process. There are several things you could consider (for example: collinearity, influence, overfitting, adjusted R^2, Mallows' Cp, etc.). The ultimate goal is to develop a model that predicts well for observations not used to build the model.

I don't know the context of the situation or any details of the model, but a difference of 0.99 vs. 0.98 may not be important. You have to make that judgement.
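
The point about predicting observations not used to build the model can be checked with a simple holdout split. The sketch below is a hypothetical illustration on synthetic data (a straight line plus noise), not the poster's data: it fits a degree-1 and a degree-5 polynomial to the training half and reports the sum of squared errors on both halves. The helper names `solve`, `fit_poly`, and `sse` are made up for this example.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_poly(x, y, degree):
    """Least squares polynomial fit via the normal equations."""
    k = degree + 1
    X = [[xi ** d for d in range(k)] for xi in x]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def sse(coef, x, y):
    """Sum of squared prediction errors for polynomial coefficients coef."""
    return sum((yi - sum(c * xi ** d for d, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))

# Synthetic data (assumption): the true relationship is linear.
rng = random.Random(1)
x_all = [rng.uniform(-1, 1) for _ in range(30)]
y_all = [2.0 * xi + rng.gauss(0, 0.3) for xi in x_all]

x_train, y_train = x_all[:15], y_all[:15]
x_valid, y_valid = x_all[15:], y_all[15:]

results = {}
for degree in (1, 5):
    coef = fit_poly(x_train, y_train, degree)
    results[degree] = (sse(coef, x_train, y_train), sse(coef, x_valid, y_valid))
    print(degree, results[degree])
```

The larger model can never fit the training half worse than the smaller one, which is why training R^2 alone cannot settle the choice. Only the validation column says whether the extra terms earned their place.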
Definitely a challenge. I have seen an R^2 of 0.60 with all % errors under 1%, which is acceptable. I have also seen the opposite. I do have noise inherent in my model. I will shoot for a consistent fit of validation points. I still wonder if the "backwards" elimination of insignificant terms actually gives a more physics-based model. If so, I'd prefer that.
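
A modest R^2 together with tiny percent errors is not a contradiction: it happens whenever the response varies little relative to its own magnitude. The numbers below are made up purely to illustrate that, with a response near 100 and noise of a few tenths.

```python
# Hypothetical data: a response near 100 with a weak trend and small noise.
x = list(range(10))
noise = [0.3, -0.3, 0.2, -0.4, 0.35, -0.25, 0.3, -0.2, 0.1, -0.1]
y = [100 + 0.1 * xi + ei for xi, ei in zip(x, noise)]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# Simple linear regression and its fitted values.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r2 = 1 - sse / sst
max_pct_error = max(abs(yi - fi) / abs(yi) * 100 for yi, fi in zip(y, fitted))

print(r2)             # roughly one half
print(max_pct_error)  # well under 1 percent
```

R^2 measures the share of the variation explained, while percent error measures misses relative to the size of the response, so the two can disagree in either direction.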

Thanks again,