CYLiaw
Level III

Predicted Rsquare

I want to use the predicted R-square to test whether or not my model is overfitting. I am not really familiar with this, so I have a few questions about the predicted R-square.

 

1. How much smaller does the predicted R-square have to be before it is a sign of overfitting? If the adjusted R-square is 0.94 and the predicted R-square is 0.84, is that okay?

 

2. I don't fully understand how the predicted R-square is calculated. I know that it takes one data point out at a time, fits a regression model, puts that data point back in, and gets an R-square; this is repeated for all data points and the resulting R-squares are averaged. But how are those regression models obtained? Does JMP use a machine-learning approach to obtain them? Are those regression models different from the model I chose?

 

3. I found that it is not always true that the predicted R-square drops more as more factors are added to the model. For example, a model with 2 factors can have a lower predicted R-square than a model with 3 factors (even though the third factor has a p-value (much) bigger than 0.05). In this case, should I include 2 or 3 factors in the model?

 

2 REPLIES

Re: Predicted Rsquare

You should not use the R square for model selection. It always increases when you make a model more complex (e.g., add a term to the linear predictor in regression). It always decreases when you make a model less complex.

 

Using cross-validation, though, can make the R square more useful for model selection. You might expect that the validation R square would not increase as you over-fit the data. Specifically:

 

  1. There is no way to establish how much of a difference between the training R square and the validation R square indicates over-fitting. It is a subjective decision. One can say that the model for which the two R square estimates agree most closely is over-fitting the least of all the candidate models.
  2. There is an efficient computation of the 'leave one out' statistics using the 'hat' or 'projection' matrix (a sketch of this computation follows this list).
  3. I would not use the change in R square to select the model. I would consider other information but mostly pay attention to a criterion such as AICc or BIC.
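
To make point 2 concrete: for ordinary least squares, the i-th leave-one-out residual can be obtained from the ordinary residual as e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix H = X(X'X)^-1 X', so no model ever has to be refit. Here is a minimal Python/NumPy sketch (my own illustration, not JMP's internal code; the function name and the plain least-squares setup are assumptions for the example):

    import numpy as np

    def predicted_r_square(X, y):
        # Sketch: predicted R-square via the PRESS statistic for an
        # ordinary least-squares model with an intercept.
        X1 = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
        H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)    # hat (projection) matrix
        e = y - H @ y                                # ordinary residuals
        h = np.diag(H)                               # leverages h_ii
        press = np.sum((e / (1 - h)) ** 2)           # squared leave-one-out residuals
        sst = np.sum((y - y.mean()) ** 2)            # total sum of squares
        return 1 - press / sst

Note that each leave-one-out fit is the same linear model you chose, refit without one point; no machine-learning models are involved, and the hat-matrix identity avoids actually refitting n times.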

 

 

statman
Super User

Re: Predicted Rsquare

Just to add to Mark's comments, one method to detect model over-specification is to look at the delta between the R-square and the adjusted R-square. R-squares increase as the number of degrees of freedom in the model increases (regardless of whether those DFs are important). The adjusted R-square takes into account the "importance" of the DFs in the model, so as you add unimportant degrees of freedom, the delta will increase (the adjusted R-square will not increase at the rate of the R-square).
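
As a rough numerical sketch of how that delta behaves (my own illustration, not JMP output): the adjusted R-square is 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of model degrees of freedom. Adding unimportant terms inflates p faster than it raises the R-square, so the delta grows:

    def r_square_delta(r2, n, p):
        # Delta between R-square and adjusted R-square (sketch).
        r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
        return r2 - r2_adj

    # Hypothetical numbers: a small R-square gain bought with many extra DFs
    print(r_square_delta(0.94, 30, 3))    # ~0.007
    print(r_square_delta(0.95, 30, 10))   # ~0.026 -- the delta has grown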

"All models are wrong, some are useful" G.E.P. Box