
Generalized Regression: Model Selection

I'm using the Generalized Regression platform and trying to find the best model for my data. When I compare the different models in the platform, I get the following results:


The Generalized Rsquare values are all around 0.94 for the training set and 0.95 for the validation set.


Then I used Formula Depot -> Model Comparison to get further information about the models. I get this table:



The Rsquare values for the training set are in the range 0.53-0.62 and for the validation set 0.8-0.9.


I have two questions:

- Is it normal that the Rsquare values are so much lower than the Generalized Rsquare values?

- Does it indicate a problem that the Rsquare values for the validation set are higher than those for the training set? (I've tried 3 different splits of the data between the validation and training sets.) If the Rsquare for the training set were larger than for the validation set, that would be an indication of over-fitting. But here it's the other way around.


Re: Generalized Regression: Model Selection

It turned out that the issue was related to which points ended up in the training and validation sets. There were some outlier points that had a big influence on the Rsquare, so the values changed quite a lot depending on which set they landed in.

Excluding them from the evaluation improved the situation.
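
For anyone who wants to see the effect outside of JMP, here is a minimal Python sketch. It is not my actual data or model: the values, the `predict` function, and the outlier are all made up for illustration, and Rsquare is computed as the ordinary 1 - SSE/SST. It shows how a single outlier landing in the training set can push the training Rsquare well below the validation Rsquare.

```python
def r_squared(actual, predicted):
    """Ordinary R-square: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def predict(x):
    """Hypothetical fitted model (assumption): y = 2x."""
    return 2.0 * x


# Made-up, well-behaved data: y = 2x plus a small fixed disturbance.
x_all = list(range(1, 11))
noise = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2, 0.1, -0.1]
y_all = [2.0 * x + n for x, n in zip(x_all, noise)]

# Simple split: first half is training, second half is validation.
train_x, train_y = x_all[:5], y_all[:5]
valid_x, valid_y = x_all[5:], y_all[5:]

# One made-up outlier (the model predicts 10, the observed value is 30).
outlier_x, outlier_y = 5, 30.0

# Case 1: the outlier ends up in the training set.
train_x_out = train_x + [outlier_x]
train_y_out = train_y + [outlier_y]

r2_train_clean = r_squared(train_y, [predict(x) for x in train_x])
r2_train_outlier = r_squared(train_y_out, [predict(x) for x in train_x_out])
r2_valid = r_squared(valid_y, [predict(x) for x in valid_x])

print(f"Training Rsquare without outlier: {r2_train_clean:.3f}")
print(f"Training Rsquare with outlier:    {r2_train_outlier:.3f}")
print(f"Validation Rsquare:               {r2_valid:.3f}")
```

With the outlier in the training set, the training Rsquare drops far below the validation Rsquare even though the model itself is unchanged. Moving the outlier to the validation set would reverse the picture, which is why the values shifted so much between splits.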
