Re: Very High R-Square Value

ShriHanuman

Occasional Contributor

Joined:

Feb 25, 2017

Hi,

I am trying to calculate R-Square using stepwise regression on the attached data set.

 

So far I have cleaned the data by excluding the outliers and by removing these variables:

Garage yrblt, 1stflsf, totrmsabvgrd and Garagecars, because they have very high correlations.

 

I also recalculated Lot Frontage as Updated Lot Frontage, because Lot Frontage had a lot of missing values.

 

My problem is that the R-Square value from stepwise regression (Forward, Minimum BIC) is 1.

What am I doing wrong?

2 REPLIES
dale_lehman

Community Trekker

Joined:

Jan 29, 2015

I tried to reproduce your results and I do not get R squared = 1. I get something like 0.87, so I'm not sure what you are doing differently. Also, there are a number of things in your description that I would not recommend: too many variables are trying to measure the same sort of thing, and using 80-something variables when you have 1400 data points seems like a poor idea to me. I also don't recommend deleting outliers; you might try log transformations instead.
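As a rough illustration of the log-transformation suggestion, here is a minimal Python sketch (not JMP) on made-up data. The `sale_price` values are simulated from a lognormal distribution purely as an assumption, not taken from the attached data set, and `skewness` is a small helper written for this example. It only shows that taking logs of a right-skewed variable pulls in the long tail, so extreme values are less likely to need deleting.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over variance**1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Hypothetical right-skewed prices (simulated, NOT the real data set).
rng = np.random.default_rng(1)
sale_price = rng.lognormal(mean=12.0, sigma=0.5, size=1400)

# The log transform compresses the long right tail.
log_price = np.log(sale_price)

skew_raw = skewness(sale_price)
skew_log = skewness(log_price)
print(f"skewness of raw prices: {skew_raw:.2f}")
print(f"skewness of log prices: {skew_log:.2f}")
```

On real data the improvement will depend on how skewed the variable actually is, so it is worth checking the distribution before and after.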

dale_lehman

Community Trekker

Joined:

Jan 29, 2015

I can only think of two reasons you would get R squared = 1. You might have accidentally included the house id as a factor, treated as a nominal variable (although it looks continuous in your dataset), but you should have received a bunch of warnings in the regression results. Or you could have accidentally included the sales price as a factor (by shift-clicking when you added variables). Either could produce that result, but I can't think of anything else.
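Both situations can be reproduced with a small Python sketch (not JMP) on simulated data. The variable names `area` and `price`, the sample size, and the helper `r_squared` are assumptions made for this example and have nothing to do with the attached data set. A unique id treated as nominal becomes one indicator column per row, which lets the model fit every observation exactly, and including the response among the predictors does the same thing trivially.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 50
area = rng.uniform(50, 250, size=n)
price = 1000 * area + rng.normal(0, 40000, size=n)

# 1) A sensible model: one real predictor, R^2 well below 1.
r2_honest = r_squared(area.reshape(-1, 1), price)

# 2) A unique id treated as nominal: one indicator column per row.
id_dummies = np.eye(n)
r2_with_id = r_squared(np.column_stack([area, id_dummies]), price)

# 3) The response itself included among the predictors.
r2_with_response = r_squared(np.column_stack([area, price]), price)

print(f"area only:          {r2_honest:.3f}")
print(f"area + nominal id:  {r2_with_id:.6f}")
print(f"area + price (y):   {r2_with_response:.6f}")
```

If the model in question behaves like cases 2 or 3, checking the list of selected terms for the id column or the sale price itself would be the first thing to try.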