Hi,
I ran PCA on the attached Data Set.
The output is:
I am trying to increase the R-square value, and in doing so I want to keep only the important variables.
So far I have cleaned the data by excluding the outliers and by removing the variables Garage yrblt, 1stflsf, totrmsabvgrd and Garagecars, as they are very highly correlated with other variables.
I also recalculated Lot frontage as Updated Lot Frontage, because Lot frontage had a lot of missing values.
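A minimal sketch of the correlation check behind the variable removal above, written in Python with NumPy purely for illustration. The data is synthetic, the 0.8 threshold is an assumption, and the column names GarageArea and LotArea are hypothetical stand-ins, not taken from the attached data set.

```python
import numpy as np

def high_corr_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for every column pair with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in data (assumption): two strongly related columns, one unrelated.
rng = np.random.default_rng(0)
garage_area = rng.normal(500, 100, size=200)
garage_cars = garage_area / 250 + rng.normal(0, 0.05, size=200)
lot_area = rng.normal(10000, 2000, size=200)

X = np.column_stack([garage_area, garage_cars, lot_area])
names = ["GarageArea", "Garagecars", "LotArea"]  # hypothetical labels

pairs = high_corr_pairs(X, names)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only one variable from each flagged pair needs to be dropped; which one to keep is a judgment call.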
Now my PCA analysis shows that 12 (out of 26) components can explain around 79.5% of the variability. So, am I correct to save the formulas for these 13 components and then exclude the variables that were used to compute them (LotFrontage to UpdateLotFrontage)?
So basically, instead of using 26 variables I will now use 13 variables. After this variable reduction, I plan to run stepwise regression, a neural network, etc., and see which method gives the best R-square value.
Is this approach correct?
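The plan above can be sketched roughly as follows, in Python with NumPy only. Everything here is an assumption for illustration: the data is synthetic (26 predictors built from 5 hidden factors), the 80% variance target stands in for the 79.5% figure, and the helper names `pca_scores` and `r_squared` are hypothetical. It is not the output or the saved formulas from my actual analysis.

```python
import numpy as np

def pca_scores(X, var_target=0.80):
    """Standardize X, run PCA via SVD, and keep the fewest leading components
    whose cumulative explained variance reaches var_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    k = min(k, len(s))
    scores = Z @ Vt[:k].T  # component scores replace the original variables
    return scores, explained[:k], Z

def r_squared(F, y):
    """In-sample R-square of an ordinary least squares fit of y on F plus an intercept."""
    A = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))

# Synthetic stand-in for the data set (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 26)) + 0.5 * rng.normal(size=(300, 26))
y = latent @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(size=300)

scores, explained, Z = pca_scores(X)
r2_components = r_squared(scores, y)
r2_all = r_squared(Z, y)

print("components kept:", scores.shape[1])
print("variance explained by kept components:", round(float(explained.sum()), 3))
print("R-square on kept components:", round(r2_components, 3))
print("R-square on all 26 variables:", round(r2_all, 3))
```

The components are chosen without looking at the target, so in a plain linear regression the in-sample R-square on the kept components cannot exceed the R-square on all 26 standardized variables. The last two printed lines show that gap for the synthetic data.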
Thanks