
What's the best way to clean the attached data?

ShriHanuman

Occasional Contributor

Joined:

Feb 25, 2017

I have a data set for predicting the sale price of houses. I ran a PCA on the continuous variables without changing or cleaning the data. The analysis suggested that 26 of the 31 components explained 96% of the variability.
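
To make the component-selection step concrete, here is a minimal sketch of how the number of components for a given share of variance can be computed. It is not what the software does internally; it assumes the columns are standardized first and contain no missing values, and the function names and the synthetic data are made up for illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component (descending).

    Assumes X has no missing values and no constant columns.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that variables on large scales (e.g. lot area) do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of Z are proportional to the component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2 / (len(Z) - 1)
    return var / var.sum()

def n_components_for(X, threshold=0.96):
    """Smallest number of components whose cumulative share reaches threshold."""
    cum = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(cum))

# Synthetic stand-in for the continuous columns: the third column is an exact
# linear combination of the first two, so two components carry all the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X_demo = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])
k_demo = n_components_for(X_demo, threshold=0.96)
```

If 26 of 31 components are needed to reach 96%, the continuous variables are only weakly correlated, so PCA is removing very little here.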

So I saved the values of these 26 components. I also created the column Updated LotFrontage, as LotFrontage had a lot of missing values. Now I am trying to run a predictive analysis (Neural, K-Means) using all 26 saved components and all the remaining categorical variables as X factors and sale price as Y.
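
For reference, one common way to build a column like Updated LotFrontage is median imputation plus a missing-value indicator. This is only a sketch under that assumption (the post does not say how the column was actually filled); the helper names and the sample values are hypothetical.

```python
import numpy as np

def fit_median(train_values):
    """Median of the non-missing training values (NaN marks a missing entry)."""
    return float(np.nanmedian(np.asarray(train_values, dtype=float)))

def impute(values, fill):
    """Return (filled values, 0/1 indicator of which entries were missing)."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    return np.where(missing, fill, values), missing.astype(int)

# Made-up LotFrontage values, not taken from the attached data.
lot_frontage_train = [65.0, 80.0, np.nan, 60.0, np.nan, 84.0]

# Learn the fill value on the training rows only, then reuse it for validation
# rows so that no information leaks from validation into training.
fill = fit_median(lot_frontage_train)
updated_lot_frontage, was_missing = impute(lot_frontage_train, fill)
```

Keeping the indicator column lets the model pick up on cases where the fact that a value is missing is itself informative.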

I want to have a higher R-Square and a lower RMSE for the training data compared to the validation data.
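
To be explicit about the two measures, here is a minimal sketch of how R-Square and RMSE are computed from actual and predicted values. The function names and the numbers are illustrative only; the same functions would be applied separately to the training rows and to the validation rows.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in units of Y."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """1 - SS_residual / SS_total: share of the variance in Y that is explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up sale prices and predictions, in thousands.
y_demo = [100.0, 200.0, 300.0]
pred_demo = [110.0, 190.0, 300.0]
demo_rmse = rmse(y_demo, pred_demo)
demo_r2 = r_squared(y_demo, pred_demo)
```

A training fit that is much better than the validation fit usually points to overfitting, so the gap between the two sets is worth watching as well as the values themselves.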

 

Is there a better way to clean this data?