I have a data set for predicting the sale price of houses. I ran a PCA on the continuous variables without changing or cleaning the data. The analysis suggested that 26 of the 31 components explain 96% of the variability.
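For reference, here is a minimal sketch of how the number of components needed to reach a given share of variance can be checked. It uses NumPy on synthetic placeholder data, not my actual housing data. The function name `components_for_variance`, the standardization step, and the 0.96 target are assumptions for illustration. It also assumes there are no missing values in the input.

```python
import numpy as np

def components_for_variance(X, target=0.96):
    """Return (k, cumulative), where k is the smallest number of principal
    components whose cumulative explained variance reaches `target`."""
    # Standardize each column so variables on large scales do not dominate.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Xs, full_matrices=False, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative share is >= target, as a count.
    k = int(np.searchsorted(cumulative, target) + 1)
    return k, cumulative

# Synthetic stand-in with 31 continuous columns (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 31))

k, cumulative = components_for_variance(X)
print(f"{k} of {X.shape[1]} components reach {cumulative[k - 1]:.1%} of variance")
```

If most of the components are needed to reach the target, as with 26 out of 31, the variables are probably not strongly correlated and PCA is not reducing the dimension much.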
So I saved the values of these 26 components. I also created the column Updated LotFrontage because LotFrontage had a lot of missing values. Now I am trying to run a predictive analysis (Neural, K-Means) using all 26 saved components and all the remaining categorical variables as the X factors and the sale price as Y.
I want to have a higher R-Square and a lower RMSE for the training data compared to the validation data.
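To make the comparison concrete, here is a rough sketch of how R-Square and RMSE can be computed separately for the training and validation rows. It uses NumPy with synthetic data and an ordinary least-squares fit as a stand-in model, not the neural network from my analysis. The helper names `r_squared` and `rmse`, the 70/30 split, and the data itself are assumptions for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for the saved components (X) and sale price (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=300)

# Assumed 70/30 random split into training and validation rows.
idx = rng.permutation(len(y))
train, valid = idx[:210], idx[210:]

# Stand-in model: least squares with an intercept, fitted on training rows only.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
pred = A @ coef

for name, rows in (("training", train), ("validation", valid)):
    print(f"{name}: R2={r_squared(y[rows], pred[rows]):.3f} "
          f"RMSE={rmse(y[rows], pred[rows]):.3f}")
```

The model is fitted only on the training rows, so the validation numbers show how well it does on data it has not seen.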