I am wondering how to decide which continuous variables to include in a PCA.
I am analyzing house sale prices with about 80 variables, and there are too many continuous variables with wide ranges. I want to reduce the set of numerical variables to get a more concise model, but I don't know which variables I should apply PCA to.
Please help me figure this out. The file is attached below.
Principal Components Analysis is not considered a modeling method but an exploratory data analysis method with a goal of variable reduction. PCA examines the correlation structure that exists within a group of variables; it does not use the response (here, sale price) at all. If variable identification for predictive models is your major goal, then the JMP or JMP Pro Fit Model platform is where you want to start. There are multiple modeling personalities supported there, from good old-fashioned ordinary least squares (called Standard Least Squares in JMP) to stepwise and generalized linear models, to name three. The Partition platform provides an alternative modeling method which can also be useful for variable identification. If you are running JMP Pro, the Generalized Regression platform's penalized regression methods are tailor-made for predictive modeling where variable identification is a primary goal. In addition, you've got all sorts of flexible model cross-validation constructs within JMP Pro. Finally, the JMP Pro Model Comparison and Formula Depot platforms are great for comparing the performance of multiple models and, if needed, exporting a model to an alternative coding format like SQL, C, or SAS.
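To make the point about correlation structure concrete outside of JMP, here is a minimal Python sketch using only NumPy. It is not JMP's implementation, and the data are synthetic: the "living area", "first-floor area", and "garage area" columns are hypothetical stand-ins, not variables from the attached file. It runs PCA on the correlation matrix, which is one common way to handle variables with very different ranges, and shows how a group of correlated variables collapses into one dominant component.

```python
import numpy as np

def pca_on_correlation(X):
    """PCA on the correlation matrix of the columns of X.

    Returns the share of total variance explained by each component
    (largest first) and the matching eigenvectors as columns.
    """
    corr = np.corrcoef(X, rowvar=False)      # correlation removes scale differences
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic, hypothetical data: three columns driven by one latent
# "house size" factor, on very different scales, plus one unrelated column.
rng = np.random.default_rng(0)
n = 500
size = rng.normal(size=n)
X = np.column_stack([
    2000 + 500 * (size + 0.1 * rng.normal(size=n)),  # e.g. living area
    1200 + 300 * (size + 0.1 * rng.normal(size=n)),  # e.g. first-floor area
    400 + 100 * (size + 0.1 * rng.normal(size=n)),   # e.g. garage area
    rng.normal(size=n),                              # unrelated variable
])

ratio, loadings = pca_on_correlation(X)
print("explained variance ratio:", np.round(ratio, 3))
print("first component loadings:", np.round(loadings[:, 0], 3))
```

In this sketch the first component absorbs most of the variance because three of the four columns move together. Note that the sale price never enters the calculation, which is why PCA alone does not tell you which variables matter for prediction.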