Question about PCA

Apr 7, 2017 11:55 AM
(1760 views)

Hello,

I am wondering how to decide which continuous variables to include in a PCA.

I am analyzing house sale prices with about 80 variables, and many of the continuous variables span a wide range. I want to reduce the set of numerical variables to get a more concise model, but I don't know which variables to use in the PCA.

Please help me figure this out. The file is attached below.

Thank you.

3 Replies


Re: Question about PCA

That is an interesting data set. You have many categorical descriptors, some of which could be treated as continuous. For example, the location could be given using latitude and longitude.

Is PCA the technique you require? The distributions of the individual variables can heavily influence the outcome of a PCA.
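One concrete way this shows up is through scale. The sketch below is only an illustration, not output from the attached file: the two columns (`lot_area`, `quality`) and their values are invented, and the helper `first_pc_share` is a hypothetical name. It compares the share of variance captured by the first principal component when PCA is run on the covariance matrix (raw units) versus the correlation matrix (standardized variables).

```python
import math

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def first_pc_share(xs, ys, standardize):
    """Share of total variance captured by the first PC of two variables."""
    a, b, c = variance(xs), variance(ys), covariance(xs, ys)
    if standardize:
        # Correlation matrix: unit variances, off-diagonal entry is r
        c = c / math.sqrt(a * b)
        a = b = 1.0
    # Largest eigenvalue of the symmetric 2x2 matrix [[a, c], [c, b]]
    mean = (a + b) / 2
    half_gap = math.sqrt(((a - b) / 2) ** 2 + c ** 2)
    return (mean + half_gap) / (a + b)

# Invented values for illustration only (not taken from the attached file)
lot_area = [8000, 9500, 11000, 9600, 14000, 14500]  # large numeric range
quality = [7, 6, 7, 7, 8, 5]                        # small numeric range

raw_share = first_pc_share(lot_area, quality, standardize=False)
std_share = first_pc_share(lot_area, quality, standardize=True)
print(f"first PC share, raw units:    {raw_share:.4f}")
print(f"first PC share, standardized: {std_share:.4f}")
```

On raw units the first component is essentially just the wide-range variable, so its share is close to 1. After standardizing, the share depends only on the correlation between the two columns.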

Are you trying to predict future house prices or to understand which variables are important? A simple partition analysis gives a similar result to generalised regression. 90% of the variability in prices can be described using 4 variables, all of which are numeric but only two of which are continuous.
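To give a rough idea of how a partition analysis ranks variables, here is a minimal sketch of a single split of a regression tree. It assumes a plain sum-of-squares criterion, which may differ from the criterion JMP's Partition platform uses. The column names, the data, and the function `best_split` are all hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, predictors):
    """Return (predictor, cutpoint, sse_reduction) for the best binary split."""
    total = sse([r[target] for r in rows])
    best = (None, None, 0.0)
    for p in predictors:
        levels = sorted({r[p] for r in rows})
        # Candidate cutpoints are midpoints between adjacent observed values
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [r[target] for r in rows if r[p] < cut]
            right = [r[target] for r in rows if r[p] >= cut]
            gain = total - sse(left) - sse(right)
            if gain > best[2]:
                best = (p, cut, gain)
    return best

# Invented rows for illustration; price is in thousands
rows = [
    {"living_area": 900,  "year_sold": 2008, "price": 110},
    {"living_area": 1100, "year_sold": 2006, "price": 125},
    {"living_area": 1300, "year_sold": 2009, "price": 140},
    {"living_area": 1900, "year_sold": 2007, "price": 230},
    {"living_area": 2100, "year_sold": 2006, "price": 250},
    {"living_area": 2400, "year_sold": 2009, "price": 275},
]

split = best_split(rows, "price", ["living_area", "year_sold"])
print(split)
```

The variable chosen for the first split is the one that separates prices most cleanly. Repeating the search within each resulting group grows the tree, and the variables that keep being chosen are the candidates for "important".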



Re: Question about PCA

Thanks for the response.

I want to first understand what the important variables are, and then run a regression with those variables to predict future house sale prices. I also want to run the regression with all variables and then compare the results of the two regressions to find the best model, which should be the simpler one with the least error.


Re: Question about PCA

Principal Components Analysis is not considered a modeling method but an exploratory data analysis method whose goal is variable reduction. PCA examines the correlation structure that exists within a group of variables. If identifying variables for predictive models is your major goal, then the Fit Model platform in JMP or JMP Pro is where you want to start. It supports multiple modeling personalities, from good old-fashioned ordinary least squares (called Standard Least Squares in JMP) to stepwise regression and general linear models, to name three. The Partition platform provides an alternative modeling method that can also be useful for variable identification. If you are running JMP Pro, the Generalized Regression platform's penalized regression methods are tailor-made for predictive modeling where variable identification is a primary goal. In addition, JMP Pro offers a range of flexible model cross-validation constructs. Finally, the JMP Pro Model Comparison and Formula Depot platforms are great for comparing the performance of multiple models and, if needed, exporting a model to another coding format such as SQL, C, or SAS.
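To show why penalized regression suits variable identification, here is a minimal sketch of the lasso, one common penalized method, fitted by coordinate descent. It is not JMP code and makes no claim about how JMP Pro implements its estimators. The data, the column names, and the function names are invented for illustration.

```python
def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    m = sum(col) / n
    s = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / s for v in col]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(columns, y, lam, sweeps=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized predictors."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xs = [standardize(c) for c in columns]
    beta = [0.0] * len(xs)
    for _ in range(sweeps):
        for j, xj in enumerate(xs):
            # Partial residual: the fit without predictor j
            resid = [
                yc[i] - sum(beta[k] * xs[k][i] for k in range(len(xs)) if k != j)
                for i in range(n)
            ]
            rho = sum(xj[i] * resid[i] for i in range(n)) / n
            # mean(xj^2) is 1 after standardizing, so no extra division is needed
            beta[j] = soft_threshold(rho, lam)
    return beta

# Invented data for illustration; price is in thousands
living_area = [900, 1100, 1300, 1900, 2100, 2400]
year_sold = [2008, 2006, 2009, 2007, 2006, 2009]
price = [110, 125, 140, 230, 250, 275]

unpenalized = lasso([living_area, year_sold], price, lam=0.0)
penalized = lasso([living_area, year_sold], price, lam=5.0)
print("lam=0:", unpenalized)
print("lam=5:", penalized)
```

With no penalty both coefficients are nonzero. With the penalty, the coefficient of the weakly related predictor is set exactly to zero, which is what makes the method useful for picking variables. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.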
