PCA

Apr 19, 2017 2:48 PM

Hi,

I ran PCA on the attached Data Set.

The output is:

**I am trying to increase the R-Square value. To do so, I want to consider only the important variables.**

So far I have cleaned the data by excluding the outliers and by removing the variables Garage yrblt, 1stflsf, totrmsabvgrd and Garagecars, because they have very high correlation.

I recalculated Lot Frontage as Updated Lot Frontage, because Lot Frontage had a lot of missing values.

Now my PCA shows that 12 (out of 26) components explain around 79.5% of the variability. So, am I correct to save the formulas for these 13 components and then exclude the variables that were used to compute them (LotFrontage through UpdateLotFrontage)?
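To make the "how many components" step concrete, here is a minimal sketch of how a cumulative share of variance like the 79.5% figure is computed from the PCA eigenvalues. The eigenvalues and the helper name `components_for_variance` are made up for illustration; they are not the values from this data set, and this is plain Python rather than anything JMP produces.

```python
def components_for_variance(eigenvalues, target=0.80):
    """Return (k, share): the smallest number of leading components whose
    cumulative share of total variance reaches `target`, and that share."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k, running / total
    # Target not reached (e.g. target > 1): fall back to all components.
    return len(ordered), 1.0


# Hypothetical eigenvalues for 6 standardized variables (they sum to 6).
eigenvalues = [2.4, 1.5, 0.9, 0.6, 0.4, 0.2]
k, share = components_for_variance(eigenvalues, target=0.75)
print(k, round(share, 3))
```

With a correlation-based PCA the eigenvalues sum to the number of variables, so each eigenvalue divided by that count is the share of variance for its component.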

So basically, instead of 26 variables I will now use 13. After this variable reduction, I plan to run stepwise regression, a neural network, etc. and see which method gives the best R-Square value.
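When comparing methods by R-Square, the value computed on the rows used for fitting tends to favor the more flexible model, so it is common to compute it on held-out rows instead. Below is a minimal sketch of that calculation; the function name `r_squared` and the prices are hypothetical, and the predictions stand in for whatever a fitted model would return on rows it has not seen.

```python
def r_squared(actual, predicted):
    """R-Square = 1 - SS_residual / SS_total for paired actual/predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical held-out sale prices (in thousands) and one model's predictions.
holdout_actual = [200, 150, 300, 250]
holdout_predicted = [210, 140, 290, 260]
r2 = r_squared(holdout_actual, holdout_predicted)
print(round(r2, 3))
```

Running the same function on each candidate model's predictions for the same held-out rows gives numbers that can be compared directly.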

Is this approach correct?

Thanks

4 REPLIES


Apr 20, 2017 6:40 AM

Apr 20, 2017 7:53 AM

My Y variable is Salesprices. It is a predictive model.

I am trying to get the best-fitting model, with a higher R-Square.

So, if I don't remove the highly correlated variables and then run the PCA, how do I decide which components to consider (by running a stepwise regression?), and what should I do with the initial set of variables? Do I use the initial set of variables or just the new components when building a model?
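One thing that bears on this question: PCA never looks at Y, so a component that explains a lot of variance in the X variables is not necessarily useful for predicting the sale price. A rough way to screen saved components, sketched below, is to rank them by how strongly their scores correlate with Y. The score columns and prices are made up, and `pearson` and `rank_components` are hypothetical helper names; this is one possible screen, not the method JMP applies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_components(scores, response):
    """Sort components by absolute correlation with the response, strongest first."""
    corr = {name: pearson(values, response) for name, values in scores.items()}
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up sale prices and saved component score columns (illustration only).
sale_price = [100, 150, 200, 250, 300]
scores = {
    "PC1": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "PC2": [1.0, -1.0, 0.0, -1.0, 1.0],
    "PC3": [2.0, 1.0, 0.0, -1.5, -1.5],
}
ranked = rank_components(scores, sale_price)
for name, r in ranked:
    print(name, round(r, 3))
```

In this toy data PC2 has no linear relationship with the price at all, which is the kind of component a stepwise regression on the saved components would also tend to leave out.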

Regards,


Apr 22, 2017 6:18 PM

Apr 24, 2017 8:48 AM