
Principal component analysis questions

mikedriscoll

Community Trekker

Joined: Jun 23, 2011

Hi, I'm fairly new to PCA. I've read up on it a bit and watched several YouTube lectures on the subject, and I think I have a so-so handle on it. I understand its multidimensional, orthogonal nature, that I can use it for variable reduction and categorizing, and that it looks for linear relationships.

I'm more curious about using JMP to find trends involving parameters of interest. For example, say I have 1 or 2 parameters of interest (Y, response) and 10 or 1000 other variables (X, factor), and I'm looking for a trend. I might run a script that calculates the RSquare of Y1 against every X and of Y2 against every X, then list or plot only the pairs with an RSquare greater than, say, 0.8. Or I might just use the native Fit Y by X platform and plot all Y by all X. Either method works well because both keep the focus on my chosen responses, but there can be a lot to sift through.
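The poster's script isn't shown. A minimal Python sketch of the same RSquare screen follows, assuming each column is a plain list of numbers with no missing values. It also assumes "RSquare" means the R² of a straight-line fit of Y on X, which equals the squared Pearson correlation. The function names `r_squared` and `screen` and the example data are hypothetical, not part of JMP or JSL.

```python
# Hypothetical sketch (not JMP/JSL): screen every X column against each
# response and keep pairs whose straight-line-fit R^2 exceeds a threshold.
# Assumes equal-length lists of floats with no missing values.


def r_squared(x, y):
    """R^2 of a least-squares line of y on x (the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy * sxy / (sxx * syy)


def screen(responses, factors, threshold=0.8):
    """Return (y_name, x_name, r2) tuples with r2 > threshold, strongest first."""
    hits = []
    for y_name, y in responses.items():
        for x_name, x in factors.items():
            r2 = r_squared(x, y)
            if r2 > threshold:
                hits.append((y_name, x_name, r2))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    responses = {"Y1": [1.0, 2.0, 3.0, 4.0, 5.0]}
    factors = {
        "X1": [2.0, 4.1, 5.9, 8.0, 10.1],  # nearly linear in Y1
        "X2": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly related (R^2 about 0.125)
    }
    for y_name, x_name, r2 in screen(responses, factors):
        print(f"{y_name} vs {x_name}: R^2 = {r2:.3f}")
```

Only X1 survives the 0.8 cutoff here. Sorting by R² puts the strongest candidates at the top of a long list, which is the part that helps most when there are 1000 X's.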

Can PCA help here? I realize I can just throw Y1 and Y2 as well as all of the X's into the analysis, and I think I understand how to interpret the output plots, but is there any way to have JMP focus on the parameters of interest? With 1000 parameters, the output plots are information-dense, and parameter names don't seem to be highlighted or unhighlighted when I select columns. Also, please let me know if I'm way off track with what I'm trying to do with PCA.
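To make the loadings in a PCA report more concrete, here is a minimal pure-Python sketch. It is not JMP's implementation. It standardizes each column, builds the correlation matrix, and finds the first principal component by power iteration, a numerical method swapped in only to keep the example dependency-free. A variable's loading shows how strongly it moves along that component. Read as a rough heuristic, X's with large loadings of the same sign as a response on a high-variance component are candidates worth a closer look in Fit Y by X. The function names and the three-column data set are hypothetical.

```python
# Hypothetical sketch (pure Python, not JMP's PCA): first principal component
# of standardized columns via power iteration on the correlation matrix.


def standardize(col):
    """Center and scale one column (assumes it is not constant)."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / (n - 1)) ** 0.5
    return [(v - m) / sd for v in col]


def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length numeric columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Power iteration assumes the leading eigenvalue is strictly the largest
    and that the start vector is not orthogonal to its eigenvector.
    """
    p = len(matrix)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


if __name__ == "__main__":
    names = ["Y1", "X1", "X2"]
    columns = [
        [1.0, 2.0, 3.0, 4.0],    # response of interest
        [2.0, 4.0, 6.0, 8.0],    # moves linearly with Y1
        [1.0, -1.0, -1.0, 1.0],  # uncorrelated with Y1
    ]
    eigenvalue, loadings = first_component(correlation_matrix(columns))
    print(f"PC1 explains {eigenvalue / len(columns):.0%} of total variance")
    for name, loading in zip(names, loadings):
        print(f"{name}: {loading:+.3f}")
```

In this toy data Y1 and X1 share the first component with equal loadings, while X2 loads essentially zero on it. With 1000 columns, the same idea means sorting X's by how closely their loadings track the response's, rather than reading the full biplot.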

10 REPLIES
KarenC

Super User

Joined: Feb 10, 2013

You might want to try the new modeling utility in JMP 12 to screen your X's down to a smaller subset before going to PCA.

Screen Predictors Utility

mikedriscoll

Community Trekker

Joined: Jun 23, 2011

Thanks Karen, that looks promising. I'll see if I can install JMP 12.

Peter_Bartell

Joined: Jun 5, 2014

The key word in your original post I'm focusing in on is 'trends'...to me this implies a time series element to your evaluation of both x and y. Is this the case? Perhaps a multivariate time series modeling approach is called for?

mikedriscoll

Community Trekker

Joined:

Jun 23, 2011

Hi Peter,


The key word in your original post I'm focusing in on is 'trends'...to me this implies a time series element to your evaluation of both x and y. Is this the case? Perhaps a multivariate time series modeling approach is called for?



I think "relationships" would have been a better word choice on my part. My analyses aren't typically in the time domain. This is semiconductor electrical test data, and sometimes there is a parameter we want to understand better (for example, if its distribution is shifted or skewed for a production lot or a new product, or even for just a few units in a lot). In that case I might check it against other parameters to see whether the distribution, or the units of interest, correlate with anything else that could shed light on it.

The general Multivariate platform is nice. It's been a long time since I used it, but I just ran it again and will add it back into my usual toolbox. It does get a little hairy with more than about 15 parameters. I have a Multivariate report on my screen now with 20 parameters, and it works, but there's a fair amount of scrolling... maybe I should ask for a larger screen. :-)

louv

Staff

Joined: Jun 23, 2011

Have you tried the color map in the multivariate analysis when you have more than 15 parameters?

[Screenshot: color map in the Multivariate platform]
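The color map condenses a large correlation matrix by shading each cell according to its correlation. As a rough text analogue outside JMP, and not a reproduction of the color map itself, the hypothetical sketch below computes every pairwise Pearson correlation and keeps only the pairs with the largest |r|. A 20-parameter matrix, with 190 pairs, then shrinks to a short list. It assumes equal-length numeric columns with no missing values; `pearson`, `strongest_pairs`, and the data are illustrative names only.

```python
# Hypothetical sketch (not the JMP color map): rank all pairwise Pearson
# correlations by |r| so a large correlation matrix becomes a short list.
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant column; treated as "no correlation" here
    return sxy / (sxx * syy) ** 0.5


def strongest_pairs(columns, top=10):
    """Return the `top` (name_a, name_b, r) tuples with the largest |r|."""
    pairs = [(a, b, pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
    return pairs[:top]


if __name__ == "__main__":
    columns = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [4.0, 3.0, 2.0, 1.0],
        "D": [1.0, 3.0, 2.0, 4.0],
    }
    for a, b, r in strongest_pairs(columns, top=4):
        print(f"{a} ~ {b}: r = {r:+.2f}")
```

Ranking by |r| keeps strong negative correlations, such as A with C here, next to strong positive ones. A color map makes the same point visually with a diverging color scale.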

mikedriscoll

Community Trekker

Joined: Jun 23, 2011


Have you tried the color map in the multivariate analysis when you have more than 15 parameters?


Thanks, I didn't know that was available! It definitely condenses the results.

-Mike

louv

Staff

Joined: Jun 23, 2011

Mike,

No problem.

Yes, ever since we added that feature it has been my "go-to" visual. My brain seems to assimilate the data more readily.

Lou V

Peter_Bartell

Joined: Jun 5, 2014

When you are checking one lot against others isn't there a time component to that sort of evaluation? From your second post it sounds like you might also be looking for multivariate outliers... the Outlier analysis sub platform under the Multivariate -> Correlations has a couple outlier analysis techniques there. Also, if you are looking to build a model, perhaps a PLS or a Generalized Regression and Model Comparison approach might yield some insights? You need JMP Pro for Generalized Regression and Model Comparison... but these are just ideas.
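The Mahalanobis distance is a standard multivariate outlier measure. As far as I know, it is also one of the distances JMP's outlier analysis reports. It measures how far a unit sits from the center after accounting for the correlations between parameters. A minimal pure-Python sketch follows, assuming complete numeric rows (one per unit) and a non-singular sample covariance matrix. It is not JMP's code, and the helper names and data are hypothetical.

```python
# Hypothetical sketch (pure Python, not JMP): Mahalanobis distance of each
# row from the column means, using the sample covariance (n - 1 divisor).


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, mu):
    n, p = len(rows), len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[p:] for row in a]


def mahalanobis(rows):
    """Distance of every row from the mean, scaled by the inverse covariance."""
    mu = mean_vector(rows)
    inv = invert(covariance(rows, mu))
    p = len(mu)
    out = []
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        q = sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p))
        out.append(max(q, 0.0) ** 0.5)  # guard against tiny negative rounding
    return out


if __name__ == "__main__":
    # Hypothetical two-parameter data: five units on a tight line plus one
    # unit (last row) that is inside each parameter's range but off the line.
    rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0], [3.0, 3.5]]
    for i, dist in enumerate(mahalanobis(rows)):
        print(f"unit {i}: distance = {dist:.2f}")
```

The last unit gets the largest distance, about 2.04 versus about 1.47 for the end points of the line, even though neither of its values is unusual on its own. Per-parameter histograms miss this kind of unit.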

mikedriscoll

Community Trekker

Joined:

Jun 23, 2011


When you are checking one lot against others isn't there a time component to that sort of evaluation?


That is true for some of the analyses I do: lot by lot over time, and even unit by unit sequentially within a lot, could be thought of that way, although I normally don't treat it that way. You mentioned multivariate time series earlier. Is there a particular platform you had in mind? I don't have JMP Pro, and I didn't see anything like that.


From your second post it sounds like you might also be looking for multivariate outliers...the Outlier analysis sub platform under the Multivariate -> Correlations has a couple outlier analysis techniques there.


Thanks. I hadn't noticed these, but I will look into them.

-Mike