WyattH
Level II

Is there a way to do PCA in JMP with binary and continuous data?

Hi, I am sorry if this is a silly question. I am still new to statistics, and my understanding is that PCA can only accept continuous input. (Does PCA in JMP work the same way?)

 

I have been reading online and saw that there are nonlinear PCA methods that can use binary and continuous data.

 

I just wanted to ask whether there is any setting in JMP Pro 13 to do PCA with binary and continuous data. Also, is it a correct use of PCA if I simply code the binary column as 0/1, treat the column as a continuous number, and use Sparse as the estimation method?

 

One last question: let's say I have columns with continuous numbers, e.g., 1.0, 2.1, 3.09

and columns with discrete numbers, e.g., 1, 7, 3, 18, 20, 5 ++

Does JMP consider both types of columns to be continuous values that are suitable as PCA inputs?

 

Any advice is appreciated. Thank you!

 

 

Accepted Solution
LauraCS
Staff

Re: Is there a way to do PCA in JMP with binary and continuous data?

Hi WyattH,

The PCA platform in JMP gives you a choice as to whether you want to position the origin at the centroid of the data. If you choose to do PCA "on Covariances" or "on Correlations", the data will be mean-centered (i.e., means will be subtracted from each column) as part of the pre-processing steps (in the latter, they'll also be divided by the corresponding standard deviation, creating Z-score variables). However, if you choose "on Unscaled", the data won't be corrected for the mean. The Unscaled option is mostly intended for data that have already been centered.
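To make the three options concrete, here is a minimal Python sketch (not JMP's code) of the pre-processing each one implies and of the symmetric matrix that is then decomposed. The helper names `preprocess` and `cross_product_matrix` and the data are hypothetical, and the n - 1 divisor is an assumption; a different divisor rescales the eigenvalues but leaves the principal axes unchanged.

```python
from statistics import mean, stdev


def preprocess(columns, method):
    """Transform columns (one list of numbers per variable) as each PCA option would.

    method is "correlations", "covariances", or "unscaled".
    """
    if method == "unscaled":
        # No centering: the origin stays at (0, 0, ...), so 0/1 columns keep their raw coding.
        return [list(col) for col in columns]
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([x - m for x in col])  # origin moves to the centroid
    if method == "covariances":
        return centered
    if method == "correlations":
        scaled = []
        for col, c in zip(columns, centered):
            s = stdev(col)  # sample SD; a constant column (s == 0) would fail here
            scaled.append([x / s for x in c])  # z-scores
        return scaled
    raise ValueError(f"unknown method: {method!r}")


def cross_product_matrix(columns):
    """Symmetric matrix whose eigenvectors are the principal axes: sum(x_i * x_j) / (n - 1)."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns] for ci in columns]


binary = [0, 1, 1, 0, 1, 1]  # 0/1 column entered with the Continuous modeling type
measure = [1.0, 2.1, 3.09, 1.5, 2.8, 3.3]
for method in ("unscaled", "covariances", "correlations"):
    print(method, cross_product_matrix(preprocess([binary, measure], method)))
```

With "covariances" the result is the ordinary sample covariance matrix, and with "correlations" it is the correlation matrix (ones on the diagonal). With "unscaled", the 0/1 column's diagonal entry is just its count of ones divided by n - 1, because nothing was subtracted.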

 

The PCA platform can handle categorical variables only as supplementary variables. Supplementary variables are not used in the actual decomposition that leads to the PCA, but are projected onto the space determined by the active variables (those in the decomposition), so they help enrich the interpretation of your PCA solution. Thus, if you use the PCA platform with 0/1 variables that have the "Continuous" modeling type in the active role (enter them as Y, Columns), the platform will treat them in the same way as any other numeric, continuous variable. That is, it will standardize them if you use the default method of doing the PCA "on Correlations."
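The point about 0/1 columns can be checked with a small, dependency-free Python sketch of PCA "on Correlations" (not JMP's implementation): the 0/1 column is standardized and enters the eigen-decomposition exactly like the continuous columns. The data and the helper names `correlation_matrix`, `jacobi_eigen`, and `pca_on_correlations` are hypothetical; a cyclic Jacobi rotation routine stands in for a linear-algebra library.

```python
import math
from statistics import mean, stdev


def correlation_matrix(columns):
    """Standardize every column (0/1 columns included) and return R = Z'Z / (n - 1)."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), stdev(col)
        z.append([(x - m) / s for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_on_correlations(columns):
    """Return (eigenvalue, loading vector) pairs sorted from largest to smallest eigenvalue."""
    values, vectors = jacobi_eigen(correlation_matrix(columns))
    pairs = [(values[j], [row[j] for row in vectors]) for j in range(len(values))]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


binary_col = [0, 1, 1, 0, 1, 0, 0, 1]  # 0/1 column, modeling type Continuous
cont_a = [61.2, 80.5, 77.0, 58.9, 85.3, 66.1, 59.4, 79.8]
cont_b = [3.1, 4.0, 3.6, 2.8, 4.4, 3.0, 2.9, 4.1]
for eigenvalue, vector in pca_on_correlations([binary_col, cont_a, cont_b]):
    print(round(eigenvalue, 3), [round(x, 3) for x in vector])
```

Because every standardized column has unit variance, the eigenvalues always sum to the number of columns (the trace of R); the 0/1 column contributes one unit of variance like any other variable.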

 

As an alternative, JMP also has a Multiple Correspondence Analysis (MCA) platform that you might consider using. In MCA, only categorical variables can be used in the analysis itself, and both categorical and continuous variables can be used as supplementary variables.
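MCA is commonly computed as a correspondence analysis of the indicator (one-hot, or complete disjunctive) matrix of the categorical variables, or equivalently of its Burt matrix. A minimal Python sketch of building both, with hypothetical variables and data; this is not JMP's MCA implementation, and the correspondence-analysis step itself is not shown.

```python
def indicator_matrix(rows, names):
    """One-hot (complete disjunctive) coding: one 0/1 column per level of each categorical variable."""
    levels = [sorted({row[j] for row in rows}) for j in range(len(names))]
    header = [f"{name}={lvl}" for name, lvls in zip(names, levels) for lvl in lvls]
    table = [
        [1 if row[j] == lvl else 0 for j, lvls in enumerate(levels) for lvl in lvls]
        for row in rows
    ]
    return header, table


def burt_matrix(table):
    """Burt matrix Z'Z: cross-tabulated counts for every pair of levels."""
    cols = list(zip(*table))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


rows = [("red", "yes"), ("blue", "no"), ("red", "no"), ("green", "yes")]
header, Z = indicator_matrix(rows, ["color", "answer"])
print(header)  # ['color=blue', 'color=green', 'color=red', 'answer=no', 'answer=yes']
for line in Z:
    print(line)
B = burt_matrix(Z)
```

Every row of the indicator matrix sums to the number of categorical variables, which is why a 0/1 column in this setting is a level indicator rather than a continuous measurement.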

 

HTH,

~Laura

Laura C-S


6 Replies

Re: Is there a way to do PCA in JMP with binary and continuous data?

WyattH,

 

The short answer is yes, you can use binary factors in your PCA. You are correct that you will have to treat the binary columns as continuous (Continuous modeling type). More recently I have seen -1/1 coding used.
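A quick way to see why the choice between 0/1 and -1/1 coding does not matter for PCA "on Correlations": recoding x as 2x - 1 is a positive linear rescaling, and correlations are invariant to such rescaling. Covariances are not (the recoded column's standard deviation doubles), and with no centering at all, as in "on Unscaled", the two codings give different results. A minimal Python sketch with hypothetical data:

```python
from statistics import mean, stdev


def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))


binary01 = [0, 1, 1, 0, 1, 0]
binary_pm = [2 * b - 1 for b in binary01]  # the same information coded as -1/1
measure = [1.0, 2.1, 3.09, 1.4, 2.7, 0.9]

print(pearson(binary01, measure))
print(pearson(binary_pm, measure))  # identical up to floating-point rounding
```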

 

For your last question, as long as the discrete columns are set to the Continuous modeling type, everything should be fine. The only question I have for you is about the "++" designation. Is that an entry in the discrete column, or does it just indicate that the discrete data continue?

WyattH
Level II

Re: Is there a way to do PCA in JMP with binary and continuous data?

Hi Bill_Worley,

Thank you for your reply :) Yes, I meant it the way you interpreted it. I just added the ++ to indicate that there are more numbers of that type.

Wyatt

 

WyattH
Level II

Re: Is there a way to do PCA in JMP with binary and continuous data?

Hi bill_worley,
This is just out of personal curiosity and to further my understanding of PCA. I am referring to an answer in this thread: https://stats.stackexchange.com/questions/16331/doing-principal-component-analysis-or-factor-analysi...

Am I correct in assuming that JMP's PCA is not the traditional kind, in which binary data use (0,0) as the data mean?

Re: Is there a way to do PCA in JMP with binary and continuous data?

WyattH,

I don't know the answer to your question, but someone out there might.

As for furthering your PCA knowledge, the link below will likely help.

@LauraCS does a great job of describing PCA and FA in her blog.

https://community.jmp.com/t5/JMP-Blog/Principal-components-or-factor-analysis/ba-p/38347

Best,

Bill

WyattH
Level II

Re: Is there a way to do PCA in JMP with binary and continuous data?

bill_worley and LauraCS, thank you so much for helping to clarify my doubts and understanding :) I really appreciate it!