Alma1

PCA loadings

Hi,

While running a PCA, I included column A and column B among the parameters. Would it be statistically correct to also use the ratio A/B? Or would including any mathematical relation between columns create an overlap?

Thanks

Re: PCA loadings

Hi Alma1, 

I'm not quite sure what you mean. Are you saying that you included columns A and B in a Principal Component Analysis (PCA), and you are wondering whether you could also include a third variable, R=A/B, in the PCA along with A and B?

If that is your question, I would suggest not doing that, although it really depends on what meaning you want to derive from your PCA. If you include other columns that are highly correlated with your original columns, you will essentially double* the effect of the original variable(s) on the variance they explain in the principal components. (*Not necessarily double: the size of the effect depends on the strength of the correlation between the original variable and the new variable. Either way, you will be inflating the influence of the original variable(s) when determining which principal components explain the majority of the variance in your set of variables.)
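To make the inflation effect concrete, here is a minimal pure-Python sketch (no statistics package assumed). It works directly from a hypothetical correlation matrix rather than from data, which amounts to a PCA on standardized variables, and finds the first principal component by power iteration. The extra column A2, the columns B and C, and the correlation r = 0.9 are illustrative assumptions, not values from the question.

```python
import math

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def leading_component(corr, iters=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix.
    Assumes the starting vector is not orthogonal to the leading eigenvector."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(vi * wi for vi, wi in zip(v, matvec(corr, v)))  # Rayleigh quotient
    return eigval, v

def pc1_share(corr):
    """Fraction of total variance explained by PC1 (trace = number of standardized variables)."""
    eigval, loadings = leading_component(corr)
    trace = sum(corr[i][i] for i in range(len(corr)))
    return eigval / trace, loadings

# Hypothetical case 1: A, B, C standardized and mutually uncorrelated.
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# Hypothetical case 2: add A2, correlated with A at r = 0.9
# (and, like A, uncorrelated with B and C). Variable order: A, A2, B, C.
r = 0.9
with_copy = [[1.0, r,   0.0, 0.0],
             [r,   1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]

share_base, _ = pc1_share(base)
share_copy, loadings_copy = pc1_share(with_copy)
print(f"PC1 share without A2: {share_base:.3f}")  # 0.333
print(f"PC1 share with A2:    {share_copy:.3f}")  # 0.475, i.e. (1 + r) / 4
print("PC1 loadings (A, A2, B, C):", [round(x, 3) for x in loadings_copy])
```

With three uncorrelated variables, PC1 explains a third of the variance, the same as any single variable. Adding A2 yields a PC1 built entirely from A and A2 that explains (1 + r)/4 of the total; as r approaches 1 that share approaches 2/4, which is the "double counting" of A described above.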

You can find more information about the statistical effect of including correlated variables in a PCA (one of the concerns for a variable like R=A/B, which is fully determined by A and B) by searching online for something like "can you use highly correlated variables in pca?"

In general, using variables that are functions of each other (i.e., knowing A and B, we completely know R=A/B) is not a good idea in a statistical model. Methods that don't depend on an underlying distributional model, though, such as tree-based methods, can still handle extra variables like your ratio variable. You could also consider a statistical modeling method such as penalized regression (the LASSO, for example) to help decide which of the correlated variables to keep for your PCA.
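One way to see that the ratio adds nothing beyond A and B, sketched under the assumption that A and B are strictly positive: on the log scale, log(R) = log(A) - log(B) exactly, so log R is a perfect linear combination of the other two columns and the covariance matrix of the three log columns is singular (its determinant is zero). The values of A and B below are made up for illustration only. On the raw scale the dependence is nonlinear, so it may not show up as an exact linear dependency even though R carries no new information.

```python
import math

# Hypothetical positive measurements for columns A and B (illustration only).
A = [2.0, 3.5, 1.2, 4.8, 2.9, 3.3]
B = [1.1, 0.7, 2.4, 1.9, 1.3, 0.9]
R = [a / b for a, b in zip(A, B)]  # the ratio column

def covariance_matrix(cols):
    """Sample covariance matrix (n - 1 denominator) of a list of equal-length columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((x - mi) * (y - mj) for x, y in zip(ci, cj)) / (n - 1)
             for cj, mj in zip(cols, means)]
            for ci, mi in zip(cols, means)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Columns log A, log B, log R: the third is exactly the first minus the second.
logs = [[math.log(x) for x in col] for col in (A, B, R)]
cov = covariance_matrix(logs)
print(det3(cov))  # effectively 0 (floating-point rounding level): the matrix is singular
```

A zero determinant means one direction in the log data has no variance at all, which is the PCA-level symptom of a column that is completely determined by the others.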