databug
Level I

Interpret Eigenvectors in variable clustering platform

I have variables that indicate the rates of disappearance of various metabolites from plasma, i.e., their clearance from blood circulation. Since I have over 200 of these, I am using the variable clustering platform to cluster the variables. I then use the cluster components in a PLS-DA model to see which cluster components (and their corresponding variable groups) end up discriminating between the two clinical groups I am studying. In doing this, how do I interpret negative eigenvector entries (or are they eigenvalues? The output suggests eigenvectors)? If a positive entry indicates a higher rate at which that cluster of variables disappears from plasma, does a negative entry indicate slower disappearance relative to the variables with a positive sign? Happy to provide more details if necessary.
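For context, here is how I understand a cluster component to be constructed: essentially the first principal component of the standardized variables in that cluster, so each variable carries a signed loading. The sketch below uses scikit-learn and made-up numbers rather than my real clearance data, just to illustrate what the signs mean.

```python
# Minimal sketch (scikit-learn, not JMP output): a cluster component is taken here
# as the first principal component of the standardized variables in the cluster,
# so each variable gets a signed loading. The data below are made up.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 44
signal = rng.normal(size=n)                      # shared "clearance" pattern
cluster = np.column_stack([
    signal + 0.3 * rng.normal(size=n),           # moves with the pattern
    signal + 0.3 * rng.normal(size=n),           # moves with the pattern
    -signal + 0.3 * rng.normal(size=n),          # moves against the pattern
])

Z = StandardScaler().fit_transform(cluster)
loadings = PCA(n_components=1).fit(Z).components_[0]
print(np.round(loadings, 2))
# Variables with the same loading sign rise and fall together as the component
# score increases; a variable with the opposite sign moves the other way.
# The overall sign of the component is arbitrary: flipping every loading (and
# every score) describes the same cluster equally well.
```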

7 REPLIES
P_Bartell
Level VIII

Re: Interpret Eigenvectors in variable clustering platform

Your exact study is unclear to me, but here's a question: are you clustering the X's before running the PLS-DA model and then using the eigenvectors as X's? It sure sounds like it. So now the question: why not just let the PLS-DA platform do what it does best, which is form and identify the optimal number of latent variables in the X's for the model and responses? Then you don't have to worry about clustering the X's, figuring out where to cut them off, and interpreting the eigenvectors as X's. The platform's report will do all the heavy lifting with respect to interpretation for you.
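To illustrate what I mean outside JMP (scikit-learn and placeholder data, not the JMP platform itself): PLS-DA run on the raw X's extracts its own latent variables, and the loadings give you the per-variable interpretation directly.

```python
# Rough sketch of PLS-DA on the raw X's (scikit-learn, placeholder data): PLS
# extracts the latent variables itself, so no prior variable clustering is needed.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(44, 236))                   # 44 cases x 236 variables
groups = rng.choice(["A", "B"], size=44)         # clinical group labels

# Dummy-code the groups so PLS can regress on them (the "DA" part of PLS-DA)
Y = (groups.reshape(-1, 1) == np.array(["A", "B"])).astype(float)
pls = PLSRegression(n_components=2).fit(X, Y)

print(pls.x_scores_.shape)    # (44, 2): latent-variable scores per case
print(pls.x_loadings_.shape)  # (236, 2): how each original variable loads on them
```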

databug
Level I

Re: Interpret Eigenvectors in variable clustering platform

Thank you, P_Bartell.

Perhaps a little more insight into the data would help, as you mentioned. My dataset is quite wide: 236 variables, with n = 44 cases, each with 3 repeated measures, classified into 3 highly unbalanced groups (n = 6, 7, and 31) based on one specific criterion score. When I let PLS-DA do its best by predicting my Y (3 unbalanced groups, measured 3 times) from the X's (all 236 variables), no model converges. But when I reduce my X's by clustering variables, it does generate one model, with 2 factors, and I can then proceed to interpret what that means.

Does this help?
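For what it's worth, here is a rough non-JMP sketch of the reduction step I am describing: group correlated variables, take the first principal component of each group as the cluster component, and use those components as the X's. The data, the 20-cluster cut, and the clustering method below are placeholders; JMP's variable clustering algorithm differs in detail.

```python
# Rough, non-JMP sketch of the reduction step: cluster correlated variables, then
# represent each cluster by its first principal component and use those as X's.
# Placeholder data; the 20-cluster cut and the linkage method are arbitrary choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(44, 236))                   # stand-in for the real table
Z = StandardScaler().fit_transform(X)

# Distance = 1 - |correlation|: strongly (anti)correlated variables cluster together
corr = np.corrcoef(Z, rowvar=False)
dist = squareform(np.clip(1 - np.abs(corr), 0, None), checks=False)
labels = fcluster(linkage(dist, method="average"), t=20, criterion="maxclust")

cluster_components = np.column_stack([
    PCA(n_components=1).fit_transform(Z[:, labels == k]).ravel()
    for k in np.unique(labels)
])
print(cluster_components.shape)   # (44, n_clusters): new, much narrower X matrix
```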

P_Bartell
Level VIII

Re: Interpret Eigenvectors in variable clustering platform

Hmmm...I'm not surprised the PLS-DA had problems with the small, unbalanced groups. I'm curious which cross-validation technique you chose: did you try K-fold or leave-one-out? And with respect to the repeated measures, can you share how you are classifying through a 'repeated measures' approach? To me, a repeated measure usually implies a continuous response rather than a classification response.

databug
Level I

Re: Interpret Eigenvectors in variable clustering platform

CV: Leave one out, since the n's were so small.

K-fold would likely work too; I just stuck with LOO for the time being.

I am not sure I follow what you mean by "classifying through a repeated measures approach", but my guess is that the way I explained my dataset was not clear, so I will try to describe it better this time. We enrolled 44 individuals on a disease spectrum. We gave them a "challenge" test 3 times (my repeated measures) and classified them into 3 response groups (high, low, and variable) based on how consistently they performed on the challenge test. We created these classifications based on our subject-matter expertise; it was not a statistical decision. Does that help?
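For concreteness, here is a sketch of a leave-one-subject-out split that keeps the 3 repeated measures of an individual together; the 44 x 3 row layout is an assumption on my part, and whether a given LOO option groups rows this way is a separate question.

```python
# Sketch of a leave-one-subject-out split (assumed layout: 44 subjects x 3 rows).
# Row-wise LOO can put rows from the same subject in both training and validation;
# grouping by subject keeps the 3 repeated measures together.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

n_subjects, n_repeats, n_vars = 44, 3, 236
X = np.random.default_rng(3).normal(size=(n_subjects * n_repeats, n_vars))
subject_id = np.repeat(np.arange(n_subjects), n_repeats)     # grouping variable

for train_idx, test_idx in LeaveOneGroupOut().split(X, groups=subject_id):
    # fit the model on train_idx, evaluate on the held-out subject's 3 rows
    assert len(set(subject_id[test_idx])) == 1   # exactly one subject per fold
```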

P_Bartell
Level VIII

Re: Interpret Eigenvectors in variable clustering platform

Hmmmm. Sorry for the continued questions about your study. This 'challenge test' that each individual performed: was there a numeric output that was the basis for classification? For a hypothetical example, suppose you asked each person to run 100 yards; those who ran it in under 15 seconds were classified as 'fast', those between 15 and 20 seconds as 'slow', and those over 20 seconds as 'slower still'. What I'm hoping is that you can use the numeric output for modeling and then do your classification at the back end, post-modeling. With a continuous response, a much broader set of modeling methods becomes available.
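To make the back-end classification concrete with the 100-yard hypothetical (all numbers invented), something along these lines: model the continuous time, then apply the cut-offs to the predictions afterwards.

```python
# Sketch of the 100-yard hypothetical (all numbers invented): model the continuous
# time, then apply the classification cut-offs to the predictions afterwards.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(44, 5))                     # placeholder predictors
seconds = rng.uniform(12, 25, size=44)           # continuous response

pred = LinearRegression().fit(X, seconds).predict(X)

# Classification happens at the back end, post-modeling
label = np.where(pred < 15, "fast", np.where(pred <= 20, "slow", "slower still"))
print(label[:5])
```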

 

Also, another thought: I am no longer a JMP or JMP Pro user (retired JMP senior systems engineer), but from your description it sounds like there is some PCA going on in the background of the clustering platform. Have you tried PCA on the X's in the native JMP Pro PCA platform first, to get a look at some of the helpful visualizations in that platform's reports? It may aid your interpretation of the principal components as well as the eigenvectors.
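Outside JMP, something like the following would give a first look at the explained variance and a PC1 vs PC2 score plot (placeholder data; the PCA platform's reports present the same kind of information with richer interactive visuals).

```python
# Quick look at the X's with PCA (scikit-learn, placeholder data): cumulative
# explained variance plus a PC1 vs PC2 score plot of the 44 cases.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(44, 236))                   # stand-in for the 236 variables
Z = StandardScaler().fit_transform(X)

pca = PCA().fit(Z)
print(np.cumsum(pca.explained_variance_ratio_)[:5])   # variance captured so far

scores = pca.transform(Z)
plt.scatter(scores[:, 0], scores[:, 1])          # each point is one case
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```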

databug
Level I

Re: Interpret Eigenvectors in variable clustering platform

No worries, and thank you for trying to help me.

 

With respect to using continuous vs. categorical data: I have actually run both PLSR and PLS-DA, and either way, when I use all 236 variables, no model converges. In fact, even after dimension reduction with the variable clustering platform, using those eigenvectors (cluster components) as X's, I don't get a converged model when I run PLSR to predict continuous data; the only time a model converges is when categorical Y's are predicted by the cluster-component X's.

 

databug
Level I

Re: Interpret Eigenvectors in variable clustering platform

Oh, PCA on the X's is a good idea; I have not tried that yet. I will do that and see what I can learn...