Thierry_S
Super User

JMP > Dimension Reduction > Collapse Highly Correlated Variables (Total N = 7300)?

Hi JMP Community,

I have an extensive biomarker data set (7,300 variables) that contains multiple subsets of highly correlated variables (r > 0.95), which I want to collapse into representative variables (i.e., one aggregated variable for each group of highly correlated variables). While I can easily identify the highly correlated pairs of variables, I am struggling to identify all the members of each group. I have experimented with Clustering, but I cannot get a definitive answer.

Is there a method in the Multivariate Methods menu that would allow me to collapse this data set?
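The grouping problem described above (going from highly correlated pairs to full groups) can be sketched outside JMP as a connected-components search: treat each variable as a node, link two nodes when |r| exceeds the threshold, and read off the linked sets. The following is only a minimal Python illustration, not a JMP feature; the function names `pearson` and `correlated_groups` and the toy data are hypothetical. Note that linking is transitive, so two members of a group may be connected only through a third variable and need not themselves exceed the threshold.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_groups(columns, threshold=0.95):
    """Return lists of variable names linked, directly or transitively, by |r| > threshold.

    `columns` maps a variable name to its list of values (hypothetical layout).
    """
    names = list(columns)
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Link every pair whose absolute correlation exceeds the threshold.
    for a, b in combinations(names, 2):
        if abs(pearson(columns[a], columns[b])) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are candidates for collapsing.
    return [members for members in groups.values() if len(members) > 1]


# Toy data (made up for illustration): A, B and D move together, C does not.
columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10.1],
    "C": [5, 3, 4, 1, 2],
    "D": [1.1, 2.0, 2.9, 4.2, 5.0],
}
groups = correlated_groups(columns)
print(groups)
```

With 7,300 variables this loop visits roughly 26.6 million pairs, so a pure-Python version like this would be slow; it is meant to show the logic, not to replace a dedicated platform.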

Of note, I cannot share the data set because of confidentiality.

Thank you for your help.

Best,

TS

Thierry R. Sornasse
2 ACCEPTED SOLUTIONS

Thierry_S
Super User

Re: JMP > Dimension Reduction > Collapse Highly Correlated Variables (Total N = 7300)?

Hi JMP Community,

It seems that I tend to find part of the answer soon after posting. Step 1: use the Variable Clustering platform under the Clustering menu = completed.

Now that I have the variables clustered, with the Most Representative and Cluster Membership results available, what is the best way to apply the output to my table? (I assume that a JSL script will be involved.)

Thanks for your help.

Best,

TS 

Thierry R. Sornasse


Thierry_S
Super User

Re: JMP > Dimension Reduction > Collapse Highly Correlated Variables (Total N = 7300)?

Hi JMP Community,

Well, I solved my problem. After clustering the variables (the subset of variables with intercorrelation > 0.95), I matched the Cluster Membership and Most Representative results to the original stacked (tall and narrow) table. I then selected all rows with no cluster association, plus those whose variable name matched a Most Representative variable, subsetted them, and split by variable name. This took me from 7,300 variables to 5,700 (i.e., a 22% reduction).
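For readers who want to see the logic of that select-subset-split step, here is a rough Python sketch of it, not the JSL or JMP menu steps actually used. The row layout (`sample`, `variable`, `value`), the function name `collapse_stacked`, and the toy cluster assignments are all assumptions made for illustration.

```python
def collapse_stacked(rows, cluster_of, representatives):
    """Keep unclustered variables and cluster representatives, then go back to wide form.

    rows            -- stacked table: one dict per (sample, variable) pair
    cluster_of      -- maps each clustered variable name to its cluster id
    representatives -- set of variable names flagged as most representative
    """
    kept = [
        row for row in rows
        if row["variable"] not in cluster_of      # no cluster association
        or row["variable"] in representatives     # the cluster's representative
    ]
    # "Split by variable name": one entry per sample, one key per kept variable.
    wide = {}
    for row in kept:
        wide.setdefault(row["sample"], {})[row["variable"]] = row["value"]
    return wide


# Toy stacked table (made up): A, B and D share a cluster, C is unclustered.
rows = [
    {"sample": "S1", "variable": "A", "value": 1.0},
    {"sample": "S1", "variable": "B", "value": 2.0},
    {"sample": "S1", "variable": "C", "value": 5.0},
    {"sample": "S1", "variable": "D", "value": 1.1},
    {"sample": "S2", "variable": "A", "value": 2.0},
    {"sample": "S2", "variable": "B", "value": 4.0},
    {"sample": "S2", "variable": "C", "value": 3.0},
    {"sample": "S2", "variable": "D", "value": 2.0},
]
cluster_of = {"A": 1, "B": 1, "D": 1}
representatives = {"A"}

wide = collapse_stacked(rows, cluster_of, representatives)
print(wide)
```

In this toy case four variables collapse to two (A stands in for B and D, and C is kept as is), which mirrors the 7,300 to 5,700 reduction on a much smaller scale.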

Best,

TS

Thierry R. Sornasse


3 REPLIES
P_Bartell
Level VIII

Re: JMP > Dimension Reduction > Collapse Highly Correlated Variables (Total N = 7300)?

One other thought for you, besides clustering, is principal components analysis. It is tailor-made for dimensionality reduction.
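To make the principal components suggestion concrete, here is a small dependency-free Python sketch that extracts the first principal component of a correlation matrix by power iteration. It is an illustration under assumptions, not how JMP computes PCA: the function name `first_principal_component` and the toy data are hypothetical, every column is assumed to be non-constant, and the starting vector is assumed not to be orthogonal to the dominant component. Unlike the Most Representative approach above, PCA replaces the original variables with weighted composites rather than picking one variable per group.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (loadings, explained_fraction) for the first component.

    data -- list of observations, each a list of values (one per variable).
    Works on the correlation matrix, i.e. on standardized variables.
    """
    n, p = len(data), len(data[0])

    # Standardize each column to mean 0 and sample standard deviation 1.
    z_cols = []
    for col in zip(*data):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z_cols.append([(x - mean) / sd for x in col])

    # Correlation matrix of the standardized columns.
    corr = [
        [sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z_cols]
        for zi in z_cols
    ]

    def multiply(vec):
        return [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]

    # Power iteration from an arbitrary, non-uniform starting vector.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = multiply(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue; the eigenvalues of a
    # correlation matrix sum to p, so eigenvalue / p is the variance share.
    eigenvalue = sum(a * b for a, b in zip(v, multiply(v)))
    return v, eigenvalue / p


# Toy data (made up): the first two variables rise together, the third falls.
data = [
    [1, 2.0, 5],
    [2, 4.0, 3],
    [3, 6.0, 4],
    [4, 8.0, 1],
    [5, 10.1, 2],
]
loadings, explained = first_principal_component(data)
print(loadings, explained)
```

When a single component explains most of the variance, as in this toy case, the variables involved carry largely redundant information, which is the same redundancy the clustering approach removes by keeping one representative.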