Discussions

Thierry_S · Jun 8, 2023 5:43 PM

Hi JMP Community,

I have an extensive biomarker data set (7,300 variables) that contains multiple subsets of highly correlated variables (r > 0.95) that I want to collapse into representative variables (i.e., one aggregated variable for each group of highly correlated variables). While I can easily identify the highly correlated pairs of variables, I am struggling with identifying all the members of each group. I have experimented with Clustering, but I cannot get to a definitive answer.
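One way to get from correlated pairs to full groups is to treat each pair with r > 0.95 as a link and collect the connected components. The following is a minimal sketch in plain Python, not a JMP feature; the function names (`pearson_r`, `correlated_groups`) and the demo columns are hypothetical and only illustrate the idea.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat constant columns as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlated_groups(columns, threshold=0.95):
    """Group column names linked by pairwise r > threshold (union-find)."""
    parent = {name: name for name in columns}

    def find(name):
        # Follow parent links to the root, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(columns, 2):
        if pearson_r(columns[a], columns[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    # Keep only groups with at least two members.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical demo data: A, B, and C move together; D does not.
demo = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 4.0, 6.2, 7.9, 10.0],
    "C": [1.1, 1.9, 3.2, 3.9, 5.1],
    "D": [5.0, 1.0, 4.0, 2.0, 3.0],
}
groups = correlated_groups(demo)
print(groups)
```

Two caveats. Linking is transitive, so A and C can end up in the same group through B even if their own correlation falls below the threshold. Also, 7,300 variables produce roughly 26.6 million pairs, so a vectorized correlation matrix would be far more practical than this pure-Python loop.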

Is there a method under Multivariate Methods that would allow me to collapse this data set?

Of note, I cannot share the data set because of confidentiality.

Thank you for your help.

Best,

TS

Thierry R. Sornasse

Thierry_S · Dec 23, 2021 08:48 PM

Hi JMP Community,

It seems that I tend to find part of the answer soon after posting. Step 1: Use the Variable Clustering platform under the Clustering menu = completed.

Now that I have the variables clustered, with the Most Representative and Cluster Membership results available, what is the best way to apply the output to my table? (I assume that a JSL script will be involved.)

Thanks for your help.

Best,

TS

Thierry R. Sornasse


Thierry_S · Dec 24, 2021 12:44 AM

Hi JMP Community,

Well, I solved my question. After clustering the variables (the subset of variables with intercorrelation > 0.95), I matched the Cluster Membership and the Most Representative to the original STACKED (Tall x Narrow table). I then selected all rows with no association to a Cluster and those with a name matching the Most Representative, subsetted, and Split by Variable name. I went from 7,300 variables to 5,700 (i.e., 22% collapse).
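The select, subset, and split steps described above can be sketched outside JMP as follows. This is an illustration in plain Python under stated assumptions, not the JSL that was actually used: the function name `collapse_stacked`, the tuple layout of the stacked rows, and the demo values are all hypothetical.

```python
def collapse_stacked(stacked_rows, membership, representatives):
    """Keep unclustered variables and each cluster's representative,
    then split the stacked (tall) rows back into a wide table.

    stacked_rows:    iterable of (subject, variable, value) tuples
    membership:      dict variable -> cluster id (clustered variables only)
    representatives: dict cluster id -> most representative variable name
    """
    wide = {}
    for subject, variable, value in stacked_rows:
        cluster = membership.get(variable)
        # Drop clustered variables that are not their cluster's representative.
        if cluster is not None and representatives[cluster] != variable:
            continue
        wide.setdefault(subject, {})[variable] = value
    return wide


# Hypothetical demo: A, B, and C form cluster 1 with B as representative.
rows = [
    ("S1", "A", 1.0), ("S1", "B", 2.1), ("S1", "C", 1.1), ("S1", "D", 5.0),
    ("S2", "A", 2.0), ("S2", "B", 4.0), ("S2", "C", 1.9), ("S2", "D", 1.0),
]
membership = {"A": 1, "B": 1, "C": 1}
representatives = {1: "B"}

wide = collapse_stacked(rows, membership, representatives)
print(wide)
```

In this toy case four variables collapse to two (B and D), mirroring the reduction from 7,300 to 5,700 variables reported above.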

Best,

TS

Thierry R. Sornasse


P_Bartell · Dec 24, 2021 09:09 AM

One other thought for you besides clustering is principal components analysis. It is tailor-made for dimensionality reduction.
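As a rough illustration of that idea, the first principal component of a group of highly correlated variables can serve as a single aggregated variable. The sketch below uses power iteration on the correlation matrix in plain Python; it is not JMP's Principal Components platform, the function names and demo columns are hypothetical, and it assumes non-constant, positively correlated columns.

```python
from math import sqrt


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]  # assumes sd > 0


def first_principal_component(columns, iterations=200):
    """Return (loadings, scores) of the first PC via power iteration."""
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized columns.
    corr = [
        [sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Start from equal weights; fine for positively correlated columns.
    w = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w_new = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    # Project each observation onto the leading direction.
    scores = [sum(w[j] * z[j][k] for j in range(p)) for k in range(n)]
    return w, scores


# Hypothetical group of three highly correlated variables.
group = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 4.0, 6.2, 7.9, 10.0],
    [1.1, 1.9, 3.2, 3.9, 5.1],
]
loadings, scores = first_principal_component(group)
print(loadings)
print(scores)
```

Unlike keeping the Most Representative variable, a component score blends every member of the group, which can make it harder to interpret as a single biomarker.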

JMP > Dimension Reduction > Collapse Highly Correlated Variables (Total N = 7300)?