TCM
Level IV

How do I calculate distance between new cases and PCA model datapoints using PCA1 and PCA2 values

I created a simple PCA model with 2 components PCA1 and PCA2 from 3 variables of 40 objects.  From very helpful guidance here, the model set was given weight=1 so that the location of new cases (weight=0) could be visualized relative to the model objects.

 

Question: Is there a way to find the distance between the new objects (n=51) and each of the 40 model set objects?  I found an add-in for image measurements but couldn't find an interactive solution. I am attaching the file.  The black points seen in the biplot are the new cases.  My intention is to find the top 3 model set objects for each new object.

 

Thank you in advance for your guidance.

 

 

Accepted Solutions
SDF1
Super User

Re: How do I calculate distance between new cases and PCA model datapoints using PCA1 and PCA2 values

Hi @TCM ,

 

  I'm not sure I follow your question exactly, but depending on what you need, you might consider looking at the T² statistic. You already have DModX, which tells you the distance to the model plane for each data point; the T² statistic (the squared Mahalanobis distance) tells you the distance from the center of the model to each data point. You can get to this in the PCA platform by clicking the hot button and then selecting Outlier Analysis. Be sure to select the proper number of components, which should be the same as when you saved your DModX formula. You can save the T² formula to the data table as well.
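  For anyone who wants to check the idea outside the platform, here is a minimal Python sketch of a T²-style distance computed from saved score columns. It assumes the scores are uncorrelated, so the squared Mahalanobis distance reduces to a sum of squared scores, each divided by that component's variance in the model set. The function name `hotelling_t2` and the numbers are made up for illustration, and the scaling JMP uses in Outlier Analysis may differ.

```python
from statistics import mean, variance


def hotelling_t2(model_scores, new_scores):
    """Return a T²-style distance for each row of new_scores.

    model_scores, new_scores: lists of (PC1, PC2, ...) tuples.
    The center and per-component variance come from the model set only.
    """
    n_comp = len(model_scores[0])
    centers = [mean(row[j] for row in model_scores) for j in range(n_comp)]
    variances = [variance([row[j] for row in model_scores]) for j in range(n_comp)]
    return [
        sum((row[j] - centers[j]) ** 2 / variances[j] for j in range(n_comp))
        for row in new_scores
    ]


# Made-up scores, for illustration only (not from the attached file).
model = [(-2.0, 0.5), (-1.0, -0.5), (0.0, 0.0), (1.0, 0.5), (2.0, -0.5)]
new = [(0.0, 0.0), (3.0, 1.0)]

t2_new = hotelling_t2(model, new)      # distance of each new case from the model center
t2_model = hotelling_t2(model, model)  # same statistic for the model rows themselves
```

  A new case sitting at the model center gets a value of zero, and the value grows faster along components with small variance.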

 

  If you literally want the distance between each pair of rows in the PC space, then you'll want to do multidimensional scaling using the two principal components. Go to Analyze > Multivariate Methods > Multidimensional Scaling. Select the PC group of columns for Y, Columns, be sure to change "Data Format" to Attribute List, and change the Set Dimensions value to however many PCs you have. Then, in the 2D MS report window, go to the hot button and select Show Proximity; it will give you a table of all possible distance combinations. You can then right-click > Make into Data Table for further analysis.
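  If the scores are exported, the same distances can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain Euclidean distance on the (PC1, PC2) scores, the labels and coordinates are made up, and `nearest_model_objects` is a hypothetical helper, not a JMP function.

```python
from math import dist


def nearest_model_objects(model, new, k=3):
    """For each new case, return the k closest model objects.

    model, new: dicts mapping a label to its (PC1, PC2) score tuple.
    Returns {new_label: [(model_label, distance), ...]} sorted nearest first.
    """
    result = {}
    for new_label, new_xy in new.items():
        # Euclidean distance from this new case to every model object.
        ranked = sorted((dist(new_xy, xy), label) for label, xy in model.items())
        result[new_label] = [(label, d) for d, label in ranked[:k]]
    return result


# Made-up scores, for illustration only.
model = {"#1": (0.0, 0.0), "#2": (1.0, 0.0), "#3": (0.0, 2.0), "#4": (3.0, 4.0)}
new = {"T1": (0.4, 0.0), "T3": (3.0, 3.0)}

top3 = nearest_model_objects(model, new)
```

  Here `top3["T3"]` lists the three model objects nearest to T3, which is the "top 3 model set objects for each new object" ranking asked about in the question.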

 

  You can then visualize it like this, maybe:

DiedrichSchmidt_0-1615317374969.png

 

 

Hope this helps!

DS



TCM
Level IV

Re: How do I calculate distance between new cases and PCA model datapoints using PCA1 and PCA2 values

The solution you gave in the second paragraph works! The MDS biplot is just the PCA biplot rotated 180 degrees. I can sort the To object-From object distances to rank them, which is what I was looking for.

Many many thanks!
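For reference, the sorting step could be sketched like this in Python, assuming the proximity table has been exported as one row per pair (from object, to object, proximity). The table lists every pair, so rows joining two new cases or two model objects are dropped first. The labels, values, and the helper name `top_matches` are made up for illustration.

```python
from collections import defaultdict


def top_matches(rows, model_labels, k=3):
    """Rank model objects by proximity for each new case.

    rows: (from_object, to_object, proximity) tuples, one per pair.
    Returns {new_label: [(proximity, model_label), ...]} sorted nearest first.
    """
    model_labels = set(model_labels)
    by_new = defaultdict(list)
    for a, b, d in rows:
        # Keep only pairs that join one new case to one model object.
        if (a in model_labels) == (b in model_labels):
            continue
        new_obj, model_obj = (b, a) if a in model_labels else (a, b)
        by_new[new_obj].append((d, model_obj))
    return {new_obj: sorted(pairs)[:k] for new_obj, pairs in by_new.items()}


# Made-up proximity rows, for illustration only.
rows = [
    ("#1", "#2", 1.20),  # model-model pair, ignored
    ("#1", "T3", 0.90),
    ("#2", "T3", 0.55),
    ("#3", "T3", 0.40),
    ("#4", "T3", 2.10),
    ("T1", "T3", 0.30),  # new-new pair, ignored
]

ranked = top_matches(rows, model_labels=["#1", "#2", "#3", "#4"])
```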
TCM
Level IV

Follow-up question--> Re: How do I calculate distance between new cases and PCA model datapoints using PCA1 and PCA2 values

I have a table of Actual Proximity between objects in a PCA 2-D graph using PCA1 and PCA2, obtained based on the good advice in this thread.

 

Can someone explain why the proximity value is lower between T3 and #3 (0.66, presumably closer) than between T3 and #2 (0.67), when #2 appears closer in the graph?

 

TCM_0-1618424871049.png