utkcito
Level III

How can I know to which cluster does a new variable "best" belongs to?

Hi,

I like using cluster analysis to explore my datasets. I often have a set of patients who are clearly one type or another, plus some patients whose status is unclear. What I usually want to do is cluster the clear-cut patients using a series of explanatory variables, and then compare the remaining patients (one by one) to assess which cluster each of them "would belong to" or "looks like the most". I thought of adding each "unknown" patient one at a time and seeing which cluster it gets aggregated into, but adding a row almost always ends up changing the original clusters. I would be grateful to hear your ideas on how to do this.

Thanks,

Uriel.

2 REPLIES
Thierry_S
Super User

Re: How can I know to which cluster does a new variable "best" belongs to?

Hi Uriel,
Have you considered Discriminant Analysis (Analyze > Multivariate Methods > Discriminant analysis)? If you have not, here are the basic steps I would follow:
1) Using only your clearly diagnosed patients (Hide and Exclude the questionable patients), select Diagnosis (or whatever you call their status) as X, Categories and your variables as Y, Covariates.
2) Run the analysis and, under the main pull-down menu (red triangle), select Score Options > Save Formula. Several calculated columns will be appended at the end of your table.
3) Check your Hidden/Excluded rows: the probability of each Diagnosis is calculated for them automatically.
Note that this method assumes that the diagnosis of your "reference" patients is the true one, which might not be correct. Hence, you can also explore the effect of including ALL patients in your discriminant analysis, which does not assume that some patients are more important than others.
Best,
TS
Thierry R. Sornasse
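
For readers who want to see the "fit on the reference patients, then score the excluded ones" workflow outside JMP, here is a minimal Python sketch. It is not JMP's implementation: it is a plain linear discriminant with a pooled covariance matrix, restricted to two variables so the 2x2 inverse can be written by hand. The data, the labels "A" and "B", and the function names `fit_lda_2d` and `posterior` are all made up for illustration.

```python
import math

def fit_lda_2d(rows, labels):
    """Fit class means, priors and the pooled 2x2 covariance from labelled rows."""
    classes = sorted(set(labels))
    n = len(rows)
    means, priors = {}, {}
    for c in classes:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        means[c] = (sum(r[0] for r in members) / len(members),
                    sum(r[1] for r in members) / len(members))
        priors[c] = len(members) / n
    # Pooled within-class covariance, shared by all classes (the "linear" assumption).
    sxx = sxy = syy = 0.0
    for r, lab in zip(rows, labels):
        dx, dy = r[0] - means[lab][0], r[1] - means[lab][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = n - len(classes)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return classes, means, priors, inverse

def posterior(row, model):
    """Posterior probability of each class for one new (unlabelled) row."""
    classes, means, priors, inv = model
    scores = {}
    for c in classes:
        dx, dy = row[0] - means[c][0], row[1] - means[c][1]
        # Squared Mahalanobis distance to the class mean.
        d2 = (dx * (inv[0][0] * dx + inv[0][1] * dy)
              + dy * (inv[1][0] * dx + inv[1][1] * dy))
        scores[c] = math.log(priors[c]) - 0.5 * d2
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Hypothetical "clearly diagnosed" patients: two variables each.
reference = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 1.9),
             (5.0, 6.0), (5.5, 6.3), (4.8, 5.7), (5.2, 6.4)]
diagnosis = ["A"] * 4 + ["B"] * 4

model = fit_lda_2d(reference, diagnosis)   # fitted on reference rows only
unclear_patient = (1.3, 2.1)               # plays the role of an excluded row
probs = posterior(unclear_patient, model)
print(probs)
```

Because the model is fitted only on `reference`, scoring `unclear_patient` cannot change the class means or the covariance, which mirrors what Hide and Exclude achieves in the steps above.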
utkcito
Level III

Re: How can I know to which cluster does a new variable "best" belongs to?

I am definitely using that approach with models. Discriminant analysis isn't great for the specific scenario I'm working on now (I tried it), but PLS and GenReg are working great. The clustering issue is different, though: it is more exploratory, it shows all or many variables at the same time, and it's easier to manipulate for exploration.

I'm wondering if there's an option to "set" the clusters and then compare other rows to see how well they fit into the set clusters using some metric, or, after doing PCA, to analyze a new row to see how it decomposes into the existing components. Things along those lines.

 

Thanks for your suggestion!

 

Uriel
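
The "set the clusters, then compare other rows" idea raised above can be sketched in a few lines of Python. This is only one possible reading of it, not a JMP feature: the clusters are frozen as centroids computed from the clear-cut rows, and each new row is then measured against those centroids without ever being added to them. The data, the cluster IDs, and the names `centroids_from_clusters` and `assign_to_fixed_clusters` are hypothetical, and the sketch assumes the variables are already on comparable scales (standardize them with the reference rows' means and standard deviations first if they are not).

```python
import math

def centroids_from_clusters(rows, cluster_ids):
    """Mean of each variable within each already-assigned cluster."""
    sums, counts = {}, {}
    for row, cid in zip(rows, cluster_ids):
        acc = sums.setdefault(cid, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [s / counts[cid] for s in acc] for cid, acc in sums.items()}

def assign_to_fixed_clusters(new_row, centroids):
    """Return the nearest cluster, all distances, and an ambiguity ratio."""
    distances = {cid: math.dist(new_row, c) for cid, c in centroids.items()}
    ranked = sorted(distances, key=distances.get)
    best = ranked[0]
    # Ratio of nearest to second-nearest distance:
    # close to 0 means a clear fit, close to 1 means the row sits between clusters.
    if len(ranked) > 1 and distances[ranked[1]] > 0:
        ambiguity = distances[best] / distances[ranked[1]]
    else:
        ambiguity = 0.0
    return best, distances, ambiguity

# Hypothetical clear-cut rows and the cluster each one was assigned to.
clear_rows = [(1.0, 2.0), (1.2, 1.8), (5.0, 6.0), (5.2, 6.2)]
cluster_ids = [1, 1, 2, 2]

centroids = centroids_from_clusters(clear_rows, cluster_ids)  # frozen from here on
best, distances, ambiguity = assign_to_fixed_clusters((1.5, 2.5), centroids)
print(best, distances, ambiguity)
```

The ambiguity ratio is just one candidate for the "some metric" part; a distance scaled by each cluster's spread would be another reasonable choice.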