jessez
Level I

Kmeans Clustering CCC problem

JMP's K-means clustering is not calculating a CCC (cubic clustering criterion) statistic for a subset of my data. I have a data file with a little over 4000 unique sites. On the full data set, K-means clustering runs fine and JMP displays a CCC statistic. I then split the data on one variable into two new data tables (one with around 200 rows and the other with 4000+). Whenever I run the same K-means clustering process on the larger subset, JMP will not report a CCC statistic. Any ideas about what is going on here?

1 ACCEPTED SOLUTION

jessez
Level I

Re: Kmeans Clustering CCC problem

After a couple of hours (and you will see they were wasted hours...) of trying to figure this out, I caught the problem. One of the variables I used in the clustering process was zero for most of the data set. It turns out that it was zero for every row in the subset I wanted to cluster, so that column was constant (no variation at all) within the subset. The all-zero column blows up the CCC. Problem solved.
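For anyone who wants to check for this before clustering, here is a minimal sketch in plain Python (not JSL, and not anything JMP provides). It flags clustering variables that take a single value across a subset. The helper `constant_columns`, the column names, and the sample rows are all made up for illustration.

```python
def constant_columns(rows, columns):
    """Return the names of columns that take a single value across all rows."""
    constant = []
    for name in columns:
        # Collect the distinct values this column takes in the subset.
        distinct = {row[name] for row in rows}
        if len(distinct) <= 1:
            constant.append(name)
    return constant


# Hypothetical subset: "flow" is zero for every site, "depth" varies.
subset = [
    {"site": 1, "depth": 2.5, "flow": 0.0},
    {"site": 2, "depth": 3.1, "flow": 0.0},
    {"site": 3, "depth": 1.8, "flow": 0.0},
]
clustering_vars = ["depth", "flow"]

dropped = constant_columns(subset, clustering_vars)
usable = [name for name in clustering_vars if name not in dropped]
print("constant in subset:", dropped)
print("usable for clustering:", usable)
```

Note that this only catches columns that are constant in the subset. A column that is mostly zero but still varies would not be flagged.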


chitra
Level I

Re: Kmeans Clustering CCC problem

Just wondering how you solved this problem. Did you just remove that attribute? I am facing the same situation: almost all of my columns have a large number of 0s, but since they are significant for my analysis, I want to keep them in.