
JMP Wish List

We want to hear your ideas for improving JMP. Share them here.

Add medoid clustering

What inspired this wish list request?  I would like to run clustering where the cluster centers (medoids) are restricted to be rows in the data table. This differs from k-means, where a cluster mean may not correspond to any row in the table and may not satisfy the constraints that the rows satisfy.


What is the improvement you would like to see?  Perhaps add medoid clustering somewhere in the Analyze > Clustering menu. Here is some example R code that uses the cluster package, with its console output.

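library(cluster)  # provides daisy() and pam()

# Simulated table with two continuous and two categorical columns
n <- 200
X <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 0, 10),
  f1 = factor(sample(letters[1:3], n, TRUE)),
  f2 = factor(sample(c("lo", "mid", "hi"), n, TRUE))
)

# Gower dissimilarity handles mixed continuous and categorical columns
d   <- daisy(X, metric = "gower")
fit <- pam(d, k = 5, diss = TRUE)  # partitioning around medoids

cl          <- fit$clustering  # cluster assignment for each row
medoid_rows <- fit$id.med      # row indices of the medoids

cat("Medoid row indices:", medoid_rows, "\n\n")
cat("Medoid rows (one per cluster):\n")
print(X[medoid_rows, ], row.names = FALSE)
cat("\nCluster sizes:\n")
print(table(cl))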

Console output:

> cat("Medoid row indices:", medoid_rows, "\n\n")
Medoid row indices: 170 111 191 200 68 

> cat("Medoid rows (one per cluster):\n")
Medoid rows (one per cluster):
> print(X[medoid_rows, ], row.names = FALSE)
          x1       x2 f1  f2
 -0.42460961 4.116528  b mid
 -0.08807738 7.593479  b  hi
 -0.32078302 2.121084  c  lo
  0.28653902 4.962979  c mid
  0.30942313 2.652641  a  lo
> cat("\nCluster sizes:\n")

Cluster sizes:
> print(table(cl))
cl
 1  2  3  4  5 
33 43 43 35 46 

Why is this idea important? 

I want to use this on tables produced by Output Random Table in the Profiler to downselect a subset of distinct, highly desirable factor settings. I need medoids rather than k-means centers when mixture or other constraints are present, because a medoid is an actual row of the table and therefore already satisfies those constraints.
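
As a rough illustration of that workflow, here is a minimal R sketch. Everything in it is an assumption made for illustration: the table is simulated rather than exported from JMP, the column names A, B, C, and Desirability are hypothetical, and the top-10% cutoff and k = 5 are arbitrary choices. It filters to the most desirable rows and then takes the pam() medoids as the downselected settings.

```r
library(cluster)  # provides pam()

set.seed(42)  # assumed seed, only so the sketch is repeatable

# Hypothetical stand-in for a table from Output Random Table:
# three mixture factors that sum to 1, plus a made-up desirability score.
n   <- 500
raw <- matrix(rexp(3 * n), ncol = 3)
tab <- as.data.frame(raw / rowSums(raw))
names(tab) <- c("A", "B", "C")
tab$Desirability <- runif(n)

# Keep only the most desirable rows (top 10% is an arbitrary cutoff).
top <- tab[tab$Desirability >= quantile(tab$Desirability, 0.9), ]

# Cluster the factor settings; each medoid is an actual row of `top`.
fit    <- pam(top[, c("A", "B", "C")], k = 5)
picked <- top[fit$id.med, ]

print(picked)
```

Because every row of picked comes straight from the original table, the mixture components in each selected setting still sum to 1, which is the property a k-means center does not guarantee in general.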