JMP Wish List

We want to hear your ideas for improving JMP. Share them here.

Add medoid clustering

What inspired this wish list request?  I would like to run clustering where the cluster centers (medoids) are restricted to be rows in the data table. This differs from k-means, where a cluster mean is generally not a row in the table and may not satisfy the constraints that the rows satisfy.


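What is the improvement you would like to see?  Perhaps add this somewhere in the Analyze > Clustering menu. Here is some example R code that uses pam() from the cluster package on a Gower dissimilarity matrix, so that mixed numeric and categorical columns can be clustered together.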

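library(cluster)

# Simulate 200 rows with two numeric and two categorical columns.
# No seed is set, so the results differ from run to run.
n <- 200
X <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 0, 10),
  f1 = factor(sample(letters[1:3], n, TRUE)),
  f2 = factor(sample(c("lo","mid","hi"), n, TRUE))
)

# Gower dissimilarity handles mixed numeric and factor columns
d   <- daisy(X, metric = "gower")
# Partition around medoids (PAM) with 5 clusters on the dissimilarity matrix
fit <- pam(d, k = 5, diss = TRUE)
cl <- fit$clustering        # cluster assignment for each row
medoid_rows <- fit$id.med   # row indices of the medoids

cat("Medoid row indices:", medoid_rows, "\n\n")
cat("Medoid rows (one per cluster):\n")
print(X[medoid_rows, ], row.names = FALSE)
cat("\nCluster sizes:\n")
print(table(cl))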
> cat("Medoid row indices:", medoid_rows, "\n\n")
Medoid row indices: 170 111 191 200 68 

> cat("Medoid rows (one per cluster):\n")
Medoid rows (one per cluster):
> print(X[medoid_rows, ], row.names = FALSE)
          x1       x2 f1  f2
 -0.42460961 4.116528  b mid
 -0.08807738 7.593479  b  hi
 -0.32078302 2.121084  c  lo
  0.28653902 4.962979  c mid
  0.30942313 2.652641  a  lo
> cat("\nCluster sizes:\n")

Cluster sizes:
> print(table(cl))
cl
 1  2  3  4  5 
33 43 43 35 46 

Why is this idea important? 

I want to use this on tables produced by Output Random Table in the Profiler, to downselect a small subset of distinct, highly desirable factor settings. I need medoids rather than k-means centers because, when mixture or other constraints are present, each selected setting must be an actual row of the table.
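
To illustrate the downselection idea, here is a minimal sketch in R. The candidate table, the excluded-disc constraint, the seed, and the choice of k = 4 are all made up for illustration and do not come from JMP output. It only shows that PAM medoids are rows of the table and therefore stay feasible, while k-means centers are averages with no such guarantee when the feasible region is not convex.

```r
library(cluster)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical candidate table: two continuous factors in [-1, 1], with a
# made-up constraint that excludes the disc x1^2 + x2^2 < 0.25 at the center.
n <- 500
cand <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
feasible <- function(df) df$x1^2 + df$x2^2 >= 0.25
cand <- cand[feasible(cand), ]
rownames(cand) <- NULL

k <- 4  # number of settings to keep (assumed)

# k-means centers are averages of rows, so they are generally not rows of
# the table and are not guaranteed to satisfy a non-convex constraint.
km <- kmeans(cand, centers = k, nstart = 10)
km_centers <- as.data.frame(km$centers)

# PAM medoids are rows of the table, so they inherit feasibility.
pm <- pam(cand, k = k)
medoids <- cand[pm$id.med, ]

cat("k-means centers feasible:", feasible(km_centers), "\n")
cat("PAM medoids feasible:    ", feasible(medoids), "\n")
print(medoids)
```

For tables with categorical factors, the same pattern should work by passing a Gower dissimilarity from daisy() to pam() with diss = TRUE, as in the example above.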

 

1 Comment
SarahGilyard
Staff
Status changed to: Acknowledged

This is an interesting suggestion. We will discuss internally. Thank you for posting.