What inspired this wish list request? I would like to run clustering in which the cluster centers (medoids) are restricted to be rows of the data table. This differs from k-means, where a cluster mean is generally not a row of the table and may not satisfy the constraints that the rows satisfy.
What is the improvement you would like to see? Perhaps add this somewhere in the Analyze > Clustering menu. Here is some example R code.
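library(cluster)  # provides daisy() and pam()

# Example data: two continuous and two categorical columns
n <- 200
X <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 0, 10),
  f1 = factor(sample(letters[1:3], n, replace = TRUE)),
  f2 = factor(sample(c("lo", "mid", "hi"), n, replace = TRUE))
)

# Gower dissimilarity handles mixed numeric and factor columns
d <- daisy(X, metric = "gower")

# Partitioning around medoids (PAM) on the dissimilarity object
fit <- pam(d, k = 5, diss = TRUE)
cl <- fit$clustering       # cluster label for each row
medoid_rows <- fit$id.med  # row indices of the medoids

cat("Medoid row indices:", medoid_rows, "\n\n")
cat("Medoid rows (one per cluster):\n")
print(X[medoid_rows, ], row.names = FALSE)
cat("\nCluster sizes:\n")
print(table(cl))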
> cat("Medoid row indices:", medoid_rows, "\n\n")
Medoid row indices: 170 111 191 200 68
> cat("Medoid rows (one per cluster):\n")
Medoid rows (one per cluster):
> print(X[medoid_rows, ], row.names = FALSE)
         x1       x2 f1  f2
-0.42460961 4.116528  b mid
-0.08807738 7.593479  b  hi
-0.32078302 2.121084  c  lo
 0.28653902 4.962979  c mid
 0.30942313 2.652641  a  lo
> cat("\nCluster sizes:\n")
Cluster sizes:
> print(table(cl))
cl
 1  2  3  4  5
33 43 43 35 46
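To illustrate the contrast with k-means, here is a small sketch on numeric-only data (kmeans() does not accept factor columns). The seed, the data frame Z, and the two check vectors are my own illustrative choices and are not part of the run shown above.

```r
library(cluster)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
n <- 200
Z <- data.frame(x1 = rnorm(n), x2 = runif(n, 0, 10))  # numeric columns only

# k-means centers are averages, so they generally do not coincide with any row
km <- kmeans(Z, centers = 5)
center_is_row <- apply(km$centers, 1, function(ctr) {
  any(Z$x1 == ctr[1] & Z$x2 == ctr[2])
})

# PAM medoids are chosen from the data, so each one is a row of Z
pm <- pam(Z, k = 5)
medoid_is_row <- all(pm$medoids == as.matrix(Z[pm$id.med, ]))

print(center_is_row)
print(medoid_is_row)
```

The medoid check should come back TRUE by construction, while the k-means check would only be TRUE for a center if its cluster happened to contain a single point.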
Why is this idea important?
I want to use this on tables produced by Output Random Table in the Profiler to downselect a subset of distinct, highly desirable factor settings. I need this instead of k-means when mixture or other constraints are present, because each medoid is an actual row of the table and so already satisfies those constraints.
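A rough sketch of that workflow in R, under stated assumptions: the table below is a made-up stand-in for a Profiler random table (three hypothetical mixture components m1, m2, m3 that sum to 1, plus a hypothetical desirability column), and the 0.8 cutoff and k = 4 are arbitrary choices for illustration.

```r
library(cluster)

set.seed(2)  # arbitrary seed, only so the sketch is repeatable
n <- 500

# Hypothetical stand-in for a random table: mixture components that
# sum to 1 in every row, plus a made-up desirability score in [0, 1]
raw <- matrix(rexp(3 * n), ncol = 3)
tab <- as.data.frame(raw / rowSums(raw))
names(tab) <- c("m1", "m2", "m3")
tab$desirability <- runif(n)

# Keep only the highly desirable rows (cutoff chosen arbitrarily)
good <- tab[tab$desirability > 0.8, ]

# Cluster on the factor settings and keep one medoid row per cluster
fit <- pam(good[, c("m1", "m2", "m3")], k = 4)
picked <- good[fit$id.med, ]

print(picked, row.names = FALSE)
print(rowSums(picked[, c("m1", "m2", "m3")]))  # each selected row still sums to 1
```

Because the selected settings are existing rows rather than computed averages, nothing has to be re-checked against the constraints after clustering.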