Yurei
Level I

Is there a way to get the optimal number of clusters through JSL?

Hi,
I am working on automating the analysis of a database, and I was wondering whether it is possible to automate the clustering of my data: running a k-means clustering and saving the clusters for the optimal number of clusters (within a given range) to my data table for further analysis.

Accepted Solutions
jthi
Super User

Re: Is there a way to get the optimal number of clusters through JSL?

I think you could either run K Means Cluster several times with different cluster counts, or manipulate the Control Panel and fill in Range of Clusters (I'm not sure whether this can be given as a parameter), and then get the Cluster Comparison table from the platform.

Edit:

Here is a quick example of how the number edit box could be filled in a simple case:

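Names Default To Here(1);

nr_of_clusters = 3; // starting number of clusters
range_of_clusters = 5; // upper end of the range of cluster counts to compare

dt = Open("$SAMPLE_DATA/Iris.jmp");

// Launch K Means Cluster; the Control Panel opens, nothing is fitted until Go
obj = dt << K Means Cluster(
	Y(:Sepal length, :Sepal width, :Petal length, :Petal width),
	Number of Clusters(nr_of_clusters)
	//, invisible
);
// The second NumberEditBox in the Control Panel is the Range of Clusters box
(Report(obj)[OutlineBox("Control Panel")] << XPath("//NumberEditBox"))[2] << Set(range_of_clusters);
// Same as pressing Go in the Control Panel: fits every count in the range
obj << Go;

// Turn the Cluster Comparison table (one row per cluster count) into a data table
dt_cluster_comparison = (Report(obj)[OutlineBox("Cluster Comparison")] << Child) << Make Into Data Table;
//obj << Close Window;

Write();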
-Jarmo
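A possible follow-up for the original goal (saving the clusters for the best count), sketched in JSL under some assumptions: it reuses `dt` and `dt_cluster_comparison` from the script above, assumes the comparison table has numeric columns named `NCluster` and `CCC` (check the names in your JMP version), treats the count with the largest CCC (Cubic Clustering Criterion) as optimal, and assumes the platform accepts a `Save Clusters` message. If it does not, use Save Clusters from the red triangle menu of the fitted report instead.

```jsl
// Sketch only: reuses dt and dt_cluster_comparison from the script above.
// Assumption: the comparison table has numeric columns "NCluster" and "CCC".
n_clusters = Column(dt_cluster_comparison, "NCluster") << Get Values;
ccc = Column(dt_cluster_comparison, "CCC") << Get Values;

// Treat the cluster count with the largest Cubic Clustering Criterion as optimal
best_k = n_clusters[Loc Max(ccc)];
Show(best_k);

// Refit K Means with that single count, using the same columns as above
obj_best = dt << K Means Cluster(
	Y(:Sepal length, :Sepal width, :Petal length, :Petal width),
	Number of Clusters(best_k)
);
obj_best << Go;

// Assumption: the platform accepts Save Clusters, adding a cluster column to dt
obj_best << Save Clusters;
```

Because K-means starts from random initial assignments, the refit may not reproduce exactly the same clusters that were compared in the range run.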

Re: Is there a way to get the optimal number of clusters through JSL?

Hierarchical clustering methods are generally used when the number of clusters is unknown; they explore this number by agglomerating the data. There is also a wide choice of measures of distance between clusters. So you might use hierarchical clustering first to determine a reasonable number of clusters and then use that number with K-means clustering. Unsupervised modeling is not perfect, though.

I might have misunderstood your request, though. If you only have a small number of candidate cluster counts, then @jthi's suggestion seems reasonable.

Note, too, that these methods use random initial assignments, so repeating the procedure might not produce exactly the same cluster assignments, even with the same number of clusters. You could set the random seed, but you can't really say which result is 'correct.'