Solved: Is there a way to get the optimal number of clusters through JSL?

Yurei · Jun 9, 2023 01:05 PM

Hi,
I am working on automating the analysis of a database, and I was wondering whether it is possible to automate the clustering of my data: run a k-means clustering, then save the clusters for the optimal number of clusters (within a given range) to my data table for further analysis.

jthi · Jul 21, 2022 10:36 AM

I think you could either run K Means Cluster several times with different cluster counts, or manipulate the Control Panel and fill in Range of Clusters (I am not sure whether this can be given as a launch parameter), and then get the Cluster Comparison table from the platform.

Edit:

A quick example of how the number edit box could be filled in a simple case:

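Names Default To Here(1);

// Fit every cluster count from nr_of_clusters up to range_of_clusters
nr_of_clusters = 3;
range_of_clusters = 5;

dt = Open("$SAMPLE_DATA/Iris.jmp");

// Launch the platform; it waits in the Control Panel until Go is sent
obj = dt << K Means Cluster(
	Y(:Sepal length, :Sepal width, :Petal length, :Petal width),
	Number of Clusters(nr_of_clusters)
	//, invisible
);
// The second NumberEditBox in the Control Panel is taken to be Range of Clusters
(Report(obj)[OutlineBox("Control Panel")] << XPath("//NumberEditBox"))[2] << Set(range_of_clusters);
obj << Go;

// Copy the Cluster Comparison report table into its own data table
dt_cluster_comparison = (Report(obj)[OutlineBox("Cluster Comparison")] << Child) << Make Into Data Table;
//obj << Close Window;

Write();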
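
To pick the "optimal" count from that table, one option is to take the row with the largest CCC. This is only a sketch that continues from the script above: it assumes dt_cluster_comparison exists and that the table has numeric columns named "NCluster" and "CCC" (assumed names, so check them in your own table first). best_row and best_k are just illustrative variable names.

```jsl
// Continues from the script above; column names are assumptions.
ccc_values = Column(dt_cluster_comparison, "CCC") << Get Values;

// Position of the largest CCC (larger CCC is treated as the better fit)
best_row = Loc Max(ccc_values);

// Cluster count on that row
best_k = Column(dt_cluster_comparison, "NCluster")[best_row];
Show(best_k);
```

With best_k in hand you could relaunch K Means Cluster using Number of Clusters(best_k) and save the cluster assignments from that single fit.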

-Jarmo

Mark_Bailey · Jul 21, 2022 12:53 PM

Hierarchical clustering methods are generally used when the number of clusters is unknown; they explore this number through agglomeration of the data. There is a wide choice of measures of distance between clusters, too. So you might use hierarchical clustering first to determine a reasonable number, and then use that number with K-means clustering. Unsupervised modeling is not perfect, though.

I might have misunderstood your request, though. If you have a small number of possible numbers, then @jthi's suggestion seems reasonable.

Note, too, that these methods use random initial assignments, so repetitions of the procedure might not produce the exact same cluster assignments, even with the same number of clusters. You could set the random seed, but you can't really say which one is 'correct.'
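
For the random seed, a minimal sketch would be to call Random Reset before launching the platform. The seed value 1234 is arbitrary, and whether the K Means platform draws its starting assignments from this seed is an assumption here, so it is worth running the script twice and comparing the results.

```jsl
Names Default To Here(1);

// Fix the seed so repeated runs start from the same random state
// (1234 is an arbitrary choice)
Random Reset(1234);

dt = Open("$SAMPLE_DATA/Iris.jmp");

obj = dt << K Means Cluster(
	Y(:Sepal length, :Sepal width, :Petal length, :Petal width),
	Number of Clusters(3)
);
obj << Go;
```

Even if this makes the runs repeatable, it only fixes one of many possible starting points; it does not make that clustering the 'correct' one.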
