ih
Super User (Alumni)

Density based clustering?

Is it possible to separate the concentric circles in the example table below using any of the clustering methods in JMP? There is already a JMP Wish List request for DBSCAN (density-based spatial clustering of applications with noise), which could do this. Am I missing any existing functionality?

Wish list item: density based clustering

[Image: the example data]

Script to create example data table:

New Table( "Cluster Circles",
	Add Rows( 500 ),
	New Script(
		"x by y",
		Graph Builder(
			Size( 527, 456 ),
			Show Control Panel( 0 ),
			Variables( X( :x ), Y( :y ), Color( :actual cluster ) ),
			Elements( Points( X, Y, Legend( 5 ) ) )
		)
	),
	New Column( "x",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( :actual cluster == 1, 2, 1 ) * :random noise * Cos( :random ) )
	),
	New Column( "y",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( :actual cluster == 1, 2, 1 ) * :random noise * Sin( :random ) )
	),
	New Column( "random",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( Random Uniform( 0, 2 * Pi() ) )
	),
	New Column( "random noise",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( Random Normal( 1, 0.05 ) )
	),
	New Column( "actual cluster",
		Numeric,
		"Nominal",
		Format( "Best", 12 ),
		Formula( Floor( Random Uniform( 0, 1.99999 ) ) ),
		Set Selected
	)
)
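For anyone who wants to try other tools on the same data outside JMP, here is a rough Python sketch of what the table script above generates. It is an approximation, not a port: `make_rings` is a hypothetical helper name, and Python's `random` module will not reproduce JMP's random streams.

```python
import math
import random


def make_rings(n=500, seed=None):
    """Two noisy concentric rings, loosely mirroring the 'Cluster Circles' script.

    Returns a list of (x, y, actual_cluster) tuples: cluster 1 lies near
    radius 2 and cluster 0 near radius 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        cluster = math.floor(rng.uniform(0, 1.99999))  # 0 or 1, like the JSL formula
        angle = rng.uniform(0, 2 * math.pi)             # the "random" column
        noise = rng.gauss(1, 0.05)                      # the "random noise" column
        scale = 2 if cluster == 1 else 1
        rows.append((scale * noise * math.cos(angle),
                     scale * noise * math.sin(angle),
                     cluster))
    return rows


if __name__ == "__main__":
    data = make_rings(seed=1)
    print(len(data), "rows; first row:", data[0])
```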

 

I know I could train a model to do this if I identified the relationship visually and just wanted to score points against it, but I want an unsupervised method.

Here is another example dataset:

[Image: the second example dataset]

New Table( "Cluster Circles 2",
	Add Rows( 500 ),
	New Script(
		"x by y",
		Graph Builder(
			Size( 527, 456 ),
			Show Control Panel( 0 ),
			Variables( X( :x ), Y( :y ), Color( :actual cluster ) ),
			Elements( Points( X, Y, Legend( 5 ) ) )
		)
	),
	New Column( "x",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			If( :actual cluster == 1,
				2 * :random noise * Cos( :random ),
				1 * :random noise * Cos( :random / 2 ) + 0.1
			)
		)
	),
	New Column( "y",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			If( :actual cluster == 1,
				2 * :random noise * Sin( :random ),
				1 * :random noise * Sin( :random / 2 ) + 0.1
			)
		),
		Set Selected
	),
	New Column( "random",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( Random Uniform( 0, 0.8 * Pi() ) )
	),
	New Column( "random noise",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( Random Normal( 1, 0.1 ) )
	),
	New Column( "actual cluster",
		Numeric,
		"Nominal",
		Format( "Best", 12 ),
		Formula( Floor( Random Uniform( 0, 1.99999 ) ) )
	)
)

 

 

8 replies
Thierry_S
Super User

Re: Density based clustering?

Hi ih,
Have you considered transforming your data into polar coordinates and then clustering on R and alpha? At least with your example data, it does a pretty good job.
Cheers,
TS
Thierry R. Sornasse
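A minimal Python sketch of the polar-coordinate idea, assuming two rings centered on the origin as in the first example. `to_polar` and `two_means_1d` are hypothetical helpers; for simplicity the clustering uses R alone (a plain two-center k-means), since in this data alpha is spread uniformly over both rings and carries no separating information.

```python
import math


def to_polar(points):
    """Convert (x, y) pairs to (r, alpha) pairs, alpha in radians between -pi and pi."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]


def two_means_1d(values, iters=100):
    """Lloyd's k-means with k = 2 on a single variable; returns 0/1 labels."""
    c0, c1 = min(values), max(values)  # start the two centers at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not g0 or not g1:
            break  # degenerate case, e.g. all values equal
        new_c0, new_c1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_c0, new_c1) == (c0, c1):
            break  # centers stopped moving
        c0, c1 = new_c0, new_c1
    return labels


if __name__ == "__main__":
    angles = [i * 2 * math.pi / 40 for i in range(40)]
    rings = ([(math.cos(t), math.sin(t)) for t in angles]
             + [(2 * math.cos(t), 2 * math.sin(t)) for t in angles])
    radii = [r for r, _alpha in to_polar(rings)]
    print(two_means_1d(radii))
```

The transform needs the ring center chosen up front, which is only obvious here because the example data is built around the origin.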
Craige_Hales
Super User

Re: Density based clustering?

Try Nonpar Density in the Bivariate platform,

then select points by density.

[Screenshot]

Craige
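The idea behind selecting points by density can be sketched in Python under a few assumptions: a plain Gaussian kernel with a hypothetical fixed `bandwidth`, evaluated at every data point by brute force, and a hypothetical `keep_fraction` controlling how many of the densest points get selected. JMP's Nonpar Density uses its own smoother and settings, so this only illustrates the concept.

```python
import math


def kde_at_points(points, bandwidth):
    """Gaussian kernel density estimate at each input point (brute force, O(n^2))."""
    n = len(points)
    norm = 1.0 / (n * 2 * math.pi * bandwidth ** 2)  # 2-D Gaussian normalizing constant
    densities = []
    for xi, yi in points:
        total = 0.0
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (2 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


def select_by_density(points, bandwidth, keep_fraction):
    """Indices (ascending) of the densest `keep_fraction` of the points."""
    densities = kde_at_points(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: densities[i], reverse=True)
    k = max(1, round(keep_fraction * len(points)))
    return sorted(order[:k])
```

Because the selection is a single threshold on density, the dense parts of every cluster are picked together; it flags dense regions but does not say which cluster each point belongs to.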
Craige_Hales
Super User

Re: Density based clustering?

Also, the hierarchical single-linkage method worked perfectly on the disjoint rings, though not on the second example.

Craige
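Single linkage can be sketched by cutting the dendrogram at a distance rather than at a number of clusters: the clusters at height `cutoff` are exactly the connected components of the graph that links every pair of points closer than `cutoff`. The Python sketch below (hypothetical `single_linkage_cut`, brute force over all pairs) collects those components with a union-find.

```python
import math


def single_linkage_cut(points, cutoff):
    """Flat single-linkage clusters: components of the 'distance <= cutoff' graph.

    Returns labels 0, 1, 2, ... in order of first appearance. Brute force
    over all O(n^2) pairs.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two components

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

This chaining is why disjoint rings separate cleanly: each ring is a chain of close neighbors, and the gap between rings is larger than any step along a ring. When the gap between groups is not clearly larger than the spacing within them, a single short link is enough to merge them.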
ih
Super User (Alumni)

Re: Density based clustering?

Good find, @Craige_Hales,

I was confused about how I had missed this after trying all the methods, but now I see there is a bug in the platform: saving the clusters gives a good separation of the rings, but saving the cluster formula gives a different result. As you mentioned, while I will find this useful, it doesn't quite work for my actual application.

[Screenshot]

Thank you!

ih
Super User (Alumni) ih
Super User (Alumni)

Re: Density based clustering?

Hi @Craige_Hales,

I did not know this feature existed! Unfortunately, it seems to select the high-density regions of all clusters, which, while handy, isn't quite what I am looking for.


Thanks!

ih
Super User (Alumni) ih
Super User (Alumni)

Re: Density based clustering?

Hi, and thank you, @Thierry_S,

Yes, transforming to polar coordinates would indeed give a great separation for both of these datasets. The problem is that this only works for the example data, and it is really closer to a supervised method than I'm hoping for. I used the rings because they illustrate my struggle with existing methods, but the real data is messier and has more dimensions, and a simple transformation wouldn't do the job.

As a plug to get more folks to vote for the density-based clustering wish list item, check out how it compares to other methods at the link below. I can attest to seeing similar performance in some situations.

https://towardsdatascience.com/how-dbscan-works-and-why-should-i-use-it-443b4a191c80

Craige_Hales
Super User

Re: Density based clustering?

The Wikipedia pseudocode and JMP's KDTable function look like they could do this. It might be pretty fast, since the KDTable lookup would be doing a lot of the work.

@DonMcCormack, this could be a good challenge, especially the part where it is turned into an add-in with a proper launch dialog and a place in the menu system.

Craige
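A KDTable-style lookup would help because DBSCAN spends most of its time on region queries: find every point within eps of a given point. JMP's KDTable messages are not reproduced here; instead this Python sketch swaps in a simpler spatial index, a uniform grid with cell size eps (hypothetical `GridIndex` class), which narrows each query to the 3 × 3 block of cells around the point. A k-d tree, which KDTable presumably builds, serves the same purpose.

```python
import math
from collections import defaultdict


class GridIndex:
    """Buckets 2-D points into square cells of side eps.

    Any point within eps of a query point lies in the query point's cell or
    one of the 8 surrounding cells, so a query only scans those cells.
    """

    def __init__(self, points, eps):
        self.points = points
        self.eps = eps
        self.cells = defaultdict(list)
        for i, p in enumerate(points):
            self.cells[self._cell(p)].append(i)

    def _cell(self, p):
        return (math.floor(p[0] / self.eps), math.floor(p[1] / self.eps))

    def region_query(self, i):
        """Sorted indices of all points within eps of point i (including i)."""
        cx, cy = self._cell(self.points[i])
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in self.cells.get((cx + dx, cy + dy), []):
                    if math.dist(self.points[i], self.points[j]) <= self.eps:
                        found.append(j)
        return sorted(found)
```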
Craige_Hales
Super User

Re: Density based clustering?

The Wikipedia algorithm seems to work. I'm not sure why it needs to use unusual notation like

SeedSet S := N \ {P} 

but maybe that's standard set notation for "remove P from the set N"?
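On the notation: `N \ {P}` is set difference, the neighborhood N with P itself removed, so the seed set starts as P's neighbors minus P. Below is a minimal Python sketch that follows the structure of the Wikipedia pseudocode (cluster labels 1, 2, ..., and -1 for noise). `range_query` is a brute-force stand-in that, like the pseudocode's RangeQuery, includes the query point itself, and the growing `seeds` list plus the `in_seeds` set play the role of the seed set S that expands while it is being iterated.

```python
import math

NOISE = -1
UNDEFINED = None


def range_query(points, i, eps):
    """All indices within eps of point i, including i itself (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [UNDEFINED] * len(points)
    c = 0  # cluster counter
    for p in range(len(points)):
        if labels[p] is not UNDEFINED:
            continue  # already processed
        neighbors = range_query(points, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later be relabeled as a border point
            continue
        c += 1
        labels[p] = c
        seeds = [q for q in neighbors if q != p]  # SeedSet S := N \ {P}
        in_seeds = set(seeds)
        k = 0
        while k < len(seeds):  # S can grow while we walk through it
            q = seeds[k]
            k += 1
            if labels[q] == NOISE:
                labels[q] = c  # noise becomes a border point of cluster c
            if labels[q] is not UNDEFINED:
                continue
            labels[q] = c
            q_neighbors = range_query(points, q, eps)
            if len(q_neighbors) >= min_pts:  # q is a core point: S := S ∪ N
                for r in q_neighbors:
                    if r not in in_seeds:
                        in_seeds.add(r)
                        seeds.append(r)
    return labels
```

Every call to `range_query` scans all points, so the whole run is O(n²) distance computations; a spatial index such as a k-d tree is what would bring that down for larger tables.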

I ran it a bunch of times on random data and got results like this for a ring and 10 randomly placed clusters:

[Image caption: always using the same eps and minpts; one time the ring separated in half]

It might get too slow at around 10,000 points, depending on the data and the eps parameter. I'm not sure it qualifies as unsupervised; I spent a fair amount of time picking eps and minpts to get these graphs to work this well.

Craige
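One commonly used aid for the eps/minpts tuning described above is the sorted k-distance plot suggested in the original DBSCAN paper by Ester et al.: compute each point's distance to its k-th nearest neighbor, sort the values in descending order, and look for the "knee" where they jump. A brute-force Python sketch (hypothetical `k_distances` helper; taking k = minpts - 1 assumes minpts counts the point itself, as in the Wikipedia pseudocode) might be:

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, largest first.

    Brute force, O(n^2 log n). Requires 1 <= k < len(points).
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


if __name__ == "__main__":
    line = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
    print(k_distances(line, 1))  # [7.0, 1.0, 1.0, 1.0, 1.0]: the isolated point stands out
```

This only narrows the choice of eps for a given minpts; picking the knee is still a judgment call, so it does not fully remove the concern about the method being unsupervised.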