Re: How do I declutter K Means Cluster in JMP Scripting

RyMcQeN · Oct 13, 2017 06:58 PM

Hello,

I have a JMP script that matches a pair of Cartesian coordinate data sets using K Means Cluster. The script works well for data sets that have an equal number of coordinates. I am trying to extend it to handle cases where one of the data sets has missing coordinates, or fewer coordinates than the other. I can do this manually in the K Means Cluster dialog window by first using 'Declutter', then selecting the outliers and excluding them. I would like to automate this process in the JMP script. My plan is to run 'Declutter' with the number of nearest neighbors limited to 1, use 'Save NN Distances' to save the distances to a column, exclude the rows whose distance is more than 3 sigma from the mean, and then run the cluster analysis on the remaining coordinates.

I am not sure how to do this in JMP scripting. The outliers must be identified by nearest-neighbor distance and excluded before the clustering runs, and this is where I am stuck.
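One possible workaround, offered as an untested sketch rather than a confirmed recipe, is to skip Declutter and compute each point's nearest-neighbor distance directly in JSL with matrix operations. The script then excludes the rows more than 3 standard deviations above the mean and only afterwards launches K Means Cluster. Because it uses only basic matrix functions, it should not depend on the JMP 13 Explore Outliers platform. Several pieces are hypothetical and only there so the sketch runs on its own: the demo table (ten matched pairs plus one unmatched point in row 21), the 1e300 placeholder for the self-distance, and `nClusters = 10`. It also assumes plain Euclidean distance on the unscaled :X and :Y columns (in line with Columns Scaled Individually(0)), which may not be exactly what Declutter reports.

```jsl
// Hypothetical demo data: ten matched pairs (one point from each data set)
// stacked in :X, :Y, plus one unmatched point in row 21
dt = New Table( "NN outlier demo",
	Add Rows( 21 ),
	New Column( "X", Numeric, "Continuous" ),
	New Column( "Y", Numeric, "Continuous" )
);
For( i = 1, i <= 10, i++,
	dt:X[2 * i - 1] = 5 * i;       // point from the first data set
	dt:Y[2 * i - 1] = 0;
	dt:X[2 * i] = 5 * i + 0.1;     // its partner from the second data set
	dt:Y[2 * i] = 0.1;
);
dt:X[21] = 100;                    // point with no partner
dt:Y[21] = 100;

// Read the coordinates into column vectors
x = dt:X << Get Values;
y = dt:Y << Get Values;
n = N Rows( x );

// Brute-force nearest-neighbor (K = 1) distance for every row
nn = J( n, 1, . );
For( i = 1, i <= n, i++,
	d2 = (x - x[i]) :* (x - x[i]) + (y - y[i]) :* (y - y[i]);  // squared distances to row i
	d2[i] = 1e300;                 // ignore each point's zero distance to itself
	nn[i] = Sqrt( Min( d2 ) );
);

// Flag rows whose NN distance is more than 3 standard deviations above the mean
cutoff = Mean( nn ) + 3 * Std Dev( nn );
outlierRows = Loc( nn > cutoff );

// Exclude the flagged rows in the current data table (dt)
For( k = 1, k <= N Rows( outlierRows ), k++,
	Excluded( Row State( outlierRows[k] ) ) = 1
);

// Cluster only the remaining (non-excluded) rows
nClusters = 10;                    // hypothetical: row count of the smaller data set
obj = dt << K Means Cluster(
	Y( :X, :Y ),
	Number of Clusters( nClusters ),
	Columns Scaled Individually( 0 )
);
obj << Go;
```

The double loop is O(n²), which should be acceptable for a few thousand points. A 3-sigma rule also needs enough points. With n points and the sample standard deviation, a single point can lie at most (n - 1)/sqrt(n) standard deviations from the mean, so with fewer than 11 points nothing can ever be flagged. That is why the demo uses 21 rows. With real data, replace the demo table with the stacked coordinate table and keep the distance, exclusion and clustering steps.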

Would it be easier to leave the K Means Cluster dialog up with the Declutter plot and let the user highlight the outliers, exclude them, and then run the clustering algorithm? If so, is it possible for the script to pause while the user performs these actions and then continue after the cluster function is complete? I have additional actions that are performed on the cluster result.
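For the interactive route, a common JSL pattern avoids pausing the script altogether. Everything that must run after the user's edits goes into a function, and a Button Box in a non-modal window calls that function. A `<<Modal` window would pause the script, but as far as I know it also blocks clicks in other JMP windows, so the user could not select and exclude points while it is open. This sketch makes several stand-in assumptions. The Cytometry.jmp sample table and its :CD3/:CD8 columns replace the real data, a plain Graph Builder scatterplot replaces the Declutter plot, and `continueAnalysis` and `nClusters = 3` are hypothetical.

```jsl
// Stand-in data; substitute your own table and :X, :Y columns
dt = Open( "$SAMPLE_DATA/Cytometry.jmp" );
nClusters = 3;   // hypothetical; the real script derives this from the smaller data set

// Step 1: show the points so the user can select and exclude outliers by hand
dt << Graph Builder(
	Show Control Panel( 0 ),
	Variables( X( :CD3 ), Y( :CD8 ) ),
	Elements( Points( X, Y ) )
);

// Step 2: everything that must wait for the user lives in this function;
// rows excluded by then are left out of the clustering
continueAnalysis = Function( {},
	obj = dt << K Means Cluster(
		Y( :CD3, :CD8 ),
		Number of Clusters( nClusters ),
		Columns Scaled Individually( 0 )
	);
	obj << Go;
	// ...additional actions on the cluster result go here...
);

// Step 3: this non-modal window returns immediately, so the script ends here;
// the remaining work runs only when the user clicks Continue
New Window( "Exclude outliers",
	Text Box( "Select the outliers in the graph, choose Rows > Exclude/Unexclude, then click Continue." ),
	Button Box( "Continue", continueAnalysis() )
);
```

With this design, the follow-up actions on the cluster result belong inside `continueAnalysis` rather than after the New Window call.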

Below is a snippet of my current K Means Cluster call. The nClusters variable is set to the number of rows in the data set with the fewest coordinates.

obj = K Means Cluster(
	Y( :X, :Y ),
	Number of Clusters( nClusters ),
	Columns Scaled Individually( 0 )
);
obj << Declutter( 1, 1 );
obj << Go;

Any help would be greatly appreciated.

-Ry

ih · Oct 17, 2017 11:53 PM

In JMP 13 I believe you can do this with the Explore Outliers screening platform:

//Use sample data
dt = Open( "$SAMPLE_DATA/Cytometry.jmp" );

//Find outliers using KNN
outliers = dt << Explore Outliers(
	Y( :CD3, :CD8 ),
	Name( "Multivariate k-Nearest Neighbor Outliers" )(K( 1 ))
);

//Save the distance to the nearest point
outliers << Save NN Distances;

//Make a column indicating that the point is an outlier.  You could skip this and select the points over a certain value directly.
dt << New Column( "Is Outlier",
	Numeric, "Nominal",
	Formula(
		If(
			:Nearest 1 Distance > Col Mean( :Nearest 1 Distance ) + 3 *
			Col Std Dev( :Nearest 1 Distance ),
			1,
			0
		)
	),
	Value Labels( {0 = "No", 1 = "Yes"} ), Use Value Labels( 1 )
);

//show which points will be excluded
dt << Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( Transform Column( "Row", Formula( Row() ) ) ),
		Y( :Nearest 1 Distance ),
		Color( :Is Outlier )
	),
	Elements( Points( X, Y, Legend( 14 ) ) )
);

//Uncomment to Hide and exclude outliers
//dt << select where( :Is Outlier == 1 );
//dt << hide and exclude;

RyMcQeN · Oct 18, 2017 01:14 PM

Hello ih,

I'm not sure this function is available in JMP 12, which is what I am using. When I run your code, the "Explore Outliers" step produces no result, and Save NN Distances does not create a column.

Regards,

Ry

ih · Oct 18, 2017 07:32 PM

I don't remember exactly how this looked in JMP 12, but John Sall referenced it here, so I expect it can be done. Can you check Cols > Modeling Utilities? Hopefully you can run the same analysis through that platform.

RyMcQeN · Oct 20, 2017 07:34 PM

Hello ih,

Thank you. I will explore this function further.

Best regards,

Ry