OnlyPractice
Level I

Repeating Normal Clustering with varying cluster number

Hi all,

I am new to JMP, especially to working with JSL, and I have been trying to write a script that automates Normal Clustering with a varying number of clusters, using a looping function such as For(). However, Normal Mixtures doesn't seem to understand variables and only takes literal numbers. I hope I can find a way to pass variables as the numeric arguments, or any other alternative.

 

My script looks like this:

nw = New Window( "clustering",
	For( i = 1, i < 5, i++,
		Normal Mixtures(
			Y( :column names1, :column names2, :column names3 ),
			{Mixtures Tolerance( 0.00000001 ), Mixtures MaxIter( 300 ),
			Mixtures N Starts( 30 ), Outlier Cluster( 0 ), Diagonal Variance( 0 ),
			Number of Clusters( i ), Go}
		)
	)
);

 

 

5 REPLIES
txnelson
Super User

Re: Repeating Normal Clustering with varying cluster number

I am not really sure what you are doing with Column Names1, 2 and 3, so I have guessed at their usage below. Here is an example script that works the way I think you want.

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Iris.jmp" );

nw = New Window( "clustering", vlb = V List Box() );
column names1 = "Sepal length";
column names2 = "Sepal width";
column names3 = "Petal length";
For( i = 1, i <= 4, i++,
	obj = dt << Normal Mixtures(
		Y(
			Column( column names1 ),
			Column( column names2 ),
			Column( column names3 )
		),
		Number of Clusters( i )
	);
	obj << Go;

	vlb << append( Report( obj ) );
	obj << close window;
);
Jim
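
For readers who still find that a platform argument is not picking up a variable's value, one common JSL idiom is to substitute the value into the call before running it. The following is only a minimal sketch under assumptions: the variable k and its value 3 are hypothetical, and whether this substitution is needed at all may depend on the JMP version and on the argument in question. It reuses the Iris sample table and the call form from the script above.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Iris.jmp" );

k = 3; // hypothetical cluster count, chosen only for illustration

// Eval Expr() replaces Expr( k ) with the current value of k,
// so the platform call is built with a literal number; Eval() then runs it.
Eval(
	Eval Expr(
		obj = dt << Normal Mixtures(
			Y( :Sepal length, :Sepal width, :Petal length ),
			Number of Clusters( Expr( k ) )
		)
	)
);
obj << Go;
```

The same pattern can be placed inside a For() loop by writing Expr( i ) in place of Expr( k ).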
OnlyPractice
Level I

Re: Repeating Normal Clustering with varying cluster number

Hi txnelson,

Thank you for your reply. It really helped solve some of my problems, but some are still unsolved.

In more detail, my intention was to write a script that chooses the best number of clusters from 1 to n (where n is an integer larger than 1 that the user enters in an add-in) and shows only the report for that optimal normal clustering.

With the help of txnelson's answer, I was able to write a script that extracts the best number of clusters (in other words, the one with the smallest BIC). But I faced another hurdle: running the normal clustering with only the extracted best cluster number. My script is as follows:

 

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Iris.jmp" );

j = 7;
nw2 = New Window( "clustering", vlb = V List Box() );
For( i = 1, i <= j, i++,
	obj = dt << Normal Mixtures(
		Y( :Sepal length, :Sepal width, :Petal length ),
		{Mixtures Tolerance( 0.00000001 ), Mixtures MaxIter( 300 ), Mixtures N Starts( 30 ), Outlier Cluster( 0 ),
		Diagonal Variance( 0 ), Number of Clusters( i )}
	);
	obj << Go;

	vlb << Append( Report( obj ) );
	obj << Close Window();
);

nw2[Table Box( 1 )] << Make Combined Data Table;
pr = Current Data Table();
mn = Col Minimum( :BIC );
r = pr << Get Rows Where( :BIC == mn );

dt << Normal Mixtures(
	Y( :Sepal length, :Sepal width, :Petal length ),
	{Mixtures Tolerance( 0.00000001 ), Mixtures MaxIter( 300 ), Mixtures N Starts( 30 ), Outlier Cluster( 0 ),
	Diagonal Variance( 0 ), Number of Clusters( r ), Go}
);
	

Even though I saved the chosen cluster number in the variable 'r', Normal Mixtures doesn't recognize that variable's value. It seems to me that Normal Mixtures accepts a variable as a number only when it is used inside a For() loop.

I hope there is a solution or an alternative for this problem.






txnelson
Super User

Re: Repeating Normal Clustering with varying cluster number

If you write out the value of "r" 

show( r );

It returns the value

r = [2];

The message

     << get rows where() 

returns a matrix of values, not a scalar value. If you wrote out the value of the first element in the matrix called "r",

Show( r[1] );

It would display

r[1] = 2;

So the solution to your issue is to reference the 1st element of the variable "r"

dt << Normal Mixtures(
	Y( :Sepal length, :Sepal width, :Petal length ),
	{Mixtures Tolerance( 0.00000001 ), Mixtures MaxIter( 300 ), Mixtures N Starts( 30 ), Outlier Cluster( 0 ),
	Diagonal Variance( 0 ), Number of Clusters( r[1] ), go}
);
Jim
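
The matrix-versus-scalar distinction described above can be checked directly. The sketch below is an illustration under assumptions, not part of the original answer: it uses the Big Class sample table and its height column as stand-ins for the combined BIC table, and the variable name best is hypothetical. It also guards against an empty result before indexing.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" ); // stand-in table, assumed for illustration

mn = Col Minimum( :height );
r = dt << Get Rows Where( :height == mn );

Show( Is Matrix( r ) ); // Get Rows Where() returns a matrix of row numbers
Show( N Rows( r ) );    // more than one row can match when values tie

// Index into the matrix only after confirming at least one row matched
If( N Rows( r ) > 0,
	best = r[1]; // first matching row number, now a scalar
	Show( Is Number( best ) ),
	Throw( "No row matched the minimum" )
);
```

Note that Get Rows Where() returns row numbers, not column values. In the script discussed in this thread, r[1] presumably equals the cluster count only because the loop starts at 1 and the combined table holds one row per fit, in loop order; if the loop started at a different value, the row number would need to be mapped back to the corresponding cluster count.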

OnlyPractice
Level I

Re: Repeating Normal Clustering with varying cluster number

Thank you so much!! It really helped.
I wasn't careful to distinguish between the scalar values and the vectors that specific functions return.