markschwab
Level IV

Adding penalties to number of splits in Partition platform to reduce number of leaves?

I have recently been spending a lot of time in the Partition platform, and currently I'm looking for a way to automate the analyses.

 

My issue is this: I generally split the data set 50/50 into training and validation and hit "Go", which maximizes the Validation RSquare. But this often leads to models with more splits than are really credible. In the example here, it creates 5 splits to reach a Validation RSquare of 0.288, when just two splits would give an almost equally good fit of 0.27.

[Screenshot: markschwab_1-1636045161149.png]

 

As a result, I generally have to prune the splits manually to get an acceptably good Validation RSquare with an acceptably low number of model terms.

 

Is there a way to ask JMP to apply a penalty, i.e. to create some sort of "regularized" Validation RSquare that penalizes the model selection criterion based on the number of splits? E.g., the "regularized" Validation RSquare could be [Validation RSquare] - 0.05*[N Splits]. Or maybe there's a better way to regularize (e.g. continue adding splits only if each one models at least 5% of the residual variance, rather than 5% of the absolute variance).

 

JSL solutions are welcome, as this would primarily be for automated analyses.
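
To make the second idea concrete, here is a minimal JSL sketch of the "5% of the residual variance" stopping rule applied to a vector of Validation RSquare values. The numbers are made up for illustration (only the 2-split and 5-split values echo the example above), and the names rsq, minGain, keep, and prev are hypothetical, not anything built into the Partition platform.

```jsl
Names Default To Here( 1 );

// Hypothetical Validation RSquare after 1, 2, ..., 5 splits (column vector).
// Only 0.27 (2 splits) and 0.288 (5 splits) come from the example; the rest are invented.
rsq = [0.20, 0.27, 0.275, 0.28, 0.288];

minGain = 0.05; // assumed threshold: a split must explain at least 5% of the remaining variance

keep = 0; // number of splits retained so far
prev = 0; // RSquare of the model with no splits
For( k = 1, k <= N Rows( rsq ), k++,
	// Share of the still-unexplained variance that split k accounts for
	gain = (rsq[k] - prev) / (1 - prev);
	If( gain < minGain, Break() );
	keep = k;
	prev = rsq[k];
);

Show( keep );
```

With these invented values the third split explains less than 1% of the remaining variance, so the loop stops after two splits.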


3 REPLIES
SDF1
Super User

Re: Adding penalties to number of splits in Partition platform to reduce number of leaves?

Hi @markschwab ,

 

  Regarding your concern, is it that the number of splits should be really credible, or that it should be a reasonably low number of splits? I believe the maximum possible number of splits is the number of rows in the data table minus 1, but I'm not 100% sure on that.

 

  Anyway, I'm including a snippet of code that you can use to do your automated partitioning. I did this with the Boston Housing.jmp file.

 

  If you're familiar with JSL, you should be able to modify it for your needs. The code is fairly basic and assumes you have a numeric, continuous response column. It runs the partitioning for each number of splits from 1 up to the maximum you set in the launch window and saves the fit statistics for each run. You can easily create your own column formula to get a "penalized" Validation RSquare, for example, or penalize the partition however you choose. You can then select which partition is best by your criteria and re-run it with that fixed number of splits. I have some ideas of how to fancy up the program, but that's for later.
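
As a rough illustration of the "penalized" column idea, here is a minimal sketch. It uses a small stand-in table with the same column names as the results table that the script builds. The table name Partition_Results_Demo, the variable penalty, and all of the RSquare values are hypothetical and only there to make the example self-contained.

```jsl
Names Default To Here( 1 );

// Stand-in for the Partition_Results table; the RSquare values are invented
dt_demo = New Table( "Partition_Results_Demo",
	Add Rows( 5 ),
	New Column( "Splits", Numeric, Continuous, Set Values( [1, 2, 3, 4, 5] ) ),
	New Column( "Validation RSquare", Numeric, Continuous,
		Set Values( [0.20, 0.27, 0.275, 0.28, 0.288] )
	)
);

penalty = 0.05; // assumed cost per split; change to taste

// Eval( Eval Expr() ) substitutes the current value of penalty into the stored formula
Eval(
	Eval Expr(
		dt_demo << New Column( "Penalized Valid RSquare", Numeric, Continuous,
			Formula( :Validation RSquare - Expr( penalty ) * :Splits )
		)
	)
);
dt_demo << Run Formulas;

// Pick the row with the largest penalized value
pen = Column( dt_demo, "Penalized Valid RSquare" ) << Get Values;
best = Loc Max( pen );
Show( dt_demo:Splits[best] );
```

With these made-up numbers the penalized value peaks at 2 splits, because each additional split costs 0.05 and the later splits gain less than that.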

 

  On a side note, you might consider creating a validation column (if you don't have Pro, you can find out how to do this by searching the Scripting Index) and stratifying it by your response column. This gives the training and validation sets a more equal representation of the response when you use something other than a 50/50 split, so you can put more of the data into training your partition.

 

Hope this helps!

DS

Names Default To Here( 1 );

lbWidth = 168;

//dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Expression to store the current settings in global variables
recallRolesS1 = Expr(
	::dtchosenRecall = dtchosen
);

// Expression to clear all current settings. KW: Clear
clearRoles = Expr(
	Try(
		colListY << RemoveAll;
		ColListX << RemoveAll;
		ColListW << RemoveAll;
		ColListF << RemoveAll;
		colListV << RemoveAll;
	)
);

// Expression to store the current settings in global variables
recallRoles = Expr(
	::ycolRecall = colListY << GetItems;
	::xcolRecall = colListX << GetItems;
	::WcolRecall = colListW << GetItems;
	::FcolRecall = colListF << GetItems;
	::vcolRecall = colListV << GetItems;
);

//Function to choose the data table to model. KW: choose_data
choose_data_table = Function( {},
	list = {};
	For( i = 1, i <= N Table(), i++,
		list[i] = Data Table( i ) << get name
	);
	win = New Window( "Select a data table",
		<<Modal,
		hb = H List Box(
			Panel Box( "Choose a data table", dt = List Box( list, max selected( 1 ), dtchosen = Data Table( (dt << get selected)[1] ) ) ), 

		)
	);
);

Part = Expr(

	dt_results = New Table( "Partition_Results",
		Add Rows( max_sp ),
		New Column( "Splits", Numeric, Continuous ),
		New Column( "Training RSquare", Numeric, Continuous ),
		New Column( "Validation RSquare", Numeric, Continuous ),
		New Column( "Training RASE", Numeric, Continuous ),
		New Column( "Validation RASE", Numeric, Continuous ),
		New Column( "Training N", Numeric, Continuous ),
		New Column( "Validation N", Numeric, Continuous )
	);
	
	For( i = 1, i <= max_sp, i++,
		dt_results:"Splits"n[i] = i
	);
	
	dt_name=dtchosen<<Get Name;
	
	For( i = 1, i <= N Rows( dt_results ), i++,
		Psplit = dt_results:"Splits"n[i];
		str = Eval Insert(
			"report = (dtchosen<<Partition(
				Y(Eval(ycols)),
				X(Eval(xcols)),
				Weight(Eval(wcols)),
				Freq(Eval(Fcols)),
				Validation(Eval(vcols)),
				Informative Missing(1),
				Split Best(^Psplit^),
				Invisible
				)
				
			)<<Report;"
		);
		Eval( Parse( str ) );
		
		w=Window(dt_name||" - "||"Partition of "|| ycols[1]);
		RSq_mat = w[Outline Box( "Partition for "|| ycols[1] )][Table Box( 1 )]<<Get As matrix;
		
		dt_results:"Training RSquare"n[i]=RSq_mat[1];
		dt_results:"Training RASE"n[i]=RSq_mat[2];
		dt_results:"Training N"n[i]=RSq_mat[3];
		dt_results:"Validation RSquare"n[i]=RSq_mat[2,1];
		dt_results:"Validation RASE"n[i]=RSq_mat[2,2];
		dt_results:"Validation N"n[i]=RSq_mat[2,3];
		Report<<CloseWindow;
	);
);

AutoPart = Expr(
	Partwin = New Window( "",
		<<Return Result,
		<<On Validate,
		Border Box( Left( 3 ), Top( 2 ),
			Outline Box( "JSL to help with Partitioning",
				<<Set Font Size( 12 ),
				V List Box(
					H List Box(
						Panel Box( "Select Columns",
							V List Box( colListData = Col List Box( dtchosen, All, Grouped, width( lbWidth ), nLines( 12 ) ) ),
							Spacer Box( Size( 0, 16 ) )
						),
						Panel Box( "Cast Selected Columns into Roles",
							Lineup Box( N Col( 2 ), Spacing( 3, 2 ),
								Button Box( "Y, Response", colListY << Append( colListData << GetSelected ) ),
								colListY = Col List Box( width( lbWidth + 40 ), nLines( 4 ), Min Items( 1 ) ),
								Button Box( "X, Factor", colListX << Append( colListData << GetSelected ) ),
								colListX = Col List Box( width( lbWidth ), nLines( 4 ), Min Items( 1 ) ),
								Button Box( "Weight", colListW << Append( colListData << GetSelected ) ),
								colListW = Col List Box(
									width( lbWidth ),
									Max Selected( 1 ),
									Max Items( 1 ),
									nLines( 1 ),
									<<Set Data Type( "Numeric" )
								),
								Button Box( "Freq", colListF << Append( colListData << GetSelected ) ),
								colListF = Col List Box(
									width( lbWidth ),
									Max Selected( 1 ),
									Max Items( 1 ),
									nLines( 1 ),
									<<Set Data Type( "Numeric" )
								),
								Button Box( "Validation", colListV << Append( colListData << GetSelected ) ),
								colListV = Col List Box(
									width( lbWidth ),
									nLines( 1 ),
									Max Selected( 1 ),
									Max Items( 1 ),
									<<Set Data Type( "Numeric" ), 

								)
							)
						),
						Panel Box( "Action",
							Lineup Box( N Col( 1 ),
								Button Box( "OK",
									recallRoles;
									max_sp = max_sp_input << get;
									ycols = ColListY << Get Items;
									xcols = ColListX << Get Items;
									Wcols = ColListW << Get Items;
									Fcols = ColListF << Get Items;
									vcols = ColListV << Get items;
									Part;
									Partwin << Close Window;
								),
								Button Box( "Cancel", Partwin << Close Window ),
								Spacer Box( Size( 0, 22 ) ),
								Button Box( "Remove",
									colListY << RemoveSelected;
									colListX << RemoveSelected;
									colListW << RemoveSelected;
									colListF << RemoveSelected;
									colListV << RemoveSelected;
								),
								Button Box( "Recall",
									clearRoles;
									Try(
										colListY << Append( ::ycolRecall );
										colListX << Append( ::xcolRecall );
										colListW << Append( ::WcolRecall );
										colListF << Append( ::FcolRecall );
										colListV << Append( ::vcolRecall );
									);
								),
								Button Box( "Relaunch",
									FirstWin;
									Partwin << Close Window;
								),
								Spacer Box( Size( 0, 22 ) ),
								Button Box( "Help", Web( "https://www.jmp.com/en_ch/support/online-help-search.html?q=*%3A*" ) )
							)
						)
					),
					Panel Box( "Max partition splits", max_sp_input = Number Edit Box( 20, 6 ) )
				)
			)
		)
	)
);

//Interactive dialog window to start the partitioning automation
FirstWin = Expr(
	AutoTuneDlg1 = New Window( "Partitioning Automation",
		<<Return Result,
		<<On Validate,
		Border Box( Left( 3 ), top( 2 ),
			Outline Box( "Something to help simplify partitioning",
				<<Set Font Size( 12 ),
				H List Box(
					V List Box(
						Panel Box( "Select Data Table",
							H List Box( Button Box( "Select Data Table", choose_data_table ), Spacer Box( Size( 100, 0 ) ) )
						), 

					),
					Panel Box( "Action",
						Lineup Box( N Col( 1 ),
							Button Box( "OK",
								recallRolesS1;
								AutoPart;
								AutoTuneDlg1 << Close Window;
							),
							Button Box( "Cancel", AutoTuneDlg1 << Close Window ),
							Spacer Box( Size( 0, 25 ) ),
							Button Box( "Recall",
								// Restore the previously chosen data table, if one was stored
								Try( dtchosen = ::dtchosenRecall )
							),
							Button Box( "Help", Web( "https://www.jmp.com/en_ch/support/online-help-search.html?q=*%3A*" ) )
						)
					)
				)
			)
		)
	)
);

FirstWin;
markschwab
Level IV

Re: Adding penalties to number of splits in Partition platform to reduce number of leaves?

Thanks @SDF1 for the detailed response! It will take me some time to digest the script.

 

For now, would you be able to just give me a quick primer with an example of how to use it? Here is what I tried and I am getting an error:

1) Uncomment the line "dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );" and run the script

2) Click "Select Data Table" -> "Boston Housing" -> OK

3) Under "Action", click "OK"

4) In the window "JSL to help with Partitioning" I cast "nox" into Y and "age" into X, and hit "OK"

5) I see JMP Alert "Subscript problem{20} in access or evaluation of 'w[Outline Box("Partition for " || ycols[1])]' , w[/*###*/Outline Box( "Partition for " || ycols[1] )]". The table "Partition_Results" is null except for the column "Splits" which ranges from 1-20.

SDF1
Super User

Re: Adding penalties to number of splits in Partition platform to reduce number of leaves?

Hi @markschwab ,

 

  You are using the code as intended; I had just made some guesses that weren't generic enough to work under every circumstance. I thought you had a validation column, and so scripted it to run with one. The updated script below works around that by letting you enter a validation portion, just like in the Partition launch window. But don't use both a validation column and a validation portion: the script hasn't been coded to account for that yet and, again, is not generic enough for all possible user inputs.

 

  As for the error you were getting, it was again because I assumed there was more than one X factor and scripted the reference to the report window accordingly. That is now fixed, so it doesn't matter whether there is only one X or multiple X's.
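
For anyone adapting this, one alternative that avoids depending on the window title at all is to index the report handle returned by the platform. This is only a sketch of that idea, not what the script here does. It assumes the first Table Box in the Partition report is the same fit statistics table the script reads, and it uses nox and age from Boston Housing as in the example above.

```jsl
Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Launch Partition with a fixed number of splits and a random 50% holdback
obj = dt << Partition(
	Y( :nox ),
	X( :age ),
	Validation Portion( 0.5 ),
	Split Best( 2 ),
	Invisible
);

// Use the report handle directly instead of looking the window up by name
rpt = obj << Report;

// Assumption: Table Box( 1 ) is the RSquare / RASE / N table, as in the script
RSq_mat = rpt[Table Box( 1 )] << Get As Matrix;
Show( RSq_mat );

rpt << Close Window;
```

Because the validation portion is drawn at random, the values in RSq_mat will differ from run to run.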

 

  Again, this script is not generic enough to handle a non-numeric or non-continuous response column, Y. If the response is categorical, the script is not designed to account for that, although it can easily be modified to do so if need be.

 

Hope this helps,

DS

 

UPDATE: OK, so I couldn't help myself and coded some more: I included some penalized RSquare columns and a column for the difference between training and validation RSquare. You can change the formula and the coefficient of the penalty by editing the code; I set it to your concept of RSquare - 0.05*Splits. I also included some code to display the results graphically once the calculations finish.

 

UPDATE 2: I went back and made the code a little more fun. Now you can see the table results alongside the graph, and by selecting the point on the graph you like best, you can re-run the partition for that specific number of splits when you click "OK". As before, I haven't made it super-generic for any kind of user input, but I tried to make it work for numeric, continuous responses. What you now have should be enough to get you going.

 

P.S. As you learn the code, use the Scripting Index in JMP's Help menu. It's very helpful in understanding some of the calls and how to use them when scripting.

Names Default To Here( 1 );

lbWidth = 168;

dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Expression to store the current settings in global variables
recallRolesS1 = Expr(
	::dtchosenRecall = dtchosen
);

// Expression to clear all current settings. KW: Clear
clearRoles = Expr(
	Try(
		colListY << RemoveAll;
		ColListX << RemoveAll;
		ColListW << RemoveAll;
		ColListF << RemoveAll;
		colListV << RemoveAll;
	)
);

// Expression to store the current settings in global variables
recallRoles = Expr(
	::ycolRecall = colListY << GetItems;
	::xcolRecall = colListX << GetItems;
	::WcolRecall = colListW << GetItems;
	::FcolRecall = colListF << GetItems;
	::vcolRecall = colListV << GetItems;
	::max_spRecall = max_sp_input << get;
	::partVPRecall = partVP_input << Get;
);

//Function to choose the data table to model. KW: choose_data
choose_data_table = Function( {},
	list = {};
	For( i = 1, i <= N Table(), i++,
		list[i] = Data Table( i ) << get name
	);
	win = New Window( "Select a data table",
		<<Modal,
		hb = H List Box(
			Panel Box( "Choose a data table", dt = List Box( list, max selected( 1 ), dtchosen = Data Table( (dt << get selected)[1] ) ) ), 

		)
	);
);

Part_rerun=Expr(
	
	Close(dt_results, No Save);
	
	NPCs_num=NPCs[1];
	
	str = Eval Insert(
			"report = (dtchosen<<Partition(
				Y(Eval(ycols)),
				X(Eval(xcols)),
				Weight(Eval(wcols)),
				Freq(Eval(Fcols)),
				Validation(Eval(vcols)),
				Informative Missing(1),
				Split Best(^NPCs_num^),
				Validation Portion(^partVP^),
				Split History(1)
				)
				
			)<<Report;"
		);
		Eval( Parse( str ) );
	
);

Part_TB = Expr(

	Part_TBWin = New Window( "Select the Best split to re-run",
		<<Return Result,
		<<On Validate,
		Outline Box( "Partitioning Results (Select the best to re-run)",
			H List Box(
				If(
					N Items( vcols ) == 1 | (N Items( vcols ) == 0 & Is Missing( PartVP ) == 0),
						part_gb = Graph Builder(
							Size( 579, 417 ),
							Show Control Panel( 0 ),
							Variables(
								X( :Splits ),
								Y( :Training RSquare ),
								Y( :Validation RSquare, Position( 1 ) ),
								Y( :RSquare Diff ),
								Y( :Penalized Valid RSquare, Position( 2 ) )
							),
							Elements( Position( 1, 1 ), Line( X, Y( 1 ), Y( 2 ), Legend( 12 ) ), Points( X, Y( 1 ), Y( 2 ), Legend( 14 ) ) ),
							Elements( Position( 1, 2 ), Line( X, Y( 1 ), Y( 2 ), Legend( 13 ) ), Points( X, Y( 1 ), Y( 2 ), Legend( 15 ) ) ),
							SendToReport(
								Dispatch(
									{},
									"graph title",
									TextEditBox,
									{Set Text( "Training & Validation RSquare / RSquare Diff & Penalized Valid RSquare vs. Splits" )}
								)
							)
						),
					N Items( vcols ) == 0 & Is Missing( PartVP ) == 1,
						part_gb = Graph Builder(
							Size( 531, 456 ),
							Show Control Panel( 0 ),
							Variables( X( :Splits ), Y( :Training RSquare ), Y( :Penalized Train RSquare ) ),
							Elements( Position( 1, 1 ), Line( X, Y, Legend( 7 ) ), Points( X, Y, Legend( 9 ) ) ),
							Elements( Position( 1, 2 ), Line( X, Y, Legend( 8 ) ), Points( X, Y, Legend( 11 ) ) )
						)
				),
				dt_tb = dt_results << Get As Report(),
				Panel Box( "Action",
					Lineup Box( N Col( 1 ),
						Text Box( "Re-run Partition", <<Justify Text( "Center" ) ),
						Button Box( "OK",
							NPCs=dt_results<<Get selected rows();
							Part_rerun;
							Part_TBWin << Close Window;
						),
						Button Box( "Cancel", Part_TBWin << Close Window ),
						Spacer Box( Size( 0, 8 ) ),
						Button Box( "Relaunch",
							Part_TBWin << Close Window;
							Firstwin;
						),
						Spacer Box( Size( 0, 8 ) ),
						Button Box( "Help", Web( "https://www.jmp.com/en_ch/support/online-help-search.html?q=*%3A*" ) )
					)
				)
			)
		)
	);
);


Part = Expr(
	If(
		N Items( vcols ) == 1,
			dt_results = New Table( "Partition_Results",
				Add Rows( max_sp ),
				New Column( "Splits", Numeric, Continuous ),
				New Column( "Training RSquare", Numeric, Continuous ),
				New Column( "Validation RSquare", Numeric, Continuous ),
				New Column( "Training RASE", Numeric, Continuous ),
				New Column( "Validation RASE", Numeric, Continuous ),
				New Column( "Training N", Numeric, Continuous ),
				New Column( "Validation N", Numeric, Continuous ),
				New Column( "RSquare Diff", Numeric, Continuous, Formula( "Training RSquare"n - "Validation RSquare"n ) ),
				New Column( "Penalized Valid RSquare", Numeric, Continuous, Formula( "Validation RSquare"n - 0.05 * "Splits"n ) )
			),
		N Items( vcols ) == 0 & Is Missing( partVP ) == 0,
			dt_results = New Table( "Partition_Results",
				Add Rows( max_sp ),
				New Column( "Splits", Numeric, Continuous ),
				New Column( "Training RSquare", Numeric, Continuous ),
				New Column( "Validation RSquare", Numeric, Continuous ),
				New Column( "Training RASE", Numeric, Continuous ),
				New Column( "Validation RASE", Numeric, Continuous ),
				New Column( "Training N", Numeric, Continuous ),
				New Column( "Validation N", Numeric, Continuous ),
				New Column( "RSquare Diff", Numeric, Continuous, Formula( "Training RSquare"n - "Validation RSquare"n ) ),
				New Column( "Penalized Valid RSquare", Numeric, Continuous, Formula( "Validation RSquare"n - 0.05 * "Splits"n ) )
			),
		N Items( vcols ) == 0,
			dt_results = New Table( "Partition_Results",
				Add Rows( max_sp ),
				New Column( "Splits", Numeric, Continuous ),
				New Column( "Training RSquare", Numeric, Continuous ),
				New Column( "Training RASE", Numeric, Continuous ),
				New Column( "Training N", Numeric, Continuous ),
				New Column( "Penalized Train RSquare", Numeric, Continuous, Formula( "Training RSquare"n - 0.05 * "Splits"n ) )
			)
	);
	
	For( i = 1, i <= max_sp, i++,
		dt_results:"Splits"n[i] = i
	);
	
	dt_name = dtchosen << Get Name;
	
	For( i = 1, i <= N Rows( dt_results ), i++,
		Psplit = dt_results:"Splits"n[i];
		str = Eval Insert(
			"report = (dtchosen<<Partition(
				Y(Eval(ycols)),
				X(Eval(xcols)),
				Weight(Eval(wcols)),
				Freq(Eval(Fcols)),
				Validation(Eval(vcols)),
				Informative Missing(1),
				Split Best(^Psplit^),
				Validation Portion(^partVP^),
				Invisible
				)
				
			)<<Report;"
		);
		Eval( Parse( str ) );
		
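		// Locate the invisible report window by its title, then read the fit-statistics
		// table as a matrix. The assignments below assume its first three columns are
		// RSquare, RASE, and N, with row 1 = training and row 2 = validation (if present).
		partWinName = Report << Get Window Title;
		
		w = Window( partWinName );
		RSq_mat = w[Outline Box( "Partition for " || ycols[1] )][Table Box( 1 )] << Get As matrix;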
		
		If(
			N Items( vcols ) == 1,
				dt_results:"Training RSquare"n[i] = RSq_mat[1];
				dt_results:"Training RASE"n[i] = RSq_mat[2];
				dt_results:"Training N"n[i] = RSq_mat[3];
				dt_results:"Validation RSquare"n[i] = RSq_mat[2, 1];
				dt_results:"Validation RASE"n[i] = RSq_mat[2, 2];
				dt_results:"Validation N"n[i] = RSq_mat[2, 3];,
			N Items( vcols ) == 0 & Is Missing( PartVP ) == 0,
				dt_results:"Training RSquare"n[i] = RSq_mat[1];
				dt_results:"Training RASE"n[i] = RSq_mat[2];
				dt_results:"Training N"n[i] = RSq_mat[3];
				dt_results:"Validation RSquare"n[i] = RSq_mat[2, 1];
				dt_results:"Validation RASE"n[i] = RSq_mat[2, 2];
				dt_results:"Validation N"n[i] = RSq_mat[2, 3];,
			N Items( vcols ) == 0,
				dt_results:"Training RSquare"n[i] = RSq_mat[1];
				dt_results:"Training RASE"n[i] = RSq_mat[2];
				dt_results:"Training N"n[i] = RSq_mat[3];
		);
		Report << CloseWindow;
	);
	
	Part_TB;
);

AutoPart = Expr(
	Partwin = New Window( "",
		<<Return Result,
		<<On Validate,
		Border Box( Left( 3 ), Top( 2 ),
			Outline Box( "JSL to help with Partitioning",
				<<Set Font Size( 12 ),
				V List Box(
					H List Box(
						Panel Box( "Select Columns",
							V List Box( colListData = Col List Box( dtchosen, All, Grouped, width( lbWidth ), nLines( 12 ) ) ),
							Spacer Box( Size( 0, 16 ) )
						),
						Panel Box( "Cast Selected Columns into Roles",
							Lineup Box( N Col( 2 ), Spacing( 3, 2 ),
								Button Box( "Y, Response", colListY << Append( colListData << GetSelected ) ),
								colListY = Col List Box( width( lbWidth + 40 ), nLines( 4 ), Min Items( 1 ) ),
								Button Box( "X, Factor", colListX << Append( colListData << GetSelected ) ),
								colListX = Col List Box( width( lbWidth ), nLines( 4 ), Min Items( 1 ) ),
								Button Box( "Weight", colListW << Append( colListData << GetSelected ) ),
								colListW = Col List Box(
									width( lbWidth ),
									Max Selected( 1 ),
									Max Items( 1 ),
									nLines( 1 ),
									<<Set Data Type( "Numeric" )
								),
								Button Box( "Freq", colListF << Append( colListData << GetSelected ) ),
								colListF = Col List Box(
									width( lbWidth ),
									Max Selected( 1 ),
									Max Items( 1 ),
									nLines( 1 ),
									<<Set Data Type( "Numeric" )
								),
								Button Box( "Validation", colListV << Append( colListData << GetSelected ) ),
								colListV = Col List Box(
									width( lbWidth ),
									nLines( 1 ),
									Max Selected( 1 ),
									Max Items( 1 ),
									<<Set Data Type( "Numeric" )
								)
							)
						),
						Panel Box( "Action",
							Lineup Box( N Col( 1 ),
								Button Box( "OK",
									recallRoles;
									max_sp = max_sp_input << get;
									partVP = partVP_input << Get;
									ycols = ColListY << Get Items;
									xcols = ColListX << Get Items;
									Wcols = ColListW << Get Items;
									Fcols = ColListF << Get Items;
									vcols = ColListV << Get items;
									Part;
									Partwin << Close Window;
								),
								Button Box( "Cancel", Partwin << Close Window ),
								Spacer Box( Size( 0, 22 ) ),
								Button Box( "Remove",
									colListY << RemoveSelected;
									colListX << RemoveSelected;
									colListW << RemoveSelected;
									colListF << RemoveSelected;
									colListV << RemoveSelected;
								),
								Button Box( "Recall",
									clearRoles;
									Try(
										colListY << Append( ::ycolRecall );
										colListX << Append( ::xcolRecall );
										colListW << Append( ::WcolRecall );
										colListF << Append( ::FcolRecall );
										colListV << Append( ::vcolRecall );
										max_sp_input << Set( ::max_spRecall );
										partVP_input << Set( ::partVPRecall );
									);
								),
								Button Box( "Relaunch",
									FirstWin;
									Partwin << Close Window;
								),
								Spacer Box( Size( 0, 22 ) ),
								Button Box( "Help", Web( "https://www.jmp.com/en_ch/support/online-help-search.html?q=*%3A*" ) )
							)
						)
					),
					H List Box(
						Panel Box( "Max partition splits", max_sp_input = Number Edit Box( 20, 6 ) ),
						Panel Box( "Validation Portion (if no validation column)", partVP_input = Number Edit Box( ., 6 ) )
					)
				)
			)
		)
	)
);

// Interactive dialog window to start the partitioning automation
FirstWin = Expr(
	AutoTuneDlg1 = New Window( "Partitioning Automation",
		<<Return Result,
		<<On Validate,
		Border Box( Left( 3 ), top( 2 ),
			Outline Box( "Something to help simplify partitioning",
				<<Set Font Size( 12 ),
				H List Box(
					V List Box(
						Panel Box( "Select Data Table",
							H List Box( Button Box( "Select Data Table", choose_data_table ), Spacer Box( Size( 100, 0 ) ) )
						)
					),
					Panel Box( "Action",
						Lineup Box( N Col( 1 ),
							Button Box( "OK",
								recallRolesS1;
								AutoPart;
								AutoTuneDlg1 << Close Window;
							),
							Button Box( "Cancel", AutoTuneDlg1 << Close Window ),
							Spacer Box( Size( 0, 25 ) ),
							Button Box( "Recall",
								Try(
									MMObj << Set( ::MMObjRecall );
									Try( dtchosen = ::dtchosenRecall );
								)
							),
							Button Box( "Help", Web( "https://www.jmp.com/en_ch/support/online-help-search.html?q=*%3A*" ) )
						)
					)
				)
			)
		)
	)
);

FirstWin;