Solved: building multiple neural network models

abmayfield · Jun 9, 2023 2:59 PM

I am using the Neural platform to make predictions, and, being new to neural networks, I am only slightly familiar with all the input parameters: number of hidden layers, number of nodes per hidden layer, boosting models, learning rate, and tours. What I want to do is minimize RMSE and the validation misclassification rate. So far I have been changing each parameter one at a time, saving the model performance statistics, and pasting them into a new JMP table, but this is going to take days since there are so many combinations of layers, nodes, tours, etc. Would it be possible to write a script so that JMP Pro builds, say, 1,000 models and dumps the results into a table, so that I don't have to change each model input parameter manually?

#Hidden layers: 1 or 2

#Sigmoidal nodes: 0, 1, 2, 3, or 4

#Linear nodes: 0, 1, 2, 3, or 4

#Radial nodes: 0, 1, 2, 3, or 4

#boosting models: no idea, 1-5 maybe?

#learning rate: 0-0.5 in 0.1 increments

#tours: 1, 5, 10, 20, 100

So that would be 2 x 5 x 5 x 5 x 5 x 5 x 5 = 31,250, or roughly 30,000 (counting the learning rate as its six values, 0 to 0.5, it would be 37,500).
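As a quick check on that arithmetic, the sketch below multiplies the number of levels for each setting, which is the size of a full factorial grid. It assumes the level lists exactly as written above, with boosting taken as 1-5 (the post's own guess) and the learning rate as the six values 0, 0.1, ..., 0.5; the variable names are illustrative only.

```jsl
Names Default To Here( 1 );

// Levels for each setting, copied from the list above.
// Boosting (1-5) is a guess in the post; 0-0.5 in 0.1 steps gives six learning rates.
levels = {
	{1, 2},                        // hidden layers
	{0, 1, 2, 3, 4},               // sigmoidal (TanH) nodes
	{0, 1, 2, 3, 4},               // linear nodes
	{0, 1, 2, 3, 4},               // radial (Gaussian) nodes
	{1, 2, 3, 4, 5},               // boosting models
	{0, 0.1, 0.2, 0.3, 0.4, 0.5},  // learning rate
	{1, 5, 10, 20, 100}            // tours
};

// A full factorial runs every combination, so its size is the product of the level counts
nModels = 1;
For( k = 1, k <= N Items( levels ), k++,
	nModels *= N Items( levels[k] )
);
Show( nModels );
```

With six learning-rate levels the product is 37,500; with five, as in the hand calculation, it is 31,250. Either way, a full grid is far more than anyone would want to run one model at a time.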

I guess I could whittle a few of these down once I am more familiar with the dataset. Of course, having many nodes and a high number of boosting models doesn't make sense (nor do certain other combinations), but we're still talking about potentially hundreds of models worth testing. Surely this sort of model screening/comparing could be scripted, right?

Anderson B. Mayfield

Mark_Bailey · Nov 30, 2020 03:06 PM

You could design an experiment with fewer levels to fit a second-order model of RMSE and then use the model to optimize the settings. Also, you can make each model in turn and then extract all the results at once by right-clicking on one of the results tables and selecting Make Into Combined Table.
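Mark's suggestion can be written out to show why it needs far fewer runs than the full grid. Assuming the k tuning settings are coded as numeric factors x_1, ..., x_k with at least three levels each (so that quadratic terms are estimable), a second-order model of RMSE has an intercept, k main effects, k pure quadratic terms, and k(k-1)/2 two-factor interactions, for p parameters in total:

```latex
\mathrm{RMSE} \approx \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j ,
\qquad
p = 1 + 2k + \frac{k(k-1)}{2} = \frac{(k+1)(k+2)}{2}
```

For the seven settings in the original post, p = 36, so a design needs at least 36 runs (more in practice, to leave degrees of freedom for error) instead of tens of thousands. The hidden-layer setting has only two levels and cannot carry a quadratic term, which drops p to 35.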

abmayfield · Dec 1, 2020 10:00 AM

Mark, thanks for your suggestions. I've tried the second one and failed. I know the "Make Into Combined Table" option from other platforms, but maybe it was left out of the Neural platform for some reason. Strangely, the "option" trick (hold down the Option key while making a selection, and it applies to all parts of the report) doesn't work in the Neural platform either; otherwise, that could speed things up.

It would be great if I could have my neural network parameters (# nodes, # tours, etc.) as X's in a DOE, with the goal of minimizing RMSE, but it seems not to be possible because of the way you must enter the parameters into the particular boxes in the Neural model launch. You'd have to do some really advanced scripting, and even then I'm not sure DOE and Neural could communicate properly. But if anyone out there knows a way to get the Neural platform to run the models reflected in this screenshot (a sub-selection of the grand total), I would greatly appreciate it. In other words, the values in the first screenshot's table would need to go into the appropriate input boxes in the second screenshot (i.e., the Neural model launch).
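The mapping from table values to launch settings is what Eval Insert() does in the script later in this thread: each ^name^ inside a string is replaced with the current value of that JSL variable, and the resulting text is then parsed and run. A minimal sketch, with hypothetical values standing in for one row of a tuning table; "Squared" is assumed to be a valid Penalty Method name.

```jsl
Names Default To Here( 1 );

// Hypothetical settings for a single model, as if read from one row of a tuning table
NTH1 = 3;             // TanH nodes in the first layer
PMethod = "Squared";  // penalty method (assumed option name)
Ntours = 5;           // number of tours

// Eval Insert() replaces each ^name^ with that variable's current value.
// \!" is the JSL escape for a double quote inside a string.
str = Eval Insert(
	"Fit( NTanH( ^NTH1^ ), Penalty Method( \!"^PMethod^\!" ), Number of Tours( ^Ntours^ ) )"
);
Show( str );
```

Only text between carets is substituted; a variable name written without them stays in the string as literal text.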

Anderson B. Mayfield

abmayfield · Dec 1, 2020 10:04 AM

BTW, the "option" trick to get all relevant features to show the same information as the feature you are actively selecting is actually the "command" trick on a Mac (hence why that didn't previously work).

Anderson B. Mayfield

SDF1 · Dec 1, 2020 03:52 PM

Hi @abmayfield ,

This was something I've been wanting to script up for a while, and it was really fun; your post gave me the push to get it going, thanks! I've wanted a way to tune the NN platform for a while, since it is very sensitive to the initial conditions and rather slow to tune manually.

So, I adapted some scripts I have made to automate the tuning of other modeling platforms, for example, the boosted tree, bootstrap forest, and XGBoost platforms. The scripts are meant to be saved into your data table and run with the green run-script triangle.

Hopefully you know some scripting and will know where to adapt these scripts to your specific situation, data structure, column names, etc. I have added some comments to help explain what each larger section does. I'm attaching two scripts, one for continuous Y's and one for Nominal/Ordinal Y's. I had to include a test to see whether the output Y is nominal or ordinal, because the Number Col Box() index changes depending on this. I used the Big Class Families.jmp sample data table as a test data set. I'm also including the two output data tables generated by the numerical and the nominal/ordinal NN tuning. For the nominal case I modeled :sex using :age, :weight, and :height; for the continuous NN tuning I changed :age from Ordinal to Continuous and modeled it with :weight and :sex. The fits and stats are horrible, but the purpose was only to see whether the JSL works. I also added a :Validation column to the data table, stratified on :age, just to check that it all worked correctly.

I'm sure someone else could script this more elegantly: test whether the Y() input column's modeling type is Continuous, Nominal, or Ordinal, and then insert the appropriate Number Col Box() index so that a single JSL script extracts the data correctly.
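One way to do that, sketched under assumptions: ask the Y column for its modeling type with << Get Modeling Type and choose the Number Col Box() index from the answer. The indices 6 (Nominal) and 14 (Ordinal) are the ones the script below uses for :sex and :age in Big Class Families; they depend on the report layout for your data, so treat them as placeholders. The Continuous index is left missing because it has to be read off your own report, and the small demo table is hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table with a nominal response; use your own table and Y column instead
dt = New Table( "Modeling Type Demo",
	New Column( "sex", Character, "Nominal", Set Values( {"F", "M", "F"} ) )
);
yName = "sex";

// Returns the modeling type as a string, e.g. "Continuous", "Nominal", or "Ordinal"
mt = Column( dt, yName ) << Get Modeling Type;

// Pick the validation Number Col Box() index for this modeling type.
// 6 and 14 are the Big Class Families values from the script below; check them for your data.
ncol_box = Match( mt,
	"Nominal", 6,
	"Ordinal", 14,
	. // Continuous (or anything else): read the index off your own report
);
Show( mt, ncol_box );
Close( dt, NoSave );
```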

You will need to edit the Y() and X() inputs in the dt << Neural() section, as well as the part where you get the name of the report window, since the window is named after the variable you're modeling.

In principle, the NN tuning table can be treated like a space-filling DOE where the lower and upper settings are whatever you want, with a few exceptions: N_Layers can only be 1 or 2, because JMP allows only 1 or 2 hidden layers; Penalty_Method can only take the options that are valid in the NN platform; and Transform Covariates and Robust Fit take only the values 0 or 1 (off or on). I'm including the NN tuning table as well.
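If you would rather build a tuning table by script than by hand, here is a minimal sketch. It fills a hypothetical number of rows by simple random sampling (not a true space-filling design, which the DOE platform would give you), uses the column names the script below reads, and applies the same constraints that script enforces (one layer: no second-layer nodes; two layers: no boosting and a 0.1 learning rate). The level ranges, the penalty-method names, and the rule that each used layer gets at least one node are assumptions to adjust.

```jsl
Names Default To Here( 1 );

nRuns = 50;  // hypothetical number of candidate models
methods = {"Squared", "Absolute", "Weight Decay", "No Penalty"};  // assumed Penalty Method names

dt_parms = New Table( "NN Tuning",
	New Column( "N_Layers", Numeric, "Continuous" ),
	New Column( "N_TanH_1", Numeric, "Continuous" ),
	New Column( "N_Linear_1", Numeric, "Continuous" ),
	New Column( "N_Gauss_1", Numeric, "Continuous" ),
	New Column( "N_TanH_2", Numeric, "Continuous" ),
	New Column( "N_Linear_2", Numeric, "Continuous" ),
	New Column( "N_Gauss_2", Numeric, "Continuous" ),
	New Column( "N_Boosts", Numeric, "Continuous" ),
	New Column( "Learn_Rate", Numeric, "Continuous" ),
	New Column( "T_Cov", Numeric, "Nominal" ),
	New Column( "Robust_Fit", Numeric, "Nominal" ),
	New Column( "Penalty_Method", Character, "Nominal" ),
	New Column( "N_Tours", Numeric, "Continuous" )
);
dt_parms << Add Rows( nRuns );

For( i = 1, i <= nRuns, i++,
	dt_parms:N_Layers[i] = Random Integer( 1, 2 );
	dt_parms:N_TanH_1[i] = Random Integer( 0, 4 );
	dt_parms:N_Linear_1[i] = Random Integer( 0, 4 );
	dt_parms:N_Gauss_1[i] = Random Integer( 0, 4 );
	// Assumption: the first layer needs at least one node
	If( dt_parms:N_TanH_1[i] + dt_parms:N_Linear_1[i] + dt_parms:N_Gauss_1[i] == 0,
		dt_parms:N_TanH_1[i] = 1
	);
	If( dt_parms:N_Layers[i] == 2,
		// Two layers: draw second-layer nodes (at least one TanH), no boosting
		dt_parms:N_TanH_2[i] = Random Integer( 1, 4 );
		dt_parms:N_Linear_2[i] = Random Integer( 0, 4 );
		dt_parms:N_Gauss_2[i] = Random Integer( 0, 4 );
		dt_parms:N_Boosts[i] = 0;
		dt_parms:Learn_Rate[i] = 0.1
	,
		// One layer: empty second layer, boosting allowed
		dt_parms:N_TanH_2[i] = 0;
		dt_parms:N_Linear_2[i] = 0;
		dt_parms:N_Gauss_2[i] = 0;
		dt_parms:N_Boosts[i] = Random Integer( 0, 5 );
		dt_parms:Learn_Rate[i] = Round( 0.05 + 0.45 * Random Uniform(), 2 )
	);
	dt_parms:T_Cov[i] = Random Integer( 0, 1 );
	dt_parms:Robust_Fit[i] = Random Integer( 0, 1 );
	dt_parms:Penalty_Method[i] = methods[Random Integer( 1, N Items( methods ) )];
	dt_parms:N_Tours[i] = Random Integer( 1, 20 );
);
```

Random sampling is the simplest option to script; a space-filling design from the DOE platform spreads the same number of runs more evenly over the settings.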

If you have any questions, let me know.

Here's my code for the Nominal/Ordinal modeling type:

Names Default To Here( 1 );

dt = Data Table( "Big Class Families" );//enter the name of the current data table
dt_parms = Data Table( "NN Tuning" );//enter the name of the parameter data table for tuning the NN

dt_results = dt_parms << Subset( All rows, Selected Columns Only( 0 ) );//this copies the tuning table with all the different rows in it

//these commands create columns to record the fit results
dt_results << New Column( "Generalized R² Training" );
dt_results << New Column( "Generalized R² Validation" );
dt_results << New Column( "Entropy R² Training" );
dt_results << New Column( "Entropy R² Validation" );
dt_results << New Column( "RMSE Training" );
dt_results << New Column( "RMSE Validation" );
dt_results << New Column( "Mean Abs Dev Training" );
dt_results << New Column( "Mean Abs Dev Validation" );
dt_results << New Column( "Misclassification Rate Training" );
dt_results << New Column( "Misclassification Rate Validation" );
dt_results << New Column( "-LogLikelihood Training" );
dt_results << New Column( "-LogLikelihood Validation" );
dt_results << New Column( "Sum Freq Training" );
dt_results << New Column( "Sum Freq Validation" );

//This part just creates a separate modeling progress window; delete it if you don't want it.
i = 1;
imax = N Row( dt_parms );
dlgStatus = New Window( "Overall Modeling Progress",
	V List Box(
		Text Box( " Overall Modeling Progress ", <<Set Font Size( 12 ), <<Justify Text( "center" ), <<Set width( 200 ) ),
		dlg_gb = Graph Box( FrameSize( 200, 15 ), X Scale( 0, 100 ), Y Scale( 0, 1 ) ),
		tb = Text Box(
			"Current step " || Char( i ) || " of " || Char( imax ),
			<<Set Font Size( 12 ),
			<<Justify Text( "center" ),
			<<Set width( 200 )
		)
	)
);
dlg_gb[Axis Box( 2 )] << Delete;
dlg_gb[Axis Box( 1 )] << Delete;

//This big For loop goes through your tuning table, putting each row's values into the fit to generate a new NN model and save its statistics
For( i = 1, i <= imax, i++,
	prog = (i / imax) * 100;//take out if you don't want the progress window
	dlgStatus[FrameBox( 1 )] << Add Graphics Script( {Fill Color( "purple" ), Rect( 0, 1, prog, 0, 1 )} );//take out if you don't want the progress window
	tb << set text( "Current step " || Char( i ) || " of " || Char( imax ) );//take out if you don't want the progress window
	Nlayer = dt_parms:N_Layers[i];
	If( dt_results:N_Layers[i] == 1,
		dt_results:N_TanH_2[i] = 0;
		dt_results:N_Linear_2[i] = 0;
		dt_results:N_Gauss_2[i] = 0;
	);//with only one layer, the second-layer node counts must be zero
	If( dt_results:N_Layers[i] == 2,
		dt_results:N_Boosts[i] = 0;
		dt_results:Learn_Rate[i] = 0.1;
	);//can't boost if you have two layers
	NTH1 = dt_results:N_TanH_1[i];
	NTH2 = dt_results:N_TanH_2[i];
	NL1 = dt_results:N_Linear_1[i];
	NL2 = dt_results:N_Linear_2[i];
	NG1 = dt_results:N_Gauss_1[i];
	NG2 = dt_results:N_Gauss_2[i];
	Nboosts = dt_results:N_Boosts[i];
	LR = dt_results:Learn_Rate[i];
	TCov = dt_results:T_Cov[i];
	RFit = dt_results:Robust_Fit[i];//read here, but not passed to Fit() below
	PMethod = dt_results:Penalty_Method[i];
	Ntours = dt_results:N_Tours[i];
	
	//the ^...^ placeholders are replaced with this row's values; PMethod needs carets too, or the literal text "PMethod" is sent
	str = Eval Insert(
		"report = (dt << Neural(
       Y( :sex ),
		X(
		:height,
		:age,
		:weight
		), 
       Validation ( :Validation ),
       Informative Missing(0), 
       Transform Covariates(^TCov^),
       Fit(
       	NTanH(^NTH1^),
       	NLinear(^NL1^),
       	NGaussian(^NG1^),
       	NTanH2(^NTH2^),
       	NLinear2(^NL2^),
       	NGaussian2(^NG2^),
       	Transform Covariates(^TCov^),
       	Penalty Method(\!"^PMethod^\!"),
       	Number of Tours(^Ntours^),
       	N Boost(^Nboosts^),
       	Learning Rate(^LR^)
       )),
       Go,
              invisible
       ) << Report;"
	);
	Eval( Parse( str ) );
	w = Window( dt << GetName || " - " || "Neural of sex" );//change the text after "Neural of" to the name of your response
	
	//a simple way to tell whether the Y variable is Nominal or Ordinal, by testing the name of the report window
	If( !Is Missing( Regex( w << Get Window Title, "age" ) ),
		ncol_box = 14
	);
	If( !Is Missing( Regex( w << Get Window Title, "sex" ) ),
		ncol_box = 6
	);

	T_stats = w[Outline Box( 3 ), Number Col Box( 1 )] << Get;//You might need to adjust the Outline Box() and Number Col Box() indices depending on your report's tree structure
	V_stats = w[Outline Box( 3 ), Number Col Box( ncol_box )] << Get;//You might need to adjust the Outline Box() and Number Col Box() indices depending on your report's tree structure
	report << Close Window;
	
	//training results
	dt_results:Generalized R² Training[i] = T_stats[1];
	dt_results:Entropy R² Training[i] = T_stats[2];
	dt_results:RMSE Training[i] = T_stats[3];
	dt_results:Mean Abs Dev Training[i] = T_stats[4];
	dt_results:Misclassification Rate Training[i] = T_stats[5];
	dt_results:Name( "-LogLikelihood Training" )[i] = T_stats[6];
	dt_results:Sum Freq Training[i] = T_stats[7];
	
	//validation results
	dt_results:Generalized R² Validation[i] = V_stats[1];
	dt_results:Entropy R² Validation[i] = V_stats[2];
	dt_results:RMSE Validation[i] = V_stats[3];
	dt_results:Mean Abs Dev Validation[i] = V_stats[4];
	dt_results:Misclassification Rate Validation[i] = V_stats[5];
	dt_results:Name( "-LogLikelihood Validation" )[i] = V_stats[6];
	dt_results:Sum Freq Validation[i] = V_stats[7];
);
dlgStatus << Close Window;

One thing I don't understand: when I run this with the NN platform, I get the following warning in the log, which I've never gotten with my other platforms. I can't tell where it comes from, but the script still runs correctly. Maybe someone can fix that part of my code?

Unexpected ",". Perhaps there is a missing ")".
Trying to parse operand for "<<" operator.
Line 23 Column 10: ))►,
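A likely explanation for that warning: counting from "report = (dt << Neural(", line 23 of the string passed to Parse() is the "))," that closes Fit() and then Neural(). The ", Go, invisible" that follows still sits inside the parenthesis opened by "report = (", and JSL parentheses group a single expression, so the parser stops at that comma. A minimal sketch of a corrected string, assuming the Fit() clause alone is enough to run the model (which the "still runs correctly" result suggests): drop the extra parentheses and the trailing arguments. The settings are placeholders, and the final Eval() is left as a comment so the snippet only checks that the text parses.

```jsl
Names Default To Here( 1 );

// Placeholder settings for one model
NTH1 = 3; NL1 = 0; NG1 = 0;
NTH2 = 0; NL2 = 0; NG2 = 0;
TCov = 1; PMethod = "Squared"; Ntours = 5; Nboosts = 0; LR = 0.1;

// No parentheses around dt << Neural(...), and nothing after its closing parenthesis
// except << Report; << binds more tightly than =, so report receives the report.
str = Eval Insert(
"report = dt << Neural(
	Y( :sex ),
	X( :height, :age, :weight ),
	Validation( :Validation ),
	Informative Missing( 0 ),
	Transform Covariates( ^TCov^ ),
	Fit(
		NTanH( ^NTH1^ ), NLinear( ^NL1^ ), NGaussian( ^NG1^ ),
		NTanH2( ^NTH2^ ), NLinear2( ^NL2^ ), NGaussian2( ^NG2^ ),
		Transform Covariates( ^TCov^ ),
		Penalty Method( \!"^PMethod^\!" ),
		Number of Tours( ^Ntours^ ),
		N Boost( ^Nboosts^ ),
		Learning Rate( ^LR^ )
	)
) << Report;"
);

neuralExpr = Parse( str );  // an assignment expression; nothing is fitted yet
Show( Head Name( neuralExpr ) );
// With dt pointing at your data table: Eval( neuralExpr );
```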

As a last note: be careful with the settings, because the modeling can take a VERY long time to run if you use huge numbers of boosts, tours, or transfer functions. Just be cautious not to go overboard.

Hope this helps!,

DS

abmayfield · Dec 1, 2020 04:32 PM

Yes! This is exactly what I want: a table with all the NN input parameters (e.g., number of TanH nodes, number of layers, etc.) that syncs with the NN platform. I haven't tried it yet, and I imagine it will take a while, but I am using a small "practice" dataset (20 samples x 90 predictors), so it might be doable. In practice, I've found that only the type of activation and the number of nodes (as well as the number of tours) actually lower my RMSE; boosting doesn't help very much. But even so, this will be much faster than running all ~500 models I had planned (thankfully far fewer than I intended yesterday, since I want to stay under 4 nodes per activation type). I will try it out tomorrow and let everyone know how it goes.

Anderson B. Mayfield

abmayfield · Dec 1, 2020 09:34 PM

Well, I failed, but I think it's probably because of some tiny scripting mistake. I used your NN tuning table with my data (attached) and made all the necessary updates (I used a validation column with training, validation, and test sets). If you run the last script ("neural network comparison"), it builds the new table, but no analyses are run. I included just a few X terms here, but in the end it will be all 86 columns in the "standardized data" group. When I use the debugger, it says something is going wrong in the "For( i = 1, i <= imax, i++," area. I am hoping/guessing a more competent script writer can find the issue in seconds!

Anderson B. Mayfield

SDF1 · Dec 2, 2020 5:38 AM

Hi @abmayfield ,

Thanks for sharing the data table, it helped to debug the changes you made in the script.

A few things I noticed:

You needed to add columns for the "Test" set statistics you want to record.
You accidentally removed the definition of imax, which is the number of rows in the parameter table.
You also accidentally changed the call to the NN platform. The dt << Neural() line is where the script sends the command to fit the neural net to the data table "dt"; you had changed it to something else.
The definition of the window "w" needs the full name of the NN window. If you're running only a single X, JMP apparently adds that X to the window name ("... by X"), so it needs to be in both the Window( dt << GetName || ... ) line and the If() statement below it. With more than one X, the "by ..." part is not in the window name, so leave it out of both.
Finally, make sure the Training, Validation, and Test statistics are pulled from the correct Number Col Box() indices.
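The window-title rule in that list can be written as a few lines of JSL so the title does not have to be typed by hand. This is a sketch: the table, response, and predictor names are hypothetical, and the " - Neural of <Y>" plus " by <X>" pattern is taken from the titles used in this thread rather than from documentation.

```jsl
Names Default To Here( 1 );

// Hypothetical names; in the tuning script these would come from dt << Get Name and your Y()/X() columns
tableName = "My Data";
yName = "health designation";
xNames = {"X1"};

// Title pattern seen in this thread: "<table> - Neural of <Y>", plus " by <X>" only for a single X
title = tableName || " - Neural of " || yName;
If( N Items( xNames ) == 1,
	title ||= " by " || xNames[1]
);
Show( title );
```

In the tuning loop, w = Window( title ) would then replace the hand-typed window name.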

The below code works on your data table (also attached with modified script). It works with the NN Tuning table I gave earlier.

If you are really going to use all 86 columns, you need to put them in manually in the X(col1, col2,...) part of the script. You should then also edit the GetName part because it won't include all 86 names, it'll just be the window name with the response column.
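Instead of typing 86 column references, the X() text can be built from a list of names. A sketch, assuming the predictor names are available as a list of strings (for example from dt << Get Column Names( String ), trimmed to the predictors you want); the names below are hypothetical. Wrapping each name in :Name() keeps names containing dots or spaces valid.

```jsl
Names Default To Here( 1 );

// Hypothetical predictor names; in practice, build this list from your data table
xNames = {"gene A.p1", "gene B.p1", "gene C.p1"};

// Wrap each name in :Name("...") so dots and spaces are allowed, then join with commas
quoted = {};
For( k = 1, k <= N Items( xNames ), k++,
	Insert Into( quoted, ":Name(\!"" || xNames[k] || "\!")" )
);
xText = "X( " || Concat Items( quoted, ", " ) || " )";
Show( xText );
```

The resulting text can go into the Eval Insert() string as ^xText^ in place of the hand-typed X( ... ) block.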

Names Default To Here( 1 );
dt = Current Data Table();
dt_parms = Data Table( "NN Tuning" );
dt_results = dt_parms << Subset( All rows, Selected Columns Only( 0 ) );
dt_results << New Column( "Generalized R² Training" );
dt_results << New Column( "Generalized R² Validation" );
dt_results << New Column( "Generalized R² Test" );
dt_results << New Column( "Entropy R² Training" );
dt_results << New Column( "Entropy R² Validation" );
dt_results << New Column( "Entropy R² Test" );
dt_results << New Column( "RMSE Training" );
dt_results << New Column( "RMSE Validation" );
dt_results << New Column( "RMSE Test" );
dt_results << New Column( "Mean Abs Dev Training" );
dt_results << New Column( "Mean Abs Dev Validation" );
dt_results << New Column( "Mean Abs Dev Test" );
dt_results << New Column( "Misclassification Rate Training" );
dt_results << New Column( "Misclassification Rate Validation" );
dt_results << New Column( "Misclassification Rate Test" );
dt_results << New Column( "-LogLikelihood Training" );
dt_results << New Column( "-LogLikelihood Validation" );
dt_results << New Column( "-LogLikelihood Test" );
dt_results << New Column( "Sum Freq Training" );
dt_results << New Column( "Sum Freq Validation" );
dt_results << New Column( "Sum Freq Test" );

imax = N Row( dt_parms );//one model per row of the tuning table

For( i = 1, i <= imax, i++,
	Nlayer = dt_parms:N_Layers[i];
	If( dt_results:N_Layers[i] == 1,//one layer: no second-layer nodes
		dt_results:N_TanH_2[i] = 0;
		dt_results:N_Linear_2[i] = 0;
		dt_results:N_Gauss_2[i] = 0;
	);
	If( dt_results:N_Layers[i] == 2,//two layers: no boosting
		dt_results:N_Boosts[i] = 0;
		dt_results:Learn_Rate[i] = 0.1;
	);
	NTH1 = dt_results:N_TanH_1[i];
	NTH2 = dt_results:N_TanH_2[i];
	NL1 = dt_results:N_Linear_1[i];
	NL2 = dt_results:N_Linear_2[i];
	NG1 = dt_results:N_Gauss_1[i];
	NG2 = dt_results:N_Gauss_2[i];
	Nboosts = dt_results:N_Boosts[i];
	LR = dt_results:Learn_Rate[i];
	TCov = dt_results:T_Cov[i];
	RFit = dt_results:Robust_Fit[i];//read here, but not passed to Fit() below
	PMethod = dt_results:Penalty_Method[i];
	Ntours = dt_results:N_Tours[i];
	
	//Build the launch as text so the ^...^ placeholders take this row's values.
	//Nothing may follow the closing parenthesis of Neural() except << Report.
	str = Eval Insert(
		"report = dt << Neural(
        Y( :health designation ),
		 X(
		:OFA...22_c0_g2_i1.p1
        ), 
       Validation ( :Validation with test ),
       Informative Missing(0), 
       Transform Covariates(^TCov^),
       Fit(
       	NTanH(^NTH1^),
       	NLinear(^NL1^),
       	NGaussian(^NG1^),
       	NTanH2(^NTH2^),
       	NLinear2(^NL2^),
       	NGaussian2(^NG2^),
       	Transform Covariates(^TCov^),
       	Penalty Method(\!"^PMethod^\!"),
       	Number of Tours(^Ntours^),
       	N Boost(^Nboosts^),
       	Learning Rate(^LR^)
       )
       ) << Report;"
	);
	Eval( Parse( str ) );
	w = Window( dt << GetName || " - " || "Neural of health designation by OFA...22_c0_g2_i1.p1" );
	If( !Is Missing( Regex( w << Get Window Title, "health designation by OFA...22_c0_g2_i1.p1" ) ),
		ncol_box_v = 10;
		ncol_box_test = 19;
	);
	T_stats = w[Outline Box( 3 ), Number Col Box( 1 )] << Get;//You might need to adjust the Outline Box() and Number Col Box() indices depending on your report's tree structure
	V_stats = w[Outline Box( 3 ), Number Col Box( ncol_box_v )] << Get;//You might need to adjust the Outline Box() and Number Col Box() indices depending on your report's tree structure
	Test_stats = w[Outline Box( 3 ), Number Col Box( ncol_box_test )] << Get;
	report << Close Window;
	
	dt_results:Generalized R² Training[i] = T_stats[1];
	dt_results:Entropy R² Training[i] = T_stats[2];
	dt_results:RMSE Training[i] = T_stats[3];
	dt_results:Mean Abs Dev Training[i] = T_stats[4];
	dt_results:Misclassification Rate Training[i] = T_stats[5];
	dt_results:Name( "-LogLikelihood Training" )[i] = T_stats[6];
	dt_results:Sum Freq Training[i] = T_stats[7];
	
	dt_results:Generalized R² Validation[i] = V_stats[1];
	dt_results:Entropy R² Validation[i] = V_stats[2];
	dt_results:RMSE Validation[i] = V_stats[3];
	dt_results:Mean Abs Dev Validation[i] = V_stats[4];
	dt_results:Misclassification Rate Validation[i] = V_stats[5];
	dt_results:Name( "-LogLikelihood Validation" )[i] = V_stats[6];
	dt_results:Sum Freq Validation[i] = V_stats[7];
	
	dt_results:Generalized R² Test[i] = Test_stats[1];
	dt_results:Entropy R² Test[i] = Test_stats[2];
	dt_results:RMSE Test[i] = Test_stats[3];
	dt_results:Mean Abs Dev Test[i] = Test_stats[4];
	dt_results:Misclassification Rate Test[i] = Test_stats[5];
	dt_results:Name( "-LogLikelihood Test" )[i] = Test_stats[6];
	dt_results:Sum Freq Test[i] = Test_stats[7];
);
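Once the loop has filled the results table, picking the best settings is a sort. A minimal sketch, assuming you rank by validation misclassification rate and break ties by validation RMSE (the two targets in the original question); the demo table and its numbers are made up purely to show the sort, and on the real table dt_demo would be dt_results.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-in for dt_results; the numbers are made up to illustrate the sort
dt_demo = New Table( "Results Demo",
	New Column( "Model", Numeric, "Continuous", Set Values( [1, 2, 3] ) ),
	New Column( "Misclassification Rate Validation", Numeric, "Continuous", Set Values( [0.2, 0.1, 0.1] ) ),
	New Column( "RMSE Validation", Numeric, "Continuous", Set Values( [0.40, 0.35, 0.30] ) )
);

// Lowest validation misclassification rate first; ties broken by lower validation RMSE
dt_demo << Sort(
	By( Column( dt_demo, "Misclassification Rate Validation" ), Column( dt_demo, "RMSE Validation" ) ),
	Order( Ascending, Ascending ),
	Replace Table
);

best = dt_demo:Model[1];  // first row after sorting is the best candidate
Show( best );
```

Because the test statistics were not used for ranking, the chosen row's Test columns give a less optimistic check of how those settings generalize.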

Hope this helps!,

DS

abmayfield · Dec 2, 2020 09:27 AM

Aha! Thanks so much for re-doing it for me. I clearly have no knowledge of scripting, so I just changed "Neural" to the name of my data table.....which obviously wouldn't work. Once I changed the name of my data table to "Neural," added the Test columns, and made the other changes you recommended, it worked.

Now what I might do is use DOE to make a factorial design with conditional constraints (since, as you noted, you can't boost with multiple hidden layers) and run a few hundred models on my work computer (so as not to overly tax my 8-GB RAM laptop).

I feel like your script needs to become a GUI or add-in (any takers?). JMP Pro 16, which I am beta-testing, has a really great "model screening" tool, but I think it tests across modeling types (e.g., PLS vs. Gen-Reg), not WITHIN the potentially thousands of models that could be built within each modeling type (or else it may never stop running). I have heard of the "simulator," but I think that is doing something different (not generating tons of simulated models), though I could be wrong. Again, thanks so much for your help. This will surely save me DAYS of time!

Anderson B. Mayfield

SDF1 · Dec 2, 2020 10:18 AM

Hi @abmayfield ,

Glad that it fixed your problem.

With the tree-based modeling platforms (XGBoost, boosted trees, and bootstrap forest), you can easily use the tuning table GUI within JMP to run many models. One downside, though, is that it uses a lot of RAM. The nice thing about running it with a script like this is that you can run tens of thousands of different parameter settings without using much RAM, just CPU. I've run up to 40k different settings with bootstrap forest (which is slow), and it can take a few days on my work laptop (32 GB RAM and 2 GHz CPU).

I am also an early adopter of JMP 16, and the model screening is nice, but as you point out, it doesn't tune any of the models, which can be a bit misleading. It just uses the default settings, which aren't always the best settings for a predictive model. It can at least help you determine which platforms are worth the time and effort to tune further and which aren't.

You might want to also consider the SVM platform to model your nominal data as this is supposed to also be a good method for that kind of data.

As for the simulator, that is something different and is meant as a way to test the estimates with different initial conditions (randomization). This gives a distribution of values for the estimate and gives you a confidence interval for the original estimate and you can then decide on whether that model with the given estimates is good or not.

As for the DOE, you could do a factorial or even a space-filling design with the factors set appropriately, and I think that platform also allows conditional constraints. You might also consider predictor screening or bootstrapping on the predictors to see whether you actually need all 89 of them or can use a smaller subset.

As a last note, since you have a short and wide data table, you might consider a leave-one-out approach to validation instead of a validation column. If you do that, you'll have to edit the script accordingly.
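A rough sketch of what that edit could involve, assuming you refit once per row: a hypothetical 0/1 column (0 = training, 1 = validation, following JMP's validation-column coding) marks the single held-out row, and an outer loop moves it down the table one row at a time. Each refit would go where the comment is, with Validation() pointed at that column; the demo table and its values are placeholders for your data.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table standing in for a short, wide data set
dt = New Table( "LOO Demo",
	New Column( "y", Numeric, "Continuous", Set Values( [1.2, 0.8, 1.5, 0.9, 1.1] ) )
);
// Hypothetical validation column: 0 = training, 1 = validation
dt << New Column( "LOO", Numeric, "Nominal" );

nHeld = 0;
For( k = 1, k <= N Rows( dt ), k++,
	// Hold out only row k on this pass
	For( r = 1, r <= N Rows( dt ), r++,
		dt:LOO[r] = (r == k)
	);
	nHeld += Sum( dt:LOO << Get Values );
	// ...refit here with Validation( :LOO ) and record row k's validation statistics...
);
Show( nHeld );
```

With n rows this means n fits for every row of the tuning table, so the run time grows by that factor.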

Good luck!,

DS
