MAS
Community Trekker

Generalized Regression--retrieve tuning parameter script

Using the Generalized Regression platform's shrinkage methods (lasso, elastic net, etc.) with a lognormal distribution, I need to create a scatterplot of the tuning parameter against various fit statistics (negative log-likelihood, Generalized R2). I hoped to do so by pre-setting the values of the tuning parameter and setting the number of grid points (the number of points along the grid that JMP uses for the tuning parameter) to 1.

 

The script below does not recognize the different values of the tuning parameter (^PEN^) and reports identical fit statistics for every iteration of the model-estimation loop. My best guess is that it is ignoring the values supplied in these two lines of the script:

 

Lambda Penalty( ^PEN^ ),
	Number of Grid Points( 1 ),

Any wisdom would be greatly appreciated!

 

 

 

//Script to retrieve the parametric negative log-likelihood and Generalized RSquare (R2GEN) for each penalty
//Both data tables below must be open for the script to run
dt = Data Table("JMP Test Data.jmp");
dt_pens = Data Table("Penalties1.jmp");

//Get copy of penalty table to store results
dt_results = dt_pens << Subset( All rows, Selected columns only( 0 ) );
dt_results << New Column("RsqGEN Training");
dt_results << New Column("RsqGEN Validation");
dt_results << New Column("RsqGEN Test");
dt_results << New Column("neglogl Training");
dt_results << New Column("neglogl Validation");
dt_results << New Column("neglogl Test");
dt_results << New Column("N Training");
dt_results << New Column("N Validation");
dt_results << New Column("N Test");
dt_results << New Column("Tuning Param");


For( i = 1, i <= N Row( dt_pens ), i++,
	PEN = dt_pens:Penalty[i];
	str = Eval Insert(
		"\[report = (dt << Fit Model(
			Y( :Y1 ),
			X( :X1, :X2, :X3, :X4 ),
			Validation( :Name( "Train/Valid/Test" ) ),
			Lambda Penalty( ^PEN^ ),
			Number of Grid Points( 1 ),
			Personality( "Generalized Regression" ),
			Generalized Distribution( "LogNormal" ),
			Run(
				Fit(
					Estimation Method( Lasso( Adaptive ) ),
					Validation Method( Validation Column )
				)
			),
			Go,
			invisible
		)) << Report;]\"
	);
	
//Fit the model 
	Eval(Parse(str));
	
//Get the fit stats and insert them into the results table
	Train = report["Model Summary"][NumberColBox(1)] << Get;
	Valid = report["Model Summary"][NumberColBox(2)] << Get;
	Test = report["Model Summary"][NumberColBox(3)] << Get;
	report << Close Window;
	dt_results:RsqGEN Training[i] = Train[7];
	dt_results:RsqGEN Validation[i] = Valid[7];
	dt_results:RsqGEN Test[i] = Test[7];
	dt_results:neglogl Training[i] = Train[3];
	dt_results:neglogl Validation[i] = Valid[3];
	dt_results:neglogl Test[i] = Test[3];
	dt_results:N Training[i] = Train[1];
	dt_results:N Validation[i] = Valid[1];
	dt_results:N Test[i] = Test[1];
//	dt_results:Tuning Param[i] = Train[8];

);

 

 

 

MAS
Community Trekker

Re: Generalized Regression--retrieve tuning parameter script

The second data table, Penalties1.jmp, is attached.


Re: Generalized Regression--retrieve tuning parameter script

The platform is not currently set up to let you specify penalty values yourself. Instead, JMP tries a grid of penalty values and then presents the penalty value that gives the best fit. You do have some control over how the grid is defined through the Advanced Controls panel in the model launch.

 

[Screenshot gridCap.PNG: the Advanced Controls panel in the model launch]

 

Once you have a fit, there are some JSL functions for getting the grid of penalty values and R-squares from the platform. You can do something like the example below.

 

dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
fm = dt << Fit Model(
	Y( :Y ),
	Effects(
		:Age,
		:Gender,
		:BMI,
		:BP
	),
	Personality( "Generalized Regression" ),
	Generalized Distribution( "Normal" ),
	Validation( :Validation ),
	Run(
		Fit(
			Estimation Method( Lasso( Adaptive ) ),
			Validation Method( Validation Column )
		)
	)
);

penaltyVector = fm << (fit[1] << get penalty grid);
trainingR2 = fm << (fit[1] << get training rsquare path);
validationR2 = fm << (fit[1] << get validation rsquare path);
//testR2 = fm << (fit[1] << get test rsquare path); // this example doesn't have a test set
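
If the end goal is the scatterplot from the original post, a rough sketch along these lines should work once you have those vectors back (I'm assuming each message returns a column vector with one entry per grid point, and the column names below are just placeholders):

// Hedged sketch: combine the penalty grid and the R-square paths into a table and plot them
r2Table = As Table( penaltyVector || trainingR2 || validationR2 );
Column( r2Table, 1 ) << Set Name( "Lambda Penalty" );
Column( r2Table, 2 ) << Set Name( "Training RSquare" );
Column( r2Table, 3 ) << Set Name( "Validation RSquare" );
r2Table << Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( :Lambda Penalty ),
		Y( :Training RSquare ),
		Y( :Validation RSquare, Position( 1 ) )
	),
	Elements( Points( X, Y( 1 ), Y( 2 ), Legend( 1 ) ) )
);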

 

 

 

MAS
Community Trekker

Re: Generalized Regression--retrieve tuning parameter script

Thanks, Clay. This was very helpful. Some follow-up questions, if I might...

1.  How do I retrieve the vectors containing the penalty grid values and Rsq values? I tried many variations on the script you supplied and came up empty-handed.

2.  How might I discover the syntax for things like "penalty grid" in your script? I ask because I would also like to retrieve negative log-likelihoods...

Much appreciated.


Re: Generalized Regression--retrieve tuning parameter script

1. Would you be able to post part of your script to help troubleshoot? If not, I've added some comments to the example below.

 

2. Usually the best place to find syntax for this kind of thing is the Object Scripting Index (Help > Scripting Index). In this case, though, the functions don't appear in the index because they were added primarily for testing the platform internally. We'll get them surfaced in the next version so that they're easier to find.

 

I've updated my earlier example to use the likelihoods instead of R^2 values.

 

// dt is a reference to the data table
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
// fm is a reference to this instance of Fit Model
fm = dt << Fit Model(
	Y( :Y ),
	Effects(
		:Age,
		:Gender,
		:BMI,
		:BP
	),
	Personality( "Generalized Regression" ),
	Generalized Distribution( "Normal" ),
	Validation( :Validation ),
	Run(
		Fit(
			Estimation Method( Lasso( Adaptive ) ),
			Validation Method( Validation Column )
		)
	)
);

// take the first model in the platform and store the penalty values in a vector
penaltyVector = fm << (fit[1] << get penalty grid);
// so on for training and validation likelihoods
trainingnll = fm << (fit[1] << Get Training Negative LogLikelihood);
validationnll = fm << (fit[1] << Get Validation Negative LogLikelihood);
as table(penaltyVector || trainingnll || validationnll);
MAS
Community Trekker

Re: Generalized Regression--retrieve tuning parameter script

Thanks, Clay. This is very helpful.

 

Glad to know that I hadn't lost my mind when I couldn't find the syntax for the penalty grid and others. Might I know the syntax for these items as well?

With thanks and appreciation...

1. BIC

2. AICc

3. The "Parameter Estimates" measure--the vertical axis of the left-most of the two graphs in the Solution Path output.

4. The "Scaled -LogLikelihood" measure--the vertical axis of the right-most graph in the Solution Path output.

5. The "Magnitude of Scaled Parameter Estimates"--the horizontal axis on both of the Solution Path graphs.


Re: Generalized Regression--retrieve tuning parameter script

No problem!

 

1 and 2 - see the example below for getting the AICc and BIC values.

 

3. The parameter estimates in the solution path are the parameter estimates you get from the centered and scaled predictors. You can get them using the <<Get Solution Path command. 

 

4. The Scaled -LogLikelihood values are just the negative log-likelihoods divided by the sum of frequencies. So if you have 100 observations in your validation set, for example, you'd use << Get Validation Negative LogLikelihood to get a vector of the likelihoods and then divide that vector by 100 (a quick sketch follows after point 5). I scale them by the sample size so that the training and validation sets can be plotted on the same scale.

 

5. The horizontal axis is the L1 norm of the centered and scaled parameter estimates (excluding the intercept). So you'd want to call <<Get Solution Path and then create a formula column that adds up the absolute values of the parameter estimates at each grid point.
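
For point 4, here's a minimal sketch of that scaling, reusing the validationnll vector from the example in my earlier reply (the 100 is only the illustrative count from above; substitute the actual sum of frequencies in your validation set):

nValidation = 100;	// assumed validation sample size, for illustration only
scaledValidationNLL = validationnll / nValidation;	// element-wise division of the path vector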

 

Here's an example that puts together the syntax you were looking for.

 

dt = Open("$SAMPLE_DATA/Diabetes.jmp");
fm = Fit Model(
	Y( :Y ),
	Effects( :Age, :Gender, :BMI ),
	Personality( "Generalized Regression" ),
	Generalized Distribution( "Normal" ),
	Run( Fit( Estimation Method( Lasso ), Validation Method( AICc ) ) )
);

penaltyVector = fm << (fit[1] << get penalty grid);
bics = fm << (fit[1] << get BIC path);
aiccs = fm << (fit[1] << get aicc path);
solutionPath = fm << (fit[1] << get solution path);
solutionPath = solutionPath`;		// transpose for convenience
summaryTable = as table( penaltyVector || aiccs || bics || solutionPath);
column(1) << set name("Lambda");
column(2) << set name("aicc");
column(3) << set name("bic");
column(4) << set name("Intercept");
column(5) << set name("Age parm");
column(6) << set name("Gender parm");
column(7) << set name("BMI parm");
column(8) << set name("sigma parm");
summaryTable << new column("L1 norm", Formula(abs(:Age Parm) + abs(:Gender Parm) + abs(:BMI parm)));

// run this to recreate the solution path
Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( :L1 norm ),
		Y( :Age parm ),
		Y( :Gender parm, Position( 1 ) ),
		Y( :BMI parm, Position( 1 ) )
	),
	Elements( Points( X, Y( 1 ), Y( 2 ), Y( 3 ), Legend( 3 ) ) )
);
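
To also recreate the right-hand Solution Path graph (the Scaled -LogLikelihood from point 4), a sketch like the one below should work on top of the example above. I'm assuming the negative log-likelihood path has one entry per penalty grid point, and since this fit validates with AICc every row counts as training, so the scaling divisor is just N Rows( dt ). The new column is named without the minus sign so the column reference parses cleanly.

// Hedged sketch: add the scaled training negative log-likelihood and plot it against the L1 norm
trainNLL = fm << (fit[1] << Get Training Negative LogLikelihood);	// vector over the penalty grid
summaryTable << New Column( "Scaled NegLogLikelihood",
	Set Values( trainNLL / N Rows( dt ) )	// all rows are training rows when validating with AICc
);
summaryTable << Graph Builder(
	Show Control Panel( 0 ),
	Variables( X( :L1 norm ), Y( :Scaled NegLogLikelihood ) ),
	Elements( Points( X, Y ) )
);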

 

gene
Community Trekker

Re: Generalized Regression--retrieve tuning parameter script

In the documentation it says we can't get the tuning parameter if we use K-fold CV. I am able to get it with R's glmnet and with Matlab's lasso. I'm doing lasso with logistic regression. Is there a way I might retrieve it, since JMP clearly must use some method to select lambda?
