Generalized Regression--retrieve tuning parameter script

Aug 3, 2018 9:43 AM

Using the Generalized Regression platform shrinkage methods (lasso, elastic net, etc.) with lognormal distribution, I need to create a scatterplot of the tuning parameter against various fit statistics (negative log-likelihood, Generalized R2). I hoped to do so by pre-setting the values of the tuning parameter and also pre-setting the number of grid points to 1 (the number of points along the grid that JMP uses for the tuning parameter).

The script below is not recognizing the different values of the tuning parameter (^PEN^) and reports identical values for each iteration of the model estimation loop. My best guess is that it is not recognizing the needed values for these two lines in the script:

```
Lambda Penalty( ^PEN^ ),
Number of Grid Points( 1 ),
```

Any wisdom would be greatly appreciated!

```
//Script to retrieve parametrics negative logl and R2GEN
//Both data tables below must be open for script to run
dt = Data Table("JMP Test Data.jmp");
dt_pens = Data Table("Penalties1.jmp");
//Get copy of penalty table to store results
dt_results = dt_pens << Subset( All rows, Selected columns only( 0 ) );
dt_results << New Column("RsqGEN Training");
dt_results << New Column("RsqGEN Validation");
dt_results << New Column("RsqGEN Test");
dt_results << New Column("neglogl Training");
dt_results << New Column("neglogl Validation");
dt_results << New Column("neglogl Test");
dt_results << New Column("N Training");
dt_results << New Column("N Validation");
dt_results << New Column("N Test");
dt_results << New Column("Tuning Param");
for(i = 1, i<= N Row(dt_pens), i++,
    PEN = dt_pens:Penalty[i];
    str = Eval Insert("\[report = (dt << Fit Model(
        Y( :Y1 ),
        X( :X1, :X2, :X3, :X4),
        Validation( :Name( "Train/Valid/Test" ) ),
        Lambda Penalty( ^PEN^ ),
        Number of Grid Points( 1 ),
        Personality( "Generalized Regression" ),
        Generalized Distribution( "LogNormal" ),
        Run(
            Fit(
                Estimation Method( Lasso( Adaptive ) ),
                Validation Method( Validation Column )
            )
        ),
        Go,
        invisible
    )) << Report;]\");
    //Fit the model
    Eval(Parse(str));
    //Get the fit stats and insert them into the results table
    Train = report["Model Summary"][NumberColBox(1)] << Get;
    Valid = report["Model Summary"][NumberColBox(2)] << Get;
    Test = report["Model Summary"][NumberColBox(3)] << Get;
    report << Close Window;
    dt_results:RsqGEN Training[i] = Train[7];
    dt_results:RsqGEN Validation[i] = Valid[7];
    dt_results:RsqGEN Test[i] = Test[7];
    dt_results:neglogl Training[i] = Train[3];
    dt_results:neglogl Validation[i] = Valid[3];
    dt_results:neglogl Test[i] = Test[3];
    dt_results:N Training[i] = Train[1];
    dt_results:N Validation[i] = Valid[1];
    dt_results:N Test[i] = Test[1];
    // dt_results:Tuning Param[i] = Train[8];
);
```

Re: Generalized Regression--retrieve tuning parameter script

The second data table, Penalties1.jmp, is attached.

Re: Generalized Regression--retrieve tuning parameter script

The platform currently is not set up to allow you to specify penalty values yourself. Instead JMP is trying a grid of penalty values and then presenting the penalty value that gives you the best fit. You have some control over how the grid is defined using the Advanced Controls panel in the model launch.

Once you have a fit, there are some JSL functions for getting the grid of penalty values and the R-square values from the platform. You can do something like the example below:

```
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
fm = dt << Fit Model(
    Y( :Y ),
    Effects(
        :Age,
        :Gender,
        :BMI,
        :BP
    ),
    Personality( "Generalized Regression" ),
    Generalized Distribution( "Normal" ),
    Validation( :Validation ),
    Run(
        Fit(
            Estimation Method( Lasso( Adaptive ) ),
            Validation Method( Validation Column )
        )
    )
);
penaltyVector = fm << (fit[1] << get penalty grid);
trainingR2 = fm << (fit[1] << get training rsquare path);
validationR2 = fm << (fit[1] << get validation rsquare path);
//testR2 = fm << (fit[1] << get test rsquare path); // this example doesn't have a test set
```

Re: Generalized Regression--retrieve tuning parameter script

Thanks, Clay. This was very helpful. Some follow-up questions, if I might...

1. How do I retrieve the vectors containing the penalty grid values and Rsq values? I tried many variations on the script you supplied and came up empty-handed.

2. How might I discover the syntax for things like "penalty grid" in your script? I ask because I would also like to retrieve the negative log-likelihoods...

Much appreciated.

Re: Generalized Regression--retrieve tuning parameter script

1. Would you be able to post part of your script to help troubleshoot? If not, I added some comments to the example below.

2. Usually the best place to find syntax for this kind of thing is the Object Scripting Index (Help > Scripting Index). In this case, though, the functions don't appear in the index because they were added primarily for testing the platform internally. We'll surface them in the next version so that they're easier to find.

I've updated my earlier example to use the likelihoods instead of R^2 values.

```
// dt is a reference to the data table
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
// fm is a reference to this instance of Fit Model
fm = dt << Fit Model(
    Y( :Y ),
    Effects(
        :Age,
        :Gender,
        :BMI,
        :BP
    ),
    Personality( "Generalized Regression" ),
    Generalized Distribution( "Normal" ),
    Validation( :Validation ),
    Run(
        Fit(
            Estimation Method( Lasso( Adaptive ) ),
            Validation Method( Validation Column )
        )
    )
);
// take the first model in the platform and store the penalty values in a vector
penaltyVector = fm << (fit[1] << get penalty grid);
// so on for training and validation likelihoods
trainingnll = fm << (fit[1] << Get Training Negative LogLikelihood);
validationnll = fm << (fit[1] << Get Validation Negative LogLikelihood);
as table(penaltyVector || trainingnll || validationnll);
```
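The original question asked for a scatterplot of the tuning parameter against the fit statistics. A minimal sketch of that step follows, assuming the three vectors are column vectors of equal length. The numbers are made-up placeholders standing in for the vectors returned by the platform, and the column names are illustrative rather than names JMP assigns.

```
// Placeholder values only: replace these with the vectors returned by
// << get penalty grid and the two negative log-likelihood messages
penaltyVector = [0.5, 0.25, 0.1, 0];
trainingnll = [310.2, 295.8, 288.1, 286.4];
validationnll = [102.9, 98.3, 97.6, 98.8];
// one row per grid point, one column per vector
nllTable = As Table( penaltyVector || trainingnll || validationnll );
// illustrative column names
Column( nllTable, 1 ) << Set Name( "Lambda" );
Column( nllTable, 2 ) << Set Name( "Training NLL" );
Column( nllTable, 3 ) << Set Name( "Validation NLL" );
// plot both likelihood columns against the penalty on a shared axis
nllTable << Graph Builder(
    Show Control Panel( 0 ),
    Variables(
        X( :Lambda ),
        Y( :Training NLL ),
        Y( :Validation NLL, Position( 1 ) )
    ),
    Elements( Points( X, Y( 1 ), Y( 2 ) ) )
);
```

Keeping the table reference in a variable (`nllTable` here) avoids relying on whichever table happens to be current when the columns are renamed.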

Re: Generalized Regression--retrieve tuning parameter script

Thanks, Clay. This is very helpful.

Glad to know that I hadn't lost my mind when I couldn't find the syntax for the penalty grid and others. Might I know the syntax for these items as well?

With thanks and appreciation...

1. BIC

2. AICc

3. The "Parameter Estimates" measure--the vertical axis of the left-most of the two graphs in the Solution Path output.

4. The "Scaled -LogLikelihood" measure--the vertical axis of the right-most graph in the Solution Path output.

5. The "Magnitude of Scaled Parameter Estimates"--the horizontal axis on both of the Solution Path graphs.

Re: Generalized Regression--retrieve tuning parameter script

No problem!

1 and 2. See the example below for getting the AICc and BIC values.

3. The parameter estimates in the solution path are the parameter estimates you get from the centered and scaled predictors. You can get them using the <<Get Solution Path command.

4. The Scaled -LogLikelihood values are just the negative loglikelihoods divided by the sum of frequencies. So if you have 100 observations in your validation set, for example, you'd use << Get Validation Negative LogLikelihood to get a vector of the likelihoods and then divide that vector by 100. I just scale them by the sample size so that we can plot the training and validation sets on the same scale.

5. The horizontal axis is the L1 norm of the centered and scaled parameter estimates (excluding the intercept). So you'd want to call <<Get Solution Path and then create a formula column that adds up the absolute values of the parameter estimates at each grid point.

Here's an example that puts together the syntax you were looking for.

```
dt = Open("$SAMPLE_DATA/Diabetes.jmp");
fm = Fit Model(
    Y( :Y ),
    Effects( :Age, :Gender, :BMI ),
    Personality( "Generalized Regression" ),
    Generalized Distribution( "Normal" ),
    Run( Fit( Estimation Method( Lasso ), Validation Method( AICc ) ) )
);
penaltyVector = fm << (fit[1] << get penalty grid);
bics = fm << (fit[1] << get BIC path);
aiccs = fm << (fit[1] << get aicc path);
solutionPath = fm << (fit[1] << get solution path);
solutionPath = solutionPath`; // transpose for convenience
summaryTable = as table( penaltyVector || aiccs || bics || solutionPath);
column(1) << set name("Lambda");
column(2) << set name("aicc");
column(3) << set name("bic");
column(4) << set name("Intercept");
column(5) << set name("Age parm");
column(6) << set name("Gender parm");
column(7) << set name("BMI parm");
column(8) << set name("sigma parm");
summaryTable << new column("L1 norm", Formula(abs(:Age parm) + abs(:Gender parm) + abs(:BMI parm)));
// run this to recreate the solution path
Graph Builder(
    Show Control Panel( 0 ),
    Variables(
        X( :L1 norm ),
        Y( :Age parm ),
        Y( :Gender parm, Position( 1 ) ),
        Y( :BMI parm, Position( 1 ) )
    ),
    Elements( Points( X, Y( 1 ), Y( 2 ), Y( 3 ), Legend( 3 ) ) )
);
```
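The example above has no validation set, so it does not show the scaling described in point 4. A minimal sketch of that arithmetic follows. The likelihood values and the count of 100 are made-up placeholders; in practice the vector would come from << Get Validation Negative LogLikelihood and the divisor would be the sum of frequencies in the validation set.

```
// Placeholder values only: stands in for the vector returned by
// << Get Validation Negative LogLikelihood
validationnll = [212.4, 198.7, 191.3, 189.9];
// assumed sum of frequencies (here, number of rows) in the validation set
nValidation = 100;
// dividing a matrix by a scalar scales every element
scaledValidationnll = validationnll / nValidation;
Show( scaledValidationnll );
```

Applying the same division to the training vector with the training sample size puts both curves on a per-observation scale, which is what lets them share one axis.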

Re: Generalized Regression--retrieve tuning parameter script

The documentation says we can't get the tuning parameter if we use K-fold CV. I am able to get it with R's glmnet and with MATLAB's lasso. I'm doing lasso with logistic regression. Is there a way I might retrieve it? Clearly JMP must use some method to select lambda.


Re: Generalized Regression--retrieve tuning parameter script

Sorry to revisit this, Clay. It's been a while... You kindly provided me with a script that allows me to vary the value of the penalty/tuning parameter for lasso models and retrieve various information about the model for each value of the penalty. I now need to do something more: for each value of the penalty, save the predicted values to a table, then concatenate those columns across all values of the penalty. I have been unsuccessful in doing so. In this particular situation, the penalty takes 150 different values and, thus, 150 lasso models will be estimated.

Any help would be enormously appreciated.


Re: Generalized Regression--retrieve tuning parameter script

Hey,

Here is one way that you could do that. I've written a small example below using the diabetes data set from the sample data folder. The key is to send the Set Solution ID message to the fit. Since you have 150 points in your parameter grid, you'd send it 0,1,2,...,149 to set the parameter vector for each solution in the path. I regret making the first solution indexed to 0 instead of 1, but I think we're stuck with that syntax for now.

Let me know if this doesn't make sense or if it doesn't do what you need it to do.

Thanks,

Clay

```
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
fm = Fit Model(
    Y( :Y ),
    Effects( :BMI, :BP, :Total Cholesterol, :LDL, :HDL ),
    Personality( "Generalized Regression" ),
    Generalized Distribution( "Normal" ),
    Run( Fit( Estimation Method( Lasso ), Validation Method( AICc ) ) )
);
// loop through each solution and save the predictions
for(i=0, i<150, i++, // the first solution is solution 0
    fm << (Fit[1] << set solution ID(i));
    fm << (Fit[1] << save prediction formula);
    // if you want to give the column a different name, send a message here.
);
```
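A possible variation on the loop above is sketched below. It takes the number of solutions from the length of the penalty grid instead of hard-coding 150, and renames each saved column. It assumes that the saved prediction formula is appended as the last column of the data table, which is worth checking in your version of JMP. The column naming scheme is only an illustration.

```
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
fm = dt << Fit Model(
    Y( :Y ),
    Effects( :BMI, :BP, :Total Cholesterol, :LDL, :HDL ),
    Personality( "Generalized Regression" ),
    Generalized Distribution( "Normal" ),
    Run( Fit( Estimation Method( Lasso ), Validation Method( AICc ) ) )
);
// number of solutions = length of the penalty grid
// (the product handles either a row or a column vector)
penaltyVector = fm << (Fit[1] << get penalty grid);
nSolutions = N Rows( penaltyVector ) * N Cols( penaltyVector );
For( i = 0, i < nSolutions, i++, // solution IDs start at 0
    fm << (Fit[1] << set solution ID( i ));
    fm << (Fit[1] << save prediction formula);
    // Assumption: the new formula column is the last column in dt
    Column( dt, N Cols( dt ) ) << Set Name( "Pred Y solution " || Char( i ) );
);
```

Because every prediction column is saved to the same data table, the columns end up side by side without a separate concatenation step.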