Lu

Tuning Design table in Predictive Modeling

Hello,

I want to use the "tuning design table" option in predictive modeling (Boosted Tree and Bootstrap Forest). The program does not seem to generate this tuning design table itself. Where can I find a tuning table to use when I activate the tuning design option?

Looking forward to your response.

Regards,

Lu

Re: Tuning Design table in Predictive Modeling

You need to create the tuning design table. JMP cannot do this for you because there is no way for JMP to know the ranges for the various parameters for fitting the tree (how many layers should you consider? How many splits per tree? What range of the learning rate do you wish to look over?)

 

You could certainly just create the table and pick the options that you wish to consider. However, you could use a designed experiment to create the possible parameter settings over the range you specified.

The designed experiment approach will guarantee that you cover the ranges of your model parameters in an efficient fashion. The only caution is that designed experiments often rely on interpolation to find the best settings. The model parameters for Bootstrap Forest and Boosted Tree do not always follow the "smooth functions" that designed experiments rely on for the modeling.

 

From the JMP help:

Use Tuning Table Design
Opens a window where you can select a data table containing values for the Forest panel tuning parameters, called a tuning design table. A tuning design table has a column for each option that you want to specify and has one or multiple rows that each represent a single Bootstrap Forest model design. If an option is not specified in the tuning design table, the default value is used.
For each row in the table, JMP creates a Bootstrap Forest model using the tuning parameters specified. If more than one model is specified in the tuning design table, the Model Validation-Set Summaries report lists the RSquare value for each model. The Bootstrap Forest report shows the fit statistics for the model with the largest RSquare value.
You can create a tuning design table using the Design of Experiments facilities. A bootstrap forest tuning design table can contain the following case-insensitive columns in any order:
Number Trees
Number Terms
Portion Bootstrap
Minimum Splits per Tree
Maximum Splits per Tree
Minimum Size Split
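
As a minimal sketch of such a table, the following Python script writes a small hand-made tuning design table to a CSV file that could then be opened in JMP. The column names come from the help text above; the file name, the helper name write_tuning_table, and every value in the rows are illustrative assumptions, not recommended settings.

```python
import csv

# Column names listed in the JMP help for a Bootstrap Forest tuning design table.
COLUMNS = [
    "Number Trees",
    "Number Terms",
    "Portion Bootstrap",
    "Minimum Splits per Tree",
    "Maximum Splits per Tree",
    "Minimum Size Split",
]

# Hypothetical settings: one row per Bootstrap Forest model to fit.
ROWS = [
    [100, 3, 1.0, 10, 2000, 5],
    [150, 5, 0.8, 10, 2000, 10],
    [200, 8, 0.6, 20, 1000, 20],
]


def write_tuning_table(path, columns=COLUMNS, rows=ROWS):
    """Write the tuning table as CSV and return the number of model rows written."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    print(write_tuning_table("bootstrap_forest_tuning.csv"))
```

According to the help text, any column left out of the table falls back to its default value, so a table with only one or two of these columns should also be acceptable.
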
Dan Obermiller

Lu

Re: Tuning Design table in Predictive Modeling

Dear Dan,

 

Thanks for your help. I wonder which design type in DOE I should use to create a table for this purpose. Any suggestion on what works best?

I generated a table in DOE using a response surface design. Changing the parameter values causes errors when I run the tuning design table option in the Bootstrap Forest analysis (see attached table). When I keep the columns coded as -1 to 1, there is no problem. Are there any limitations on the parameter settings? How does JMP know which column is used for which parameter (e.g., number of trees and number of terms per split)? Is it by column number or by column name?

I hope you can help me further.


Re: Tuning Design table in Predictive Modeling

Instead of focusing on the design or the design type to use, focus on what you REALLY want to do.

 

A tuning table allows you to try fitting the Bootstrap Forest under many different conditions to determine which settings seem to be the best. Which of those parameters do you want to change? According to your attached table, you were looking to change six items: number of trees in the forest, number of terms sampled per split, bootstrap sample rate, minimum splits per tree, maximum splits per tree, and minimum size split.

 

Given that you want to change all of these items, what is the range for each one? Take the number of trees as an example: the default is 100. Perhaps you want to consider anything from 100 to 200 trees. Would you want to try EVERY value from 100 to 200 (100, 101, 102, 103, ..., 200), or perhaps every 25th value (100, 125, 150, 175, 200)? Ideally, trying every value is best because there is no guarantee of a smooth relationship between these parameters and the model fitting results. You could have a GREAT model at 119 trees and awful models at both 118 and 120. Unfortunately, trying every value is very time consuming, and the result still depends on the parameter ranges that you picked.
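
To make the trade-off concrete, here is a small Python sketch that counts the candidate settings. The 100 to 200 range and the step of 25 come from the example above; the second parameter and its levels are hypothetical, added only to show how quickly the number of models grows when several parameters are crossed.

```python
from itertools import product

# Number of trees: every value from 100 to 200 versus every 25th value.
every_value = list(range(100, 201))
every_25th = list(range(100, 201, 25))

# Hypothetical levels for a second tuning parameter (terms sampled per split).
number_terms = [2, 4, 6, 8]

# Crossing the levels gives one candidate model per combination.
coarse_grid = list(product(every_25th, number_terms))
full_grid_size = len(every_value) * len(number_terms)

if __name__ == "__main__":
    print(len(every_value), len(every_25th))   # 101 5
    print(len(coarse_grid), full_grid_size)    # 20 404
```

Each additional parameter multiplies the row count again, which is why a coarse grid or a designed experiment is usually the practical choice.
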

 

Once you have thought through what you are actually trying to do with each parameter, you can create your table. If the table will have only a few rows (a few different conditions to try), you could create it by hand. If it will have more conditions than you want to type by hand, you can try a designed experiment. The design type that you pick will depend on what you plan to do with the results; this is where knowledge of experimental design comes into play. Regardless of the design type, enter the factor names exactly as listed in the JMP help (see my first post). That is how JMP knows which columns go with which parameters: the column names need to match. The factor ranges also need to be valid values for each tuning parameter. So in the number-of-trees example that I gave earlier, I would have a factor named Number Trees (the column name from the JMP help) with a range from 100 to 200. The values you currently have, from -1 to 1, will not work because all of these tuning parameters need positive values. There is no such thing as a Bootstrap Forest with -1 trees.
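
If a design was already generated with coded levels from -1 to 1, one option is to rescale each coded column to its actual range before using it as a tuning table. The sketch below is a minimal Python illustration of that linear rescaling; the helper names are hypothetical, and only the Number Trees range of 100 to 200 comes from the example above.

```python
def decode(coded, low, high):
    """Map a coded level in [-1, 1] linearly onto the actual range [low, high]."""
    if not -1 <= coded <= 1:
        raise ValueError("coded level must lie between -1 and 1")
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


def decode_count(coded, low, high):
    """Decode a coded level and round it, for parameters that must be whole numbers."""
    return round(decode(coded, low, high))


def all_positive(values):
    """Tuning parameters must be positive; coded levels such as -1 fail this check."""
    return all(value > 0 for value in values)


if __name__ == "__main__":
    coded_levels = [-1, 0, 1]
    trees = [decode_count(c, 100, 200) for c in coded_levels]
    print(trees)                       # [100, 150, 200]
    print(all_positive(coded_levels))  # False
    print(all_positive(trees))         # True
```

Entering the actual low and high values as the factor ranges when the design is created avoids this extra step.
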

 

I hope this helps.

Dan Obermiller

Lu

Re: Tuning Design table in Predictive Modeling

Thanks, Dan.

I will check it out this way.

Are you aware of any guidelines for setting the parameter limits for the Bootstrap Forest analysis, or is it trial and error?

Regards,

Ludo


Re: Tuning Design table in Predictive Modeling

I'm not aware of any maximum values. Of course the higher you go the longer it will take, so people usually try to balance the complexity with the practical aspect of how long it will take. Make sure that the early stopping is checked so that if there is no improvement, that particular trial will be stopped early rather than running to completion.

Dan Obermiller

Lu

Re: Tuning Design table in Predictive Modeling

I would like to include studies that analyzed the effect of parameter tuning on the performance and variable importance measures in random forest modeling.

Regards,

Ludo