andrewdy04
Level I

Advice on Boosted Tree Model and How to Export to Excel

Hello,

 

I am working on a model for my work. The goal is for people to input parameters and get a predicted final moisture. I have a large amount of data to work with (about 4000 data points), so I was looking at using the Boosted Tree model. I have never used this function before, and it auto-filled to 200 layers, 12 splits per tree, and a learning rate of 0.121. I wasn't sure whether these are good parameters or whether they would make the model too specific to this data set. It gave me an R-squared of 0.75 and a RASE of 0.45, which is the best I have gotten from any model so far.

I also tried exporting it to Excel, but I couldn't get the function to carry over. I know the boosted tree functions are really complicated, so it might just not work. Ideally I would like to have the boosted tree function in Excel, since not everyone at my work uses JMP.

 

Any advice or help is appreciated. Thanks!

2 REPLIES
lala
Level VIII

Re: Advice on Boosted Tree Model and How to Export to Excel

Thank you. I would also like to know this.

 

The prediction formula for the decision tree (Partition) is very simple.

However, the prediction formulas for boosted trees and random forests are completely different, and it is very difficult to break them down into independent formulas.

 

Thanks Experts!

Victor_G
Super User

Re: Advice on Boosted Tree Model and How to Export to Excel

Hi @andrewdy04,

 

Welcome to the Community!

 

Tree-based models are simple in their mechanisms, as they rely on if-else statements using threshold values on the factors.

If you want to use your tree-based model in Excel, you can save the prediction formula in JMP and replace the if-else JSL functions with the corresponding Excel functions. However, Boosted Tree and Random Forest models can have quite complex (and long) prediction formulas:

[Screenshot: Victor_G_0-1749542384302.png]
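To illustrate the translation step, here is a minimal Python sketch that turns a small additive tree model into nested Excel `IF()` expressions. Everything in it is an assumption made for illustration: the dict-based tree layout, the factor names `Temp` and `Time`, the cell references, and all numbers are made up, and it assumes the boosted prediction has the general additive form of a constant plus a sum of small trees. The exact layout of the formula JMP saves may differ, so check it against your own saved column.

```python
# Hypothetical tree layout (NOT JMP's export format): a split is a dict with
# "factor", "threshold", "low" (taken when factor < threshold) and "high";
# a leaf is a plain number.

def tree_to_excel(node, cells):
    """Return a nested Excel IF() expression for one tree."""
    if not isinstance(node, dict):
        return repr(float(node))  # leaf: this tree's contribution
    return "IF({}<{},{},{})".format(
        cells[node["factor"]],
        node["threshold"],
        tree_to_excel(node["low"], cells),
        tree_to_excel(node["high"], cells),
    )


def evaluate(node, row):
    """Evaluate the same tree in Python, to cross-check the Excel result."""
    while isinstance(node, dict):
        branch = "low" if row[node["factor"]] < node["threshold"] else "high"
        node = node[branch]
    return float(node)


def boosted_to_excel(baseline, trees, cells):
    """Constant plus the sum of all tree expressions, as one Excel formula."""
    parts = [repr(float(baseline))] + [tree_to_excel(t, cells) for t in trees]
    return "=" + "+".join(parts)


def boosted_predict(baseline, trees, row):
    """Python reference prediction for the same additive model."""
    return baseline + sum(evaluate(t, row) for t in trees)


# Made-up example: two tiny trees on two hypothetical factors.
CELLS = {"Temp": "B2", "Time": "C2"}  # where each input lives in the sheet
BASELINE = 12.0
TREES = [
    {"factor": "Temp", "threshold": 80,
     "low": {"factor": "Time", "threshold": 30, "low": 0.4, "high": 0.1},
     "high": -0.3},
    {"factor": "Time", "threshold": 45, "low": 0.2, "high": -0.2},
]

print(boosted_to_excel(BASELINE, TREES, CELLS))
print(boosted_predict(BASELINE, TREES, {"Temp": 70, "Time": 20}))
```

Excel limits a formula to 8,192 characters and 64 levels of nesting, so a model with 200 layers will most likely not fit in a single cell. One option is to put each tree's `IF()` expression in its own cell and add them up with `SUM`. The sketch uses a comma as the argument separator, which some regional Excel settings replace with a semicolon.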

 

On the topic of modeling strategy, I'm afraid I don't have enough information to guide you. Here are some (non-exhaustive!) questions to help you:

  • How was the data collected? Through designed experiments (DoE), or is it observational/production data? Quantity of data alone is not sufficient for a good model; you should prioritize collecting high-quality, informative data, to make sure you have enough variability for your models.
  • What is your objective? Predictive modeling, explanatory modeling, or both? How important is model interpretability/explainability to you (understanding the key factors/drivers of the prediction model)?
  • Do you have any constraints or specifications that could guide the model choice? For example, do you expect non-linearity? Curvature? Would you like smooth predictions over your experimental space, or are "step-based" predictions (from tree-based models) acceptable? You can check my post here to learn more about this: model-comparison-and-selection
  • What are your performance metric(s) and acceptability threshold(s)? Which metric(s) will you use to evaluate, compare, and select your model(s)? Based on the measurement capability (repeatability, reproducibility, precision, ...), at what threshold value for each metric can you judge the model performance to be "good enough"?
  • What is your validation strategy? Since your objective seems to be predictive modeling, what is your validation strategy to prevent overfitting: k-fold cross-validation, a validation column, other...? Boosted Tree may be more prone to overfitting than other tree-based methods (like Bootstrap Forest), so it's always best to have a validation strategy fixed before trying to optimize performance (whether with a Boosted Tree model or with others). Some posts discuss this:
    Solved: cross validation using k-fold fit quality - JMP User Community
    Solved: Bootstrap Forest Platform > "validation" column vs "validation" portion - JMP User Community
    Solved: Re: CROSS VALIDATION - VALIDATION COLUMN METHOD - JMP User Community
  • There is also the topic of hyperparameter tuning if you're using Machine Learning models, as some algorithms are more sensitive to hyperparameter tuning than others. Typically, Bootstrap/Random Forests are a lot less sensitive to hyperparameter tuning than boosted models like Boosted Tree.
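
As a rough illustration of the validation point above, here is a minimal Python sketch of k-fold cross-validation that reports RASE (root average squared error) on the training rows and on the held-out rows of each fold. It is not how JMP implements validation. The function names, the made-up data, and the placeholder "model" that just predicts the training mean are all assumptions for illustration; any real fit/predict pair could be swapped in.

```python
import math
import random


def rase(actual, predicted):
    """Root average squared error: sqrt(mean((actual - predicted)^2))."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, y, k, fit, predict, seed=0):
    """Return a (training RASE, validation RASE) pair for each fold."""
    results = []
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        model = fit([rows[i] for i in train], [y[i] for i in train])
        train_pred = predict(model, [rows[i] for i in train])
        valid_pred = predict(model, [rows[i] for i in fold])
        results.append((
            rase([y[i] for i in train], train_pred),
            rase([y[i] for i in fold], valid_pred),
        ))
    return results


# Placeholder model for illustration only: always predicts the training mean.
def fit_mean(train_rows, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, rows):
    return [model for _ in rows]


# Made-up data: 20 rows with one input column.
ROWS = [[x] for x in range(20)]
Y = [0.5 * x for x in range(20)]

for train_rase, valid_rase in cross_validate(ROWS, Y, 5, fit_mean, predict_mean):
    print(round(train_rase, 3), round(valid_rase, 3))
```

If the validation RASE is consistently much larger than the training RASE across folds, that is a sign the model is fitting noise in the training rows, and settings such as the number of layers or the learning rate are worth revisiting.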

I hope this first answer helps,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
