Level IV

## What does an "estimate" represent for Boosted Trees ?

Hello

Questions about Boosted Trees under the Predictive Modeling platform:

1- What does an "estimate" represent mathematically at the end of a split? Is it the expected change in the response per unit change in the parameter (i.e., a sensitivity)?

2- Assuming, e.g., a boosted tree model with 200 layers, is the final predicted model equal to the 200th (i.e., last) layer's model, or an average of all 200 models? I am not sure which: since each boosted tree learns from the residuals of the prior one, I am inclined to think it is the last model, but ensemble methods are aggregated methods (counter-argument).

3- Is there a practical method for estimating confidence intervals in a boosted tree model? Perhaps collecting each layer's result to build a distribution?

Appreciate the support, thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: What does an "estimate" represent for Boosted Trees ?

Hello @altug_bayram,

1- Since each individual tree in a boosted trees model is fit to the residuals (errors) of the previous tree, I would say an estimate is a number added to the mean of your response (what you are trying to model/predict), depending on the category/split you fall into. Tree-based methods give "step profilers" (see screenshot), where you can clearly see how changing the value of one factor can suddenly shift the response, depending on where the split is.

2- In a bagging tree-based method (like Random Forest), the final result is the mean/average of the individual trees' results. In a boosting tree algorithm, you sum up the "estimates" from all the trees (and add the mean result) to obtain the predicted value (depending on the "path" of splits taken): "The tree is fit based on the residuals of the previous layers, which allows each layer to correct the fit for bad fitting data from the previous layers. The final prediction for an observation is the sum of the predicted residuals for that observation over all the layers" (source: Boosted Tree - JMP 13 Predictive and Specialized Modeling [Book] (oreilly.com)).
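This "sum over layers, not last layer and not average" idea can be illustrated with a minimal hand-rolled sketch. It is not JMP's implementation: the depth-1 stumps, the synthetic sine data, and the 0.1 learning rate are all assumptions chosen for brevity.

```python
import numpy as np

def fit_stump(x, y):
    # Find the split threshold on x that minimizes the sum of squared
    # errors; return (threshold, left mean, right mean).
    best = None
    for t in np.unique(x)[1:]:
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    t, left_val, right_val = stump
    return np.where(x < t, left_val, right_val)

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)

learning_rate = 0.1           # assumed shrinkage factor
base = y.mean()               # layer 0: the overall mean of the response
residual = y - base
stumps = []
for _ in range(200):          # 200 boosting layers
    s = fit_stump(x, residual)          # each tree is fit to the residuals
    stumps.append(s)
    residual -= learning_rate * predict_stump(s, x)

# Final prediction = mean + sum of every layer's scaled estimates;
# it is NOT the last layer alone, and NOT an average of the layers.
pred = base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Because each stump only adds a small correction on top of all previous layers, dropping the sum (e.g., using only the last stump) would discard nearly the entire fitted model.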

3- Not sure about this one. Either use the residuals of the model, or I would try bootstrapping the RASE of the model in order to build 95% bootstrap intervals for RASE, giving a better assessment of the model's (Root Average Squared) Prediction Error; or use the K-fold cross-validation in the "Model Screening" platform to get a better estimate of the model's RASE.
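The bootstrap-interval idea can be sketched as follows. This is a generic percentile bootstrap on a vector of residuals, not a JMP feature: the simulated residuals, the 2,000 resamples, and the 95% level are all assumptions for illustration.

```python
import numpy as np

# Hypothetical residuals from a fitted boosted tree model
# (simulated here in place of real model output).
rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 2.0, 150)

n_boot = 2000
rases = np.empty(n_boot)
for b in range(n_boot):
    # Resample residuals with replacement and recompute RASE.
    sample = rng.choice(residuals, size=residuals.size, replace=True)
    rases[b] = np.sqrt(np.mean(sample**2))

# 95% percentile bootstrap interval for the model's RASE.
lo, hi = np.percentile(rases, [2.5, 97.5])
```

The interval `[lo, hi]` gives a rough sense of how much the prediction-error estimate itself varies with the sample, which is the spirit of the suggestion above.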

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics