Discussions
lala
Level IX

JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

from lightgbm import LGBMRegressor

Thanks Experts!

Victor_G
Super User

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

Hi @lala,

 

What exactly is the question? These are both tree-based models, but Random Forest (called Bootstrap Forest in JMP Pro) is a "bagging" (bootstrap aggregating) method, while LightGBM is a boosting method.

That means a Random Forest consists of several trees trained in parallel on different datasets. These datasets are similar but not identical, created with the bootstrap method (sampling the original dataset with replacement). This gives each tree a slightly different view of the data, so each tree learns slightly different patterns. Once the trees are trained, the final prediction is the average of the individual tree predictions (for regression) or the majority class across the trees (for classification).

[Image: bagging workflow, from https://www.geeksforgeeks.org/machine-learning/bagging-vs-boosting-in-machine-learning/]

Random Forest also uses random feature selection at each split, which makes it very good at handling multicollinearity and allows the creation of more diverse trees (while reducing the risk of overfitting).
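The bootstrap-and-average idea above can be sketched in a few lines of plain Python (a toy illustration of the mechanism, not JMP's or any library's actual implementation):

```python
import random

random.seed(0)

# Toy dataset: rows of (x1, x2, x3, y)
data = [(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12), (4, 7, 10, 13)]
n_features = 3

def bootstrap_sample(rows):
    """Sample len(rows) rows with replacement, as in bagging."""
    return [random.choice(rows) for _ in rows]

def random_feature_subset(k=2):
    """Pick a random subset of feature indices, as at each split of a Random Forest."""
    return random.sample(range(n_features), k)

# Each tree in the forest would be trained on its own bootstrap sample...
samples = [bootstrap_sample(data) for _ in range(5)]
# ...and, at each split, would consider only a random subset of the features.
subsets = [random_feature_subset() for _ in range(5)]

# For regression, averaging the per-tree predictions gives the ensemble prediction.
def ensemble_predict(tree_predictions):
    return sum(tree_predictions) / len(tree_predictions)
```

Because each tree sees a resampled dataset and a random feature subset at each split, the trees disagree in useful ways, and averaging them cancels much of the individual-tree variance.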

In contrast, boosted trees are trained sequentially: you start with a simple tree, evaluate its residuals, and then train the next tree to focus on the observations where the residuals are largest, in order to improve the prediction performance of the ensemble. Each tree tries to "fix" the prediction errors of the previous one. At the end of the chain of trees, you obtain the prediction (class or value, depending on the task):

[Image: boosting workflow, from https://www.geeksforgeeks.org/machine-learning/bagging-vs-boosting-in-machine-learning/]
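The residual-chaining described above can be written from scratch with decision stumps as the weak learners (a toy illustration of gradient boosting on squared error, not LightGBM's actual algorithm):

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that best fits the residuals (least squares)."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_trees=20, lr=0.5):
    """Sequentially fit each stump to the residuals of the ensemble so far."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # what is still unexplained
        stump = fit_stump(xs, residuals)                # next tree "fixes" those errors
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = boost(xs, ys)
```

Each stump contributes a small correction (scaled by the learning rate), so the chain's residuals shrink with every round; after 20 rounds the toy model recovers the step in `ys` almost exactly.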

So the training mode (parallel vs. sequential) is the biggest difference between the two algorithms you mention:

[Image: comparison between (a) a random forest and (b) gradient boosting]

The particularity of LightGBM compared to other boosting tree-based models is how it grows each tree: leaf-wise, whereas XGBoost and most other boosting implementations grow trees level-wise, meaning a node cannot be developed further until the tree is "balanced" at that depth:

[Image: leaf-wise vs. level-wise tree growth]
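Because growth is leaf-wise, LightGBM's main complexity control is the number of leaves rather than the tree depth. A minimal sketch of the relevant parameters, assuming the scikit-learn-style API (i.e., passed as `LGBMRegressor(**params)`); the values shown are the library defaults at the time of writing, so check the documentation for your version:

```python
# Illustrative LightGBM parameters controlling leaf-wise growth.
params = {
    "num_leaves": 31,      # leaf-wise growth is capped by leaf count, not depth
    "max_depth": -1,       # -1 means no depth limit; set a limit to curb overfitting
    "learning_rate": 0.1,  # shrinkage applied to each sequential tree
    "n_estimators": 100,   # number of boosting rounds (trees in the chain)
}
```

A common pitfall is raising `num_leaves` without bounding `max_depth`: leaf-wise growth can then produce very deep, very specific trees that overfit noisy data.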

The choice between these models depends on the task and on the bias/variance tradeoff: bagging algorithms like Random Forest reduce variance (overfitting), while boosting models like Boosted Trees reduce bias.

 

Hope this answer helps,

 

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
lala
Level IX

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

Thank you, expert!

I often use JSL's Partition because its prediction formula is simple and easy to understand.

It basically meets my requirements.

 

I have been learning by asking AI questions.

I knew that Python's lightgbm is also very powerful, but the AI only said it is a random forest.

I ran the Python code provided by the AI, calling it via JSL.

In my statistics, its results were better than JMP Pro's Bootstrap Forest.

Of course, I didn't tune the parameters much either; it was just a simple comparison.

 

 

The AI said that the prediction formula generated by Bootstrap Forest in JMP Pro 19 has been significantly improved,

but I am not able to use JMP Pro 19.

Thanks!

Victor_G
Super User

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

JMP Partition is a Decision Tree (the basis/core of any tree-based method like LightGBM or Random Forest).

Be careful with answers from AI. I don't know which one you use, but on very specific and technical topics they can be very prone to hallucinations. I tend to use Gemini (from Google) when I want to dive deep into a technical topic, as I find it the most interesting one, but there are still some hallucinations.

Clearly, LightGBM is NOT a Random Forest but a Boosted Tree algorithm (see the description of the two types in my initial post).
I would recommend reading the technical documentation about LightGBM to learn more: https://github.com/Microsoft/LightGBM and https://lightgbm.readthedocs.io/en/stable/index.html
And the related paper, LightGBM: A Highly Efficient Gradient Boosting Decision Tree.

The "better" results you are seeing may come from boosting itself, which does tend to provide better predictions (the sequential trees reduce the errors of previous trees as much as possible, resulting in a high-precision ensemble) because it reduces bias. But you need a robust validation strategy to avoid overfitting, as reducing bias may increase variance: Bias–variance tradeoff - Wikipedia. With a very "precise" algorithm that learns very specific patterns, you may end up "fitting the noise" and getting predictions that are not consistent when the model is deployed and used on new, unseen data.

A big difference between Random Forest and Boosted Tree methods is their sensitivity to hyperparameter tuning: Random Forest is very robust to hyperparameter choices and delivers good results without fine-tuning, while Boosted Trees may require much stronger tuning effort to reach their best performance. See [1802.09596] Tunability: Importance of Hyperparameters of Machine Learning Algorithms for a comparison of how much tuning matters across different algorithms.

To see how to do hyperparameter tuning in JMP, you can check related posts like Boosted Tree - Tuning TABLE DESIGN.
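The tuning loop itself is conceptually simple. Here is a generic grid-search sketch in plain Python, where the parameter names are illustrative and a toy quadratic scoring function stands in for actually training and validating a model at each grid point:

```python
from itertools import product

# Hypothetical parameter grid for a boosted-tree model.
grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "num_leaves": [15, 31, 63],
}

def validation_error(params):
    """Stand-in for: train the model with `params`, then score it on a
    held-out validation set. This toy quadratic pretends that
    learning_rate=0.1 and num_leaves=31 is the sweet spot."""
    return (params["learning_rate"] - 0.1) ** 2 + ((params["num_leaves"] - 31) / 100) ** 2

# Evaluate every combination and keep the one with the lowest validation error.
names = list(grid)
best_params = min(
    (dict(zip(names, values)) for values in product(*(grid[n] for n in names))),
    key=validation_error,
)
```

In practice, `validation_error` would be a cross-validated score, and the winning parameters should still be confirmed on a separate test set that played no part in the search.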

 

Hope this complementary answer helps,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
lala
Level IX

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

Thanks Experts!
Well, I asked this question of Gemini 2.5 Pro.

 

 

[Screenshot of the AI's answer]

It said that lightgbm far exceeds JMP.


For programming questions, I always ask Grok 4's expert mode. It is the smart one, and it searches online knowledge to answer. But the number of free uses per day is limited, so I cherish each opportunity to use it.

lala
Level IX

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

Yesterday, Grok 4.1 was updated. If anything, I feel its hallucinations have gotten worse.

Victor_G
Super User

Re: JMP Pro "Bootstrap Forest" vs. Python "lightgbm"?

Haha, don't expect an AI to "know" everything, particularly about specific use cases. It "knows" nothing; it only displays the most probable sequence of words aligned with your query. It has no sense of true or false, only similarity/likelihood based on its training dataset, with all the right and wrong in it...

It doesn't mean anything to say "lightgbm far exceeds JMP", since that compares a specific algorithm (LightGBM) to a commercial software package (JMP) that offers several machine learning algorithms. That comparison is nonsense.
Besides, be careful with general statements like "model A is always better than model B"; such claims should always be received with a healthy dose of skepticism. They are either not supported by data at all, or at best guided by benchmark studies whose datasets may not reflect your particular situation, so their "ranking" may not be appropriate or relevant for you.

Depending on your dataset, objectives, required precision, deployment constraints, ... one algorithm or the other may be preferred. For a noisy response, I would trust a Random Forest outcome over a Boosted Tree one, as the risk of overfitting is higher with the latter. With precise, low-noise measurements, and if my dataset is large enough to split into representative training/validation/test sets, I might use a Boosted Tree instead of a Random Forest to get better predictive performance.

Test and learn seems the safest and best option.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
