What inspired this wish list request?
I find building multiple trees helpful for gaining insight into complex observational data sets.
What is the improvement you would like to see?
Several suggestions:
- Allow Partition to start k trees in a single run.
- Add a tree-building option that stops growing a path within a tree once the logworth of the best available split falls below a user-specified threshold, k.
- As output from multiple trees, report the correlations between pairs of variables together with whether or not the two variables appear in the same tree. If two predictors are correlated, they will tend not to appear together in a tree. Each will also appear less important, because the effect the two variables share shows up in one tree through one variable and in another tree through the other. Sort the variables by the number of trees in which each is used. Display the result as a matrix: correlations in the upper triangle and, in the lower triangle, the number of times the two variables appear in the same tree.
Why is this idea important?
Large, complex observational data sets (e.g., nutrition and medical data) are becoming increasingly available. Understanding the relationships among variables helps identify potential causal relationships.