I am running partition models (Bootstrap Forest, Decision Tree, and Boosted Tree) and would really appreciate suggestions on how to solve this problem. For example, I've run Bootstrap Forest on my data several times, and the RSquare for validation ranges from -0.2 to 0.8. If I look at Column Contributions, the ranking of the variables changes a lot from run to run. I am not sure what causes these unstable results (a small data set?).
Here is a short description of my data set. I also attach the data file.
48 observations, 5 continuous predictor variables
Response variable (i.e. the one to be predicted) is continuous
Does anyone have advice that would help solve this problem? Thanks!
This is not a problem. It is inherent in the methods that you are using. They rely on a random assignment, so each run will necessarily be different.
You can set the random seed to the same value before each run and you will always get the same results. But why is one particular sample better than another?
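To see the effect concretely outside of JMP, here is a minimal sketch using scikit-learn's RandomForestRegressor as a stand-in for Bootstrap Forest. The data are synthetic and purely illustrative (48 rows, 5 predictors, matching the shape described above); the point is that both the train/validation split and the forest's bootstrap resampling depend on the seed, so each run gives a different validation RSquare, while a fixed seed is exactly reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the posted data: 48 rows, 5 continuous predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=48)

scores = []
for seed in range(5):
    # Both the holdout split and the forest's bootstrapping use the seed,
    # so each run produces a different validation R^2.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(model.score(X_val, y_val))

print(scores)  # five different validation R^2 values

# With the seed held fixed, two runs agree exactly:
a = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y).score(X, y)
b = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y).score(X, y)
print(a == b)  # True
```

The spread you see in `scores` on only 48 rows is the same run-to-run variability described in the question; it comes from the sampling, not from a bug.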
Do you know about the methods that you are using? You might find your answer by selecting Help > Books > Predictive and Specialized Modeling. There are chapters devoted to the methods that you mention.
The JMP guides are not meant to replace an education about predictive modeling but they are still valuable and informative resources.
Oh, and yes, 48 observations is a small data set for these methods.
Have you done your preliminary analysis prior to modeling to understand the data? Distribution of predictors and response? Multivariate survey (e.g., scatter plot matrix)? What did you find? Outliers? Collinearity among predictors? Are transformations suggested?
Have you tried other methods like penalized regression (Fit Model > Generalized Regression)? They do not involve a random assignment unless you use cross-validation. (And if you do use it, you must use K-fold cross-validation or leave-one-out cross-validation with only 48 observations and 10-15 potential terms in the linear predictor.)
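For illustration, here is a sketch of leave-one-out cross-validation of a penalized (lasso) regression, again using scikit-learn in place of JMP's Generalized Regression and the same synthetic 48-row data shape. Because LOO leaves each row out exactly once, there is no random split, and repeated runs give identical results.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in: 48 rows, 5 continuous predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(48, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=48)

# LOO fits 48 models, each with one row held out; the splits are
# deterministic, so the result is the same on every run.
pred = cross_val_predict(Lasso(alpha=0.1), X, y, cv=LeaveOneOut())
press = np.sum((y - pred) ** 2)  # PRESS statistic
r2_loo = 1.0 - press / np.sum((y - y.mean()) ** 2)
print(round(r2_loo, 3))
```

A cross-validated RSquare computed this way is a single, repeatable number for the whole data set, which avoids the sample-to-sample swings of a random holdout on so few rows.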