dale_lehman
Level VII

Conceptual question about bagged predictions

I have struggled for a long while to find uncertainty measures from machine learning models that are comparable to the standard errors you routinely get from regression models.  Only recently have I become aware of some of the capabilities of the profiler - in particular, the bagged predictions.  But I don't really understand how to (or whether I should) interpret those bagged predictions.  When I run a machine learning model (for example, a neural net) and save the bagged predictions from the profiler, I get a bagged mean, the standard error of the bagged mean, and the bagged standard deviation.  Comparing these with a regression model (multiple regression, for example), I've observed the following relationships:

 

  • The predicted bagged mean from the NN is very similar to the prediction formula from the multiple regression.
  • Mean prediction intervals from the multiple regression model are much narrower than the individual prediction intervals, as expected (in the example I am looking at, the standard error for the mean prediction is about 1/10 the size of the standard error of the individual predictions); a sketch of the two interval types follows this list.
  • The standard error of the bagged mean from the NN is much smaller than the bagged standard deviation (about 1/10 the size in the example I am looking at).
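To make the second bullet concrete, here is a minimal sketch of the two regression interval types, using Python with statsmodels on simulated data (the data and model here are stand-ins, not the attached example):

    import numpy as np
    import statsmodels.api as sm

    # Simulated stand-in data: one predictor, n = 100
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 1.0 + 0.5 * x + rng.normal(0, 2.0, 100)

    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()
    sf = fit.get_prediction(X).summary_frame(alpha=0.05)

    # mean_ci_* bounds the mean response; obs_ci_* bounds an individual observation
    mean_width = (sf["mean_ci_upper"] - sf["mean_ci_lower"]).mean()
    indiv_width = (sf["obs_ci_upper"] - sf["obs_ci_lower"]).mean()
    print(mean_width / indiv_width)  # well below 1; on the order of 1/10 for n = 100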

These observations tempt me to think of the standard error of the bagged mean from the NN as analogous to the standard error of the mean predictions from the regression model.  Similarly, the bagged standard deviation may be similar to the standard error of the individual predictions from the regression model. 

 

However, the standard errors from the NN and the regression models do not resemble each other at all!  So, my question is whether my interpretation makes any sense - or, if not, exactly how the standard errors from the bagged mean can be interpreted or used.

 

Thanks in advance for any insights.  I have attached a concrete example in case it helps with my question (this is the validation data set from my modeling example, with the predictions from the multiple regression model and the NN included).

26 REPLIES

Re: Conceptual question about bagged predictions

Let's review how bagging works in the Profiler. Note that it is based on the fitted model: it uses the fitted model and resamples the data to inflate the data set or sample size, but it does not alter the model. Here is an excerpt from the JMP Help that covers Profiler bagging:

 

"Bagging automatically creates new columns in the original data table. All M sets of bagged predictions are saved as hidden columns. The final prediction is saved in a column named “Pred Formula <colname> Bagged Mean”. The standard deviation of the final prediction is saved in a column named “<colname> Bagged Std Dev”. The standard error of the bagged mean is saved in a column named “StdError <colname> Bagged Mean.” The standard error is the standard deviation divided by Sqrt( M-1 ). Here, <colname> identifies the column in the report that was bagged.

The standard error gives insight about the precision of the prediction. A very small standard error indicates a precise prediction for that observation. For more information about bagging, see Hastie et al. (2009)."
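In other words, given the M hidden columns of bagged predictions, the three saved quantities can be reproduced directly. Here is a minimal sketch in Python, with a fabricated prediction matrix standing in for the hidden columns:

    import numpy as np

    # Fabricated stand-in for the M hidden columns of bagged predictions:
    # rows are observations, columns are the M bagged fits
    rng = np.random.default_rng(0)
    M = 100
    preds = rng.normal(loc=50.0, scale=3.0, size=(40, M))

    bagged_mean = preds.mean(axis=1)             # "Pred Formula <colname> Bagged Mean"
    bagged_sd = preds.std(axis=1, ddof=1)        # "<colname> Bagged Std Dev"
    se_bagged_mean = bagged_sd / np.sqrt(M - 1)  # "StdError <colname> Bagged Mean"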

 

So you are not comparing the SEs from a regression analysis and a NN. You are comparing the SEs from a given model with the SEs from bagging that same model in the Profiler. The difference will be a factor of Sqrt( M-1 ).
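To put a number on that factor: with M = 100 bagged fits, Sqrt( M-1 ) = Sqrt( 99 ) ≈ 9.95, so the bagged SE is roughly one tenth of the bagged standard deviation - which matches the order-of-magnitude gap described above.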

dale_lehman
Level VII

Re: Conceptual question about bagged predictions

We are almost on the same page.  Ignoring the NN for the moment, and using the simple regression example I provided, the standard error from the bagged mean is about 10% of that from the regression model: this matches the square root of M-1 (M = 100 here).  That is true for the mean predictions; for the individual predictions it is about 20% (I'm not sure why the Sqrt( M-1 ) applies to the former but not the latter, but I don't think that is very important).  So, the question is: if I want a confidence interval for the mean prediction, which standard error do I use?  The difference is an order of magnitude!

 

Assuming for the moment that I should use the standard error from the regression model (which is the larger of the two) - and that is what I suspect is the case - it raises the question of what the bagged predictions are good for.  Here, I think, is where the NN (and other machine learning models) come in: many of these models provide no standard error without some type of empirically derived one, such as the profiler provides.  So it would seem very useful to use the standard error from the bagged predictions to construct confidence intervals for those predictions.  However, from the regression example, I am wondering whether this might underestimate the degree of uncertainty by an order of magnitude - which would not be so useful.

Re: Conceptual question about bagged predictions

"So, the question is:  if I want a confidence interval for the mean prediction, which standard error do I use?  The difference is an order of magnitude!" Use the SE from the model that was used to predict. If you use the saved regression model, then use the saved SE or CI. If you are using the bagged predictions from the Profiler, then use the bagged SE. Here is the result using :weight versus :height in the Big Class data set. The highlighted pairs of prediction and SE data columns would be used together.

 

[Screenshot: bagged.PNG - the Big Class data table with the paired prediction and SE columns highlighted]

dale_lehman
Level VII

Re: Conceptual question about bagged predictions

Surely that can't be right!  Your example looks just like mine - the means are almost identical, but the standard deviations differ by a factor of 10 (due to the Sqrt( M-1 )), depending on which set of columns you use.  So, while I see the logic of pairing the mean prediction with its associated standard deviation (either from the model or from the bagging), the practical effect is to have roughly the same mean predictions, with one confidence interval ending up 10% as wide as the other.  Which one is the appropriate measure of variability for the mean prediction?  It can't be both - unless they answer different questions.  And, if that is the case, can you tell me what question each answers?

Re: Conceptual question about bagged predictions

"Surely that can't be right"

OK, you got me. It is all made up, faked. I was just seeing how far I could string you along.

 

(Serious discussion resumes...)

 

"Which one is the appropriate measure of variability for the mean prediction?"

The one that was estimated for the prediction you will use - that is what the pairing is about.

 

Bagging is more about predictive modeling than explanatory modeling, as I explained. Bagging decreases the uncertainty in the prediction. The use of bagging in this case relies on your belief in the quality and validity of the data and the model. It is not cheating. If the model fails, it is because of a problem with the quality or validity of the data or the model.

 

"It can't be both - unless they answer different questions."

That is exactly what I have been saying.

 

"And, if that is the case, can you tell me what question each answers?"

I can. I actually did: again, the first pair answers the question about the uncertainty in the prediction of the original model (e.g., linear regression, neural network, partition). The second pair answers the question about the uncertainty in the prediction using bagging.

dale_lehman
Level VII

Re: Conceptual question about bagged predictions

We are converging.

"And, if that is the case, can you tell me what question each answers?"

I can. I actually did: again, the first pair answers the question about the uncertainty in the prediction of the original model (e.g., linear regression, neural network, partition). The second pair answers the question about the uncertainty in the prediction using bagging.

 

This says that I have more uncertainty about my model predictions than about the bagged predictions - a lot more.  Why would anyone use the model predictions and their confidence intervals, then?  I realize that a smaller standard error is not always good - only if the underlying model is good.  But in the case we are looking at, the same model underlies both measures, and the mean predictions are almost identical.  So, why would I choose the much wider confidence interval?

 

I suppose there is the issue of coverage - the narrow interval might not provide enough coverage of the true value.  I will try some simulations to see if I can shed any light on that - but do you know of any references that speak to the accuracy of the two standard error measures relative to each other?
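As a starting point for such a simulation, here is a minimal sketch in Python. It uses ordinary bagging - refitting on bootstrap resamples - which may differ in detail from the Profiler's implementation, and checks how often an interval of bagged mean ± 2 bagged SEs covers the actual response:

    import numpy as np

    rng = np.random.default_rng(1)
    n, n_new, M, sigma = 100, 900, 100, 2.0

    # Training sample and held-out rows from the same simulated population
    x = rng.uniform(0, 10, n)
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, n)
    x_new = rng.uniform(0, 10, n_new)
    y_new = 1.0 + 0.5 * x_new + rng.normal(0, sigma, n_new)

    X = np.column_stack([np.ones(n), x])
    X_new = np.column_stack([np.ones(n_new), x_new])

    # Bagging: refit on bootstrap resamples and predict the held-out rows
    preds = np.empty((M, n_new))
    for b in range(M):
        idx = rng.integers(0, n, n)
        beta_b = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        preds[b] = X_new @ beta_b

    bag_mean = preds.mean(axis=0)
    bag_se = preds.std(axis=0, ddof=1) / np.sqrt(M - 1)

    # Coverage of the actual y by bagged mean +/- 2 * bagged SE
    covered = np.abs(y_new - bag_mean) <= 2 * bag_se
    print(covered.mean())  # far below 0.95: the bagged SE carries no residual-noise term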

Re: Conceptual question about bagged predictions

"So, why would I choose the much wider confidence interval?"

Because you question the quality or the validity of the data or the model.

 

I would likely not use bagging and its predictions with a screening experiment because the model is likely biased. I would likely not use bagging and its predictions with a model based on a small sample of observational data.

 

Let me be clear. Bagging is valid. It is not cheating. But it is not always appropriate or beneficial.

 

Unfortunately, bagging in the Profiler is something JMP developed. I do not have external references.

dale_lehman
Level VII

Re: Conceptual question about bagged predictions

In my simulated example, I have no reason to question the quality of the data or the appropriateness of the model.  It is not observational data, nor is it a small sample (true, n = 100 out of a population of 1,000 is not large, but the confidence interval coverage is so disparate that I think the example shows us something).  I won't say bagging is cheating (a loaded term), but I don't feel I can trust the standard errors it provides even in my simulated case, and I am very reluctant to use it where the data and model are more suspect.  Can you provide any guidance on where it can be used?  I don't mean to be antagonistic: I love JMP and love the profiler; I'm just trying to see whether the confidence intervals it can provide are useful.

Re: Conceptual question about bagged predictions

Then use performance measures from the validation hold out set to evaluate different models.
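For example, a minimal sketch of comparing models on a hold-out set in Python (y is the held-out response, and each yhat is one model's saved predictions for those rows):

    import numpy as np

    def holdout_metrics(y, yhat):
        # RMSE and R-squared on a validation hold-out set
        resid = y - yhat
        rmse = np.sqrt(np.mean(resid ** 2))
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - np.mean(y)) ** 2)
        return rmse, r2

    # Compute these for each candidate model's predictions on the same
    # held-out rows and prefer the model with lower RMSE / higher R-squared.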

dale_lehman
Level VII

Re: Conceptual question about bagged predictions

Attached is a simulated example.  The untitled data set contains simulated data and a regression model based on a random sample of 100 of the 1,000 rows.  I saved the standard errors from the model and from the bagged predictions.  The subset of the untitled data set then contains the 900 rows not in the random sample.  Using approximate 95% confidence intervals (2 standard errors around the corresponding mean prediction, using the standard errors for the individual predictions), the coverage of the actual Y value was 866 out of 900 rows (around 95%) for the model predictions, but only 222 out of 900 (about 25%) for the bagged predictions.  I couldn't figure out how to generate a comparison of coverage for mean confidence intervals, since I only have a single Y observation in each row.
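For reference, those coverage counts can be reproduced from the saved columns in a few lines of Python (the column names below are hypothetical placeholders for the saved prediction and SE columns in the attached tables):

    import pandas as pd

    df = pd.read_csv("holdout_900.csv")  # the 900 rows not in the random sample

    def coverage(df, pred_col, se_col, y_col="Y", k=2):
        lo = df[pred_col] - k * df[se_col]
        hi = df[pred_col] + k * df[se_col]
        return ((df[y_col] >= lo) & (df[y_col] <= hi)).sum()

    print(coverage(df, "Pred Formula Y", "StdErr Indiv Y"))   # 866 of 900 here
    print(coverage(df, "Pred Formula Y Bagged Mean",
                   "StdError Y Bagged Mean"))                 # 222 of 900 here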

 

Given how extreme the results are, this suggests to me that the standard errors from the model (at least for this well-behaved model) are accurate measures of the uncertainty in the predictions, but the bagged standard errors are too small.  Given the simplicity of this example, it seems I wouldn't want to rely on the standard errors from the bagged predictions.  Do you think that is a reasonable conclusion here?

 

Now, to the real potential uses.  If I run a classification model using a NN, random forests, boosted trees, etc., one shortcoming compared with logistic regression is that these machine learning models do not provide a measure of uncertainty in the predictions (without invoking another procedure such as conformal prediction, which I have been playing with).  The profiler could readily provide bagged predictions of the mean classification probabilities along with their standard errors.  As useful as that would be, I am inclined to say that I can't really use those standard errors to represent the uncertainty in these machine learning models.  Is that correct?  Perhaps a more general question is: what can I use the bagged standard errors for?