Conceptual question about bagged predictions (Solved)

Jun 8, 2023 5:45 PM

I have struggled for a long while to find uncertainty measures from machine learning models that are comparable to the standard errors you routinely get from regression models. Only recently have I become aware of some of the capabilities of the profiler, in particular the bagged predictions. But I don't really understand how to (or whether I should) interpret those bagged predictions. When I run a machine learning model (for example, a neural net) and save the bagged predictions from the profiler, I get a bagged mean, the standard error of the bagged mean, and the bagged standard deviation. Comparing these with a regression model (multiple regression, for example), I've observed the following relationships:

The predicted bagged mean from the NN is very similar to the prediction formula from the multiple regression.
Confidence intervals for the mean prediction from the multiple regression model are much narrower than the individual prediction intervals, as expected (in the example I am looking at, the standard error of the mean prediction is about 1/10 the size of the standard error of an individual prediction).
The standard error of the bagged mean from the NN is much smaller than the bagged standard deviation (about 1/10 the size in the example I am looking at).
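To make the three saved quantities concrete, here is a minimal sketch of how they could be computed from the B bootstrap predictions for a single row. The helper name is hypothetical, and the formula for the standard error of the bagged mean is an assumption (the usual standard error of a sample mean, SD / sqrt(B)), not something confirmed from JMP's documentation. If that assumption holds, the ratio of the two is exactly 1/sqrt(B), so with B = 100 it would be 1/10; whether that explains the ratio reported above should be checked against how JMP actually defines the column.

```python
import math
import statistics


def bagged_summary(boot_preds):
    """Summarize the B bootstrap predictions for one row (hypothetical helper).

    Returns (bagged mean, bagged std dev, standard error of the bagged mean),
    assuming the last one is the usual standard error of a sample mean.
    """
    b = len(boot_preds)
    bagged_mean = statistics.fmean(boot_preds)   # average over the B bootstrap models
    bagged_sd = statistics.stdev(boot_preds)     # spread of the B bootstrap predictions
    se_bagged_mean = bagged_sd / math.sqrt(b)    # assumed definition: SD / sqrt(B)
    return bagged_mean, bagged_sd, se_bagged_mean


# Five made-up bootstrap predictions for a single row.
print(bagged_summary([10.0, 12.0, 11.0, 9.0, 13.0]))

# With B = 100 made-up predictions, the assumed ratio SE / SD is 1/sqrt(100).
_, sd, se = bagged_summary([float(i % 7) for i in range(100)])
print(se / sd)
```

Under this reading, the bagged standard deviation describes how much the fitted value moves from one bootstrap model to the next, while the standard error of the bagged mean only describes how precisely the average of those B models is pinned down, and it shrinks as B grows.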

These observations tempt me to think of the standard error of the bagged mean from the NN as analogous to the standard error of the mean prediction from the regression model. Similarly, the bagged standard deviation may be analogous to the standard error of an individual prediction from the regression model.

However, the standard errors from the NN and the regression model do not resemble each other at all! So my question is whether my interpretation makes any sense, or, more precisely, how the standard errors of the bagged mean can be interpreted or used.

Thanks in advance for any insights. I have attached a concrete example in case it helps with my question (this is the validation data set from my modeling example, with the predictions from the multiple regression model and the NN included).

peng_liu · Feb 5, 2022 03:23 PM

Yes, I confirm that in the Profiler platform (under the Graph menu), when a Prediction Bagged Mean formula, accompanied by a StdError of the Bagged Mean formula, is entered into the launch dialog, the Profiler platform uses "StdError of the Bagged Mean" to produce intervals.

I agree that some things are not clear enough and could be improved. What I see as unclear is the use of "Confidence Intervals" in the documentation on the page that you pointed to: Example of Confidence Intervals Using Bagging. It does not say what the confidence intervals are for, and the wording is probably misleading.

Meanwhile, your wish to use the Bagged Std Dev to produce intervals is a good request. I tried putting that column into the Profiler platform, and it was not recognized.

Would you please contact JMP Tech Support at support@jmp.com and report your concerns and request?

Thank you very much for looking into the software so closely!


dale_lehman · Jan 31, 2022 04:29 PM

Addendum: in the file I attached, I (over)estimated the standard errors from the multiple regression, because I estimated them from the saved 95% confidence intervals. Attached is a revised version where I saved the standard errors directly. My question remains, as the standard errors from the NN seem to be less than half as large as those from the multiple regression (on average), with little apparent correlation across the validation set. Both models fit the data quite well. So I am wondering whether the standard errors of the bagged mean can appropriately be interpreted as standard errors of the predictions, in the same way that the standard errors of the predictions from the multiple regression are interpreted.
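For anyone backing a standard error out of saved interval limits, a minimal sketch of the arithmetic, assuming a symmetric interval of the form estimate ± z · SE (the helper name and the example limits are hypothetical). Regression software usually uses a t quantile rather than z, so for small residual degrees of freedom this slightly overstates the SE. It also matters which limits are used: individual prediction limits give the SE of an individual prediction, which is much larger than the SE of the mean prediction.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back out a standard error from a symmetric confidence interval.

    Assumes the interval is estimate +/- z * SE with a normal quantile z.
    With a t-based interval and few degrees of freedom, this overstates SE.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


# Hypothetical saved 95% limits for one row.
print(se_from_ci(8.04, 11.96))
```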

Mark_Bailey · Jan 31, 2022 05:22 PM

Yes.

The Neural platform uses a complex model, even with only a handful of hidden nodes. Imagine adding lots of terms to the regression model. What happens to the RMSE? What, in turn, happens to the SEs? Is the complexity of your regression model comparable to that of the NN?

I am not suggesting that you are misusing the NN or other ML methods. Just remember that they are ALL about prediction, NOT AT ALL about inference. Uncertainty in the estimates is unimportant; accuracy, reproducibility, and generalization are everything. So prediction models (not explanatory models) use measures of the total MSE (squared bias plus variance) to select the model, and 'honest assessment,' or cross-validation, to confirm the model that was selected.
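For reference, the standard decomposition behind the "total MSE" idea, assuming Y = f(x) + ε with noise variance σ² and expectations taken over training sets and noise, is shown below. A standard error only describes the variance term; a model can have a small SE at x and still be biased there.

```latex
\mathbb{E}\!\left[\big(Y - \hat f(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{squared bias}}
  + \underbrace{\operatorname{Var}\!\big[\hat f(x)\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```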

But the main point is that there is no reason for the SEs from different models to be the same, even if their mean response predictions are similar. This difference might give one model an advantage in your application.

dale_lehman · Jan 31, 2022 07:23 PM

Mark, thanks. That helps somewhat. It might explain why the NN gives smaller standard errors (though I'm still surprised at the size of the difference when both models have such good fits to the data). But it really doesn't seem to explain why there is almost no correlation in the standard errors associated with each prediction. The data I posted shows virtually no meaningful relationship between each observation's standard error of prediction between the two models. Now, for a multiple regression model I have some sense of what determines the standard errors associated with different observations - but for the NN, I really don't. Perhaps that is the reason they are not related to each other? Is this a dimension related to the lack of interpretability of NN models?

Mark_Bailey · Feb 1, 2022 12:02 PM

The NN is an ensemble model that is highly non-linear. Compare that with your regression model that might have second-order terms. The standard errors are very different for these two kinds of models.

I personally disagree with the notion that NN are not interpretable or that linear regression models are interpretable. (I think it is silly.) The effect of predictor X is a linear combination of all the terms that include it. For example, it is nonsense to talk about a 'quadratic effect.' There is only a quadratic term in the model. So how do we interpret the effect of X when it appears in the model as X + X*X2 + X*X + X*X*X? We are just used to thinking in these terms - we had many years of exposure to it and time to think about it. A NN is more of the same (linear predictor) put through a non-linear activation function, and added to more of the same for each node. We just have to think harder.

No, we don't. We have a profiler that works with any kind of function (model).
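A minimal sketch of the two points above, assuming a single hidden layer with TanH nodes and made-up weights (all names and numbers are hypothetical, not taken from JMP). Each node is a linear predictor passed through tanh, and the output is a linear combination of the nodes. A profiler-style trace simply evaluates the fitted function over a grid of one input while holding the others fixed, so it works the same way for the NN and for the polynomial from the post, whose slope in X combines every term that contains X.

```python
import math

# Hypothetical one-hidden-layer net: 2 inputs, 3 TanH nodes, made-up weights.
HIDDEN = [  # (intercept, weight on x, weight on x2) for each node
    (0.1, 0.8, -0.5),
    (-0.3, -1.2, 0.4),
    (0.5, 0.3, 0.9),
]
OUTPUT = (0.2, 1.5, -0.7, 0.6)  # (intercept, weight on node 1, node 2, node 3)


def nn_predict(x, x2):
    # Each node is a linear predictor pushed through tanh ...
    nodes = [math.tanh(b0 + b1 * x + b2 * x2) for b0, b1, b2 in HIDDEN]
    # ... and the output is a linear combination of the node values.
    return OUTPUT[0] + sum(w * h for w, h in zip(OUTPUT[1:], nodes))


def poly_predict(x, x2):
    # X + X*X2 + X*X + X*X*X with unit coefficients (hypothetical).
    return x + x * x2 + x * x + x ** 3


def profile_trace(f, x2_fixed, grid):
    """Profiler-style trace: vary x over the grid, hold x2 fixed."""
    return [(x, f(x, x2_fixed)) for x in grid]


def slope(f, x, x2, h=1e-5):
    """Central-difference estimate of the local effect of x at one point."""
    return (f(x + h, x2) - f(x - h, x2)) / (2 * h)


grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
for (x, y_nn), (_, y_poly) in zip(profile_trace(nn_predict, 1.0, grid),
                                  profile_trace(poly_predict, 1.0, grid)):
    print(f"x={x:5.1f}  NN={y_nn:7.3f}  poly={y_poly:7.3f}")

# For the polynomial the slope in x is 1 + x2 + 2x + 3x^2, i.e. 8 at (1, 2).
print(slope(poly_predict, 1.0, 2.0))
```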

dale_lehman · Feb 1, 2022 02:39 PM

I'm following what you say, but I think my question has become something different. Let's leave the NN out of it. Since the profiler is also available from the multiple regression platform, I did some experimenting to see how the standard errors of the bagged predictions compare with those saved from the regression model. They are correlated, though far from perfectly. The individual prediction standard deviation is much larger than the mean prediction standard deviation, as it should be, and the same holds for the two standard errors you get when you save the bagged means. However, what surprises me, and what I don't understand, is why the standard errors from the saved bagged predictions (either the individual or the mean version) are an order of magnitude smaller than the standard errors from the regression model. My understanding (which could be wrong) is that the standard errors of the predictions are derived theoretically in the regression model, whereas in the bagged predictions they are the result of bootstrapping. In theory, those should be similar (at least with enough bootstrap samples; I used 100 and 1000 and both give similar results), but they are not.

So it appears that the standard errors from the profiler are qualitatively different from the standard errors from the regression model. Why is that the case?

Mark_Bailey · Feb 2, 2022 10:18 AM

"My understanding (which could be wrong) is that the standard errors of the predictions are theoretically derived in the regression model and are the result of bootstrapping in the bagged predictions. In theory, those should be similar (at least with enough bootstrap samples - I used 100 and 1000 and both give similar results) - but they are not."

Which theory are you referring to?

dale_lehman · Feb 2, 2022 10:31 AM

I believe the prediction confidence intervals for a multiple regression are computed from formulae derived from assumptions about the error structure and random sampling (ultimately from the logic that underlies the Central Limit Theorem). On the other hand, I believe the bagged predictions come from bootstrapping, a nonparametric, empirical approach to deriving confidence intervals. I also believe these two approaches generally give close results, unless the underlying data have unusual distributions and/or too few bootstrap samples are used.
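For reference, these are the textbook OLS expressions for the two standard errors at a new row x₀ (standard results under independent, constant-variance normal errors, not quoted from JMP's documentation). Here X is the n × p design matrix, x₀ includes the intercept term, and s is the RMSE:

```latex
\widehat{\mathrm{SE}}_{\text{mean}}(x_0) = s\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0},
\qquad
\widehat{\mathrm{SE}}_{\text{indiv}}(x_0) = s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0},
\qquad
\hat y_0 \pm t_{0.975,\,n-p}\,\widehat{\mathrm{SE}}
```

The 1 under the second square root is the residual variance of a new observation. It does not shrink with n, which is why the individual-prediction SE is so much larger than the mean-prediction SE in a large data set.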

In the files I attached, I don't think any of these issues arise. The mean predictions are almost identical from the two approaches, but the standard errors from the profiler are much smaller than the theoretically derived values. This is what makes me think I am misunderstanding what the profiler standard errors mean. Otherwise, why would anyone ever use the theoretically derived standard errors?

Mark_Bailey · Feb 2, 2022 11:14 AM

"I believe the prediction confidence intervals for a multiple regression are derived via formulae that are derived from assumptions about the error structure and random sampling (ultimately derived from logic that underlies the Central Limit Theorem). On the other hand, I believe the bagged predictions are derived from bootstrapping - a nonparametric empirical approach to deriving confidence intervals. I also believe these two approaches are generally close, unless the underlying data has unusual distributions and/or insufficient sample sizes are used for the bootstrap."

Yes! That is, if you use the theoretical expression for the CI and the bootstrap CI for the SAME linear regression model, the CI ESTIMATES should agree.
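A minimal sketch of that agreement, assuming a simple linear regression on simulated data and a case-resampling bootstrap (the data and helper names are hypothetical). The standard deviation of the bootstrap predictions at x₀ should land close to the textbook SE of the mean prediction. The sketch also prints SD / sqrt(B), the standard error of the average of the B bootstrap predictions. If the profiler's StdError of the Bagged Mean is defined that way (an assumption, not confirmed here), it measures Monte Carlo error in the bagged average rather than uncertainty in the fitted line, shrinks as B grows, and would be expected to be far smaller than either regression SE.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b


def theoretical_se_mean(xs, ys, x0):
    """Textbook SE of the fitted mean at x0: s * sqrt(1/n + (x0 - xbar)^2 / Sxx)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))  # RMSE
    xbar = statistics.fmean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)


def bootstrap_preds(xs, ys, x0, n_boot, rng):
    """Refit the line on case-resampled data and predict at x0 each time."""
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # skip a degenerate resample with no spread in x
            continue
        a, b = fit_line(bx, [ys[i] for i in idx])
        preds.append(a + b * x0)
    return preds


rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
x0 = 5.0

se_theory = theoretical_se_mean(xs, ys, x0)
preds = bootstrap_preds(xs, ys, x0, 1000, rng)
sd_boot = statistics.stdev(preds)
se_of_bag_mean = sd_boot / math.sqrt(len(preds))

print(f"theoretical SE of mean prediction: {se_theory:.4f}")
print(f"bootstrap SD of predictions:       {sd_boot:.4f}")
print(f"SD / sqrt(B):                      {se_of_bag_mean:.4f}")
```

Under that assumption, the bootstrap SD is the quantity comparable to the regression's SE of the mean prediction, not to the SE of an individual prediction.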

dale_lehman · Feb 2, 2022 11:19 AM

Mark

Then I would ask you to look at the last example dataset I posted. The confidence intervals (standard errors) are not even close to agreeing. That is for a simple linear regression, comparing the prediction standard errors with those that come from saving the bagged predictions from the profiler. The latter are much smaller than the former. That is why I am confused.

