"Surely that can't be right"
OK, you got me. It is all made up, faked. I was just seeing how far I could string you along.
(Serious discussion resumes...)
"Which one is the appropriate measure of variability for the mean prediction?"
The one that was estimated for the prediction you will actually use. The pairing thing...
Bagging is more about predictive modeling than explanatory modeling, as I explained. Bagging decreases the uncertainty in the prediction. Using bagging in this case relies on your belief in the quality and validity of both the data and the model. It is not cheating. If the bagged prediction fails, it is because of a problem with the quality or validity of the data or the model.
"It can't be both - unless they answer different questions."
That is exactly what I have been saying.
"And, if that is the case, can you tell me what question each answers?"
I can. I actually did: again, the first pair answers the question about the uncertainty in the prediction of the original model (e.g., linear regression, neural network, partition). The second pair answers the question about the uncertainty in the prediction using bagging.
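To make the distinction concrete, here is a minimal sketch in Python with NumPy. It is not what your software does internally. The data are synthetic and made up for illustration, the names (`fit`, `x0`, `B`) are hypothetical, and it assumes the first pair is the original model's prediction with its model-based standard error and the second pair is the bagged mean with a standard error taken as the bootstrap standard deviation divided by the square root of the number of bootstrap fits (one common convention).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, invented purely for illustration
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
x0 = np.array([1.0, 5.0])              # point at which we predict (x = 5)

def fit(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# First pair: prediction of the original model and its standard error
beta = fit(X, y)
pred_orig = x0 @ beta
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])  # residual variance estimate
se_orig = np.sqrt(s2 * (x0 @ np.linalg.inv(X.T @ X) @ x0))

# Second pair: bagged prediction and its standard error
B = 200                                # number of bootstrap fits (assumed)
boot_preds = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)        # resample rows with replacement
    boot_preds[b] = x0 @ fit(X[idx], y[idx])

bagged_mean = boot_preds.mean()
bagged_sd = boot_preds.std(ddof=1)     # spread of the individual bootstrap fits
se_bagged = bagged_sd / np.sqrt(B)     # uncertainty of the averaged prediction

print(f"original model: {pred_orig:.3f} (SE {se_orig:.3f})")
print(f"bagged mean:    {bagged_mean:.3f} (SE {se_bagged:.3f})")
```

The two standard errors are not competing estimates of the same thing. `se_orig` goes with `pred_orig`, and `se_bagged` goes with `bagged_mean`, so report whichever pair matches the prediction you actually use.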