Recent discussion about how to interpret standard errors from the profiler has not converged to a solution, so I will start again - with a different example and (I hope) some more clarity on my question. I've attached one of the famous Anscombe quartet datasets - the one where a linear regression model is appropriate. I fit a regression model and saved the mean prediction standard error and the individual prediction standard error - as expected, the former is much narrower than the latter. I also used the profiler and saved the bagged predictions - this gave me a standard error for the bagged mean (which I interpret to be analogous to the mean prediction standard error from the regression model) and a Bagged Std Dev (which I interpret as analogous to the individual prediction standard error from the regression). When I say I interpret these as analogous, I realize that the standard errors from the model are derived from the regression model itself, while the bagged standard errors come from a non-parametric bootstrap (here, with 100 bootstrap samples).
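For concreteness, here is a small Python sketch (outside JMP) of the two regression standard errors I mean, assuming the attached data is Anscombe quartet dataset I; statsmodels reports the standard error of the mean prediction directly, and the individual-prediction standard error follows from it and the residual variance:

```python
# A sketch (not JMP) of the two regression standard errors, assuming the
# attached data is Anscombe quartet dataset I.
import numpy as np
import statsmodels.api as sm

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
              7.24, 4.26, 10.84, 4.82, 5.68])
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

frame = fit.get_prediction(X).summary_frame(alpha=0.05)
se_mean = frame["mean_se"]                       # SE of the mean prediction (narrow)
se_indiv = np.sqrt(se_mean**2 + fit.mse_resid)   # SE for an individual prediction (wide)
print(frame[["mean", "mean_se"]])
```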
In this case (unlike my last example), the individual standard errors are close, although somewhat smaller for the bagged predictions than for the model predictions. But the standard error of the bagged mean is only about 10% as large as the standard error of the mean predictions from the model. In fact, this can be verified from the documentation, which says that the bagged standard error equals the standard deviation of the bagged predictions divided by sqrt(M - 1), where M is the number of bootstrap samples (100 here) - and sqrt(99) is roughly 10, which accounts for the 10% ratio.
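To make sure I understand the documented formula, here is how I would reproduce the bagging by hand (continuing with x, y, X, and the imports from the snippet above; this is my reading of the procedure, not JMP's actual code):

```python
# Non-parametric bootstrap bagging of the linear fit, as I understand the
# profiler's procedure (my reconstruction, not JMP's implementation).
rng = np.random.default_rng(1)
M = 100                                           # number of bootstrap samples
boot_preds = np.empty((M, len(x)))
for m in range(M):
    idx = rng.integers(0, len(x), size=len(x))    # resample rows with replacement
    boot_preds[m] = sm.OLS(y[idx], X[idx]).fit().predict(X)

bagged_mean = boot_preds.mean(axis=0)             # "Bagged Mean"
bagged_std = boot_preds.std(axis=0, ddof=1)       # "Bagged Std Dev"
se_bagged_mean = bagged_std / np.sqrt(M - 1)      # documented: SD / sqrt(M - 1)
```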
My question concerns how to use the standard error of the bagged mean and the bagged standard deviation. If I look at the confidence interval shown in the profiler, it appears to be consistent with the mean prediction standard error from the regression model - not with the standard error of the bagged mean (which is only about a tenth as large). If that is the case, then what are the standard errors from the bagged samples good for?
I think the importance (for me at least) is that JMP provides profilers for all of its machine learning models, which makes it possible to save the bagged means and these two types of standard errors for any of them. If those standard errors can be used to construct confidence intervals, then we get something akin to statistical significance from neural networks, random forests, etc. I think that would be an important feature, since most of those models don't readily provide any inferential information. This would apply to classification models as well as regression models.
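To illustrate what I am hoping for, here is a hypothetical sketch of that use - bag an arbitrary machine-learning model and build a rough interval from the bagged SE. Whether such an interval is actually valid is exactly my question, so treat this as illustration, not a recommendation (a shallow tree stands in for any ML model; x and y are from the first snippet):

```python
# Hypothetical use: bootstrap-bag an arbitrary ML model and form a rough
# 95% interval from the bagged SE. Illustration only; whether this
# interval is statistically valid is precisely the open question.
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
M = 100
preds = np.empty((M, len(x)))
for m in range(M):
    idx = rng.integers(0, len(x), size=len(x))
    tree = DecisionTreeRegressor(max_depth=2).fit(x[idx, None], y[idx])
    preds[m] = tree.predict(x[:, None])

bagged_mean = preds.mean(axis=0)
se = preds.std(axis=0, ddof=1) / np.sqrt(M - 1)
lo, hi = bagged_mean - 1.96 * se, bagged_mean + 1.96 * se  # ~95%, if the SE is right
```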
But I am not sure the bagged standard errors make much sense. In the example I've provided, they look too small. And it isn't clear to me whether the confidence interval shown in the profiler is based on the standard error of the bagged mean (it doesn't look that way in this example). So, please help shed some light on my questions.