Solved: accounting for individual sample error when training a regression model for futu...

Jun 10, 2023 1:46 PM

Hi - I am doing a multiple linear regression to predict ecoli. My independent variables are turbidity and streamflow. My dependent variable is ecoli. I am training the regression model based on 188 values of turbidity, streamflow, and ecoli. I will then use future turbidity and streamflow values to predict the expected ecoli value.

My question is this: how do I account for the fact that each of the ecoli measurements in my training dataset (n=188) can vary, on average, by 16% due to measurement variation that is unrelated to turbidity and streamflow? (That is, I collected ecoli duplicates for 29 of the 188 samples and the average variability was +/- 16%.) So the ecoli used to train the regression model is imprecise. How do I account for this when I attempt to use my regression results to predict a future value of ecoli?

Thanks in advance!

Mark_Bailey · Sep 27, 2021 02:24 PM

The regression model includes a term for the errors in Y (dependent variable). Linear regression assumes that the errors are normally distributed with a mean of zero and a standard deviation that is constant (i.e., same for all Y). The errors are estimated by the residuals from the model and the data. The assumption above allows regression to pool the residuals and estimate the root mean square error.
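As a sketch of the model described above, written for the two predictors in the question (the symbols are my own notation, not taken from the thread): the measurement variation seen in the duplicates is one part of the error term, and the pooled residuals give the root mean square error.

```latex
% Model: turbidity x_{i1}, streamflow x_{i2}, E. coli y_i, for i = 1, ..., n (n = 188)
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

% Residuals estimate the errors; pooling them estimates \sigma
e_i = y_i - \hat{y}_i,
\qquad
\mathrm{RMSE} = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - p}},
\qquad p = 3 \text{ estimated coefficients}
```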

You do not need to do anything more to account for the error.

The predicted response therefore carries uncertainty, which is represented by a confidence interval for the predicted mean. A wider interval, the prediction interval, represents the uncertainty in the location of individual observations.
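To make the two intervals concrete, here is a minimal Python sketch rather than a JMP workflow. It fits the regression by least squares with NumPy, pools the residuals into the RMSE, and computes both intervals at a new point. The data are synthetic stand-ins (not the poster's measurements), the helper name `intervals` is hypothetical, and the normal quantile is used as a large-sample approximation to the t quantile, which is reasonable at n = 188.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Synthetic stand-in for the 188 training samples (NOT the poster's data).
n = 188
turbidity = rng.uniform(1.0, 50.0, n)
streamflow = rng.uniform(5.0, 200.0, n)
# One noise term lumps measurement error with all other unexplained variation.
noise = rng.normal(0.0, 20.0, n)
ecoli = 30.0 + 4.0 * turbidity + 0.5 * streamflow + noise

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), turbidity, streamflow])
beta, *_ = np.linalg.lstsq(X, ecoli, rcond=None)

# Pool the residuals into the root mean square error (RMSE).
residuals = ecoli - X @ beta
dof = n - X.shape[1]
rmse = np.sqrt(residuals @ residuals / dof)


def intervals(x_new, level=0.95):
    """Return (fit, confidence interval, prediction interval) at x_new."""
    x0 = np.array([1.0, *x_new])
    fit = float(x0 @ beta)
    # Leverage term: reflects uncertainty in the estimated coefficients.
    h = float(x0 @ np.linalg.solve(X.T @ X, x0))
    # Assumption: normal quantile as a large-sample stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_mean = rmse * np.sqrt(h)        # uncertainty of the predicted mean
    se_obs = rmse * np.sqrt(1.0 + h)   # adds the scatter of one new observation
    ci = (fit - z * se_mean, fit + z * se_mean)
    pi = (fit - z * se_obs, fit + z * se_obs)
    return fit, ci, pi


fit, ci, pi = intervals((20.0, 100.0))
print(f"RMSE = {rmse:.2f}")
print(f"fit = {fit:.1f}, CI = ({ci[0]:.1f}, {ci[1]:.1f}), PI = ({pi[0]:.1f}, {pi[1]:.1f})")
```

The prediction interval is always the wider of the two because its standard error includes the full RMSE, which is where the imprecision of the individual training measurements ends up.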

learning_JSL · Sep 28, 2021 08:21 AM

Thank you Mark. That makes sense, yes. I wanted to confirm that that is the case.
