learning_JSL
Level IV

accounting for individual sample error when training a regression model for future predictions

Hi - I am doing a multiple linear regression to predict E. coli. My independent variables are turbidity and streamflow, and my dependent variable is E. coli. I am training the regression model on 188 paired values of turbidity, streamflow, and E. coli. I will then use future turbidity and streamflow values to predict the expected E. coli value.

My question is this: how do I account for the fact that each E. coli measurement in my training dataset (n = 188) can vary by about 16%, on average, because of measurement variation that is unrelated to turbidity and streamflow? (I collected E. coli duplicates for 29 of the 188 samples, and the average variability was ±16%.) So the E. coli values used to train the regression model are imprecise. How do I account for this when I use my regression results to predict a future E. coli value?
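
As a minimal sketch (not the poster's own calculation), duplicate pairs can be pooled into a single measurement-error figure. Because the variability is described in percent, this assumes the error is multiplicative and works with natural logs: each log difference d between duplicates has variance 2s², so s is estimated as sqrt(Σd² / (2n)). The function name `duplicate_log_sd` and the counts below are hypothetical.

```python
import math


def duplicate_log_sd(pairs):
    """Pooled within-pair standard deviation on the natural-log scale.

    For a duplicate pair (a, b), d = ln(a) - ln(b) has variance 2 * s^2,
    so s^2 is estimated by sum(d^2) / (2 * number_of_pairs).
    """
    if not pairs:
        raise ValueError("need at least one duplicate pair")
    total = 0.0
    for a, b in pairs:
        if a <= 0 or b <= 0:
            raise ValueError("counts must be positive to take logs")
        d = math.log(a) - math.log(b)
        total += d * d
    return math.sqrt(total / (2 * len(pairs)))


# Hypothetical duplicate E. coli counts (not the poster's data).
pairs = [(120, 100), (450, 520), (80, 95), (1000, 870)]
s = duplicate_log_sd(pairs)
# For small s, a log-scale SD is roughly a relative (percent) error.
print(f"log-scale SD = {s:.3f} (about {100 * s:.0f}% relative error)")
```

Whether "±16%" in the question matches this definition is unknown; a relative percent difference between duplicates is another common summary and gives a different number.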

 

Thanks in advance!  

Accepted Solution

Re: accounting for individual sample error when training a regression model for future predictions

The regression model includes an error term for Y (the dependent variable). Linear regression assumes that these errors are normally distributed with a mean of zero and a constant standard deviation (i.e., the same for all values of Y). The residuals from the fitted model estimate the errors, and this assumption allows regression to pool the residuals into a single estimate of the error standard deviation, the root mean square error (RMSE).

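You do not need to do anything more to account for the measurement error. Because it is unrelated to turbidity and streamflow, it simply becomes part of the error term, and its contribution is already included in the RMSE.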
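
A small simulation can illustrate this. It is a sketch under stated assumptions: the model is fit to log-scale values (a natural choice for a percentage-type error, but an assumption here), and the coefficients, error sizes, and predictor ranges are all made up. Measurement error is added on top of simulated "true" values; least squares still recovers the slopes, and the RMSE comes out close to sqrt(process_sd² + meas_sd²), i.e., the measurement error is absorbed into the pooled error.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, RMSE)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]  # n minus number of fitted parameters
    rmse = np.sqrt(resid @ resid / dof)
    return beta, rmse


rng = np.random.default_rng(1)
n = 20_000  # large n so the estimates sit close to their targets
turbidity = rng.uniform(0.0, 3.0, n)    # hypothetical predictors (log scale)
streamflow = rng.uniform(0.0, 2.0, n)
true_beta = np.array([1.0, 0.8, 0.5])   # hypothetical intercept and slopes
process_sd, meas_sd = 0.5, 0.3          # hypothetical error components

# "True" log E. coli, then the measured value with lab error added on top.
y_true = (true_beta[0] + true_beta[1] * turbidity + true_beta[2] * streamflow
          + rng.normal(0.0, process_sd, n))
y_meas = y_true + rng.normal(0.0, meas_sd, n)

beta, rmse = fit_ols(np.column_stack([turbidity, streamflow]), y_meas)
print("estimated coefficients:", np.round(beta, 3))
print(f"RMSE = {rmse:.3f}; combined error SD = {np.hypot(process_sd, meas_sd):.3f}")
```

The same logic holds with 188 observations; the estimates are just noisier, which is why a large n is used here to make the pattern visible.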

 

The predicted response will therefore have uncertainty, represented by a confidence interval for the predicted mean. A wider prediction interval represents the uncertainty in where an individual new observation will fall.
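
The two intervals differ only by an extra 1 under the square root: for the mean, se = s * sqrt(a0' (A'A)^-1 a0), and for a single new observation, se = s * sqrt(1 + a0' (A'A)^-1 a0), where s is the RMSE, A is the design matrix with an intercept column, and a0 is the new predictor row with a leading 1. The numpy/scipy sketch below uses simulated, hypothetical data (n = 188 to match the question); the function name `intervals` and all numbers are illustrative only.

```python
import numpy as np
from scipy import stats


def intervals(X, y, x_new, level=0.95):
    """Fitted mean at x_new, a confidence interval for that mean, and a
    (wider) prediction interval for one new observation."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    s2 = resid @ resid / dof                        # RMSE squared
    a0 = np.concatenate([[1.0], np.atleast_1d(x_new)])
    leverage = a0 @ np.linalg.solve(A.T @ A, a0)    # a0' (A'A)^-1 a0
    t = stats.t.ppf(0.5 + level / 2, dof)
    fit = a0 @ beta
    half_mean = t * np.sqrt(s2 * leverage)          # uncertainty in the mean only
    half_pred = t * np.sqrt(s2 * (1.0 + leverage))  # plus one observation's error
    return fit, (fit - half_mean, fit + half_mean), (fit - half_pred, fit + half_pred)


# Hypothetical training data: two predictors, 188 rows (log scale assumed).
rng = np.random.default_rng(7)
X = rng.uniform([0.0, 0.0], [3.0, 2.0], size=(188, 2))
y = 1.0 + X @ np.array([0.8, 0.5]) + rng.normal(0.0, 0.58, 188)

fit, ci, pi = intervals(X, y, [1.5, 1.0])
print(f"fit = {fit:.3f}, mean CI = {ci}, prediction interval = {pi}")
```

If the model is fit to log E. coli (an assumption, not something stated in the thread), exponentiating the interval endpoints gives the corresponding interval on the original count scale.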

learning_JSL
Level IV

Re: accounting for individual sample error when training a regression model for future predictions

Thank you, Mark. That makes sense. I wanted to confirm that this is the case.