
Why does the RMSE returned by the Linear Fit deviate from sqrt(sum(residuals^2)/n)?

I am confused regarding the Root Mean Square Error reported in the "Summary of Fit" in the Linear Fit platform.

```
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z=Estimates[1]+Estimates[2]*x;

rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));

as table(x,y,<<Column Names({"x","y"}));
biv=Bivariate(
Y( :y ),
X( :x ),
Fit Line( )
);
rmse_lin_fit=((biv<< report())["Summary of Fit"][Number Col Box(1)] <<get())[3];
show(rmse_lin_reg,rmse_lin_fit);
```

The result of the last line:

```
rmse_lin_reg = 0.111508840972607;
rmse_lin_fit = 0.136569881096024;
```

What is wrong here?

Thank you

1 ACCEPTED SOLUTION

Community Manager

Re: Why does the RMSE returned by the Linear Fit deviate from sqrt(sum(residuals^2)/n)?

Hi @ragnarl,

I can see how this appears contradictory! All mean squares involve dividing a sum of squared deviations by their degrees of freedom. In your formula working with the results of Linear Regression(), you appear to be dividing by n, not df.

``rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));``

The mean squared error in a simple linear regression has n-2 degrees of freedom (1 df is lost to estimating the intercept, and 1 more to estimating the slope of x). If you adjust your script as below to use `rmse_lin_reg=sqrt(sum((z-y)^2)/(nrows(y)-2))`, you will find the same value for MSE (and thus RMSE).
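To make the two denominators explicit, the comparison can be written out as follows. The symbol SSE (the sum of squared residuals) and the subscripts are notation introduced here for illustration only; the numbers are the ones reported in the question.

```latex
\mathrm{RMSE}_{n} = \sqrt{\frac{\mathrm{SSE}}{n}}, \qquad
\mathrm{RMSE}_{\mathrm{df}} = \sqrt{\frac{\mathrm{SSE}}{n-2}}, \qquad
\frac{\mathrm{RMSE}_{\mathrm{df}}}{\mathrm{RMSE}_{n}} = \sqrt{\frac{n}{n-2}}

% With n = 6 observations:
\sqrt{\frac{6}{4}} \approx 1.2247, \qquad
0.111509 \times 1.2247 \approx 0.13657
```

So the two values in the question differ only by the factor sqrt(n/(n-2)), which shrinks toward 1 as n grows.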

I hope this helps!

```
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
// Predicted values from the fitted intercept and slope
z=Estimates[1]+Estimates[2]*x;

// Divide the sum of squared residuals by the error df (n-2), not by n
rmse_lin_reg=sqrt(sum((z-y)^2)/(nrows(y)-2));

as table(x,y,<<Column Names({"x","y"}));
biv=Bivariate(
Y( :y ),
X( :x ),
Fit Line( )
);
// Third entry of "Summary of Fit" is the Root Mean Square Error
rmse_lin_fit=((biv<< report())["Summary of Fit"][Number Col Box(1)] <<get())[3];
show(rmse_lin_reg,rmse_lin_fit);
```

returns

```
rmse_lin_reg = 0.13656988109602;
rmse_lin_fit = 0.13656988109602;
```
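For a side-by-side view of the two denominators, here is a minimal sketch that reuses the same data and the same `Linear Regression()` call as the script above. The names `n`, `p`, `sse`, `rmse_n`, and `rmse_df` are hypothetical and introduced only for this illustration, and `p = 2` is set by hand on the assumption that only an intercept and one slope are estimated.

```
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;   // predicted values

n = nrows( y );                        // number of observations
p = 2;                                 // assumption: intercept + one slope
sse = sum( (z - y) ^ 2 );              // sum of squared residuals

rmse_n = sqrt( sse / n );              // divides by n (the value from the question)
rmse_df = sqrt( sse / (n - p) );       // divides by the error df (matches Summary of Fit)

// The last two values shown should agree: the ratio equals sqrt(n/(n-p))
show( rmse_n, rmse_df, rmse_df / rmse_n, sqrt( n / (n - p) ) );
```

With the values reported in this thread, `rmse_n` corresponds to the 0.1115 result and `rmse_df` to the 0.1366 result.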