ragnarl
Level II

Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

I am confused regarding the Root Mean Square Error reported in the "Summary of Fit" in the Linear Fit platform. 

x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];

// fit the regression and compute the predicted values
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// RMSE computed by hand: divide the sum of squared residuals by n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / N Rows( y ) );

// the same fit through the Bivariate platform
As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate( Y( :y ), X( :x ), Fit Line() );

// read the Root Mean Square Error from the Summary of Fit outline
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

The result of the last line:

rmse_lin_reg = 0.111508840972607;
rmse_lin_fit = 0.136569881096024;

What is wrong here? 

 

Thank you

1 ACCEPTED SOLUTION

julian
Community Manager

Re: Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

Hi @ragnarl,

 

I can see how this appears contradictory! All mean squares involve dividing a sum of squared deviations by their degrees of freedom, and in your formula working with the results of Linear Regression() you are dividing by n, not by the df.

rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));

The degrees of freedom for the mean squared error in a simple linear regression is n - 2 (one df is lost to estimating the intercept, and one more to estimating the slope of x). If you adjust your script as below, so that rmse_lin_reg = sqrt(sum((z-y)^2)/(nrows(y)-2)), you will find the same value for the MSE (and thus the RMSE).
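As a quick numeric check, plugging in the values from your own output (rmse_lin_reg = 0.111509 with n = 6, so the sum of squared residuals is about 6 × 0.111509² ≈ 0.074605):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{0.074605}{4}} \approx 0.13657,$$

which is exactly the Root Mean Square Error JMP reports in Summary of Fit.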

 

I hope this helps!

 

@julian 

 

x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// divide by the error degrees of freedom (n - 2), not by n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / (N Rows( y ) - 2) );

As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate( Y( :y ), X( :x ), Fit Line() );
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

returns

rmse_lin_reg = 0.13656988109602;
rmse_lin_fit = 0.13656988109602;
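If it helps to see the two conventions side by side, here is a minimal JSL sketch (it reuses z and y from the script above; the names sse, rmse_over_n, and rmse_over_df are my own):

// same residuals, two divisors
sse = Sum( (z - y) ^ 2 );              // residual sum of squares
n = N Rows( y );
rmse_over_n  = Sqrt( sse / n );        // your original formula, about 0.1115
rmse_over_df = Sqrt( sse / (n - 2) );  // JMP's Summary of Fit value, about 0.1366
Show( rmse_over_n, rmse_over_df );

The general rule is that the divisor is n minus the number of estimated parameters, so a model with p coefficients (including the intercept) uses n - p.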

