Why does the RMSE returned by the Linear Fit deviate from sqrt(sum(residuals^2)/n)?
I am confused about the Root Mean Square Error reported in the "Summary of Fit" of the Linear Fit platform. Consider this minimal script:
// Fit a simple linear regression with Linear Regression()
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );

// Predicted values and a hand-computed RMSE (dividing by n)
z = Estimates[1] + Estimates[2] * x;
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / N Rows( y ) );

// Fit the same line in the Bivariate platform
As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate(
	Y( :y ),
	X( :x ),
	Fit Line()
);

// Read the RMSE from the "Summary of Fit" report
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );
The result of the last line:
rmse_lin_reg = 0.111508840972607;
rmse_lin_fit = 0.136569881096024;
What is wrong here?
Thank you
Accepted Solutions
Re: Why does the RMSE returned by the Linear Fit deviate from sqrt(sum(residuals^2)/n)?
Hi @ragnarl,
I can see how this appears contradictory! All mean squares involve dividing a sum of squared deviations by their degrees of freedom. In your formula computed from the results of Linear Regression(), you are dividing by n, not by the degrees of freedom:
rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));
The degrees of freedom for the mean squared error in a simple linear regression is n - 2 (one df is lost to estimating the intercept, and one more to estimating the slope of x). If you adjust your script as below to use rmse_lin_reg = Sqrt( Sum( (z - y)^2 ) / (N Rows( y ) - 2) ), you will find the same value for the MSE (and thus the RMSE).
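Concretely, the Summary of Fit RMSE is the square root of the error mean square,

\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n - 2}},
\]

so it differs from your \( \sqrt{\mathrm{SSE}/n} \) version only by the factor \( \sqrt{n/(n-2)} \). With your n = 6:

\[
0.111509 \times \sqrt{6/4} \approx 0.136570 .
\]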
I hope this helps!
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// Divide by the error degrees of freedom (n - 2), not n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / (N Rows( y ) - 2) );

As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate(
	Y( :y ),
	X( :x ),
	Fit Line()
);
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );
returns
rmse_lin_reg = 0.13656988109602;
rmse_lin_fit = 0.13656988109602;
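More generally, JMP divides the sum of squared residuals by n - p, where p is the number of estimated parameters (p = 2 here: intercept plus slope). As a quick sanity check, here is a minimal JSL sketch, assuming it runs right after your original script (so y and z are still defined), that rescales the divide-by-n estimate by the df correction factor:

// Sketch only: relate the /n estimate to the /(n - p) estimate
n = N Rows( y );                          // 6 observations
p = 2;                                    // intercept + slope
rmse_naive = Sqrt( Sum( (z - y) ^ 2 ) / n );
Show( rmse_naive * Sqrt( n / (n - p) ) ); // ≈ 0.13657, matching the platform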