ragnarl
Level II

Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

I am confused regarding the Root Mean Square Error reported in the "Summary of Fit" in the Linear Fit platform. 

x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];

// fit the regression and compute the predicted values
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// RMSE computed by hand: divide the sum of squared residuals by n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / N Rows( y ) );

// the same fit through the Bivariate platform
As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate( Y( :y ), X( :x ), Fit Line() );

// read the Root Mean Square Error from the Summary of Fit outline
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

The result of the last line:

rmse_lin_reg = 0.111508840972607;
rmse_lin_fit = 0.136569881096024;

What is wrong here? 

 

Thank you

1 ACCEPTED SOLUTION

julian
Community Manager

Re: Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

Hi @ragnarl,

 

I can see how this appears contradictory! All mean squares involve dividing a sum of squared deviations by their degrees of freedom, and in your formula working with the results of Linear Regression() you are dividing by n, not by the df.

rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));

The degrees of freedom for the mean squared error in a simple linear regression is n - 2 (one df is lost to estimating the intercept, and one more to estimating the slope of x). If you adjust your script as below, so that rmse_lin_reg = sqrt(sum((z-y)^2)/(nrows(y)-2)), you will find the same value for the MSE (and thus the RMSE).
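As a quick numeric check, plugging in the values from your own output (rmse_lin_reg = 0.111509 with n = 6, so the sum of squared residuals is about 6 × 0.111509² ≈ 0.074605):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{0.074605}{4}} \approx 0.13657,$$

which is exactly the Root Mean Square Error JMP reports in Summary of Fit.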

 

I hope this helps!

 

@julian 

 

x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// divide by the error degrees of freedom (n - 2), not by n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / (N Rows( y ) - 2) );

As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate( Y( :y ), X( :x ), Fit Line() );
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

returns

rmse_lin_reg = 0.13656988109602;
rmse_lin_fit = 0.13656988109602;
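If it helps to see the two conventions side by side, here is a minimal JSL sketch (it reuses z and y from the script above; the names sse, rmse_over_n, and rmse_over_df are my own):

// same residuals, two divisors
sse = Sum( (z - y) ^ 2 );              // residual sum of squares
n = N Rows( y );
rmse_over_n  = Sqrt( sse / n );        // your original formula, about 0.1115
rmse_over_df = Sqrt( sse / (n - 2) );  // JMP's Summary of Fit value, about 0.1366
Show( rmse_over_n, rmse_over_df );

The general rule is that the divisor is n minus the number of estimated parameters, so a model with p coefficients (including the intercept) uses n - p.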

