
Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

ragnarl
Level II

I am confused about the Root Mean Square Error reported under "Summary of Fit" for a Linear Fit in the Bivariate platform.

// Six (x, y) pairs as matrices
x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];

// Least-squares fit; Estimates = {intercept, slope}
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x; // predicted values

// RMSE computed by hand, dividing by n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / N Rows( y ) );

// Same fit through the Bivariate platform
As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate(
	Y( :y ),
	X( :x ),
	Fit Line()
);

// Root Mean Square Error is the third entry in the Summary of Fit column
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

The result of the last line:

rmse_lin_reg = 0.111508840972607;
rmse_lin_fit = 0.136569881096024;

What is wrong here? 

 

Thank you

ACCEPTED SOLUTION
julian
Community Manager


Re: Why is the RMSE returned by the Linear Fit deviating from sqrt(sum(residuals^2)/n)?

Hi @ragnarl,

 

I can see how this appears contradictory! All mean squares involve dividing a sum of squared deviations by their degrees of freedom, and in your formula working with the results of Linear Regression() you are dividing by n, not by the degrees of freedom.

rmse_lin_reg=sqrt(sum((z-y)^2)/nrows(y));

The degrees of freedom for the mean squared error in a simple linear regression are n - 2 (one df is lost to estimating the intercept, and one more to estimating the slope of x). If you adjust your script as below to use rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / (N Rows( y ) - 2) ), you will find the same value for the MSE (and thus the RMSE).
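In general (standard least-squares bookkeeping, stated here for reference):

RMSE = sqrt( SSE / (n - p) ), where SSE = sum( (y - yhat)^2 )

and p is the number of estimated parameters. For a simple linear regression p = 2, so with the n = 6 points above the divisor is 4, not 6, which accounts for the whole discrepancy.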

 

I hope this helps!

 

@julian 

 

x = [1.309, 1.471, 1.49, 1.565, 1.611, 1.68];
y = [2.138, 3.421, 3.597, 4.34, 4.882, 5.66];
{Estimates, Std_Error, Diagnostics} = Linear Regression( y, x, <<printToLog );
z = Estimates[1] + Estimates[2] * x;

// Divide by the error degrees of freedom, n - 2, instead of n
rmse_lin_reg = Sqrt( Sum( (z - y) ^ 2 ) / (N Rows( y ) - 2) );

As Table( x, y, <<Column Names( {"x", "y"} ) );
biv = Bivariate(
	Y( :y ),
	X( :x ),
	Fit Line()
);
rmse_lin_fit = ((biv << Report())["Summary of Fit"][Number Col Box( 1 )] << Get())[3];
Show( rmse_lin_reg, rmse_lin_fit );

returns

rmse_lin_reg = 0.13656988109602;
rmse_lin_fit = 0.13656988109602;
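
As a quick numerical check (my own sketch, not part of the original reply; the literal 0.111508840972607 is the divide-by-n value from the first script): both versions share the same sum of squared residuals, so rescaling by sqrt( n / (n - 2) ) turns one into the other.

// Rescale the divide-by-n RMSE into the divide-by-(n-2) RMSE
n = 6;                              // number of observations in this example
factor = Sqrt( n / (n - 2) );       // sqrt(6/4), approximately 1.2247
Show( 0.111508840972607 * factor ); // approximately 0.136569881, matching rmse_lin_fit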
