
"Stability" of RMSE across validation/training/test sets

Is there a good common practice for saying that RMSE (or whatever model metric) is "stable" across the three sets? Is there any rule of thumb, e.g. within 10% of each other?


Currently I'm just taking the RMSE of the three values against their mean, but I don't know if that's a valid metric or if there's already something out there that accomplishes this. This assumes the same dataset.


Names default to here(1);
//Training, Validation, Testing, respectively
RMSE1 = [12.23434, 15.1546, 10.6572];
RMSE2 = [12.23434, 12.526, 12.0000];
RMSE3 = [12.23434, 12.23434, 12.23434];

//Population standard deviation of each model's three RMSEs (spread around their mean)
stable1 = sqrt(mean((mean(RMSE1)-RMSE1)^2));
stable2 = sqrt(mean((mean(RMSE2)-RMSE2)^2));
stable3 = sqrt(mean((mean(RMSE3)-RMSE3)^2));

show(stable1, stable2, stable3);
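One wrinkle with the raw spread is that it isn't scale-free: a model with a larger average RMSE can look less "stable" just because its numbers are bigger. A possible fix (my own assumption, not an established rule) is to divide the spread by the mean RMSE, i.e. a coefficient of variation, so stability is comparable across models at different RMSE magnitudes:

//Coefficient of variation: spread relative to mean RMSE (smaller = more stable)
cv1 = stable1 / mean(RMSE1);
cv2 = stable2 / mean(RMSE2);
cv3 = stable3 / mean(RMSE3);
show(cv1, cv2, cv3);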

What I'm eventually trying to do is pick a model that has "good" stability and a low RMSE. I feel like this has to be a common thing, but I can't find any documented good practices.

Vince Faller - Predictum

Re: "Stability" of RMSE across validation/training/test sets

Sorry, Vince. I'm not aware of any common practices on this. I agree it seems like a really good idea. However, I think it will depend on a lot of variables. For example, data size: as data size increases I would expect the "best" models to be more stable across validation sets. It sounds like this could be an interesting MSc project for someone to do some simulations and see if they find something useful.