The goal of model is selection is generalization, not best fit. You can over-fit the training data such that the prediction of new observations (or hold out data) is poor. The model was trained to include noise in the features as information, but the new observations have different (random) noise, so the predictions do not generalize to new data.
Does that answer explain your case?