First, welcome to the community.
My thoughts:
There are a number of statistics used to evaluate a given model and its adequacy. Plots of residuals (including studentized residuals) can be very helpful in identifying outliers, and no single plot is best in all circumstances. When a plot flags a potentially unusual point, remember that it is not the actual data that is unusual; it is that the model did a poor job of predicting that data point. This is an indicator that the model may need to be re-evaluated (and, perhaps more importantly, an opportunity to better understand the true mechanisms/causal relationships at work).

Also remember that the model and all statistics associated with evaluating it (RMSE, p-values, R-squared, the R-squared to adjusted R-squared delta, etc.) are ALL CONDITIONAL. Change what is in the model, what estimates the MSE, the inference space, etc., and the model adequacy can/will change (hence why, when you removed data, a new model was created and the residual plots changed).

If outliers are identified, I always assess practical significance first. It is also possible the terms in the model simply do not adequately predict the actual values; this is often the result of noise in the system, and possibly inconsistent noise. When plotting the residuals by row, always make sure the data is first sorted in run order, as this may offer clues as to when the model has issues. A sketch of both plots follows.
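Here is a minimal sketch of the two plots mentioned above (studentized residuals vs. fitted values, and residuals in run order), using statsmodels. The data frame and column names (`x`, `y`, `run_order`) are made up for illustration; substitute your own data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Hypothetical data: response y, predictor x, and the run order of each observation.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
df = pd.DataFrame({"run_order": np.arange(1, x.size + 1), "x": x, "y": y})

# Fit an ordinary least squares model.
X = sm.add_constant(df[["x"]])
model = sm.OLS(df["y"], X).fit()

# Internally studentized residuals help flag points the model predicts poorly.
influence = model.get_influence()
student_resid = influence.resid_studentized_internal

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Studentized residuals vs. fitted values: look for points beyond roughly +/- 2 to 3.
ax1.scatter(model.fittedvalues, student_resid)
ax1.axhline(0, color="gray", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Studentized residuals")

# Raw residuals plotted in run order: sort by run order first so any
# time-related (noise) pattern becomes visible.
ordered = df.assign(resid=model.resid).sort_values("run_order")
ax2.plot(ordered["run_order"], ordered["resid"], marker="o")
ax2.axhline(0, color="gray", linestyle="--")
ax2.set_xlabel("Run order")
ax2.set_ylabel("Residuals")

plt.tight_layout()
plt.show()
```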
"All models are wrong, some are useful" G.E.P. Box