<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When to remove outliers when reducing model? in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849302#M102542</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/49050"&gt;@MetaLizard62080&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you expect high assay variability, do you account for this noise source with blocking?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's great if you can validate a model with domain expertise, as this should reduce the possibility of errors. Can you perhaps repeat the tests (or only the measurements) that seem strange? This could help you figure out whether it's a "systematic error" or a "random error" and inform your decision-making.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of directly removing points that seem strange and are not described precisely by the model, I would keep them but lower their influence on the model: create a "weight" column with value 1 for "normal" points and a lower value for the "strange" points, and use this column as a Weight variable in the Fit Model dialog:&amp;nbsp;&lt;A href="https://www.jmp.com/support/help/en/18.1/index.shtml#page/jmp/elements-in-the-fit-model-launch-window.shtml#ww213135" target="_blank" rel="noopener"&gt;Elements in the Fit Model Launch Window&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Removing points just so that the model converges faster will bias the model and probably produce falsely optimistic results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope some of these points make sense to you,&lt;/P&gt;</description>
    <pubDate>Sun, 23 Mar 2025 16:25:21 GMT</pubDate>
    <dc:creator>Victor_G</dc:creator>
    <dc:date>2025-03-23T16:25:21Z</dc:date>
    <item>
      <title>When to remove outliers when reducing model?</title>
      <link>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849068#M102489</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I am reducing a model, I watch the externally studentized residuals to know when outliers are appearing. When should I remove these outliers during the model reduction process?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I see no outliers when the model contains all terms, but then I remove a term and an outlier appears, do you remove that outlier immediately and continue reducing, or continue reducing and then remove the outliers at the end?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2025 15:13:37 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849068#M102489</guid>
      <dc:creator>MetaLizard62080</dc:creator>
      <dc:date>2025-03-21T15:13:37Z</dc:date>
    </item>
    <item>
      <title>Re: When to remove outliers when reducing model?</title>
      <link>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849297#M102539</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/49050"&gt;@MetaLizard62080&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Did you read the responses from&amp;nbsp;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/4358"&gt;@statman&lt;/a&gt;&amp;nbsp;and me on your previous post&amp;nbsp;&lt;LI-MESSAGE title="Choosing to exclude from 2 equal outliers in DoE" uid="848077" url="https://community.jmp.com/t5/Discussions/Choosing-to-exclude-from-2-equal-outliers-in-DoE/m-p/848077#U848077" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-forum-thread lia-fa-icon lia-fa-forum lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To be clear and to repeat it, &lt;SPAN&gt;Studentized residuals may be a good way to identify outliers&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;based on an assumed model&lt;/STRONG&gt;&lt;SPAN&gt;.&amp;nbsp;&lt;/SPAN&gt;See more information about how the studentized residuals are calculated here:&amp;nbsp;&lt;A href="https://www.jmp.com/support/help/en/17.2/#page/jmp/row-diagnostics.shtml#ww1673660" target="_blank" rel="noopener noreferrer"&gt;Row Diagnostics (jmp.com)&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"&lt;EM&gt;Points that fall outside the red limits should be treated as&amp;nbsp;&lt;STRONG&gt;probable&lt;/STRONG&gt;&amp;nbsp;outliers. Points that fall outside the green limits but within the red limits should be treated as&amp;nbsp;&lt;STRONG&gt;possible&lt;/STRONG&gt;&amp;nbsp;outliers, but with less certainty.&lt;/EM&gt;"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can use them as a diagnostic of whether the model's complexity is adequate for the data. This notably illustrates how and why your studentized residual results are model-dependent: removing or adding a term in the model changes the diagnosis of which points may be model-based outliers. The behavior of these points is not described/predicted well by that particular model, but that does not make them outliers under every other modeling option. &lt;BR /&gt;You should &lt;STRONG&gt;NOT&lt;/STRONG&gt; discard/delete points based on a model-based outlier analysis alone; these tools are great for refining your model and adjusting its complexity, together with other statistical metrics and criteria (R²/adjusted R², RMSE, p-values, information criteria like AICc/BIC, ...).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;See other related posts :&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;LI-MESSAGE title="Supress the effect of outliers when fitting the model and in predictions" uid="747067" url="https://community.jmp.com/t5/Discussions/Supress-the-effect-of-outliers-when-fitting-the-model-and-in/m-p/747067#U747067" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-forum-thread lia-fa-icon lia-fa-forum lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;LI-MESSAGE title="Outlier Analysis" uid="750807" url="https://community.jmp.com/t5/Discussions/Outlier-Analysis/m-p/750807#U750807" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-forum-thread lia-fa-icon lia-fa-forum lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Identifying and analyzing outliers should be done before modeling, with adequate tools.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;If you want to investigate whether points in your dataset may be outliers, try multivariate methods based on distances such as&amp;nbsp;Mahalanobis, Jackknife, or T² distances:&amp;nbsp;&lt;A href="https://www.jmp.com/support/help/en/18.1/index.shtml#page/jmp/outlier-analysis.shtml" target="_blank" rel="noopener"&gt;Outlier Analysis&lt;/A&gt;.&amp;nbsp;You also have a range of other analyses in the menu&amp;nbsp;&lt;A href="https://www.jmp.com/support/help/en/18.1/index.shtml#page/jmp/explore-outliers.shtml" target="_blank" rel="noopener"&gt;Explore Outliers&lt;/A&gt;. &lt;BR /&gt;In any case, a statistical analysis alone is not sufficient to discard points that may be outliers; you have to investigate these strange points and understand how and why their measured values seem strange compared to the others: measurement error, experimental error, operator change/error, or perhaps something unexpected happening?&lt;/SPAN&gt;&lt;/P&gt;
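&lt;P&gt;As a sketch of such a model-free, distance-based check (in Python/numpy rather than JMP, with simulated data): a point can be unremarkable on each axis yet far from the multivariate correlation structure, which is exactly what Mahalanobis distance picks up:&lt;/P&gt;

```python
# Sketch only: multivariate outlier screening with Mahalanobis distance,
# analogous in spirit to what a multivariate platform reports; data simulated.
import numpy as np

rng = np.random.default_rng(7)
cov = [[1.0, 0.8], [0.8, 1.0]]  # two strongly correlated responses
data = rng.multivariate_normal([0.0, 0.0], cov, size=50)
data[0] = [4.0, -4.0]  # modest per-axis, but against the correlation

center = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - center
# Quadratic form diff' * cov_inv * diff, row by row
d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
print(d[0], np.median(d))
```

&lt;P&gt;The injected point stands out with by far the largest distance, even though each of its coordinates alone is not extreme.&lt;/P&gt;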
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope this answer may help you,&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 23 Mar 2025 14:30:17 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849297#M102539</guid>
      <dc:creator>Victor_G</dc:creator>
      <dc:date>2025-03-23T14:30:17Z</dc:date>
    </item>
    <item>
      <title>Re: When to remove outliers when reducing model?</title>
      <link>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849299#M102541</link>
      <description>&lt;P&gt;Hi Victor,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I did read the responses. Often in my line of work, we have high assay variability, which can easily explain erratic results that could be deemed outliers.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I usually start my analysis with Jackknife Z using the multivariate platform to assess the general responses. This does not always reveal outliers to the model, though. For example, in my last DoE, Jackknife showed values &amp;lt; 2 for an outlier that was found in every case by externally studentized residuals. I was unable to find a reason why this point was an outlier, but without removing it, my model showed an adjusted R^2 of 0.61, whereas with the point removed, the adjusted R^2 increased to 0.99. Along with this, the model with the point removed also made "scientific sense," whereas the model with the point kept was generally chaotic.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I always like to compare the candidate models when I do remove outliers, to see if there is even a significant impact on the prediction. In this case, depending on when I removed the point (at the beginning, in the middle, or at the end of model reduction), I did get different models, but the practical predictive capability was roughly the same. In one case, for example, I had a very slight quadratic term, but it was not a dominating factor. While all three models were most likely similarly useful in this case, I would like to know the best practice for settling on the most likely model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I understand there is more to removing points than just following the studentized residual process; however, if I know there is an outlier, or have a strong sense there could be, &lt;STRONG&gt;is it best to remove it before, in the middle of, or after the model reduction&lt;/STRONG&gt;, as that will influence the results the model converges to?&lt;/P&gt;
      <pubDate>Sun, 23 Mar 2025 14:40:49 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849299#M102541</guid>
      <dc:creator>MetaLizard62080</dc:creator>
      <dc:date>2025-03-23T14:40:49Z</dc:date>
    </item>
    <item>
      <title>Re: When to remove outliers when reducing model?</title>
      <link>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849302#M102542</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/49050"&gt;@MetaLizard62080&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you expect high assay variability, do you account for this noise source with blocking?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's great if you can validate a model with domain expertise, as this should reduce the possibility of errors. Can you perhaps repeat the tests (or only the measurements) that seem strange? This could help you figure out whether it's a "systematic error" or a "random error" and inform your decision-making.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of directly removing points that seem strange and are not described precisely by the model, I would keep them but lower their influence on the model: create a "weight" column with value 1 for "normal" points and a lower value for the "strange" points, and use this column as a Weight variable in the Fit Model dialog:&amp;nbsp;&lt;A href="https://www.jmp.com/support/help/en/18.1/index.shtml#page/jmp/elements-in-the-fit-model-launch-window.shtml#ww213135" target="_blank" rel="noopener"&gt;Elements in the Fit Model Launch Window&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Removing points just so that the model converges faster will bias the model and probably produce falsely optimistic results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope some of these points make sense to you,&lt;/P&gt;</description>
      <pubDate>Sun, 23 Mar 2025 16:25:21 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/When-to-remove-outliers-when-reducing-model/m-p/849302#M102542</guid>
      <dc:creator>Victor_G</dc:creator>
      <dc:date>2025-03-23T16:25:21Z</dc:date>
    </item>
  </channel>
</rss>

