Hi,
Is there any other metric I can use to help determine which outlier is more likely the true outlier in the dataset?
I performed a DoE and, during modeling, noticed two equally extreme outliers (externally studentized residuals of 48 and -48). If I exclude either one, the model's adjusted R² immediately jumps from around 0.65 to 0.99, and the remaining point then falls within the trend.
Which outlier I choose to exclude also changes the model I get: in one case I see only linear and interaction effects, while in the other I also get some quadratic effects (I expect quadratic behavior but cannot be certain).
I want to avoid introducing any scientific bias if at all possible. Looking back at the experiment, I cannot identify anything incorrect in the execution of the conditions that would sway my decision one way or the other.
Hi @MetaLizard62080,
Studentized residuals can be a good way to identify outliers based on an assumed model. You can find more information about how studentized residuals are calculated here: Row Diagnostics (jmp.com)
"Points that fall outside the red limits should be treated as probable outliers. Points that fall outside the green limits but within the red limits should be treated as possible outliers, but with less certainty." As you can see from the definition, there is no definitive certainty about the nature of outliers, as it may depend on the assumed model you are fitting.
Maybe the model you're fitting is too simple or not relevant for all of the points and measurements collected (it sounds like those quadratic effects may be real)?
What you could do is analyze your dataset with a model-agnostic outlier detection method, to check whether these two points really look "strange" or whether it is only a matter of fitting an adequate model. Model-agnostic methods, like Mahalanobis or jackknife distances, don't rely on a specified model; they simply compare distances between points based on the variables/factors/features. So a point flagged by this type of method genuinely looks "strange" and doesn't seem to belong to the factor distribution of the other points. See the two episodes linked below, and the small sketch after them:
Outliers Episode 3: Detecting outliers using the Mahalanobis distance (and T2)
Outliers Episode 4: Detecting outliers using jackknife distance
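If it helps, here is a minimal sketch of the idea in Python rather than JMP, purely to illustrate it; the array X and the chi-square cutoff are made-up assumptions of the example, not anything specific to your dataset:

```python
# Sketch only: flag "strange" runs by Mahalanobis distance from the factor mean.
# X is a made-up array of factor settings (rows = runs); replace with your own data.
import numpy as np
from scipy import stats

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the multivariate mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv guards against a singular covariance
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0], [3, 3]])
d2 = mahalanobis_d2(X)
cutoff = stats.chi2.ppf(0.95, df=X.shape[1])  # common approximate threshold
print(np.round(d2, 2), "cutoff:", round(cutoff, 2))
```

Points with distances well above the cutoff look unusual in the factor space itself, independently of any response model; in JMP, the Multivariate platform reports these Mahalanobis and jackknife distances directly, so no coding is needed.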
With the limited information provided, and a diagnostic done only with respect to one specific model fit, you shouldn't remove any points unless you can justify doing so based on both statistical evidence (outlier diagnostics, ...) AND domain expertise (erroneous values, a typo in the measurement recording, a bug or problem in the measurement system, ...). In your situation, you can still compare the outcomes of two models: one fit to all of the points, and one with the two "strange" points hidden and excluded (or simply down-weighted with a weight column), and see whether the conclusions differ much. Before that, it is also worth checking some other options and metrics, such as PRESS RMSE/R², Cook's distances, and the usual multiple linear regression assumptions, to confirm that your model is acceptable. Maybe your data could also use a Box-Cox Y transformation?
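For what it's worth, here is a rough sketch of those model-based diagnostics (externally studentized residuals, Cook's distance, PRESS, and a quick comparison of the two exclusion choices) using Python/statsmodels instead of JMP; the file name, column names, formula, and row indices are all hypothetical placeholders for your own setup:

```python
# Sketch only: model-based outlier diagnostics for a DoE fit.
# "doe_results.csv", the columns Y/X1/X2, the formula, and the row indices
# below are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("doe_results.csv")
formula = "Y ~ X1 * X2 + I(X1**2) + I(X2**2)"
fit = smf.ols(formula, data=df).fit()

infl = fit.get_influence()
diag = pd.DataFrame({
    "ext_studentized": infl.resid_studentized_external,  # externally studentized residuals
    "cooks_d": infl.cooks_distance[0],                   # Cook's distance per run
})
print(diag.sort_values("cooks_d", ascending=False).head())
print("PRESS:", round(np.sum(infl.resid_press ** 2), 3))  # leave-one-out prediction error

# Compare the two candidate exclusions and see how much the model really changes.
for row in (7, 23):                                       # hypothetical row indices
    refit = smf.ols(formula, data=df.drop(index=row)).fit()
    print(f"dropping row {row}: adj R^2 = {refit.rsquared_adj:.3f}")
```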
Hope this answer helps you,
Here are my thoughts. I agree with Victor: don't just remove data without knowing what happened in that treatment (those data points may be the most informative in your study):
1. In Deming's terminology, special-cause data points in an experiment are, ironically, not that unusual. This is likely because you are doing something quite unusual in experimentation (e.g., manipulating factors at bold level settings). The question about these particular data points is: are they due to the factors in the experiment, or are they assignable to the noise?
2. There are many ways to diagnose the quality of the data from an experiment. Residuals analysis is one of them, and it can be quite useful because it is done after the model has been defined, when the variation explained by the model is known. Other ways to diagnose data quality include, for example:
If you had anticipated potential special causes, you could run repeats; then, if one of the repeated data points is unusual, you still have other repeats for that treatment to use in the analysis.
It is always good to perform diagnostics for DOE data. To paraphrase one of my favorite DOE authors (Cuthbert Daniel):
The commonest of defects in DOE are
As a side note, for model refinement you should pay more attention to the delta between R-Square and R-Square Adjusted, and use R-Square Adjusted to evaluate model adequacy.
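To illustrate that point with a toy calculation of my own (made-up numbers, not your data): adjusted R² penalizes R² for the number of model terms, so a widening gap between the two flags terms that aren't earning their keep.

```python
# Toy illustration: adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n = number of runs and p = number of model terms (excluding the intercept).
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up example: 16 runs, 9 terms, R^2 = 0.90 -> adj R^2 = 0.75,
# a sizeable delta suggesting the model could be trimmed.
print(round(adjusted_r2(0.90, n=16, p=9), 2))
```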