Hi,
Is there any other metric I can use to help determine which outlier is more likely the true outlier in the dataset?
I performed a DoE and, during modeling, noticed two equally extreme outliers (externally studentized residuals of 48 and -48). If I exclude either one, the model's adjusted R² immediately jumps from around 0.65 to 0.99, and the remaining point then falls within the trend.
Which outlier I choose to exclude also changes the model I get: in one case I see only linear and interaction effects, while in the other I also get some quadratic effects (I expect quadratic behavior but cannot be certain).
I want to avoid introducing any scientific bias if at all possible. Looking back at the experiment, I cannot identify anything incorrect in the execution of the conditions that would sway my decision one way or the other.
Hi @MetaLizard62080,
Studentized residuals can be a good way to identify outliers based on an assumed model. You can find more information about how studentized residuals are calculated here: Row Diagnostics (jmp.com)
"Points that fall outside the red limits should be treated as probable outliers. Points that fall outside the green limits but within the red limits should be treated as possible outliers, but with less certainty." As you can see from the definition, there is no definitive certainty about the nature of outliers, as it may depend on the assumed model you are fitting.
Maybe the model you're fitting is too simple or not relevant for all of the points and measurements collected (it sounds like those quadratic effects may be real)?
What you could do is analyze your dataset with a model-agnostic outlier detection method, to check whether these two points really look "strange" or whether it is only a matter of fitting an adequate model. Model-agnostic methods, like Mahalanobis or jackknife distances, don't rely on a specified model; they simply compare distances between points based on the variables/factors/features. So a point flagged by this type of method genuinely looks "strange" and doesn't seem to belong to the factor distribution of the other points. See the two episodes linked below, and the small sketch after them:
Outliers Episode 3: Detecting outliers using the Mahalanobis distance (and T2)
Outliers Episode 4: Detecting outliers using jackknife distance
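If it helps, here is a minimal sketch of the idea in Python rather than JMP, purely to illustrate it; the array X and the chi-square cutoff are made-up assumptions of the example, not anything specific to your dataset:

```python
# Sketch only: flag "strange" runs by Mahalanobis distance from the factor mean.
# X is a made-up array of factor settings (rows = runs); replace with your own data.
import numpy as np
from scipy import stats

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the multivariate mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv guards against a singular covariance
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0], [3, 3]])
d2 = mahalanobis_d2(X)
cutoff = stats.chi2.ppf(0.95, df=X.shape[1])  # common approximate threshold
print(np.round(d2, 2), "cutoff:", round(cutoff, 2))
```

Points with distances well above the cutoff look unusual in the factor space itself, independently of any response model; in JMP, the Multivariate platform reports these Mahalanobis and jackknife distances directly, so no coding is needed.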
With the limited information provided, and a diagnostic done only with respect to one specific model fit, you shouldn't remove any points unless you can justify doing so based on both statistical evidence (outlier diagnostics, ...) AND domain expertise (erroneous values, a typo in the measurement recording, a bug or problem in the measurement system, ...). In your situation, you can still compare the outcomes of two models: one fit to all of the points, and one with the two "strange" points hidden and excluded (or simply down-weighted with a weight column), and see whether the conclusions differ much. Before that, it is also worth checking some other options and metrics, such as PRESS RMSE/R², Cook's distances, and the usual multiple linear regression assumptions, to confirm that your model is acceptable. Maybe your data could also use a Box-Cox Y transformation?
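For what it's worth, here is a rough sketch of those model-based diagnostics (externally studentized residuals, Cook's distance, PRESS, and a quick comparison of the two exclusion choices) using Python/statsmodels instead of JMP; the file name, column names, formula, and row indices are all hypothetical placeholders for your own setup:

```python
# Sketch only: model-based outlier diagnostics for a DoE fit.
# "doe_results.csv", the columns Y/X1/X2, the formula, and the row indices
# below are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("doe_results.csv")
formula = "Y ~ X1 * X2 + I(X1**2) + I(X2**2)"
fit = smf.ols(formula, data=df).fit()

infl = fit.get_influence()
diag = pd.DataFrame({
    "ext_studentized": infl.resid_studentized_external,  # externally studentized residuals
    "cooks_d": infl.cooks_distance[0],                   # Cook's distance per run
})
print(diag.sort_values("cooks_d", ascending=False).head())
print("PRESS:", round(np.sum(infl.resid_press ** 2), 3))  # leave-one-out prediction error

# Compare the two candidate exclusions and see how much the model really changes.
for row in (7, 23):                                       # hypothetical row indices
    refit = smf.ols(formula, data=df.drop(index=row)).fit()
    print(f"dropping row {row}: adj R^2 = {refit.rsquared_adj:.3f}")
```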
Hope this answer helps you,
Here are my thoughts. I agree with Victor: don't just remove data without knowing what happened in that treatment (those data points may be the most informative in your study):
1. In Deming's terminology, special-cause data points in an experiment are, ironically, not that unusual. This is likely because you are doing something quite unusual in experimentation (e.g., manipulating factors at bold level settings). The question about these particular data points is: are they due to the factors in the experiment, or are they assignable to the noise?
2. There are many ways to diagnose the quality of the data from an experiment. Residuals analysis is one of them, and it can be quite useful because it is done after the model has been defined, when the variation explained by the model is known. Other ways to diagnose data quality include, for example:
If you had anticipated potential special causes, you could run repeats; then, if one of the repeated data points is unusual, you still have other repeats for that treatment to use in the analysis.
It is always good to perform diagnostics for DOE data. To paraphrase one of my favorite DOE authors (Cuthbert Daniel):
The commonest of defects in DOE are
As a side note, for model refinement you should pay more attention to the delta between R-Square and R-Square Adjusted, and use R-Square Adjusted to evaluate model adequacy.
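To illustrate that point with a toy calculation of my own (made-up numbers, not your data): adjusted R² penalizes R² for the number of model terms, so a widening gap between the two flags terms that aren't earning their keep.

```python
# Toy illustration: adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n = number of runs and p = number of model terms (excluding the intercept).
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up example: 16 runs, 9 terms, R^2 = 0.90 -> adj R^2 = 0.75,
# a sizeable delta suggesting the model could be trimmed.
print(round(adjusted_r2(0.90, n=16, p=9), 2))
```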