I have a dataset in which I'm trying to identify outliers or other points of interest. My model's Actual by Predicted plot shows some points with a star marker. However, the Mahalanobis distance and Jackknife distance do not necessarily identify these points in the same manner. Is there a recommendation on outlier analysis?
Hi @sanch1,
You're comparing two very different ways of assessing outliers, with different goals:
More info on outliers in my previous answer on a similar topic: https://community.jmp.com/t5/Discussions/Supress-the-effect-of-outliers-when-fitting-the-model-and-i...
I hope this answer will help you,
How was the data collected? This has a huge effect on what analysis is appropriate. As Victor indicates, Mahalanobis is a multivariate outlier detector. If your response is univariate, you may want to use good old control charts, but again, it depends on how the data was gathered.
Nice Blog post series by @JerryFish about Mahalanobis and Jackknife outlier detection:
Outliers Episode 3: Detecting outliers using the Mahalanobis distance (and T2)
Outliers Episode 4: Detecting outliers using jackknife distance
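To make the difference between the two distances concrete, here is a minimal Python sketch (not JSL, and not JMP's own implementation). It assumes the usual definitions: the Mahalanobis distance uses the mean and covariance estimated from all rows, while the jackknife distance re-estimates both with the row under test left out. The function names and the simulated data are made up for illustration.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row from the mean, using mean/covariance of ALL rows."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form diff_i' * inv_cov * diff_i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

def jackknife_distances(X):
    """Same distance, but mean/covariance are estimated WITHOUT the row tested."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        rest = np.delete(X, i, axis=0)  # leave row i out
        diff = X[i] - rest.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(rest, rowvar=False))
        out[i] = np.sqrt(diff @ inv_cov @ diff)
    return out

# Hypothetical data: 30 bivariate normal points with one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
X[0] = [6.0, -6.0]

md = mahalanobis_distances(X)
jd = jackknife_distances(X)
print("row with largest Mahalanobis distance:", int(np.argmax(md)))
print("row with largest jackknife distance:  ", int(np.argmax(jd)))
print("row 0: Mahalanobis = %.2f, jackknife = %.2f" % (md[0], jd[0]))
```

Because the outlier no longer pulls the mean toward itself or inflates the covariance, its jackknife distance comes out larger than its Mahalanobis distance. Note that both are computed on the multivariate data only; neither looks at model residuals, which is one reason they can disagree with what stands out on an Actual by Predicted plot.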
How did you calculate the distances?
There is another option called Cook's D Influence. You can save it through the Save Columns option under the Response red triangle. Select the saved column and then run a Distribution. Any data point with a Cook's D value >= 1 can be considered a potential outlier, since it has a large influence on the fitted model. See the example distribution image below.
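For anyone who wants to see what sits behind that saved column, here is a small Python sketch of the textbook Cook's D formula for an ordinary least squares fit. It is an illustration under that assumption, not JMP's code, and the function name and data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for each row of an OLS fit with an intercept (textbook formula)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverage h_ii: diagonal of the hat matrix A (A'A)^-1 A'
    leverage = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    s2 = resid @ resid / (n - p)  # residual mean square
    return resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Hypothetical data: an exact line y = 2x + 1, plus one point far off the line
# at an extreme x value (row 10).
x = np.append(np.arange(10.0), 20.0)
y = 2.0 * x + 1.0
y[-1] = 0.0

d = cooks_distance(x, y)
print("rows with Cook's D >= 1:", np.flatnonzero(d >= 1.0))
```

Cook's D combines the size of the residual with the leverage of the point, so it flags rows that actually change the fit, which is a different question from how far a row sits from the center of the data.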