Outliers detection models screening

Victor_G · ‎03-07-2023

What inspired this wish list request?

Currently, outlier detection techniques are scattered across different menus:

Some univariate and multivariate outlier detection techniques are available in "Screening" -> "Explore Outliers",
Other multivariate techniques are available in "Multivariate Methods" -> "Multivariate", and they differ from the detection techniques in the Screening platforms.

Finally, some techniques or models used in the literature (Isolation Tree/Forest, One-Class SVM, k-Nearest Neighbors...) are not directly available in JMP.

What is the improvement you would like to see?

Following the idea of the Python library PyOD (pyod 1.0.7 documentation) and other libraries/packages, it would be great to have a single JMP menu dedicated to outlier detection that combines several methods, as in the "Model Screening" platform, with several options available:

If a response is present in the dataset and entered in the "Y, column" dialog, the techniques used would be supervised (semi- or fully supervised). If no response is entered in the "Y, column" dialog, the techniques used could be unsupervised.
In the same way that the "Model Screening" platform proposes several models in an auto-ML way, it would be great if the JMP user could choose all, several, or only one of the proposed methods. To give the user more visibility into the different techniques, it could be helpful to group them, for example by type of anomaly: local, global, dependency, and clustered anomalies. Within each group, the user could select or deselect any technique, for more freedom and flexibility.
Finally, once the different outlier detection techniques have been run, it would be great to have results in a "macro" and a "micro" view:
- "Micro" view: for each detection technique, show which points were detected as outliers, in order to compare the techniques and see where the methods agree or disagree,
- "Macro" view: for the ensemble of detection techniques, instead of "hard voting" or a binary result (outlier vs. normal), provide a continuous "anomaly score" obtained by ensembling the different models (ranging from 0: never detected as an outlier by any selected technique, to 1: always detected as an outlier). This score could then be transformed (perhaps with a "reciprocal" transformation: 1/anomaly score) and used in other modeling platforms as a "Frequency" or "Weight" variable, to create "weighted models" that take outliers into account but reduce their importance.
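
To make the "micro" and "macro" views a bit more concrete, here is a rough Python sketch of the idea. It is only an illustration under several assumptions: the three detectors (z-score, Tukey IQR fences, median absolute deviation) are simple univariate stand-ins and not the methods JMP or PyOD would actually use, the data and all function names are hypothetical, and the weight uses 1/(1 + score) rather than a plain reciprocal, because 1/score is undefined when a point is never flagged.

```python
import statistics

# Hypothetical toy data: nine readings near 10 and one obvious anomaly.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 25.0]

# The detectors below assume the data is not constant (non-zero spread).

def zscore_flags(xs, threshold=3.0):
    """Flag points more than `threshold` sample standard deviations from the mean."""
    mean, sd = statistics.fmean(xs), statistics.stdev(xs)
    return [abs(x - mean) / sd > threshold for x in xs]

def iqr_flags(xs, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < low or x > high for x in xs]

def mad_flags(xs, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds `threshold`."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return [0.6745 * abs(x - med) / mad > threshold for x in xs]

detectors = {"zscore": zscore_flags, "iqr": iqr_flags, "mad": mad_flags}

# "Micro" view: one list of True/False flags per technique.
flags = {name: detect(values) for name, detect in detectors.items()}

# "Macro" view: fraction of techniques that flagged each point (0 = never, 1 = always).
scores = [sum(row) / len(detectors) for row in zip(*flags.values())]

# Assumed down-weighting: 1 / (1 + score), so unflagged points keep weight 1.
weights = [1 / (1 + s) for s in scores]

for x, s, w in zip(values, scores, weights):
    print(f"value={x:5.1f}  anomaly_score={s:.2f}  weight={w:.2f}")
```

On this toy data the z-score rule should miss the extreme point (a single large value inflates the standard deviation), while the other two rules catch it, so the ensemble score falls between 0 and 1 instead of being a hard yes/no. The weights column is what could be passed as a "Weight" variable to another model.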

Why is this idea important?

Two points motivate this request:

Unifying and grouping the different outlier detection techniques in a single platform could help users be aware of the different techniques available,
It could be a great starting point for adding new models focused on outlier detection to JMP.