
Anomaly Detection and JMP® Pro (2019-EU-45MP-115)

Level: Intermediate
Job Function: Analyst / Scientist / Engineer
Michael Crotty, JMP Senior Statistical Writer, SAS
Marie Gaudard, Statistical Consultant
Colleen McKendry, JMP Technical Writer, SAS


Best Invited Paper Finalist


When anomaly detection is the goal of a predictive model, the underlying data often exhibit an imbalanced class distribution: the anomalous class is much smaller than the non-anomalous class. The modeling goal is usually to identify members of the minority class, but a straightforward application of predictive modeling techniques can produce a biased and inaccurate model. Many techniques have been proposed to address these issues. We seek to guide JMP Pro users in developing predictive models for imbalanced data, focusing on JMP Pro approaches to classification into an underrepresented class. We first describe general aspects of the imbalanced class problem: bias, performance measures, and approaches to addressing the modeling issues. We then discuss the sampling methods used in our study: weighting, undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE). For several real data sets that exhibit varying class proportions, we compare the fits obtained using these sampling methods in combination with predictive models available in JMP Pro classification platforms. We perform a similar exploration of sampling techniques and predictive models for a limited range of simulated data sets. For the simulated data sets, we attempt to identify the degree of underrepresentation at which standard models begin to be affected by class imbalance. We also present conclusions about the relative performance of the sampling methods and predictive models.
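
As a rough illustration of the four sampling methods named in the abstract, the following Python sketch applies each one to a toy two-class data set. It is not the JMP Pro implementation and does not reproduce the study: the function names, the toy data, the choice of inverse-frequency weights, and the simplified SMOTE-style interpolation are all assumptions made for illustration.

```python
import random

def class_weights(n_majority, n_minority):
    """Weighting: inverse-frequency weights (an assumed scheme) so that
    both classes contribute the same total weight to a fit."""
    total = n_majority + n_minority
    return total / (2 * n_majority), total / (2 * n_minority)

def undersample(majority, minority, rng):
    """Random undersampling: keep every minority row and draw an
    equal-sized subset of majority rows without replacement."""
    return rng.sample(majority, len(minority)), list(minority)

def oversample(majority, minority, rng):
    """Random oversampling: resample minority rows with replacement
    until the minority class matches the majority class in size."""
    return list(majority), [rng.choice(minority) for _ in range(len(majority))]

def smote(minority, n_new, k, rng):
    """SMOTE-style synthesis: create each new point on the line segment
    between a minority point and one of its k nearest minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        others = [q for q in minority if q is not p]
        # Rank the other minority points by squared Euclidean distance to p.
        neighbors = sorted(
            others, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q))
        )[:k]
        q = rng.choice(neighbors)
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

# Hypothetical toy data: 95 majority rows and 5 minority rows in two dimensions.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]

w_majority, w_minority = class_weights(len(majority), len(minority))
under_majority, under_minority = undersample(majority, minority, rng)
over_majority, over_minority = oversample(majority, minority, rng)
smote_minority = minority + smote(minority, len(majority) - len(minority), 3, rng)
```

Each approach balances the classes in a different way: weighting leaves the rows unchanged, undersampling discards majority rows, oversampling duplicates minority rows, and the SMOTE-style step adds new minority points that lie between existing ones.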