
Anomaly Detection and JMP® Pro (2019-EU-45MP-115)

Level: Intermediate
Job Function: Analyst / Scientist / Engineer
Michael Crotty, JMP Senior Statistical Writer, SAS
Marie Gaudard, Statistical Consultant
Colleen McKendry, JMP Technical Writer, SAS

 

Best Invited Paper Finalist

 

When anomaly detection is the goal of a predictive model, the underlying data often exhibit an imbalanced class distribution: the anomalous class is much smaller than the non-anomalous class. The modeling goal is usually to identify members of the minority class. However, a straightforward application of predictive modeling techniques can produce a model that is biased toward the majority class and inaccurate for the minority class. Many techniques have been proposed to address these issues. We seek to guide JMP Pro users in developing predictive models for imbalanced data, focusing on JMP Pro approaches to classification into an underrepresented class. We first describe general aspects of the imbalanced class problem: bias, performance measures, and approaches to addressing the modeling issues. We then discuss the sampling methods we use in our study: weighting, under-sampling, over-sampling, and the synthetic minority oversampling technique (SMOTE). For several real data sets that exhibit varying class proportions, we compare the fits obtained using these sampling methods in combination with predictive models available in JMP Pro classification platforms. We perform a similar exploration of sampling techniques and predictive models for a limited range of simulated data sets. For the simulated data sets, we attempt to identify the degree of under-representation at which standard models begin to be affected by class imbalance. We also present conclusions about the relative performance of the sampling methods and predictive models.
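For readers unfamiliar with the sampling methods named in the abstract, the following is a minimal Python sketch of each idea. It is not the authors' JMP Pro implementation or the add-in's code. The function names (`class_weights`, `random_undersample`, `random_oversample`, `smote`), the inverse-frequency weighting formula, the Euclidean distance, and the restriction to continuous predictors are all assumptions made for illustration.

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights (an assumed, common heuristic): n / (k * class count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def random_undersample(rows, labels, minority, rng):
    """Keep every minority row plus an equal-sized random subset of majority rows."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def random_oversample(rows, labels, minority, rng):
    """Duplicate minority rows (with replacement) until the two classes are balanced."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_majority = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]

def smote(rows, labels, minority, k, n_new, rng):
    """Add n_new synthetic minority rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (continuous predictors only)."""
    pts = [rows[i] for i, y in enumerate(labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # k nearest minority neighbours of base by squared Euclidean distance
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return rows + synthetic, labels + [minority] * n_new

# Hypothetical data: 95 majority rows (label 0) and 5 minority rows (label 1).
rng = random.Random(0)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
rows += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]
labels = [0] * 95 + [1] * 5

print(class_weights(labels))
print(Counter(random_undersample(rows, labels, 1, rng)[1]))
print(Counter(random_oversample(rows, labels, 1, rng)[1]))
print(Counter(smote(rows, labels, 1, k=3, n_new=90, rng=rng)[1]))
```

Weighting leaves the data unchanged and instead asks the model to count minority rows more heavily, whereas the three resampling functions change the training set itself. Under-sampling discards majority information, over-sampling repeats minority rows exactly, and SMOTE creates new minority rows that lie between existing ones.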

 

An add-in that performs the analyses discussed in this presentation is now available: Imbalanced Classification Add-In. The add-in expands the capabilities of the scripts found in Scripts_and_Results.zip. Among other features, the add-in does not depend on R; it performs SMOTE, Tomek, and SMOTE plus Tomek sampling on data sets that include nominal and ordinal predictors; and it provides extensive documentation through the dialog's Help button.
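As background on the Tomek sampling mentioned above, here is a minimal Python sketch of the general idea; it is not the add-in's code. A Tomek link is a pair of rows from opposite classes that are each other's nearest neighbour, and Tomek under-sampling removes the majority-class member of each such pair. The function names, the Euclidean distance, and the restriction to continuous predictors are assumptions for illustration (the add-in itself also handles nominal and ordinal predictors).

```python
def nearest(i, rows):
    """Index of the row closest to rows[i] by squared Euclidean distance."""
    return min(
        (j for j in range(len(rows)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])),
    )

def tomek_links(rows, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest(i, rows) for i in range(len(rows))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]

def remove_tomek_majority(rows, labels, minority):
    """Drop the majority-class member of every Tomek link."""
    drop = {
        i
        for pair in tomek_links(rows, labels)
        for i in pair
        if labels[i] != minority
    }
    keep = [i for i in range(len(rows)) if i not in drop]
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Hypothetical one-dimensional data: the rows at 1.0 (majority) and 1.2 (minority)
# are mutual nearest neighbours from opposite classes, so they form a Tomek link.
demo_rows = [(0.0,), (1.0,), (1.2,), (5.0,), (5.5,)]
demo_labels = [0, 0, 1, 0, 0]
print(tomek_links(demo_rows, demo_labels))
print(remove_tomek_majority(demo_rows, demo_labels, minority=1))
```

SMOTE plus Tomek sampling would, under the same assumptions, first add synthetic minority rows and then apply a Tomek cleaning step to the augmented data.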

Comments

Thanks. It is really great and helpful!

One question: can JMP Pro be used for real-time anomaly detection (such as on sensor data)?

Thank you!

I don't know of anything built into JMP Pro that would handle real-time sensor data, but with some scripting, I'm sure you could do it. That would be a good question for the larger JMP Community.

 

Thanks for your interest and question!


Michael