Sep 10, 2020 1:21 PM | Last Modified: Sep 14, 2020 8:50 AM
The Imbalanced Classification add-in provides sampling techniques that aim to produce a more balanced distribution between the two response classes. These include the synthetic minority oversampling technique (SMOTE), Tomek links, a combination of the two, and some basic sampling approaches. The Tomek Sampling, SMOTE Observations, and SMOTE plus Tomek options apply these sampling techniques on their own, so that you can use the sampled data in your own modeling efforts.
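To make the two core techniques concrete, here is a minimal pure-Python sketch of SMOTE and Tomek links. It is not the add-in's implementation: it assumes numeric predictors and uses Euclidean distance, whereas the add-in uses Gower distance. All function names are hypothetical. SMOTE creates a synthetic minority row at a random point on the segment between a minority row and one of its k nearest minority-class neighbors; a Tomek link is a pair of rows from different classes that are each other's nearest neighbor.

```python
import math
import random

# Sketch only: numeric predictors and Euclidean distance (the add-in uses Gower distance).
def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(points, i, k, dist=euclidean):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a random minority
    row and one of its k nearest minority-class neighbors (needs >= 2 rows)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def tomek_links(X, y, dist=euclidean):
    """Pairs (i, j), i < j, with different labels that are each other's
    single nearest neighbor in the full data set."""
    nn = [k_nearest(X, i, 1, dist)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_majority(X, y, majority_label):
    """Drop the majority-class member of every Tomek link (one common variant)."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [r for r in range(len(X)) if r not in drop]
    return [X[r] for r in keep], [y[r] for r in keep]

X = [[0.0], [1.0], [1.1], [5.0]]
y = [0, 0, 1, 0]
print(tomek_links(X, y))  # [(1, 2)]
```

Removing only the majority-class member of each link is one common cleaning variant; this description does not say which variant the add-in uses. A SMOTE plus Tomek scheme would typically oversample with `smote` first and then clean the augmented data with Tomek links, but the exact procedure in the add-in is not stated here.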
The comprehensive Evaluate Models option fits models using various sampling methods, compares them on a test set, and helps you select classification thresholds using Precision-Recall, ROC, and Cumulative Gains curves, along with other measures of classification accuracy. The other three options do not fit models; they only apply the Tomek, SMOTE, and SMOTE plus Tomek sampling schemes to your own data.
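The threshold-selection idea can be illustrated with a small sketch that sweeps candidate thresholds over test-set scores and computes the quantities behind the Precision-Recall and ROC curves. It assumes each score is a predicted probability for the positive (minority) class, coded as label 1. Choosing the threshold that maximizes F1 is only one possible rule, assumed here for illustration; the function names are hypothetical and the sketch does not reproduce the add-in's reports.

```python
def confusion_at(scores, labels, threshold):
    """Counts (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def curve_points(scores, labels):
    """One row per distinct score used as a threshold, highest first:
    (threshold, precision, recall = TPR, FPR)."""
    rows = []
    for t in sorted(set(scores), reverse=True):
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        rows.append((t, precision, recall, fpr))
    return rows

def best_f1_threshold(scores, labels):
    """Threshold maximizing F1 = 2PR / (P + R); an assumed selection rule."""
    def f1(row):
        _, p, r, _ = row
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(curve_points(scores, labels), key=f1)[0]

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(best_f1_threshold(scores, labels))  # 0.4
```

Plotting precision against recall gives a Precision-Recall curve, and plotting recall (TPR) against FPR gives an ROC curve; for a rare positive class, the Precision-Recall view is usually more informative because FPR stays small even when many predicted positives are wrong.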
The SMOTE, Tomek, and combined SMOTE plus Tomek sampling techniques are all based on nearest neighbors. The add-in uses Gower distance as its distance metric, which accommodates a mix of continuous, nominal, and ordinal predictors.
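Gower distance handles mixed predictor types by computing a per-variable dissimilarity in [0, 1] and averaging them. The sketch below uses the standard choices: range-normalized absolute difference for continuous variables and a 0/1 mismatch for nominal variables. As an assumption, ordinal levels are coded as integer ranks and treated like continuous values. It uses equal weights and no missing-value handling; how the add-in weights variables or treats ordinal predictors is not stated in this description, and the function names are hypothetical.

```python
def column_ranges(rows, kinds):
    """Observed range (max - min) of each non-nominal column; None for nominal."""
    ranges = []
    for v, kind in enumerate(kinds):
        if kind == "nominal":
            ranges.append(None)
        else:
            col = [row[v] for row in rows]
            ranges.append(max(col) - min(col))
    return ranges

def gower_distance(a, b, kinds, ranges):
    """Equal-weight Gower distance between two rows.

    kinds[v] is "continuous", "ordinal", or "nominal". Ordinal values are
    assumed to be integer ranks and are scaled like continuous values.
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "nominal":
            d = 0.0 if x == y else 1.0  # simple match / mismatch
        else:
            d = abs(x - y) / r if r > 0 else 0.0  # range-normalized difference
        total += d
    return total / len(kinds)

rows = [[1.0, "red", 1], [3.0, "blue", 3], [2.0, "red", 2]]
kinds = ["continuous", "nominal", "ordinal"]
ranges = column_ranges(rows, kinds)
nearest = min(range(1, len(rows)),
              key=lambda j: gower_distance(rows[0], rows[j], kinds, ranges))
print(nearest)  # 2
```

Because every per-variable term lies in [0, 1], no single predictor dominates the distance simply because of its units, which is what lets nearest-neighbor methods such as SMOTE work on mixed data.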
Note: All options require JMP version 15.2 or higher. Excluded rows and rows with missing response values are ignored by the add-in.