K Nearest Neighbors

Use a proximity-based algorithm to predict a categorical outcome (classify) or predict the value of a continuous outcome for new observations, based on the outcomes of similar observations (i.e., their nearest neighbors).
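To make the idea concrete, here is a toy JSL sketch (not the JMP platform itself) that classifies one new observation by majority vote among its K nearest training rows, using squared Euclidean distance. All data values and names below are made up for illustration.

// Toy JSL illustration of the K nearest neighbors idea (not the JMP platform).
x = [1 2, 2 1, 8 9, 9 8, 7 8];               // training predictors (one row per observation)
y = {"Good", "Good", "Bad", "Bad", "Bad"};   // training classes
new = [2 2];                                 // new observation to classify
K = 3;

diff = x - J( N Rows( x ), 1 ) * new;        // row-wise differences from the new point
d = (diff :* diff) * [1, 1];                 // squared Euclidean distance to each training row
r = Rank( d );                               // row indices ordered nearest to farthest

nGood = 0;
For( i = 1, i <= K, i++,
	If( y[r[i]] == "Good", nGood++ )         // count "Good" votes among the K nearest rows
);
pred = If( nGood > K / 2, "Good", "Bad" );   // majority vote; for a continuous Y you would average instead
Show( pred );                                // "Good" -- the new point sits near the two Good rows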

K Nearest Neighbors

  1. From an open JMP® table, select Analyze > Predictive Modeling > K Nearest Neighbors.
  2. Select a categorical or continuous response variable from Select Columns and click Y, Response. Here, we illustrate using a categorical response variable.
  3. Select candidate predictor variables and click X, Factor.
  4. If desired, enter the Validation Portion or select a validation column and click Validation. Click OK.
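If you prefer scripting, the launch above can also be written in JSL. This is a minimal sketch assuming the Equity.jmp sample table used below and a validation column named Validation; the column list and argument names are assumptions to verify in Help > Scripting Index for your JMP version. Either way, the report described next is the same.

// Minimal JSL sketch of the launch steps above (column and argument names are assumptions).
dt = Open( "$SAMPLE_DATA/Equity.jmp" );
knn = dt << K Nearest Neighbors(
	Y( :BAD ),                                    // categorical response (Bad/Good risk)
	X( :LOAN, :MORTDUE, :VALUE, :YOJ, :DEROG,
	   :DELINQ, :CLAGE, :NINQ, :CLNO, :DEBTINC ), // candidate predictors
	Validation( :Validation )                     // or Validation Portion( 0.25 ) to hold back a random portion
);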

    JMP displays:
  • A graph and table showing the misclassification rates and counts across a range of values for K.
  • A Confusion Matrix detailing the classification performance for the value of K with the smallest misclassification rate.
  • Mosaic plots (not shown here), which graphically show the values in the confusion matrix.

Equity.jmp (Help > Sample Data Folder)

[Screenshots of the K Nearest Neighbors report for Equity.jmp are not reproduced here.]

Results of using K Nearest Neighbors to predict the risk level (Bad/Good):
• There are 1,192 observations in the Validation Data. The misclassification rate is lowest when the prediction is based on only 1 nearest neighbor: 85/1,192 (7.1%) were misclassified. Note that the misclassification rate increases as the number of nearest neighbors increases. Breaking this down by class, 3/(3+914) = 0.3% of the Good Risk observations were misclassified as Bad Risk, and 80/(80+195) = 29% of the Bad Risk observations were misclassified as Good Risk.
• There were 1,192 observations set aside as Test Data (results not shown). The total misclassification rate for these observations using 1 nearest neighbor is 6.9%; 0.4% of the Good Risk observations were misclassified as Bad Risk and 3.2% of the Bad Risk observations were misclassified as Good Risk. These results are often considered the most accurate estimate of the misclassification rate to expect on future data, because these observations were not part of the model training or selection process.
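As a quick sanity check, two of the validation-set figures quoted above follow directly from the reported counts; a short JSL check (numbers taken from the text above):

// Reproducing two of the validation-set figures quoted above.
Show( 85 / 1192 );        // 0.0713 -> the 7.1% overall misclassification rate at K = 1
Show( 80 / (80 + 195) );  // 0.2909 -> the 29% rate of Bad Risk misclassified as Good Risk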

 

Note: Additional options, such as Lift Curves, Save Predicteds, Save Prediction Formula, and Publish Prediction Formula, are accessible from the red triangle menu near the top of the report, next to the response variable name.
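If the platform was launched from a script, these red-triangle options can typically be invoked by sending messages to the platform reference. The message names below simply mirror the menu items and are assumptions to confirm in Help > Scripting Index.

// Hedged sketch: invoking red-triangle options by script. "knn" is the platform
// reference from the launch sketch earlier; message names are assumed to match the menu items.
knn << Save Prediction Formula;   // save the prediction formula to the data table
knn << Save Predicteds;           // save the predicted values to the data table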

Visit Predictive and Specialized Models > K Nearest Neighbors in JMP Help to learn more.
