MAS
Level III

kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16

A heads-up to those of you hoping to use kNN in JMP v16 with cross-validation...

 

The Word attachment indicates that, for k = 3:

              Training R2   Validation R2
2nd exhibit   0.92839       0.90874
3rd exhibit   0.9706        0.9086

 

The validation R2 values are close but not identical (0.90874 vs. 0.9086). That difference is minor, but the difference in the training R2 values is not: 0.93 vs. 0.97.

 

I could not find documentation about this issue. The documentation does mention that the kNN platform produces different columns of predictions depending on whether Save Predicteds or Save Prediction Formula is used, which most likely explains the discrepancy in fit statistics. However, we do not know which of the competing sets of fit statistics is correct.
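To illustrate how two prediction columns from the same kNN model could yield different training R2 values, here is a minimal Python sketch. It rests on an assumption that is not confirmed anywhere in this thread or in the JMP documentation cited here: that one column lets each training row count as its own nearest neighbor while the other excludes the row. The toy data and the function names (knn_predict, r_squared) are hypothetical and have nothing to do with the attached JMP table.

```python
# Hypothetical toy data and helper names; not taken from the attached JMP table.
def knn_predict(train_x, train_y, query_x, k, skip_index=None):
    """Mean response of the k nearest training points to query_x (one predictor).

    If skip_index is given, that training row is left out of the neighbor search.
    Distance ties are broken by row order.
    """
    candidates = [
        (abs(x - query_x), i)
        for i, x in enumerate(train_x)
        if i != skip_index
    ]
    candidates.sort()
    nearest = candidates[:k]
    return sum(train_y[i] for _, i in nearest) / len(nearest)


def r_squared(actual, predicted):
    """R2 = 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.4, 7.7]
k = 3

# Variant A (assumed): each training row may be one of its own k neighbors.
pred_self = [knn_predict(x, y, xi, k) for xi in x]
# Variant B (assumed): each training row is excluded from its own neighbors.
pred_loo = [knn_predict(x, y, xi, k, skip_index=i) for i, xi in enumerate(x)]

r2_self = r_squared(y, pred_self)
r2_loo = r_squared(y, pred_loo)
print(f"training R2, row counts as its own neighbor: {r2_self:.4f}")
print(f"training R2, row excluded from its neighbors: {r2_loo:.4f}")
```

On this toy data the self-inclusive variant gives the higher training R2, which is the direction one would expect, because each row's own response pulls its prediction toward the observed value. Whether this is what actually separates the two JMP outputs would need confirmation from JMP.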

 

We need accurate training R2 values to properly gauge how much a model is overfitting.

 

JMP v16 uses a new algorithm for kNN, so this discrepancy does not appear in JMP v15 or earlier.

 

I have also attached the JMP data file used to produce the results shown in the Word document.

Byron_JMP
Staff

Re: kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16

Is it possible that you are comparing R-squared values from different models? For example, your table includes a Least Squares model with fairly different fit statistics than the KNN model.

I ran KNN as described on your data set in both JMP 15 and 16. With the same random seed, the training and validation R-squared statistics for versions 15 and 16 are slightly different but very similar. The script is below. Not included are comparisons between 14 and 15, which were identical.

 

If someone from JMP Dev doesn't comment, maybe send a note to support@jmp.com and reference this discussion thread. Although I don't know any details, I suspect the method behind KNN was updated to support very wide tables. It is possible that JMP 16 is providing a more precise estimate.

 

 

[Attached screenshot: Screen Shot 2021-08-11 at 6.53.28 PM.png]

 

 

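// KNN fit of MSRP, using the Train/Validate column to split the rows
K Nearest Neighbors(
	Validation( :"Train/Validate"n ),
	Y( :MSRP ),
	// Predictors: ten continuous columns plus Car Type and Make
	X(
		:"Curb Weight (lbs)"n, :"Displacement (cc)"n, :Cylinders, :Horsepower,
		:MPG City, :MPG Highway, :"Length (in)"n, :Seating Capacity,
		:"Torque (lbs/ft)"n, :"Wheelbase (in)"n, :Car Type, :Make
	),
	// Largest number of neighbors to consider
	K( 10 ),
	// Same seed used in both JMP versions so the runs are comparable
	Set Random Seed( 123 ),
	Response( "MSRP", Plot Actual by Predicted( 1 ) )
)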

 

JMP Systems Engineer, Health and Life Sciences (Pharma)