kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16

MAS — Sat, 10 Jun 2023 20:46:13 GMT

A heads-up to those of you hoping to use kNN in JMP v16 with cross-validation...

The Word attachment indicates that, for k = 3:

	Training R2	Validation R2
2nd exhibit	0.92839	0.90874
3rd exhibit	0.9706	0.9086

The validation R2s are close, but not identical. Although this is minor, the difference in the training R2s is not: 0.93 vs 0.97.

I could not find documentation that addresses this issue directly. The documentation does mention that the kNN platform produces different columns of predictions depending on whether Save Predicteds or Save Prediction Formula is used, which most likely explains the discrepancy in fit statistics. However, we do not know which of the two competing sets of fit statistics is correct.
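To illustrate how two kNN prediction columns can legitimately disagree on training rows, here is a minimal Python sketch. It rests on an assumption that this thread does not confirm for JMP: that one column lets each training row count itself among its own k neighbors and the other does not. The function names, the `skip_index` parameter, and the toy data are all hypothetical.

```python
def knn_predict(train_x, train_y, query_x, k, skip_index=None):
    """Average the responses of the k training rows nearest to query_x.

    If skip_index is given, that training row is left out of the search
    (assumed mechanism; not confirmed as what JMP does).
    """
    candidates = [
        (abs(x - query_x), i)
        for i, x in enumerate(train_x)
        if i != skip_index
    ]
    candidates.sort()  # ties in distance are broken by row index
    nearest = candidates[:k]
    return sum(train_y[i] for _, i in nearest) / len(nearest)


def r_squared(actual, predicted):
    """R2 = 1 - SSE/SST, with SST taken about the mean of `actual`."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst


# Toy one-predictor data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.4, 7.7]
k = 3

# Training predictions when each row may be its own neighbor ...
with_self = [knn_predict(x, y, xi, k) for xi in x]
# ... and when each row is excluded from its own neighbor search.
without_self = [knn_predict(x, y, xi, k, skip_index=i) for i, xi in enumerate(x)]

r2_with_self = r_squared(y, with_self)
r2_without_self = r_squared(y, without_self)
print("training R2, self included:", r2_with_self)
print("training R2, self excluded:", r2_without_self)
```

In this toy case the training R2 is higher when rows may be their own neighbors, while predictions for validation rows would be unaffected because those rows are never in the training set. That pattern resembles the one reported above, but it is only a possible explanation.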

We need accurate training R2 results to properly gauge how much a model overfits.
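One way to check which reported figures match a given set of predictions is to recompute R2 outside the platform from an exported prediction column. The sketch below assumes R2 = 1 - SSE/SST with SST taken about the mean of the response within each set; whether JMP uses exactly this definition for the validation set is an assumption. The function names, labels, and numbers are hypothetical and are not taken from the attached data.

```python
def r_squared(actual, predicted):
    """R2 = 1 - SSE/SST, with SST taken about the mean of `actual`."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - sse / sst


def r_squared_by_set(actual, predicted, labels):
    """Group rows by their validation label and compute R2 within each group."""
    groups = {}
    for a, p, lab in zip(actual, predicted, labels):
        ys, ps = groups.setdefault(lab, ([], []))
        ys.append(a)
        ps.append(p)
    return {lab: r_squared(ys, ps) for lab, (ys, ps) in groups.items()}


# Made-up values standing in for an exported response column,
# a saved prediction column, and the validation column.
actual = [10.0, 12.0, 15.0, 20.0, 11.0, 18.0]
predicted = [11.0, 12.0, 14.0, 19.0, 13.0, 17.0]
labels = ["Training", "Training", "Training", "Training", "Validation", "Validation"]

print(r_squared_by_set(actual, predicted, labels))
```

Running this once on each of the two saved prediction columns would show which column each pair of reported R2 values corresponds to.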

JMP v16 uses a new algorithm for kNN, so this discrepancy does not appear in JMP v15 or earlier.

I also attach the JMP data file used to produce the results shown in the Word document.

Re: kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16

Byron_JMP — Wed, 11 Aug 2021 23:02:48 GMT

Is it possible that you are comparing R-squared values from different models? For example, your table includes a Least Squares model whose fit statistics are fairly different from those of the KNN model.

I ran KNN as described on your data set in both JMP 15 and 16. With the same random seed, the training and validation R-squared statistics for versions 15 and 16 are slightly different but very similar. The script is below. Not included are comparisons between 14 and 15, which were identical.

If someone from JMP Dev doesn't comment, maybe send a note to support@jmp.com and reference this discussion thread. Although I don't know any details, I suspect the method behind KNN was updated to support very wide tables. It is possible that JMP 16 is providing a more precise estimate.

K Nearest Neighbors(
	Validation( :"Train/Validate"n ),
	Y( :MSRP ),
	X(
		:"Curb Weight (lbs)"n, :"Displacement (cc)"n, :Cylinders, :Horsepower,
		:MPG City, :MPG Highway, :"Length (in)"n, :Seating Capacity,
		:"Torque (lbs/ft)"n, :"Wheelbase (in)"n, :Car Type, :Make
	),
	K( 10 ),
	Set Random Seed( 123 ),
	Response( "MSRP", Plot Actual by Predicted( 1 ) )
)
