<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16 in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/kNN-k-Nearest-Neighbors-Displays-Different-Fit-Statistics-in-JMP/m-p/405667#M65572</link>
    <description>&lt;P&gt;A heads-up to those of you hoping to use kNN in JMP v16 with cross-validation...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Word attachment indicates that, for &lt;EM&gt;&lt;STRONG&gt;k = 3&lt;/STRONG&gt;&lt;/EM&gt;:&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;Training R2&lt;/TD&gt;&lt;TD&gt;Validation R2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2nd exhibit&lt;/TD&gt;&lt;TD&gt;0.92839&lt;/TD&gt;&lt;TD&gt;0.90874&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3rd exhibit&lt;/TD&gt;&lt;TD&gt;0.9706&lt;/TD&gt;&lt;TD&gt;0.9086&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The validation R2 values are close but not identical. That difference is minor, but the difference in the training R2 values is not: 0.93 vs. 0.97.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Documentation about this issue is not available, although there is mention of the fact that the kNN algorithm produces different columns of predictions depending on whether Save Predicteds or Save Prediction Formula is used, which most likely explains the discrepancy in fit statistics. However, we do not know which of the competing sets of fit statistics is correct.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We need accurate training R2 results to properly gauge the amount of overfitting exhibited by a model.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;JMP v16 uses a new algorithm for kNN, so this discrepancy does not appear in JMP v15 or earlier.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have also attached the JMP data file used to produce the results shown in the Word document.&lt;/P&gt;</description>
    <pubDate>Sat, 10 Jun 2023 20:46:13 GMT</pubDate>
    <dc:creator>MAS</dc:creator>
    <dc:date>2023-06-10T20:46:13Z</dc:date>
    <item>
      <title>kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16</title>
      <link>https://community.jmp.com/t5/Discussions/kNN-k-Nearest-Neighbors-Displays-Different-Fit-Statistics-in-JMP/m-p/405667#M65572</link>
      <description>&lt;P&gt;A heads-up to those of you hoping to use kNN in JMP v16 with cross-validation...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Word attachment indicates that, for &lt;EM&gt;&lt;STRONG&gt;k = 3&lt;/STRONG&gt;&lt;/EM&gt;:&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;Training R2&lt;/TD&gt;&lt;TD&gt;Validation R2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2nd exhibit&lt;/TD&gt;&lt;TD&gt;0.92839&lt;/TD&gt;&lt;TD&gt;0.90874&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3rd exhibit&lt;/TD&gt;&lt;TD&gt;0.9706&lt;/TD&gt;&lt;TD&gt;0.9086&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The validation R2 values are close but not identical. That difference is minor, but the difference in the training R2 values is not: 0.93 vs. 0.97.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Documentation about this issue is not available, although there is mention of the fact that the kNN algorithm produces different columns of predictions depending on whether Save Predicteds or Save Prediction Formula is used, which most likely explains the discrepancy in fit statistics. However, we do not know which of the competing sets of fit statistics is correct.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We need accurate training R2 results to properly gauge the amount of overfitting exhibited by a model.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;JMP v16 uses a new algorithm for kNN, so this discrepancy does not appear in JMP v15 or earlier.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have also attached the JMP data file used to produce the results shown in the Word document.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Jun 2023 20:46:13 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/kNN-k-Nearest-Neighbors-Displays-Different-Fit-Statistics-in-JMP/m-p/405667#M65572</guid>
      <dc:creator>MAS</dc:creator>
      <dc:date>2023-06-10T20:46:13Z</dc:date>
    </item>
    <item>
      <title>Re: kNN (k Nearest Neighbors) Displays Different Fit Statistics in JMP Pro v16</title>
      <link>https://community.jmp.com/t5/Discussions/kNN-k-Nearest-Neighbors-Displays-Different-Fit-Statistics-in-JMP/m-p/408955#M65874</link>
      <description>&lt;P&gt;Is it possible that you are comparing R-squared values from different models? For example, your table includes a Least Squares model whose fit statistics are fairly different from those of the KNN model.&lt;/P&gt;
&lt;P&gt;I ran KNN as described on your data set in both JMP 15 and 16. The training and validation R-squared statistics for versions 15 and 16 are slightly different but very similar (the random seed was the same); the script is below. Not included are comparisons between versions 14 and 15, which were identical.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If someone from JMP Dev doesn't comment, consider sending a note to &lt;A href="mailto:support@jmp.com" target="_blank"&gt;support@jmp.com&lt;/A&gt; that references this discussion thread. Although I don't know any details, I suspect the method behind KNN was updated to support very wide tables. It is possible that JMP 16 is providing a more precise estimate.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screen Shot 2021-08-11 at 6.53.28 PM.png" style="width: 999px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/34959i16AEB5BD8BF0B5C5/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2021-08-11 at 6.53.28 PM.png" alt="Screen Shot 2021-08-11 at 6.53.28 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;// Fit K Nearest Neighbors to MSRP using the table's validation column
K Nearest Neighbors(
	Validation( :"Train/Validate"n ), // column that assigns rows to training or validation
	Y( :MSRP ), // continuous response
	X(
		:"Curb Weight (lbs)"n, :"Displacement (cc)"n, :Cylinders, :Horsepower,
		:MPG City, :MPG Highway, :"Length (in)"n, :Seating Capacity,
		:"Torque (lbs/ft)"n, :"Wheelbase (in)"n, :Car Type, :Make
	),
	K( 10 ),
	Set Random Seed( 123 ), // same seed in both JMP versions
	Response( "MSRP", Plot Actual by Predicted( 1 ) )
)&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Wed, 11 Aug 2021 23:02:48 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/kNN-k-Nearest-Neighbors-Displays-Different-Fit-Statistics-in-JMP/m-p/408955#M65874</guid>
      <dc:creator>Byron_JMP</dc:creator>
      <dc:date>2021-08-11T23:02:48Z</dc:date>
    </item>
  </channel>
</rss>

