<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic JMP Partition/Decision tree - Model Selection and Validation with Test Set in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/JMP-Partition-Decision-tree-Model-Selection-and-Validation-with/m-p/1152#M1152</link>
    <description>Hi forum members,&lt;BR /&gt;&lt;BR /&gt;I have the following problem. I want to do classification using SAS JMP Partition.&lt;BR /&gt;The dataset has 2 classes: positive and negative. I wrote a JSL script that loads&lt;BR /&gt;the dataset from a file. The test set is defined by excluding a certain number of rows;&lt;BR /&gt;the decision tree is then trained on the training set (all included rows).&lt;BR /&gt;Training is started by hitting the "Go" button via the JSL script (i.e. sending the Go&lt;BR /&gt;message to the Partition instance).&lt;BR /&gt;&lt;BR /&gt;Now my question: how does SAS JMP Partition know when to stop growing the tree?&lt;BR /&gt;According to the R-squared plots, it seems that JMP uses the test set to decide&lt;BR /&gt;when to stop growing the tree. That would be clearly unsatisfactory, since I want&lt;BR /&gt;to use the test set performance (the ROC curve and its AUC) to quantify the generalization&lt;BR /&gt;performance of the tree on unseen data.&lt;BR /&gt;&lt;BR /&gt;Or is the stopping point determined by the cross-validation ROC/AUC? But it seems the&lt;BR /&gt;AUC of the cross-validation run can easily be pushed to nearly 0.999 by growing&lt;BR /&gt;the tree arbitrarily large, which would be clear overfitting.&lt;BR /&gt;&lt;BR /&gt;I use the following command to start decision tree learning:&lt;BR /&gt;&lt;BR /&gt;part = Partition(&lt;BR /&gt;	Minimum Size Split( 20 ),&lt;BR /&gt;	Show Tree( 1 ),&lt;BR /&gt;	ROC Curve( 1 ),&lt;BR /&gt;	Column Contributions( 1 ),&lt;BR /&gt;	Split History( 1 ),&lt;BR /&gt;	Criterion( "Maximize Significance" ),&lt;BR /&gt;	K Fold Crossvalidation( 2 ),&lt;BR /&gt;	SendToReport( .... )&lt;BR /&gt;);&lt;BR /&gt;part &amp;lt;&amp;lt; ColorPoints &amp;lt;&amp;lt; Go &amp;lt;&amp;lt; LeafReport;&lt;BR /&gt;&lt;BR /&gt;and selection of the test set rows is done using:&lt;BR /&gt;&lt;BR /&gt;Set Row States( [0,2,0,......] );&lt;BR /&gt;&lt;BR /&gt;A row state of 2 marks the excluded rows.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any hints,&lt;BR /&gt;&lt;BR /&gt;Marc</description>
    <pubDate>Wed, 13 Jan 2010 17:11:07 GMT</pubDate>
    <dc:creator />
    <dc:date>2010-01-13T17:11:07Z</dc:date>
    <item>
      <title>JMP Partition/Decision tree - Model Selection and Validation with Test Set</title>
      <link>https://community.jmp.com/t5/Discussions/JMP-Partition-Decision-tree-Model-Selection-and-Validation-with/m-p/1152#M1152</link>
      <description>Hi forum members,&lt;BR /&gt;&lt;BR /&gt;I have the following problem. I want to do classification using SAS JMP Partition.&lt;BR /&gt;The dataset has 2 classes: positive and negative. I wrote a JSL script that loads&lt;BR /&gt;the dataset from a file. The test set is defined by excluding a certain number of rows;&lt;BR /&gt;the decision tree is then trained on the training set (all included rows).&lt;BR /&gt;Training is started by hitting the "Go" button via the JSL script (i.e. sending the Go&lt;BR /&gt;message to the Partition instance).&lt;BR /&gt;&lt;BR /&gt;Now my question: how does SAS JMP Partition know when to stop growing the tree?&lt;BR /&gt;According to the R-squared plots, it seems that JMP uses the test set to decide&lt;BR /&gt;when to stop growing the tree. That would be clearly unsatisfactory, since I want&lt;BR /&gt;to use the test set performance (the ROC curve and its AUC) to quantify the generalization&lt;BR /&gt;performance of the tree on unseen data.&lt;BR /&gt;&lt;BR /&gt;Or is the stopping point determined by the cross-validation ROC/AUC? But it seems the&lt;BR /&gt;AUC of the cross-validation run can easily be pushed to nearly 0.999 by growing&lt;BR /&gt;the tree arbitrarily large, which would be clear overfitting.&lt;BR /&gt;&lt;BR /&gt;I use the following command to start decision tree learning:&lt;BR /&gt;&lt;BR /&gt;part = Partition(&lt;BR /&gt;	Minimum Size Split( 20 ),&lt;BR /&gt;	Show Tree( 1 ),&lt;BR /&gt;	ROC Curve( 1 ),&lt;BR /&gt;	Column Contributions( 1 ),&lt;BR /&gt;	Split History( 1 ),&lt;BR /&gt;	Criterion( "Maximize Significance" ),&lt;BR /&gt;	K Fold Crossvalidation( 2 ),&lt;BR /&gt;	SendToReport( .... )&lt;BR /&gt;);&lt;BR /&gt;part &amp;lt;&amp;lt; ColorPoints &amp;lt;&amp;lt; Go &amp;lt;&amp;lt; LeafReport;&lt;BR /&gt;&lt;BR /&gt;and selection of the test set rows is done using:&lt;BR /&gt;&lt;BR /&gt;Set Row States( [0,2,0,......] );&lt;BR /&gt;&lt;BR /&gt;A row state of 2 marks the excluded rows.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any hints,&lt;BR /&gt;&lt;BR /&gt;Marc</description>
      <pubDate>Wed, 13 Jan 2010 17:11:07 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/JMP-Partition-Decision-tree-Model-Selection-and-Validation-with/m-p/1152#M1152</guid>
      <dc:creator />
      <dc:date>2010-01-13T17:11:07Z</dc:date>
    </item>
    <item>
      <title>Re: JMP Partition/Decision tree - Model Selection and Validation with Test Set</title>
      <link>https://community.jmp.com/t5/Discussions/JMP-Partition-Decision-tree-Model-Selection-and-Validation-with/m-p/1153#M1153</link>
      <description>From p. 816 of the JMP Stat and Graph Guide (in JMP: Help&amp;gt;Books&amp;gt;JMP Stat and Graph Guide):&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;Automatic Splitting&lt;/B&gt;&lt;BR /&gt;The Go button (shown in Figure 37.12) appears when you have cross-validation enabled. This is done&lt;BR /&gt;by either using the K Fold Crossvalidation command, or excluding at least 20% of rows as a holdout&lt;BR /&gt;sample.&lt;BR /&gt;The Go button provides for repeated splitting without having to repeatedly click the Split button.&lt;BR /&gt;&lt;B&gt;&lt;I&gt;When you click the Go button, the platform performs repeated splitting until the cross-validation&lt;BR /&gt;R-Square is better than what the next 10 splits would obtain.&lt;/I&gt;&lt;/B&gt; This rule may produce complex trees that&lt;BR /&gt;are not very interpretable, but have good predictive power.&lt;BR /&gt;Using the Go button turns on the Split History command. Also, if using the Go button results in a tree&lt;BR /&gt;with more than 40 nodes, the Show Tree command is turned off.</description>
      <pubDate>Wed, 13 Jan 2010 20:48:06 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/JMP-Partition-Decision-tree-Model-Selection-and-Validation-with/m-p/1153#M1153</guid>
      <dc:creator>mpb</dc:creator>
      <dc:date>2010-01-13T20:48:06Z</dc:date>
    </item>
  </channel>
</rss>