<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic unbalanced data and weight classification in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239851#M47392</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset with a highly unbalanced ordinal response. The class of primary interest is a minority class (patients with a rare but severe infection, around 5% of the observations).&lt;/P&gt;&lt;P&gt;Classification of the other ordinal classes works well with a boosted forest tree classifier, but these classes are not of interest to me (less severe infection or no infection at all). I wonder whether I can use the "Weight" option to give more weight to the minority class. Moreover, I want to determine at which value of "Weight" the classifier performs best. In unbalanced data, the ROC AUC does not seem to be an appropriate performance measure. An F1-score (the harmonic mean of precision and recall, so it penalizes both false positives and false negatives) seems more adequate for this.&lt;/P&gt;&lt;P&gt;Is anyone aware of automatic calculation of F1-scores in JMP?&lt;/P&gt;&lt;P&gt;Is there a way to fine-tune the "Weight" option in JMP classification models on highly unbalanced data in order to maximize the F1-score?&lt;/P&gt;&lt;P&gt;Does anybody have other suggestions to solve this problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Lu&lt;/P&gt;</description>
    <pubDate>Sat, 28 Dec 2019 17:10:40 GMT</pubDate>
    <dc:creator>Lu</dc:creator>
    <dc:date>2019-12-28T17:10:40Z</dc:date>
    <item>
      <title>unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239851#M47392</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset with a highly unbalanced ordinal response. The class of primary interest is a minority class (patients with a rare but severe infection, around 5% of the observations).&lt;/P&gt;&lt;P&gt;Classification of the other ordinal classes works well with a boosted forest tree classifier, but these classes are not of interest to me (less severe infection or no infection at all). I wonder whether I can use the "Weight" option to give more weight to the minority class. Moreover, I want to determine at which value of "Weight" the classifier performs best. In unbalanced data, the ROC AUC does not seem to be an appropriate performance measure. An F1-score (the harmonic mean of precision and recall, so it penalizes both false positives and false negatives) seems more adequate for this.&lt;/P&gt;&lt;P&gt;Is anyone aware of automatic calculation of F1-scores in JMP?&lt;/P&gt;&lt;P&gt;Is there a way to fine-tune the "Weight" option in JMP classification models on highly unbalanced data in order to maximize the F1-score?&lt;/P&gt;&lt;P&gt;Does anybody have other suggestions to solve this problem?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Lu&lt;/P&gt;</description>
      <pubDate>Sat, 28 Dec 2019 17:10:40 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239851#M47392</guid>
      <dc:creator>Lu</dc:creator>
      <dc:date>2019-12-28T17:10:40Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239859#M47393</link>
      <description>If you turn on the Confusion Matrix option for a classification model, you can get the false positive/negative rates.&lt;BR /&gt;You can also set different cut-offs for positive/negative classification (default is 0.5).</description>
      <pubDate>Sat, 28 Dec 2019 22:51:43 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239859#M47393</guid>
      <dc:creator>Mark_Zwald</dc:creator>
      <dc:date>2019-12-28T22:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239882#M47398</link>
      <description>&lt;P&gt;I have also been interested in F1 scores but don't believe they are calculated automatically in JMP.&amp;nbsp; Unless someone has a better idea, I'd refrain from using weights to address your issue - I think that will make it hard to see what is going on.&amp;nbsp; My own inclination would be to recode your response variable into 2 categories - the disease you are interested in and everything else.&amp;nbsp; Then you can manually fine tune the cutoff probabilities (the add-in for this is quite good) to see what cutoff probability does the best job of identifying the disease you are interested in.&amp;nbsp; Then, you can go back to the full categorization of your response variable and see if that cutoff probability seems robust.&amp;nbsp; For me, trial and error may be more illuminating than looking for an automatic solution to your problem (but if there is an automatic methodology, I'd like to know it).&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2019 15:06:39 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239882#M47398</guid>
      <dc:creator>dale_lehman</dc:creator>
      <dc:date>2019-12-30T15:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239883#M47399</link>
      <description>I think there are two somewhat different issues here. One is the unbalanced nature of the data - some of your categories are much more common than others, so simple misclassification rates may "ignore" categories with small numbers - I believe the F1 is primarily a way to address this. The other problem is asymmetry of errors - false positives and false negatives may have very different consequences. I don't think the F1 will help with that - only better models and changing the probability cutoffs can deal with that issue. I'm not aware of a systematic relationship between the two issues (there may be one, and that would be very interesting to learn about).</description>
      <pubDate>Mon, 30 Dec 2019 15:52:30 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/239883#M47399</guid>
      <dc:creator>dale_lehman</dc:creator>
      <dc:date>2019-12-30T15:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247727#M48632</link>
      <description>&lt;P&gt;Hello, I found a nice add-in that can help define the optimal threshold for classification (attached below). Alternatively, search for "confusion matrix" in the File Exchange (Add-Ins) on the JMP Community website.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 16 Feb 2020 12:01:28 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247727#M48632</guid>
      <dc:creator>Lu</dc:creator>
      <dc:date>2020-02-16T12:01:28Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247728#M48633</link>
      <description>&lt;P&gt;Recent publications clearly describe the advantages of the Matthias Correlation Coefficient (MCC) as an optimal performance measure for classification models, even in unbalanced data.&lt;/P&gt;&lt;P&gt;&lt;A title="MCC" href="https://www.ncbi.nlm.nih.gov/pubmed/31898477" target="_self"&gt;https://www.ncbi.nlm.nih.gov/pubmed/31898477&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.ncbi.nlm.nih.gov/pubmed/28574989" target="_self"&gt;https://www.ncbi.nlm.nih.gov/pubmed/28574989&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 16 Feb 2020 12:10:23 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247728#M48633</guid>
      <dc:creator>Lu</dc:creator>
      <dc:date>2020-02-16T12:10:23Z</dc:date>
    </item>
    <item>
      <title>Re: unbalanced data and weight classification</title>
      <link>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247729#M48634</link>
      <description>&lt;P&gt;Sorry, I meant the Matthews correlation coefficient (MCC).&lt;/P&gt;</description>
      <pubDate>Sun, 16 Feb 2020 12:13:23 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/unbalanced-data-and-weight-classification/m-p/247729#M48634</guid>
      <dc:creator>Lu</dc:creator>
      <dc:date>2020-02-16T12:13:23Z</dc:date>
    </item>
  </channel>
</rss>

