<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Nominal Logistic Regression question re: the Save Probability Formula in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354161#M60388</link>
    <description>&lt;P&gt;I'm a bit confused... Where did you get the data to create the model? The reason most models don't actually predict well is that the data used to create the model was NOT REPRESENTATIVE of the future conditions. &amp;nbsp;This is why planning the data collection is way more important than the data analysis. &amp;nbsp;Although I will say that is a difficult situation. &amp;nbsp; Many things can change in 25 years that could not be, or simply were not, anticipated.&lt;/P&gt;</description>
    <pubDate>Sat, 30 Jan 2021 01:17:58 GMT</pubDate>
    <dc:creator>statman</dc:creator>
    <dc:date>2021-01-30T01:17:58Z</dc:date>
    <item>
      <title>Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354119#M60386</link>
      <description>&lt;P&gt;I am trying to develop a model to predict whether a tree will live or die over a 25 year period given some environmental attributes associated with each tree.&amp;nbsp; I ran the Nominal Logistic Regression model, which showed that the Prob&amp;gt;ChiSq for each attribute was &amp;lt; 0.0001, which would seem to suggest that the environmental attributes could be used as a good predictor.&amp;nbsp; Wanting to see how good of a prediction they were I selected the Save Probability Formula option.&amp;nbsp; This added several columns to the table including the final column, which provided a prediction (Most Likely), Live or Dead.&amp;nbsp; The predictions were terrible.&amp;nbsp; The model predicted a total of 140 trees would be alive after 25 years, when in fact the number was about 1000.&amp;nbsp; Not only that, but many of the 140 predicted live trees in fact had died.&amp;nbsp; So, my question is, why were the predictions not even close to the actual Live/Dead data?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Jun 2023 23:25:26 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354119#M60386</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-10T23:25:26Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354139#M60387</link>
      <description>&lt;P&gt;You can have a statistically significant model that does a poor job of prediction. Statistical significance is telling you that those significant terms HELP to explain the response, but does not guarantee a good fit.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, with a continuous response, look at this picture:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Dan_Obermiller_0-1611960404007.png" style="width: 307px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/29837i4ECBA9FDFF76DD36/image-dimensions/307x237?v=v2" width="307" height="237" role="button" title="Dan_Obermiller_0-1611960404007.png" alt="Dan_Obermiller_0-1611960404007.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The model is statistically significant, but clearly would not predict well.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Determining why the model does not fit well can be difficult with a nominal logistic regression. A few things can help you determine predictive ability rather than "eyeballing" the table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;* You can ask for the ROC curve. The Area Under the Curve (AUC) is an indication of predictive ability. An AUC of 0.5 is the "baseline". This is like using a coin to determine if the tree will live or not. Anything above 0.5 starts providing evidence of predictive ability. AUC above 0.7 is starting to get to decent predictive ability. Above 0.9 is fantastic.&lt;/P&gt;
&lt;P&gt;* You can also ask for the confusion matrix. This is a table of observed results versus predicted results. This table is essentially what you were considering by saving the prediction formula. This can help you determine where your model is starting to have trouble. Is it having trouble classifying trees that are dead? Or is it having trouble with just trees that are alive? Or both?&lt;/P&gt;
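&lt;P&gt;To make the two diagnostics concrete, here is a minimal sketch in plain Python (outside JMP). Everything in it is an assumption for illustration: the eight trees, their predicted probabilities, and the function names are made up, and JMP's own ROC and confusion matrix reports are what you would actually use.&lt;/P&gt;

```python
# Hypothetical example data: 1 = Dead, 0 = Live, with made-up predicted P(Dead).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
prob_dead = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]


def confusion_matrix(actual, prob, threshold=0.5):
    """Count (observed, predicted) pairs, predicting Dead when prob >= threshold."""
    counts = {("Dead", "Dead"): 0, ("Dead", "Live"): 0,
              ("Live", "Dead"): 0, ("Live", "Live"): 0}
    for y, p in zip(actual, prob):
        observed = "Dead" if y == 1 else "Live"
        predicted = "Dead" if p >= threshold else "Live"
        counts[(observed, predicted)] += 1
    return counts


def auc(actual, prob):
    """AUC as the share of (Dead, Live) pairs ranked correctly; ties count half."""
    dead = [p for y, p in zip(actual, prob) if y == 1]
    live = [p for y, p in zip(actual, prob) if y == 0]
    score = 0.0
    for d in dead:
        for v in live:
            if d > v:
                score += 1.0
            elif d == v:
                score += 0.5
    return score / (len(dead) * len(live))


print(confusion_matrix(actual, prob_dead))
print(auc(actual, prob_dead))
```

&lt;P&gt;With these made-up numbers the AUC is 0.875, and the 0.5 cutoff classifies 3 of the 4 dead trees and 3 of the 4 live trees correctly.&lt;/P&gt;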
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Don't forget to assess the data that the model was built on. How "balanced" is the data between live and dead trees? For example, if there are only 1% of the trees being dead, a great predictive model would be to say that all trees are alive. 99% accuracy! Not very helpful though. For this reason having a response that is pretty close to balanced can be helpful.&lt;/P&gt;
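&lt;P&gt;The 99% accuracy example can be checked with a few lines of plain Python. This is only a sketch with hypothetical counts (10 dead and 990 live trees) and a "model" that ignores every predictor.&lt;/P&gt;

```python
# Hypothetical data matching the example in the text: 1% Dead, 99% Live.
actual = ["Dead"] * 10 + ["Live"] * 990

# A "model" that ignores every predictor and always predicts Live.
predicted = ["Live"] * len(actual)

# Overall accuracy: share of trees whose prediction matches the observed status.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Share of the truly Dead trees that the model catches.
dead_total = sum(a == "Dead" for a in actual)
dead_caught = sum(a == "Dead" and p == "Dead" for a, p in zip(actual, predicted))
dead_recall = dead_caught / dead_total

print(f"accuracy = {accuracy:.2f}, share of dead trees caught = {dead_recall:.2f}")
```

&lt;P&gt;Accuracy comes out at 0.99 while not a single dead tree is identified, which is why overall accuracy alone can mislead on unbalanced data.&lt;/P&gt;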
&lt;P&gt;Did the model contain only main effects? Would interactions help? What about quadratic terms? Is the data "rich" enough to support a model with these higher-order terms? (This is a great question if those higher-order terms are already in the model, too!)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could also try a different modeling technique. Perhaps a Partition or tree model would be a good thing to try. There are other tools, too.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm sure others can add more things to help, but this is where your fun REALLY starts! Best of luck!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jan 2021 22:58:35 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354139#M60387</guid>
      <dc:creator>Dan_Obermiller</dc:creator>
      <dc:date>2021-01-29T22:58:35Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354161#M60388</link>
      <description>&lt;P&gt;I'm a bit confused... Where did you get the data to create the model? The reason most models don't actually predict well is that the data used to create the model was NOT REPRESENTATIVE of the future conditions. &amp;nbsp;This is why planning the data collection is way more important than the data analysis. &amp;nbsp;Although I will say that is a difficult situation. &amp;nbsp; Many things can change in 25 years that could not be, or simply were not, anticipated.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jan 2021 01:17:58 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354161#M60388</guid>
      <dc:creator>statman</dc:creator>
      <dc:date>2021-01-30T01:17:58Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354162#M60389</link>
      <description>&lt;P&gt;Dan, I ran the Confusion Matrix and it showed that the model was doing decently predicting the dead trees, correctly predicting 3/4 of them as dead.&amp;nbsp; But it was doing terrible predicting the live trees, predicting 63% of them as dead.&amp;nbsp; The distribution of live and dead trees is about 45%,55%, so pretty well balanced.&amp;nbsp; Why would the model do a decent job predicting the dead trees but not the live trees?--Mark&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jan 2021 01:25:08 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354162#M60389</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-01-30T01:25:08Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354166#M60391</link>
      <description>&lt;P&gt;Dan's comments are relevant, but one thing you should check is that the predictions (alive or dead) are based upon a 50% threshold for classifying the trees.&amp;nbsp; If the AUC (from the ROC curve) is high (closer to 1 than to 0.5), then it is likely that using a different threshold value for classifying the trees will result in better predictions.&amp;nbsp; This is especially true if the data is very unbalanced (i.e., if there are very few dead trees, then it is likely that most of the probabilities estimated from the logistic regression will be quite low, resulting in the default prediction of "not dead").&amp;nbsp; There is a nice add-in you can use (just search for it) that will produce confusion matrices for alternative cut-off classification probabilities.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One other thing struck me about your description.&amp;nbsp; It sounds like this is censored data: after 25 years, the trees that are alive are probably not all of the same health.&amp;nbsp; If environmental conditions contribute to tree death, then the surviving trees are probably also affected, but not dead yet.&amp;nbsp; If you have a measure of the time at which the trees died, you can try a survival analysis, where the dependent variable is the time of death and censored data can be used in the analysis.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jan 2021 16:25:32 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354166#M60391</guid>
      <dc:creator>dale_lehman</dc:creator>
      <dc:date>2021-01-30T16:25:32Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354239#M60399</link>
      <description>&lt;P&gt;Dale may be right. But assuming that the cutoffs are not playing much of a role, your model is biased towards predicting Dead. That seems to indicate that there is something that is likely missing from your model that is indicative of a live tree. I can't tell you what that is. I also am not completely confident that my statement is correct. This is where looking at the data and having knowledge of the field is necessary.&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jan 2021 01:44:55 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354239#M60399</guid>
      <dc:creator>Dan_Obermiller</dc:creator>
      <dc:date>2021-01-31T01:44:55Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354279#M60407</link>
      <description>&lt;P&gt;Dan, thanks for getting back to me. First, thanks very much for the ROC and confusion matrix (odd name) suggestions. They were very helpful. My best model produced an AUC (area under the ROC curve) of 0.798, which was pretty decent. Of course this was predicting the overall number of live and dead trees 25 years later, not the dead or alive predictions for individual trees, which as described below came out around 63%. The 2020 trees were about 1/3 live and 2/3 dead over the 25 year period. While not 50/50 balanced, still not too skewed. I took Dale's suggestion and played around adjusting the prediction threshold (set at 0.5) used to classify/predict individual trees as live or dead. By adjusting the threshold to 0.5&lt;U&gt;+&lt;/U&gt;0.284, I was able to get the correct prediction percentages for Live and Dead trees balanced at 63%, though the number of correct predictions actually declined slightly (4%) from the original predictions, which were highly skewed toward dead under the 0.5 prediction threshold. You might think the 67% tree mortality rate over 25 years was really high. It would be under normal conditions. The high mortality rate was because the area where I was tracking the roughly 10,000 trees was burned 8 times over the 25 year period. --Mark&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jan 2021 15:30:22 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/354279#M60407</guid>
      <dc:creator>MarkAD</dc:creator>
      <dc:date>2021-01-31T15:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/355105#M60505</link>
      <description>&lt;P&gt;How would I set up a table to calculate mean or median survival times for seven groups of trees over a 25 year period.&amp;nbsp; Nearly 10,000 trees are involved and I don't want to enter the time to death for each tree.&amp;nbsp; What I do have for each group is the number of trees still alive for each of the 25 time periods as well as the number of trees that died in each time period.&amp;nbsp; All the groups have trees still alive after 25 years (some few, some a lot), so I will have to incorporate censoring into the table as well.&amp;nbsp; Is it possible to create a survival analysis table using the above data?&amp;nbsp; I hope so since the prospect of having to enter the survival time for each of the 10,000 trees is pretty dim.&amp;nbsp; Thanks so much for the prior comments and suggestions.&amp;nbsp; They have been very helpful!--Mark&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2021 16:32:29 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/355105#M60505</guid>
      <dc:creator>MarkAD</dc:creator>
      <dc:date>2021-02-02T16:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/355266#M60522</link>
      <description>&lt;P&gt;Ok, I think I've figured out how to enter data with a frequency column that enables me to run the survival analysis.&amp;nbsp; After selecting the 'compare groups' option, the results show the mortality curve and also the mean survival times for the two groups, the latter of which is what I was looking for.&amp;nbsp; When I select the 'life distribution' option, the results show the individual survival curves for each group in separate graphs.&amp;nbsp; Is there a way to get the survival curves for both groups on a single graph? (I know I can easily show both survival curves on a single graph by copying the data into Excel, but it seems like I should be able to do this in JMP as well.)&amp;nbsp; Any suggestions?&amp;nbsp; FYI I've included the data table.&amp;nbsp; Thanks for the suggestions.--Mark&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2021 00:30:38 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/355266#M60522</guid>
      <dc:creator>MarkAD</dc:creator>
      <dc:date>2021-02-03T00:30:38Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356288#M60620</link>
      <description>&lt;P&gt;The data table you attached does not have any data on the number of trees alive and dead in each time period (I'm not sure what "Count" represents, but it doesn't account for 10,000 trees).&amp;nbsp; So, I can't attempt to get that visual from your data.&amp;nbsp; But if you have a survival graph for each group separately and want them on the same graph, you can use "copy graph" to copy and paste one graph into the other and they should be overlaid.&amp;nbsp; There may well be a way to get the survival analysis to do that automatically, but I can't tell what your data is like from the table you attached.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2021 00:51:09 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356288#M60620</guid>
      <dc:creator>dale_lehman</dc:creator>
      <dc:date>2021-02-05T00:51:09Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356357#M60631</link>
      <description>&lt;P&gt;Sorry, Dale.&amp;nbsp; The Count column was the number of trees that died during that time period.&amp;nbsp; The reason Count doesn't add up to near 10,000 is that these were data from just one of the three burn units.&amp;nbsp; I did find out how to get the survival curves for all 7 groups on one graph.&amp;nbsp; After selecting Reliability and Survival, select Survival (instead of Life Distribution).&amp;nbsp; This will put the survival curves for all 7 groups on one graph.&amp;nbsp; The Survival option expects that the data column (Count in this case) lists the number of units that died, or failed, in each time period.&amp;nbsp; Very nifty.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2021 13:03:12 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356357#M60631</guid>
      <dc:creator>MarkAD</dc:creator>
      <dc:date>2021-02-05T13:03:12Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356476#M60640</link>
      <description>&lt;P&gt;Actually, you CAN plot more than one group with Life Distribution.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="survival.JPG" style="width: 659px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/30048i474DF6C437B0C1B7/image-size/large?v=v2&amp;amp;px=999" role="button" title="survival.JPG" alt="survival.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2021 16:53:22 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356476#M60640</guid>
      <dc:creator>Mark_Bailey</dc:creator>
      <dc:date>2021-02-05T16:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: Nominal Logistic Regression question re: the Save Probability Formula</title>
      <link>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356514#M60642</link>
      <description>&lt;P&gt;Yes, I did know this, but it's not the typical step-wise depiction of survival curves.&amp;nbsp; Also, this option provides the mean survival time for each category.&amp;nbsp; Thanks for all your attention and advice!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MarkAD_0-1612545223080.png" style="width: 364px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/30049iF2796E76BB563F4A/image-dimensions/364x520?v=v2" width="364" height="520" role="button" title="MarkAD_0-1612545223080.png" alt="MarkAD_0-1612545223080.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2021 17:15:23 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Nominal-Logistic-Regression-question-re-the-Save-Probability/m-p/356514#M60642</guid>
      <dc:creator>MarkAD</dc:creator>
      <dc:date>2021-02-05T17:15:23Z</dc:date>
    </item>
  </channel>
</rss>

