<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Data mining with Small samples...Regression with Small Samples? in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6710#M6704</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear JMP experts, I know this question might be both very silly and a hot issue at the same time, but I thought it would be worth raising: which types of linear and non-linear regression does JMP offer when you have a small sample (e.g. 50-70 records) and want to predict a continuous outcome from 5-7 predictors (both categorical and continuous)? Could k-fold cross-validated stepwise linear regression or partial least squares regression be a remedy for this problem? Are boosted trees or neural networks just insane even to think about? Your responses are greatly welcomed and very much appreciated. Respectfully, Chris&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 09 May 2013 19:00:48 GMT</pubDate>
    <dc:creator>triunk</dc:creator>
    <dc:date>2013-05-09T19:00:48Z</dc:date>
    <item>
      <title>Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6710#M6704</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear JMP experts, I know this question might be both very silly and a hot issue at the same time, but I thought it would be worth raising: which types of linear and non-linear regression does JMP offer when you have a small sample (e.g. 50-70 records) and want to predict a continuous outcome from 5-7 predictors (both categorical and continuous)? Could k-fold cross-validated stepwise linear regression or partial least squares regression be a remedy for this problem? Are boosted trees or neural networks just insane even to think about? Your responses are greatly welcomed and very much appreciated. Respectfully, Chris&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 19:00:48 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6710#M6704</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-09T19:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6711#M6705</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Whether these procedures will work or not at those sample sizes depends on signal-to-noise. If you have lots of signal and low noise, you probably won't even need 50 records. Other way around, low signal and high noise, and you're probably wasting your time.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 20:15:13 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6711#M6705</guid>
      <dc:creator>paigemiller</dc:creator>
      <dc:date>2013-05-09T20:15:13Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6712#M6706</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The first thing I do with such skimpy datasets is a regression tree analysis. It is a quick and intuitive way to explore the relationship between your dependent and predictor variables. The first three or four nodes are generally meaningful. - PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 20:21:22 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6712#M6706</guid>
      <dc:creator>pgstats</dc:creator>
      <dc:date>2013-05-09T20:21:22Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6713#M6707</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;First of all, thank you very much for your prompt reply. I really appreciate it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PGStats, do you mean Classification and Regression Tree analysis (CART)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Alternatively, do you think that multivariate adaptive regression splines could also work for a small sample size?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Many thanks again!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 22:50:34 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6713#M6707</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-09T22:50:34Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6714#M6708</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;P.S. I know that MARS is not offered in JMP but it is offered in SAS, so a script might help.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 22:52:14 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6714#M6708</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-09T22:52:14Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6715#M6709</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi PaigeMiller&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Would you please be so kind and elaborate a little bit more about signal and noise? (just one sentence)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please accept my apologies, but these two terms are pretty allegorical to my silly mind.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;THANK YOU!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 May 2013 23:12:13 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6715#M6709</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-09T23:12:13Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6716#M6710</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes, I meant CART-type analysis, called Partition in JMP. When the dependent variable is continuous, the result is a regression tree; otherwise, it's a classification (or decision) tree. I think regression splines eat up too many degrees of freedom to be applicable to a dataset of your size. - PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 May 2013 01:29:32 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6716#M6710</guid>
      <dc:creator>pgstats</dc:creator>
      <dc:date>2013-05-10T01:29:32Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6717#M6711</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks PG, this helps. I will check it out.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 May 2013 01:54:21 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6717#M6711</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-10T01:54:21Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6718#M6712</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;PRE __jive_macro_name="quote" class="jive_text_macro jive_macro_quote" modifiedtitle="true"&gt;Would you please be so kind and elaborate a little bit more about signal and noise? (just one sentence)&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;&lt;BR /&gt;&lt;P&gt;Please accept my apologies but for me these two terms are pretty allegorical to my silly mind.&lt;/P&gt;&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;Most statistical modeling attempts to determine whether there is a signal (a real relationship) that is larger than the variability of the errors (noise). Each modeling technique that I am aware of gives a measure of this signal-to-noise ratio; for example, in standard regression it would be the overall F-test.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There's no reason you can't have a very large signal and very low noise in 50 data points. It depends on the data.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 May 2013 16:02:38 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6718#M6712</guid>
      <dc:creator>paigemiller</dc:creator>
      <dc:date>2013-05-10T16:02:38Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6719#M6713</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;That's much clearer. Thank you VERY VERY much&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kind Regards&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;C&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 May 2013 19:52:32 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/6719#M6713</guid>
      <dc:creator>triunk</dc:creator>
      <dc:date>2013-05-10T19:52:32Z</dc:date>
    </item>
    <item>
      <title>Re: Data mining with Small samples...Regression with Small Samples?</title>
      <link>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/320391#M57089</link>
      <description>&lt;P&gt;Although this question was asked quite a long time ago, I want to chime in because I think the discussion misses some specifics that are important for deciding whether you can get meaningful information from the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Is the data from a DOE? If so, 50-70 observations will usually be more than sufficient to provide enough degrees of freedom to detect the underlying behaviour.&lt;/P&gt;
&lt;P&gt;2. If it is from a DOE, what kind? What matters is whether the design can estimate the model effects you want to examine in your regression or other models. If the design can only estimate a main-effects model, you will again be uncertain whether the fitted model tells you the truth.&lt;/P&gt;
&lt;P&gt;3. If these are just observations, exploratory data analysis can help you identify potential patterns. Trying different modeling strategies like those previously mentioned and comparing them will then help you see whether you can trust the models (are they telling the same story or completely different ones?).&lt;/P&gt;
&lt;P&gt;If they tell different stories, you may want to get a better understanding from a DOE.&lt;/P&gt;
&lt;P&gt;4. Neural nets are often said not to be applicable with so few data points. Newer research shows that with certain strategies you can still get acceptable results. However, you should never trust one model alone. Test out different ones. Check how the model behaves when you simulate small variations of the actual settings: do the simulation results stay within the confidence limits, or do they break out heavily?&lt;/P&gt;
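&lt;P&gt;To make the advice in point 4 about testing different models concrete, here is a minimal sketch in plain Python (not JMP or JSL) that compares two candidate models by leave-one-out cross-validation on a small sample. The data values and the helper names fit_line, fit_mean and loo_rmse are made up for illustration and are not part of any JMP platform.&lt;/P&gt;
&lt;PRE&gt;
```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns a predict function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_mean(xs, ys):
    """Baseline that ignores x and always predicts the training mean of y."""
    my = statistics.fmean(ys)
    return lambda x: my


def loo_rmse(fit, xs, ys):
    """Leave-one-out cross-validated root mean squared error of a fitting routine."""
    squared_errors = []
    for i in range(len(xs)):
        # Hold out observation i, fit on the rest, then predict the held-out point.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        squared_errors.append((ys[i] - predict(xs[i])) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


# Hypothetical small sample: one continuous predictor, one continuous response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

rmse_line = loo_rmse(fit_line, xs, ys)
rmse_mean = loo_rmse(fit_mean, xs, ys)
print("straight line:", round(rmse_line, 3), "mean only:", round(rmse_mean, 3))
```
&lt;/PRE&gt;
&lt;P&gt;If a candidate model cannot clearly beat the mean-only baseline on held-out points, there is probably little signal to find, whatever the sample size. The same loop can be reused with any other fitting routine that returns a predict function.&lt;/P&gt;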
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before you do that, you should check the quality of the data: are there outliers or missing values, are the columns correlated due to some previously known relationship such as A = k*B, are there many duplicates? Are the observations measurements of the same thing over time, are they repeated measures, ...? All this will tell you what you can and cannot do with your data.&lt;/P&gt;
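&lt;P&gt;Several of these checks can be sketched in a few lines of plain Python (again, not JMP or JSL). The function name quality_report, the use of None for missing values, and both thresholds are assumptions made for this illustration; the outlier rule is a simple median-absolute-deviation screen, which is only one of many possible choices.&lt;/P&gt;
&lt;PRE&gt;
```python
import math
import statistics
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; report 0
    return sxy / math.sqrt(sxx * syy)


def quality_report(columns, r_limit=0.99, mad_limit=3.5):
    """Basic data-quality summary for a dict of equal-length numeric columns.

    Missing values are represented by None (an assumption of this sketch).
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    rows = [tuple(columns[name][i] for name in names) for i in range(n_rows)]

    # Missing values per column.
    missing = {name: sum(v is None for v in columns[name]) for name in names}

    # Number of rows that exactly repeat another row.
    duplicates = n_rows - len(set(rows))

    # Values far from the column median, measured in scaled MAD units.
    outliers = {}
    for name in names:
        values = [v for v in columns[name] if v is not None]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        limit = mad_limit * 1.4826 * mad
        outliers[name] = [v for v in values if mad > 0 and abs(v - med) > limit]

    # Column pairs that are almost perfectly linearly related (e.g. A = k*B).
    collinear = []
    for a, b in combinations(names, 2):
        pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                 if x is not None and y is not None]
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if abs(r) > r_limit:
            collinear.append((a, b))

    return {"missing": missing, "duplicates": duplicates,
            "outliers": outliers, "collinear": collinear}


# Hypothetical data: B is exactly 2 * A, one row is repeated, C has a gap.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 40.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0, 10.0, 80.0],
    "C": [3.1, None, 2.7, 3.3, 2.9, 2.9, 3.0],
}
report = quality_report(data)
print(report)
```
&lt;/PRE&gt;
&lt;P&gt;A report like this does not cover time dependence or repeated measures; those depend on how the data were collected and need to be checked against the study design.&lt;/P&gt;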
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So what I want to say is that you can always create models, but whether they are useful depends on &lt;STRONG&gt;your question,&lt;/STRONG&gt;&amp;nbsp;&lt;STRONG&gt;where the data comes from,&lt;/STRONG&gt; and &lt;STRONG&gt;how it has been measured,&lt;/STRONG&gt; and not necessarily on how many observations you have.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Oct 2020 16:31:57 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Data-mining-with-Small-samples-Regression-with-Small-Samples/m-p/320391#M57089</guid>
      <dc:creator>martindemel</dc:creator>
      <dc:date>2020-10-12T16:31:57Z</dc:date>
    </item>
  </channel>
</rss>

