<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Central Limit Theorem - a naive question in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/212110#M42467</link>
    <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/11333"&gt;@34South&lt;/a&gt;: Regarding my earlier post where I mentioned 'mononumerosis', when I taught statistical methods to scientists and engineers in an industrial problem solving or product/process development framework, I surrounded my mention of the disease with something I called 'The Gap'. "The Gap" recognizes that in hypothesis testing there are two types of risk involved in ANY decision making process. I think the ASA's list of '...Not To...' items is strongly aligned with "The Gap". The two types of risk are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Statistical risk: the risk we can quantify and structurally address through techniques such as sample size, population variance assumptions, beta risk, delta to detect, etc. Hypothesis tests culminate in p-values to guide decision making and the statement '...statistical significance.'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. Representation risk: ALL the other cumulative effects of system characteristics that impart 'risk' associated with making a decision. This family of risk, in my experience, often SWAMPS statistical risk...and is often impossible to control or quantify using statistical methods. Representation risk can only be addressed by rational, thoughtful, knowledgeable domain expertise. For example, in my industry days, we often ran experiments on pilot equipment with a goal of determining product design specifications. But there was almost ALWAYS a huge issue...what we learned on pilot equipment was quite often just not scalable to production scale equipment. Hence we had a "Gap" in understanding that was impossible to overcome with methods that ONLY involve statistical risk.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So my point on 'mononumerosis' was always: if all you report is a p-value in isolation, and don't incorporate representation risk in your decision making...well, you've in all likelihood grossly underrepresented the TOTAL risk of making a decision-making error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope this helps?&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 06 Jun 2019 13:47:03 GMT</pubDate>
    <dc:creator>P_Bartell</dc:creator>
    <dc:date>2019-06-06T13:47:03Z</dc:date>
    <item>
      <title>Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211365#M42333</link>
      <description>&lt;P&gt;This question does not relate to JMP itself, but rather represents a basic question of interpretation. I understand that, due to the phenomenon we call the CLT, sampling of a population will always strive towards a normal distribution of the resulting means with the overall mean and SD approaching the population mean and SD, irrespective of how the population data is distributed (skewed or normal), but provided a sufficient number of repeat samples are taken (≥30). Furthermore, as the size of each sample is increased, the precision of the estimated mean and SD increases. I also understand that one does not need to perform such multiple sampling as the CLT is accommodated in parametric testing. What I'm trying to understand is whether there is a cut-off point of sample size, above which normality can categorically be assumed without testing. Secondly, and perhaps unrelated, with the call by The American Statistician journal to drop significance level thresholds (p&amp;lt;0.05) in favour of reporting of actual p-values and to refrain from the use of the term "statistically significant", how does one then determine the outcome in such matters as confirming normal distribution, for example through Shapiro-Wilk testing? Is there a grey line there too?&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2019 13:52:29 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211365#M42333</guid>
      <dc:creator>34South</dc:creator>
      <dc:date>2019-05-31T13:52:29Z</dc:date>
    </item>
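    <!-- The CLT behaviour described in the question can be checked empirically. A minimal sketch (not part of the thread; the exponential population and the sample sizes are illustrative choices): draw many samples from a skewed population and watch the skewness of the sample means shrink as the per-sample size n grows.

    ```python
    # Empirical sketch of the CLT (illustrative, not from the thread):
    # sample means from a skewed (exponential) population become more
    # symmetric as the per-sample size n grows.
    import numpy as np

    rng = np.random.default_rng(0)

    def skewness(x):
        # Standardized third central moment.
        x = np.asarray(x)
        return float(np.mean(((x - x.mean()) / x.std()) ** 3))

    # Exponential(1) is right-skewed (theoretical skewness = 2).
    population_skew = skewness(rng.exponential(1.0, 1_000_000))

    # 20,000 sample means at two per-sample sizes.
    means_n5 = rng.exponential(1.0, (20_000, 5)).mean(axis=1)
    means_n100 = rng.exponential(1.0, (20_000, 100)).mean(axis=1)

    print(f"population skewness ~ {population_skew:.2f}")
    print(f"skew of means, n=5:   {skewness(means_n5):.2f}")
    print(f"skew of means, n=100: {skewness(means_n100):.2f}")
    ```

    No fixed n makes the means exactly normal; the residual skewness just shrinks (for the mean of n exponentials it falls like 2/sqrt(n)), which is why a single hard cut-off such as 30 is a convention rather than a theorem. -->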
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211377#M42336</link>
      <description>Just to clarify, I know one can do sample size analysis based on variance, power, etc. to control Type I and Type II error rates, but is there a sample size where non-parametric testing is avoided by default?</description>
      <pubDate>Fri, 31 May 2019 14:25:53 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211377#M42336</guid>
      <dc:creator>34South</dc:creator>
      <dc:date>2019-05-31T14:25:53Z</dc:date>
    </item>
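    <!-- On the question of a sample-size cut-off, a later reply in this thread points out that normality tests themselves gain power with n, so they reject most readily exactly when non-normality matters least. A hedged sketch of that behaviour (illustrative, not from the thread; the lognormal population and the two sample sizes are made-up choices):

    ```python
    # Illustrative sketch (not from the thread): a normality test's power
    # grows with n, so a mild departure that the CLT has already rendered
    # harmless at large n is exactly what a large-n Shapiro-Wilk test flags.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def mildly_skewed_sample(n):
        # Lognormal with sigma = 0.3: visibly but modestly right-skewed.
        return rng.lognormal(mean=0.0, sigma=0.3, size=n)

    _, p_small = stats.shapiro(mildly_skewed_sample(20))    # small n
    _, p_large = stats.shapiro(mildly_skewed_sample(5000))  # large n

    print(f"Shapiro-Wilk p at n=20:   {p_small:.4f}")
    print(f"Shapiro-Wilk p at n=5000: {p_large:.3e}")
    ```

    The same mild skew that usually escapes detection at n=20 is rejected decisively at n=5000, which is why "test normality, then pick parametric vs non-parametric" does not reduce to a single sample-size threshold. -->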
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211437#M42352</link>
      <description>&lt;P&gt;It may not apply, but make sure you're not suffering from &lt;A href="http://www.jmp.com/about/newsletters/jmpercable/pdf/15_summer_2004.pdf" target="_self"&gt;leptokurtosiphobia&lt;/A&gt;: the irrational fear of non-normality.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Jun 2019 13:35:22 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211437#M42352</guid>
      <dc:creator>Jeff_Perkinson</dc:creator>
      <dc:date>2019-06-01T13:35:22Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211438#M42353</link>
      <description>Well, thanks, I'll build that into my presentation on the subject - it'll be sure to get a laugh!</description>
      <pubDate>Sat, 01 Jun 2019 06:46:25 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211438#M42353</guid>
      <dc:creator>34South</dc:creator>
      <dc:date>2019-06-01T06:46:25Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211486#M42361</link>
      <description>To add a bit to my former colleague &lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/14355"&gt;@Jeff_Perkinson&lt;/a&gt;’s reply, you may want to add some commentary on that dreaded disease ‘mononumerosis’: the affliction data analysts succumb to when they focus on a single statistic (the p-value being a prime example) to make a decision or guide action.</description>
      <pubDate>Sat, 01 Jun 2019 23:16:35 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211486#M42361</guid>
      <dc:creator>P_Bartell</dc:creator>
      <dc:date>2019-06-01T23:16:35Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211525#M42374</link>
      <description>&lt;P&gt;Although I enjoy a bit of wit, I thought it would have been tempered with at least some degree of serious contemplation. The matter of significance thresholds is a serious one on which I believed proponents of JMP/SAS would have provided a modicum of guidance. I would suggest reading the editorial in The American Statistician Vol 73;S1 (2019).&lt;/P&gt;&lt;P&gt;The essence of that article recommends:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Not to&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt; base one's conclusions solely on whether an association or effect was found to be 'statistically significant' (i.e., the p-value passed some arbitrary threshold such as&amp;nbsp;p &amp;lt; 0.05).&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Not to&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt; believe that an association or effect exists just because it was 'statistically significant'.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Not to&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt; believe that an association or effect is absent just because it was not 'statistically significant'.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Not to&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt; believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;Not to conclude anything about scientific or practical importance based on statistical significance (or lack thereof).&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;The journal further confirms that this statement will&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;“be sent to the editor-in-chief of every journal in the natural, behavioral and social sciences for forwarding to their respective editorial boards and stables of manuscript reviewers.”&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not disagreeing with these recommendations but, as I pointed out, this goes beyond one's interpretation of significance levels in final inferential statistical testing to dictating which methods are used to arrive at that answer.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2019 08:35:36 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211525#M42374</guid>
      <dc:creator>34South</dc:creator>
      <dc:date>2019-06-03T08:35:36Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211590#M42385</link>
      <description>&lt;P&gt;Some drama in the JMP Discussion boards, what a nice addition to my work day lol&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2019 14:48:05 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211590#M42385</guid>
      <dc:creator>LimitedInfo</dc:creator>
      <dc:date>2019-06-03T14:48:05Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211619#M42386</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/11333"&gt;@34South&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;I wouldn't say this is a "naive question" whatsoever! It actually gets fairly deep into the theoretical underpinnings and assumptions of our usual parametric testing. Let me peel apart each of your points to make some comments, and then hopefully end with some practical suggestions:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What the central limit theorem asserts: as you stated, the CLT asserts that the sampling distribution of a statistic (let's assume the mean going forward) approaches a normal distribution as the sample size (n) approaches infinity. The CLT does not assert (it doesn't need to assert) the &lt;EM&gt;unbiasedness&lt;/EM&gt; of the mean (that the sampling distribution is centered at the population parameter) or the&amp;nbsp;&lt;EM&gt;asymptotic consistency&lt;/EM&gt; of the mean (as you increase the size of the sample, the variance of the sampling distribution decreases/the expected error between the estimate and the parameter decreases). These are true about the mean as an estimator whether or not the CLT is true; the CLT is about distributional form, and what it asserts is magical enough without adding in unbiasedness and consistency.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Why the CLT matters: In usual parametric hypothesis testing we form a test statistic based on sample data, and then generate a p-value in order to assess how unlikely it is that the test statistic from our sample would have occurred by chance alone. Our method for knowing what would happen by "chance alone" is informed by the CLT because we presume that the sampling distribution of our statistic (the mean in this case) is normally distributed. We know the sampling distribution of the mean will be normally distributed in two cases: if the population is normal (thus a sampling distribution formed from any size n will be normally distributed), or the population is non-normal but our sample size is large enough (what counts as large enough we'll tackle soon). So, because we trust what the CLT asserts, we can, without actually knowing the distributional form of the population, work out the p-values based on the assumption that the sampling distribution is normal. And since we know our statistic is unbiased, with mean 0 under the null hypothesis, and know how to calculate the variance (due in part to the variance sum law), we can locate our sample in the sampling distribution of the statistic assuming the null hypothesis is true, and find the proportion of samples with statistics more extreme than our statistic (the usual p-value). If we didn't have the CLT, or didn't believe in it, we would either have to know ahead of time the distributional form of the population (so we could know the form of the sampling distribution) in order to know the p-value, or would need to use simulation statistics to generate our estimate of the sampling distribution and create an empirical p-value.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;What is a "large enough" sample for the CLT to do its magic: this is a hard question to answer in the abstract since it depends entirely on how non-normal the population is. If there is a minor departure from normality in the population, or a departure but still good symmetry, the CLT draws the sampling distribution toward the normal rather quickly. The consequence of this is that the p-value you generate based on the normal assumption is relatively close to the true p-value you can't know. We care about this because a) we do not want to false alarm more than our stated alpha proportion fo the time, and b) we do not want our power to be less than we assume it is. In some cases, the population is shaped such that our tests become hyper-conservative (obtained p-values &amp;gt; true p-value), or the opposite (obtained p-values &amp;lt; true p-values). Whatever the case, we don't want that. So what is a large enough sample? I can tell you it's not 35 -- there is nothing magical about that number, it was just a convenient cut-off to put in college textbooks. If your population is severely kurtotic (heavy-tailed) or of a class of forms that the CLT does not work on (Cauchy Distribution, you jerk), you are going to need very, very large samples (n&amp;gt; 1000) before your obtained p-values are an acceptable distance from the true p-values.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Hypothesis testing for whether you have evidence the population is non-normal: I think there is a fundamental issue with these tests (not that they're inaccurate, but that their power profile is the reverse of what we need). As your sample size grows, normality of the population matters less and less for your inferences (“matters less” in the sense that your type I and type II error rates are less affected by the population distribution the larger your sample size is, because your obtained p-values more closely approximate the true unknown p-value). Consider the fact that tests for non-normality (Kolmogorov-Smirnov, Shapiro–Wilk, etc.) are also hypothesis tests… and rejecting the null for one is tantamount to detecting a departure from normality in the population. The power of these tests behaves in the same way as all hypothesis tests… larger samples, higher power… which means that with a very large sample (the kind where departures from population normality cause fewer problems) a K-S or S-W test will have very high power to detect even tiny departures from normality in the population. So… in true absurdity, with a very large sample, we can detect the tiniest departure from normality, one that wouldn’t cause a problem even with a small sample, and certainly isn’t causing us any issues with that large of a sample.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Reporting of exact p-values:&amp;nbsp;The American Statistician piece makes excellent points, and reporting of exact p-values has great merits, but I believe it also lulls us into a false sense of security that they really mean what they state. I don't mean in terms of the misconceptions of p-values (of which there are many), but that because they're an "exact" figure, they're treated as known without error, which they are not. They're a sample estimate of our sample's location in a sampling distribution we can't possibly know for sure. Reporting a number with 5 decimal places encourages a belief in their precision that I think is unjustified.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;So what's an analyst to do? Simulation/Permutation tests, for one: with them we don't need to trust our distributional assumptions or the p-values obtained under them. Simulation and permutation tests, where we resample from our data and generate an empirical sampling distribution, allow us to relax certain assumptions like the presumption of a normally distributed sampling distribution. They're not a fix for all problems (we still need to adhere to assumptions of exchangeability/independence/equal variance, etc.), but they're perhaps a step forward.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hope some of this has been helpful! Also, I'm sorry I didn't catch your question sooner, I always love a chance to talk about the Central Limit Theorem, it's truly magical.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/2026"&gt;@jules&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2019 15:54:21 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211619#M42386</guid>
      <dc:creator>jules</dc:creator>
      <dc:date>2019-06-03T15:54:21Z</dc:date>
    </item>
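    <!-- The reply above ends by suggesting simulation/permutation tests as a way to avoid leaning on a normal sampling distribution. A minimal sketch of a two-sample permutation test (illustrative, not from the thread; the two groups and sizes are made-up data):

    ```python
    # Sketch of a two-sample permutation test of the kind suggested above
    # (illustrative; the data are made up). An empirical p-value comes from
    # the permutation distribution of the difference in means, with no
    # normal-sampling-distribution assumption.
    import numpy as np

    rng = np.random.default_rng(42)
    group_a = rng.exponential(scale=1.0, size=30)
    group_b = rng.exponential(scale=1.6, size=30)

    observed = group_b.mean() - group_a.mean()
    pooled = np.concatenate([group_a, group_b])

    n_perm = 10_000
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # relabel groups under the null
        diff = shuffled[30:].mean() - shuffled[:30].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # +1 keeps the empirical p-value strictly positive (standard correction).
    p_empirical = (extreme + 1) / (n_perm + 1)
    print(f"observed diff: {observed:.3f}  empirical p: {p_empirical:.4f}")
    ```

    The exchangeability assumption the reply mentions is what licenses the relabelling step; scipy's `stats.permutation_test` packages the same idea if you'd rather not hand-roll the loop. -->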
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211692#M42396</link>
      <description>Dear Julian. Thank you so much for your very detailed explanation. It provided a far greater insight into the CLT than I had hoped for and is much appreciated. On the other point, it will be interesting to see how far the call for retirement of the p-value threshold will progress. Perhaps the vast majority of scientists will again misunderstand how to apply that concept should it indeed be adopted since, I think for now, the guidelines focus not on what we should do but rather on what we shouldn't. My background is not in statistics - indeed, in my day, it was hardly taught at university and I have largely gained an understanding through my own experience over the years, with JMP being largely instrumental in my having become the go-to guy for any statistical queries in my area of research. Personally, I cannot imagine investigating data with any other software package. I certainly enjoyed the humour of the other posts though and will keep them in my back pocket for cocktail parties ;-&amp;gt;</description>
      <pubDate>Tue, 04 Jun 2019 09:51:10 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211692#M42396</guid>
      <dc:creator>34South</dc:creator>
      <dc:date>2019-06-04T09:51:10Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211725#M42399</link>
      <description>&lt;P&gt;You're very welcome! I'm also curious to see how things will change with regard to p-values and inferential statistics. I'm also eager to see how things change in our application of statistics in general with the continued advancements in computational speed that can make feasible the fitting of even more complicated models to even larger amounts of data. It really is a pretty exciting time to be around, and I suspect we're going to witness some truly exciting advancements in how we can extract meaning from data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree with you 100% about JMP; I can't imagine having to learn from data without it!&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2019 11:51:37 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/211725#M42399</guid>
      <dc:creator>jules</dc:creator>
      <dc:date>2019-06-04T11:51:37Z</dc:date>
    </item>
    <item>
      <title>Re: Central Limit Theorem - a naive question</title>
      <link>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/212110#M42467</link>
      <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/11333"&gt;@34South&lt;/a&gt;: Regarding my earlier post where I mentioned 'mononumerosis', when I taught statistical methods to scientists and engineers in an industrial problem solving or product/process development framework, I surrounded my mention of the disease with something I called 'The Gap'. "The Gap" recognizes that in hypothesis testing there are two types of risk involved in ANY decision making process. I think the ASA's list of '...Not To...' items is strongly aligned with "The Gap". The two types of risk are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Statistical risk: the risk we can quantify and structurally address through techniques such as sample size, population variance assumptions, beta risk, delta to detect, etc. Hypothesis tests culminate in p-values to guide decision making and the statement '...statistical significance.'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2. Representation risk: ALL the other cumulative effects of system characteristics that impart 'risk' associated with making a decision. This family of risk, in my experience, often SWAMPS statistical risk...and is often impossible to control or quantify using statistical methods. Representation risk can only be addressed by rational, thoughtful, knowledgeable domain expertise. For example, in my industry days, we often ran experiments on pilot equipment with a goal of determining product design specifications. But there was almost ALWAYS a huge issue...what we learned on pilot equipment was quite often just not scalable to production scale equipment. Hence we had a "Gap" in understanding that was impossible to overcome with methods that ONLY involve statistical risk.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So my point on 'mononumerosis' was always: if all you report is a p-value in isolation, and don't incorporate representation risk in your decision making...well, you've in all likelihood grossly underrepresented the TOTAL risk of making a decision-making error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope this helps?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2019 13:47:03 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Central-Limit-Theorem-a-naive-question/m-p/212110#M42467</guid>
      <dc:creator>P_Bartell</dc:creator>
      <dc:date>2019-06-06T13:47:03Z</dc:date>
    </item>
  </channel>
</rss>

