Roly,

Always happy to help folks "broaden their horizons"! Perhaps a bit more broadening...

Your statement "now understand that the distribution does not have to be normal, but some adjustment is needed if it is non-normal." is incorrect and highlights a major misconception about control chart methodology (if you disagree, perhaps you can cite the section of the literature that supports your argument). If the problem were enumerative, I would be interested in the distribution (and the probabilities associated with it), but since the problem is analytical, the distribution is of little consequence. I suggest you read Deming, "On Probability As a Basis For Action", The American Statistician, November 1975, Vol. 29, No. 4.

Admittedly, it would appear more rational if the lower control limit were truncated to 0, but most if not all users recognize this in their assessment of the charts. The software is simply doing the math without regard to interpretation. I would say most if not all statistical outputs need to be interpreted IN CONTEXT!

Regarding your comment: "Sorry for the confusion about consistent. I am not quite sure what you mean by "The # of tweets is inconsistent". If it is that the points are not distributed normally, I agree, they appear to be distributed with a gamma distribution."

I realize this concept is difficult to grasp. Perhaps this is the reason so many do not know how to use control charts appropriately. I'll say this again: inconsistency has nothing to do with a distribution (Deming, "Out of the Crisis", p. 359). The beauty of the Shewhart control charts is that they assume no distribution. They provide a graphical look at the data as a time series (this is lost in any distributional analysis!). "A distribution only presents accumulated history of performance of a process, nothing about its capability...The capability of a process can be achieved and confirmed by use of a control chart, not by a distribution" (Deming, "Out of the Crisis", p. 314).

The range chart answers the question: Is the variation within subgroup (due to the variables changing within subgroup) consistent, stable and therefore predictable? Consistent in this context means within an expected, predictable amount of variation. For the MR chart, the question being answered is similar: Is the variation between consecutive data points (due to the variables changing between consecutive data points) consistent, stable and therefore predictable? So when I say the number of tweets is inconsistent, I mean the process of posting tweets is not predictable, not random and unstable. It would be difficult to summarize this process, with any confidence, using enumerative statistics, or to make any probability statements about it, because the statistics and probabilities would depend on when you looked at the data.

Regarding the data source...You can't possibly answer who sent the tweets, only that they came from some account registered to "Donald". I do not have direct knowledge of the actual number of tweets, nor of who sent them. Is there bias in the measurement process? Are the numbers accurate? I can't answer that. But I could suggest a sampling plan to provide insight.

A distributional analysis with associated probabilities is inappropriate for a process that is inconsistent, non-random and unpredictable. I leave you with this quote, again from Deming (paper referenced above): “Analysis of variance, t-test, confidence intervals, and other statistical techniques taught in the books, however interesting, are inappropriate because they provide no basis for prediction and because they bury the information contained in the order of production. Most if not all computer packages for analysis of data, as they are called, provide flagrant examples of inefficiency.”
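For anyone who wants to see the arithmetic behind the MR chart question above, here is a minimal sketch in Python of the individuals and moving range (XmR) calculation. It is only an illustration, not the output of any particular software: the function names and the `floor` option are hypothetical, the daily counts are made up, and the scaling constants 2.66 and 3.268 are the usual ones for moving ranges of two consecutive points. Note that nothing in it fits or assumes a distribution, and that the points stay in time order.

```python
from statistics import mean

def xmr_limits(values, floor=None):
    """Compute center lines and limits for an individuals (X) and moving range (MR) chart.

    `floor` is a hypothetical option: when given, the lower limit on the X chart
    is truncated at that value (for example 0 for counts).
    """
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    # Moving range: absolute difference between consecutive points, in time order.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    x_lcl = x_bar - 2.66 * mr_bar
    if floor is not None:
        x_lcl = max(x_lcl, floor)
    return {
        "x_center": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,
        "x_lcl": x_lcl,
        "mr_center": mr_bar,
        "mr_ucl": 3.268 * mr_bar,
        "moving_ranges": moving_ranges,
    }

def out_of_limits(values, limits):
    """Return indices of points outside the X limits and of moving ranges above the MR limit."""
    x_flags = [i for i, v in enumerate(values)
               if v > limits["x_ucl"] or v < limits["x_lcl"]]
    # Moving range k compares points k and k+1, so report the later point's index.
    mr_flags = [i + 1 for i, r in enumerate(limits["moving_ranges"])
                if r > limits["mr_ucl"]]
    return x_flags, mr_flags

# Made-up daily counts, purely for illustration (not the actual tweet data).
daily_counts = [5, 7, 6, 8, 5, 6, 40, 7, 6, 5]
limits = xmr_limits(daily_counts, floor=0)
x_flags, mr_flags = out_of_limits(daily_counts, limits)
print(x_flags, mr_flags)  # [6] [6, 7]
```

With these made-up numbers the day with 40 falls above the upper limit on the individuals chart, and the two moving ranges around it exceed the MR limit. Those signals are the cue to go and ask what changed in the process at that time, which is a question of context rather than of probability.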