AAzad
Level II

How do I interpret Central Limit Theorem?

The central limit theorem says that the sampling distribution of the mean will be approximately normally distributed, as long as the sample size is large enough. Regardless of whether the population has a normal, Poisson, binomial, or any other distribution, the sampling distribution of the mean approaches normality.

For continued process verification (CPV) or other statistical analyses, we assume that n = 30 or more will ensure an approximately normal distribution, and this "n" (sample number) is generally represented by the number of batches. For example, the assay of an API (active pharmaceutical ingredient) in a drug product from 30 different batches is considered normally distributed. According to the definition above, the assay of each batch has to be the calculated mean of an adequate number of samples (sample size) for each and every batch, not a composite sample assay or the assay of one tablet or capsule. Hence the assumption of a normal distribution based on a mean coming from one individual assay is not right. In this regard, dissolution can be considered a better candidate for applying the central limit theorem: for each mean value at a dissolution time point, we use at least 6 units or more (L2, L3). Any comment will be appreciated.
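The effect the question describes can be demonstrated with a short simulation (a hypothetical sketch, not part of the original post): repeatedly taking the mean of n = 30 draws from a strongly right-skewed exponential population produces a collection of means that is approximately normal, centered at the population mean with standard error about sigma/sqrt(n).

```python
# Hypothetical CLT sketch: sample means from a non-normal (exponential) population.
import random
import statistics

random.seed(42)

def sample_means(pop_draw, n, n_means):
    """Return n_means averages, each computed from n draws of pop_draw."""
    return [statistics.mean(pop_draw() for _ in range(n)) for _ in range(n_means)]

def draw():
    # Exponential population: heavily right-skewed, population mean 1.0.
    return random.expovariate(1.0)

means_n30 = sample_means(draw, n=30, n_means=5000)

# The means cluster near the population mean (1.0), with standard error
# roughly sigma/sqrt(n) = 1/sqrt(30), even though the population is skewed.
print(round(statistics.mean(means_n30), 3))
print(round(statistics.stdev(means_n30), 3))
```

Plotting a histogram of `means_n30` (e.g., in JMP or matplotlib) would show the familiar bell shape despite the skewed parent distribution.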

13 Replies
AAzad
Level II

Re: How do I interpret Central Limit Theorem?

Thanks a lot. Your points are valid, but I was looking for a simpler answer. I am an applied user of statistics without a statistical background. This is basically statistical process control (SPC) of pharma products in commercial manufacturing. In the process control charts we get a lot of warnings, specifically for Nelson Rules 1, 2 and 3, which relate to special causes of variation. In the report, we need to explain all these warning signals and decide whether we need to redesign the process to improve on them, or whether they are due to other factors, such as an inadequate sample size or a lack of independent and identically distributed (IID) data, so that we are not overreacting. Various articles in pharma on SPC, and FDA guidance on SPC, point to a sample size of 30 (means from 30 different batches of a drug product) to support the general assumption that the data are normally distributed. My question was: what should be the sample size behind EACH of these, say, 30 means? That is, should the means N1, N2, N3....N30 each be based on a sample size of, say, 5, 10, 20, or 30, or does it not matter as long as you are using N = 30 means (i.e. representing 30 different lots or batches) or more?

statman
Super User

Re: How do I interpret Central Limit Theorem?

Understood.  Here are my further thoughts:

It is possible you are not correctly interpreting the control limits on your control charts.  Let me make sure you understand the charts and their use. I will start with the Shewhart chart (aka X-bar, R charts).  These charts require a rational and logical subgrouping and sampling strategy:

 

“The engineer who is successful in dividing his data initially into rational subgroups based on rational theories is therefore inherently better off in the long run. . .”

                                                                              Shewhart

 

The purposes of the charts are to:

  1. Understand whether the variation is special (assignable) or common (unassignable or random)
  2. Determine which source of variation has greater influence on the metric being charted

The range chart is used to assess consistency/stability and hence predictability.  Points beyond the control limits are evidence of "special cause" variation (Deming's term) or assignable variation (Shewhart's) due to the x's that vary within subgroup.  These suggest the within-subgroup sources may be acting unusually and unpredictably.  It is worthwhile (economically) to study these "events".  It also may not be wise to calculate control limits for the X-bar chart while the range chart is unstable or inconsistent.

The X-bar chart is a comparison chart.  It compares the variation due to the x's changing between subgroups (the plotted averages) to the x's changing within subgroups (the control limits).  This chart answers the question of which component of variation is greater: the within (points inside the control limits) or the between (points outside the control limits).  Points outside the limits here are not evidence of inconsistency or instability as in the range chart (so not special per Deming).
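The within-subgroup basis of the X-bar limits can be sketched numerically. This is a minimal illustration with made-up assay numbers, not data from the thread; A2, D3 and D4 are the standard tabled control-chart constants for subgroups of size 5. Note that the X-bar limits are built entirely from R-bar, i.e., from within-subgroup variation, which is exactly why the chart compares between-subgroup movement against within-subgroup noise.

```python
# Hypothetical X-bar / R limit calculation for subgroups of size 5.
import statistics

A2, D3, D4 = 0.577, 0.0, 2.114  # tabled constants for subgroup size n = 5

def xbar_r_limits(subgroups):
    """subgroups: list of equal-size measurement lists (here, size 5)."""
    xbars = [statistics.mean(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    xbarbar = statistics.mean(xbars)   # grand average (center line)
    rbar = statistics.mean(ranges)     # average range (within-subgroup variation)
    return {
        "xbar_center": xbarbar,
        "xbar_lcl": xbarbar - A2 * rbar,  # limits driven by WITHIN-subgroup variation
        "xbar_ucl": xbarbar + A2 * rbar,
        "r_center": rbar,
        "r_lcl": D3 * rbar,
        "r_ucl": D4 * rbar,
    }

# Three subgroups of five assay results each (made-up numbers, % label claim).
limits = xbar_r_limits([[99.8, 100.1, 100.3, 99.9, 100.0],
                        [100.2, 99.7, 100.0, 100.1, 99.9],
                        [99.9, 100.0, 100.2, 99.8, 100.1]])
print(limits)
```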

 

I strongly suggest you read Shewhart's and Wheeler's books on SPC (see the references below) and some of Wheeler's published articles on Rational Subgrouping and Rational Sampling:

https://www.qualitydigest.com/inside/standards-column/rational-subgrouping-060115.html

https://www.qualitydigest.com/inside/statistics-column/rational-sampling-070115.html

 

Reference:

Shewhart, Walter A. (1931) “Economic Control of Quality of Manufactured Product”, D. Van Nostrand Co., NY

Wheeler, Donald, and Chambers, David (1992) “Understanding Statistical Process Control” SPC Press (ISBN 0-945320-13-2)

Woodall, William H. (2000), "Controversies and Contradictions in Statistical Process Control", Journal of Quality Technology, Vol. 32, No.4 October 2000

The last reference has a discussion associated with it.

"All models are wrong, some are useful" G.E.P. Box
MRB3855
Super User

Re: How do I interpret Central Limit Theorem?

Hi @AAzad : WRT the CLT...I'll restrict my comment to a short answer to your last question.

Q: My question was: what should be the sample size of EACH of these means, say 30 means? Or, means for N1, N2, N3....N30 should be based on a sample size of what, like 5, 10, 20, 30, etc. or does not matter as long as you are using a mean sample size of N=30 (i.e. representing 30 different lots or batches) and above?

A: It doesn't matter what the sample size is of each of the means. I can expand on this if you like.

 

And, as you can see from others' learned comments, there is a lot of nuance around your particular application. What I've answered here is the narrow question about the CLT. 
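The point above can be illustrated with a quick simulation (a hypothetical setup, not from the thread; unit-level results are assumed exponential with mean 1.0). Whether each batch "mean" is a single assay or pools 10 units, the average of 30 batch means comes out approximately normal and centered on the true mean; only its spread changes.

```python
# Sketch: per-batch sample size changes the spread, not the CLT behavior,
# of the average of 30 batch means.
import random
import statistics

random.seed(7)

def grand_means(units_per_batch, batches=30, trials=4000):
    """Average of `batches` batch means, each from `units_per_batch` unit draws."""
    out = []
    for _ in range(trials):
        batch_means = [statistics.mean(random.expovariate(1.0)
                                       for _ in range(units_per_batch))
                       for _ in range(batches)]
        out.append(statistics.mean(batch_means))
    return out

g1 = grand_means(units_per_batch=1)    # each "mean" is a single assay
g10 = grand_means(units_per_batch=10)  # each mean pools 10 units

# Both center on 1.0; the 10-unit version is tighter (smaller standard error).
print(round(statistics.stdev(g1), 3), round(statistics.stdev(g10), 3))
```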

Re: How do I interpret Central Limit Theorem?

You don't mention how the run rules were selected for each chart. There are many rules to choose from. Like outlier tests, each is designed with a particular generating process (for the special cause) in mind. They should not be applied automatically to every chart; doing so will increase the rate of false alarms (i.e., decrease the average run length when the process is in control).
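The false-alarm inflation from stacking run rules can be sketched with a quick simulation (a hypothetical setup, not from the thread): 25 in-control standard-normal points per chart with known limits at ±3 sigma, checking Nelson Rule 1 (a point beyond 3 sigma) alone versus Rules 1–3 combined (Rule 2: nine consecutive points on one side of the center line; Rule 3: six consecutive points steadily increasing or decreasing).

```python
# Sketch: adding run rules raises the per-chart false-alarm probability.
import random

random.seed(123)

def rule1(pts):
    """A point beyond the 3-sigma limits."""
    return any(abs(x) > 3 for x in pts)

def rule2(pts):
    """Nine consecutive points on the same side of the center line."""
    for i in range(len(pts) - 8):
        w = pts[i:i + 9]
        if all(x > 0 for x in w) or all(x < 0 for x in w):
            return True
    return False

def rule3(pts):
    """Six consecutive points steadily increasing or decreasing."""
    for i in range(len(pts) - 5):
        w = pts[i:i + 6]
        if all(w[j] < w[j + 1] for j in range(5)) or all(w[j] > w[j + 1] for j in range(5)):
            return True
    return False

trials = 20000
hits1 = hits123 = 0
for _ in range(trials):
    pts = [random.gauss(0, 1) for _ in range(25)]  # in-control data: no real signal
    r1 = rule1(pts)
    hits1 += r1
    hits123 += r1 or rule2(pts) or rule3(pts)

# Fraction of in-control charts flagged: Rule 1 alone vs Rules 1-3 combined.
print(hits1 / trials, hits123 / trials)
```

Every alarm in this simulation is a false alarm by construction, so the gap between the two rates is pure over-signaling from the extra rules.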