
Confidence Intervals for data with non normal distribution

Noeleen20350465
Level I

I'm very new to advanced statistics and JMP so I'm not sure if I'm even picking the right analysis to do, but I'm hoping someone can help direct me.

I have a list of 56 patients who were offered an appointment, and I have the length of time that each of them waited in days.  The data has a non-normal distribution with a significant right skew.  After running a distribution fit, JMP reports Johnson Sb as the best-fitting distribution.

I want to calculate the 95% confidence interval so that I can say something like 'The mean wait time to be offered an appointment is 209 days, with a 95% confidence interval of 135 to 280 days.'

I tried doing a bootstrap but the CI I'm getting just doesn't seem to align with the original data that I have, so I don't know if I'm doing it the right way.

Is it possible, and appropriate, to do a CI on the data that I have?  If so, how should I do it? 

Thanks in advance...

1 ACCEPTED SOLUTION

dlehman1
Level V


Re: Confidence Intervals for data with non normal distribution

That may be too skewed or the kurtosis may be too high.  There are rules of thumb for the required sample sizes related to each of these, but they are hard to find (indeed, I can't seem to find them now).  So, I'm not sure this data is sufficient to use those calculated confidence intervals for the mean.  However, there is something that I think is of greater concern.  You say that it is Johnson Sb that fits the data best - I think that is questionable.  First of all, the "best" fit is based on a calculated metric which doesn't always reflect a "good" fit.  Second, I think a number of distributions might fit this data equally well (almost).  For example, an exponential distribution probably would work as well.  I do think a bootstrap approach would make sense here - and you might want to use the fractional weights bootstrap given that your dataset is small and the number of outliers is even smaller.
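
For anyone who wants to try the same idea outside JMP, here is a minimal Python sketch of a percentile bootstrap and a fractional-weight (Bayesian) bootstrap for the mean. The function names and the simulated `waits` values are assumptions made up for illustration; they are not the original 56 wait times. The Dirichlet weighting is one common formulation of fractional weights and is not necessarily exactly what JMP computes.

```python
import numpy as np


def percentile_bootstrap_ci(values, n_boot=5000, confidence=0.95, seed=0):
    """Percentile CI for the mean from ordinary resampling with replacement."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample of n indices drawn with replacement.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    means = x[idx].mean(axis=1)
    alpha = 1.0 - confidence
    lower, upper = np.quantile(means, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)


def fractional_weight_bootstrap_ci(values, n_boot=5000, confidence=0.95, seed=0):
    """Percentile CI for the mean using Dirichlet(1, ..., 1) fractional weights."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of weights is positive and sums to 1, so every observation
    # (including the few extreme ones) contributes to every replicate.
    weights = rng.dirichlet(np.ones(x.size), size=n_boot)
    means = weights @ x
    alpha = 1.0 - confidence
    lower, upper = np.quantile(means, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)


# Simulated right-skewed stand-in for the wait times (NOT the real data).
rng = np.random.default_rng(42)
waits = rng.gamma(shape=1.0, scale=200.0, size=56)

print("sample mean:", waits.mean())
print("ordinary bootstrap CI:", percentile_bootstrap_ci(waits))
print("fractional-weight bootstrap CI:", fractional_weight_bootstrap_ci(waits))
```

With ordinary resampling, a resample can by chance leave out all of the longest waits. Fractional weights avoid that, which is the motivation for preferring them when the dataset is small and the outliers are few.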


5 REPLIES


Re: Confidence Intervals for data with non normal distribution

The beauty here is that you are forming a confidence interval for the mean. The central limit theorem shows that as the sample size gets large enough, the sampling distribution of the mean approaches a normal distribution, even when the individual observations are not normal. So, unless the skewness is REALLY severe, a sample size of 56 is likely large enough for the central limit theorem to hold. Further, because you are probably estimating the standard deviation from this same data, you will be using a t-value (JMP's default for confidence interval calculations) rather than a z-value, which gives a slightly wider interval and so provides even greater assurance that the confidence interval will be appropriate.
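
As a rough illustration of the t-based interval described above, here is a minimal Python sketch. The helper name `t_confidence_interval` and the simulated `waits` sample are assumptions for illustration only; the real wait times are not available here, so the printed numbers will not match the JMP output.

```python
import numpy as np
from scipy import stats


def t_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based CI of the mean."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()
    # Standard error of the mean, using the sample standard deviation.
    sem = x.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value from the t distribution with n - 1 df.
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    return float(mean), float(mean - t_crit * sem), float(mean + t_crit * sem)


# Simulated right-skewed stand-in for the 56 wait times (NOT the real data).
rng = np.random.default_rng(0)
waits = rng.gamma(shape=1.0, scale=200.0, size=56)

mean, lower, upper = t_confidence_interval(waits)
print(f"mean = {mean:.1f} days, 95% CI = ({lower:.1f}, {upper:.1f})")
```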

 

Just remember the confidence interval is only for the mean. Many people still make the mistake of using the confidence interval to make inferences about individual observations.
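
To make that distinction concrete, the sketch below compares how many individual values fall inside a 95% confidence interval for the mean versus inside the sample's own 2.5th to 97.5th percentiles. The data are simulated and `fraction_inside` is a hypothetical helper, so treat this as an illustration of the point and not as an analysis of the actual wait times.

```python
import numpy as np
from scipy import stats


def fraction_inside(values, lower, upper):
    """Fraction of individual values that fall within [lower, upper]."""
    x = np.asarray(values, dtype=float)
    return float(np.mean((x >= lower) & (x <= upper)))


# Simulated right-skewed stand-in for the wait times (NOT the real data).
rng = np.random.default_rng(1)
waits = rng.gamma(shape=1.0, scale=200.0, size=56)

# t-based 95% confidence interval for the mean.
n = waits.size
mean = waits.mean()
sem = waits.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_lower, ci_upper = mean - t_crit * sem, mean + t_crit * sem

# Range that describes where individual observations fall.
p_lower, p_upper = np.percentile(waits, [2.5, 97.5])

coverage_of_individuals = fraction_inside(waits, ci_lower, ci_upper)
print(f"CI for the mean: ({ci_lower:.1f}, {ci_upper:.1f})")
print(f"share of individual waits inside that CI: {coverage_of_individuals:.2f}")
print(f"2.5th to 97.5th percentile of waits: ({p_lower:.1f}, {p_upper:.1f})")
```

The interval for the mean is much narrower than the spread of individual waits, so it should not be quoted as the range a single patient can expect.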

Dan Obermiller


Re: Confidence Intervals for data with non normal distribution

Hi @Dan_Obermiller,

Thank you very much for your response.

Here is the distribution graph for the raw data.  It's Johnson Sb...

[Image: distribution graph for the raw data]

Is this too skewed for the central limit theorem to hold?

 

Here are the summary stats in case that helps.

 
 

[Image: summary statistics]

Does this CI make sense for this data set?

[Image: confidence interval output]

 

Or do I need to do something differently based on its skewness?

Thanks,

Noeleen



Re: Confidence Intervals for data with non normal distribution

Thank you,

I tried the fractional weights bootstrap and the results are in line with what I anticipated.

I'll look into the 'best' vs 'good' fit...thanks for the heads up on this.

 

statman
Super User


Re: Confidence Intervals for data with non normal distribution

I apologize for my response, but if I were you, I would be interested in understanding why there is so much variation in wait times for an appointment.  I don't know what they are waiting for, but the waits appear really long!  Admittedly I'm deterministic and don't think confidence intervals are useful for your situation.  The variation is likely not predictable, because the factors that influence how long each patient waited come from different causal relationships.  I would seek to understand this (analytical) rather than develop a prediction based on confidence intervals (enumerative).

"All models are wrong, some are useful" G.E.P. Box