Discussions

Noeleen20350465 · Jan 27, 2025 01:04 PM

I'm very new to advanced statistics and JMP so I'm not sure if I'm even picking the right analysis to do, but I'm hoping someone can help direct me.

I have a list of 56 patients who were offered an appointment, and I have the length of time that each of them waited in days. The data has a non-normal distribution with a significant right skew. After fitting distributions in JMP, the best fit it reports is Johnson Sb.

I want to calculate a 95% confidence interval so that I can say something like 'The mean wait time to be offered an appointment is 209 days (95% confidence interval: 135 to 280 days).'

I tried doing a bootstrap but the CI I'm getting just doesn't seem to align with the original data that I have, so I don't know if I'm doing it the right way.

Is it possible, and appropriate, to do a CI on the data that I have? If so, how should I do it?

Thanks in advance...

Dan_Obermiller · Jan 27, 2025 04:17 PM

The beauty here is that you are forming a confidence interval for the mean. The central limit theorem shows that as the sample size gets large, the sampling distribution of the mean approaches a normal distribution, even when the individual observations are not normal. So, unless the skewness is REALLY severe, a sample size of 56 is likely large enough for the central limit theorem to hold. Further, because you are probably estimating the standard deviation from this same data, you will be using a t-value (JMP's default for confidence interval calculations) rather than a z-value. The t-value gives a slightly wider interval, which provides even greater assurance that the confidence interval will be appropriate.
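
One way to check this claim is a small simulation. The sketch below is only an illustration under stated assumptions: it uses synthetic exponential data (right-skewed, not the original wait times), a hypothetical true mean of 200 days, and a hard-coded t critical value of 2.004 for 55 degrees of freedom. It estimates how often a nominal 95% t-interval built from 56 observations actually contains the true mean.

```python
import math
import random
import statistics

# Two-sided 95% t critical value for n - 1 = 55 degrees of freedom (hard-coded assumption).
T_CRIT_DF55 = 2.004

def t_interval(sample, t_crit=T_CRIT_DF55):
    """Return (mean, lower, upper) of a t-based confidence interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

def coverage(true_mean=200.0, n=56, reps=2000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Synthetic right-skewed "wait times"; NOT the data from this thread.
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        _, lo, hi = t_interval(sample)
        hits += lo <= true_mean <= hi
    return hits / reps

cov = coverage()
print(f"Estimated coverage of the nominal 95% interval: {cov:.3f}")
```

If the estimated coverage comes out close to 0.95, the t-interval is behaving roughly as advertised for that degree of skew. Heavier-tailed data than the exponential could do worse, so it is worth rerunning with a distribution that resembles your own.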

Just remember the confidence interval is only for the mean. Many people still like to make inferences about individual observations by using the confidence interval, but it says nothing about how long any one patient will wait.
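
To illustrate the difference, here is a minimal sketch using synthetic right-skewed data (an assumption, not the wait times from this thread) and the same hard-coded t critical value of 2.004 for 55 degrees of freedom. It computes a 95% confidence interval for the mean and then counts how many individual observations fall inside it.

```python
import math
import random
import statistics

rng = random.Random(7)
# Synthetic right-skewed waits in days; NOT the data from this thread.
waits = [rng.expovariate(1 / 200) for _ in range(56)]

mean = statistics.fmean(waits)
# 2.004 is the two-sided 95% t critical value for 55 degrees of freedom.
half_width = 2.004 * statistics.stdev(waits) / math.sqrt(len(waits))
lo, hi = mean - half_width, mean + half_width

# Share of individual patients whose wait lies inside the interval for the mean.
inside = sum(lo <= w <= hi for w in waits) / len(waits)
print(f"95% CI for the mean: ({lo:.0f}, {hi:.0f}) days")
print(f"Share of individual waits inside that interval: {inside:.0%}")
```

The share of individual waits inside the interval should fall well short of 95%, because the interval describes uncertainty about the mean, not the spread of individual patients.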

Dan Obermiller

Noeleen20350465 · Jan 28, 2025 02:20 AM

Hi @Dan_Obermiller,

Thank you very much for your response.

Here is the distribution graph for the raw data. It's Johnson Sb...

Is this too skewed for the central limit theorem to hold?

Here are the summary stats in case that helps.

Does this CI make sense for this data set?

Or do I need to do something differently based on its skewness?

Thanks,

Noeleen

dlehman1 · Jan 28, 2025 07:49 AM

That may be too skewed or the kurtosis may be too high. There are rules of thumb for the required sample sizes related to each of these, but they are hard to find (indeed, I can't seem to find them now). So, I'm not sure this data is sufficient to use those calculated confidence intervals for the mean. However, there is something that I think is of greater concern. You say that it is Johnson Sb that fits the data best - I think that is questionable. First of all, the "best" fit is based on a calculated metric, and the best-scoring distribution is not necessarily a "good" fit. Second, I think a number of distributions might fit this data almost equally well. For example, an exponential distribution probably would work as well. I do think a bootstrap approach would make sense here - and you might want to use the fractional weights bootstrap given that your dataset is small and the number of outliers is even smaller.
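
For readers outside JMP, the general idea of a fractional weights (Bayesian) bootstrap can be sketched as follows. This is an illustration of the technique, not JMP's implementation. The function name, the percentile-style interval, and the synthetic data are all assumptions. Instead of resampling rows with replacement, each replicate gives every observation a positive random weight, so a rare large value is never dropped entirely from a replicate.

```python
import random
import statistics

def fractional_weight_bootstrap_ci(data, n_boot=5000, level=0.95, seed=0):
    """Percentile interval for the mean using continuous random weights."""
    rng = random.Random(seed)
    n = len(data)
    boot_means = []
    for _ in range(n_boot):
        # Independent Exponential(1) draws, once normalised, follow a flat
        # Dirichlet distribution, so every observation keeps some weight.
        raw = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(raw)
        boot_means.append(sum(w * x for w, x in zip(raw, data)) / total)
    boot_means.sort()
    alpha = 1.0 - level
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic right-skewed stand-in for 56 wait times; NOT the data from this thread.
rng = random.Random(42)
waits = [rng.expovariate(1 / 200) for _ in range(56)]
low, high = fractional_weight_bootstrap_ci(waits)
print(f"mean = {statistics.fmean(waits):.1f}, 95% interval = ({low:.1f}, {high:.1f})")
```

With skewed data the resulting interval is usually not symmetric around the sample mean, which is one reason it can look different from the t-based interval.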

Noeleen20350465 · Jan 28, 2025 02:04 PM

Thank you,

I tried the fractional weights bootstrap and the results are in line with what I anticipated.

I'll look into the 'best' vs 'good' fit...thanks for the heads up on this.

statman · Jan 28, 2025 05:09 PM

I apologize for my response, but if I were you, I would be interested in understanding why there is so much variation in wait times for an appointment. I don't know what they are waiting for, but the waits appear really long! Admittedly I'm deterministic and don't think confidence intervals are useful for your situation. The variation is likely not predictable, because the factors that influence how long each patient waited come from different causal relationships. I would seek to understand this (an analytical study) rather than develop a prediction based on confidence intervals (an enumerative study).

"All models are wrong, some are useful" G.E.P. Box

Confidence Intervals for data with non normal distribution