I’m sure most of you reading this have encountered the birthday probability problem:
Given a group of n people, what is the probability that at least two people have the same birthday? The answer is given by the formula
Prob = 1 - 365!/(365^n * (365 - n)!)
This formula is simplistic because it assumes birthdays are spread evenly across all 365 days of the year, which according to some studies is not true. It also ignores leap years. But for our purposes, the formula works fine. Let’s have some fun with it!
Surprisingly, with only 23 people, the probability of at least one match is about 0.5073. With 40 people, the probability is about 0.8912, and with 60 people, the probability is about 0.9941. A plot of the formula is shown here:
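The probabilities quoted above can be checked with a few lines of Python. This is only a minimal sketch under the same assumptions as the formula (365 equally likely birthdays, no leap years); the function name `birthday_match_probability` is made up for illustration and is not part of the JMP file.

```python
def birthday_match_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    if n > days:
        return 1.0  # more people than days: a match is guaranteed
    p_no_match = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


for n in (23, 40, 60):
    print(n, round(birthday_match_probability(n), 4))
```

Multiplying the factors one at a time avoids the huge factorials in the closed-form expression while giving the same result.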
For our little example, pretend I’m a teacher who teaches short courses. On the first day of class, I ask the students if they are willing to go along with a little wager. If there is at least one birthday match, everyone in the class has to give me $1. If not, I give everyone $1. What is the average amount of money I will have at the end of a year? The answer is easy to work out, and depends on the number of students in a class and how many classes I teach during the year. The average amount at the end of the year is
Mean = Classes*n*(2*Prob - 1)
where n is the number of people in the class, Prob is the probability from the formula above, and Classes is how many classes I teach during the year. I’m also interested in the standard deviation of the annual amount, given by the following:
StdDev = sqrt(Classes*n^2 - Mean^2/Classes)
where Mean is the average given by the first formula. Hint: Variance(x) = E(x^2)-E(x)^2. Note that these two formulas hold only if the number of classes taught each year is the same and each class has the same number of students. If Classes is a random variable (that is, it can change from year to year), then the formulas are different, particularly the one for the standard deviation. So, for simplicity, we’ll assume n is the same for each class and the number of Classes is the same each year.
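To make the mean and standard deviation concrete, here is a small Python sketch. It assumes the wager exactly as described: each class pays me +n dollars with probability Prob and -n dollars otherwise, and classes are independent. The function names are hypothetical, and the formulas are derived from that description rather than copied from BirthdayModel.jmp.

```python
import math


def birthday_match_probability(n, days=365):
    """P(at least one shared birthday among n people), equal-likelihood assumption."""
    if n > days:
        return 1.0
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


def annual_mean(n, classes):
    """Expected annual winnings: each class is worth +n or -n dollars."""
    prob = birthday_match_probability(n)
    return classes * n * (2 * prob - 1)


def annual_std_dev(n, classes):
    """Standard deviation of annual winnings for a fixed n and fixed Classes."""
    mean = annual_mean(n, classes)
    # Per class: E(x^2) = n^2 and E(x) = mean/classes, so
    # Var(year) = classes * (n^2 - (mean/classes)^2).
    variance = classes * n**2 - mean**2 / classes
    return math.sqrt(max(0.0, variance))  # guard against tiny negative round-off


print(round(annual_mean(35, 18), 1), round(annual_std_dev(35, 18), 1))
```

The variance can also be written as 4*Classes*n^2*Prob*(1 - Prob), which is the same quantity after substituting Mean.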
I want to estimate the probability that the annual amount will be $0 or less. I can use JMP’s Profiler and Simulator to do this. Furthermore, I can use the Simulator to study how the probability is affected by different combinations of n and Classes. I created a JMP file with all these formulas included, ready to Profile. It’s called BirthdayModel.jmp and can be downloaded from JMP’s File Exchange.
The column we want to simulate is called Total. The Simulator not only allows you to place distributions on the model inputs, but it also allows you to add random noise to the simulated response value. I can feed the standard deviation of the annual amount into the Simulator. Simply specify the amount as shown below:
But, there’s a problem. This feature only allows you to enter a single value for the variability in the response. In our example, the standard deviation of the annual amount changes as a function of n and Classes. So, what can I do? Look at the formula for Total:
Mean + Normal*StdDev
where Mean is the formula for average annual amount, and StdDev is the formula for standard deviation of the annual amount. It includes a factor called Normal. When I use the Simulator, I’ll make that factor have a Normal (0,1) distribution. Therefore, the resulting simulated values for Total will have the right standard deviation, and that standard deviation will change as n or Classes changes. Just what I want! Also, this forces the sampling distribution of Total (for given n and Classes) to be Normal. According to other simulations I’ve done, this is a reasonable approximation, as long as n is between 10 and 40.
Run the “Profiler” script attached to the data table. This launches the Profiler and Simulator. Economic conditions are uncertain, so I can’t predict n and Classes precisely. Instead, I can simulate a variety of combinations of n and Classes. To do that, I make n a Poisson variate with mean 35. And as we discussed before, Normal has a Normal (0,1) distribution. But what about Classes? I think I’ll be able to teach 1 or 2 classes per month, which equates to a minimum of 12 and a maximum of 24 per year. But how do I simulate this? I use the Expression feature, which is essentially JSL code. The expression I use creates 12 random variables (each a random integer, either 1 or 2), and then sums the 12 variables to get the simulated number of classes in the year.
Add a lower Spec Limit of 0, and run the Simulator to get results similar to the following:
Incorporating the uncertainty in n and Classes, at the end of a year I will have on average $393, with a 3.7% chance of being less than $0. Should I play that game?
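For readers without JMP, the same experiment can be approximated in plain Python. This is a rough sketch of the setup described above, not the Simulator itself: n is Poisson with mean 35, Classes is the sum of 12 random integers that are each 1 or 2, and Total is Mean + Normal*StdDev. The helper `poisson` (Knuth's method) and all other names are my own illustrative choices, and I am assuming each 1-or-2 draw is equally likely.

```python
import math
import random


def birthday_match_probability(n, days=365):
    if n > days:
        return 1.0
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


def poisson(rng, lam):
    """One Poisson draw by Knuth's method; adequate for modest means like 35."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_total(rng):
    n = poisson(rng, 35)                                  # one class size for the year
    classes = sum(rng.randint(1, 2) for _ in range(12))   # 1 or 2 classes per month
    prob = birthday_match_probability(n)
    mean = classes * n * (2 * prob - 1)
    # Same standard deviation as in the text, written in product form.
    std_dev = 2 * n * math.sqrt(classes * prob * (1 - prob))
    return mean + rng.gauss(0, 1) * std_dev               # Mean + Normal*StdDev


def run(trials=20000, seed=1):
    rng = random.Random(seed)
    totals = [simulate_total(rng) for _ in range(trials)]
    average = sum(totals) / trials
    p_loss = sum(t < 0 for t in totals) / trials
    return average, p_loss


average, p_loss = run()
print(f"average total: {average:.0f}, fraction below $0: {p_loss:.3f}")
```

The results should land in the neighborhood of the Simulator figures quoted above, though the exact values depend on the seed and the number of trials.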
What happens if I ignore the uncertainty in n and Classes? Change the variables to Fixed and set n=35 and Classes=18.
The chance of being less than $0 drops to 0.03%. But ignoring uncertainty is not good risk management policy, so I’ll stick with the first result.
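With n and Classes fixed, no simulation is needed: under the Normal approximation the chance of a loss is just a Normal tail probability. The sketch below assumes the mean and standard deviation that follow from the wager as described (each class worth +n or -n dollars); the variable names are illustrative.

```python
import math
from statistics import NormalDist


def birthday_match_probability(n, days=365):
    if n > days:
        return 1.0
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


n, classes = 35, 18
prob = birthday_match_probability(n)
mean = classes * n * (2 * prob - 1)
std_dev = 2 * n * math.sqrt(classes * prob * (1 - prob))

# P(Total < 0) when Total ~ Normal(mean, std_dev)
p_loss = NormalDist(mean, std_dev).cdf(0)
print(f"P(Total < 0) = {p_loss:.4%}")
```

The mean sits about 3.4 standard deviations above zero, which is why the loss probability is so small when the uncertainty in n and Classes is ignored.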
This example is not as complete as I would like. As I said before, this method doesn’t choose a new class size for each class. It assigns the same class size for each class in a year. To get closer to reality, I needed to use JSL. The data table contains an attached script called “Birthday Script.” Copy the script contents to a script window and close the data table before running the script. The script chooses a random class size for each class and allows for the number of classes to change each year. All you do is provide the averages you want for n and Classes. The result is a histogram of simulated annual amount.
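The idea behind that script can be sketched in Python as well. To be clear, this is not a translation of “Birthday Script”: I am assuming, purely for illustration, that both the class size and the number of classes per year are Poisson with the averages you supply, and the sketch decides each wager by drawing actual birthdays instead of using the formula. All names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """One Poisson draw by Knuth's method (fine for modest means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def class_has_match(rng, n, days=365):
    """Draw n birthdays uniformly and report whether any two coincide."""
    birthdays = [rng.randrange(days) for _ in range(n)]
    return len(set(birthdays)) < n


def simulate_year(rng, mean_n, mean_classes):
    total = 0
    for _ in range(poisson(rng, mean_classes)):   # classes taught this year
        n = poisson(rng, mean_n)                  # a fresh size for every class
        total += n if class_has_match(rng, n) else -n
    return total


def simulate_years(years=2000, mean_n=35, mean_classes=18, seed=2):
    rng = random.Random(seed)
    return [simulate_year(rng, mean_n, mean_classes) for _ in range(years)]


totals = simulate_years()
print("average annual amount:", sum(totals) / len(totals))
print("fraction of years at $0 or less:", sum(t <= 0 for t in totals) / len(totals))
```

Because every class gets its own size, a year with one unusually small class no longer drags the whole year down, which is the main difference from the Profiler approach. The list `totals` can be fed to any plotting tool to get a histogram.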
On a different but similar note, I recently learned that two other people on my floor here at work have the same birthday as me. My floor has 37 people. You might think, "Wow, that’s surprising! That doesn’t happen too often." You might ask, "What is the probability of at least three people out of 37 having the same birthday?" The formula I gave before is for at least 2 people having the same birthday, so it doesn’t work. I searched the Internet and couldn’t find any formula for at least 3 people. But, no worries, I simulated it with JSL. For a group of 37 people, the probability that at least three have the same birthday is about 0.053. My company has about 11,200 employees, which is about 302 groups of 37. Therefore, using the Binomial distribution with p=0.053 and n=302, I predict that 9-23 of the 302 groups have at least three people with the same birthday. The JSL code I used to estimate the probability is attached to the data table and is called “At Least 3”.
So maybe a three-way match on my floor is not so surprising after all.
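The three-way match estimate is easy to reproduce outside JMP. The following Python sketch is not the “At Least 3” script; it is an independent Monte Carlo estimate under the usual assumption of 365 equally likely birthdays, plus a direct Binomial calculation for the 9-23 range. Function names and the number of trials are my own choices.

```python
import math
import random
from collections import Counter


def has_triple(rng, group_size=37, days=365):
    """True if at least three people in the group share one birthday."""
    counts = Counter(rng.randrange(days) for _ in range(group_size))
    return max(counts.values()) >= 3


def estimate_triple_probability(trials=20000, seed=3):
    rng = random.Random(seed)
    return sum(has_triple(rng) for _ in range(trials)) / trials


def binomial_range_probability(n, p, low, high):
    """P(low <= X <= high) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(low, high + 1)
    )


p3 = estimate_triple_probability()
print("estimated P(at least 3 share a birthday):", p3)
print("expected groups out of 302:", 302 * 0.053)
print("P(9 <= groups <= 23):", binomial_range_probability(302, 0.053, 9, 23))
```

The simulated probability will wobble a little from run to run, so treat the third decimal place with caution unless you raise the number of trials.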