Among the new features in JMP 8 is the Beta-Binomial distribution. This distribution arises from assuming *x*|*p* ~ Binomial(*n*, *p*) and *p* ~ Beta(*a*, *b*). In other words, if you want to simulate a Beta-Binomial variate, first randomly draw a value of *p* from the Beta(*a*, *b*) distribution, and then use that *p* to generate a Binomial *x* of size *n*.
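The two-stage recipe can be sketched outside JMP as well. The snippet below is a minimal Python illustration using only the standard library, not JMP's own implementation; the function name `beta_binomial_variate` and the values n = 20, a = 2, b = 3 are made up for the example.

```python
import random

def beta_binomial_variate(n, a, b, rng):
    """Draw one Beta-Binomial(n, a, b) variate by two-stage sampling."""
    p = rng.betavariate(a, b)  # stage 1: p ~ Beta(a, b)
    # stage 2: x | p ~ Binomial(n, p), built as the sum of n Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(n))

# Illustrative values only; they are not taken from the post.
rng = random.Random(8)  # fixed seed so the run is repeatable
draws = [beta_binomial_variate(20, 2.0, 3.0, rng) for _ in range(1000)]
mean_draw = sum(draws) / len(draws)  # should land near n*a/(a+b) = 8
print(min(draws), max(draws), round(mean_draw, 2))
```

Each draw uses a fresh *p*, which is what makes the counts more spread out than a plain Binomial with a single fixed *p*.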

Using JMP 8, you can calculate Beta-Binomial probabilities, generate random data, and fit the distribution to data. As an example of fitting the Beta-Binomial distribution to data, consider the following.

This is election season in the US, and many eagerly await Nov. 4 so they can vote. But I’m impatient: I want to know now who will be elected president. So, suppose I do a little survey sampling of my own.

I randomly select 100 cities (among cities that have at least 100,000 people) and survey 1,000 people in each. For each city, I record the number of people who say they will vote for candidate A. (I didn’t really do it. This is a fictitious example, with fictitious data.) You can download the data, called Voting.jmp, from JMP’s file exchange. The file and a script I use in my next blog post are both part of a zip file called Voting.zip.

The histogram below (created with JMP’s Distribution platform) shows the distribution of the results. The average number of people (out of 1,000) who say they will vote for candidate A is 491, about 49%.

If the support rate for candidate A is the same in every city, then the Binomial distribution should account for the observed variation. Fitting a Binomial distribution results in the following:

Notice that JMP overlays the fitted distribution density curve. In this case, the range in the data extends beyond the fitted curve, so there is more variation than is explained by the Binomial distribution. Perhaps the support rate, *p*, is different from city to city. If true, the Beta-Binomial distribution is more appropriate. Fitting a Beta-Binomial results in the following:

Notice the Beta-Binomial fit (green curve) encompasses all the variation in the data, and therefore it is a better fit than the Binomial.

The estimated parameters need some explaining. Remember that the Beta-Binomial arises from assuming *x*|*p* ~ Binomial(*n*, *p*) and *p* ~ Beta(*a*, *b*). JMP estimates the following parameters:

p = *a*/(*a*+*b*), the mean of the Beta distribution.

and

delta = 1/(*a*+*b*+1), an overdispersion parameter.

If the overdispersion parameter is 0 (the limit as *a*+*b* grows without bound), the Beta-Binomial reduces to the Binomial(*n*, p) distribution. This means p is the same constant for all cities.
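To see why delta earns the name "overdispersion," it helps to look at the variance. A standard result (not stated in the post) is that a Beta-Binomial count has variance n·p·(1 − p)·[1 + (n − 1)·delta], that is, the Binomial variance inflated by a factor that depends on delta. The sketch below is plain Python, with hypothetical function names and an illustrative delta of 0.02 that is not a fitted value.

```python
import math

def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1.0 - p)

def beta_binomial_variance(n, p, delta):
    """Binomial variance inflated by the factor 1 + (n - 1) * delta."""
    return binomial_variance(n, p) * (1.0 + (n - 1) * delta)

n, p = 1000, 0.491   # survey size per city and average support rate from the post
delta = 0.02         # illustrative overdispersion value, NOT the fitted estimate

sd_binomial = math.sqrt(binomial_variance(n, p))
sd_beta_binomial = math.sqrt(beta_binomial_variance(n, p, delta))
print(round(sd_binomial, 1), round(sd_beta_binomial, 1))
```

With delta = 0 the inflation factor is 1 and the two variances coincide. With n = 1,000, even a small delta widens the spread considerably.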

Of most interest to me is the distribution of the rate. I want estimates of the Beta parameters, *a* and *b*, so I can quantify and visualize the distribution of *p*. To obtain estimates of *a* and *b*, solve the two definitions above for *a* and *b*:

*a* = p(1 - delta)/delta

and

*b* = (1 - p)(1 - delta)/delta
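Solving p = a/(a+b) and delta = 1/(a+b+1) for the Beta parameters gives a + b = (1 − delta)/delta, so a = p·(1 − delta)/delta and b = (1 − p)·(1 − delta)/delta. A minimal Python sketch of that conversion follows; the function names are hypothetical, and the values p = 0.4 and delta = 0.1 are illustrative rather than the fitted estimates.

```python
def beta_params_from_p_delta(p, delta):
    """Convert (p, delta) to the Beta parameters (a, b)."""
    if not (0.0 < p < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("p and delta must both lie strictly between 0 and 1")
    total = (1.0 - delta) / delta  # equals a + b
    return p * total, (1.0 - p) * total

def p_delta_from_beta_params(a, b):
    """Inverse conversion, straight from the definitions of p and delta."""
    return a / (a + b), 1.0 / (a + b + 1.0)

# Illustrative round trip (values are not from the JMP report).
a, b = beta_params_from_p_delta(0.4, 0.1)   # a + b = 9, so a is about 3.6, b about 5.4
p_back, delta_back = p_delta_from_beta_params(a, b)
print(round(a, 3), round(b, 3), round(p_back, 3), round(delta_back, 3))
```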

Doing this for our data, you get *a* = 27.64 and *b* = 28.63. The Beta(27.64, 28.63) distribution looks like the following:

The bulk of the distribution is between 0.30 and 0.70. The average of this distribution is 0.491. Does this mean candidate A will get 49.1% of the popular vote when the cities I sampled are combined? Not necessarily. You need to account for the population of each city: cities with larger populations carry more weight when the popular vote is totaled. But how different would the results be?
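As a quick sanity check on those summary numbers, here is a small Python sketch (standard library only, not part of the JMP analysis). It computes the mean and standard deviation of Beta(27.64, 28.63) from the usual closed-form expressions, then uses simulated draws to estimate how much of the distribution falls between 0.30 and 0.70. The seed and the number of draws are arbitrary choices.

```python
import math
import random

a, b = 27.64, 28.63  # Beta parameter estimates quoted in the post

# Closed-form mean and standard deviation of a Beta(a, b) distribution.
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Monte Carlo estimate of the probability mass between 0.30 and 0.70.
rng = random.Random(2008)  # arbitrary fixed seed for repeatability
draws = [rng.betavariate(a, b) for _ in range(20000)]
share_inside = sum(0.30 <= d <= 0.70 for d in draws) / len(draws)

print(round(mean, 3), round(sd, 3), round(share_inside, 3))
```

The mean matches the 0.491 quoted above, and nearly all simulated rates fall inside the 0.30 to 0.70 range.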

Stay tuned. In my next blog post, I will continue the analysis and address the population question!

By the way, JMP 8 also introduces the Gamma-Poisson distribution, which results from assuming that *x*|*lambda* ~ Poisson(*lambda*) and *lambda* ~ Gamma(*alpha*, *beta*).
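The same two-stage idea applies here. The Python sketch below is only an illustration under assumptions: it treats *beta* as a scale parameter (which is how Python's `random.gammavariate` reads its second argument; JMP's parameterization may differ), it uses a simple textbook Poisson sampler because the standard library has none, and the values alpha = 3 and beta = 2 are made up.

```python
import math
import random

def poisson_variate(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def gamma_poisson_variate(alpha, beta, rng):
    """Two-stage draw: lambda ~ Gamma(shape=alpha, scale=beta), then x ~ Poisson(lambda)."""
    lam = rng.gammavariate(alpha, beta)  # assumption: beta is a SCALE parameter
    return poisson_variate(lam, rng)

rng = random.Random(8)  # fixed seed so the run is repeatable
counts = [gamma_poisson_variate(3.0, 2.0, rng) for _ in range(2000)]
mean_count = sum(counts) / len(counts)  # should land near alpha * beta = 6
print(round(mean_count, 2))
```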
