For college football, the regular season is coming to an end, and in a few days we’ll know which teams are going to which end-of-season bowl games. Though some bowl assignments are determined by formula, each bowl often has a choice of several teams to invite to its game. Some teams have a reputation for “traveling well,” meaning they have a fan base that can raise attendance at faraway games or draw a large TV audience. For instance, consider this blog post comment regarding Ohio State:
Buckeye Nation traveling well is an understatement. We are the best traveling college football fans in the nation.
I wondered if the numbers support the reputations and set out to collect attendance and TV ratings data. I put together 11 years of bowl game data, mostly from bcsfootball.org with some holes filled in from individual bowl Web sites. As is often the case, cleaning the data was a major undertaking. I used the Recode feature in JMP to standardize team names (“Connecticut” or “UConn”) and bowl names. Bowl names are particularly tricky because the actual names sometimes change over the years, and sometimes the same name has been used for different bowls (“Meineke Car Care Bowl” is one example). I used the JMP Geocoder add-in to determine the locations needed to compute travel distances.
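For readers working outside JMP, here is a minimal sketch of how a travel distance could be computed once each school and bowl site has a latitude and longitude. It uses the haversine great-circle formula, which is my assumption for illustration; the post does not say which distance calculation was actually used. The function name and the approximate coordinates are likewise made up for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only (not taken from the data file).
columbus = (39.96, -83.00)    # Columbus, Ohio
pasadena = (34.15, -118.14)   # Pasadena, California

print(round(haversine_miles(*columbus, *pasadena)), "miles")
```

A straight-line distance like this ignores actual driving or flying routes, so it is only a rough proxy for how hard a trip is for fans.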
More variables would surely be helpful, though I’m hoping that the bowl itself serves as a proxy for variables like team ranking (since each bowl sits in a fairly stable slot in the bowl pecking order). Looking at models that include Year, Bowl, Year * Bowl, Team and Distance, it turns out that attendance and TV ratings behave quite differently. Almost all of the Bowl terms and Year * Bowl interactions are strongly associated with Attendance, but only a few of them are associated with TV Rating. Apparently, TV ratings are more likely influenced by which teams are playing. Distance is a relatively weak factor overall, and there’s not enough data to see how it applies to individual teams.
Bowl attendance trends vary by bowl, with some bowls being stable, some trending up and some trending down. The following graph shows the trend lines, with two from each category highlighted.
Turning to the teams, a complication with this data is that there are two Team columns and thus two team values per row. Stacking the team columns is possible, but it results in duplicate bowl-year rows (which artificially reduces the p-values) and, more importantly, disassociates the two opposing teams so we can no longer model the shared responsibility for the results. A better approach is to create indicator columns, one per team, such that exactly two of the columns are non-zero for each game. I found Tables > Summary with Team as a subgroup to be a convenient way to generate the indicator columns.
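The indicator-column idea can be sketched in a few lines of plain Python. This is only an illustration of the layout, not the JMP steps I used; the column names ("Team 1", "Team 2", "Is ..."), the function name and the placeholder games are all hypothetical.

```python
def add_team_indicators(rows, team_keys=("Team 1", "Team 2")):
    """Return (teams, new_rows) where each new row gains one 0/1 column per team."""
    teams = sorted({row[key] for row in rows for key in team_keys})
    new_rows = []
    for row in rows:
        playing = {row[key] for key in team_keys}
        new_row = dict(row)  # copy so the input rows are left untouched
        for team in teams:
            new_row[f"Is {team}"] = 1 if team in playing else 0
        new_rows.append(new_row)
    return teams, new_rows


# Placeholder games, not real bowl results.
games = [
    {"Year": 2009, "Bowl": "Bowl A", "Team 1": "Team X", "Team 2": "Team Y"},
    {"Year": 2010, "Bowl": "Bowl A", "Team 1": "Team Y", "Team 2": "Team Z"},
    {"Year": 2010, "Bowl": "Bowl B", "Team 1": "Team X", "Team 2": "Team Z"},
]

teams, rows = add_team_indicators(games)
for row in rows:
    print(row["Year"], row["Bowl"], [row[f"Is {team}"] for team in teams])
```

Each game stays on a single row, so the two opponents remain tied to the same attendance and TV rating values, and exactly two indicator columns are set to 1 per row.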
Including the indicator columns with the Year and Bowl columns lets us see the effect of each team after accounting for the effect of the Year and Bowl. As I mentioned above, attendance is modeled very well by the combination of those two variables, and perhaps for that reason there are few teams with significant model effects. Here are those with the smallest p-values, sorted by the estimate value. Hawaii is a special circumstance since that school often plays in the Hawaii Bowl, and in general the Hawaii Bowl attendance is much higher when Hawaii plays in it.
The same model for the TV Rating response shows significant results for several schools. The top results are as follows. Of the schools with negative effects, none has a significant p-value.
One detail I didn't mention is that I excluded bowls and schools that had fewer than four games (Sorry, Duke) to avoid singularities in the model, which I only found when the model failed to fit. Further exclusions may also improve the model, since most of the terms had little significance yet their estimates still participate in the final model. The advanced modeling features in JMP Pro can do such pruning automatically, and that’s what JMP statistical developer Clay Barker did when he analyzed this data. He used generalized regression, which iteratively looks for terms to remove so that the real terms can fit more of the data. Clay also combined Attendance and TV Rating in a 1:4 ratio to produce a single combined score. Below are his top and bottom schools. A zero score means the team did not participate in the final model.
At least Ohio State fans should be less mad at me now. :)
If you want to examine the details or try a different model, the data file is in the JMP File Exchange. (Download requires a free SAS profile.)