Jun 23, 2011

Analyzing NC State Fair Attendance Using JMP

[Photo: Krispy Kreme burger, from the NC State Fair Deep Fried blog]

The North Carolina State Fair is in full swing in Raleigh this week. All the talk lately in the break room at the office concerns the Fair. "Did you go to the fair this weekend?" "What day will you go?" "Are you going to try a Krispy Kreme burger?" Fried Snickers bars and chocolate-covered bacon are sooo last year -- this year's outrageous food item is the hamburger with doughnuts where the buns should be (see photo at right, used with permission from Paul Jones of the NC State Fair Deep Fried blog).

If you're a planner, you might want to attend the fair on a day when the crowds aren't as large. That way parking should be more tolerable, and you won't have to wait as long in the Krispy Kreme burger line.

Conveniently, attendance numbers are available on the NC State Fair website. Last week, a SAS colleague analyzed this data using SAS. That motivated me to see what JMP can do with it today, which is World Statistics Day.

The website contains a table of data for each day of the fair. To get the data into JMP, I first tried copying and pasting; unfortunately, the data did not format properly. Luckily, someone told me about a handy feature in JMP that solves this issue -- Internet Open (under the File menu). Just enter the Web address where the data resides, and JMP automatically finds the tables on the page, imports the one you choose and even formats it correctly.

Note: There are two tables found on the NC State Fair website. Choose the first table to see the attendance data.
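If you want to try the same "grab the first table on the page" idea outside JMP, here is a minimal sketch in Python using only the standard library. It is not how Internet Open works internally; it just illustrates the step. The class name `TableExtractor`, the helper `extract_tables` and the sample HTML (with placeholder numbers, not real attendance figures) are all my own inventions for the example, and the sketch assumes the page has no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # every table found so far
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables in the HTML text, in page order."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical page with two tables; the numbers are placeholders only.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Thu.</th><th>Fri.</th></tr>
  <tr><td>2008</td><td>1,000</td><td>2,000</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
attendance = tables[0]  # the first table holds the attendance data
print(attendance)
```

For a live page you would read the HTML text first (for example with `urllib.request.urlopen`) and pass it to `extract_tables`; the cells still arrive as strings, which is why some cleanup is needed afterward.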

Once the table is imported into JMP, two housekeeping steps get the data into workable form. Change the "Thu." column to numeric, and exclude/hide the row for 2010 if it contains partial data.
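For readers following along without JMP, the same two housekeeping steps might look like this in Python. This is a sketch under assumptions: I am guessing the column came in as text because of thousands separators or a stray non-numeric entry, and the rows below are placeholder values, not real attendance. The names `to_number` and `PARTIAL_YEARS` are hypothetical.

```python
def to_number(text):
    """Return an int for strings like '1,234'; None when the cell is not numeric."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


# Placeholder rows (not real attendance); "Thu." arrived as text.
rows = [
    {"Year": 2008, "Thu.": "1,000"},
    {"Year": 2009, "Thu.": "n/a"},
    {"Year": 2010, "Thu.": "1,200"},  # assumed to be the partial year
]

# Step 1: make "Thu." numeric, leaving non-numeric cells as missing (None).
for row in rows:
    row["Thu."] = to_number(row["Thu."])

# Step 2: exclude any year whose data is still incomplete.
PARTIAL_YEARS = {2010}
complete = [row for row in rows if row["Year"] not in PARTIAL_YEARS]
print(complete)
```

Turning unparseable cells into missing values, rather than zero, keeps them from quietly dragging down any totals computed later.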

The first plot I'm interested in compares attendance trends across the years. The Parallel Plot platform can do this with just a few clicks (be sure to select Scale Uniformly). Below I've colored the rows according to the decade they belong to. For example, in the most recent decade (red lines), you can see a sharp spike in attendance on the second Thursday compared with previous decades. Can anyone tell me when they started the canned food drive on Thursdays?
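To make the two choices in that plot concrete, here is a small Python sketch of what they amount to, as I understand them: grouping years by decade for coloring, and putting every day on one shared scale instead of scaling each day to its own range. The function names and the sample values are hypothetical placeholders, not JMP's implementation and not real attendance.

```python
def decade_of(year):
    """Return the decade a year belongs to, e.g. 1999 -> 1990."""
    return year // 10 * 10


def scale_uniformly(rows, day_columns):
    """Rescale all days to [0, 1] using one shared minimum and maximum."""
    values = [row[day] for row in rows for day in day_columns]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return [{day: (row[day] - lo) / span for day in day_columns} for row in rows]


# Placeholder rows (not real attendance).
rows = [
    {"Year": 1999, "Thu.": 1000, "Fri.": 2000},
    {"Year": 2005, "Thu.": 3000, "Fri.": 5000},
]

decades = [decade_of(row["Year"]) for row in rows]
scaled = scale_uniformly(rows, ["Thu.", "Fri."])
print(decades)
print(scaled)
```

With a shared scale, a line that sits higher on one day really does mean more people than a lower line on another day, which is what makes a Thursday spike visible.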

A colleague pointed out that the raw numbers are hard to compare from year to year because overall attendance varies so much. He would rather see each day's attendance as a percentage of that year's total. To create this plot, we first need to use Transpose to turn Day into a column. After adding a new column for the percent, Graph Builder easily produces box plots that summarize the percentage of attendance for each day.
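The reshaping and the percent column are easy to get slightly wrong, so here is a minimal Python sketch of the arithmetic: each day's attendance divided by that year's total, times 100, with one output row per year and day. The function name `to_long_with_percent` and the sample numbers are hypothetical placeholders, not real attendance.

```python
def to_long_with_percent(wide_rows, day_columns):
    """Turn one row per year into one row per (year, day) with a percent column."""
    long_rows = []
    for row in wide_rows:
        year_total = sum(row[day] for day in day_columns)
        for day in day_columns:
            long_rows.append({
                "Year": row["Year"],
                "Day": day,
                "Attendance": row[day],
                # Share of this year's total that came on this day.
                "Percent": 100 * row[day] / year_total,
            })
    return long_rows


# Placeholder rows (not real attendance).
wide = [
    {"Year": 2008, "Thu.": 1000, "Fri.": 3000},
    {"Year": 2009, "Thu.": 2000, "Fri.": 2000},
]

long_rows = to_long_with_percent(wide, ["Thu.", "Fri."])
for r in long_rows:
    print(r["Year"], r["Day"], r["Percent"])
```

Because the percentages within each year sum to 100, a big year and a small year land on the same footing, and a box plot per day then shows how that day's share varies across years.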

The results are not surprising -- weekends have the heaviest traffic, with the second Saturday being the most popular day to go. I personally like to go on a weeknight. But no matter when I go, you won't catch me anywhere near a Krispy Kreme burger!