Recoding data to explore the popularity of Halloween costumes
What will your costume be for Halloween this year?
What will your child/children dress up as for Halloween?
What will you dress up your pet as for Halloween?
I imported the data into JMP, and I found that the data required some tidying.
Fortunately, JMP 12 has a newly designed Recode feature, which allows you to clean up your data more efficiently.
At first, the imported data looked like this:
Notice the unwanted numerals in the Costume column, as well as the extra whitespace and missing values throughout the data table.
I recoded the Costume column to remove the extra characters by highlighting it and selecting Cols > Recode. I used the Trim Whitespace option from the red triangle menu to get rid of the whitespace before and after each value.
I also used the Filter search bar to search for any numbers that I didn’t want to include in my recoded data table. Entering “1” in the search bar groups every value containing a “1” at the top. After I deleted the unwanted characters in the New Values column, the old values and the new values were grouped together and appeared shaded. When working with a lot of data, I use the Show only Grouped/Ungrouped check boxes to help control my view.
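For readers who want to see the same cleanup outside of JMP, here is a minimal Python sketch of the two steps above: removing unwanted digits and trimming whitespace. It is not how Recode works internally. The function name clean_costume and the sample values are made up for illustration, and collapsing internal runs of spaces is an extra step I assume is wanted, since deleting a digit can leave a double space behind.

```python
import re

def clean_costume(value):
    """Remove digits, collapse runs of whitespace, and trim both ends.

    Hypothetical helper for illustration; returns None for missing
    or empty values.
    """
    if value is None:
        return None
    no_digits = re.sub(r"\d+", "", value)               # drop unwanted numerals
    collapsed = re.sub(r"\s+", " ", no_digits).strip()  # tidy the whitespace
    return collapsed or None                            # empty string -> missing

# Made-up sample values, not the survey data
raw = ["  Witch 1", "Batman2 ", " 3 Princess", "", None]
cleaned = [clean_costume(v) for v in raw]
print(cleaned)
```

Missing values stay missing, just as blank cells stay blank in the data table.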
The Group Similar Values red-triangle menu option is also a nice way to organize and tidy data, especially when checking for consistency. There are values that appear multiple times in this data table, but they have different spacing or an extra letter (for example, “Batman” and “Bat man”). I wanted to find those values and recode them so they are consistent throughout the table. The Difference Ratio and Max Character Difference options automatically group values that differ by just a couple of characters (depending on your settings). This makes it easy to find mistakes or inconsistencies. I kept the default Difference Ratio value of 0.25, which groups values that are at most 25% different. In other words, values that have at least a 75% character match are grouped together.
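To make the idea of a difference ratio concrete, here is a rough Python sketch. I am assuming the ratio is the edit distance between two values divided by the length of the longer one. JMP may compute it differently, so treat this as an illustration of the concept rather than a reimplementation of Group Similar Values. The function names and sample values (other than “Batman” and “Bat man”) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

def difference_ratio(a, b):
    """Assumed definition: edit distance over the longer value's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def group_similar(values, max_ratio=0.25):
    """Greedy grouping: each value joins the first group whose first
    member is within max_ratio of it, ignoring case."""
    groups = []
    for v in values:
        for g in groups:
            if difference_ratio(v.lower(), g[0].lower()) <= max_ratio:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

values = ["Batman", "Bat man", "Witch", "Wtch", "Princess"]
for g in group_similar(values):
    print(g)
```

With these sample values, “Batman” and “Bat man” differ by one character out of seven, so they fall well under the 0.25 threshold and end up in the same group.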
Here’s what the data looks like after grouping similar values:
Now I can easily see the grouped similar values. I edited each group so that every instance of a given value looks the same. For instance, I changed the new value of the “Star Wars Character” group so that each of the three instances has the same spacing. After making the appropriate changes, I selected Done > In place. This way, the values in the New Values column replace the old values in the data table. To preserve the original data instead, save the changes made in Recode to a new column by selecting Done > New Column or Done > Formula Column.
Here's what the data looks like in Graph Builder when analyzing costume by percent:
Note the rows containing “Other” have been excluded and hidden.
Also, notice that the plot above is cluttered and hard to analyze. I recoded the Costume column again to organize the costumes into categories in order to make it easier to find patterns in the data. Once the Costume column was in Recode, I selected the values to group and used the right-click option, Group To. Because there are many animal costumes, I grouped them and made an “Animals” category.
After grouping all the animals under “Animal (Cat, Dog, Lion, Tiger, etc.)”, I shortened the name of the category to “Animals”. I grouped the remaining costumes under categories such as “Superhero”, “Fantasy” and “Scary”. After recoding, I selected Done > Formula Column to preserve my original Costume column. I named the new column “Categories”. You can view the formula by double-clicking on the column header.
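Conceptually, the formula column is a lookup from each costume to its category, stored next to the untouched original. A small Python sketch of that idea follows. The category names come from my recode, but the particular costume assignments beyond the animals, the “Uncategorized” fallback, and the sample rows are assumptions for illustration only.

```python
# Hypothetical lookup table; only the animal entries mirror the recode above
CATEGORY_BY_COSTUME = {
    "Cat": "Animals",
    "Dog": "Animals",
    "Lion": "Animals",
    "Tiger": "Animals",
    "Batman": "Superhero",  # assumed assignment
}

def categorize(costume, default="Uncategorized"):
    """Return the category for a costume, or a fallback label if unmapped."""
    return CATEGORY_BY_COSTUME.get(costume, default)

# Made-up rows; the original Costume value is kept and a new field is added
rows = [{"Costume": "Cat"}, {"Costume": "Batman"},
        {"Costume": "Tiger"}, {"Costume": "Pirate"}]
for row in rows:
    row["Categories"] = categorize(row["Costume"])

for row in rows:
    print(row)
```

Because the mapping writes to a new field, the original values are still there if the categories need to be revised later.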
Here’s what the table looks like with the Categories column:
Now that the costumes are binned into categories, Graph Builder provides a more interpretable plot. In Graph Builder, I used the Percent column for the X variable and Categories for the Y variable. I grouped the data by Child, Adult and Dog, and ordered it by popularity. Here’s the result:
The above graph shows the percent of costume choices by category for each group (Adult, Child, Dog). I can see which costume category is the biggest hit among adults, kids, and dogs. It appears that the Fantasy category was the most popular for adults, the Animals category for kids, and the Object category for dogs.
To further analyze the data, I used the Local Data Filter to view just the adult costume choices. Here’s what the data looks like filtered by adults and sorted by popularity:
The Witch costume is the most popular among adults this year.
Now I’ll examine which specific costume is the most popular among kids and adults. Again using Graph Builder, I analyzed the proportion of people who chose a particular costume. Because the data did not include a count for dog costumes, I created a new column called Group containing only “Adult” and “Child” before I ran the analysis, leaving the Dog rows as missing values. Here’s a mosaic plot to illustrate the results:
The vertical axis indicates what proportion of each costume falls into the Child or Adult group. The overall size of each bar indicates how popular the costume is across kids and adults combined. The graph shows that about 75% of those who chose the witch costume were adults. About half of those who chose an animal costume were adults and half were kids. Some costumes, such as Princess, are completely dominated by one group. In this case, kids make up everyone who chose princess as their costume.
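The vertical split in a mosaic plot is just a within-costume proportion. Here is a short Python sketch of that calculation. The counts are invented to mirror the pattern described above (they are not the survey data), and group_shares is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def group_shares(rows):
    """For each costume, return the share of choices made by each group."""
    counts = defaultdict(Counter)
    for costume, group in rows:
        counts[costume][group] += 1
    shares = {}
    for costume, by_group in counts.items():
        total = sum(by_group.values())
        shares[costume] = {g: n / total for g, n in by_group.items()}
    return shares

# Invented counts for illustration only
rows = ([("Witch", "Adult")] * 3 + [("Witch", "Child")]
        + [("Princess", "Child")] * 2)

shares = group_shares(rows)
print(shares["Witch"])
print(shares["Princess"])
```

A costume chosen by only one group, like Princess in this toy example, gets a share of 1.0 for that group, which is what a solid single-color bar in the mosaic plot represents.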
For additional fun, I grouped the data by category using the Fit Y by X platform to create another mosaic plot. Here’s what the plot looks like:
I can see that only kids chose costumes from the Object category, and only adults chose costumes from the Occupation category. Comparing overall bar size, Fantasy, Scary and Superhero are the most popular categories for both adults and kids. The same findings appear in one of the earlier bar graphs, but the mosaic plot makes it easier to compare costume popularity across both groups.
I wish everyone a fun and safe Halloween! Look out for witches — apparently we can expect to see lots of them this year!