If you read my last post, then you know that I’m giving myself the gift of data this holiday season! For me, collecting data on my diet and fitness habits is a gift that just keeps on giving. Although I may not look at all my data sets on a daily basis, the information is there when I need to use it to help me understand my patterns.
Collecting food data isn’t just a holiday habit for me, however. I have been logging the foods I eat at each meal year-round for nearly five years, using BodyMedia’s app for most of that time before adopting MyFitnessPal earlier this year. I presented a poster at JMP Discovery 2014, which described how I imported my BodyMedia food log data into JMP by saving multiple pdfs to text, concatenating them, and using regular expressions to parse details into columns in a JMP table. MyFitnessPal saved me a few steps by producing report output that can be saved as a single text file, but I still needed to use regex parsing to create a usable JMP table.
The major challenge I recently solved was unifying these two food log data tables, and my experience solving this is relevant to anyone who has to combine data from different sources.
Minor differences like column names and format types were easy to adjust, and no matter that each table contained a few unique columns -- those simply came through with missing values for rows that didn’t have values. The main problem was that there were so few cases where item names were the same in both databases that I had to clean up the two tables using parallel but separate processes.
I already knew that my BodyMedia food log contained many instances of redundant names that referred to essentially the same food item, and this made it much harder for me to aggregate data across those items to look for patterns. I used the JMP 12 Recode platform to group similar item names into a set of cleaned item names, and described this process in an earlier blog post.
MyFitnessPal includes even more brand names and user-supplied items than BodyMedia, further magnifying this naming redundancy problem. When I switched to using MyFitnessPal to log my foods, I had tried to reuse my BodyMedia script to recode my MyFitnessPal item names. But with so few shared item names between the databases, it made more sense to create a completely separate Recode script for the MyFitnessPal project.
Now, my desire to combine my two data tables to examine my holiday eating patterns became a good motivation for discovering the most efficient way to take the cleaned item names from both tables and combine them into a single, consolidated set of item names. Of course, at the end of the exercise, I wanted to be able to save a Recode script that referred back to the original item names so I could continue to update that script as I added new items to my MyFitnessPal log.
After tweaking column names and formats for consistency, I concatenated the tables, chose the Cleaned Item Name column, and opened the Recode platform. Since I had used similar patterns to rename items in both tables, many items showed up near one another. I worked through the rest of the items, grouping similar values and making heavy use of the Recode filter field to find subgroups of related items. I recoded the full set of cleaned items names into a new column I called Consolidated Item Name.
To save a Recode script that I could reload into the dialog for use with new MyFitnessPal data, I used a formula column trick I shared in a previous blog post. It's a trick I discovered when I needed to create a Recode script “after the fact” to capture work I did in an early version of the JMP 12 platform before it supported saving and reloading a script into the dialog. Now, I used the same formula column approach (which makes use of the character Concat function) to create the "guts" for a Recode script to relate the original database item name to its final, consolidated name. After adding the appropriate header and footer to the script, I can reload my original item names and consolidated names into the Recode dialog and update the list when I import new food item names from MyFitnessPal.
With my cleaned and complete data table in hand, I was now able to update a second data cleaning script I maintain to assign my food items into primary food categories. I have found that placing individual items into higher-level categories can be quite useful for comparing related item sets over time. I created my own classification system using groupings that made sense to me.
For example, I have a food category called ChocolateCandy, which includes dark, milk and white-chocolate containing items. I rarely eat non-chocolate candy types, so giving the category a more generic name like Candy didn't work for me. When looking at major trends over time, I'm not going to split hairs about whether it was a milk chocolate or dark chocolate truffle that I ate, but I'd like to be able to determine what proportion of my treat calories belong to the whole category.
For this blog post, I created a subset table containing just meals from a handful of selected holidays from the past five years. I used Recode to aggregate those chosen dates into a new category column called Holiday, and added a Value Ordering column property to it to lock these holidays in sequential order: New Year’s Day, my birthday, Thanksgiving, Christmas Eve, Christmas Day and New Year’s Eve.
This Graph Builder heat map shows the number of calories I logged on each holiday, which varied pretty widely. When I was actively losing weight post-baby, I ate less, and in maintenance, I tend to eat more.
Previously, I shared treemap visualizations of my food log data aggregated over days, weeks, months or years, but I find treemaps to also be useful for visualizing patterns in daily data in a compact way. I created a treemap in Graph Builder to show the calorie breakdown for each holiday meal during the four years where I had complete data for all six holidays.
All of this data was collected in the BodyMedia app, which provides six potential slots for logging foods-Breakfast, AM Snack, Lunch, PM Snack, Dinner, and Late Snack. I used Year as the X grouping variable and Holiday as the Y grouping variable, and colored squares by meal. Since I assigned Calories as a Y variable, each holiday’s section of the treemap is sized in proportion to the highest number of total calories I ate, which you can see occurred on Christmas 2012.
I usually log foods into meal slots by the time of day when I ate them. For example, a late breakfast or lunch would likely be logged into a snack slot. You can see this effect in the timing of some of my holiday meals. On a non-holiday, I usually log breakfast, lunch and dinner as larger meals with smaller snacks in the morning and afternoon. On major holidays, the big meal of the day sometimes falls in the afternoon snack timeframe (for example, Christmas 2012, New Year’s Eve 2011 and 2014). In those cases, I usually logged a rather substantial dessert in the dinner time slot!
Given this variability in meal timing, I actually found it more helpful to remove the meal grouping variable and look at total daily calories grouped by food category. Grouping similar items gave me a better feel for the types of foods that tend to dominate my holiday eating. It is not at all surprising that these include lots of desserts, chocolate and caloric drinks. Holidays are definitely all about social eating for me and my family!
Like many people, I look forward to certain special holiday foods. I thought it would be interesting to look across my whole data set and contrast the patterns of holiday-only items with treats I eat more regularly. For example, cheesecake is one of my favorite desserts, and I eat plain cheesecake throughout the year. The bar graph below shows my monthly calorie totals, filtered to include a select few food items. In contrast, pumpkin cheesecake is a seasonal treat, usually restricted to the holiday season -- that is, unless I happen to stash an extra one in the freezer like I did last year! Also, a dish called Cranberry Delight that I make only on holidays usually only appears in my food log between Thanksgiving and Christmas, although it did make an appearance at Easter dinner in 2013.
So what was the biggest challenge of this blog post? You might think that it was the data collection, the recoding challenges or tweaking my visualizations to show my holiday eating patterns in the best possible light, but all those guesses are wrong. Actually, the toughest challenge was leaving the many JMP 13 updates behind and returning to JMP 12.2 to create my graphs! In fact, I blame the ongoing development of JMP 13 for making me a much less productive blogger in the second half of this year. I’ve been using development versions of JMP 13 for nearly a year now for all my work and personal projects. I am literally biting my tongue right now wanting to tell you about all the amazing new features that are coming! But you’ll just have to be patient until JMP 13 launches to hear about the new and improved additions that I can’t live without. In the meantime, happy holidays!