In an earlier blog post, I shared that I used the JMP 12 version of the Recode platform to clean up food item names in a data table containing nearly four years of food log information. I was able to halve the number of unique food item names that appeared in my ~35,000-row table, reducing the table to ~900 unique food items. Even if you don't keep a food log, I'm sure you can envision how useful this kind of cleanup and consolidation could be when working with your own large data tables! I gave a few more details in my e-poster presentation at Discovery Summit 2014 (which you can find on the JMP User Community in PDF form), but when I wrote my first Recode blog post, it wasn't quite time to share the many new features of Recode in JMP 12 and how I used them to streamline my data cleanup process.
Now that preview posts for JMP 12 have officially kicked off, I wanted to give some more information about how I used Recode enhancements to significantly reduce the time needed to combine and group my food items. I was fortunate to be able to test many of the new Recode features on my food log data from a very early stage in their development. I have to give a big thanks to Recode developer James Preiss for patience with my frequent emails and visits to his office while he was working on this new feature! I found that many of my requests lined up well with user suggestions we had received for Recode. I really enjoyed watching these suggestions solidify in the platform as it was under active development.
Let's consider an example that I used frequently when discussing Recode ideas with James: grouping and cleaning a set of food items whose names contained the key word "chocolate." I described in my earlier post how in JMP 11, I used Data Filter on the underlying table to find all food items containing the word chocolate. With that subset of the table in view, I then scrolled up and down in the Recode dialog to locate those items and pasted a common item name into the edit box for all related items. Changing an established group name could be tedious, unless I was ready to create, apply, and edit my Recode script.
Recode in JMP 12 offers a number of time-saving shortcuts for grouping items. The filter field in the Recode dialog makes it easy to find chocolate-containing item names within my long list. I found that automatic grouping by text edit distance worked well for grouping short food item names (e.g., "Nestle After Eight Chocolate Mints" and "After Eight Chocolate Mints" in the example below) and also helped combine nearly identical names truncated at slightly different lengths in my food log files.
Recode's new manual grouping option was very useful for consolidating longer names that didn't match up at the text edit threshold I set. I could Ctrl-click or Shift-click to select a set of related items and right-click to group them, choosing one value as the group name. If the grouped items were distant from one another alphabetically, they would be automatically reordered to appear together. If I decided to change the group name to a new value (e.g., "Milk, Chocolate"), I could simply edit the group name to apply it to all grouped items.
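If you are curious what grouping by text edit distance means in practice, here is a minimal Python sketch. It is only an illustration of the general idea and not how Recode is implemented: the function names, the greedy first-match strategy, and the 0.25 ratio threshold are all my own assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def group_by_edit_distance(names, max_ratio=0.25):
    """Greedy grouping (hypothetical helper, not JMP's algorithm): each name
    joins the first group whose representative is within max_ratio edits
    per character; otherwise it starts a new group."""
    groups = {}  # representative name -> list of member names
    for name in names:
        for rep in groups:
            dist = edit_distance(name.lower(), rep.lower())
            if dist / max(len(name), len(rep)) <= max_ratio:
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups


names = [
    "Nestle After Eight Chocolate Mints",
    "After Eight Chocolate Mints",
    "Milk, Chocolate",
]
for rep, members in group_by_edit_distance(names).items():
    print(rep, "<-", members)
```

With this threshold the two After Eight names land in one group and "Milk, Chocolate" stays on its own. A looser threshold merges more aggressively, which is why longer names that differ a lot still needed manual grouping.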
Once I had cleaned up all my individual item names, I created a set of 30 or so food groupings that were meaningful to me for classifying my food items. I created two different classification variables for each item. My Food Category variable contained a comma-delimited list of all categories to which a food item belonged, saved as a multiple response column. The Primary Food Category variable contained a single category: the group in which I thought the food best belonged. For "whole" foods, values in these two columns were often the same, but they differed for complex food combinations. While I placed sugar snap peas under Vegetable in both food category columns, I put a salted caramel mocha into the Primary Food Category I called CoffeeMilk. In contrast, the comma-delimited Food Category list for a mocha listed the values "Chocolate, Coffee, Milk." I used my multiple response Food Category variable in the JMP Data Filter when I wanted to select all foods that belonged to a primary food group. I used my Primary Food Category as a grouping variable in graphical summaries and in cases where it made less sense for items to belong to more than one group.
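To make the difference between the two columns concrete, here is a small Python sketch of the same idea outside JMP. The two rows come from the examples above; the field names and helper functions are hypothetical, chosen just for illustration.

```python
# Two example rows, using the category values described in the post.
foods = [
    {"item": "Sugar snap peas",
     "food_category": "Vegetable",
     "primary_food_category": "Vegetable"},
    {"item": "Salted caramel mocha",
     "food_category": "Chocolate, Coffee, Milk",
     "primary_food_category": "CoffeeMilk"},
]


def categories(row):
    """Split the comma-delimited multiple response value into a set."""
    return {c.strip() for c in row["food_category"].split(",")}


def filter_by_category(rows, wanted):
    """Select every item that lists `wanted` among its categories,
    similar in spirit to filtering on a multiple response column."""
    return [r["item"] for r in rows if wanted in categories(r)]


print(filter_by_category(foods, "Chocolate"))
print(filter_by_category(foods, "Vegetable"))
```

Filtering on "Chocolate" picks up the mocha even though its primary category is CoffeeMilk, which is exactly why I kept both columns: one for finding foods, one for grouping them.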
I tried out a variety of different JMP graph types with this data table, and my favorite visualization was the treemap. I created item-level treemaps without a grouping variable to see which individual foods contributed most to my calorie totals over the past four years, but I found that using my primary food group categories as a second grouping level in the treemap was very helpful in comparing the contributions of food items next to other similar foods. Here is a treemap including grouped foods I ate over all four years. I used the Local Data Filter option under the Graph Builder red triangle menu Script section when I wanted to restrict the view to specific meals or years as desired.
To create a treemap using the existing JMP 11 split layout, open Graph Builder and:
(Once you have JMP 12, you only need to choose Squarify from the Layout pulldown under the Treemap properties pane on the left to get the new algorithm.)
I created a simplified version of this treemap for my e-poster showing total calories eaten by year, with the size of the squares representing the number of calories eaten from each primary food category. This graph helped me understand how my eating patterns at the food group level had shifted over time. One of the most obvious changes I observed was that I used to eat a lot more items in the Bread category than I do now. Digging into my data more deeply at the meal level revealed that this was primarily due to changes in my typical breakfast, which used to include scones with coffee but is now usually chocolate Greek yogurt. Like many people, I tend to develop a favored first meal of the day and stick with it until I get tired of eating it. Becoming aware of this shift caused me to question why I stopped baking my favorite maple oat nut scones, and I recently went back to making them more often!
To replicate this year-by-year treemap using the JMP 11 split algorithm:
(Again, in JMP 12, you can choose Squarify from the Layout pulldown under the Treemap properties pane.)
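The quantity that sizes each rectangle in the year-by-year treemap is simply total calories per primary food category within each year. Here is a rough Python sketch of that summary step. The log rows and calorie values are invented purely for illustration (they are not from my food log), and the function names are my own.

```python
from collections import defaultdict

# Made-up (year, primary food category, calories) rows for illustration only.
log = [
    (2011, "Bread", 350),
    (2011, "Bread", 400),
    (2011, "Chocolate", 200),
    (2014, "Yogurt", 150),
    (2014, "Chocolate", 220),
    (2014, "Bread", 120),
]


def calories_by_year_and_category(rows):
    """Sum calories for each (year, category) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for year, category, calories in rows:
        totals[year][category] += calories
    return {year: dict(cats) for year, cats in totals.items()}


def category_shares(year_totals):
    """Fraction of a year's calories contributed by each category,
    i.e. the relative area each category would get in that year's treemap."""
    total = sum(year_totals.values())
    return {cat: cal / total for cat, cal in year_totals.items()}


totals = calories_by_year_and_category(log)
for year in sorted(totals):
    print(year, totals[year], category_shares(totals[year]))
```

Comparing the shares across years is the numeric version of what I did by eye with the treemap when I noticed my Bread category shrinking.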
If you saw my earlier post about my weight loss and maintenance journey, then you may be surprised to see how many "junk" foods show up in my food log. I'll admit I eat dark chocolate almost every day in addition to my favorite cocoa powder/toffee nut syrup/plain Greek yogurt/caramel sauce breakfast mixture that's been repinned hundreds of times on Pinterest! I've found that it's not necessary for me to cut out so-called "junk" foods entirely while maintaining my weight in my preferred range. Keeping a close eye on the total number of calories I consume has turned out to be much more critical to maintaining my weight long-term.
Check out the first blog post in this series to learn more about my interest in quantified self (QS) data analysis. You can download a PDF version of my JMP Discovery Summit 2014 e-poster with more examples of treemaps I created from my data and download my JMP add-in to import your own BodyMedia® files here. I haven't attempted to generalize the food item recoding process with a script because I think the foods included in a food log will vary too greatly for a general script to be useful. But if you want to replicate my approach to create a consolidated set of item names in your own food log using Recode, grab your copy of JMP 12 when it comes out and see my earlier blog.
P.S. It’s free to join the JMP User Community, where you can learn from JMP users all over the world!