I ended my previous blog post at the point in my JMP Discovery Summit project when I realized the extent of food item name redundancy across my nearly four years of food logs collected with the BodyMedia® Activity Manager app. While I knew I had eaten differently prepared varieties of certain foods, the replication was also an artifact of using keyword searches to locate the right items to add to my food log. The keyword I used to search for a given item varied, and the matching item that I chose to log at a given meal also varied, so I often selected different item names for highly similar foods.
Ultimately, I wanted to summarize the number of calories I ate from related items and also total up calories eaten by food category. A sensible first step was to reduce the number of redundant food item names in my data table. I wanted the food item recoding process to be as easy as possible, and of course, reproducible through scripting so I would be able to process new data with minimal work.
I explored using the JMP 11 Recode platform to consolidate similar food item names into a single cleaned value. Before I started recoding, my data table contained 1,859 unique food item names. Since food item names were displayed in the Recode window in alphabetical order, I found it challenging to locate similar food items that did not sort next to one another. For example, if I wanted to rename nearly identical items listed under different brand names, I had to first locate all the related items scattered throughout my item list (e.g., "CHIPS AHOY! Chewy Chocolate Chip Cookies," "Cookie, Chocolate Chip, Commercial, 12%-17% Fat," "Jason's Deli Chocolate Chip Cookie," "PILLSBURY Chocolate Chip Cookies, Refrigerated Dough") and rename them to a common cleaned value (e.g., "Cookie, Chocolate Chip"). To locate all related items, I searched the data table using the Find function or used Find under the Data Filter red triangle menu.
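The recoding idea itself is simple enough to sketch outside of JMP. The snippet below is a minimal Python illustration, not the JSL that Recode generates: it maps the four cookie names quoted above to one cleaned value and leaves unmapped names unchanged. The names `RECODE_MAP`, `recode`, and `find_items`, and the extra item "Apple, Raw", are hypothetical and exist only for this example.

```python
# Hypothetical recode map: original item name -> cleaned item name.
RECODE_MAP = {
    "CHIPS AHOY! Chewy Chocolate Chip Cookies": "Cookie, Chocolate Chip",
    "Cookie, Chocolate Chip, Commercial, 12%-17% Fat": "Cookie, Chocolate Chip",
    "Jason's Deli Chocolate Chip Cookie": "Cookie, Chocolate Chip",
    "PILLSBURY Chocolate Chip Cookies, Refrigerated Dough": "Cookie, Chocolate Chip",
}


def recode(name, mapping=RECODE_MAP):
    """Return the cleaned name, or the original name if no rule exists."""
    return mapping.get(name, name)


def find_items(names, text):
    """Case-insensitive substring search, similar in spirit to a Find."""
    needle = text.lower()
    return sorted({n for n in names if needle in n.lower()})


# "Apple, Raw" is a made-up item with no recode rule.
logged = list(RECODE_MAP) + ["Apple, Raw"]

matches = find_items(logged, "chocolate chip")  # related items, wherever they sort
cleaned = sorted({recode(n) for n in logged})   # unique names after recoding

print(len(set(logged)), "unique names before,", len(cleaned), "after")
```

Because the map is plain data, the same rules can be reapplied to a new export without repeating any manual steps, which is the reproducibility I was after.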
Once I located all the related items, I scrolled to their location in the Recode window and pasted in the cleaned item name. At one point as I worked through my data set, I accidentally closed the Recode window without saving my changes. Instead of repeating my work, I decided to explore an alternative strategy that I hoped would allow me to classify my items more quickly and easily assign new items to my food groupings.
I used the Free Text feature (found on the Multiple tab of the JMP Categorical platform launch dialog) to extract the list of unique words from my food item names. I reviewed the list to remove common or non-specific words and placed the remaining words into food categories. Then, I used a JSL loop to scan for these keywords in food item names using the PatMatch function in JMP. If I found a keyword, I added that word’s category to a comma-delimited list in a column saved with a Multiple Response column property.
While initially I thought this approach would make it simpler to classify new items, it turned out to be time-consuming for my script to search all items for all keywords. It took even longer for me to review all the classified food items and verify that they had been placed into sensible categories based on the keywords they contained. As I examined my processed table, I was dismayed to note many non-specific keyword matches. In one example, "Chicken of the Sea Chunk Light Tuna" matched both Meat (keyword: chicken) and Fish (keyword: tuna) food categories. "Coffeemate Non-Dairy Creamer" included the keyword coffee, causing it to be incorrectly assigned to the CoffeeMilk group. I realized that I would need to fix some of the original names before the pattern match and clean up other category lists after the match. Since I needed to reproduce each step through scripting, I would need to write custom JSL or generate data cleaning JSL with Recode, so I decided to go back to my original Recode strategy.
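To make the problem concrete, here is a rough Python sketch of the keyword scan. It is not my original JSL and does not use PatMatch. The keyword list is an assumption built from the examples above ("tuna" is assumed as the fish keyword because it is the fish-related word in the item name), and `categorize` with its `whole_words` option is a hypothetical helper.

```python
import re

# Assumed keyword -> category table, reduced to the examples in the text.
KEYWORD_CATEGORIES = {
    "chicken": "Meat",
    "tuna": "Fish",
    "coffee": "CoffeeMilk",
}


def categorize(item_name, keyword_categories=KEYWORD_CATEGORIES, whole_words=False):
    """Return a comma-delimited list of categories whose keyword occurs in the name."""
    lowered = item_name.lower()
    found = []
    for keyword, category in keyword_categories.items():
        if whole_words:
            # Match the keyword only as a standalone word.
            hit = re.search(r"\b" + re.escape(keyword) + r"\b", lowered) is not None
        else:
            # Plain substring match, which is what causes the Coffeemate problem.
            hit = keyword in lowered
        if hit and category not in found:
            found.append(category)
    return ", ".join(found)


tuna = "Chicken of the Sea Chunk Light Tuna"
creamer = "Coffeemate Non-Dairy Creamer"

print(categorize(tuna))                          # Meat, Fish
print(categorize(creamer))                       # CoffeeMilk
print(repr(categorize(creamer, whole_words=True)))  # ''
```

Whole-word matching would have kept the creamer out of CoffeeMilk, but it does nothing for the tuna, because "Chicken" really is a word in that brand name. Cases like that need a fix to the name itself or a manual cleanup afterward.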
Right around that time, newly hired JMP developer James Preiss began to revamp Recode for JMP 12. I shared my food log Recode use case with James, and many of my challenges lined up with customer requests already on his to-do list. As soon as updates to Recode began to surface in JMP 12 daily builds, I tested them with my food log files and shared a subset of my item list with James and Recode tester Rosemary Lucas. I was thrilled to see that many of the steps I did manually in JMP 11 with a combination of Recode, Data Filter, Find/Replace and JSL scripting are integrated into Recode in JMP 12.
In fact, long before the Recode platform updates were complete, I was able to create a table of cleaned, grouped item names from my food item list, in far less time than I had spent trying to script around the keyword matching problem. I then added categories for each cleaned food item name and merged them into my food log data table by joining on the original item name. Using Recode helped me cut the original number of unique names in my table (1,859) in half! Now, when I import new food log files, I return briefly to Recode to classify any new items, update my item name/category table, merge it with my data, and I am ready to proceed.
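The merge-and-summarize step can be sketched in a few lines of Python. This is only an illustration of the join on the original item name, not the JMP table join I actually ran. The calorie values, the "Sweets" category, the "Tuna, Canned" cleaned name, and the "New Granola Bar" item are all made up for the example, as are the function names.

```python
from collections import defaultdict

# Hypothetical lookup table: original name -> (cleaned name, category).
LOOKUP = {
    "CHIPS AHOY! Chewy Chocolate Chip Cookies": ("Cookie, Chocolate Chip", "Sweets"),
    "Jason's Deli Chocolate Chip Cookie": ("Cookie, Chocolate Chip", "Sweets"),
    "Chicken of the Sea Chunk Light Tuna": ("Tuna, Canned", "Fish"),
}

# Hypothetical food log rows; the calorie values are invented.
food_log = [
    {"item": "CHIPS AHOY! Chewy Chocolate Chip Cookies", "calories": 150},
    {"item": "Jason's Deli Chocolate Chip Cookie", "calories": 400},
    {"item": "Chicken of the Sea Chunk Light Tuna", "calories": 100},
    {"item": "New Granola Bar", "calories": 190},  # not yet classified
]


def join_categories(log, lookup):
    """Attach cleaned name and category to each row; collect unmatched names."""
    joined, unmatched = [], []
    for row in log:
        if row["item"] in lookup:
            cleaned, category = lookup[row["item"]]
            joined.append({**row, "cleaned": cleaned, "category": category})
        else:
            unmatched.append(row["item"])
    return joined, sorted(set(unmatched))


def calories_by(rows, key):
    """Total calories grouped by the given column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["calories"]
    return dict(totals)


joined, new_items = join_categories(food_log, LOOKUP)
print(calories_by(joined, "category"))  # {'Sweets': 550, 'Fish': 100}
print(calories_by(joined, "cleaned"))
print(new_items)                        # ['New Granola Bar']
```

The list of unmatched names plays the role of my brief return trip to Recode: those are the only items that need attention when a new food log file arrives.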
JMP 12 won’t be out till March 2015, so I’ll admit I am being purposefully vague about the many new features in the Recode platform. I love the Recode updates, and I know you will too! (Look for more detailed blog posts about Recode as March approaches and after the software is available.)
In my next blog posts, I will introduce some of the graphs I created for my Discovery Summit poster and show how I improved them with the help of Xan Gregg, creator of the Graph Builder platform and leader of the Data Discovery group at JMP.
For more background on my poster and my interests in quantified self (QS) data analysis, check out the first blog post in this series. Subsequent posts share details about how I exported my Excel-formatted Activity Summary files and Food Log files from the BodyMedia® Activity Manager software and imported them into JMP. I used custom JSL scripts to create two JMP data tables, one with 1,316 rows of activity data and the other with 34,432 rows of food items logged over nearly four years. I wrote a JMP add-in supporting these data types and CSV-formatted food log files from the free MyFitnessPal website.