I posted this entry on the main JMP blog today. Jeff suggested it would be helpful to share this content on my community blog too, so that it is searchable in here as well. I have been using the revamped JMP 12 Recode platform quite a lot since its early days and found it to be helpful for so many of my data cleanup projects. But starting as an early user had its disadvantages...to find out why, read on!
I recently used a JMP add-in I wrote to import my complete set of BodyMedia FIT food log data files, including data from Dec. 21, 2010, through the last day I logged my meals in that software on March 29, 2015. My final data table contained 39,942 rows of food items names. When combined with the 60 days (1,551 rows) of data from my MyFitnessPal food log I have been keeping since I switched devices from BodyMedia to a FitBit Charge HR late last March, I have nearly 41,000 rows of food log data!
One essential step in my data preparation process for this data table has been to clean up the food item names by consolidating similar items under a single value. Without this cleanup step, I end up with lots of unique item names that really represent the same food item. For example, I ate a variety of different dark chocolates, and indicated the correct brand names where available, but often had to substitute as many specific items were not available in the BodyMedia database. This made exact names even less meaningful, so I felt it made sense to aggregate all varieties under a single name (“Candy Bar, Dark Chocolate”).
I cleaned up the food log data table that I had presented at Discovery Summit last fall using the Recode platform in an early adopter install of JMP 12. At the time, Recode lacked the new Save/Load Script options that I now rely on for all my data cleanup projects. Instead, I recoded my items and created a lookup table that listed the original and recoded names for each unique item that appeared in my food log. I updated this lookup table each time I added new items to my food log. A section of my lookup table is shown below.
Before I left to attend the QS15 conference in San Francisco last month, I took advantage of JMP formula columns to create a script that could be reloaded into the Recode dialog to recapture my item groupings so that I could update them. This made it simple to recode new food item names and save the updated script in case I needed to tweak my work in the future.
Since I found this script creation trick so useful, I thought I would share it. If you have used a similar lookup table approach in the past, you may be thinking it would be a lot of work to recreate your approach in Recode. Using this example, you too can transition from a lookup table to a JMP 12 Recode script that you can reload in the Recode dialog and update to accommodate new data. In the end, you can easily recreate a new version of your lookup table from your final data table containing the original and recoded values.
I needed to structure my script to be identical to a standard Recode script, so I opened one I had saved for a related project. To save a sample Recode script, you can create some groups in an open Recode dialog window, then click on the platform’s red triangle and choose Script > Save to File.
I opened the JSL file containing the Recode script, and observed that it began with a call to begin updating the table, and a set of match statements that paired the original item name with the recoded name.
At the end of the script, it included the column name and end to the data update.
To fill in the list of paired original and recoded names that fell between the beginning and ending sections, I created a new formula column in my lookup table containing quoted versions of the original item name and the recoded item name.
I copied and pasted the formula column into a script window, and then added the correct statements and variable names at the top and bottom of the script.
When I reloaded it in the Recode dialog, my script created groupings like this:
I could then add to existing groups, make new ones or edit the representative group names. I used a similar approach to create a version of my script for recoding items into food groupings, since I had included food groups in my lookup table and wanted to group my new items.
One gotcha I encountered was that one food item name included a double quote character (“) that I had to replace so that it didn’t interfere with the quoted strings of the item names.
I hope you'll try out the JMP 12 Recode platform if you haven't already. Whether you are starting a new project or converting an old one to use the platform, I think you will be pleased! If you're already using it, I'd be interested to hear about the kind of projects you are working on!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.