JMPer Cable

anne_milley · Feb 26, 2018 09:18 AM

Data preparation gets even easier in the new version of JMP.

A 2014 New York Times article, For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights, seems to have generated a lot of interest in the title “data janitor.” There is even a Wikipedia page for it. Whatever you call it (data janitoring, data wrangling or simply data cleanup), the name is sadly pretty descriptive of the drudgery often associated with preparing data for analysis. In his TED Talk The Magic Washing Machine, the amazing late Hans Rosling described how the "magic" washing machine freed more time for his mother to pursue other things. Aren't there similarly innovative ways to automate "data washing"? Shouldn’t we expect a little less data drudgery and a little more time to pursue other things in this era of automation? We think so.

JMP has a long history of easing the burden of data cleanup work. JMP 14 contributes more data wrangling features and improvements to speed your data discoveries and get you to the "good stuff" of data analysis faster: visualization, modeling and building dashboards and reports to share your findings. After getting JMP 14 next month, be sure to explore some of these must-see data cleanup time savers.

Multiple-file import

The new multiple-file import capability allows you to load dozens or hundreds of files and concatenate them into a single JMP data table, all without scripting. You can filter files by size, date, name and type. A common use case is converting a folder of files that you want to perform text exploration on, for example a folder full of repair transcripts with one repair per file. The goal of the import is to map each document to a cell in a JMP data table as a pre-processing step before text analysis. With multiple-file import, you can choose to import to a single column, making this once time-consuming pre-processing step easy. Jason Brinkley, Senior Associate at Abt Associates, participated in our JMP 14 early adopter program and was happy about the time saved with this new feature:

“I recently had a big project where someone had downloaded over 150 data files scrapped together from the web in batches. These files were big and unstructured, but worst of all were downloaded at two levels (individual and group levels). There was no way to know which files had data at each level from the file names. The 'Multiple File Import' found in JMP 14 was a huge time saver in that it imported all the data, skipping the nuisance info at the top of the data files (website info and web page headers) and automatically stacked the files into the individual and group levels. The processing took about 30 minutes but in the end, I had two files that were easy to clean and make ready for analysis. I feel like this particular effort would have required significant coding in other platforms.”
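To make the one-document-per-cell idea concrete for readers who think in code, here is a minimal Python sketch of the same mapping. It is not how JMP implements Multiple File Import, and JMP itself needs no scripting for this. The function name `import_folder_to_rows`, the `pattern` and `min_bytes` filters, and the sample file names are all hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path


def import_folder_to_rows(folder, pattern="*.txt", min_bytes=0):
    """Return one row (dict) per matching file: its name and its full text.

    Hypothetical helper: filters by name pattern and minimum size, loosely
    mirroring the idea of filtering files before stacking them in one table.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.stat().st_size < min_bytes:
            continue  # skip files smaller than the size filter
        rows.append({
            "file_name": path.name,
            "text": path.read_text(encoding="utf-8"),  # whole document in one cell
        })
    return rows


# Build a small throwaway folder so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "repair_001.txt").write_text("Replaced pump seal.", encoding="utf-8")
    Path(tmp, "repair_002.txt").write_text("Tightened belt; no parts needed.", encoding="utf-8")
    Path(tmp, "notes.log").write_text("not a transcript", encoding="utf-8")
    rows = import_folder_to_rows(tmp)

for row in rows:
    print(row["file_name"], "->", row["text"])
```

Each dictionary plays the role of one table row, with the `text` field standing in for the single column that holds the entire document.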

Recode

Recode in the Cols menu has been around for a long time, but it continues to be improved in every new version of JMP. JMP 14 brings even more power to wrangle categorical and numeric data with enhancements like improved Text Parsing, Recode results column import, Recode numeric continuous columns, support for multiple-response columns, replace string (with or without custom regular expression parsing) and more.

I often look at titles of people who register for JMP events to see what kinds of scientists, engineers, researchers, and other analysts these events attract. The support for multiple-response columns in Recode is especially nice when looking at titles with credentials. If you had a column like the one below and wanted to separate the credentials, it used to involve many steps: text to columns, stacking, recoding, splitting, combining, and then pasting. Now you can just set the column's modeling type to Multiple Response and recode, and the cleanup is done with ease.

Before designating the column’s Modeling Type as Multiple Response.

Change from Nominal to Multiple Response for Modeling Type.


Fewer categories after recoding the credentials as multiple response.
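As a rough illustration of why treating the credentials column as multiple response reduces the number of categories, the sketch below does the equivalent by hand in Python. It is only an analogy for the idea, not JMP's Recode implementation. The `RECODE` mapping, the function name `recode_multiple_response`, and the sample credential strings are invented for the example.

```python
from collections import Counter

# Hypothetical recode map: spelling variants -> one canonical label.
RECODE = {
    "ph.d.": "PhD", "phd": "PhD",
    "p.e.": "PE", "pe": "PE",
    "m.s.": "MS", "ms": "MS",
}


def recode_multiple_response(cell, sep=","):
    """Split one cell into its individual responses and recode each one."""
    values = []
    for raw in cell.split(sep):
        key = raw.strip().lower()
        if not key:
            continue  # ignore empty pieces, e.g. from an empty cell
        value = RECODE.get(key, raw.strip())  # unknown values pass through unchanged
        if value not in values:
            values.append(value)  # keep each credential once per row
    return values


# Made-up sample column; every distinct string would be its own nominal category.
credentials = ["Ph.D., P.E.", "PhD", "MS, PE", "M.S.", ""]

recoded = [recode_multiple_response(c) for c in credentials]
counts = Counter(v for row in recoded for v in row)

print(recoded)
print(counts)
```

Treated as plain nominal values, the four non-empty sample strings are four separate categories. Split and recoded as multiple responses, they collapse to three credentials (PhD, PE, MS).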

Still want more? More data prep help

There are many other additions in JMP 14 to lessen the number of clicks and streamline the discovery process, such as:

Virtual join improvements that offer access to data in one table by another table with just a few clicks.
New column utilities that make it easier to create and work with Value Labels.
Ease of finding duplicate rows.
Other workflow improvements like preference searching and filtering and the ability to preview results in the Formula Editor.

We hope these highlights pique your curiosity about how JMP 14 can make your data janitoring more enjoyable — and give you more time to pursue other things.