JMP 13 Preview: Now you can “textcavate” your data with the new Text Explorer
Sep 2, 2016 10:22 AM
Last Modified: Dec 6, 2016 7:31 AM
JMP users might notice that new versions of the software often bring the ability to support new kinds of data. The ability to incorporate image data came with JMP 12, and with JMP 13 comes support for text data.
In the early days of this platform’s development, we were brainstorming ideas for what to name it. I proposed “Textcavator” as the platform would help you dig a little deeper and expose more value in your text data. Text Explorer is much clearer and better reflects what the platform does, and fittingly, the name came from someone who greatly influenced its development — read on to find out who that is.
“Our customers have a lot of data in verbatim fields — in surveys, repair records, incident reports, etc.— and they have no means to analyze this data. A great deal of effort goes into collecting this text data, and we wanted to include a basic facility so our customers could derive more value from their text data,” explains John Sall, SAS co-founder and Executive VP, and chief architect of JMP.
Customers said they wanted not only to digest text data, but also to interact with and explore it so they could see important phrases and create new columns and new graphics. Early adopters of JMP 13 were impressed with the speed, the interactivity and the graphics, and they liked the rich facility for specifying regular expressions to capture patterns such as part numbers and failure codes.
Its use in education was also important to John. “We felt that Text Explorer was an important element of our data mining methods, as it’s commonly taught in data mining courses,” John says.
The platform includes standard features you would expect: word frequencies, word clouds (either count ordered or centered) and stemming to handle word endings. In addition, the platform includes these features:
Very good regular expression facility with a library of pre-defined as well as user-defined regular expressions.
Detection of multi-word phrases and their use as tokens.
Built-in Recode to consolidate multiple terms.
Highly interactive command set (e.g., show the text containing phrases to easily see context).
Good performance (for a single machine).
With just a click on a phrase or term in the list, you can see how the phrases or terms are used in context.
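To make a few of the features listed above concrete, here is a minimal Python sketch of regex-aware tokenization, term counts, two-word phrase counts and a "show the documents containing this phrase" lookup. It is only an illustration of the general ideas, not how Text Explorer is implemented; the part-number pattern, the function names and the sample repair records are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical pattern: a part number such as "AB-1234" is kept as one token.
PART_NUMBER = r"\b[A-Z]{2}-\d{4}\b"
WORD = r"[A-Za-z]+"
TOKEN_RE = re.compile(f"{PART_NUMBER}|{WORD}")


def tokenize(text):
    """Split text into tokens; lowercase words, leave part numbers intact."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        tokens.append(tok if re.fullmatch(PART_NUMBER, tok) else tok.lower())
    return tokens


def term_counts(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def phrase_counts(docs, n=2):
    """Count every run of n consecutive tokens within each document."""
    counts = Counter()
    for doc in docs:
        toks = tokenize(doc)
        for i in range(len(toks) - n + 1):
            counts[" ".join(toks[i:i + n])] += 1
    return counts


def docs_containing(docs, phrase):
    """Return the documents whose token sequence contains the phrase."""
    target = phrase.split()
    n = len(target)
    hits = []
    for doc in docs:
        toks = tokenize(doc)
        if any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            hits.append(doc)
    return hits


# Made-up repair records, used only to exercise the functions above.
docs = [
    "Replaced pump seal on unit AB-1234 after leak.",
    "Pump seal leak reported again on AB-1234.",
    "Fan belt worn; replaced fan belt.",
]

print(term_counts(docs).most_common(5))
print(phrase_counts(docs).most_common(3))
print(docs_containing(docs, "pump seal"))
```

Because the part-number alternative comes first in the combined pattern, "AB-1234" survives as a single token instead of being split at the hyphen, which is the kind of result a user-defined regular expression is meant to give.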
This new platform was a team effort, with seven main developers:
Chris Gotwalt: text analytics* including SVD, LCA and sparse matrix
John Sall: other text analytics* and some user interface (UI) work
James Preiss (who is now attending graduate school): Recode and text data management
Because the Text Explorer platform is a basic text exploration facility, it has no understanding of vocabulary, parts of speech or syntax, has no spelling correction, and does not do sentiment analysis. Its non-language-specific approach, which is based on the document term matrix and derived methods like the SVD and LCA, is generally called a “bag-of-words” approach: the order of the words is ignored, and only their presence or count in the documents is analyzed. It is quite easy to use, with very few customization features to worry about (no ontologies). As the focus is to enable users to gain more value from verbatim fields, there are no extensive tools to access text in various file formats or sources (e.g., PDF, word processing documents or web crawlers).
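For readers new to the term, a document term matrix has one row per document and one column per term, with each cell holding a count. The short Python sketch below builds one by hand to show what "the order of the words is ignored" means in practice. It is a simplified illustration under our own assumptions (whitespace tokenization, made-up sample text, a hypothetical function name), not the platform's code, and it stops before derived methods such as the SVD.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


# Made-up documents; the first two use the same words in a different order.
docs = [
    "seal leak near pump",
    "pump leak near seal",
    "belt worn belt loose",
]

vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

The first two rows come out identical even though the sentences read differently, which is exactly the information a bag-of-words representation discards.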
You can learn more about the new Text Explorer platform and see it in action in an Analytically Speaking webcast with Adsurgo co-founder, Heath Rushing, who was very influential in the development of Text Explorer — and he is the one we have to thank for the name, Text Explorer (thank you, Heath!).
Heath also shares two more extensive demos in a Technically Speaking webcast. And along with James Wisnowski, Heath is presenting a session titled “Mind the Gap: JMP on the Text Explorer Express” in a few weeks at Discovery Summit. Developers and other early-adopter customers will be presenting more about text exploration at Discovery Summit, including a tutorial by Chris Gotwalt titled, “The U-to-the-V: A Hitchhiker’s Guide to JMP 13 Text Explorer.”
For more information on what's coming in JMP 13, visit the preview page at our website.
* We will explore the text analytics capabilities in JMP Pro 13 in a future post.