Sep 9, 2016 10:14 AM | Last Modified: Aug 21, 2017 8:18 AM
Building on the new features in JMP 13 for exploring unstructured text data, JMP Pro 13 enables you to do more with text data, like cluster terms and phrases and use text in predictive models. You’ll be able to answer more questions, scale to larger data and stay in flow. Because organizations collect so much text data, JMP now provides visual, interactive, and easy-to-use capabilities to analyze all that text data and make valuable use of it.
As Chris Gotwalt, Director of Statistical R&D at JMP, explains, some of the capabilities required for text analysis are analogous to those required for tabular data, so adding text analytics made sense for the product. “Text analytics is like general multivariate analysis. Topic analysis is like factor analysis. Singular value decomposition (SVD) is like principal component analysis, but these algorithms need to be fast enough to be useful on text data,” Chris says.
While SVD is a standard means of dealing with the high dimensionality typical of text data in most text mining software, the challenge is to compute it quickly so the user’s analysis “flow” is not disrupted. Chris has not only implemented a super-fast sparse Lanczos SVD, but also designed it to handle messy data well and yield more meaningful factors. This SVD implementation also supports topic analysis.
The "Show Text" option allows you to see text associated with a single data point or the text in common for several selected data points. Here it's surfaced in the SVD plot.
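To make the sparse SVD idea concrete, here is a minimal sketch in Python. It is not JMP's implementation: it swaps the Lanczos method for plain power iteration with deflation, which is simpler to read but slower, and it assumes whitespace tokenization and raw term counts. The function names (`build_dtm`, `truncated_svd`) and the toy comments are hypothetical. What it shares with the approach described above is that the document-term matrix is only ever touched through its nonzero entries.

```python
import math
import random


def build_dtm(docs):
    """Tokenize on whitespace and return (vocab, rows).

    Each row is a sparse dict {term_index: count}; zeros are never stored.
    """
    vocab = {}
    rows = []
    for doc in docs:
        row = {}
        for token in doc.lower().split():
            j = vocab.setdefault(token, len(vocab))
            row[j] = row.get(j, 0) + 1
        rows.append(row)
    return vocab, rows


def matvec(rows, v):
    """Compute A @ v, visiting only the nonzero counts."""
    return [sum(c * v[j] for j, c in row.items()) for row in rows]


def rmatvec(rows, u, n_terms):
    """Compute A.T @ u, visiting only the nonzero counts."""
    out = [0.0] * n_terms
    for ui, row in zip(u, rows):
        for j, c in row.items():
            out[j] += c * ui
    return out


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def truncated_svd(rows, n_terms, k, iters=200, seed=0):
    """Return k leading (singular value, document vector, term vector) triplets.

    Power iteration on A.T A with deflation; assumes k does not exceed the
    rank of the matrix.
    """
    rng = random.Random(seed)
    triplets = []
    for _component in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
        for _step in range(iters):
            w = rmatvec(rows, matvec(rows, v), n_terms)  # one step: A.T A v
            for _s, _u, prev in triplets:  # deflation: drop directions already found
                dot = sum(a * b for a, b in zip(w, prev))
                w = [a - dot * b for a, b in zip(w, prev)]
            length = norm(w)
            if length == 0.0:
                break
            v = [t / length for t in w]
        u = matvec(rows, v)
        s = norm(u)  # singular value estimate is the length of A v
        if s > 0.0:
            u = [t / s for t in u]
        triplets.append((s, u, v))
    return triplets


# Toy corpus (made up for illustration).
docs = [
    "engine noise engine vibration",
    "engine vibration loud noise",
    "screen cracked screen",
    "cracked screen replaced",
    "engine screen",
]
vocab, rows = build_dtm(docs)
triplets = truncated_svd(rows, len(vocab), k=2)
for s, u, v in triplets:
    top_terms = sorted(vocab, key=lambda t: abs(v[vocab[t]]), reverse=True)[:3]
    print(round(s, 3), top_terms)
```

Each triplet pairs a singular value with one vector over documents and one over terms; the terms with the largest absolute loadings give a rough reading of what each factor is about. A production tool would use a Lanczos-type solver for speed and would typically weight the counts first.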
JMP Pro 13 also includes latent class analysis (LCA), which is useful for another kind of topic analysis (as multinomial mixtures) as well as for clustering text data. This LCA clustering approach, customized for applications within Text Explorer, allows for overlapping cluster membership probabilities for each document. It also takes advantage of sparse data to calculate fast summaries that show where the high factor loadings are, which is important when dealing with the ultra-wide data so typical of text.
Latent Class Analysis clustering allows for overlapping cluster membership probabilities for each document.
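The multinomial-mixture view of LCA mentioned above can be illustrated with a short expectation-maximization (EM) sketch in Python. This is an assumption-laden toy, not the Text Explorer algorithm: the function names, the smoothing constant `alpha`, the random soft initialization and the made-up comments are all hypothetical. It does show the two properties the post highlights: each document gets a probability for every cluster rather than a hard label, and the updates loop only over nonzero counts.

```python
import math
import random


def build_dtm(docs):
    """Return (vocab, rows); each row is a sparse dict {term_index: count}."""
    vocab = {}
    rows = []
    for doc in docs:
        row = {}
        for token in doc.lower().split():
            j = vocab.setdefault(token, len(vocab))
            row[j] = row.get(j, 0) + 1
        rows.append(row)
    return vocab, rows


def fit_multinomial_mixture(rows, n_terms, k, iters=50, seed=0, alpha=0.01):
    """Fit a k-class multinomial mixture with EM.

    Returns (prior, term_prob, resp), where resp[i][c] is the probability
    that document i belongs to class c.
    """
    rng = random.Random(seed)
    n_docs = len(rows)
    # Start from random soft memberships.
    resp = []
    for _ in range(n_docs):
        weights = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(weights)
        resp.append([w / total for w in weights])

    prior, term_prob = [], []
    for _step in range(iters):
        # M-step: class sizes and per-class term probabilities (smoothed by alpha).
        prior = [
            (sum(r[c] for r in resp) + alpha) / (n_docs + k * alpha)
            for c in range(k)
        ]
        term_prob = []
        for c in range(k):
            counts = [alpha] * n_terms
            for r, row in zip(resp, rows):
                for j, cnt in row.items():  # only nonzero counts are visited
                    counts[j] += r[c] * cnt
            total = sum(counts)
            term_prob.append([x / total for x in counts])

        # E-step: update memberships, working in log space for stability.
        for i, row in enumerate(rows):
            logp = [
                math.log(prior[c])
                + sum(cnt * math.log(term_prob[c][j]) for j, cnt in row.items())
                for c in range(k)
            ]
            peak = max(logp)
            weights = [math.exp(x - peak) for x in logp]
            total = sum(weights)
            resp[i] = [w / total for w in weights]
    return prior, term_prob, resp


# Toy corpus (made up): two kinds of repeated comments.
docs = ["engine noise vibration"] * 3 + ["screen cracked replaced"] * 3
vocab, rows = build_dtm(docs)
prior, term_prob, resp = fit_multinomial_mixture(rows, len(vocab), k=2)
for doc, r in zip(docs, resp):
    print(doc, [round(p, 3) for p in r])
```

On real text the membership probabilities are usually less clear-cut than in this toy corpus, and EM can settle in different local optima depending on the starting point, so results should be checked across several seeds.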
JMP Pro provides text scoring with SVD scores and also saves the formula to calculate SVD scores for all analyses (any variable, scoring matrices, parses all tokens). You can also save the document-term matrix, SVD scores and LCA scores as inputs to other analyses, such as predictive models.
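As a rough picture of what a saved scoring formula does, the Python sketch below projects a new document onto stored term loadings. It assumes the textbook relation U = A V S⁻¹ for a document-term matrix A = U S Vᵀ, with raw counts and whitespace tokenization; the formula JMP actually saves may weight, center or scale terms differently. The function name `score_document`, the vocabulary, the loadings and the singular values are all hypothetical values made up for the example.

```python
import math


def score_document(text, vocab, loadings, singular_values):
    """Project a new document onto saved term loadings.

    Each score is (counts . v_i) / s_i, the document's coordinate on factor i.
    Tokens that are not in the vocabulary are skipped.
    """
    counts = {}
    for token in text.lower().split():
        j = vocab.get(token)
        if j is not None:
            counts[j] = counts.get(j, 0) + 1
    scores = []
    for v, s in zip(loadings, singular_values):
        dot = sum(c * v[j] for j, c in counts.items())
        scores.append(dot / s)
    return scores


# Hypothetical saved model: four terms, two orthonormal loading vectors.
h = 1.0 / math.sqrt(2.0)
vocab = {"engine": 0, "noise": 1, "screen": 2, "cracked": 3}
loadings = [
    [h, h, 0.0, 0.0],  # factor 1: engine/noise comments
    [0.0, 0.0, h, h],  # factor 2: screen/cracked comments
]
singular_values = [2.0, 1.0]

# "rattling" is not in the vocabulary, so it does not affect the scores.
scores = score_document("engine noise engine rattling", vocab, loadings, singular_values)
print(scores)
```

Because the scores are ordinary numeric columns, they can be appended to a table and used as predictors alongside any other variables.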
And of course, these implementations are integrated and interactive as you would expect them to be in JMP, with new graphics to visualize and further explore findings.
Heath Rushing, co-founder of Adsurgo, is a fan. "I have used many text mining tools. In terms of ease of use, Text Explorer is the best of the breed. You can efficiently clean unstructured data, visualize relationships, find major themes and group documents. Brilliant!" Heath says.
Whenever Chris has shown these text analytics additions, he runs out of time because the audience asks so many questions, such as “Does it do this or that?” or “Can I use this to analyze my survey, maintenance log, web data, etc.?”
“There is a lot of excitement when people see the platform. Upon first sight, they start thinking of all the new things they could do with the text data that they have always had lying around but could never take advantage of before,” Chris says.
John Sall, chief architect of JMP, also worked on the new text analytics features and enjoyed it. “It’s been fun to work on a new area, supporting one more form of data from which users can derive value,” John says.
To learn more about the new Text Explorer platform, watch the Analytically Speaking interview with Adsurgo co-founder Heath Rushing. Heath was very influential in the development of Text Explorer, including naming the new platform.