JMP 13 introduced a host of exciting new features to JMP customers. One of the most highly anticipated features was the Text Explorer platform, which allows analysts to understand the structure of text data. The buzz around Text Explorer is certainly justified: the platform can reveal patterns across a multitude of text entries that an individual reader could not otherwise detect, especially when the analysis involves thousands or millions of text sources.
As JMP customers continue to take advantage of Text Explorer, the JMP development team has made efforts to simplify the use of the platform and improve the interpretability of its results. These improvements focus specifically on Latent Semantic Analysis and Topic Analysis and are available in JMP Pro 13.2. In this post, I outline these efforts and emphasize how the changes facilitate analysts' use and interpretation of the Text Explorer platform.
Latent Semantic Analysis and Topic Analysis are two approaches for analyzing text. Previously, these analyses were carried out in JMP through a singular value decomposition (SVD) of the “Centered and Scaled,” “Centered,” or “Uncentered” document term matrix (DTM). The DTM is a matrix derived from your text data, with one row per document and one column per term, which summarizes how often terms occur within and across text documents. The SVD of the DTM led to insightful results, allowing JMP Pro users to identify key dimensions (aka singular vectors) underlying their text data and values that describe the meaning of such dimensions (aka term or document singular vectors).
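To make the DTM and its SVD concrete, here is a minimal NumPy sketch. It is not JMP's implementation: the toy corpus, the whitespace tokenizer, and every variable name are assumptions made for illustration, and Text Explorer's own tokenization and term weighting are more involved.

```python
import numpy as np

# Hypothetical toy corpus; each string is one document.
docs = [
    "battery drains fast",
    "battery life is short",
    "screen cracked after drop",
    "screen is bright",
    "battery and screen both fine",
]

# Vocabulary: sorted unique terms across all documents (naive whitespace split).
terms = sorted({word for doc in docs for word in doc.split()})

# DTM: one row per document, one column per term, cells hold term counts.
dtm = np.array(
    [[doc.split().count(term) for term in terms] for doc in docs],
    dtype=float,
)

# "Centered and Scaled": subtract column means, divide by column standard deviations.
centered = dtm - dtm.mean(axis=0)
scaled = centered / dtm.std(axis=0, ddof=1)

# SVD: columns of u are document singular vectors, rows of vt are term singular vectors.
u, s, vt = np.linalg.svd(scaled, full_matrices=False)

print(dtm.shape)        # (number of documents, number of terms)
print(np.round(s, 3))   # singular values, largest first
```

Multiplying the three factors back together, `(u * s) @ vt`, recovers the centered and scaled matrix, which is a quick sanity check that the decomposition is the one described above.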
A key improvement in the platform, implemented in 13.2, is that the SVD is now carried out on the “Centered and Scaled” or “Centered” DTM divided by N-1 (where N is the total number of rows in the DTM), or divided by just N if users select the “Uncentered” scaling option. Importantly, dividing the DTM by a constant (N-1 or N) doesn’t change the pattern of results in the platform, but it does make the resulting estimates much more interpretable.
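The claim that a constant divisor leaves the pattern of results intact can be checked numerically. The following is a rough sketch under stated assumptions: the DTM is simulated with Poisson counts rather than taken from real text, and all names are hypothetical. It shows that the singular vectors are unchanged (up to sign) while the singular values are simply rescaled by the constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DTM: 50 documents by 6 terms, with simulated counts standing in for real text.
dtm = rng.poisson(lam=2.0, size=(50, 6)).astype(float)
n = dtm.shape[0]

# "Centered" scaling option: subtract the column means.
centered = dtm - dtm.mean(axis=0)

# SVD before and after dividing by the constant N-1.
_, s_raw, vt_raw = np.linalg.svd(centered, full_matrices=False)
_, s_div, vt_div = np.linalg.svd(centered / (n - 1), full_matrices=False)

# Term singular vectors are the same, up to an arbitrary sign flip per vector.
same_vectors = np.allclose(np.abs(vt_raw), np.abs(vt_div))

# Singular values shrink by exactly the constant.
rescaled_values = np.allclose(s_raw / (n - 1), s_div)

print(same_vectors, rescaled_values)
```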
Thanks to the DTM being divided by N-1 or N, Latent Semantic Analysis is now identical to Principal Components Analysis (PCA). Similarly, Topic Analysis is now identical to a Rotated PCA (see Figure 2 for select output from both platforms). Thus, Term Singular Vectors are now identical to Loadings in a PCA context, and are consequently labeled “Term Topic Loadings.” When users select “Centered and Scaled” as the scaling option in Text Explorer (which is the default), results are identical to a PCA of a correlation matrix. Similarly, when “Centered” is chosen, the results parallel those of PCA of a covariance matrix, and when “Uncentered” is selected, the results will be just like PCA of an unscaled matrix.
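For readers who want to see the SVD and PCA connection outside of JMP, here is a small NumPy sketch. It assumes simulated Poisson counts in place of a real DTM and uses hypothetical variable names. It checks only the directions: the term singular vectors of the centered and scaled matrix line up with the eigenvectors of the correlation matrix, and those of the centered matrix line up with the eigenvectors of the covariance matrix. Because a constant divisor does not affect directions, it is left out here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical DTM: 200 documents by 5 terms, each term with a different average count.
dtm = rng.poisson(lam=[1.0, 2.0, 3.0, 4.0, 5.0], size=(200, 5)).astype(float)


def sorted_eigvecs(sym_matrix):
    """Eigenvectors of a symmetric matrix, ordered by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


centered = dtm - dtm.mean(axis=0)
scaled = centered / dtm.std(axis=0, ddof=1)

# "Centered and Scaled" SVD versus PCA on the correlation matrix.
_, _, vt_scaled = np.linalg.svd(scaled, full_matrices=False)
corr_vecs = sorted_eigvecs(np.corrcoef(dtm, rowvar=False))
matches_corr = np.allclose(np.abs(vt_scaled.T), np.abs(corr_vecs))

# "Centered" SVD versus PCA on the covariance matrix.
_, _, vt_centered = np.linalg.svd(centered, full_matrices=False)
cov_vecs = sorted_eigvecs(np.cov(dtm, rowvar=False))
matches_cov = np.allclose(np.abs(vt_centered.T), np.abs(cov_vecs))

print(matches_corr, matches_cov)
```

The comparison uses absolute values because the sign of each singular vector or eigenvector is arbitrary.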
Are you familiar with PCA? You can now translate everything you know about PCA into text analysis. Here are some examples:
To further facilitate the translation between Latent Semantic Analysis/Topic Analysis and PCA/Rotated PCA, I created the following illustration, which I call the Rosetta Stone of Text Explorer (as in the Egyptian stone that helped historians understand hieroglyphs). I include panels for the Text Explorer and PCA platforms with keywords used in reports that have equivalent meaning, and I include a panel for Factor Analysis because a special case of Factor Analysis is identical to Rotated PCA (the case where all communalities equal 1; see my post on this topic).
Figure 3 helps one see how the lingo from one platform relates to the jargon of the other.