Visualizing and Exploring Text Data


In this video, you learn how to visualize and explore unstructured text data using the file Pet Owner.jmp.

To analyze these data, we select Text Explorer from the Analyze menu in JMP.

We select Survey Response as the text column, and click OK to run the analysis using the default tokenizing, terming, and phrasing options.

The red triangle in the Text Explorer provides many options for displaying, terming, and parsing (or tokenizing) the text data. This enables you to modify the options used for curating the term list from within the platform.

We’ll click the red triangle and select Term Options, then Stemming, and then Stem for Combining.

We can see that "dog" and "cat" are the most frequently used terms. For now, we’ll keep them in the term list. We’ll also add the top phrases to the term list.
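Outside JMP, the idea behind a stemmed term list can be sketched in a few lines of Python. This is only an illustration: the responses are made up (they are not the Pet Owner.jmp data), `crude_stem` is a hypothetical suffix-stripping helper rather than JMP's stemming algorithm, and no stop words are applied.

```python
import re
from collections import Counter

# Hypothetical survey responses; not taken from Pet Owner.jmp.
responses = [
    "My dog barks at the mailman",
    "The dog is barking and jumping",
    "My cat jumps on the counter",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def crude_stem(word):
    """Strip a trailing 'ing' or 's' when at least three letters remain.

    A deliberately naive stand-in for real stemming, for illustration only.
    """
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Count every stemmed token across all responses.
term_counts = Counter(
    crude_stem(token) for response in responses for token in tokenize(response)
)

for term, count in term_counts.most_common(5):
    print(term, count)
```

With stemming, "barks" and "barking" are counted together under "bark", which is the reason for combining terms before reading the term list.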

To visualize these data, we’ll use a word cloud. To do this, we click the red triangle and select Display Options, and then Show Word Cloud.

The initial word cloud isn’t very interesting. We’ll change it to a centered layout. To do this, we click the red triangle for Word Cloud, select Layout, and then select Centered.

Notice that the word cloud is dominated by the words "dog" and "cat". To remove these terms from both the word cloud and the term list, we select the terms, right-click, and then select Add Stop Word.
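The effect of adding stop words can be sketched as a simple filter on the term counts. The counts below are invented for illustration and are not the values from the survey.

```python
from collections import Counter

# Hypothetical term counts; the real counts come from the Text Explorer term list.
term_counts = Counter({"dog": 40, "cat": 35, "bark": 12, "walk": 10, "jump": 9})

# Terms we no longer want in the term list or the word cloud.
stop_words = {"dog", "cat"}

# Keep only the terms that are not stop words.
filtered = Counter(
    {term: count for term, count in term_counts.items() if term not in stop_words}
)

top_terms = filtered.most_common(3)
print(top_terms)
```

Once the dominant terms are dropped, the remaining terms are ranked among themselves, so the less frequent but more descriptive words become visible.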

Let’s add a little color. To do this, we click the red triangle for Word Cloud, select Coloring, and then select Arbitrary Color. This applies random colors to the terms in the word cloud. We’ll click the red triangle again and deselect Show Legend to remove the legend.

Words are sized according to how often they occur in the term list. The largest words, "bark", "walk", "jump", and "video", are therefore the top terms.
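One way to picture frequency-based sizing is a linear mapping from term count to font size. This is a sketch under assumptions: the counts and the 10 to 40 point range are hypothetical, and JMP's actual scaling rule may differ.

```python
# Hypothetical counts for the top terms; not the real survey values.
counts = {"bark": 12, "walk": 10, "jump": 9, "video": 6}

# Assumed font size range in points (an illustrative choice).
MIN_PT, MAX_PT = 10, 40

lowest, highest = min(counts.values()), max(counts.values())

def font_size(count):
    """Map a count linearly onto the range MIN_PT..MAX_PT."""
    if highest == lowest:
        return float(MAX_PT)
    return MIN_PT + (count - lowest) * (MAX_PT - MIN_PT) / (highest - lowest)

sizes = {term: font_size(count) for term, count in counts.items()}
print(sizes)
```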

How is the word "protect" used in the survey responses? To see this, we’ll right-click the term "protect" and select Show Text.

You can see that the word "protect" is generally used to describe dogs being protective in some way.
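The same kind of lookup can be approximated with a substring search over the responses. The responses here are invented examples, not text from the survey.

```python
# Hypothetical responses; not taken from Pet Owner.jmp.
responses = [
    "My dog is very protective of the kids",
    "The cat likes to jump on the couch",
    "He protects the house when we are away",
]

term = "protect"

# A substring match also catches variants such as "protective" and "protects".
matches = [response for response in responses if term in response.lower()]

for response in matches:
    print(response)
```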

Let’s explore these data further.

This is a survey of cat and dog owners. It is likely that different terms are used to describe cats and dogs. To explore this, we select Local Data Filter from the top red triangle. Then we select Owner and then Add.

When we click cat in the local data filter, you can see that cat owners used words like "jump", "video", "mice", "purr", and "catch".

Dog owners used words like "bark", "walk", "protect", "take", and "huskies".

Not many of those surveyed own both a dog and a cat. But you can see that the most frequently used term is "chase". This is not surprising given the typical relationship between cats and dogs.
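Filtering by owner amounts to counting terms separately within each group. A minimal sketch, assuming made-up rows and hypothetical Owner labels ("Cat", "Dog", "Both"):

```python
from collections import Counter, defaultdict

# Hypothetical (owner, response) pairs; labels and text are illustrative only.
rows = [
    ("Cat", "jump purr video"),
    ("Dog", "bark walk protect"),
    ("Dog", "bark walk"),
    ("Both", "chase bark"),
]

# One Counter of terms per owner group.
counts_by_owner = defaultdict(Counter)
for owner, text in rows:
    counts_by_owner[owner].update(text.split())

for owner, counts in counts_by_owner.items():
    print(owner, counts.most_common(3))
```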

Let’s look at these data one more way. First, we’ll click Clear to clear the data filter.

Next, we’ll save indicators for the most frequently used terms to the data table. To do this, we select the top terms in the term list, right-click, and select Save Indicators.

This saves one new column to the data table for each selected term. If a document includes the term, the column contains a 1 for that row. If it doesn’t, the column contains a 0.
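The indicator columns follow a simple rule that can be sketched as follows. The responses and the chosen terms are hypothetical, and this sketch matches exact words only, without stemming.

```python
import re

# Hypothetical responses; not taken from Pet Owner.jmp.
responses = [
    "My dog will bark and walk",
    "The cat will purr",
    "We walk every day",
]

# Terms for which we want indicator columns (illustrative choice).
terms = ["bark", "walk"]

def tokens(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))

# One dict per response: 1 if the term appears in it, 0 otherwise.
indicators = [
    {term: int(term in tokens(response)) for term in terms}
    for response in responses
]

for row in indicators:
    print(row)
```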

Now we can explore these terms outside the Text Explorer platform.

For example, we can use Graph Builder from the Graph menu to graph Owner versus each of the indicator variables using a mosaic plot.

This can provide additional insights into how the different terms are used in the survey.
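A mosaic plot of Owner versus an indicator shows, for each owner group, the share of responses that contain the term. Those shares can be computed directly, as in this sketch with invented rows and hypothetical Owner labels:

```python
from collections import Counter

# Hypothetical (owner, indicator) pairs for a single term; values are illustrative.
rows = [
    ("Dog", 1), ("Dog", 1), ("Dog", 0),
    ("Cat", 0), ("Cat", 0), ("Cat", 1),
]

# Count each (owner, indicator) cell and each owner total.
cell_counts = Counter(rows)
owner_totals = Counter(owner for owner, _ in rows)

# Share of each owner's responses that include the term.
share_with_term = {
    owner: cell_counts[(owner, 1)] / owner_totals[owner] for owner in owner_totals
}
print(share_with_term)
```

A large difference in these shares between owner groups suggests that the term is used mostly by one kind of owner.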
