JMP Blog

Ross_Metusalem · Sep 30, 2020 02:50 PM

Suppose you’re thinking about streaming a new movie. You read a synopsis: “The ruler of the future tells best friends Bill and Ted they must compose a new song to save life as we know it. But instead of writing it, they decide to travel through time to steal it from their older selves.” While that does sound most excellent, you read some viewer comments, too. “Lots of fun.” “The perfect movie for where we are as a people.” “Very disappointed with the weak plot.” “Please, no more.” And so forth. You weigh the positive comments against the negative and decide to watch Bill and Ted Face the Music. (How much unwatched content do you really have left on Netflix at this point?)

We read viewer comments in situations like this because they tell us about people’s opinions, attitudes or feelings – things commonly called sentiment – often in relatively few words. Just a single word or phrase like fun, perfect movie or exceptionally poor can convey sentiment quite effectively, and knowing people’s sentiment can be very important. Do customers like our new product? What does the Twitterverse think about the latest news? Which provisions of the new law are most or least popular? Sentiment is at the core of the answer, and language helps us gauge it.

Simply reading a text is the best way to gauge sentiment. But what happens when there are hundreds, thousands, or even millions of texts to read? We need computers for that. Computers can’t comprehend language like humans do, but they can search for words or phrases that we believe convey sentiment, score texts based on those sentiment words, and then present the scores back to us in helpful ways. This can give us a large-scale picture of sentiment that we couldn’t feasibly get otherwise.

This is the essence of Sentiment Analysis, a text analytic technique in JMP Pro 16. Let’s see how it works.

Do You Like Coffee Pods?

We’ll spare you a deep dive on the latest Bill and Ted film in favor of a perhaps more everyday example: coffee pods, those little capsules used in single-serving coffee machines sometimes found in breakrooms, at continental breakfasts, or in some people’s kitchens. What do consumers think of coffee pods?

We’ll analyze 9,133 coffee pod reviews on Amazon taken from a larger data set of Amazon reviews found here. To avoid controversy, we’ll collectively analyze reviews of many different brands’ pods and won’t single out any specific products. In practice, though, we’d want to restrict the domain of our analysis tightly, perhaps analyzing reviews of just one of our own products or one of our competitors’, which would be more informative than looking at all coffee pods. We’d also be sure to exclude reviews of related things like coffee machines or accessories, or reviews focused on secondary issues like the packaging or delivery service, because what expresses positive sentiment in one domain (delivery was cheap) might not in another (the coffee tasted cheap).

Now before we go on, I hear some readers asking, “Why not just look at the star reviews?” Of course, we could and, in practice, would look at the star reviews, but (a) text is a much richer source of information and (b) in many situations (e.g., posts in public forums) text is the only information available. We’ll ignore the star reviews for now to keep the focus on Sentiment Analysis.

Sentiment Analysis is coming in JMP Pro’s Text Explorer platform, and it’s pretty easy to execute. Just select the Sentiment Analysis option, and JMP automatically finds sentiment words, scores each review, and produces a variety of outputs for us to fine-tune the analysis and explore the results.

Launching Sentiment Analysis

Fine-Tuning the Analysis

Let’s take a closer look at that output, starting with a few individual reviews to see how sentiment scores are calculated and, importantly, why and how we fine-tune our analysis.

Review #4297 is relatively positive, saying things like excellent, convenient and satisfied and scoring +60 overall (on a -100 to 100 scale, where zero is neutral sentiment). How did JMP Pro calculate this score? It consulted a built-in list of sentiment words and associated scores (a sentiment dictionary) and averaged the three sentiment words’ scores together. Pretty straightforward. (If preferred, a min-max scoring option calculates the overall score as the difference between the most positive and most negative words in the comment.)

Review #4297 is relatively positive
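The averaging step can be sketched in a few lines of Python. This is only an illustration, not JMP Pro's implementation: the per-word scores are hypothetical values chosen so that the three words average to the +60 reported for Review #4297, the word list is a stand-in for the review text, and `mean_sentiment_score` is a made-up name.

```python
from statistics import mean

# Hypothetical dictionary entries: the post does not give JMP Pro's per-word
# scores, so these are picked to average to +60.
SENTIMENT_DICTIONARY = {"excellent": 90, "convenient": 40, "satisfied": 50}


def mean_sentiment_score(words, dictionary=SENTIMENT_DICTIONARY):
    """Average the scores of the sentiment words found in `words`.

    Returns None when no sentiment word is present.
    """
    scores = [dictionary[w] for w in words if w in dictionary]
    return mean(scores) if scores else None


# Stand-in word list, not the actual text of Review #4297.
review_words = ["excellent", "coffee", "convenient", "and", "i", "am", "satisfied"]
print(mean_sentiment_score(review_words))  # 60
```

Words that are not in the dictionary simply do not contribute, which is why the score depends so heavily on what the dictionary contains.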

The algorithm that parses the text automatically handles negators (e.g., no) and intensifiers (e.g., very). Review #6034 is a nice example. The word good gets a baseline score of +60 by default, and that score increases to +84 when preceded by the intensifier really but becomes -60 when negated by not.

JMP Pro adjusts sentiment words' scores in the presence of negators or intensifiers
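As a rough sketch of how such adjustments might work, the Python below reproduces the three scores quoted for good. The 1.4 multiplier is inferred from the +60 to +84 example rather than taken from JMP Pro's documentation, and the function name and single-modifier simplification are assumptions made for illustration.

```python
# Assumptions for illustration only: the 1.4 multiplier is inferred from the
# +60 -> +84 example, and only one modifier directly before the word is handled.
BASE_SCORES = {"good": 60}
INTENSIFIERS = {"really": 1.4}
NEGATORS = {"not"}


def score_with_modifier(modifier, sentiment_word):
    """Score a sentiment word preceded by one optional modifier (or None)."""
    score = BASE_SCORES[sentiment_word]
    if modifier in INTENSIFIERS:
        score *= INTENSIFIERS[modifier]  # intensifiers scale the baseline
    elif modifier in NEGATORS:
        score = -score                   # negators flip the sign
    return round(score)


print(score_with_modifier(None, "good"))      # 60
print(score_with_modifier("really", "good"))  # 84
print(score_with_modifier("not", "good"))     # -60
```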

JMP Pro’s built-in sentiment dictionary is completely customizable, and it’s wise to tailor it to your domain. Look at Review #910 below. Nice and good are obvious sentiment words, but what about too strong, which is clearly negative in the domain of coffee pods? Recognizing that many words’ sentiments are domain-specific (e.g., rock solid might be positive for a car but negative for a mattress), JMP Pro’s built-in dictionary makes relatively few assumptions, letting you add to, subtract from, and modify the dictionary to tailor the analysis to your domain.

You can tailor the analysis to include domain-specific sentiment words or phrases like "too strong"

Customizing the dictionary is simple. To add too strong as a negative phrase, we simply highlight the phrase and click a button to assign it a score, here -30. The full analysis automatically updates, finding every instance of too strong and scoring it accordingly. This brought Review #910’s score down from +43 to +18, which seems an improvement.

Adding a new sentiment phrase is pretty painless
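To make the effect of a custom entry concrete, here is a small Python sketch of rescoring after a phrase is added. It is not JMP Pro's algorithm: the review text is a hypothetical stand-in for Review #910, the +25 score for nice is an assumed value picked so the sketch reproduces the +43 to +18 change, and rounding halves upward is likewise an assumption.

```python
import math
import re

# "good" = +60 comes from the post; "nice" = +25 is an assumed value.
dictionary = {"nice": 25, "good": 60}

# Hypothetical stand-in for Review #910, whose full text is not shown here.
review = "Nice aroma and a good price, but too strong for me."


def round_half_up(x):
    """Round to the nearest integer, with .5 going up (an assumption)."""
    return math.floor(x + 0.5)


def score_review(text, dictionary):
    """Average the scores of all dictionary terms found in the text.

    Longer phrases are tried first so "too strong" is matched as one unit.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    max_len = max(len(term.split()) for term in dictionary)
    scores, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                scores.append(dictionary[candidate])
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return round_half_up(sum(scores) / len(scores)) if scores else None


print(score_review(review, dictionary))  # 43
dictionary["too strong"] = -30           # the custom entry described in the post
print(score_review(review, dictionary))  # 18
```

Because the new entry lives in the dictionary rather than in one review, every other text containing too strong would be rescored the same way.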

There are a variety of additional tools for customizing the analysis, including a list of potential sentiment words you might want to add to the dictionary (here, it automatically flags words like strong, weak, bitter, expensive), the ability to modify word scores however you like, customization of negators and intensifiers, and more. Sentiment Analysis should always be tailored to the particular context in which it is applied, so if you end up using JMP Pro 16 for Sentiment Analysis, you’ll want to get familiar with these tools.

Exploring the Results

What can we find in these data once we’re done tailoring the analysis? First, the summary shows that sentiments skew relatively positive, with a mean score of +39.5. There are 6,856 net positive reviews with a mean score of +56.9 and 1,277 net negative reviews with a mean score of -38.7. It looks like people are relatively positive about their coffee pods, and the average positive review is more positive than the average negative review is negative. While we’re about to see a number of tools for diving into these scores in more detail, it’s worth noting that all reviews’ sentiment scores can be saved back to the data table, making them available for visualization and analysis in other tools.

A summary table provides an aggregated view of positive and negative sentiment

The accompanying histogram visualizes the distribution of sentiment scores
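If the scores are saved back to a data table, a summary like the one above is easy to recompute elsewhere. The Python sketch below shows one way to do it; the `summarize` function and the example scores are hypothetical and are not the coffee pod data.

```python
from statistics import mean


def summarize(scores):
    """Overall mean plus counts and means of net positive and net negative scores."""
    positive = [s for s in scores if s > 0]
    negative = [s for s in scores if s < 0]
    return {
        "overall_mean": mean(scores),
        "n_positive": len(positive),
        "mean_positive": mean(positive) if positive else None,
        "n_negative": len(negative),
        "mean_negative": mean(negative) if negative else None,
    }


# Hypothetical review scores on the -100 to 100 scale.
example_scores = [60, 84, 18, 0, -60, -30]
print(summarize(example_scores))
```

Note that a score of exactly zero counts toward the overall mean but falls into neither the positive nor the negative group in this sketch.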

There are some different ways we might want to drill down into the results further. First, which sentiment terms are appearing most frequently? Looking down the list, we see the most frequent words are generally positive, with words like great, bold and strong ranking highly. If we scroll down, we find the most frequent negative words are weak and bitter. If we click on one of these, we can see its usage in context to get a deeper understanding of people’s complaints.

Exploring comments by frequent sentiment words can be highly informative
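The same kind of exploration can be approximated outside JMP Pro. In this Python sketch, the sentiment terms are the ones mentioned above, while the three reviews and both function names are hypothetical and exist only to show counting terms and pulling up their contexts.

```python
import re
from collections import Counter

SENTIMENT_TERMS = {"great", "bold", "strong", "weak", "bitter"}

# Hypothetical reviews, not taken from the Amazon data set.
reviews = [
    "Great flavor, bold and strong.",
    "Too weak and a little bitter.",
    "Great value, but the decaf is weak.",
]


def term_frequencies(texts, terms):
    """Count how often each sentiment term appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w in terms)
    return counts


def contexts(texts, term):
    """Return the texts that mention `term`, for reading it in context."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]


counts = term_frequencies(reviews, SENTIMENT_TERMS)
print(counts.most_common())
print(contexts(reviews, "weak"))
```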

After exploring by most frequent sentiment words, we might be interested in specific features of coffee pods. That is, instead of looking at coffee pods holistically, let’s zero in on something specific like flavor. If we look to the Feature Finder, JMP Pro provides a list of words that often co-occur with sentiment terms, so we can explore sentiment related to just those words. Here, we select flavor from the list to see what people are saying about it. We can even click Score Selected Features to ask JMP Pro to rescore every comment specific to flavor only. This is a great way to drill down into a subdomain you’re most interested in.

The Feature Finder drills down into sentiments related to specific features like coffee flavor

Sentiment Analysis in JMP Pro 16

Sentiment Analysis, coming in JMP Pro 16, enables you to gauge how positively or negatively people feel about something, based on their written comments. Because the analysis should always be tailored to the specific context, the tool includes a variety of features to easily customize the sentiment dictionary and scoring system as you explore the data.

Once the analysis is fully tailored, interactive tools enable you to drill down into the results in a variety of ways to answer whatever specific questions you have. In our case, it looks like people are mostly positive about the coffee pods they get on Amazon, and they’re particularly fans of coffee that’s bold and strong rather than weak or bitter.

Want to uncover people’s feelings, opinions or attitudes? Gather some comments, turn Sentiment Analysis loose on them, and see what answers you find.

bob_lamphier · 10-09-2020

@Ross_Metusalem Your write-up on sentiment analysis is "Too Strong". And in this case, I would score that at least +60!

Ross_Metusalem · 10-09-2020

@bob_lamphier, I'll take it!

PCS · 02-21-2021

Nice blog. Looking forward to version 16!

Ross_Metusalem · 02-22-2021

Thanks, @PCS! We're excited to roll JMP 16 out in the very near future.

matthias_bruchh · 02-26-2021

Interesting feature.

Just a curiosity about the score calculation: Does this take the order of the words into account?

E.g. "really not good" and "not really good" mean something different. Would these expressions get different scores?

Ross_Metusalem · 02-26-2021

@matthias_bruchh, the scores in these two examples will be different. Negators like "not" are disallowed in the middle of sentiment phrases, so that "not really good" gets intensified and then negated while "really not good" gets negated but not intensified. This reflects one of the many rules that have been built into the system intended to handle the often messy phrase and sentence structures in natural language. What's nice: If you encounter a phrase like "really not good" and want it to have a particular score other than what the algorithm has assigned, you can just highlight it and assign your own score to override the algorithm (as in the "too strong" example above). The user ultimately gets the final say.

matthias_bruchh · 03-01-2021

@Ross_Metusalem I see, thanks.