Scott Wise's Blog

scwise · May 9, 2018 01:20 AM

It seems like one of the downsides of our world of fast internet news is that it is often hard to separate the real content from the fake. Any sensational new story will seemingly generate a ton of blogs, articles, and tweets with differing opinions. Wouldn’t it be nice if there were a tool that sorted through all this noise and helped us find the truth? While no “magic eight-ball” exists that can instantly tell us what to look at, there are newer analytic methods that may help. Text Exploration is the ability to find patterns in bodies of sentences, comments and text. Let’s see if this method can help us find the truth in a news story that caused quite a stir in my part of the world!

One of the signs of springtime in Central Texas is the wildflowers that bloom in colorful bunches along our fields and roadsides. Of particular importance is seeing how plentiful the Bluebonnets are, the most beautiful of the wildflowers and our official state flower. As a loyal Texan you are seemingly honor bound every spring to drag your family out and take pictures of them in a Bluebonnet patch. However, recently an ominous picture (see above) was posted on the Texas Hill Country Facebook page and spread like wildfire throughout the internet. This picture's description warned of the discovery of a rattlesnake that had evolved by taking on blue colors and hiding out in the Bluebonnets! Dubbed the “Texas Bluebonnet Rattlesnake”, it caused panic and generated many varied opinions on the internet, ranging from what scientific name to give this new rattlesnake to what to do if you fell victim to a snake bite while taking your next Bluebonnet pictures!

To see if this method can help find some truth, we started by copying the full body of text from nine random articles directly into a JMP data table column. Then we used the JMP Text Explorer to find the words and phrases that occur most often across all these write-ups. To do this, the Text Explorer first runs all this text through a library of known regular expressions to break it into individual terms (often called Tokenizing) and removes unhelpful text from the analysis (e.g., punctuation such as periods and commas, and connecting words like “and” or “but”). Text Exploration can also combine like words (often called Stemming), such as grouping “fail, fails, failed, failing” under the one root word “fail”. This leaves us with a list of Terms and Phrases that contains 539 total items (see below for a snapshot of a portion of the Term and Phrase List).
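For readers who want to see the mechanics, here is a minimal Python sketch of the same ideas: tokenizing, dropping connecting words, stemming, and counting terms. It is not what JMP runs internally. The sample sentences, the stop word list, and the crude suffix-stripping `stem` function are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the article text (not the nine real articles).
documents = [
    "The rattlesnake photo was a hoax. The photo fooled many readers.",
    "Readers feared the rattlesnake hiding in the bluebonnets.",
]

# A tiny illustrative stop word list (an assumption, not JMP's built-in list).
STOP_WORDS = {"the", "a", "was", "in", "many", "and", "but"}


def tokenize(text):
    """Lowercase the text and pull out word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())


def stem(word):
    """Very crude suffix stripping, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(docs):
    """Count stemmed terms across all documents, skipping stop words."""
    counts = Counter()
    for doc in docs:
        for token in tokenize(doc):
            if token not in STOP_WORDS:
                counts[stem(token)] += 1
    return counts


counts = term_counts(documents)
for term, n in counts.most_common(3):
    print(term, n)
```

A real stemmer handles far more cases than these three suffixes, but the sketch shows why “fails”, “failed” and “failing” all end up counted under “fail”.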

Then we can edit and shorten this Term list by removing further unwanted terms (called adding Stop Words) and combining others that logically belong together. Combining words like “found/discovered” through Recoding and adding helpful Phrases, such as “april fool’s”, helped us create the new Term and Phrase list below, which contains just 16 items. (See below for the reduced Term and Phrase List.)
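The same curation steps can be sketched in a few lines of Python. This is only an illustration of the idea, not JMP's implementation; the recode map, the extra stop words, and the sample sentence are hypothetical.

```python
# Hypothetical recode map: each synonym is folded into one combined label.
RECODE = {
    "found": "found/discovered",
    "discovered": "found/discovered",
    "fake": "fake/hoax/joke/prank",
    "hoax": "fake/hoax/joke/prank",
    "joke": "fake/hoax/joke/prank",
    "prank": "fake/hoax/joke/prank",
}

# Multi-word phrases that should be kept together as a single term.
PHRASES = ["april fool's"]

# Extra stop words chosen by the analyst (an assumption for this example).
EXTRA_STOP_WORDS = {"they", "the", "was", "an"}


def curate(tokens):
    """Join known phrases, drop extra stop words, and recode synonyms."""
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        for phrase in PHRASES:
            words = phrase.split()
            if tokens[i:i + len(words)] == words:
                out.append(phrase)
                i += len(words)
                matched = True
                break
        if matched:
            continue
        token = tokens[i]
        i += 1
        if token in EXTRA_STOP_WORDS:
            continue
        out.append(RECODE.get(token, token))
    return out


sample = "they discovered the photo was an april fool's joke".split()
curated = curate(sample)
print(curated)
```

Phrases are matched before stop words and recodes so that a word inside a phrase is not dropped or relabeled on its own.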

Lastly, we will make the analysis easier to read by visualizing the results in a graphical layout called a Word Cloud. This is a version of an “ordered layout” where the most frequent terms are listed first and given bigger font sizes. Some large terms are not much of a surprise, as we would expect all the articles to contain the words “rattlesnake, bluebonnet and photo.” But going down the list in our Word Cloud, we run into larger instances of words indicating that this story might be fake (see “april fool’s, fake/hoax/joke/prank” in the word cloud below) before we run into words indicating the story is real (see “real/fact/true” in the word cloud below).
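To make the “ordered layout” idea concrete, here is a small Python sketch that sorts terms by frequency and scales a font size linearly between a minimum and a maximum. The counts are invented for illustration and are not the counts from the nine articles; the point sizes and the linear scaling rule are also assumptions, not how JMP sizes its Word Cloud.

```python
def ordered_layout(counts, min_pt=10, max_pt=40):
    """Return (term, count, font_size) rows, most frequent term first."""
    if not counts:
        return []
    # Sort by descending count, breaking ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lo = min(counts.values())
    hi = max(counts.values())
    span = hi - lo
    rows = []
    for term, n in ordered:
        # Scale linearly; if every count is equal, use the largest size.
        scale = (n - lo) / span if span else 1.0
        rows.append((term, n, round(min_pt + scale * (max_pt - min_pt))))
    return rows


# Invented counts, for illustration only.
example_counts = {
    "rattlesnake": 30,
    "bluebonnet": 25,
    "april fool's": 14,
    "real/fact/true": 6,
}

rows = ordered_layout(example_counts)
for term, n, size in rows:
    print(f"{term:<16} count={n:<3} font={size}pt")
```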

In fact, if we highlight the words/phrases in the graphic (“april fool’s, fake/hoax/joke/prank”, and “myth/debunk/gotcha”), we can then go directly to the articles which contain these text elements. Using this Show Text feature, we were able to focus our attention on just the articles we need to see to uncover the truth.
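The idea behind this kind of lookup can be sketched in Python as a simple filter that returns only the documents mentioning any of the selected terms. The function name `show_text`, the sample articles, and the plain substring matching are assumptions for illustration; JMP's Show Text feature works on its own term list rather than raw substrings.

```python
def show_text(docs, terms):
    """Return (document index, matching terms) for docs containing any term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for index, doc in enumerate(docs):
        text = doc.lower()
        found = [t for t in wanted if t in text]
        if found:
            hits.append((index, found))
    return hits


# Hypothetical article snippets, not the real articles.
articles = [
    "A new blue rattlesnake species was discovered in Texas.",
    "The photo is a hoax posted as an April Fool's joke.",
    "Experts debunk the bluebonnet rattlesnake myth.",
]

selected = ["april fool's", "hoax", "myth", "debunk"]
hits = show_text(articles, selected)
for index, found in hits:
    print(index, found)
```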

This led us to the text from one of the nine articles, https://www.hoaxorfact.com/Pranks/texas-blue-bonnet-rattlesnake-new-species-discovered-facts.html, which not only reported this as a hoax but also found that the picture was actually a digitally altered version of a typical Western Diamondback Rattlesnake photo that is available on Wikipedia.org. Looking at a comparison of the pictures below, we are now pretty sure that this was just a good April Fool’s joke, and we should not fear taking our pictures in the bluebonnet patch!

As we can see, Text Exploration is a very flexible and powerful tool that helps us make sense of large amounts of text! Could it be applied on a larger scale, say to the world of politics, where we might be able to use this approach to sort through the fake news? Possibly, but we would have to hope that the words leading to truth are reported with more frequency than those that mislead or give false information. So perhaps this makes it all the more important to challenge what we hear and always seek and report the real truth. Otherwise we will all be easily misled, as Steve Earle warns in his classic song “Snake Oil”:

“Ladies and gentlemen, attention please,

Come in close so everyone can see,

I got a tale to tell,

A listen don't cost a dime,

And if you believe that we're gonna get along just fine”