Walking through The Walking Dead with text exploration

Walker, Biter, The Governor, Negan, JSS, Terminus: if these words raise your blood pressure like they do mine, then you’re probably just as big a fan of the TV show The Walking Dead as I am.

And if you’re a fan of the show but haven’t made it through season seven yet, then now would be a good time to stop reading this post because there will be spoilers. You've been warned!

The Walking Dead is a TV adaptation of a popular comic book series about sheriff's deputy Rick Grimes leading a group of survivors through the zombie apocalypse. While the zombies cause plenty of problems, we quickly learn that rival groups of survivors are the biggest threat to the well-being of Rick and his entourage.

With the eighth season of the show about to start, I got excited to find and analyze some data related to The Walking Dead. As I do anytime I want to learn about something, I headed over to Wikipedia and skimmed over the entry for the show. Using the Internet Open feature in JMP and a handful of table merges, I was able to put together a data set about The Walking Dead without much effort. My data set includes information like number of viewers, writer, director, and a short description (a paragraph or so) for each episode. The figure below gives you an idea of what my final table looks like:


A sample of what the data look like


*****Spoiler alert! This is your last chance to stop reading before there are serious spoilers!*****


The first thing I wanted to look at was a summary of the episode descriptions. The Text Explorer platform was added in JMP 13, and it has quickly become one of my favorite parts of JMP. Text Explorer is a great way to quickly start exploring your unstructured text data. After feeding the description of each episode into Text Explorer, the first thing I like to do is look at the list of most common terms and phrases.


A list of common words and phrases in episode descriptions


The list of terms shouldn’t be too surprising for fans of the show. It is more or less a who’s who of characters in the show. Rick is Rick Grimes, the main character of the show. We see some of the other main characters like Daryl, Carol, the Governor, and Michonne. “Walker” is the term that many of the characters use to refer to zombies, so it is no surprise that it is near the top of the list.

The list of popular phrases reminds us of some of the major storylines in the show. Rick and his group of survivors struggle with Negan and the Saviors, encounter hordes (and herds!) of walkers, and try to protect their compound in Alexandria. We’re also reminded of the power struggle between Rick and Shane and the romance between Glenn and Maggie. As we can see here, even just looking at the list of common terms and phrases can give us a lot of information about the descriptions of each episode.
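For readers who want to try the same kind of term and phrase counting outside of JMP, here is a minimal sketch in plain Python. It is not how Text Explorer works internally. The three descriptions and the stop-word list are made up for illustration, and the function names `tokenize`, `term_counts`, and `phrase_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical episode descriptions; the real ones came from Wikipedia.
descriptions = [
    "Rick wakes from a coma and searches for his family among the walkers.",
    "Rick and Shane argue while Glenn and Maggie grow closer.",
    "The Governor attacks the prison, and Rick and Shane are long past arguing.",
]

# A tiny illustrative stop-word list (an assumption, not the list JMP uses).
STOP_WORDS = {"a", "an", "and", "the", "from", "for", "his", "are", "of",
              "while", "among", "past"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts(docs):
    """Count each non-stop-word term across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


def phrase_counts(docs, n=3):
    """Count n-word phrases; stop words are kept so 'rick and shane' survives."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(" ".join(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts


terms = term_counts(descriptions)
phrases = phrase_counts(descriptions)
print(terms.most_common(3))
print(phrases.most_common(1))
```

Even on three toy sentences, the character names float to the top of the term list and "rick and shane" is the only repeated phrase, which is the same pattern the real lists show.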

Text Explorer provides another useful tool for finding trends in our text data: Topic Analysis. My colleague Laura provided a nice explanation of topic analysis in a recent blog post. Looking at the first 10 topics contained in the descriptions of Walking Dead episodes, we can clearly make out a few of the major storylines that have unfolded so far. The first topic seems to come from the third season, where Rick and his group are inhabiting a prison and have to defend themselves against the nearby town of Woodbury led by The Governor. After the prison is destroyed, Rick and his group are split up and eventually reconvene in Terminus, which we see in Topic 5. We could go through topic by topic (and I encourage you to do so in the comments!), but I think it is sufficient to say that as both a statistician and a fan of the show, I thought it was cool how well the topic analysis was able to recover the story arcs.


Topic Analysis of episode descriptions


Now if you’re a fan of the show, you know that season seven of The Walking Dead got started off with a serious shock: supervillain Negan killed fan favorites Abraham and Glenn. The season premiere seemed to rattle and upset a lot of fans, so I wondered if people stopped watching the show as a result. The data that I downloaded from Wikipedia includes the number of viewers, so I’m ready to fit a model and look at how viewership changes over time.

Now what does the number of viewers mean in the days of DVR and streaming? I’m not so sure, but it is a (crude) measure of the show’s popularity. If we plot the number of viewers against time, it sure looks like some fans tuned out after what happened to Glenn and Abraham. And this story suggests that the season didn't get much better after the premiere, as the list of least popular episodes contains a good bit of season seven.


Figure 1: Number of viewers for each episode of The Walking Dead


To learn about the show's viewership, I fit a change-point type model with covariates similar to what I did in a previous post about LeBron James. Here I’m using the Lasso in the Generalized Regression platform in JMP Pro to fit a model that will accommodate a moving average as well as other effects like who wrote the episode, who directed the episode, and outliers. It doesn’t seem like knowing the writer or director of an episode would impact a viewer’s decision to watch or not to watch an episode, but I certainly wouldn’t rule it out. Fans of the show can be pretty dedicated, or at least dedicated enough to watch The Talking Dead – a live recap show that follows each episode.
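To make the idea concrete, here is a rough sketch of a Lasso change-point fit in plain Python. It is not the algorithm used by the Generalized Regression platform, and the viewer numbers are invented. It assumes the underlying trend is piecewise constant, so each episode gets a step-function column (1 from that episode onward) whose coefficient is a possible level shift, plus one indicator column for season premieres. The helper names and the penalty value `lam=0.05` are hypothetical choices, and writer, director, and outlier effects are left out to keep it short.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; the core of the Lasso update."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, y, lam, max_sweeps=5000, tol=1e-10):
    """Minimize (1/2n) * RSS + lam * sum|coef| with an unpenalized intercept."""
    n = len(y)
    coefs = [0.0] * len(columns)
    intercept = sum(y) / n
    residual = [yi - intercept for yi in y]
    norms = [sum(v * v for v in col) / n for col in columns]
    for _ in range(max_sweeps):
        # The intercept absorbs the mean of the current residuals.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        biggest_change = abs(shift)
        for j, col in enumerate(columns):
            # Covariance of column j with the partial residual
            # (the residual with column j's own contribution added back).
            rho = sum(x * r for x, r in zip(col, residual)) / n
            rho += norms[j] * coefs[j]
            new_coef = soft_threshold(rho, lam) / norms[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, col)]
                coefs[j] = new_coef
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return intercept, coefs


# Made-up viewers (millions): two 8-episode seasons, premieres at episodes
# 1 and 9 get a bump, and the level drops starting at episode 11.
viewers = [15.0] + [14.0] * 7 + [15.0, 14.0] + [12.0] * 6
n = len(viewers)
episodes = range(1, n + 1)

premiere = [1.0 if t in (1, 9) else 0.0 for t in episodes]
step_starts = list(range(2, n + 1))
steps = [[1.0 if t >= k else 0.0 for t in episodes] for k in step_starts]

intercept, coefs = lasso_coordinate_descent(steps + [premiere], viewers, lam=0.05)
step_coefs = dict(zip(step_starts, coefs[:-1]))
premiere_effect = coefs[-1]
change_point = max(step_coefs, key=lambda k: abs(step_coefs[k]))

print("largest level shift starts at episode", change_point)
print("estimated shift:", round(step_coefs[change_point], 2))
print("estimated premiere bump:", round(premiere_effect, 2))
```

The penalty pushes most of the step coefficients to exactly zero, so the few that survive mark where the mean level appears to change. The surviving estimates are also shrunk toward zero, which is the usual price of the Lasso.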

So what did our model tell us? For one, it confirmed that the writer and director didn’t really seem to impact the number of viewers. It also suggests that season premieres tend to provide a boost of almost a million extra viewers. This shouldn’t be surprising because the show always does a good job leaving us hanging in between seasons and building tension for the premiere. Similarly, season finales also provide a bump of about a million extra viewers. There are also some outlier episodes like “The Day Will Come When You Won’t Be,” the season seven premiere. Again, not too surprising. The season six finale set up Negan doing something drastic to Rick’s group, and there was serious anticipation built up for “The Day Will Come When You Won’t Be.” Below is a summary of the biggest outlier episodes and how many more (or fewer) viewers, in millions, watched than expected.




But now on to the big question: Did viewership drop after Negan killed Glenn and Abraham? Once we adjust for outliers, directors, and the other covariates in our model, we’re left with an underlying mean function that we see in Figure 2. The plot of the raw data in Figure 1 is too noisy to draw any conclusions about what is happening, but the trend paints a much clearer picture. This plot suggests that about 400,000 viewers stopped tuning in immediately after the premiere of season seven. And within the next five episodes, the number of viewers had fallen by about 2 million.


Figure 2: Estimated trend function for the number of viewers


Of course, correlation does not imply causation, so we can’t say that the departure of Abraham and Glenn caused people to stop watching (however, this story sure suggests a causal relationship). Maybe the show was already on the slide? Our trend function suggests that the number of viewers peaked between episodes 50 and 68, or late season four through the premiere of season six. I consider that to be a strong stretch in the series (except “Slabtown” of course, that episode put me to sleep quickly); it took Rick and his crew from Terminus to Alexandria. But then our trend steadily drops by almost 1 million viewers over the course of the sixth season. Season six included Glenn’s infamous death fake-out; surely that storyline had to make some fans lose interest (I came close. I really didn't like that).

So with the new season about to start, you can bet that I’ll be watching. It will take more than losing a couple of main characters to get me to give up on the show. After all, I've already lost almost all of my favorite characters on the show: Shane, Tyreese, Abraham. But I have one more who is still standing strong. Keep hanging in there, Carol!



Interesting analysis/graphs! I stopped watching the show about that time, but it wasn't because of Abraham and Glenn getting killed, per se. In the beginning of the series, I liked where they were in the "discovery phase" and figuring out things about the zombies and surviving. Then it got into the soap-opera phase (which started to get boring). And then it got into senseless violence/meanness between living people. Violence against zombies is OK, but I just didn't really want to watch the violence between people (not just Abraham and Glenn getting killed, but in general). That's when I stopped watching.


Very informative and easy to follow. I am new to JMP and Text Explorer. I have started performing text analysis of thousands of medical notes. In these notes there are numerous words that should be excluded. In other words, tons of data cleaning through stop words. Is there an easier way to perform text analysis on a selected list of words and phrases? I have my list but don't know how to incorporate it in Text Explorer. Also, the context of these selected words matters. How can I account for the context? I know that I can show associated text and documents, but words are repeated many times across many documents/notes. Any ideas/suggestions?


Thank you,

I'm glad you found the post to be useful! Right now I don't think there is a convenient way to do the analysis on only a subset of the words; it would be a manual process like what you are currently doing. That's something we should consider for the future. Text Explorer also does not currently have a way to account for the context of a word; it is strictly based on the document term matrix (indicator variables for which words appear in the text).
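To illustrate what a document term matrix of indicator variables looks like, and what restricting it to a chosen list of terms might involve outside of JMP, here is a small sketch in plain Python. The notes, the list `selected_terms`, and the function `binary_dtm` are all hypothetical, and this is not a Text Explorer feature.

```python
import re

# Hypothetical notes and a hypothetical analyst-chosen list of terms.
notes = [
    "Patient reports chest pain and shortness of breath.",
    "No chest pain today; patient denies fever.",
    "Fever and cough for three days.",
]
selected_terms = ["chest pain", "fever", "cough"]


def binary_dtm(docs, vocabulary):
    """Return one row per document: 1 if the term or phrase occurs, else 0."""
    rows = []
    for doc in docs:
        # Lowercase words joined by single spaces, padded with spaces so
        # that only whole words and whole phrases match.
        words = re.findall(r"[a-z0-9']+", doc.lower())
        normalized = " " + " ".join(words) + " "
        rows.append([1 if f" {term} " in normalized else 0
                     for term in vocabulary])
    return rows


dtm = binary_dtm(notes, selected_terms)
for note, row in zip(notes, dtm):
    print(row, note)
```

The second note shows the context problem: "No chest pain" and "denies fever" both produce a 1, because an indicator only records that the words appear, not what is said about them.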