Finding the Source of Grandma’s Chili: Investigative Text Exploration
Nov 24, 2017 4:26 PM | Last Modified: Nov 25, 2017 8:03 AM
This blog by Scott Wise, JMP Technical Manager, seeks to use analytics to answer interesting questions that occur all around us. In this installment, we will use the latest text analytic methods to learn more about the differences in regional chili recipes and find out where my Grandma’s chili comes from!
One food critic once said that chili is the one food never to review, because it is guaranteed to start an argument with everyone. Not only do we each have our own favorite way of making chili, but everyone feels that their recipe is the best! To complicate matters, this very traditional American food has many regional variations.
That brings us to one of my more memorable moments after moving to Texas. There was a chili cook-off at my workplace, and I thought it would be a good idea to make my delicious Grandma Lillian's Chili, a family recipe from Cleveland, Ohio. As a bonus, I was also asked to be a judge in one of the chili competitions. Those of you who live in Texas can probably already guess how the day went. First, my Grandma Lillian's Chili (pictured above) was ridiculed because I had included beans! Second, I quickly learned that a chili cook-off in Texas means producing the very hottest, soupiest concoction known to man. Luckily, the hole burned in my stomach from tasting progressively hotter samples of chili did heal over time! But it got me wondering why my Grandma Lillian's Chili recipe was so different from what they make in Texas, and where it even came from.
First, we will start with the full recipe and cooking instructions for Grandma Lillian's Chili:
1 ½ to 2 pounds of ground chuck beef
Two 16-oz cans of red kidney beans, drained (Bush's light red)
1 can tomato soup (undiluted)
1 can (28 oz or larger) of whole tomatoes, cut up; don't drain, and add 1 tablespoon of sugar
1 cup of chopped onions
2 tbsp of chili powder
1 tbsp of flour
½ tsp of salt
1 tbsp of water
Brown the meat and the onions & combine in a pot
Add the beans, soup and tomatoes
Add the paste (chili powder, flour, salt, and water mixed together) and simmer for at least 45 minutes
Add more seasoning & ingredients to taste
Makes six bowls
Next, we will compare this recipe to other typical ones found throughout America. What we know today as chili seems to have come out of Texas as an adaptation of Mexican dishes. It then spread rapidly to the rest of the American Heartland through cattle drives and, later, World's Fair expositions. Along the way, chili was further adapted into other local styles of cooking. We collected at least three randomly chosen recipes for each of the twelve regional styles listed below.
Chili Con Carne
Usually very hot and nicknamed “Texas Bowl of Red”
Football fan favorite: chili served over Fritos corn chips
Chili with a Cajun twist
Features chicken and New Mexico’s famous green chilies
Made with plenty of chilies
Similar to Chili Con Carne
Kansas City Chili
Hearty chili with beans
Springfield Style Chilli
Heartier chilli spelled with an extra ‘l’ for Illinois
Meaty and typically served over spaghetti
Coney Chili Dog Sauce
Serious chili for hot dogs
Chicken-based chili with white beans
Plenty of vegetables, no meat
We organized the recipes into a data table with one row per recipe. Its column called “Ingredients” holds each recipe's ingredient list, so that we can use the modern approach of text analytics to better understand how our chilis differ. (See Data Table below.)
Next, we used the JMP Pro Text Explorer platform to look for patterns among the words in our “Ingredients” column. In a typical text exploration analysis, we make use of the software's built-in algorithms for tokenizing and stemming. Tokenizing uses a built-in library of regular expressions to break the text into words (tokens) that fit common patterns and to count how often each one occurs. Along the way, unhelpful words such as “and” and “but” are ignored. Stemming is often done at the same time: it combines words into their root word (for example, “cups” with “cup”). (See below for the Text Exploration setup.)
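To make tokenizing and stemming concrete, here is a minimal Python sketch. It is not JMP's implementation. The stop-word list, the letters-only token pattern, and the crude plural-stripping stemmer are all assumptions chosen for illustration. The stemmer uses Porter-style step 1a rules plus an “-oes” rule.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; JMP ships its own built-in list.
STOP_WORDS = {"a", "and", "but", "for", "of", "or", "the", "to", "with"}

# Tokens are runs of letters; digits and punctuation are skipped.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def stem(word: str) -> str:
    """Crude plural stripper (an assumption, not JMP's stemmer)."""
    if word.endswith(("sses", "ies", "oes")):
        return word[:-2]  # "chilies" -> "chili", "tomatoes" -> "tomato"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # "onions" -> "onion", "cups" -> "cup"
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into letter tokens, drop stop words, then stem."""
    return [stem(tok) for tok in TOKEN_PATTERN.findall(text.lower()) if tok not in STOP_WORDS]


# Two made-up "Ingredients" rows.
ingredients = [
    "2 cups of chopped onions and 1 can of tomatoes",
    "1 onion, 2 cans of kidney beans and chili powder",
]
term_counts = Counter(tok for row in ingredients for tok in tokenize(row))
print(term_counts.most_common(3))
```

Each row becomes a list of stemmed terms. Counting across all rows gives the kind of Term counts that Text Explorer reports. Here, “onions” and “onion” are merged, and so are “can” and “cans”.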
This gives us a first list of Terms (words), ranked by their counts across all of our “Ingredients.” From this list, we can identify additional unhelpful words that won't add to the study of the ingredients. So we designated words like “Teaspoon” and “Chopped” as Stop Words to exclude from the analysis. We can also add the most frequent Phrases (multi-word terms) to our list of Terms. (See below for the Term List and Phrases before editing.)
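The stop-word and phrase steps can be sketched the same way. In this hypothetical Python version, a phrase is two adjacent words not separated by a number, punctuation, or a stop word. Any phrase seen at least twice is promoted to a Term. The example rows and the extra stop words “tbsp” and “cups” are made up for illustration, and JMP's own phrase rules may differ.

```python
import re
from collections import Counter

# A few common words plus the user-designated stop words "teaspoon" and "chopped";
# "tbsp" and "cups" are hypothetical additions for this example.
STOP_WORDS = {"a", "and", "of", "the", "teaspoon", "tbsp", "cups", "chopped"}


def phrase_counts(docs: list[str]) -> Counter:
    """Count two-word phrases made of adjacent, non-stop-word tokens."""
    counts = Counter()
    for doc in docs:
        prev = None
        # Words are letter runs; runs of digits or punctuation act as separators.
        for tok in re.findall(r"[a-z]+|[^a-z\s]+", doc.lower()):
            if not tok.isalpha() or tok in STOP_WORDS:
                prev = None  # numbers, punctuation, and stop words break a phrase
                continue
            if prev is not None:
                counts[f"{prev} {tok}"] += 1
            prev = tok
    return counts


def promote_phrases(counts: Counter, min_count: int = 2) -> list[str]:
    """Phrases frequent enough to add to the Term list, most frequent first."""
    return [phrase for phrase, n in counts.most_common() if n >= min_count]


docs = [
    "2 tbsp chili powder, 1 chopped onion",
    "1 teaspoon chili powder and 2 cups kidney beans",
    "chili powder, ground beef, kidney beans",
]
print(promote_phrases(phrase_counts(docs)))
```

Breaking phrases at stop words keeps the sketch from inventing pairs such as “powder onion”. If stop words were simply deleted first, they would leave the words on either side adjacent.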
With the finished Term list, the software forms a Document Term Matrix (DTM). This is a table with one row per recipe and one column per Term, recording whether (or how often) each Term appears in that recipe. From this list we often generate a popular visual called a Word Cloud, which draws Terms with larger counts in larger fonts so they stand out at a glance. Now we are ready to see how high-occurrence ingredients such as “Onion” and “Chili Powder” relate to one another. (See below for the Term List after editing and the Word Cloud.)
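Here is a sketch of how a Document Term Matrix and word-cloud font sizes could be computed from already-tokenized rows. Several details are assumptions for illustration, not how JMP draws its Word Cloud: the linear scaling from counts to font sizes, the 10 to 40 point range, and the toy rows.

```python
from collections import Counter


def document_term_matrix(docs_tokens: list[list[str]], terms: list[str]) -> list[list[int]]:
    """One row per document, one column per term; each cell counts that term in the row."""
    rows = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        rows.append([counts[t] for t in terms])  # tokens outside the Term list are ignored
    return rows


def font_sizes(dtm: list[list[int]], terms: list[str], min_pt: float = 10, max_pt: float = 40) -> dict:
    """Scale each term's total count linearly onto a font-size range (hypothetical scheme)."""
    totals = [sum(col) for col in zip(*dtm)]
    lo, hi = min(totals), max(totals)
    span = hi - lo or 1  # avoid dividing by zero when every total is equal
    return {t: min_pt + (max_pt - min_pt) * (c - lo) / span for t, c in zip(terms, totals)}


terms = ["onion", "chili powder", "bean", "chicken"]
rows = [
    ["onion", "chili powder", "bean", "bean", "ground beef"],
    ["chicken", "onion", "green chili"],
]
dtm = document_term_matrix(rows, terms)
print(dtm)
print(font_sizes(dtm, terms))
```

Because “ground beef” and “green chili” are not on the finished Term list, they get no column. This mirrors how editing the Term list shapes the DTM.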
Lastly, we used a method in JMP Pro that works from the DTM to reduce its dimensions and group rows (recipes) that share common Terms and Phrases into Clusters. Here we fit a Latent Class Analysis model with 3 Clusters. Looking at these clusters, a few things jump out at us:
Cluster 1 seems to be dominated by chili recipes with ground beef, tomato sauce/paste, chili powder, and beans.
Cluster 2 contains chili recipes with chicken and green chilies.
Cluster 3 has chili recipes with red chilies and pork.
(See below for the Latent Class Analysis Cluster Results.)
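For readers curious about what a latent class model does with a DTM, here is a small, hypothetical Python sketch. It treats each cell as present or absent (0/1). It then fits a mixture of independent Bernoulli distributions with the EM algorithm, the classic latent class model for binary data. JMP Pro's own fitting details, starting values, and stopping rules are not shown here and may differ. The six-row matrix is made-up toy data, not our recipe table.

```python
import math
import random


def latent_class_em(X, k, n_iter=200, seed=0, eps=1e-6):
    """Fit a k-class latent class model (mixture of independent Bernoullis) by EM.

    X is a list of 0/1 rows (1 = the recipe contains the term).
    Returns (class weights, per-class term probabilities, per-row class probabilities).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Random starting term probabilities break the symmetry between classes.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(k)]
    post = []
    for _ in range(n_iter):
        # E-step: probability that each row belongs to each class, in log space.
        post = []
        for row in X:
            logs = []
            for c in range(k):
                ll = math.log(max(weights[c], eps))
                for j, x in enumerate(row):
                    ll += math.log(theta[c][j] if x else 1.0 - theta[c][j])
                logs.append(ll)
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: re-estimate class weights and per-class term probabilities.
        for c in range(k):
            nc = sum(p[c] for p in post)
            weights[c] = nc / n
            for j in range(m):
                prob = sum(post[i][c] * X[i][j] for i in range(n)) / max(nc, eps)
                theta[c][j] = min(max(prob, eps), 1.0 - eps)  # keep the logs finite
    return weights, theta, post


# Toy presence/absence DTM; columns are hypothetical Terms.
terms = ["ground beef", "bean", "chili powder", "chicken", "green chili", "pork", "red chili"]
X = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 1],
]
weights, theta, post = latent_class_em(X, k=3)
clusters = [max(range(3), key=row.__getitem__) for row in post]
print(clusters)
```

Each row's cluster is the one with the highest posterior probability. With random starting values, EM can settle in a local optimum. In practice it is common to refit from several seeds and keep the fit with the highest likelihood.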
Using the Cluster Probabilities by Row (see below), we can see that Grandma's Chili belongs in the first cluster, sharing its profile of ground beef, tomatoes, chili powder, and (kidney) beans. In fact, the meaty (ground beef), hearty (beans), and approachable (tomato and chili powder) attributes of my Grandma Lillian's Chili are well represented in many other Midwestern recipes in Cluster 1. Looking at each recipe in this group, we found a near match in Kansas City Chili Recipe #2. The kind of chili I encountered in Texas (Chili Con Carne) is better represented by Cluster 3. Its recipes are hot (chilies), spicy (oregano), often contain pork, and of course have no beans!
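Picking a recipe's cluster from the Cluster Probabilities by Row amounts to taking the largest probability in each row. We found the near match by reading the recipes ourselves. As a rough, hypothetical stand-in, the sketch below ranks candidate recipes by Jaccard similarity: shared terms divided by total distinct terms. The recipe names and term sets are invented for illustration.

```python
def assign_clusters(probabilities: list[list[float]]) -> list[int]:
    """For each row, return the index of the cluster with the highest probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in probabilities]


def jaccard(a, b) -> float:
    """Share of terms two recipes have in common (0 = none, 1 = identical term sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def nearest_match(target, candidates: dict) -> str:
    """Name of the candidate recipe whose term set is most similar to the target."""
    return max(candidates, key=lambda name: jaccard(target, candidates[name]))


probs = [[0.9, 0.05, 0.05], [0.1, 0.2, 0.7]]
print(assign_clusters(probs))

grandma = {"ground beef", "kidney bean", "tomato", "onion", "chili powder"}
candidates = {
    "recipe_a": {"ground beef", "kidney bean", "tomato", "onion", "chili powder", "cumin"},
    "recipe_b": {"ground beef", "tomato sauce", "chili powder", "spaghetti"},
}
print(nearest_match(grandma, candidates))
```

Jaccard similarity ignores how often a term appears, which suits a quick ingredient-list comparison. A count-based measure such as cosine similarity on DTM rows would be a reasonable alternative.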
Kansas City Chili Recipe #2 is indeed a near match for my Grandma Lillian's Chili. This makes sense given that my Grandmother Lillian grew up and learned to cook on a farm in nearby St. Joseph, Missouri. (See the ingredients for this Kansas City Chili below.)
Kansas City Chili Recipe #2 Ingredients (Missouri)
Text Exploration is a powerful additional tool for investigating all kinds of unstructured text that commonly resides in our data. From notes captured on warranty issues and lab tests to food recipes, this method opens up many opportunities to better understand our data. While it does take some user interaction and interpretation, it can really make a difference in finding new patterns among these text entries. But for now, I'll just enjoy my Grandma Lillian's “Kansas City Style” Chili and stay away from eating too much of the hot “Texas Chili Con Carne.” For as the Steve Miller Band warns in their 1970 song “Hot Chili”: “It's hotter than noon, It will melt your spoon, So buddy, you better get ready, For eatin' hot chili.”