The crime rate conundrum

calking · Jan 7, 2019 04:41 PM

Every year they arrive.

“[Such and such a place] was ranked the Safest City in America!”

“The Top 10 Most Dangerous Cities in 2018"

“You’ll Never Guess the City with the Most [Insert Crime of Choice]”

Maybe it started as a spark of curiosity from my Internet Explorer newsfeed (before I made the switch to Google Chrome and never looked back…). More likely, it started when I was a resident of Albuquerque, which last I checked was unfortunately still struggling with a rampant property crime wave (see here for a somewhat humorous take on the situation). In any case, I’ve become intrigued by the annual lists put out by the likes of Money magazine or Niche that claim to showcase the best and/or safest cities in America.

As one might expect, there’s quite a bit of information that goes into creating these rankings. The best sites will describe their methodology, and it often varies from site to site. However, nearly all of them base their crime rate information on the usual gold standard: the FBI Uniform Crime Reports. These reports are gathered on an annual basis and contain counts of reported offenses displayed in various tables. Of particular interest to me (and those generating these lists) are the tables that gave the reported offenses at the city level (Table 6 for years up to 2016 and Table 8 for 2017). After some “fancy” data importing and manipulation, I was able to compile a database of reported offenses for cities throughout the US from 2010 to 2017.

Now, to the FBI’s credit, they make it explicitly clear that their data should not serve as the sole basis for comparing cities and other regions, most likely for the following reasons:

These counts are based on reported offenses, meaning that there’s a good chance the true count is being underestimated.
Not every city reports their crimes to the FBI every year. In fact, there are some years where only a handful of cities in a particular state report their data.

That’s why the sites doing the ranking will typically have other types of data included in their computations. For my purposes, I was simply interested in using the data as a starting point for further investigations, but I still think it is important to acknowledge and share their warning as part of good statistics practice.

Crunching the Numbers

After getting the reported counts, I then computed the crime rate in incidents per 1,000 residents. While the usual standard is to compute the rate per 100,000, I decided on this level as it makes it easier to assess both large and small cities (i.e., it may be harder to think of a crime rate per 100,000 when you only have a population of a few thousand). I then performed some cursory exploratory data analysis, as all good statisticians do. I started looking around for cities with the highest crime rates, and there was, in fact, one that caught my eye: Lakeside, CO.
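As a minimal sketch of the rate calculation described above, assuming counts and populations are plain numbers, the function below computes incidents per 1,000 residents by default and per 100,000 on request. The name `crime_rate` and its `per` parameter are illustrative only; this is not the code behind the original analysis, which was done in JMP.

```python
def crime_rate(incidents: int, population: int, per: int = 1_000) -> float:
    """Reported incidents per `per` residents (per 1,000 by default)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so simple cases stay exact in floating point.
    return incidents * per / population

# A hypothetical city of 5,000 residents with 40 reported incidents:
print(crime_rate(40, 5_000))           # 8.0 per 1,000 residents
print(crime_rate(40, 5_000, 100_000))  # 800.0 per 100,000 residents
```

The per-1,000 scale keeps the numbers readable for small towns, while the per-100,000 call shows the more common reporting standard for comparison.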

Take a look at Figure 1.

Figure 1: Based on these numbers, you'd seldom find a more wretched hive of scum and villainy!

You probably noticed that the crime rate is on a logarithmic scale, which certainly can’t be a good sign. But look at the numbers! We’re talking roughly 600 violent crimes per 1,000 residents in 2013 and 2014 alone! And the property crime is off the charts! What is this dreadful lawless place where criminals seem to run free?!

Well, let’s cut back on the number of exclamation points and think critically about what we’re seeing here. The first thing should be to investigate the numbers that go into computing the crime rates. Doing so, we find that the bustling metropolis of Lakeside, CO, has a whopping grand total of … eight people. That is, as of 2011; in 2010, it had a population of 19. Got to make sure we're being accurate with our numbers here.

If we look at the Violent Crime count for 2013 and 2014, there were five reported incidents. Now the FBI Crime Report breaks down Violent Crime into four categories: Homicide (murder), Rape, Robbery, and Aggravated Assault. You’ll be happy to know that there’s not been a reported murder in Lakeside since 2010 (which also makes sense; otherwise, the population would have dropped significantly after 2013). In fact, these counts can be pretty evenly divided between robbery and aggravated assault.
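To see how such a tiny denominator produces such an alarming number, here is the arithmetic, assuming the population of eight (reported as of 2011) still applies in 2013 and 2014 and that the five incidents are a single year's count:

```latex
\[
\text{rate} = \frac{5 \text{ incidents}}{8 \text{ residents}} \times 1{,}000 = 625 \text{ per } 1{,}000 \text{ residents}
\]
\[
\Delta\text{rate per additional incident} = \frac{1{,}000}{8} = 125
\qquad\text{versus}\qquad
\frac{1{,}000}{800{,}000} = 0.00125 \text{ in a city of } 800{,}000
\]
```

In other words, a single extra report in a city of eight moves the rate by 125 per 1,000, which is why the figure swings so wildly.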

Now about that property crime rate. It’s well above one for nearly the entire time, though it skyrockets in 2013. Looking at the Property Crime counts, we’re talking 500-600 reported incidents. For eight people!! Ok, I’ll stop yelling now. It turns out that, just like Violent Crime, the FBI divides Property Crime into four categories: burglary, theft, auto theft, and arson. You might be wondering what the difference is between burglary and theft, and why these two are considered different from robbery. It’s best explained with some examples. Burglary is typically what you think of if you’re worried about your house being broken into. Theft is when a pickpocket or a purse snatcher takes your belongings. What distinguishes those two from robbery is that neither involves force or threats of violence against the victim. If someone were to point a gun at you and demand your valuables, that’s robbery. With that clarified (you weren’t planning on sleeping tonight, right?), if we delve into the counts, the overwhelming majority of the reported Property Crime incidents are due to theft, with a smattering of burglary and auto theft rounding out the remainder.
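The category structure above can be sketched as a simple mapping from each top-level category to its subcategories, with helpers that roll subcategory counts up into totals and report each subcategory's share. The labels and helper names (`roll_up`, `shares`) are hypothetical, not official UCR column names, and the example counts are made up purely for illustration; they are not Lakeside's actual figures.

```python
# Subcategories as described in the text (labels are illustrative, not official UCR names).
UCR_CATEGORIES = {
    "Violent Crime": ("murder", "rape", "robbery", "aggravated assault"),
    "Property Crime": ("burglary", "theft", "auto theft", "arson"),
}

def roll_up(counts: dict[str, int]) -> dict[str, int]:
    """Sum subcategory counts into their parent category; missing subcategories count as 0."""
    return {
        category: sum(counts.get(sub, 0) for sub in subs)
        for category, subs in UCR_CATEGORIES.items()
    }

def shares(counts: dict[str, int], category: str) -> dict[str, float]:
    """Fraction of a category's total contributed by each of its subcategories."""
    subs = UCR_CATEGORIES[category]
    total = sum(counts.get(sub, 0) for sub in subs)
    if total == 0:
        return {sub: 0.0 for sub in subs}
    return {sub: counts.get(sub, 0) / total for sub in subs}

# Made-up counts for illustration only.
example = {"robbery": 3, "aggravated assault": 2, "burglary": 10, "theft": 85, "auto theft": 5}
print(roll_up(example))                   # {'Violent Crime': 5, 'Property Crime': 100}
print(shares(example, "Property Crime"))  # theft dominates the property total
```

Breaking a total down this way is what reveals, for example, that a headline property crime rate is driven almost entirely by theft.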

Getting a Second Opinion

Now that we have a better understanding of what types of crimes occur here, perhaps getting a look at Lakeside itself may help give us an even better picture of what’s going on. In Figure 2, we have a Google Maps view of the “city” in question. I added the red outline to help it stand out more.

Figure 2: Well now we know where the Lake part comes from...

As you can see, it’s actually a very small suburb of the Denver metropolitan area. I’ve circled in gold where the eight residents live. You’ll also notice that I highlighted a few other things in purple. One shows the Lakeside Shopping Center, which I presume is filled with everything Lakeside residents could ever need or want. The other is an amusement park (which you may have guessed from the picture Google selected to go with it). Given that these two make up the majority of the “city”, you can start to get a picture of what might really be going on here. Not a complete picture, mind you, but at least things start to make more sense than the raw crime rate would have you believe.

Given this situation, I decided to create a custom “crime score” for my analysis that would hopefully be more robust to such cases. The score is the crime rate multiplied by the city’s Probability Score, which is essentially the value of the empirical cumulative distribution function (ECDF) at that city’s population, with the ECDF estimated from all city populations (on a year-by-year basis to account for any fluctuations). My reasoning is as follows: if you have two cities, each with a crime rate of 500 per 1,000 residents, but one has a population of eight and the other a population of 800, then you might consider the larger city to be much less safe than the smaller one, given the other facets involved. Of course, I make no claims that my crime score is perfect; I’m sure someone could provide a valid reason why another metric might be better suited. But it seems to do a decent job for my purposes.
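A minimal sketch of the score described above, assuming the ECDF is the usual right-continuous one (the share of cities with population less than or equal to x) and is built from a single year's populations. The helper names `ecdf` and `crime_score` are hypothetical, and the population list in the usage lines is an illustrative set, not the full FBI city list.

```python
from bisect import bisect_right
from typing import Callable, Iterable

def ecdf(populations: Iterable[int]) -> Callable[[float], float]:
    """Empirical CDF of one year's city populations: F(x) = share of cities with population <= x."""
    data = sorted(populations)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one population")

    def F(x: float) -> float:
        # bisect_right returns how many sorted values are <= x.
        return bisect_right(data, x) / n

    return F

def crime_score(rate: float, population: int, F: Callable[[float], float]) -> float:
    """Crime rate weighted by the city's ECDF value (its 'Probability Score').
    Since 0 <= F <= 1, the score never exceeds a non-negative rate."""
    return rate * F(population)

# Illustrative populations for a single year (not the full data set).
F_year = ecdf([8, 800, 5_000, 26_769, 164_835])
print(crime_score(500, 8, F_year))    # tiny city: rate weighted by 1/5
print(crime_score(500, 800, F_year))  # larger city: rate weighted by 2/5
```

Because the largest city in a given year has an ECDF value of 1, its score equals its rate, which matches how New York's rate and score coincide in the table below.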

The following table illustrates the impact of using my custom score vs. the raw crime rate. The data are computed using the reported offenses from 2016 and, as a reminder, the crime rates are in incidents per 1,000 residents.

City	Population (2016)	Violent Crime Rate (per 1,000)	Violent Crime Score	Property Crime Rate (per 1,000)	Property Crime Score
New York, NY	8,566,917	5.73	5.73	14.62	14.62
Los Angeles, CA	4,007,905	7.19	7.19	24.74	24.73
Chicago, IL	2,725,153	11.05	11.05	31.91	31.9
Cary, NC	164,835	0.92	0.91	10.05	9.89
East St. Louis, IL	26,769	28.28	23.78	22.56	18.97
Lakeside, CO	8	500	0.1	39875	8.33

As you can see, the crime score can never exceed the raw crime rate, since the Probability Score is at most one. You can also see how the smaller cities are more strongly affected, which is also by design. The East St. Louis entry shows how a relatively high crime rate can still lead to a high crime score in spite of a relatively small population, as I had hoped. I am willing to admit that there might be cases where the score changes the rate enough to alter a city's relative standing (i.e., a city with a higher crime rate than another suddenly switches places with it in terms of crime score), but I would argue that would happen either with a very small city (which the score is designed to do) or with cities so close in crime rate that the change is practically negligible. But again, I'm open to constructive criticism and other measures.
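One way to check the rank-switching concern is to list the pairs of cities whose order by rate disagrees with their order by score. This sketch uses the 2016 violent crime figures copied from the table; the helper name `swapped_pairs` is hypothetical, and ties in either column are not counted as swaps.

```python
from itertools import combinations

# 2016 violent crime (rate, score) per 1,000 residents, copied from the table.
VIOLENT_2016 = {
    "New York, NY": (5.73, 5.73),
    "Los Angeles, CA": (7.19, 7.19),
    "Chicago, IL": (11.05, 11.05),
    "Cary, NC": (0.92, 0.91),
    "East St. Louis, IL": (28.28, 23.78),
    "Lakeside, CO": (500, 0.1),
}

def swapped_pairs(table: dict[str, tuple[float, float]]) -> list[tuple[str, str]]:
    """Pairs of cities ordered one way by rate and the opposite way by score."""
    swaps = []
    for a, b in combinations(table, 2):
        rate_a, score_a = table[a]
        rate_b, score_b = table[b]
        # Opposite signs mean the two orderings disagree.
        if (rate_a - rate_b) * (score_a - score_b) < 0:
            swaps.append((a, b))
    return swaps

# The score should never exceed the rate.
assert all(score <= rate for rate, score in VIOLENT_2016.values())
for pair in swapped_pairs(VIOLENT_2016):
    print(pair)
```

For these six cities, every swapped pair involves Lakeside, CO, which is exactly the very small city the score is meant to pull down; the larger cities keep their relative order.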

Context Is King

In summary, I think the case of Lakeside, CO, serves as a warning never to mindlessly accept data at face value. Even with something like a crime rate, which is intended to put small and large cities on an even footing by accounting for differences in population, there are still going to be issues that can’t be resolved by one measure alone. It's also a great example of why you should investigate outliers rather than just throw them out, as there's often valuable insight hidden in their strange behavior. Either way, it shows the importance of having a more complete picture.

I would say this also rings true about the data in general. Recently, Alberto Cairo gave a great talk on accurately visualizing data. One of his main points was that the purpose of a graphic is to start a conversation. I claim the same could be said about this data (and the graphics I generated along with it). Rather than analyze it from the standpoint of fear (i.e., “Oh wow! I’m avoiding that place like the plague!!”), I choose to analyze it from the standpoint of curiosity. For example, after finding a city with a high crime score for a particular crime, I then do a quick search for more contextual information. Perhaps the city used to have a thriving industry that is now in decline. Or perhaps it has had a tumultuous history that is still being played out.

Furthermore, I fully understand that one cannot treat an entire city as one object. Some parts of a city suffer more than others, so I have other resources to which I turn for a more complete evaluation. All in all, it’s as I like to say: Context is King (no pun intended…OK, maybe a little).

I’ve attached my data set for you to explore on your own. You can also check out some of the graphics I generated over on JMP Public, along with some other cool visualizations by other users. These graphics can also be generated using the table scripts in the attached data table.

Phil_Kay · 01-17-2019

Nice. When you start exploring the data you often find it is telling you something very different from the media headlines.