Good afternoon.
Today we're going to be talking about understanding
crime rates and crime prediction.
Before we go into the data, let me introduce the team.
We have Karanveer, our data scientist and data modeling expert,
and myself, Grant Lackey, as a data researcher and
data visualization specialist.
So before we get into the data, let's do an overview
of the entire presentation.
We're going to begin with the background
which will be the initial data sets and why we chose our data.
The data overview, which again is going into the reason
why we chose our data and what we're going to be trying to answer.
The business problems, which are the problems that
we had with our data:
why we're trying to answer certain
questions and the overall idea of the entire project.
Next is our methods and plans, which is our procedure of answering
our business problem and then our results,
which are the results of our methods and plans.
Our applications, which are real-life implications
from our results, and post-analysis,
which is what we could include or add on to our results.
What we could add on to improve upon this in years to come.
Beginning with background, why should we care about crime rate?
Well, crime is important to everyone, and it's everywhere in the United States,
and so what is crime rate and how can we define it?
How we define crime rate is our initial criminal activity,
divided by the population density per county or per state.
We're mainly going to be looking at per state.
So how are we going to identify factors
which can reduce crime rates throughout our entire project?
Here we're going to be speaking about how certain crimes
are going to be more influential in certain states than others,
and do certain crimes influence other crimes?
So for example,
if there was a murder crime, would guns or theft be more
influential in that murder crime or would other crimes be influential in that?
So looking at our data overview.
We started off with our initial data set which is our crime statistics data,
and we added other data variables later on throughout this initial data set.
Beginning with our initial data set, we started with 2014 data,
and this initial data set was given to us by
the Federal Bureau of Investigation, the FBI.
We looked at 42 criminal activities, which range widely
from murder and theft to drug possession and other drug activities.
We looked at about 3,200 counties,
with every county grouped under its state.
We had to look at 48 states.
We had to exclude Florida and Illinois,
because Florida and Illinois did not provide data to the FBI
for the criminal activities.
If you look at future 2018 data or past 2012 data, it's the same issue.
They just don't provide data to the FBI, it seems.
With all of this, the 2014 data has 180,000 data points.
Talking about the FIPS codes,
this is how we identified certain criminal activity in certain counties.
For example, we have our state codes, which would be 01 for Alabama.
These states are represented alphabetically.
Alabama would be the first one, and then each county within that state
would have numbers to them.
For example, Baldwin would be 003.
If you looked at Baldwin, Alabama,
it'd be 01003, so on and so forth for every county detailed in the state.
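The FIPS code construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual code: the function names `county_fips` and `split_fips` are hypothetical, and the only codes used are the ones quoted in the talk (01 for Alabama, 003 for Baldwin).

```python
def county_fips(state_code: int, county_code: int) -> str:
    """Build a 5-digit county FIPS code: 2-digit state + 3-digit county."""
    if not 0 < state_code < 100:
        raise ValueError("state code must fit in two digits")
    if not 0 < county_code < 1000:
        raise ValueError("county code must fit in three digits")
    # Zero-pad each part so the leading zeros survive.
    return f"{state_code:02d}{county_code:03d}"


def split_fips(fips: str) -> tuple[str, str]:
    """Split a 5-digit county FIPS code back into (state, county) parts."""
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError("expected a 5-digit FIPS string")
    return fips[:2], fips[2:]


print(county_fips(1, 3))    # 01003 (Baldwin, Alabama)
print(split_fips("01003"))  # ('01', '003')
```

Keeping the code as a string rather than a number is a deliberate choice here, since an integer would silently drop the leading zero in codes such as 01003.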
Looking at our extra variables, we looked at the census data.
Census data is always great for checking out the age, population,
income per county or per state,
and we had to look at other data sets like
gender, immigration, religion, marriage, unemployment and literacy rates.
These other data sets looked more or less at the statewide rates,
and this isn't really related to criminal activity,
but we wanted to involve it
within our initial data set to see if there's any correlation with them.
Going into our business problem,
we want to answer what states in the United States specifically
have the highest and lowest crime rates and why is that so?
To answer our business problem,
we have to answer these business questions going into that.
How can we identify variables that influence crime?
Which are the most important factors?
Are there crimes that influence other crimes?
I'm going to hand it off to Karanveer to talk about plans and methods.
Thank you, Grant.
Our approach to solve this business problem,
was to come up with a regression model.
We have used JMP to make it.
First, as Grant mentioned,
we have connected the various databases, that is the crime data set,
along with those extra variables such as religion, income, etc.
We have made sure that the data looks clean.
After that, we have run our regression model,
which is able to predict the crime rate for us.
With this, we are able to know the various variables
and their importance in determining this crime rate,
and we are able to list them by their importance.
At the end we'll also be showing you visualizations based on it.
As Grant mentioned, we had 42 criminal activity variables.
Some of these variables were very small,
such as drug possession, drug consumption, drug sales.
In that case, we have simply grouped them to make sure that
we can come to a conclusion on that
since the data was otherwise too small for the subgroups.
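The grouping of small categories mentioned here could look roughly like the following Python sketch. The three drug categories come from the talk, but the column names, the `GROUPS` mapping, and the counts are all made up for illustration and do not reflect the actual data set.

```python
from collections import defaultdict

# Hypothetical mapping from detailed categories to a broader group.
GROUPS = {
    "drug_possession": "drug",
    "drug_consumption": "drug",
    "drug_sales": "drug",
}


def group_counts(counts: dict[str, int]) -> dict[str, int]:
    """Sum counts of small categories into their group; keep others as-is."""
    grouped = defaultdict(int)
    for category, n in counts.items():
        # Categories without a mapping keep their own name.
        grouped[GROUPS.get(category, category)] += n
    return dict(grouped)


# Made-up counts, for illustration only.
example = {"drug_possession": 12, "drug_consumption": 5, "drug_sales": 3, "murder": 2}
print(group_counts(example))  # {'drug': 20, 'murder': 2}
```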
We'll be looking at them state-wise
as we didn't have the extra variables on a county basis.
But I feel that this is great for starting this project.
Our target variable would be the crime rate.
We have defined the crime rate as the number of arrests
in that certain population.
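As a rough sketch of this definition, the crime rate can be computed as arrests divided by population. The per-100,000 scaling and the function name `crime_rate` are assumptions added for illustration; the talk does not say which scaling was used, and the numbers below are made up.

```python
def crime_rate(arrests: int, population: int, per: int = 100_000) -> float:
    """Arrests per `per` residents. The per-100,000 scaling is an assumption."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact where possible.
    return arrests * per / population


# Made-up numbers, for illustration only.
print(crime_rate(250, 50_000))  # 500.0
```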
Now, coming down to the variables that we are using.
Most of these variables have been normalized and we have used
a percentage for them, such as immigration. For gender,
we will be using two types, male and female,
and then religion, unemployment, marriage, literacy.
Most of these are normalized so that
we don't have an analysis which could be misleading.
Coming down to the final equation of our regression model.
This is the equation of our model.
We have rounded off the samples,
and as we can see there are a lot of variables
that have a positive influence, as in, that they increase the crime rate,
and there are certain variables which have a negative sign with them.
They basically decrease the crime rate.
Using this we can see how we can define a crime rate in any state or county.
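The way such a regression equation is used to predict a crime rate can be sketched as below. The fitted coefficients from the JMP model are not given in this transcript, so every name and number here (`var_up`, `var_down`, the intercept, the values) is a placeholder chosen only to show the arithmetic.

```python
def predict_crime_rate(intercept, coefficients, features):
    """Linear prediction: intercept + sum(coefficient * feature value)."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )


# Placeholder coefficients, NOT the fitted values from the talk.
# A positive coefficient raises the prediction, a negative one lowers it.
coefficients = {"var_up": 2.0, "var_down": -0.5}
features = {"var_up": 5.0, "var_down": 90.0}

print(predict_crime_rate(100.0, coefficients, features))  # 65.0
```

Here the prediction is 100 + 2.0 * 5 - 0.5 * 90 = 65, which shows how variables with a positive sign push the rate up and variables with a negative sign pull it down.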
Coming down to the results.
The finding number one.
We really wanted to see which states have the highest crime rate.
These are the following five states.
Tennessee, Wyoming, Mississippi, Wisconsin, New Mexico.
Then we have the following states with the lowest crime rate,
which are New York, Alabama, Vermont, Massachusetts and Michigan.
Here is the following visualization explaining how the crime rate
varies across the United States.
As we see, there is no certain pattern and it's all over the place.
Finding number two.
Using JMP and doing a log [inaudible 00:07:57]
on the variables, we could basically see which variables have more importance.
The number one was weapons owned, followed by literacy rate,
then religion percentage, immigration, population density,
and the unemployment rate.
I think this is a great finding for any government body
or any organization that wants to allocate resources
whenever they are trying to reduce the crime rate or trying to analyze it.
The finding number three is something really interesting.
Our goal was to see whether there are certain crimes
which could help us solve not just that crime,
but maybe other crimes as well,
the ones which these crimes influence.
Drugs and weapons was one of them.
We could see drugs and weapons have a very high correlation
with, say, theft, robbery, and murder.
Using a chi-square test, we saw that the correlation is very high.
So in case any organization wants a category to start with,
I think drugs and weapons is a great category
to focus on for reducing the crime rate in any state or county.
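For readers unfamiliar with the chi-square test mentioned here, the following Python sketch computes the Pearson chi-square statistic for a table of counts. The transcript does not say how the test was set up in JMP, so the 2x2 table below (high or low drug arrests against high or low theft arrests) and its counts are assumptions made up purely for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Assumes every row and column total is positive, so that all
    expected counts are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts: rows = drug arrests high/low, columns = theft arrests high/low.
table = [[30, 10], [10, 30]]
stat = chi_square_statistic(table)
print(stat)  # 20.0

# For a 2x2 table (1 degree of freedom), 3.841 is the 5% critical value.
print(stat > 3.841)  # True
```

A statistic above the critical value suggests the two categories are not independent. Strictly speaking, that indicates an association between them and does not measure the strength of a correlation.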
This is the following map showing
the religion rate, weapons owned, and literacy rate,
and the variation across the United States.
If we put it with the crime rate, we can see a certain pattern
which is actually explained by our regression model.
Now coming down to the implications,
how we can apply our analysis to a real-world solution.
Like the data set we have used, and we have connected to variables,
we would definitely want to work with governments, towns and communities
because crime is a universal problem
and this is something everybody wants to reduce.
Resource allocation can be done according to this,
and further, this would result in a decrease in crime rate
and increase in happiness in the community.
Post-analysis.
There are a lot of things that we would want to include in our project,
and this is a great future scope as well.
First thing, we would want to include more variables
such as weather, ethnicity and the list goes on.
We could definitely even listen to the government bodies and take inputs
for these variables from them.
County detailed or at least city detail.
I feel it's great to start with state-wise data,
but we would definitely want to focus on a more detailed level of analysis,
so that we can apply these conclusions to the real world
more clearly, more precisely and we would have a better impact as well.
The data time frame.
Right now we have used the data from the year 2014.
I feel this is an eight-year-old data set.
We would definitely want to use a more recent data set,
and something that is spanning over a couple of years,
so that it gives us clarity.
COVID has impacted us in a lot of ways,
and it has changed how lives basically work around us,
and so the crime rate and the way crime happens have changed as well.
We would definitely want to focus on the post-COVID period,
over the last two to three years, for a post-analysis.
That's all and thank you.