Hi, my name is Shalika Siddique
My name is Anand Manivannan
and we're both students from Oklahoma State University
and we are currently pursuing a business analytics and data science degree.
Today we are presenting a poster where we explore euthanasia
in animal shelters,
and we hope to understand why cats and dogs are being put down.
Every year, we lose about 920,000 animals.
Using JMP Pro, we would like to identify the key factors
that lead to euthanization of cats and dogs.
Once we identify these key factors,
funds can be channeled to relevant sectors
to prevent euthanization of animals that could have been saved.
In addition to this, we aim to make predictions to identify
which animals are most likely to be euthanized.
Here is a little information about our data set.
We sourced the data from Austin's data portal,
and the animal shelter that we used for analysis is located
in Austin, Texas.
Overall, we had about 130,000 records.
After cleaning and filtering, we focused on about 67,000 records
that were specific to cats and dogs.
Prior to our analysis, we explored the data set
and attempted to derive insights.
We used JMP's Graph Builder to create visualizations
such as bar graphs.
Of the roughly 67,000 records, 3,171 animals
were euthanized,
which is about 4.7% of the animals in the shelter.
Compared with animals surrendered to the shelter by their owners,
stray animals were more prone to euthanasia.
When we compared the age of the animals,
we noticed that kittens under the age of 15 months
accounted for 25% of euthanizations,
while pups accounted for 13%.
This bar graph here
is an example of one of the visualizations
created using JMP's Graph Builder.
The lavender bar represents cats,
while the purple bar represents dogs.
We can see that intact males followed by intact females
are more prone to euthanization
compared to neutered animals.
Next, Anand will go over the modeling in detail.
Thank you, Shalika.
Yes, I'd like to talk a bit more about our approach towards
modeling using JMP.
Before we could start modeling,
we performed a few data preprocessing steps to prepare our data.
We did things like standardizing units
for certain variables,
such as age, which was in weeks, months and years.
We converted that to just months.
We binned the age variable so we could convert it into
a categorical variable
with age ranges like 10-15 and 15-25.
We grouped rare breeds and colors to reduce
the number of categories.
Additionally, we filtered out just the cats and dogs
from all the other animals that went through the shelter.
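As a rough illustration of the preprocessing steps just described (the presenters did this work in JMP, not in code), here is a minimal Python sketch. The function names, the conversion factors, the bin edges other than 10-15 and 15-25, the rare-category cutoff, and the sample values are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical, approximate unit-to-month conversion factors.
MONTHS_PER_UNIT = {"week": 12 / 52, "month": 1.0, "year": 12.0}


def age_to_months(value, unit):
    """Convert an age given in weeks, months, or years to months."""
    return value * MONTHS_PER_UNIT[unit.rstrip("s")]


def bin_age(months, edges=(0, 5, 10, 15, 25, 50, 100)):
    """Return a categorical label such as '10-15' for an age in months.

    Only the 10-15 and 15-25 ranges come from the talk; the other
    edges are placeholders.
    """
    for low, high in zip(edges, edges[1:]):
        if low <= months < high:
            return f"{low}-{high}"
    return f"{edges[-1]}+"


def group_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Made-up sample records, for illustration only.
ages = [(8, "weeks"), (14, "months"), (2, "years")]
print([bin_age(age_to_months(v, u)) for v, u in ages])
print(group_rare(["Tabby", "Tabby", "Lynx Point", "Black", "Black"]))
```

Binning turns age into a categorical variable, and grouping rare breeds and colors keeps the number of levels manageable for the models.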
During the modeling phase, we noticed something very peculiar.
We noticed a class imbalance in our target variable,
which indicated whether an animal was adopted
or euthanized.
About 64,000 of the 67,000 records were adopted
animals, and only about 3,000 were euthanized animals.
Since our model was to focus on predicting euthanasia,
we had to resolve this issue, and hence we used JMP's Bootstrap Forest model
and Boosted Forest to resolve this issue.
These use the concepts of bagging and boosting to do this.
Since bagging and boosting models don't really
give a lot of room for interpretation
in terms of what the variables do,
we used logistic regression to interpret these variables as well.
After modeling,
we tuned the parameters to get the best results.
We picked a few specific metrics
to choose the best model based on its performance on validation data.
We used a 70:30 training-validation split,
and prior to modeling, we also tested the assumptions
for logistic regression.
Over to the top right, you can see that we tested
for multicollinearity and independence among variables
using JMP's contingency analysis,
which produced a mosaic plot and gave us a p-value and a correlation value
that basically told us which variables were correlated with each other.
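To make the idea of a contingency-based association check concrete, here is a small Python sketch that computes a Pearson chi-square statistic and Cramér's V for two categorical variables. This is an assumption about what such a check can look like, not a reproduction of JMP's report: the talk does not say which association measure was used, and the function name and toy data are invented for illustration.

```python
from collections import Counter
from math import sqrt


def contingency_stats(xs, ys):
    """Return (chi_square, cramers_v) for two categorical variables."""
    n = len(xs)
    observed = Counter(zip(xs, ys))   # cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi_square = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi_square += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = sqrt(chi_square / (n * k)) if k > 0 else 0.0
    return chi_square, cramers_v


# Made-up toy data: intake type vs. intake condition.
intake_type = ["Stray", "Stray", "Owner Surrender", "Owner Surrender"]
condition = ["Injured", "Injured", "Normal", "Normal"]
print(contingency_stats(intake_type, condition))
```

A Cramér's V near 1 suggests two predictors carry largely the same information, while a value near 0 suggests they are close to independent.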
Now I'd like to dig a bit deeper into each model
and how we selected our models.
Over to the top left,
you would see that we chose metrics like specificity,
misclassification rate, area under the curve, and RSquare
to choose which model performed the best.
These metrics were chosen for a particular reason
that aligned with our goal.
Our goal was to predict which animals would be euthanized.
The cost of our model
incorrectly predicting a euthanized animal
as a non-euthanized animal would be that the animal
would probably die and not be saved.
Hence we wanted to focus on increasing the accuracy
for euthanized animals and reducing the misclassifications.
Hence these particular metrics were chosen.
First, we ran the nominal logistic regression model,
which you can see over to the bottom left.
The LogWorth immediately told us which variables
were the most important in predicting euthanasia.
It turns out these were sex of the animal, intake condition,
intake type, and outcome age.
A lot of these are not surprising,
and they match what research shows.
The whole model turned out to be significant as well,
with a p-value less than 0.001.
Following that, we ran the Bootstrap Forest model,
which was tuned to have a hundred trees and a feature selection
criteria value of three.
We used the receiver operating characteristic, or ROC, curve
to determine which classification threshold
gave us the best classification results.
We ended up using 0.1, or 10%, as our classification threshold.
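To show what a 0.1 classification threshold means in practice, here is a minimal Python sketch. The function name and the probabilities are made up for illustration; the actual predictions came from the presenters' JMP models.

```python
def classify(prob_euthanized, threshold=0.1):
    """Label an animal 'Euthanized' when its predicted probability
    of euthanasia reaches the threshold, otherwise 'Adopted'."""
    return "Euthanized" if prob_euthanized >= threshold else "Adopted"


# Made-up predicted probabilities, for illustration only.
probabilities = [0.03, 0.12, 0.45, 0.80]

print([classify(p) for p in probabilities])       # threshold of 0.1
print([classify(p, 0.5) for p in probabilities])  # a conventional 0.5 cutoff
```

With a rare outcome like euthanasia, a low threshold flags more at-risk animals, at the price of more false alarms among animals that would have been adopted.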
Over to the right,
you would see that we ran the Boosted Forest model,
with parameters of 87 layers and a learning rate of 0.179.
Over at the bottom,
we used the decision matrix for all three models to calculate
the specificity of each particular model,
which tells us how accurately the euthanized animals
were being predicted.
We also used misclassification rate and RSquare
from the overall statistics tab of JMP.
In every metric, we found that our Bootstrap Forest model
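The two metrics read off the decision matrix can be sketched in a few lines of Python. One loud assumption: "Adopted" is treated as the positive level here, so that specificity is the share of actually euthanized animals that the model predicts as euthanized, which is how the talk describes it. If "Euthanized" were the positive level, the same quantity would be called sensitivity. The function names and the toy labels are invented for illustration.

```python
def specificity(actual, predicted, negative="Euthanized"):
    """Share of actual negatives (here: euthanized animals, by
    assumption) that are also predicted as negatives."""
    pairs = list(zip(actual, predicted))
    true_neg = sum(a == negative and p == negative for a, p in pairs)
    false_pos = sum(a == negative and p != negative for a, p in pairs)
    return true_neg / (true_neg + false_pos)


def misclassification_rate(actual, predicted):
    """Share of all records whose predicted label is wrong."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


# Made-up validation labels, for illustration only.
actual = ["Euthanized"] * 4 + ["Adopted"] * 6
predicted = (["Euthanized"] * 3 + ["Adopted"]
             + ["Adopted"] * 5 + ["Euthanized"])

print(specificity(actual, predicted))
print(misclassification_rate(actual, predicted))
```

Because euthanized animals are a small minority, a model can have a low overall misclassification rate and still miss most of them, which is why the two metrics are worth reading side by side.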
outperformed the other models,
and hence we chose that as the winning model to make
predictions on euthanasia.
Next, I would like to go over
some important results that logistic regression gave us.
With regard to sex,
we found that intact cats and intact dogs
were far more likely to be euthanized than
neutered or spayed animals.
With regard to breed, we found that mixed cat breeds
and Pit Bull mix dog breeds were more likely to be euthanized
than all other breeds.
With regard to age, we found that cats that are 4.5
to 6 years old are more likely to be euthanized than younger cats.
Dogs under 1.2 years are the least likely to be euthanized.
This was wildly surprising because it contradicts
what we found during the data exploration phase.
Similarly, with regard to intake type,
we found that owner-surrendered animals
are twice as likely to be euthanized as stray animals.
This, again, completely contradicts what we found
in the data exploration phase.
That goes to show the power of statistical
analysis in uncovering true facts.
Next, I will be handing it off to Shalika again
to go over what recommendations we can make to these animal shelters.
Thank you, Anand.
Based on our analysis, we have a few recommendations
that animal shelters could use to lower euthanizations.
We believe that animals taken into the shelter
should be neutered or spayed.
This is in accordance with medical research,
which proves that intact animals are more prone to diseases.
Animal shelters could also use our Bootstrap Forest model
to prioritize which animals need to be saved,
in case a difficult decision needs to be made.
In support of that,
here are some recommendations for Austin's animal shelter.
This particular shelter would need to prioritize cats over dogs
as they are more prone to euthanizations.
With regard to age, cats aged between 4.5 and 6 years
and dogs over 1.2 years would require more attention.
Owner-surrendered dogs need to be prioritized over stray animals.
Finally, when it comes to breeds,
Pit Bull mix dog breeds and mixed cat breeds
are more prone to euthanization and would likely require more attention.
That brings us to the end of our presentation.
We hope that animal shelters could use this analysis
to reduce the need for an animal to be euthanized.
Thank you.