We lose approximately 920,000 shelter animals to euthanasia requests every year. Instead, these animals could’ve made 920,000 families happier. We would like to explore the current data from Austin’s animal center to understand what conditions lead to a euthanasia request and whether measures can be adopted to prevent them. The data is sourced from Austin’s Open Data Portal and consists of two tables, intakes and outcomes, dating from Oct 1, 2013, to the present. Intakes represent the status of animals as they arrive at the animal center, while outcomes represent the status of animals as they leave. Each animal is identified by a unique Animal ID. Each table consists of 136K data points and 12 features. We first explore the distribution of data by various categories such as breed, gender, age, and intake condition. Finally, classification models like logistic regression and random forest classifiers are used to predict whether an animal will be euthanized. Understanding key factors like intake condition, sub-type of euthanasia, breed, and age could reveal why these animals are put down and consequently advise on where to target funding for research and facilities.

Hi, my name is Shalika Siddique

My name is Anand Manivannan

and we're both students from Oklahoma State University,

and we are currently pursuing a business analytics and data science degree.

Today we are presenting a poster where we explore euthanasia

in animal shelters,

and we hope to understand why cats and dogs are being put down.

Every year we lose about 920,000 animals.

Using JMP Pro, we would like to identify the key factors

that lead to euthanization of cats and dogs.

Once we identify these key factors,

funds can be channeled to relevant sectors

to prevent euthanization of animals that could have been saved.

In addition to this, we aim to make predictions to identify

which animals are most likely to be euthanized.

A little information about our data set here:

we sourced the data from Austin's data portal,

and the animal shelter that we used for analysis is located

in Austin, Texas.

Overall, we had about 130,000 records.

After cleaning and filtering, we focused on about 67,000 records

that were specific to cats and dogs.

Prior to our analysis, we explored the data set

and attempted to derive insights.

We used JMP's Graph Builder to create visualizations

such as bar graphs.

From the 67,000 records, there were about 3,171 animals

which were euthanized,

which is about 4.7% of the animals in the shelter.
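The share quoted here can be checked with a line of arithmetic. A minimal sketch in Python, using the approximate counts from the talk (the variable names are illustrative and not part of the JMP analysis):

```python
# Approximate counts quoted in the talk.
total_cats_and_dogs = 67_000
euthanized = 3_171

# Share of cats and dogs in the filtered data that were euthanized.
euthanasia_rate = euthanized / total_cats_and_dogs
print(f"{euthanasia_rate:.1%}")
```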

In comparison to animals surrendered to the shelter by the owner,

stray animals were more prone to euthanasia.

When we compare the age of the animals,

we notice that kittens under the age of 15 months

contributed to 25% of euthanasia,

while pups contributed to 13% of euthanization.

This bar graph here

is an example of one of the visualizations

created using JMP's Graph Builder.

The lavender bar represents cats,

while the purple bar represents dogs.

We can see that intact males, followed by intact females,

are more prone to euthanization

compared to neutered animals.

Next, Anand will go over the modeling in detail.

Thank you, Shalika.

Yes, I'd like to talk a bit more about our approach towards

modeling using JMP.

Before we could start modeling,

we performed a few data preprocessing steps to prepare our data.

We did things like standardizing units

for certain variables,

such as age, which was in weeks, months and years.

We converted that to just months.

We binned the age variable so we could convert it into

a categorical variable.

It looked like age ranges such as 10-15 and 15-25.
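The unit conversion and binning described here can be sketched in Python. This is an illustration only, not the JMP steps we ran: it assumes ages arrive as text such as "2 years" or "3 weeks", and every bin edge other than the 10-15 and 15-25 ranges mentioned above is hypothetical.

```python
import bisect

# Conversion factors to months (approximate for days and weeks).
MONTHS_PER_UNIT = {"day": 12 / 365, "week": 12 / 52, "month": 1.0, "year": 12.0}

# Bin edges in months. Only 10-15 and 15-25 come from the talk;
# the remaining edges are hypothetical placeholders.
BIN_EDGES = [0, 5, 10, 15, 25]


def age_to_months(text):
    """Convert a string like '2 years' or '3 weeks' to months."""
    count, unit = text.split()
    unit = unit.lower().rstrip("s")  # 'years' -> 'year'
    return int(count) * MONTHS_PER_UNIT[unit]


def age_bin(months):
    """Return the label of the age range that contains `months`."""
    if months < 0:
        raise ValueError("age cannot be negative")
    i = bisect.bisect_right(BIN_EDGES, months) - 1
    if i >= len(BIN_EDGES) - 1:
        return f"{BIN_EDGES[-1]}+"  # open-ended last bin
    return f"{BIN_EDGES[i]}-{BIN_EDGES[i + 1]}"


print(age_bin(age_to_months("1 year")))
```

Each range here includes its lower edge and excludes its upper edge, so an animal aged exactly 15 months falls in 15-25.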

We grouped rare breeds and colors to reduce

the number of categories.

Additionally, we also filtered just cats and dogs

from all the other animals that went through the shelter.

During the modeling phase we noticed something very peculiar.

We noticed a class imbalance in our target variable,

which indicated whether an animal was adopted

or whether an animal was euthanized.

About 64,000 records out of 67,000 records were adopted

animals, and only 3,000 were euthanized animals.
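To see why this imbalance matters, consider a trivial rule that labels every animal as adopted. A short Python sketch with the rounded counts from the talk (the rule itself is a hypothetical baseline, not one of our models):

```python
# Rounded class counts quoted in the talk.
adopted = 64_000
euthanized = 3_000
total = adopted + euthanized

# A baseline that predicts "adopted" for every animal is right
# on all adopted animals and wrong on all euthanized ones.
baseline_accuracy = adopted / total
euthanized_caught = 0 / euthanized

print(f"accuracy: {baseline_accuracy:.1%}, euthanized identified: {euthanized_caught:.0%}")
```

Overall accuracy looks high even though the rule never identifies a single euthanized animal, which is why accuracy alone is a poor guide for this target.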

Since our model was to focus on predicting euthanasia,

we had to resolve this issue, and hence we used JMP's Bootstrap Forest model

and Boosted Forest to resolve this issue.

These use the concepts of bagging and boosting to do this.

Since bagging and boosting models don't really

give a lot of room for interpretation

in terms of what the variables do,

we used logistic regression to interpret these variables as well.

After modeling,

we tuned the parameters to get the best results.

We picked a few metrics

to choose the best model based on its performance on validation data.

We used a 70:30 training-validation split,

and prior to modeling, we also tested the assumptions

for logistic regression.
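A 70:30 split can be sketched in plain Python as follows. Note the assumption: this version splits within each class (a stratified split) so the rare euthanized class keeps the same share in both parts. The talk does not say whether the JMP validation column was stratified, and the function name and seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, validation_indices), split within each class."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


# Toy labels with roughly the imbalance described in the talk.
labels = ["adopted"] * 640 + ["euthanized"] * 30
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```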

Over to the top right, you can see that we tested

for multicollinearity and independence among variables

using JMP's contingency analysis,

which produced a mosaic plot and gave us a p-value and a correlation value

that basically told us which variables were correlated with each other.

Now I'd like to dig a bit deeper into each model

and how we selected our models.

Over to the top left,

you would see that we chose metrics like specificity,

misclassification rate, area under the curve, and R-square

to choose which model performed the best.

These metrics were chosen for a particular reason

that aligned with our goal.

Our goal was to predict which animals would be euthanized.

The cost of our model

incorrectly predicting a euthanized animal

as a non-euthanized animal would mean that the animal

would probably die and not be saved.

Hence we wanted to focus on increasing the accuracy

for euthanized animals and reducing the misclassifications.

Hence these particular metrics were chosen.

First we ran the nominal logistic regression model,

which you can see over to the bottom left.

The LogWorth immediately showed us which variables

were the most important in predicting euthanasia.

Turns out it was sex of the animal, intake condition,

intake type, and outcome age.

A lot of these are not surprising,

and they match what research shows.

The whole model turned out to be significant as well,

with a p-value less than 0.001.

Following that, we ran the Bootstrap Forest model,

which was tuned to have a hundred trees and a feature selection

criterion value of three.

We used the receiver operating characteristic, or ROC, curve

to determine which classification threshold

gave us the best classification results.

We ended up using 0.1, or 10%, as our classification threshold.
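The effect of a 0.1 threshold can be shown with a small Python sketch. The predicted probabilities below are made up for illustration and are not output from our model; the function name is also hypothetical.

```python
def classify(prob_euthanized, threshold=0.1):
    """Label an animal from its predicted probability of euthanasia."""
    return "euthanized" if prob_euthanized >= threshold else "adopted"


# Hypothetical predicted probabilities for five animals.
probs = [0.02, 0.08, 0.12, 0.35, 0.60]

at_half = [classify(p, threshold=0.5) for p in probs]
at_ten_percent = [classify(p) for p in probs]

print(at_half.count("euthanized"), at_ten_percent.count("euthanized"))
```

Lowering the threshold from 0.5 to 0.1 flags more animals as at risk, which suits a rare target where a missed euthanasia case is the costly error.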

Over to the right,

you would see that we ran the Boosted Forest model,

with parameters of 87 layers and a learning rate of 0.179.

Over at the bottom,

we used the decision matrix for all three models to calculate

the specificity of each particular model,

which tells us how accurately the euthanized animals

were being predicted.
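Specificity is the true negatives divided by all actual negatives, TN / (TN + FP). It measures how well euthanized animals are identified only when "euthanized" is treated as the negative level; if it were the positive level, the same quantity would be called sensitivity. The Python sketch below assumes "adopted" is the positive level and uses made-up labels, not our results.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def specificity(actual, predicted, positive):
    """Share of actual negatives that were predicted as negative."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    if tn + fp == 0:
        raise ValueError("no actual negatives in the data")
    return tn / (tn + fp)


# Hypothetical labels: 8 adopted and 2 euthanized animals.
actual = ["adopted"] * 8 + ["euthanized"] * 2
predicted = ["adopted"] * 7 + ["euthanized"] + ["euthanized", "adopted"]

spec = specificity(actual, predicted, positive="adopted")
tp, fp, tn, fn = confusion_counts(actual, predicted, positive="adopted")
misclassification_rate = (fp + fn) / len(actual)
print(spec, misclassification_rate)
```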

We also used misclassification rate and R-square

from the overall statistics tab of JMP.

In every metric, we found that our Bootstrap Forest model

outperformed the other models,

and hence we chose that as the winning model to make

predictions on euthanasia.

Next, I would like to go over

some important results that logistic regression gave us.

With regards to sex,

we found that intact cats and intact dogs

were way more likely to be euthanized than

neutered or spayed animals.

With regards to breed, we found that mixed cat breeds

and Pit Bull dog mixed breeds were more likely to be euthanized

than all other breeds.

With regards to age, we found that cats that are 4.5

to 6 years old are more likely to be euthanized than younger cats.

Dogs under 1.2 years are the least likely to be euthanized.

This was wildly surprising because it's contradictory

to what we found during the data exploration phase.

Similarly, with regard to intake type,

we found that owner-surrendered animals

are twice as likely to be euthanized as stray animals.

This is, again, completely contradictory to what we found

in the data exploration phase.

That goes to show the power of statistical

analysis in uncovering true facts.

Next, I will be handing it off to Shalika again

to go over what recommendations we can make to these animal shelters.

Thank you, Anand.

Based on our analysis, we have a few recommendations

that animal shelters could use to lower euthanizations.

We believe that animals taken into the shelter

should be neutered or spayed.

This is in accordance with medical research,

which proves that intact animals are more prone to diseases.

Animal shelters could also use our Bootstrap Forest model

to prioritize which animals need to be saved,

in case a difficult decision needs to be made.

In support of that,

here are some recommendations for Austin's animal shelter.

This particular shelter would need to prioritize cats over dogs

as they are more prone to euthanizations.

With regards to age, cats aged between 4.5 and 6 years

and dogs over 1.2 years would require more attention.

Owner-surrendered dogs need to be prioritized over stray animals.

Finally, when it comes to breeds,

Pit Bull mix dog breeds and mixed cat breeds

are more prone to euthanization and would likely require more attention.

That brings us to the end of our presentation.

We hope that animal shelters could use this analysis

to reduce the need for an animal to be euthanized.

Thank you.

Published on ‎05-20-2024 07:52 AM by | Updated on ‎07-23-2025 11:15 AM
