Furever Homes: Discovering Insights With JMP to Reduce Shelter Dog Returns
Animal shelters strive to permanently place dogs in homes with individuals and families. Yet, one long-term retrospective study estimates the percentage of dogs returned to shelters after adoption is approximately 9 percent. Returning shelter dogs impacts the resources available to care for additional animals. Identifying the root causes of returns to the shelter can inform programming targeted at reducing this phenomenon.
Using a data set from a large shelter system in New York, comprising 3,465 first-time shelter dogs tracked for six months following intake, we fit multiple models to identify potential areas for interventions to reduce return rates of adopted dogs. Logistic regression in combination with stepwise variable selection was the model reported in this research project to explore the relationship between the probability of a dog being returned and a large number of covariates.
We found that a dog’s age, tendency for aggressive behavior, breed, length of stay, and shelter geography are related to the probability that a dog will be returned. Additionally, we found that transporting dogs between shelters was not related to return probability. JMP’s easy-to-use, interactive analysis and visualizations helped the client, who had little statistics training, understand and ultimately execute the analysis on her own. These findings will guide the allocation of resources to interventions, such as educational materials or training programs, that may help reduce overall return rates.
Hi. My name is Alicia Arneson. I am a PhD student at Virginia Tech. I work in our Statistical Consulting Group on campus to help researchers gain insights from their data. This project came to me as part of that work, and I was really excited about it because if you notice the adorable girl in the center of the screen, she's my best friend. Her name is Winnie, and I adopted her during COVID from an animal shelter.
When I picked her up, they actually told me that she'd already been returned once. I went home with the expectation that I may have a small terror on my hands. I didn't know exactly why. They didn't have a ton of information for me. But turns out we were a great fit. She has her weirdnesses and hiccups, but we get through just fine. She's a Red Heeler mix, so she does all the things herding dogs do, but it works fine for us.
I found out when this client approached me with this project that this is a really common scenario in shelters. They don't often get a lot of information about why dogs are brought back. People sometimes feel guilty and they give vague answers. All shelters are left with are the things that they can glean that are really obvious.
This client had a huge data set. She works in a large shelter system. She had over 3,400 records of dogs who it was their first time through the shelter. We tried to reduce some bias by not taking through their second time around or anything like that. That's for a later project. But we wanted to use the information we did have to see if we could narrow down some of the things that were most related to the probability a dog would be brought back.
If anyone's ever tried to solve a problem within any large organization, you know that can be huge, stressful, and expensive. We really wanted to help these shelters narrow down on those things that might be the most impactful or the most related to return so they could use their resources in the best way possible to explore further.
We decided to call a return any dog that came back within 6 months of adoption. This is a little different than most shelters do. They usually go with 30 days, but we actually used JMP to make a data-driven cutoff for this. We were able to use JMP's Distribution Explorer. If you go to Analyze and then Distribution, this is what I'm talking about. It gives you a handful of really useful tools right off the bat.
This talk is going to be at least 40% love letter to JMP Profilers, I'll just go ahead and warn you. You can do some fitting to these distributions. This one I knew was exponential because we had this time-to-event thing, and it's shaped just like I expect an exponential distribution to be shaped.
If I hit the red arrow next to the response variable name and hit Continuous Fit, I didn't show it here, but another little dialog box opens up and you can click Fit Exponential. Then you can use the Profilers off of that exponential fit. We started with the Quantile Profiler because we wanted to create a cutoff that captured at least 95% of the dogs that were being returned at all. That's pretty commonplace in statistics. We say that if something is outside of this 95% normal range, then there is some possibility it came from some other process. We thought 95% was reasonable.
The Quantile Profiler lets you change the probability you're after, the cumulative probability, and it tells you what response goes with that. It told us 173 days might be that good cutoff that we wanted. That's awfully close to 6 months. 6 months makes a much easier thing to communicate to shelters. We used the Distribution Profiler to see if 6 months was close enough to 95% to be reasonable.
We put in 182 days, which is what the client wanted to use as 6 months, and it told us that the cumulative probability for that was 95.7%, so pretty darn close. We decided to go with 6 months, and JMP helped us make that choice. Then we took our big data set that we cleaned up and labeled everything appropriately. We knew we wanted to fit some simple model that would get the client the information she wanted without overwhelming her, which I feel like is a strong suit of JMP. It's a great software for this.
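The arithmetic behind those two Profiler readings can be sketched in a few lines of Python. This is not JMP output. The talk does not give the fitted mean of the exponential distribution, so the sketch back-solves it from the reported 173-day 95% quantile (an assumption), and the function names are made up for illustration.

```python
import math

def exponential_quantile(p: float, mean: float) -> float:
    """Time by which a fraction p of returns have happened."""
    return -mean * math.log(1.0 - p)

def exponential_cdf(t: float, mean: float) -> float:
    """Fraction of returns that have happened by time t."""
    return 1.0 - math.exp(-t / mean)

# Assumption: the fitted mean is not stated in the talk, so it is
# back-solved from the reported 95% quantile of 173 days.
mean_days = 173 / -math.log(1.0 - 0.95)

# Quantile Profiler direction: probability in, days out.
print(round(exponential_quantile(0.95, mean_days)))   # 173

# Distribution Profiler direction: days in, cumulative probability out.
print(round(exponential_cdf(182, mean_days), 3))      # 0.957
```

Under that assumption, a 182-day cutoff covers roughly 95.7% of returns, which matches the figure quoted above.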
We used the Fit Model option under the Analyze tab. One of my favorite things about JMP is the way that if your response variable is labeled appropriately—in our case, I labeled it as a nominal variable—then JMP can pick up on what might be the best model type for it. I didn't have to change anything. It was great.
I put in my Y variable, I put all the things I was interested in into the inputs. Then this top box is where I would have changed it if I wanted a different model, but JMP already knew that I wanted a logistic regression model, which is amazing. I clicked run. Right off the bat, JMP gives you this beautiful effect summary that makes it really easy to start seeing immediately what things seem important or had a lot of signal.
Like I said, the goal was to take this big, huge series of variables that we had and try to narrow it down to the most important things. So we did backward stepwise regression, which just basically means we wanted to remove the things that didn't have enough signal with respect to everything else.
I started with these big interactions. The interactions, just as some context, the client was really interested in the geography of the shelters and where the dogs came from and went and how that impacted things. We did some higher-order terms there. But we removed those first because they actually ended up not being too important. Then we started tearing down the main effects.
From all of that, we just kept clicking: you click the variable, you click remove, JMP refits the model, you get new p-values, you do it again. We did that a bunch of times. Then we got this much smaller model of the things that seemed the most important, which is really, really great. These are the things that were most related or most correlated to the probability that a dog would be returned.
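The click, remove, refit loop described here can be sketched in Python as follows. This is only an illustration of backward elimination, not what JMP does internally. The model fit is left as a pluggable `refit` callable, the 0.05 threshold is an assumption (the talk does not state one), and the term names and p-values in `MADE_UP_P` are invented placeholders, not results from the study.

```python
from typing import Callable, Dict, List

def removable(term: str, kept: List[str]) -> bool:
    """A term stays while a kept higher-order interaction contains it."""
    parts = set(term.split("*"))
    return not any(parts < set(other.split("*")) for other in kept)

def backward_eliminate(
    terms: List[str],
    refit: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,  # assumed threshold, not stated in the talk
) -> List[str]:
    """Drop the weakest removable term, refit, and repeat."""
    kept = list(terms)
    while kept:
        p_values = refit(kept)  # fresh p-values after every removal
        candidates = [t for t in kept if removable(t, kept)]
        worst = max(candidates, key=lambda t: p_values[t])
        if p_values[worst] <= alpha:
            break  # every removable term still carries signal
        kept.remove(worst)
    return kept

# Placeholder p-values, invented for illustration only. A real refit
# would fit the logistic regression again on the kept terms.
MADE_UP_P = {
    "age": 0.001,
    "aggression": 0.0004,
    "transported": 0.62,
    "origin*destination": 0.48,
    "origin": 0.30,
    "destination": 0.01,
}

def toy_refit(kept: List[str]) -> Dict[str, float]:
    return {term: MADE_UP_P[term] for term in kept}

print(backward_eliminate(list(MADE_UP_P), toy_refit))
# ['age', 'aggression', 'destination']
```

The `removable` check mirrors the order used in the project: an interaction has to go before the main effects inside it become eligible for removal.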
Then once again, we're back to the Profilers and how much I love them. We could then use the Profiler option under the model output so that the client could go in and explore for herself which things were important and how they impacted the probability of return, which is great because sometimes it's hard, especially when you're not just in the plain old linear model space, to explain what the coefficients mean and all of that. For a client to be able to go in and visually see the effect of changing the levels of the variables is just absolutely invaluable. This was a great part of this project, and it made the client feel really empowered.
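To see why logistic regression coefficients are hard to explain on their own, here is a small Python sketch of what a profiler-style view computes. It is not JMP's Profiler, and every number in it is hypothetical: the intercept, the coefficient values, and the indicator names `aggressive` and `urban_home` are made up for illustration and are not estimates from this project.

```python
import math
from typing import Dict

# Hypothetical log-odds coefficients, invented for illustration only.
INTERCEPT = -2.5
COEFS: Dict[str, float] = {"aggressive": 0.8, "urban_home": 0.4}

def return_probability(settings: Dict[str, int]) -> float:
    """Turn 0/1 factor settings into a predicted probability of return."""
    log_odds = INTERCEPT + sum(COEFS[name] * value for name, value in settings.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Flip one factor at a time while holding the other fixed, which is
# what dragging a single Profiler slider amounts to.
for urban in (0, 1):
    low = return_probability({"aggressive": 0, "urban_home": urban})
    high = return_probability({"aggressive": 1, "urban_home": urban})
    print(f"urban_home={urban}: {low:.3f} -> {high:.3f} (change {high - low:+.3f})")
```

The same coefficient shifts the probability by a different amount depending on where the other factors are set, which is the part that is awkward to read off a table of coefficients and easy to see in an interactive profile.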
We also used JMP to do all of our visualization. I love the JMP Graph Builder because it lets me personalize things and I don't have to fight with code. It's just really wonderful. We did some basic things. We found out that 10.5% of the dogs in our data set were returned, which makes sense because we expected about 8 or 9% because that's what the client sees in other studies that have been published. But we pushed our cutoff further than most shelters typically do, and so we expect that there's a few more returns that get populated there.
This graph shows nicely that we found out that the working breeds were the ones that were most likely to be returned. That's things like German Shepherds. If you've ever been on a shelter website, you probably have seen a lot of those, so that makes sense. Then some of our toy breeds are less at risk. That makes sense because a toy poodle is often not a huge burden on a household. I say that, but I've never had a toy poodle, so I could be wrong.
Since I mentioned the client was really interested in these geographies, we found out that where the dogs go seemed to be related to the return probability in some way. We found out that when dogs go to urban homes, they were more likely to be returned than when they went to rural or suburban homes.
I do want to mention that that is definitely not causal, and that all of these variables are proxies, and they need to be dug into a little more. We don't know why urban homes have this higher probability of return just yet, but it makes for a really interesting thing that the client can go forward and do more research on.
Less surprising, we found that when shelter staff had documented that a dog was aggressive in some way—so they do some behavioral checks when the dogs come in, if they showed aggression in some way—then those dogs had a higher probability of return. That was probably one of the more impactful things.
Then the dog's age seems to be related in some way. Most of the dogs that were being returned are comfortably adult, I'll say that. 5–9 years old was where it peaked. I was really surprised by this because I expected puppies to be returned more frequently than adult dogs just because they can be really hard to manage, but that was not true. Actually, old dogs were the least likely to be returned, which is really interesting, too.
All in all, we knew coming in that no single factor causes a dog to be returned. Sometimes it's just a bad match between the owner and the dog or the situation. But we did identify some interesting candidates for the shelters to go forth and use their resources in a data-driven way to try to reduce the number of returns that they're seeing in the long run. JMP made this process a breeze.
As I mentioned, this client works in a large shelter system. She is not a data scientist. She was a master's student in a very applied program, and she had a specific question, and we had data that could answer it. JMP made her feel like she was really part of that process. We combined her field knowledge and my statistical knowledge, and she was able to be involved because in JMP, you can just save your analysis right to the data set, for example. I could send her the data set, she could click play, and she could see what I was seeing, and she could use the Profilers, and she could use Graph Builder. It was just all in all a really smooth, wonderful experience with this client. That was great.
I just want to say a few thank yous, first and foremost to the client that collected these data and provided some expert guidance along the way. To my good friend and co-collaborator, Simin, who helped with the analysis and data cleaning and keeping good records about what we were doing. Our mentor, Dr. Jennifer Van Mullekom, as well for her support and guidance on this project and all the other ones that I work on in SAIG, our statistical group. Thank you very much for listening. I hope you come ask me all your questions in person.