On Saturday, the Kentucky Derby will be run for the 143rd time. That got me interested in looking at which characteristics past Derby winners share and whether any of this year’s horses are similar.
I used the Internet Open feature in JMP to open a list of past winners. The list of winners I ended up using for my analysis dated back to 1968 (because other data was also available for horses from then onward).
For each horse, I looked up the odds of winning, the previous race run, the result of that race, and the win streak entering the Derby. Then I looked up the same information for this year’s participants and concatenated the data.
I used hierarchical clustering to group the horses in the dendrogram shown below. The dendrogram is colored by cluster, and I saved those clusters to the data table for future analysis. The highlighted horses are competing in the 2017 Derby, while the others are past winners.
Some of the past winning horses had not run a race prior to the Derby. I excluded those from the analysis. Why? Because all of the entrants in this year’s race have previously raced, so past winners without a racing history would not provide much insight into this year’s race. The clusters give us a nice way to break up the horses. You can see a graphical summary of the clustering below. To get the graph to display the same colors as the dendrogram above, I applied a Value Colors column property to the cluster column.
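For readers who do not have JMP handy, here is a minimal sketch of the same idea in plain Python: drop horses with no previous race, standardize the numeric columns, and merge the closest groups until a chosen number of clusters remains. The rows are made-up placeholders, not the Derby data, and several choices are my assumptions rather than what the analysis above used: average linkage, Euclidean distance, three clusters, and leaving out the categorical previous-race column.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical placeholder rows (NOT real Derby data):
# (name, odds_to_1, previous_finish, win_streak).
# previous_finish is None for a horse with no race before the Derby.
horses = [
    ("Horse A", 2.5, 1, 3),
    ("Horse B", 3.0, 1, 2),
    ("Horse C", 12.0, 2, 0),
    ("Horse D", 15.0, 3, 0),
    ("Horse E", 50.0, 5, 0),
    ("Horse F", 8.0, None, 0),
]


def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def average_linkage(a, b, points):
    """Mean pairwise distance between two clusters (an assumed linkage choice)."""
    return mean(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Exclude horses with no previous race, as described above.
raced = [h for h in horses if h[2] is not None]
points = standardize([h[1:] for h in raced])
clusters = agglomerate(points, n_clusters=3)

# Map each horse name to a 1-based cluster number.
labels = {raced[i][0]: k + 1 for k, members in enumerate(clusters) for i in members}
for name, cluster in sorted(labels.items()):
    print(name, "-> cluster", cluster)
```

Standardizing first matters because odds span a much wider range than finish position or win streak, so unscaled odds would dominate the distance calculation.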
Here’s what I see in this analysis:
- Cluster 1 contains the clear favorites, with all horses winning their previous race and average odds of 2.9:1. The only horse from this year’s field in cluster 1 is Always Dreaming.
- Cluster 2 is interesting, with a mix of Derby winners and this year’s participants; most of the horses in this cluster are coming into the Derby having won their previous race, but these horses have longer odds.
- Cluster 3 is made up of horses with pretty good odds of winning; most of these horses have had a second- or third-place finish in the previous race.
- Cluster 4 consists of the long-shot winners mixed with many horses in this year’s field.
- Cluster 5 and Cluster 6 are made up of three horses that did not fit with any others.
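Summaries like the ones in this list (average odds per cluster, how many horses won their previous race) come from grouping the rows by the saved cluster column. Here is a rough sketch of that step in plain Python. The rows, cluster numbers, and field names are hypothetical placeholders I made up for illustration, not the values from my data table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder rows (NOT real Derby data):
# (name, cluster, odds_to_1, previous_finish)
rows = [
    ("Horse A", 1, 2.5, 1),
    ("Horse B", 1, 3.0, 1),
    ("Horse C", 2, 12.0, 1),
    ("Horse D", 2, 15.0, 2),
    ("Horse E", 3, 50.0, 5),
]


def summarize(rows):
    """Return per-cluster size, mean odds, and share that won the previous race."""
    by_cluster = defaultdict(list)
    for _name, cluster, odds, finish in rows:
        by_cluster[cluster].append((odds, finish))

    summary = {}
    for cluster, members in sorted(by_cluster.items()):
        summary[cluster] = {
            "n": len(members),
            "mean_odds": mean(odds for odds, _ in members),
            "share_won_previous": mean(
                1.0 if finish == 1 else 0.0 for _, finish in members
            ),
        }
    return summary


summary = summarize(rows)
for cluster, stats in summary.items():
    print(
        f"Cluster {cluster}: n={stats['n']}, "
        f"mean odds {stats['mean_odds']:.1f}:1, "
        f"won previous race {stats['share_won_previous']:.0%}"
    )
```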
We can also view the clusters in a constellation plot, which is a fun way of visualizing the similarity of the horses. It gives you a graphical representation of which horses are most closely related to each other.
The parallel plot below is a nice way to look at this data as well. I have highlighted Always Dreaming and several past winners with similar paths to the Derby. As you can see, Barbaro (2006), Big Brown (2008), Orb (2013) and Nyquist (2016) all competed in and won the Florida Derby prior to competing in the Kentucky Derby. All of those horses arrived at the Derby with a winning streak of at least two races and had 6-1 odds or better.
Hopefully, this blog post gives you some insights into the 143rd Kentucky Derby. Here are a couple of things to keep in mind:
- Always Dreaming is a favorite and more similar to past winners than to any other horse in this year’s field. Does that mean he will be the winner? We will have to see on Saturday.
- Whether you like the favorite or the 80-1 long shot, don’t forget the words of cartoonist Nate Collier: "No horse can go as fast as the money you bet on him."
One more thing: Did you know you can try JMP for free? Get some data and see what you can find out and what cool visualizations you can make.