Jump into the car for the road trip of a lifetime! (part 1)
Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Jump into the car for the road trip of a lifetime! (part 1)
Oct 3, 2018 6:43 AM
| Last Modified: Nov 1, 2018 10:01 AM
If you were to ask me where my obsession with road trips began, I would have to say it probably started shortly after my wife and I drove I-40 from Blacksburg, VA, to Albuquerque, NM, to start my first job out of graduate school. Perhaps there was something in seeing the changing landscape and visiting new places I found exciting. Though, to be honest, it probably is more likely due to a compulsion to “finish the route” and drive the rest of I-40 on both ends. I’ll compromise and say it was a bit of both.
Initially, my road trip fetish focused primarily on traveling all the major US Interstates (those that ended in a 0 or a 5), seeing as how I had nearly finished one already. However, as I perused various travel books, I found that the road trip community tended to dismiss the interstates as avenues for commerce and all-around "busyness," instead preferring the quieter, slightly slower pace of America’s other major roadways: the US Highway system. Mapping out several of these highways triggered the excitement of new places and landscapes. My new dream was to travel on every major US Highway, that is, all those with two-digit designations that were still active, the only exception being Hwy 101 in California.
The stuff dreams are made of... (image from Wikipedia: https://commons.wikimedia.org/wiki/File:Map_of_current_US_Routes.svg)
Since starting my position recently as a Research Statistician Tester at JMP, I’ve learned more about the software than I ever thought possible. I was surprised at how quickly I learned JMP Scripting Language (though I’m certainly still learning JSL), which opened even more possibilities for me. But, once I discovered that the drag-and-drop Graph Builder had the ability to plot maps?! Well, you can probably guess where this is going (if the title wasn’t a dead giveaway).
After overcoming my initial excitement, there was a moment of skepticism. Could you actually plot out a road trip with Graph Builder? I knew you could plot regions, but what about lines and roads? Well, I’m happy to say that, after spending perhaps more time than I care to admit, the answer is YES! It ain’t no Google Maps or AAA Trip Tik, but I think it’s definitely worth a look.
Laying Down a Path
So the first question to answer was what kind of road trip was this going to be? Well, my dream was to travel every US Highway, so why not make a trip that did just that! Specifically, the goal would be as follows:
Find a path through the US Highway system that connects every active primary US Highway.
While my inner completionist would balk at the possibility of not traveling every single mile of pavement that made up a particular highway, some compromise had to be made, and this would still serve to be a great challenge in any case. But first, some ground rules:
The goal is to travel on only US Highways as much as is possible. That is, I’d like to go directly from one highway to another without having to take a non-highway road in between where possible.
The highway must be currently active (sorry, Route 66ers) and serve as a primary member of the highway system (two-digit designation, excepting 101)
Each highway would be traveled only once.
Seems straightforward enough, right? The first step was to compile a database of highway intersections, showing which highways connected to which other highways and the locations of these intersections. I spent a great deal of time trying to find such a database ready-made, but no such luck. So I decided to make my own in a data table.
Example of the process I used to get the coordinates for each intersection. A true labor of love.
It’s worth mentioning that often times highways will intersect one another multiple times or even run concurrent with one another. I only recorded the first intersection that occurred when following a highway from north or east to south or west, imbuing my network with an inherent directional bias. In fact, that’s exactly what I was building: a network of US Highway “nodes” based on their intersection “edges.” Once this network was complete, the goal would then be to find a path through the network that connected each node with no node being repeated in the path (if possible).
For those familiar with graph theory, this might sound like I’m working with the famous “traveling salesman problem.” And you would almost be right. The traveling salesman is actually a special case of what is known as a Hamiltonian path, which is a path that connects every single node with no repetitions (not to be confused with an Eulerian path, which is a path that uses every single edge in the network without repetition and is associated with the famous Koenigsburg bridge problem). What makes the traveling salesman a special case is the fact that the edges have weights (say, time or distance), and so there is the added condition that the path must minimize the total weight. Fun fact: The Hamiltonian path gets its name from William Rowan Hamilton, who was working on finding a path connecting all the vertices of a dodecahedron. At one point, he pretended that the vertices were famous cities and he was searching for a “world tour.” I have a feeling we would have gotten along quite well.
Traditional example of a Hamiltonian path on a dodecahedron (image by Christoph Sommer published to Wikipedia: https://commons.wikimedia.org/wiki/File:Hamiltonian_path.svg)
This Hamiltonian path is exactly what I was looking for! Since it was a well-known object, there should be an efficient algorithm to find it, right? Well, not so fast. You see, the problem of finding a Hamiltonian path through an arbitrary graph is what is called an NP-hard problem. This is just a fancy term to say that it’s very easy to verify if a path is Hamiltonian, but quite difficult to find such a path for all but a few exceptions. In other words, there is no general fast algorithm to find a Hamiltonian path for every situation. So, once again, I set out on my own to see if I could develop an algorithm for my case.
For each connecting node, run a single-pass Depth-First Search (i.e., don’t go back up once you run out of connections, even if you missed some) and see which node has the longest path.
Add that connecting node to the path.
Repeat steps 2-3 until there are no more available nodes to add to the path.
The idea behind this algorithm is that adding nodes with the longest potential path should, in theory, get you much closer to an overall path throughout the network by reducing the risk of having a path terminate early.
Filling in the Gaps
Now that I can get an initial path, the next step was to determine how to “graft in” any missing nodes that got left out of the initial run. I came up with three possible methods:
Pick the first missing node, force it to connect to its earliest possible connection in the current path, and restart the search from there – Of course, you don’t have to pick the first missing node (in fact, there may not necessarily be a “first” missing node), but the idea would be the same. Essentially, this approach “grafts” a missing node into the path and then pretends that it reached this point during the algorithm’s run (the restart part). The one risk of this approach is that you may find yourself caught in a neverending loop; because of the limitations of the basic search algorithm, you may never reach a path that incorporates every single node. You could adjust this approach by trying to include more random selections in the path search, but there’s still no guarantee of finding a path any faster.
If there is only one missing node, explore every possible combination of fitting it into the current path and return the one that successfully does so – In essence, this approach simply amounts to an exhaustive search across all possible paths that include the missing node given the current path layout. The one risk of this approach is if the missing node has a lot of connections; the number of possibilities may quickly exceed the computational limits of your machine. That’s why I only used it if the missing node had less than 10 connections. Of course, you could also use this approach if there are more than one node, but each has very few connections.
Use some elbow grease and manually input the missing nodes into the path – This might seem like a cop-out, but it might be necessary and not necessarily all that difficult if the algorithm is having a hard time finding a path. It’s also a good way to incorporate expert judgement into the path; perhaps a more scenic path could be found with a little input from the human than a pure computer-based solution. The perfect example of man and machine working in harmony towards a common goal.
In searching for my Hamiltonian path, I actually ended up using each of these approaches at least once. In the next post, you can see the fruits of my labor.