While JMP 8 doesn’t have image manipulation support, it does allow the user to create a custom DLL that can be invoked from JSL. So I created a DLL that exposes some functions from ImageMagick, an open-source image library, to JSL.
Using this DLL, I tried to count cars in photos from some of the North Carolina Department of Transportation's Webcams. Here's an image from the one near Exit 289 off Interstate 40, in the Raleigh area:
Unfortunately, the more distant parts of the road are hard to see because everything blurs together, so I cropped that part out and ignored it.
To make the later calculations easier, I also converted the images to grayscale, so I only had to work with a single intensity for each pixel, rather than three color channels.
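For readers following along without the DLL, here is a minimal Python sketch of the grayscale step. It is an illustration, not the ImageMagick call I used from JSL: the function name `to_grayscale` is made up, and the channel weights (the common Rec. 601 luma weights) are an assumption, since the exact formula does not matter for what follows.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of single intensities (0-255)."""
    # Assumption: Rec. 601 luma weights. Any reasonable weighting would do here.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# A tiny 2x2 "image": white, black, pure red, pure blue.
rgb = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(rgb)
print(gray)
```

Each pixel is now one number instead of three, which is all the later steps need.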
The image then looked like this:
In order to count the cars, I needed to know which parts of the pictures were cars. Since the Webcams update every 3 minutes, I had plenty of other pictures to compare each one against. So for each picture I analyzed, I took the 10 pictures before it and the 10 pictures after it and averaged the 20 of them together. The result was pretty close to a picture of the road with no cars, as you can see:
This picture told me what parts of the image I was looking at were not cars. By subtracting it from the image I was analyzing, I got a nice picture of what parts were cars:
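The averaging and subtraction can be sketched in a few lines of Python. This is a simplified stand-in for the JSL script, with images held as plain lists of rows. The helper names are hypothetical, and taking the absolute difference is an assumption (it lets cars darker than the road show up as well as lighter ones).

```python
def neighbors(frames, i, k=10):
    """The k frames before and the k frames after frame i (frame i excluded)."""
    return frames[max(0, i - k):i] + frames[i + 1:i + 1 + k]


def average_images(images):
    """Pixel-wise mean of equally sized grayscale images."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(img[y][x] for img in images) / n for x in range(width)]
        for y in range(height)
    ]


def subtract_background(image, background):
    """Absolute per-pixel difference between a frame and the background."""
    return [
        [abs(p - b) for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# Five tiny 1x4 frames; road intensity is 100, cars are brighter.
frames = [
    [[100, 100, 100, 100]],
    [[200, 100, 100, 100]],
    [[100, 100, 220, 100]],  # the frame being analyzed, car at x = 2
    [[100, 100, 100, 100]],
    [[100, 200, 100, 100]],
]
background = average_images(neighbors(frames, 2, k=2))
difference = subtract_background(frames[2], background)
print(background)  # [[125.0, 125.0, 100.0, 100.0]]
print(difference)  # [[25.0, 25.0, 120.0, 0.0]]
```

Notice that cars in the neighboring frames leave faint ghosts in the background (the 25s here). With 20 frames instead of 4, those ghosts are much fainter.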
At this point, it is pretty easy, even from a script, to tell which parts of the picture are cars. The gray regions are cars; the dark regions are not. There were some regions that were dark gray, where it wasn't clear whether they should be considered cars or not. I found that what seemed to work best for them was to consider the areas where the color intensity was less than 60 to be background, and the areas where the intensity was 60 or more to be cars. With this sort of filter applied, the picture looks like this:
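As a rough Python illustration of this filter (again a sketch rather than the JSL I ran, with made-up function names), the cutoff of 60 turns the difference image into a black-and-white mask, and the white pixels can then be listed as (x, y) coordinates:

```python
THRESHOLD = 60  # the cutoff described above


def threshold(diff_image, cutoff=THRESHOLD):
    """Return a mask: 255 where intensity >= cutoff (car), 0 otherwise."""
    return [[255 if p >= cutoff else 0 for p in row] for row in diff_image]


def car_pixels(mask):
    """List the (x, y) coordinates of every white pixel in the mask."""
    return [(x, y) for y, row in enumerate(mask) for x, p in enumerate(row) if p]


diff = [[10, 59, 60], [200, 0, 61]]
mask = threshold(diff)
print(mask)              # [[0, 0, 255], [255, 0, 255]]
print(car_pixels(mask))  # [(2, 0), (0, 1), (2, 1)]
```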
I then had a collection of (x, y) coordinates of pixels that composed several cars. Since the pixels of a single car should be closer to each other than to those of other cars, I tried using JMP's cluster analysis tool to divide these points into clusters. Ideally, each cluster would represent one car. Here's that same picture with each cluster given its own color:
As you can see, it seems to have done a pretty good job on this picture. But since JMP's cluster analysis tool needs a number of clusters before it can do the analysis, it isn't the best tool for automated counting. That's why so many individual pixels seem to be their own clusters; JMP is splitting them up into the default number of clusters, 20, when there are only 8 cars in the picture.
Because JMP's cluster analysis tool isn't the best tool for this job, I ended up using a different one. Once you account for the fact that more distant cars look smaller, most cars are of similar sizes. Therefore, back at the black-and-white step, I could count white pixels (making sure to account for the distances of the cars from the camera) to get an approximation of the number of cars in the picture.
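One way to picture this kind of distance-adjusted pixel count is the Python sketch below. It is only an illustration of the idea: the function `pixels_per_car` and its coefficients are invented for the example, standing in for whatever calibration tells you how many pixels one car covers at a given row.

```python
def estimate_car_count(mask, pixels_per_car_at_row):
    """Sum white pixels, each weighted by 1 / (pixels one car covers at its row)."""
    total = 0.0
    for y, row in enumerate(mask):
        white = sum(1 for p in row if p)
        total += white / pixels_per_car_at_row(y)
    return total


def pixels_per_car(y):
    # Hypothetical calibration: cars lower in the frame (larger y) are closer
    # to the camera and cover more pixels. These numbers are made up.
    return 2 + 2 * y


mask = [
    [255, 255, 0, 0, 0, 0],              # distant car: 2 white pixels at row 0
    [0, 0, 0, 0, 0, 0],
    [255, 255, 255, 255, 255, 255],      # nearby car: 6 white pixels at row 2
]
print(estimate_car_count(mask, pixels_per_car))  # 2.0
```

Without the weighting, the nearby car would count three times as much as the distant one.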
Around this time I also decided to switch Webcams. The problem with this one was that it moved around a lot, which prevented the averaging from working. Here are four consecutive pictures from it:
These four are a bit more extreme than most, but you get the idea. After some looking, I found a different Webcam that didn't move around. Here's a picture from it:
Since the size of a car varies with the y-coordinate in a predictable, linear way, it is possible to adjust for it using relatively simple calculations. Unfortunately, doing this requires identifying four points (two on each side of the road), and communicating those four points to a script is rather tricky.
There is an easier way. JMP has a modeling platform. If you count cars in some of the pictures by hand and give that data to JMP, it can create a model that will predict the number of cars in the other pictures, given the location and distribution of pixels in them.
I divided each image into 5-pixel-tall horizontal stripes, roughly as shown in this cropped copy of the image above.
After subtracting the average picture from each picture, I counted the number of pixels with intensity higher than 60 in each stripe of that picture. I could have fit a model right then; but since I was trying to explain the number of cars based on the number of pixels in each stripe, there were roughly 30 explanatory variables, one for each stripe. I would have needed a lot of hand-counted images to create the model, which wouldn't have left many to use it on.
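The per-stripe counting is simple enough to sketch in Python. As before, this is an illustration with a made-up function name, not the JSL I ran; it takes a background-subtracted grayscale image as a list of rows and returns one count per 5-pixel-tall stripe.

```python
STRIPE_HEIGHT = 5
THRESHOLD = 60


def stripe_counts(diff_image, stripe_height=STRIPE_HEIGHT, cutoff=THRESHOLD):
    """Count pixels brighter than the cutoff in each horizontal stripe."""
    counts = []
    for top in range(0, len(diff_image), stripe_height):
        stripe = diff_image[top:top + stripe_height]
        # Strictly greater than the cutoff, following the wording of this step.
        counts.append(sum(1 for row in stripe for p in row if p > cutoff))
    return counts


# A 10-row, 3-column difference image gives two stripes.
image = [[0, 0, 0] for _ in range(10)]
image[1] = [61, 60, 200]    # stripe 0: two pixels above 60
image[7] = [100, 100, 100]  # stripe 1: three pixels above 60
image[9] = [0, 255, 0]      # stripe 1: one more
print(stripe_counts(image))  # [2, 4]
```

Each picture becomes one row of a data table, with one column per stripe. Those columns are the explanatory variables for the model.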
To fix this, I extracted principal components. Principal components are linear combinations of the original 30 explanatory variables, computed in such a way that the first principal component explains the most variation in the data and the 30th the least. It turns out that the top 11 principal components explained more than 95% of the variation. So by modeling based on the principal components instead of the raw data, I reduced the number of explanatory variables to roughly a third of what it was, while still retaining more than 95% of the variation.
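For readers who like to see the "95% of the variation" statement written out, here it is in symbols. The notation is mine, not JMP's: p is the number of stripe variables (roughly 30 here), and λ_1 ≥ λ_2 ≥ ... ≥ λ_p are the variances of the principal components, that is, the eigenvalues of the covariance or correlation matrix of the stripe counts.

```latex
\[
  \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i} > 0.95
  \qquad \text{with } k = 11,\ p \approx 30 .
\]
```

The numerator is the variation captured by the first k components, and the denominator is the total variation across all p of them.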
I hand-counted cars in a randomly chosen selection of 120 pictures (I had more than 600, so this was only a small portion of them) and fed those numbers, along with the pixel counts, to JMP's Fit Model platform. JMP came up with a model that I used to create a table of calculated number of cars by time. I then graphed that data for Thursday afternoon and Friday. With a spline fit to it, the graph looks like this:
This graph does indeed show the traffic patterns I expected. There are significant rises in traffic around morning and evening rush hours as well as lunchtime.