What do you do when you have only an image of a graph, but you need the data behind the graph? It can be a painstaking process if you decide to do it by hand, one point at a time. However, you can pull data from an image into a data table using JMP, and it only takes a few steps.
The first step is to import the entire image into a JMP data table as pixel data. While this could be done via scripting, it can also be done using the excellent Image to Table Add-In. I'll use an image of the population in Reykjavik as an example, but any image of a line graph should work.
After downloading and installing the add-in, selecting Add-Ins > Image Analyzer allows me to select the image and import it as a data table. The data table includes the RGB data (as well as some other information) for each pixel in the image.
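For readers who want to see what such a pixel table looks like outside JMP, here is a minimal Python sketch. It is not how the add-in works internally. The function name `image_to_table`, the column names, and the tiny demo image are all hypothetical, and the intensity column is assumed to be the plain mean of the three channels.

```python
def image_to_table(pixels):
    """Flatten a row-major grid of (R, G, B) tuples into one record per pixel."""
    table = []
    for y, row in enumerate(pixels):           # y = 0 is the top row of the image
        for x, (r, g, b) in enumerate(row):
            table.append({
                "X": x, "Y": y,
                "R": r, "G": g, "B": b,
                # Assumption: intensity is the plain mean of the three channels.
                "I": (r + g + b) / 3,
            })
    return table


# Hypothetical 2-row, 3-column image: white background with one blue pixel.
WHITE, BLUE = (255, 255, 255), (0, 0, 255)
demo_image = [
    [WHITE, BLUE, WHITE],
    [WHITE, WHITE, WHITE],
]

rows = image_to_table(demo_image)
for record in rows:
    print(record)
```

Each record corresponds to one row of the data table: a pixel position plus its color information.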
Once the table is open, using the Graph Builder to recreate a black and white version of the image is simple enough. In the image below, I've used the I column (Intensity) as a color column, and I reversed the Y scale to make it match the way images are saved.
Now I can see an immediate problem: the scale on the graph doesn't match the scale in JMP. This is really easy to fix; it involves using some built-in features in JMP, along with an intermediate data table.
The next step is to rescale the X and Y values in the image data table so they match the axes shown in the graph. Essentially, each column needs an offset and a scaling factor. Fortunately, this is just fitting a line, which is a basic function of the Fit Y by X platform. In the picture below, I've fitted a line to the relationship between the current values and the desired values for X and Y.
The key here is to look at the formula for the Linear Fit. It gives both an offset and a scale, in the form of a y = mx + b equation, where m is the scale and b is the offset. These parameters can be used with the New Formula Column tool in JMP to make quick work of fixing the axes.
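The same offset-and-scale idea can be sketched in Python as an ordinary least-squares line fit. This is only an illustration of the arithmetic, not a replacement for Fit Y by X. The reference points (pixel positions of axis ticks and the values printed at those ticks) are made up for the example, and `fit_line` and `rescale` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = m * xs + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx               # scale
    b = mean_y - m * mean_x     # offset
    return m, b


def rescale(value, m, b):
    """Apply y = m * x + b to convert a pixel coordinate to a data value."""
    return m * value + b


# Hypothetical reference points read off the axis ticks of an image.
pixel_x = [50, 250, 450]             # pixel columns of three X-axis ticks
year = [1900, 1950, 2000]            # values printed at those ticks
m_x, b_x = fit_line(pixel_x, year)

pixel_y = [400, 200, 0]              # pixel rows of three Y-axis ticks (top row = 0)
population = [0, 50000, 100000]      # values printed at those ticks
m_y, b_y = fit_line(pixel_y, population)

print(m_x, b_x, rescale(150, m_x, b_x))
print(m_y, b_y, rescale(100, m_y, b_y))
```

Note that the fitted Y scale comes out negative in this made-up example, because pixel rows count down from the top of the image while the axis values increase upward.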
Now, new columns for X and Y, with axes and scales that match the image, are in the data table. Changing the axes to these new columns in the Graph Builder shows how these new axes align with the axes in the image.
The only step remaining is to eliminate all the rows in the image data table that aren't data points on the line of interest. To do this, K Means Clustering is a fast and nearly automatic tool. (Special thanks to @landonbw1 for showing me this trick.)
The original image appears to have four colors: white, blue, gray, and black. In principle, then, asking JMP to separate the R, G, and B columns into four clusters should leave the blue line as one of those clusters. In practice, because of image compression, color blending, and other factors, there are actually over 500 separate colors in this image. Still, we can separate the main colors easily enough. As a rule of thumb, setting the number of clusters to twice the number of visible colors (the colors a person would say are in the image) produces clusters that correspond to individual graphic elements in the image file.
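To make the clustering idea concrete, here is a small, plain-Python k-means sketch on RGB values. It is a simplified stand-in, not JMP's K Means Clustering implementation. The pixel colors are invented, the farthest-point starting centroids are chosen only to keep the example deterministic, and the "blueness" score used to pick out the line's cluster is my own assumption. The toy data has only ten pixels, so it uses four clusters rather than the doubled count suggested above.

```python
def dist2(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]


def kmeans(points, k, iterations=20):
    # Deterministic start: first point, then repeatedly the point farthest
    # from all centroids chosen so far (an assumption made for this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(ch) / len(members)
                                     for ch in zip(*members))
    return assign(points, centroids), centroids


# Invented pixel colors: near-white, near-blue, gray, and near-black shades.
pixels = [
    (255, 255, 255), (30, 60, 200), (128, 128, 128), (0, 0, 0),
    (250, 250, 250), (40, 70, 210), (130, 130, 130), (5, 5, 5),
    (253, 254, 255), (35, 65, 205),
]

labels, centroids = kmeans(pixels, k=4)

# Assumed heuristic: the line's cluster is the one whose centroid is "most blue".
blue_label = max(range(len(centroids)),
                 key=lambda j: centroids[j][2]
                 - (centroids[j][0] + centroids[j][1]) / 2)
line_pixels = [p for p, lab in zip(pixels, labels) if lab == blue_label]
print(line_pixels)
```

Keeping only the rows whose label matches `blue_label` is the Python counterpart of keeping only the cluster that contains the line of interest.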
There's a certain irony in the fact that it takes longer to read how to extract data from an image into a data table than it takes to actually do the process. Irony aside, however, there are also many opportunities to use this process, including:
What opportunities does this open up for your work?