XanGregg (Staff; joined Jun 23, 2011)

Visual Data Quality with Named Colors in JMP

In "How many words for red?" (and part 2), blogger Sean Roberts shows some interesting visuals of named colors. Approaching the topic from a psychology angle, he tries to correlate the number of names for a hue with the size of that hue's perceptual space. Whenever the raw data behind an illustration is available, I like to play with it in JMP.


For one graph, the data source is Wikipedia's List of colors page, which purports to list every color mentioned in Wikipedia articles about color. Such a list hardly seems like a good foundation for a psychology study, but it is probably a reasonable approximation of English color names.


Wikipedia tables are usually well-formed HTML tables with proper column headings, and JMP reads them in just fine with the File : Internet Open command. This page had two easily resolved complications. First, each letter of the alphabet had its own table of colors, so JMP imported 24 tables (there are no Q or Z tables). A single Tables : Concatenate operation on all of them fixed that. Second, the hue angle values included the "°" symbol. Using Find and Replace to remove the degree symbol and then changing the column's data type to numeric put everything in order.
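For readers who want the same cleanup outside JMP, here is a rough Python analog of those two fixes. It is a sketch under stated assumptions, not the JMP workflow: each imported per-letter table is assumed to be a list of dicts with a text column named "Hue", and the rows below are made up for illustration rather than taken from the Wikipedia page.

```python
# Rough Python analog of Tables : Concatenate plus Find and Replace on "°".
# Assumption: each per-letter table is a list of row dicts sharing the same
# columns, and the hue column is named "Hue" (a hypothetical name).


def concatenate_tables(tables):
    """Stack tables that share the same columns into one list of rows."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def parse_hue(text):
    """Turn a hue string such as '45°' into a float, or None if it is blank."""
    cleaned = text.replace("°", "").strip()
    return float(cleaned) if cleaned else None


def clean_hue_column(rows, column="Hue"):
    """Remove the degree symbol and convert the column to numbers, in place."""
    for row in rows:
        row[column] = parse_hue(row[column])
    return rows


# Made-up rows for illustration only.
table_a = [{"Name": "Example A1", "Hue": "45°"}]
table_b = [{"Name": "Example B1", "Hue": "200°"}, {"Name": "Example B2", "Hue": "0°"}]

colors = clean_hue_column(concatenate_tables([table_a, table_b]))
print(len(colors))                     # 3
print([row["Hue"] for row in colors])  # [45.0, 200.0, 0.0]
```

Converting after the symbol is stripped mirrors the order used in JMP: the column only becomes numeric once every value parses as a number.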


Besides the color name, the table contains three numeric representations of each color: hexadecimal RGB, decimal RGB and HSV (Hue, Saturation and Value). A few basic plots gave some interesting hints about the quality of the data. Here are the Value and Hue components of each color.
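The three representations are redundant: decimal RGB and HSV can both be derived from the hex code, which is what makes a consistency check possible. A minimal Python sketch of that derivation, assuming hex codes written as "#RRGGBB" and using the standard library's `colorsys.rgb_to_hsv` (which works on 0-1 components and returns hue as a fraction of a full turn):

```python
import colorsys


def hex_to_rgb(hex_code):
    """'#FF8000' -> (255, 128, 0); assumes six hex digits after an optional '#'."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hsv_degrees(red, green, blue):
    """Decimal RGB (0-255) -> (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return h * 360, s, v


rgb = hex_to_rgb("#FF8000")
print(rgb)                       # (255, 128, 0)
print(rgb_to_hsv_degrees(*rgb))  # hue of about 30.1 degrees, saturation 1.0, value 1.0
```

Hue comes out near 30 rather than exactly 30 because 128/255 is not exactly one half, a reminder that published HSV values are usually rounded.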


Named colors by hue and value


Notice the few stray points. Value is defined on the range 0 to 1, yet a few colors clearly exceed 1. Likewise, Hue runs from 0 to 360, but some colors have a negative hue. Using visualizations to find data quality problems is a natural fit for JMP, because JMP makes it easy to look at lots of graphs and links each graph to the data table, so you can quickly identify the stray rows.
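The graph makes the strays easy to spot, and the same rule can also be written down as an explicit check. A small Python sketch, assuming rows with hypothetical "Name", "Hue" and "Value" keys already converted to numbers (the sample rows are invented):

```python
def out_of_range(rows):
    """Return names of rows whose Hue or Value falls outside its defined range."""
    flagged = []
    for row in rows:
        hue_ok = 0 <= row["Hue"] <= 360      # hue is an angle from 0 to 360
        value_ok = 0 <= row["Value"] <= 1    # value is defined from 0 to 1
        if not (hue_ok and value_ok):
            flagged.append(row["Name"])
    return flagged


# Invented rows for illustration only.
sample = [
    {"Name": "in range", "Hue": 120, "Value": 0.5},
    {"Name": "negative hue", "Hue": -10, "Value": 0.5},
    {"Name": "too bright", "Hue": 60, "Value": 1.2},
]
print(out_of_range(sample))  # ['negative hue', 'too bright']
```

A rule like this only catches values that are impossible. It cannot catch a value that is in range but wrong, which is why the hex-based comparison further on is still needed.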


Aside: How did I color the markers that way? I ran a small script to set the row states:


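// Runs against the current data table.
// Assumes numeric :red, :green and :blue columns on a 0-255 scale.
For Each Row(
    // Give each row a filled-circle marker drawn in that row's own color.
    Row State() = Combine States(
        Marker State( "filled circle" ),
        Color State(
            // RGB Color() expects components from 0 to 1, hence the division by 255.
            RGB Color( :red / 255, :green / 255, :blue / 255 )
        )
    )
);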



After seeing those obvious data errors, I wrote a more elaborate script to check the consistency of every column against the hex RGB representation of the color. It turns out that a few of the decimal RGB values and many of the HSV values disagree with the hex codes. Here's a plot of the original green values from the Wikipedia table against the green values calculated from the hex RGB.
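The original check was a JSL script that isn't shown here. What follows is an independent Python sketch of the same idea, not that script. It assumes rows with hypothetical column names ("Hex", "Red", "Green", "Blue", "Hue", "Saturation", "Value"), saturation and value on a 0-1 scale, and tolerances that are guesses chosen to absorb rounding in the published HSV values.

```python
import colorsys

# Hypothetical tolerances: published HSV values are rounded, so allow some slack.
HUE_TOL = 1.0   # degrees
SV_TOL = 0.01   # saturation and value on a 0-1 scale


def hex_to_rgb(hex_code):
    """'#RRGGBB' -> (red, green, blue) as integers from 0 to 255."""
    digits = hex_code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def hue_distance(a, b):
    """Angular distance, so 359.8 and 0 count as 0.2 degrees apart."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def check_row(row):
    """List (name, column, original, computed) for every column that disagrees."""
    problems = []
    red, green, blue = hex_to_rgb(row["Hex"])
    for column, computed in (("Red", red), ("Green", green), ("Blue", blue)):
        if row[column] != computed:
            problems.append((row["Name"], column, row[column], computed))
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    if hue_distance(row["Hue"], h * 360) > HUE_TOL:
        problems.append((row["Name"], "Hue", row["Hue"], round(h * 360, 1)))
    for column, computed in (("Saturation", s), ("Value", v)):
        if abs(row[column] - computed) > SV_TOL:
            problems.append((row["Name"], column, row[column], round(computed, 3)))
    return problems


# Invented rows for illustration only.
good = {"Name": "Pure red", "Hex": "#FF0000", "Red": 255, "Green": 0, "Blue": 0,
        "Hue": 359.8, "Saturation": 1.0, "Value": 1.0}
bad = {"Name": "Bad green", "Hex": "#00FF00", "Red": 0, "Green": 200, "Blue": 0,
       "Hue": 120, "Saturation": 1.0, "Value": 1.0}
print(check_row(good))  # []
print(check_row(bad))   # [('Bad green', 'Green', 200, 255)]
```

Hue is compared as an angle so that a red stored as 359.8 isn't flagged against a computed 0. Like the plot, the check only reports disagreement. It treats the hex code as the reference without proving that the hex code is the correct one.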




Points off the diagonal are rows where the two green values disagree. Straw (labeled) is the only case where the original value is obviously wrong (unless they meant strawberry!). For the others, all I can say is that the color representations are inconsistent; I can't tell which one is correct without further research.

2 Comments

Visual Data Quality with Named Colors in JMP, Part 2 - JMP Blog wrote:

[...] my previous exploration of colors and names revealed inconsistencies in the Wikipedia color data, I looked around for a more authoritative [...]


XKCD Dominant Color Map in JMP - JMP Blog wrote:

[...] wrote about using visualizations in JMP to inspect the quality of a couple of color name data sets (Part 1 & Part 2). This week, xkcd author Randall Munroe has posted the results of his own color name [...]