The Engineering Mailbag
Episode 5: I. DO. NOT. LIKE. SPIDERS!!! (And spider plots can take a hike, too.)
I can't ignore spider plots forever, so here we go...
Every now and again, we systems engineers run into interesting questions that fall somewhat outside the typical range of JMP usage. These applications are generally clever, often driving home that data isn’t just for business or technical problems. Other times, the questions are simply unexpected, challenging problems. These “curve balls” (as I like to call them) come in many different forms: coding problems, interesting analyses, ways of visualizing data…you get the idea.
Occasionally, I’ll get questions about problems that I really don’t want to answer. Today's query is in that little pile, and I have avoided it for years! The data viz community generally regards the subject of this entry in The Mailbag as a scary, unpleasant plot, so I guess it’s appropriate for a Halloween post.
Since this type of plot shows up regularly for cosmetics and consumer analytics customers in my day-to-day work, I can't ignore it forever. So, I’m taking this as an opportunity to help out a number of user groups in New York and New Jersey. But I have to say…I really do not like spider plots.
The question
Like I said, I’ve received this question more often than I care to mention. So, I’m going to skip the actual question email; you'll just have to trust me that there have been several.
There were also some in-person requests.
And a few posts on the JMP Community.
Yeah.
Below is an example from a peer-reviewed paper that someone gave me as a reference for what they would like to see:
(J. Inst. Brew. 2012; 118: 325–333) Author's note to any data visualization specialist who may be reading this: Please forgive me. I know this is an awful graph.
The response
Historically, my response to these requests has followed what most data visualization experts say: discourage people from using this plot if at all possible. Recently, though, I’ve taken a different tack on this persistent question.
My current philosophy about spider plots is that I’m open to helping people see how to make one if I can also show them better ways to present the same information along the way. As was the case with the other visualization question I answered, for me, the process of making this chart was much more instructive than the actual result.
I. HATE. SPIDERS.
OK, I have (or rather, had) a strong antipathy toward spiders. I wouldn’t quite call it a full-blown case of arachnophobia, but for many years I would dispatch anything with eight legs (and occasionally six; you can never be too sure!) with extreme prejudice. I gave no mercy. I gave no quarter. I. Took. No. Prisoners. Anyway, I eventually got over it. With the exception of ticks (vampire spiders!), I have made peace with my eight-legged friends. We generally just stay out of each other’s way now.
I’d like to think that part of my aversion to spider plots came from my issue with spiders, but it could also be that they’re a royal pain and I instinctively knew that making one would be a bit scary. That’s not to say there aren’t some interesting coding problems involved in constructing them, particularly if you want them to have the interactivity users expect from JMP; I just knew the code itself was going to be a bit intimidating. Anyway, let’s have a look at the problem.
Scoping the problem
Practicing what I preach, I started with a problem statement:
I want to make a JMP add-in that displays a spider plot and other possible ways of visualizing a data set in a framework that I can add on to later.
The next step is to create a punch list of the different things that I need to do:
- Get the dimensions for graphing from the user.
- Get a label column from the user for each data subset.
- Construct an interactive spider chart for the provided dimensions by each subset.
- Construct an interactive radial chart for the provided dimensions by each subset.
- Construct an interactive parallel chart for the provided dimensions by each subset.
- Construct an interactive table of pairwise correlations for the provided dimensions.
Scoping the problem revealed some good news: I can offload a lot of this work to JMP if I use Application Builder, which will also make it really easy to add new visualizations later if needed. The challenging bits were mostly the first two charts and making them interactive. JMP’s graphs work in a Cartesian coordinate system (x and y coordinates), while spider and radial charts are, for all intents and purposes, in a polar coordinate system (r and φ coordinates). As a result, I need some code to convert between Cartesian and polar coordinates. Let’s start there.
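For reference, these are the standard textbook relations between the two systems, with the angle φ measured counterclockwise from the positive x axis. Drawing the charts only needs the first pair (polar to Cartesian); the second pair goes the other way.

```latex
\begin{aligned}
x &= r\cos\varphi, & y &= r\sin\varphi,\\
r &= \sqrt{x^{2} + y^{2}}, & \varphi &= \operatorname{atan2}(y,\, x).
\end{aligned}
```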
Coordinate transform functions
Here’s the function that I used for working between the two coordinate systems:
polarToRect = Function( {r, t},
Eval List( {r :* Cos( t ), r :* Sin( t )} ) // element-wise, so r and t can be matching column vectors
);
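It’s fairly simple, just what you’d find in a geometry textbook. It’s a nice little bit of code: it takes a radius (r) and an angle (t) and returns a coordinate pair (x, y). It also works on whole vectors; the chart functions later in this post pass it a column vector of radii and a matching column vector of angles and get back vectors of x and y values. BTW, this function also appears in an application, included in the JMP sample applications, that I reference later on.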
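If you want to check the arithmetic outside JMP, here's a minimal Python sketch of the same transform. It isn't part of the add-in: `polar_to_rect` is a hypothetical name mirroring `polarToRect`, and plain lists stand in for JMP's column vectors. The example places a radius of 1 on four evenly spaced spokes, the same kind of spacing the chart functions build for their angles.

```python
import math


def polar_to_rect(r, t):
    """Convert matching sequences of radii r and angles t (radians) to x and y lists."""
    if len(r) != len(t):
        raise ValueError("r and t must have the same length")
    xs = [ri * math.cos(ti) for ri, ti in zip(r, t)]
    ys = [ri * math.sin(ti) for ri, ti in zip(r, t)]
    return xs, ys


# Radius 1 on four evenly spaced spokes: 0, 90, 180 and 270 degrees.
n_vars = 4
angles = [2 * math.pi * k / n_vars for k in range(n_vars)]
xs, ys = polar_to_rect([1.0] * n_vars, angles)
# xs is approximately [1, 0, -1, 0] and ys approximately [0, 1, 0, -1];
# floating-point noise around 1e-16 stands in for the exact zeros.
```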
Because I’m working in Application Builder, just about everything else I’m going to do will be either a function or an expression. Everything about this project is also spectacularly repetitive, so it will make the iterative parts of my code easier to read if I’m just calling functions.
The easy bits
If you have a look at the final app source, you can see that I’m leveraging some existing parts of JMP: Graph Builder for a parallel plot and Multivariate for the pairwise correlations. They were created almost exclusively with JMP-generated code or capabilities in Application Builder, so we’ll get them out of the way first.
Because of the way that Graph Builder treats columns in this case, and the fact that I have to be able to handle an arbitrary number of variables, it’s easier for me to build the Graph Builder script as an expression and then append it to the main display tree in the correct place. Here’s the code for that:
drawParallel = Function( {pCol},
{default Local},
// build the list of variables as a string with the format "X(col1), X(col2, Position( 1)), ..."
// also building a list of plot elements
colStr = "";
gElemStr = "";
For( i = 1, i <= N Items( pCol ), i++,
If( i == 1,
colStr = colStr || "X(" || Char( pCol[i] ) || ")";
gElemStr = gElemStr || "X(" || Char( i ) || ")";
,
colStr = colStr || ", X(" || Char( pCol[i] ) || ", Position( 1))";
gElemStr = gElemStr || ", X(" || Char( i ) || ")";
)
);
// Get the grouping variable as a string
// (gCol is not a parameter of this function; it is defined elsewhere in the app)
xVarStr = Char( gCol );
// Insert the strings into the main string
Eval(
Parse(
Eval Insert(
"parallelVLB << Append(Graph Builder(
Size( 766, 256 ),
Show Control Panel( 0 ),
Show Legend( 0 ),
Variables( ^colStr^, Color( ^xVarStr^ ) ),
Elements( Parallel( ^gElemStr^, Legend( 3 ) ) ),
SendToReport(
Dispatch(
{},
\!"Graph Builder\!",
OutlineBox,
{Set Title( \!"Parallel Plot\!" ), Image Export Display( Normal )}
),
Dispatch( {}, \!"X title\!", TextEditBox, {Set Text( \!"Variables\!" )} ),
Dispatch( {}, \!"graph title\!", TextEditBox, {Set Text( \!"\!" )} )
)
))"
)
)
);
);
Notice that all I’m doing is formatting the variables as strings, plugging them into the right spots in the code, and then running it. The end result is a fully interactive parallel plot in my app with very little coding on my part. I tried to script a parallel plot manually; it worked, but just barely. I don’t recommend it. Just use Graph Builder and Append(). I’m also using the Eval Insert() construct; it’s a little easier to read than building the string with concatenations.
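To make the string building concrete, here's a Python transcription of the loop in drawParallel. It's an illustration only: `graph_builder_strings` is a hypothetical name and the column names are made up. The first column gets a bare X(), and every later column gets Position( 1), which, as I read the Graph Builder script, places it in the same X zone as the first so all the columns become one set of parallel axes.

```python
def graph_builder_strings(col_names):
    """Return the (Variables, Elements) argument strings that drawParallel builds."""
    var_parts = []
    elem_parts = []
    for i, name in enumerate(col_names, start=1):
        if i == 1:
            var_parts.append(f"X({name})")
        else:
            # later columns share the first column's X position
            var_parts.append(f"X({name}, Position( 1))")
        elem_parts.append(f"X({i})")
    return ", ".join(var_parts), ", ".join(elem_parts)


variables, elements = graph_builder_strings([":Sweet", ":Bitter", ":Sour"])
print(variables)  # X(:Sweet), X(:Bitter, Position( 1)), X(:Sour, Position( 1))
print(elements)   # X(1), X(2), X(3)
```

Joining a list of parts plays the role of the If( i == 1, ... ) branch in the JSL: it keeps a comma from landing in front of the first entry.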
The multivariate plot is just a simple parameterization of the Multivariate platform in JMP. I didn’t even have to write code for this bit! There’s an Advanced Mastering JMP webinar that shows how to do that with just mouse clicks. Again, I’m doing these bits this way for convenience, because I really didn’t want to spend a huge amount of time on this part of the app. It’s not being lazy; it’s being an efficient coder. JMP gives you the code, so you might as well save yourself some time.
The radial plot
The radial plot is actually a port (with some modifications) of an example app that’s included in JMP’s samples directory. The big modification is that I’m using Marker Seg() instead of Marker(). And, that’s actually an important point. It’s possible to make Marker() interactive using some mouse capture commands and some logic, but Marker Seg() handles all this automatically. You just have to tell it which data table you need it to monitor.
drawFlies = Function( {nP, pCol, obj},
{default Local},
// Create a list of angles for the spider plot
th = ((1 :: nP) - 1);
th = Shape( th, nP, 1 );
th = 2 * Pi() * th / nP;
//Get the data from the table as a matrix.
sel_dat = J( N Row( DataTable1 ), N Items( pCol ), 0 );
For( i = 1, i <= N Items( pCol ), i++,
sel_dat[0, i] = pCol[i] << get as matrix
);
// now get a matrix with two columns and nvars rows to keep track of the min and max of each variable
min_max = J( nP, 2, 0 ); // column1: minimum, column2: maximum. row for each variable chosen
For( i = 1, i <= nP, i++,
min_max[i, 1] = Min( sel_dat[0, i] );
min_max[i, 2] = Max( sel_dat[0, i] );
);
// Create some empty matrices for the converted values
xMat = [];
yMat = [];
// Convert the data matrix
For( i = 1, i <= N Rows( sel_dat ), i++,
current = sel_dat[i, 0];
// for each variable, subtract the minimum and then divide (element-wise) by the range.
adj = J( N Row( current ), 1, 0.0 ) + 1.0 * ((current - min_max[0, 1]`) :/ (min_max[0, 2] - min_max[0, 1])`);
adj = adj`;
total = Sum( adj );
{x, y} = polarToRect( adj, th );
x = Sum( x ) / total;
y = Sum( y ) / total;
xMat = xMat |/ x;
yMat = yMat |/ y;
);
// Draw markers
annotateRad = Expr(
// Matrices and book-keeping variables
xCoord = xxx;
yCoord = yyy;
obj[FrameBox( 1 )] << Append Seg( Marker Seg( xxx, yyy, Row States( DataTable1 ) ) );
);
// Substitute values into the annotation expression
Substitute Into( annotateRad, Expr( xxx ), xMat, Expr( yyy ), yMat );
annotateRad;
);
As you can see, A LOT of the code deals with the radial transforms. The interesting bit for this discussion doesn’t happen until almost the end of the function (hint: look for the Draw markers comment), where I append a Marker Seg() to the Graph Elements stack. If you right-click on the radial plot in the finished report and select Customize, you can see it listed there.
The reason that Marker Seg() is so powerful is that it has that Row States() reference in it. Remember, everything in JMP is linked through the data table via row states. So, by telling JMP which data table the report needs to communicate with, I can sync up all the charts in the report and have all the interactivity I’m used to with JMP. And all it took was switching from Marker() to Marker Seg() (which blew my mind when I got it working).
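Stripped of the matrix bookkeeping, drawFlies places each row at a weighted average of unit vectors pointing along the spokes, using the min-max-scaled values as the weights (this looks like the same idea as the RadViz technique). Here's a Python sketch of that calculation for a single row. `radial_point` is a hypothetical name, and the guard for a row whose scaled values all equal zero is my addition; the JSL divides by the total without checking.

```python
import math


def radial_point(values, mins, maxs):
    """Position one data row on the radial plot.

    Each variable is min-max scaled to [0, 1] and used as the weight on a
    unit vector along its spoke; the point is the weighted average.
    Assumes max > min for every variable.
    """
    n = len(values)
    angles = [2 * math.pi * k / n for k in range(n)]
    weights = [(v - lo) / (hi - lo) for v, lo, hi in zip(values, mins, maxs)]
    total = sum(weights)
    if total == 0:
        # Not in the original JSL: rows at the minimum on every axis go to the origin.
        return 0.0, 0.0
    x = sum(w * math.cos(a) for w, a in zip(weights, angles)) / total
    y = sum(w * math.sin(a) for w, a in zip(weights, angles)) / total
    return x, y


# Only the first variable is above its minimum, so the point sits on the first spoke.
print(radial_point([5, 1, 1], mins=[1, 1, 1], maxs=[5, 5, 5]))  # (1.0, 0.0)
```

A row that scores the same on every variable lands near the center, which is one reason the radial plot hides as much as it shows.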
The spider plot
OK, we’ve got nothing else that I can talk about except that dang spider plot. So, let’s get into it.
Since a spider plot is basically a parallel plot wrapped around a central axis, we’re more or less going to have to make a parallel plot, with all the points projected onto the polar coordinate system. And if that sounds painful, you’re right. It is. It’s also really, really hard for your brain to process radial information, so that one change, from Cartesian to polar, makes spider charts significantly harder to read. Anyway, I’ll save the discourse on the merits of this graph for another time. On to the code!
The good news is that since I’m really only dealing with one dimension per variable, I have full control over the angular part of the coordinate system. So, there’s some transformation work there, but it’s not that bad. It gets messy when you need to consider that the user might want a wider scale displayed than the data actually covers, e.g., the values are between 3 and 5 but the possible scale is 1 to 7. I was able to handle that by looking for an axis column property and then getting the max and min values from there. Since that’s not a critical point in the narrative, you can see how that was done in the source code I’ve included with the add-in.
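The radial part of the spider transform is a min-max scaling, with the range taken either from the data or from the Axis column property. Here's a Python sketch of that rule using the example from the previous paragraph (data between 3 and 5, possible scale 1 to 7). The function name and the `axis_range` tuple are hypothetical stand-ins for the JSL's min_max matrix and the Axis property; treating missing values as 0 mirrors the error check in drawTrails.

```python
import math


def scale_to_unit(value, data_min, data_max, axis_range=None):
    """Min-max scale one value onto a spoke running from 0 (center) to 1 (rim).

    axis_range is a hypothetical (min, max) pair standing in for the Axis
    column property; when it is None, the data's own range is used instead.
    Missing values (None or NaN) are drawn at the center, as drawTrails does.
    """
    lo, hi = axis_range if axis_range is not None else (data_min, data_max)
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0.0
    return (value - lo) / (hi - lo)


# Data between 3 and 5, possible scale 1 to 7.
print(scale_to_unit(5, 3, 5))          # 1.0: the largest value sits on the rim
print(scale_to_unit(5, 3, 5, (1, 7)))  # about 0.667: two-thirds of the way out
```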
The radial spokes and reference lines are just lines drawn using the Line() command. The hardest part of this whole thing was making the webs for each row interactive and getting the legend to work. I made the lines interactive by using Line Seg() and Marker Seg(). The legend repurposes a really old piece of JMP I ran into recently, the Row Legend. Let’s look at each of them.
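The spokes and reference lines are plain geometry, so here's a Python sketch of the coordinates you might hand to Line(). The function names are hypothetical, and drawing each reference line as a closed polygon through the spokes is my assumption about the layout, not something taken from the add-in's source.

```python
import math


def spoke_angles(n_vars):
    """Evenly spaced spoke angles in radians, starting at 0."""
    return [2 * math.pi * k / n_vars for k in range(n_vars)]


def spokes(n_vars, radius=1.0):
    """One segment per variable, from the origin out to the rim."""
    return [((0.0, 0.0), (radius * math.cos(a), radius * math.sin(a)))
            for a in spoke_angles(n_vars)]


def reference_ring(n_vars, level):
    """Closed polyline joining every spoke at the same scaled level (0 to 1)."""
    pts = [(level * math.cos(a), level * math.sin(a)) for a in spoke_angles(n_vars)]
    return pts + pts[:1]  # repeat the first vertex so the ring closes


# Five variables, with reference lines at 25%, 50%, 75% and 100% of the scale.
segments = spokes(5)
rings = [reference_ring(5, level) for level in (0.25, 0.5, 0.75, 1.0)]
```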
Here’s the code for creating the lines on the spider chart. As with the radial chart, it takes a lot of data manipulation just to get it into a format that makes sense to graph:
drawTrails = Function( {nP, pCol, obj},
{default Local},
// Create a list of angles for the spider plot
th = ((1 :: nP) - 1);
th = Shape( th, nP, 1 );
th = 2 * Pi() * th / nP;
//Get the data from the table as a matrix.
sel_dat = J( N Row( DataTable1 ), N Items( pCol ), 0 );
For( i = 1, i <= N Items( pCol ), i++,
sel_dat[0, i] = pCol[i] << get as matrix
);
// now get a matrix with two columns and nvars rows to keep track of the min and max of each variable
min_max = J( nP, 2, 0 ); // column1: minimum, column2: maximum. row for each variable chosen
// Check if the user wants to use axis column property values for max and min.
If( axisProp == 0,
// Directly calculate the values from the data table.
For( i = 1, i <= nP, i++,
min_max[i, 1] = Min( sel_dat[0, i] );
min_max[i, 2] = Max( sel_dat[0, i] );
)
,
// If the user wants to use axis column property values
For( i = 1, i <= nP, i++,
// Pull the column name and check if the axis property is defined
col = pCol[i];
axisFlag = Contains( col << Get Properties List, Expr( Axis ) );
//If the axis property is present pull the values from the property
If( axisFlag == 1,
axisPresent = col << Get Property( "Axis" );
min_max[i, 1] = Eval( Extract Expr( axisPresent, Min( Wild() ) ) );
min_max[i, 2] = Eval( Extract Expr( axisPresent, Max( Wild() ) ) );
,
// if the property is not present, calculate the values directly from the data
min_max[i, 1] = Min( sel_dat[0, i] );
min_max[i, 2] = Max( sel_dat[0, i] );
);
)
);
// Create some empty matrices for the converted values
xMat = [];
yMat = [];
// Convert the data matrix
For( i = 1, i <= N Rows( sel_dat ), i++,
current = sel_dat[i, 0];
// for each variable, subtract the minimum and then divide (element-wise) by the range.
adj = J( N Row( current ), 1, 0.0 ) + 1.0 * ((current - min_max[0, 1]`) :/ (min_max[0, 2] - min_max[0, 1])`);
adj = adj`;
// an error check for missing values in the vector (missing values are drawn at the center)
For( k = 1, k <= N Rows( adj ), k++,
If( Is Missing( adj[k] ),
adj[k] = 0
)
);
{x, y} = polarToRect( adj, th );
xMat = xMat |/ x;
yMat = yMat |/ y;
);
// Draw Lines and Markers for spider plot
annotateSpider = Expr(
// coordinate matrices
xMat = xxx;
yMat = yyy;
nParam = nnn;
nPoints = N Rows( xMat );
nGroups = nPoints / nParam;
// Reshape the matrices to make them easier to work with
xSMat = Shape( xMat, nGroups );
ySMat = Shape( yMat, nGroups );
// Create an empty string to hold a list (as a string)
pathList = "";
// Draw the lines as polygons using the reshaped matrix
For( i = 1, i <= nGroups, i++,
// Get the coordinates for a given path and add the first value to the end to close the path
xCoord = xSMat[i, 0];
xCoord = xCoord || xCoord[1];
yCoord = ySMat[i, 0];
yCoord = yCoord || yCoord[1];
// Append one closed line per row; Row States( DataTable1, {i} ) ties this web to row i
obj[FrameBox( 1 )] << Append Seg( Line Seg( xCoord, yCoord, Row States( DataTable1, {i} ) ) );
);
// Draw the markers
For( i = 1, i <= nParam, i++,
// Get the coordinates for a given path
xCoord = xSMat[0, i];
yCoord = ySMat[0, i];
// Run Markers Seg (need to move and reformat the matrix)
obj[FrameBox( 1 )] << Append Seg( Marker Seg( xCoord, yCoord, Row States( DataTable1 ) ) );
);
);
// Substitute values into the annotation expression
Substitute Into( annotateSpider, Expr( xxx ), xMat, Expr( yyy ), yMat, Expr( nnn ), nP );
// Run the annotation expression
annotateSpider;
);
The important bits are the Line and Marker segments toward the end. I had to draw each row's web as its own Line Seg(). There is a Path Seg() that would have made this really easy, but it creates a filled polygon if you try to color it. So, Lines and Markers it was! Using graphic segments makes it really easy to hook back to the data table through the Row States() operator. And because all the graphs look at the row states of the main data table, they can all be colored at once by assigning colors to the row states in the data table.
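The reshaping in annotateSpider is easier to see without the JSL matrix syntax. This Python sketch (a hypothetical helper working on plain lists instead of JMP matrices) does the same two steps: cut the stacked coordinates into one chunk per data row, then repeat each chunk's first point so the web closes.

```python
def webs_from_flat(xs, ys, n_vars):
    """Split stacked coordinates into one closed polyline per data row.

    xs and ys hold n_vars points for row 1, then n_vars for row 2, and so on,
    which is how xMat and yMat are built in drawTrails.
    """
    if len(xs) != len(ys) or len(xs) % n_vars != 0:
        raise ValueError("coordinate lists must hold whole rows")
    webs = []
    for start in range(0, len(xs), n_vars):
        row_x = xs[start:start + n_vars]
        row_y = ys[start:start + n_vars]
        # Repeat the first point at the end so the line returns to its start.
        webs.append((row_x + row_x[:1], row_y + row_y[:1]))
    return webs


# Two rows, three variables each.
webs = webs_from_flat([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], n_vars=3)
print(webs[0])  # ([1, 2, 3, 1], [7, 8, 9, 7])
print(webs[1])  # ([4, 5, 6, 4], [10, 11, 12, 10])
```

Each (x, y) pair of lists corresponds to one Line Seg() in the JSL, which is why the web for row i can carry row i's row state.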
When you right-click on some of the graphs in JMP, you'll find a Row Legend option. It’s been in JMP for a while and does two things. First, it colors the rows in the data table by a column (like the Red Triangle Menu option in the data table does). Second, it creates a little legend next to the visualization. That’s all I had to do to get the colors in sync across four graphs. Here’s the code:
drawLegend = Function( {gVar, obj},
{default Local},
// extract the column name with the grouping variable
colName = gVar[1];
// pass the variables into the expression
annotateLegend = Expr(
obj[FrameBox( 1 )] << Row Legend( ggg, Color( 1 ), Continuous Scale( 0 ) )
);
// substitute into the expression
Substitute Into( annotateLegend, Expr( ggg ), colName );
// run the expression
annotateLegend;
);
Since all my graphs are looking at the same data table for row states, they all automatically inherit the coloring the Row Legend assigns! Super slick.
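If the idea of charts inheriting colors from the data table feels abstract, this toy Python model shows the pattern: the color lives on the table, one slot per row, and each chart keeps only a reference to the table. It's a hypothetical illustration of shared state, not how JMP implements row states or Row Legend; in particular, the order in which colors are handed out here is arbitrary.

```python
class Table:
    """Toy data table: one color slot per row, standing in for row states."""

    def __init__(self, n_rows):
        self.row_colors = [None] * n_rows

    def color_by(self, column, palette):
        """Give each distinct value a palette color (first-seen order) and color its rows."""
        mapping = {}
        for value in column:
            if value not in mapping:
                mapping[value] = palette[len(mapping) % len(palette)]
        self.row_colors = [mapping[value] for value in column]
        return mapping


class Chart:
    """Toy chart: keeps a reference to the table, never its own copy of the colors."""

    def __init__(self, table):
        self.table = table

    def colors(self):
        return list(self.table.row_colors)


table = Table(3)
spider, radial = Chart(table), Chart(table)
table.color_by(["Lager", "Stout", "Lager"], palette=["red", "blue"])
print(spider.colors() == radial.colors())  # True: both read the same row colors
```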
Wrapping things up
And that’s it. For scripters, the main thing I’d like you to get out of this is the power of graphic segments. A lot of what makes JMP so special is wrapped up in what those functions can do. For everyone else, there’s now a spider plot add-in. I’m pretty proud of how it turned out. But please don’t use it. There are much better ways to visualize that type of data.
Author's note
No spiders were harmed in the writing of this blog, although some may have taken umbrage at being lumped in with insects (six legs) or ticks (also arachnids, but they’re vampire spiders, so I’d like to think that even spiders hate them).