Jul 26, 2019 11:08 AM
| Last Modified: Jul 29, 2019 5:19 AM
At JSM 2019, I’m participating in a session on Dynamic Interactive Data Visualization. When I was first invited by the session chair, Blanton Godfrey, to contribute a talk, my first thought was, “Sure, JMP has a lot of dynamic and interactive data visualizations.” But a second thought quickly followed, “What does that really mean? Everything is interactive these days.”
That led me down the path of trying to categorize different methods of interactivity. The lofty goal is to help define a classification of data visualization interactivity that would be useful in comparing products or discussing individual implementations. In this initial effort, I’ve classified nine techniques, along with example videos of the interactions within JMP.
I’m using the same data set throughout my example videos. It contains 60+ columns of demographic, economic and election data for 3,000+ US counties. Now on to the methods!
Interactive graph building means constructing or restructuring a graph with immediate view updates. In JMP, the interactions happen directly on the graph surface. Other implementations can still be interactive with drop zones off to the side of the graph.
The video highlights a couple of challenges. When the second Y variable, “Rep Pct,” was added, there were more Y drop zones than before, so we could specify where to put it with respect to the existing Y variable. And the Color and Size drop zones are artificial in that they don’t have the same natural position as the X and Y axis variables do.
This ending chart is much like one featured in a New York Times article titled, “The Most Conservative Counties Are the Ones That Get the Most Government Assistance.” There was some discussion on Twitter about the regression line and other features. That led me to collect and explore the data myself (Bureau of Economic Analysis and Tony McGovern’s election results). When I realized how many interactivity methods I was exercising during the course of my exploration, I thought this would make a suitable demonstration data set.
When there are multiple views using the same data table, selection linking means that elements that are highlighted in one view have their corresponding elements highlighted in the other views. In JMP, selection linking is on all the time, which means it has to be fast even with millions of rows. To achieve that, selection is baked in as a core part of every data table.
The biggest challenge is when graph elements don’t have a one-to-one correspondence with each other, such as the county-level dots and the state-level shapes in the example. Showing partial selection in a clear way for any graph type is an open challenge. For the map, JMP highlights an entire state shape when any of its counties are selected elsewhere. For graph elements that are sized by count, such as histogram bars, JMP will use partial highlighting to reflect partial selection.
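To make the partial-selection idea concrete, here is a minimal Python sketch. It is not JMP's implementation: the row layout, the county and state values, and the function name `selection_fraction_by_group` are all assumptions made for illustration. The selection flags live alongside the table rows, and each view derives its own highlighting from them.

```python
from collections import defaultdict

# Hypothetical table rows: (county, state). Values are illustrative only.
rows = [
    ("County A", "NC"),
    ("County B", "NC"),
    ("County C", "NC"),
    ("County D", "TX"),
    ("County E", "TX"),
]

# Selection is stored with the table: one flag per row, shared by all views.
selected = [True, False, False, False, False]


def selection_fraction_by_group(rows, selected):
    """Return {state: fraction of that state's rows that are selected}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for (_, state), is_selected in zip(rows, selected):
        totals[state] += 1
        hits[state] += int(is_selected)
    return {state: hits[state] / totals[state] for state in totals}


fractions = selection_fraction_by_group(rows, selected)

# Map view: highlight the whole state shape if any of its counties is selected.
map_highlight = {state: frac > 0 for state, frac in fractions.items()}

# Count-sized elements (such as bars): highlight only the selected share.
bar_fill = fractions
```

Both views read the same flags, so the only per-view decision is how to render a fraction between 0 and 1.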
Row filtering is one of the most common types of data visualization interaction. The idea is that the visualization uses only a subset of the data table’s rows, based on the constraints of the filters, which appear next to the visual.
There is an option called "Lock Scales" that I used in the video to keep the axes fixed while filtering. Otherwise, they would be adjusted for each filtered subset.
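The difference between locked and refit axes can be sketched in a few lines of Python. This is an illustration under assumptions, not how JMP implements Lock Scales: here "locked" simply means the axis range is computed from the full table rather than from the filtered subset, and the rows, values and function names are made up.

```python
# Hypothetical rows: (state, personal_income, rep_pct). Values are made up.
rows = [
    ("NC", 30.0, 55.0),
    ("NC", 60.0, 40.0),
    ("TX", 25.0, 75.0),
    ("TX", 45.0, 60.0),
]


def axis_range(values):
    """Smallest (min, max) range covering the values (assumes non-empty)."""
    return (min(values), max(values))


def filtered_view(rows, keep, lock_scales):
    """Return the visible rows plus the x and y axis ranges to draw."""
    visible = [r for r in rows if keep(r)]
    # Locked: axes come from the full table, so they stay put as the
    # filter changes. Unlocked: axes are refit to the visible subset.
    basis = rows if lock_scales else visible
    x_range = axis_range([r[1] for r in basis])
    y_range = axis_range([r[2] for r in basis])
    return visible, x_range, y_range


def only_tx(row):
    """Example filter: keep rows from one state."""
    return row[0] == "TX"


visible, x_locked, y_locked = filtered_view(rows, only_tx, lock_scales=True)
_, x_refit, y_refit = filtered_view(rows, only_tx, lock_scales=False)
```

With the scales locked, the filtered points keep their screen positions, which makes it easier to compare one subset against another.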
A passive kind of interaction is to hover over a graphic element and have a floating window appear with more details. Sometimes these are called “tooltips” since they were originally used for toolbar icons. In a data visualization, hover detailing is a way to show both more precise values for the charted variables and additional related variables.
In the example, notice that county name was included in the hover details for the scatterplot even though that variable is not in the chart. That’s because the variable in the data table is marked with an attribute that causes it to appear in all hover details by default.
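As a rough Python sketch of that behavior (the function name, the column names and the values are assumptions, and this is not JMP's API), the hover content can be assembled from the columns flagged as labels plus whatever variables the chart uses:

```python
def hover_details(row, charted, label_columns):
    """Collect hover text fields: label columns first, then charted variables."""
    names = list(label_columns)
    names += [name for name in charted if name not in label_columns]
    return {name: row[name] for name in names}


# Hypothetical row; the values are placeholders, not real data.
row = {
    "County": "Example County",
    "Personal Income": 40.0,
    "Rep Pct": 50.0,
    "Population": 1000,
}

details = hover_details(
    row,
    charted=["Personal Income", "Rep Pct"],
    label_columns=["County"],
)
```

The "Population" column is neither charted nor flagged, so it stays out of the hover details.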
Model tuning means that the visual representation changes as a model parameter is changed. In the example, the model is a spline smoother, and the model parameter is the stiffness parameter, lambda.
Though many models are too slow for such interactive tuning, computers are fast enough to do more than we often realize. This simple-looking model has a lot going on. There are four separate smoothers, and each one has a 250x bootstrap confidence interval, so there are 1,000 spline models being fit each time the stiffness parameter changes.
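The arithmetic is easy to check with a small Python sketch. Everything here is a stand-in: `fit_smoother` is a simple exponential smoother swapped in for the spline, the data values are made up, and the function names are hypothetical. The point is only the shape of the loop that runs on every change of the stiffness parameter.

```python
import random


def fit_smoother(ys, stiffness):
    """Stand-in for a spline fit: exponential smoothing of y values in x order.

    A larger stiffness gives a smaller smoothing weight, so a stiffer curve.
    """
    alpha = 1.0 / (1.0 + stiffness)
    level = ys[0]
    fitted = []
    for y in ys:
        level = alpha * y + (1.0 - alpha) * level
        fitted.append(level)
    return fitted


def refit_all(groups, stiffness, n_boot=250, seed=0):
    """Refit every bootstrap replicate of every smoother; return the fit count."""
    rng = random.Random(seed)
    fits = 0
    for ys in groups:
        indices = range(len(ys))
        for _ in range(n_boot):
            # Resample rows with replacement, keeping them in x order.
            sample = [ys[i] for i in sorted(rng.choices(indices, k=len(ys)))]
            fit_smoother(sample, stiffness)
            fits += 1
    return fits


# Four hypothetical groups (one per smoother), each a short made-up series.
groups = [[float(i % 7) for i in range(20)] for _ in range(4)]
fits_per_update = refit_all(groups, stiffness=10.0)  # 4 * 250 fits
```

Each drag of the slider triggers one call like `refit_all`, which is why the per-fit cost matters so much for interactivity.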
Mapping aesthetic attributes to graph elements is a simple but important interaction. With aesthetics, such as colors and line styles, you often need to see them to know how well they work.
Column switching can be a good way to take a quick pass through a new data set.
Volume slicing is a way of exploring a multi-dimensional data space by graphing a set of two-dimensional slices. The Profiler in JMP uses slicing to explore multi-dimensional models. In this example, we’re modeling Rep Pct, on the Y, against three factors, Personal Income, Personal Transfer and Ballot Rate, which results in a four-dimensional surface. Each frame shows a slice across one X while the other Xs are held fixed.
Personal Transfer seems to have little effect (it’s mostly a flat line in this slice, at least), but as we interact with it and change its fixed value used by the other frames, we see an interaction with Personal Income. Personal Income has a loose positive slope for high Personal Transfer and a strong negative slope for low Personal Transfer values.
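Here is a minimal Python sketch of slicing, using a made-up response surface rather than the fitted model from the video. The coefficients, the 0-to-1 factor scaling and the function names are all assumptions chosen so that the slices show the same qualitative pattern: a flat Personal Transfer slice, and a Personal Income slope that flips sign as Personal Transfer changes.

```python
def model(income, transfer, ballot):
    """Made-up surface with an income-by-transfer interaction (factors in 0..1)."""
    income_slope = 60.0 * transfer - 40.0  # +20 at transfer=1, -40 at transfer=0
    return 50.0 + income_slope * (income - 0.5) + 5.0 * ballot


def slice_along(model, fixed, name, grid):
    """Evaluate the model along one factor, holding the others at `fixed`."""
    return [model(**{**fixed, name: value}) for value in grid]


grid = [0.0, 0.5, 1.0]
fixed = {"income": 0.5, "transfer": 0.5, "ballot": 0.5}

# At the current settings the transfer slice is flat...
transfer_slice = slice_along(model, fixed, "transfer", grid)

# ...but moving the fixed transfer value changes the income slice.
income_high = slice_along(model, {**fixed, "transfer": 1.0}, "income", grid)
income_low = slice_along(model, {**fixed, "transfer": 0.0}, "income", grid)
```

A flat slice says nothing about the other slices, which is why interacting with the fixed values is what reveals the interaction.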
A scale defines how data values are mapped to screen coordinates and is itself visualized as an axis. The axis scaling interaction is the ability to change that mapping by directly manipulating the axis. The interactions include panning, stretching and zooming.
Axis scaling is not always just a matter of redrawing elements at different locations. Notice that the dot plot on the right has to adjust its marker dodging layout for the new scales, and the axes themselves sometimes have to recompute things like tick mark intervals. In the case of geographic visualizations, it’s even useful to switch projections depending on the scale.
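A small Python sketch shows both halves of this: the scale as a data-to-screen mapping, and the tick interval that has to be recomputed after a zoom. The class and method names are hypothetical, the 1-2-5 tick rule is a common convention rather than JMP's documented behavior, and stretching is left out for brevity.

```python
import math


class LinearScale:
    """Maps data values in [lo, hi] to screen coordinates in [0, pixels]."""

    def __init__(self, lo, hi, pixels):
        self.lo, self.hi, self.pixels = lo, hi, pixels

    def to_screen(self, value):
        return (value - self.lo) / (self.hi - self.lo) * self.pixels

    def pan(self, delta):
        """Shift the visible range by `delta` data units."""
        return LinearScale(self.lo + delta, self.hi + delta, self.pixels)

    def zoom(self, factor, anchor):
        """Narrow the range by `factor`, keeping `anchor` fixed on screen."""
        return LinearScale(
            anchor - (anchor - self.lo) / factor,
            anchor + (self.hi - anchor) / factor,
            self.pixels,
        )

    def tick_step(self, target_ticks=5):
        """Pick a 1-2-5 style step that gives roughly `target_ticks` ticks."""
        raw = (self.hi - self.lo) / target_ticks
        magnitude = 10 ** math.floor(math.log10(raw))
        for nice in (1, 2, 5, 10):
            if raw <= nice * magnitude:
                return nice * magnitude
        return 10 * magnitude  # guard against floating-point edge cases


scale = LinearScale(0.0, 100.0, 500)
zoomed = scale.zoom(4, anchor=50.0)  # visible range becomes 37.5 to 62.5
panned = scale.pan(10.0)             # visible range becomes 10 to 110
```

After the zoom, the same data value sits at the same pixel, but the tick step drops from 20 to 5, so the axis has to be rebuilt rather than just redrawn.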
For JMP, the regional scale uses an Albers equal area projection, and the world scale uses a Kavrayskiy VII compromise projection.
Given these nine data visualization interaction methods, we can better consider further questions:
What other general interaction methods are useful in data visualization? I’m saying “general” methods since I imagine there is no limit to the number of specialty interactions geared toward a particular kind of graph. For instance, JMP has a way to interactively change histogram bin sizes that I didn't include here as a general method.
Should some of these methods be split into multiple methods? Maybe volume slicing is different for data exploration versus model exploration. I’m considering interactive graph resizing as a kind of axis scaling, but maybe it’s worth having its own category.
Should some of these methods be combined as varieties of the same method? Maybe row filtering and column switching are two varieties of a larger subsetting interaction.
Are there better names for these methods, either pre-existing or new? Some of these names come from JMP terminology, and some are my own invention for this discussion.
Should different levels of each method be recognized? For instance, should we distinguish between graph building with or without a live preview?
I hope to hear feedback on these questions and others, either here, at JSM or on Twitter.