Data table tools, part 2: Using a "Selected" column
Dec 15, 2016 6:30 PM | Last Modified: Mar 3, 2017 12:39 PM
Today I’d like to show one of my favorite tricks for interactive data visualization and analysis: the “Selected” column. This is a formula-based column that contains ones in all of the currently selected rows, and zeros in all of the unselected rows.
The formula couldn’t be simpler, but the problem is that I can never remember the syntax of JSL’s row state commands, since I don’t use them often enough. If you are in the same boat, don’t worry: the Table Tools Add-in lets you add this column to any table with a couple of clicks.
We’ll go through a few examples illustrating how useful this column can be.
Using the column like a “By” variable, you can see how the selected rows differ from the rest of the data. Below, a standard histogram report is used to select the cars made in the US; in the Graph Builder scatterplot, the Selected column is used in the Overlay role, so we can see how the US cars (red) differ from the rest (blue).
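To make the "By variable" idea concrete outside JMP, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rows are made up, and `selected_rows` and `mean_by_flag` are hypothetical names, not anything from JMP or the add-in. It builds the 0/1 flag from a set of selected row indices, then compares a measurement between the selected rows and the rest.

```python
from statistics import mean

# Made-up rows for illustration only; not the data behind the post's graphs.
cars = [
    {"make": "Ford", "country": "US", "weight": 3500},
    {"make": "Chevrolet", "country": "US", "weight": 3300},
    {"make": "Toyota", "country": "Japan", "weight": 2700},
    {"make": "Honda", "country": "Japan", "weight": 2500},
    {"make": "Volvo", "country": "Other", "weight": 3100},
]

# Stand-in for an interactive selection, e.g. clicking the "US" histogram bar.
selected_rows = {i for i, car in enumerate(cars) if car["country"] == "US"}

# The "Selected" column: 1 for each selected row, 0 for every other row.
selected = [1 if i in selected_rows else 0 for i in range(len(cars))]


def mean_by_flag(rows, flags, field):
    """Average `field` separately for flagged (1) and unflagged (0) rows."""
    groups = {0: [], 1: []}
    for row, flag in zip(rows, flags):
        groups[flag].append(row[field])
    return {flag: mean(values) for flag, values in groups.items() if values}


weight_by_selected = mean_by_flag(cars, selected, "weight")
print(weight_by_selected)
```

Because the flag is just another column, any grouping or overlay mechanism can use it without knowing how the rows came to be selected.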
While this example used the Selected column in a nominal role, it can also be used in a continuous role, as in the next two examples.
In Text Explorer, using the term list or word cloud, you can select a given term, then select all documents (rows) containing it. Using the red triangle menu, you can then color the word cloud by the Selected column’s values, to see which other terms tend to appear in documents containing the first term.
The following word cloud was obtained by using this idea, after selecting documents containing the word “tire”.
What’s nice is that by using the Selected column, you can use the word cloud this way no matter how you’ve selected the rows! Instead of selecting using a given term, you might select rows based on column values, user input, a graph, or some other mechanism; it doesn’t matter! The end result is that you can see which terms are most associated with the selected rows, without writing a complicated query, subsetting tables, or moving to another platform.
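The term-association idea can be sketched in a few lines of Python. This is an illustration, not a description of how Text Explorer works internally: the post does not say which statistic drives the coloring, so the sketch assumes each term is scored by the average of the Selected flag over the documents containing it. The documents are made up, and `flag_docs_containing` and `term_scores` are hypothetical helper names.

```python
from collections import defaultdict

# Made-up documents for illustration only.
docs = [
    "tire tread separated on highway",
    "tire blowout caused rollover",
    "brake pedal felt soft",
    "tread separated and vehicle rollover",
    "engine stalled at idle",
]


def flag_docs_containing(documents, term):
    """One way to select rows: flag documents that contain `term`."""
    return [1 if term in doc.split() else 0 for doc in documents]


def term_scores(documents, flags):
    """For each term, average the 0/1 flag over the documents containing it."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for doc, flag in zip(documents, flags):
        for term in set(doc.split()):  # count each term once per document
            totals[term] += flag
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in counts}


selected = flag_docs_containing(docs, "tire")
scores = term_scores(docs, selected)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.2f}")
```

Note that `term_scores` only sees the flags. Swapping `flag_docs_containing` for any other selection rule leaves the scoring step unchanged, which is the point of the paragraph above.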
It is often useful to graph the mean and/or sum of continuous flags, and this is especially true for the Selected column. The upper part of the graph below was made by averaging the Selected column, while the lower part was made by summing it.
From the Sum bars, we can see that about 725 of the Ford Explorers in the data set are selected, while the Proportion bars (the mean of the column) tell us that this represents about 20% of all Explorers in the data set.
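The reason the two summaries read this way is simple arithmetic on a 0/1 column: the sum counts the selected rows in a group, and the mean is that count divided by the group size, that is, the proportion selected. Taken together, the two figures quoted above imply roughly 725 / 0.20 ≈ 3,600 Explorers overall. A minimal Python sketch follows, with made-up rows and a hypothetical `summarize` helper:

```python
from collections import defaultdict

# Made-up (model, selected flag) rows for illustration only.
rows = [
    ("Explorer", 1), ("Explorer", 0), ("Explorer", 0),
    ("Explorer", 0), ("Explorer", 0),
    ("Ranger", 1), ("Ranger", 1), ("Ranger", 0), ("Ranger", 0),
]


def summarize(pairs):
    """Return {group: (sum of flags, mean of flags)} for a 0/1 flag column."""
    flags_by_group = defaultdict(list)
    for group, flag in pairs:
        flags_by_group[group].append(flag)
    return {
        group: (sum(flags), sum(flags) / len(flags))
        for group, flags in flags_by_group.items()
    }


summary = summarize(rows)
for model, (count, proportion) in summary.items():
    print(f"{model}: {count} selected, {proportion:.0%} of the group")
```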
I could go on and on; I really love using the Selected column. Try it out and see what it can do for you!