As someone who studied economics, I came to appreciate the amount of information that can be nicely distilled into a well-chosen graph: relationships between price and quantity, changes in supply and demand over time, efficient frontiers, and more. With so much data and so much potential value to be gained from data, it is good to see that we are poised to take greater advantage of our amazing visual bandwidth through more graphic encoding of data. The trend of greater “visual literacy” or graphicacy is apparent.
Data visualization expert Stephen Few recently wrote an article, “Criteria for Evaluating Visual EDA Tools,” in his quarterly Visual Business Intelligence Newsletter. The first sentence in the article is important: “We visualize data for various purposes.” To expand on that, let’s consider when data are data and when data become information.
Many consider data visualization and information visualization to be synonymous. However, in the context of a data-information continuum, we begin to see that there are important differences. Data only become information when the data serve to inform. If you are the analyst exploring data, seeking to produce information, you are the primary audience while doing Exploratory Data Analysis — you and the data, learning, on your path to produce more information. You are simultaneously the information producer and consumer. The data are informing your next steps as you seek to learn more.
Once you have obtained results and produced information, you typically need to disseminate your findings to others (information consumers). Regardless of the audience looking at the data and/or resultant information, the most efficient way to process what you are seeing is through graphical representation of data. We just can’t process text and tabular output as fast as we can graphs and data movies. Visualization plays an important role in data discovery as well as information dissemination.
That said, what capabilities are needed to explore data and produce information? The economist’s favorite answer is (of course) it depends! If you are asking relatively simple questions of your data, simple capabilities suffice. Even spreadsheets can do the trick (though it pains me to write it since they are overused to do things for which they were never intended!). If, however, you are asking more complex questions of your data — if the data need to be more rigorously explored to reveal truths — you need a richer set of data discovery capabilities.
Data visualization implies a greater degree of data analysis than information visualization. An analyst may need to look at scatter plots, parallel plots, dendrograms and surface plots to find the relationships, patterns, trends and anomalies that inform her next step. When she is ready to share results, the analyst may, depending on the audience, show some of these same graphs to convey interesting findings in the data. She may also show completely different graphs, including some that weren’t part of her discovery process, selecting those that best make the points and convey the stories in a way her audience will readily comprehend. Such presentation graphs may take the form of bar charts, bubble plots or specific spatial graphs. I find it awkward to just talk or write about data and information visualization without showing some visualization, so a few examples are included.
Using some publicly available breast cancer data, you can explore the data with various plots like the first two below. If you click on the malignant or benign bars in the diagnosis histogram, you can see the distributions of each of those diagnoses highlighted in other attributes. Even before you’ve begun to build a predictive model, you can see some nice separation in both of the images below — you can see how the attributes relate to the outcome you are trying to predict.
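The linked highlighting described above can be approximated in a few lines of code. The following is only a minimal sketch of the idea, not how any particular tool implements it: it counts how many rows of each diagnosis fall into each bin of one attribute, which is the information a highlighted histogram displays. The function name `binned_counts_by_group`, the attribute name `mean_radius`, and the six toy rows are all assumptions made up for illustration; they are not values from the actual breast cancer data.

```python
from collections import Counter


def binned_counts_by_group(rows, attribute, group_key, bin_width):
    """Count rows per (group, bin) for one numeric attribute.

    Each group's counts are the slice of the overall histogram that
    would be highlighted when that group is selected.
    """
    counts = {}
    for row in rows:
        # Left edge of the bin this value falls into.
        bin_start = (row[attribute] // bin_width) * bin_width
        counts.setdefault(row[group_key], Counter())[bin_start] += 1
    return counts


# Toy rows with made-up values, for illustration only (NOT the real data set).
rows = [
    {"diagnosis": "benign", "mean_radius": 11.2},
    {"diagnosis": "benign", "mean_radius": 12.9},
    {"diagnosis": "benign", "mean_radius": 13.4},
    {"diagnosis": "malignant", "mean_radius": 17.8},
    {"diagnosis": "malignant", "mean_radius": 20.1},
    {"diagnosis": "malignant", "mean_radius": 13.9},
]

counts = binned_counts_by_group(rows, "mean_radius", "diagnosis", bin_width=5)

# Print a small text histogram per diagnosis.
for diagnosis in sorted(counts):
    for bin_start in sorted(counts[diagnosis]):
        bar = "#" * counts[diagnosis][bin_start]
        print(f"{diagnosis:<10} {bin_start:>5.1f}+  {bar}")
```

When the two diagnoses pile up in different bins of an attribute, that attribute shows the kind of separation you would hope to exploit later in a predictive model.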
Say you’ve built a model that captures some of the complexity in how the variables interact and how they relate to the target variable, diagnosis. Using visualization methods like the Profiler, you can see how changing one input variable (by clicking and dragging the red vertical line) affects the outcome as well as other inputs.
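The core idea behind that kind of profiling, varying one input while holding the others at fixed values and watching the predicted outcome, can be sketched in code. This is a rough illustration under stated assumptions, not the Profiler’s actual implementation: the logistic model, its coefficients, the intercept, and the input names `mean_radius` and `mean_texture` are all hypothetical, and the sketch traces only the outcome, not the effect on the other inputs’ profiles.

```python
import math


def predict_probability(inputs, coefficients, intercept):
    """Predicted probability from a simple logistic model (hypothetical)."""
    z = intercept + sum(coefficients[name] * value for name, value in inputs.items())
    return 1.0 / (1.0 + math.exp(-z))


def profile_trace(baseline, feature, values, coefficients, intercept):
    """Vary one input over `values`, holding all other inputs at baseline."""
    trace = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline is never modified
        scenario[feature] = value
        trace.append((value, predict_probability(scenario, coefficients, intercept)))
    return trace


# Made-up coefficients and baseline, for illustration only (not a fitted model).
coefficients = {"mean_radius": 0.9, "mean_texture": 0.2}
intercept = -17.0
baseline = {"mean_radius": 14.0, "mean_texture": 19.0}

trace = profile_trace(
    baseline, "mean_radius", [10.0, 14.0, 18.0, 22.0], coefficients, intercept
)

for value, probability in trace:
    print(f"mean_radius={value:>5.1f}  predicted probability={probability:.3f}")
```

Plotting each trace as a line, one panel per input, gives a static version of what dragging the vertical line shows interactively.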
Another important difference between visual data exploration and presentation graphics is speed of creation. Exploratory data visualizations are usually created very quickly, often in under a second, both to inform the analyst’s next step and to let the analyst stay “in flow.” Depending on the audience and the degree of customization required, presentation graphics can take a great deal longer to create, often hours.
So, digging a little deeper into what’s behind the rising levels of graphicacy, it’s important to note the different skills that contribute to making sense of all this big data. There is other evidence that graphicacy is on the rise. Numerous LinkedIn groups on visualization have formed. A popular two-day Data Visualization course at Harvard, which I had the pleasure of attending last year, apparently had a lengthy wait list when it was offered this past June. Visualization is increasingly on the radar of IT industry analysts, as evidenced by the recent report from Forrester on advanced data visualization platforms. And this past April, an interesting panel at the New York Public Library, “What Makes a Good Data Visualization?”, brought together a number of noteworthy thought leaders, reflecting the multi-disciplinary interest in data and information visualization.
Whether you need to explore data, present and/or consume information, whether the questions you ask of your data are simple or complex, you can more efficiently and effectively produce and consume information by taking advantage of the growing number of visual paradigms and capabilities. Most of us are visual learners, and more than half of our brains are dedicated to supporting seeing. With growing varieties and volumes of data — and growing varieties and volumes of problems and opportunities — it is timely to see signs that we are taking greater advantage of our visual bandwidth to make faster sense of data.