Kaiser Fung on data analysis & information visualization
Oct 9, 2012 10:17 AM
“Once you can define a problem, it’s a matter of looking for the right techniques and right methodologies to apply to it. And, there are many, many, many methodologies out there. The question is really: Can you pick the right one? And oftentimes that depends on: Did you define the problem in the right way?” -- Kaiser Fung, Analytically Speaking webcast, Sept. 10, 2012
How is data represented graphically in the news media, and what story does it really tell? Is data science different from statistics? Can the communication barriers that exist between statisticians and business analysts really be broken? What role does discovery play in predictive analytics?
Recently, we had the privilege of hearing Kaiser Fung address these questions during the September installment of Analytically Speaking (now available for on-demand viewing in case you missed it). Fung, the Vice President of Business Intelligence and Analytics at Vimeo and author of Numbers Rule Your World (book and blog), also discussed five statistical principles they don’t teach you in statistics class and why his preference for line charts generates so many controversial posts on his pioneering blog, Junk Charts.
Because Kaiser was such a popular speaker, we were unable to respond live to all of the questions submitted by our webcast audience, so he kindly agreed to answer a few more in this post-event interview.
Question: What are some of the most common worst practices you see in information visualization?
Kaiser: The absolute worst practice is plotting for plotting’s sake. There are many charts out there where I can’t figure out what question the designer wanted to address. The next problem is using bad data. Another issue is favoring pretty things that distort our perception; the bubble chart is an example. On Junk Charts, I use a trifecta checkup that tests each chart on those three criteria.
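To illustrate the bubble chart remark, here is a minimal Python sketch of one common source of distortion. It is an assumption on our part, not necessarily the case Kaiser has in mind: when a bubble's radius, rather than its area, is scaled to the data value, the visible area grows with the square of the value. The function name and the sample values are hypothetical.

```python
import math

def bubble_area_ratio(small_value, large_value, scale_by="radius"):
    """Return how many times larger the big bubble's area is than the small one's.

    scale_by="radius": radius is proportional to the data value.
    scale_by="area":   area is proportional to the data value.
    """
    if scale_by == "radius":
        r_small, r_large = small_value, large_value
    elif scale_by == "area":
        # Area = pi * r^2, so the radius must follow the square root of the value.
        r_small, r_large = math.sqrt(small_value), math.sqrt(large_value)
    else:
        raise ValueError("scale_by must be 'radius' or 'area'")
    # The factor pi cancels, so the area ratio is the squared radius ratio.
    return (r_large / r_small) ** 2

# Hypothetical data: one value is 4 times the other.
radius_scaled = bubble_area_ratio(10, 40, scale_by="radius")
area_scaled = bubble_area_ratio(10, 40, scale_by="area")
print(f"radius-scaled bubbles: area ratio {radius_scaled:.1f}")
print(f"area-scaled bubbles:   area ratio {area_scaled:.1f}")
```

With radius scaling, a fourfold difference in the data is drawn as a sixteenfold difference in area; with area scaling, the drawn ratio matches the data.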
Question: An analyst’s ultimate goal is to provide data support for strategic decision making. What are the key qualifications the analyst should have in order to contribute to business decision making?
Kaiser: At a minimum, the analyst needs to be able to mentally process a lot of numbers and extract the information. It’s very easy to get bogged down by details. Big data magnifies this problem. I’d much rather see a high-quality analysis on a subset of data than a poor analysis of a mountain of data.
Next, the analyst must keep the business goal in mind at all times. A simple analysis using boring techniques frequently produces a much larger impact than complicated methodologies that improve the status quo by some immaterial amount, even if it’s statistically significant.
Lastly, there are many intangible qualities that one finds in successful analysts. Curiosity, perseverance, humility, knowing how and when to cut corners, time management, etc., are all useful.
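The point about improvements that are statistically significant yet immaterial can be made concrete with a small Python sketch. The numbers are invented for illustration, and the two-proportion z-test is our choice of method, not one named in the interview. With ten million observations per group, a change in conversion rate from 10.0% to 10.1% is highly significant, but whether a 0.1 percentage point gain matters is a business question.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (difference in proportions, z statistic, p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical experiment: status quo (A) versus a complicated new method (B).
lift, z, p_value = two_proportion_z_test(
    successes_a=1_000_000, n_a=10_000_000,
    successes_b=1_010_000, n_b=10_000_000,
)
print(f"absolute lift: {lift:.4%}")
print(f"z statistic:   {z:.2f}")
print(f"p-value:       {p_value:.2e}")
```

The test only says the difference is unlikely to be chance; it says nothing about whether the difference is large enough to justify the added complexity.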
Question: In the Analytically Speaking webcast, you mentioned that you only show the final story to your audiences and that a lot of stuff is “dumped.” What’s the typical amount of time spent on data quality checks, the actual analysis, and writing up the final story documentation? I believe I am spending too much time on the first two steps. Do you have any tips for improving storytelling techniques so that all analyses are worth it?
Kaiser: Here’s my ideal scenario: You spend a month on the project, you go through hypothesis after hypothesis, you hit many dead ends, but finally you find an answer you can live with. You love this answer because it addresses the question simply, and you are able to throw out most of the other work because it turned out to be irrelevant or to have much weaker effects than your final answer. So now you have a simple theory with solid data to back it up. It all fits onto one slide, and you found the one chart to bring it all together. I’d walk into that meeting with the one slide. The manager gets the story, and it solves his/her problem.
Question: How would you define a story?
Kaiser: By "story" I mean something written in English with only a few key numbers. Imagine you have 10 slides. Pull out the 10 headers and put them on a piece of paper. Do you have a story? Typically, it has a conclusion, then it has points that support the conclusion. It’s like writing an argument for a history class; the only difference is you have data as some of your raw materials.
Question: Can you please share your views on the analyst and analytical tools? How can they best complement each other?
Kaiser: I always think the analyst comes before the tools. If the analyst is clear about how to look at the data, he/she can find tools that would facilitate that analysis. That’s why on Junk Charts, I rarely talk about tools. If you really want me to talk about tools, then I’d say tools have to simplify the analyst’s life. Half-developed tools inevitably create extra work for the analyst.
Question: Can you address how you handle messy data with lots of noise? And do you typically start your process by assessing the clean data or raw data, or something outside of the data collection?
Kaiser: This question can be interpreted in two ways. The noise may be a feature of your data, or it may have been erroneously added to your data. I always spend time figuring out how the data was collected, ideally by talking to the person who collected it. Always ask yourself how big a deal the errors are before spending time correcting them. If your data has high variance, that’s when you need statistical methods.
Question: What recommendations do you have for breaking down perceived barriers between statisticians and executives?
Kaiser: My favorite tip is to learn to speak their language. Don’t try to teach [executives] math.