Michael Berry and Gordon Linoff speak analytically
"Curiosity and creativity are arguably inborn, but intuition for data comes from time spent exploring it," says Michael Berry, Business Intelligence Director at TripAdvisor.
Berry and Gordon Linoff, who together co-founded the consultancy Data Miners Inc. and co-wrote the book Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, were our guests for this month’s webcast installment of Analytically Speaking. If you missed the webcast, it’s now available on demand.
Our viewing audience submitted more questions during the live webcast than we could address on air, so Berry and Linoff have followed up with answers to some of these questions about analytics.
Question: Is there a specific process you follow for exploratory analysis when you get new data? For example, what graphs do you start with?
Linoff: This is a very general question, and it depends very much on what the data looks like. Sometimes data is already processed into customer signatures, which are suitable for modeling. In this case, univariate statistics are the place to start. If there is a time component to the data, such as when a customer starts, then I would look at the distribution of values by time, typically using scatter charts or bar charts. Why do so many customers start in January? Or, why are non-payment stops higher in May? Raw data can be trickier. To have full confidence in the data, you need to understand the processing to be sure the data is consistent. Scatter plots of values and bar charts against time are key here.
Berry: The first thing I look at is the distribution of each individual variable. I use histograms for that. Next, I am likely to use scatter plots to visualize relationships between variables I expect to be related.
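A minimal sketch of that first pass, in Python with only the standard library. It summarizes one variable, bins it into a text histogram, and counts customer starts by month. The records, field layout, and function names are invented for illustration and do not come from Berry or Linoff.

```python
from collections import Counter
from datetime import date
from statistics import mean, median

# Hypothetical customer records: (start_date, monthly_spend).
# These values are made up purely to illustrate the technique.
customers = [
    (date(2012, 1, 5), 20.0),
    (date(2012, 1, 19), 35.0),
    (date(2012, 1, 28), 22.5),
    (date(2012, 3, 2), 410.0),
    (date(2012, 5, 14), 27.0),
    (date(2012, 5, 30), 31.0),
]


def summarize(values):
    """Univariate summary statistics for one numeric variable."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": median(ordered),
        "mean": mean(ordered),
        "max": ordered[-1],
    }


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def starts_by_month(records):
    """Count customer starts per (year, month)."""
    return Counter((d.year, d.month) for d, _ in records)


spend = [s for _, s in customers]
print(summarize(spend))

# Text histogram of spend: one '#' per customer in each bin.
for edge, count in sorted(histogram(spend, 50).items()):
    print(f"{edge:>4}-{edge + 50:<4} {'#' * count}")

# Bar chart of starts against time, the kind of view that prompts
# questions like "why do so many customers start in January?"
for (year, month), count in sorted(starts_by_month(customers).items()):
    print(f"{year}-{month:02d} {'#' * count}")
```

In this toy data the mean sits far above the median because of one large value, which is the kind of thing a histogram makes visible before any modeling starts.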
Question: How long would you recommend storing historical data?
Linoff: Storage is cheap, so this becomes less of an issue over time. In some businesses, there is a clear need to store many years of data. In others, such as e-tailing, the need may be less obvious. However, you generally want at least 13 months of data to do year-over-year comparisons. You often want more. If, for instance, you are a top news site on the Web and you want to understand how election news drives reader behavior, you want to know what happened in 2008 and perhaps even 2004.
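The 13-month figure follows from the arithmetic of a year-over-year comparison: a month can only be compared once the same month a year earlier is also on file. A rough sketch in Python, where the order counts and the function name are hypothetical:

```python
def year_over_year(monthly):
    """Fractional change versus the same month one year earlier.

    monthly maps (year, month) -> value. Months with no prior-year
    value (or a prior-year value of zero) are skipped.
    """
    changes = {}
    for (year, month), value in monthly.items():
        prior = monthly.get((year - 1, month))
        if prior:
            changes[(year, month)] = (value - prior) / prior
    return changes


# Hypothetical monthly order counts: January 2012 through January 2013,
# which is 13 months of history.
orders = {(2012, m): 800 for m in range(1, 13)}
orders[(2012, 1)] = 1000
orders[(2013, 1)] = 1100

changes = year_over_year(orders)
print(changes)

# With only the most recent 12 months, no comparison is possible.
last_12 = {k: v for k, v in orders.items() if k != (2012, 1)}
print(year_over_year(last_12))
```

Thirteen months yields exactly one comparable month. Each additional month of history adds one more, which is one reason you often want more.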
Berry: There is no set answer, but I'd say "as long as it might be useful." Some kinds of data should be saved "forever." This includes customer relationship details such as the acquisition channel and original product purchased.
Question: What motivates you to write and teach about data mining?
Linoff: LOL. That's what I do. I'm fundamentally a nerd at heart, and data mining is a great intersection of data, statistics and common sense.
Berry: I truly find the subject fascinating and I like to share my enthusiasm with the world.
Question: What can I do to help shift my corporate culture from “gut” decision making to data-driven decision making?
Linoff: You need buy-in at a high level. A Harvard Business Review issue dedicated to big data is a sign that executives care, or should care, about such issues. However, data-driven decision making can go against the current corporate culture. Sometimes you have to wait for obvious failures, and be ready to take advantage of the opportunities they present. For instance, in one case, we were able to move ahead with a survival forecasting system only when the previous forecasting system was so off that high-level executives noticed the discrepancy.
Berry: Cultural shifts are very difficult, but some successes will get noticed. Start with something where the benefits are easy to measure and easy to understand, such as increased conversion rate or increased retention.