Gordon Linoff has been a self-starter in many ways. To name a few: he founded Data Miners Inc. with Michael Berry in 1998, and he was one of the first experts SAS turned to in the 1990s to start the ever-popular SAS Business Knowledge Series. He has consulted for a wide range of companies, including Bank of America, BT, The Limited, The New York Times, T-Mobile, The Teaching Company and Pfizer, and he has authored some of the most widely respected data mining books in the world.
In a couple of weeks, Gordon will be hosting a special breakfast during our annual users conference, Discovery Summit 2011 in Denver. I caught up with Gordon recently to discuss what cool new projects he's been working on.
How would you describe survival mining? Tell me about some of the interesting work that you’ve been doing in this area.
Data mining is the process of analyzing large amounts of data to solve real-world problems. Survival data mining is specifically applied to time-to-event problems. The classic example is trying to understand when customers will leave. The question is not whether a customer will eventually stop. The question is how long the customer will remain a customer.
This simple example of using survival analysis for understanding customer retention has several direct uses. Customer profitability is the product of two factors: how profitable a customer is per unit of time, and how long the customer is expected to stay. The second factor comes from survival analysis. Another use is forecasting customer numbers in the future. Forecasts built from the bottom up – using survival analysis – are more powerful than forecasts built from the top down because you can more easily drill down to find the factors that influence the forecast.
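As a rough illustration of the retention example, the sketch below estimates a survival curve with the Kaplan-Meier method (one standard estimator, not necessarily the one Gordon uses), takes the area under it as the expected tenure within a fixed horizon, and multiplies by profit per month. The function names, the eight-customer data set and the profit figure are all made up for illustration.

```python
from collections import Counter

def kaplan_meier(tenures, stopped):
    """Return [(t, S(t))] for each time t at which at least one stop was observed.

    tenures: months each customer has been observed.
    stopped: True if the customer left at that tenure, False if still active
             (censored) when the data was pulled.
    """
    stops = Counter(t for t, s in zip(tenures, stopped) if s)
    exits = Counter(tenures)      # everyone leaving the risk set at t
    at_risk = len(tenures)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = stops.get(t, 0)
        if d:
            survival *= 1 - d / at_risk   # conditional survival at t
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

def expected_tenure(curve, horizon):
    """Area under the step-shaped survival curve from 0 to horizon."""
    total, prev_t, prev_s = 0.0, 0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        total += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return total + prev_s * (horizon - prev_t)

# Hypothetical data: eight customers, tenures in months.
tenures = [2, 3, 3, 5, 8, 8, 12, 12]
stopped = [True, True, False, True, True, False, False, False]

curve = kaplan_meier(tenures, stopped)
months = expected_tenure(curve, horizon=12)

monthly_profit = 10.0             # assumed profit per customer per month
value_over_horizon = monthly_profit * months

print(curve)
print(months, value_over_horizon)
```

Customers who are still active count toward the population at risk up to their current tenure, which is what lets the estimate answer "how long" rather than only "whether".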
And, customer retention is only one application. Since I started using survival data mining about 10 years ago, I see time-to-event problems everywhere. The question is not whether a customer will reactivate, but how long until the reactivation. The question is not whether or not a customer will make a purchase, but how long until the next purchase. What retailers call purchase velocity can be measured and visualized using survival data mining.
You’ve now authored many books, one of which is now in its third edition. Explain some of the new material that you cover in this edition that hasn’t been included in past editions. What are some of the key insights you want people to take away from this book?
Although inspired by the earlier two editions, Michael Berry and I rewrote all the chapters, added several new ones and combined some of the earlier ones. The book is still divided into three parts. The first few chapters are an overview of data mining and the data mining process, illustrated with stories based on what we've learned over the years. The bulk of the book is the techniques chapters, with a separate chapter for each major technique. The rest of the book puts data mining into the context of the business, with discussions on data, data warehousing and more case studies.
Perhaps the most interesting new chapter is the chapter on text mining, which includes a wonderful case study about successes that DirecTV has had using text mining. Another important new chapter covers principal components and singular value decomposition. This chapter was quite challenging to write, since our book emphasizes understanding the techniques over matrix algebra. Clustering is such an important family of techniques that we split the original chapter in two, adding much more discussion of techniques, such as expectation maximization clustering, hierarchical clustering and other clustering methods.
Longtime SAS users will recognize you as a crucial instructor for many years with the SAS Business Knowledge Series. What has been your favorite class to teach and why?
All my courses are my favorite. Michael and I actually helped found the Business Knowledge Series more than 10 years ago. SAS has had a long history of providing excellent training for its software. In the late 1990s, they decided to leverage the investment in training by bringing in outside experts to talk about different subjects, and we were the first trainers to offer such a course.
Your consulting work spans many industries (and many countries). What are you currently working on, and what has been most interesting to you about this project?
I work on a wide variety of projects. One of my projects is working with a large, content-providing website to encourage customers to become paying subscribers. This is challenging because the volume of data being gathered is so large.
I recently finished a small project outside the business area. FORGE is a program in New Jersey designed to help women parolees adjust to life outside prison – and to prevent recidivism. This allowed me to apply survival analysis to an interesting domain, where the purpose is to help people and prevent crime.
Some of my work is more on the "data" side than on the "mining" side. For the past year, I have been working with the Lehman Brothers estate, helping to make sense of the data that describes the derivative transactions involved in the bankruptcy proceedings.