Jun 23, 2011

Report from UK: Building better statistical models, faster and easier

Almost 40 leading analysts from financial services, marketing, R&D and the public sector gathered in SAS' Manchester, UK, office to find out how to build better statistical models, faster and easier. Sam Gardner, the global technical lead for JMP Pro, kicked off the seminar on 25 Sept. with a quick survey of the audience and found that most rated their statistical knowledge as moderate, with a few describing it as high or low. A normal distribution then!

Following an introduction to modelling, a key message that Sam demonstrated in JMP Pro is that holding back part of the data as you build a model, and using it to validate the result, helps guard against over-fitting. A delegate asked how large the holdback set should be. As a true statistician, Sam answered "It depends." Over the course of the demonstrations, Sam and Robert Anderson, the UK’s technical lead, set out to answer this question.
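To make the holdback idea concrete, here is a minimal sketch in plain Python. It is not what JMP Pro does internally: the synthetic data, the 30% holdback fraction and the two toy models (a straight-line fit and a "memorizer" that simply looks up the nearest training point) are all assumptions chosen for illustration.

```python
import random

def split_holdback(rows, holdback_fraction, seed=0):
    """Shuffle the rows and return (training, holdback) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdback = int(len(shuffled) * holdback_fraction)
    return shuffled[n_holdback:], shuffled[:n_holdback]

def fit_line(rows):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(rows):
    """Deliberately over-fitted model: return the y of the nearest training x."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]

def mse(predict, rows):
    """Mean squared prediction error over the given rows."""
    return sum((predict(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (an assumption): a straight line plus random noise.
rng = random.Random(42)
data = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1)))

training, holdback = split_holdback(data, holdback_fraction=0.3)
models = {"line": fit_line(training), "memorizer": fit_memorizer(training)}

results = {}
for name, model in models.items():
    results[name] = (mse(model, training), mse(model, holdback))
    print(f"{name}: training MSE {results[name][0]:.3f}, "
          f"holdback MSE {results[name][1]:.3f}")
```

The memorizer reproduces the training rows perfectly, so its training error alone makes it look like the better model. Only the holdback rows, which neither model saw while it was being built, reveal how much of that fit was noise.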

Sam also introduced decision trees, also known as recursive partitioning (CHAID and CART are well-known variants). He described them as a series of nested "if statements", each forming a branch of the tree, and explained that random forests and boosting can help build models that pull out more subtle or hidden effects.
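The "nested if statements" description can be taken quite literally. The sketch below writes a small tree by hand in Python; the variable names, thresholds and outcomes are invented for illustration and do not come from the seminar. In practice the software chooses the splits from the data.

```python
def predict_churn(support_calls, months_as_customer, on_contract):
    """Hand-written decision tree; every threshold here is hypothetical."""
    if support_calls >= 4:            # first split at the root of the tree
        if on_contract:               # split nested inside the first branch
            return "stays"
        return "churns"
    if months_as_customer < 6:        # split nested inside the second branch
        return "churns"
    return "stays"

# A few made-up customers, one for each leaf of the tree.
customers = [
    {"support_calls": 5, "months_as_customer": 24, "on_contract": False},
    {"support_calls": 5, "months_as_customer": 24, "on_contract": True},
    {"support_calls": 1, "months_as_customer": 3, "on_contract": False},
    {"support_calls": 1, "months_as_customer": 36, "on_contract": False},
]

predictions = [predict_churn(**customer) for customer in customers]
print(predictions)
```

Each path from the first `if` down to a `return` is one branch of the tree, which is also why a single tree is easy to explain to others.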

In four case studies:

  • Robert showed how holdback validation could be used to efficiently help solve an engineering problem in the semiconductor industry. Without this technique, the engineers could end up going down blind alleys and checking good process steps for nonexistent problems, thereby wasting valuable resources.
  • Sam showed how K-fold cross validation can be used to build more robust stepwise regression models to describe the formulation of animal feed supplements in agricultural R&D.
  • Robert's next example looked at the propensity of telco customers to churn. He showed how more advanced modelling techniques such as random forests, combined with holding back one part of the data to tell the model when to stop building and another part to test the model, can help build better models.
  • Sam showed how neural networks can be used to build flexible models for a risk application. He used Receiver Operating Characteristic (ROC) curves and lift curves to assess the models, compared a range of different models using the Model Comparison platform, and showed how model averaging could be used to produce an excellent model.

A number of the delegates said they were amazed at the speed with which models can be built and compared in JMP Pro. Others said that, whilst they could build models with the tools that they currently have, it was very difficult to explain these to others. They appreciated how the Prediction Profiler would allow them to do this, and to do scenario or "what if" analysis so that execs could make more informed decisions.
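For readers who have not met K-fold cross validation, the technique from the second case study, here is a rough sketch in plain Python. The data are synthetic and the two candidate models (mean only versus a straight line) are stand-ins chosen for illustration; the stepwise regression Sam demonstrated in JMP Pro is not reproduced here.

```python
import random

def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each row is held out exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        yield train_idx, test_idx

def fit_mean(rows):
    """Candidate 1: ignore x and always predict the mean of y."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y

def fit_line(rows):
    """Candidate 2: ordinary least squares fit of y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cross_validated_mse(fit, rows, k=5):
    """Fit on k-1 folds, score on the remaining fold, and average over folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        errors = [(model(rows[i][0]) - rows[i][1]) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic data (an assumption): a straight line plus random noise.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = rng.uniform(0, 10)
    rows.append((x, 3.0 + 2.0 * x + rng.gauss(0, 1)))

cv_mean = cross_validated_mse(fit_mean, rows)
cv_line = cross_validated_mse(fit_line, rows)
print(f"mean-only model: cross-validated MSE {cv_mean:.3f}")
print(f"straight line:   cross-validated MSE {cv_line:.3f}")
```

Because every row takes a turn in the held-out fold, no data are permanently set aside, which is one reason the approach suits smaller data sets where a fixed holdback set would be costly.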

A few asked about deploying the model and were satisfied to hear that SAS scoring code could be created from it, and that the model equation could easily be coded in other languages such as SQL.
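As a rough illustration of coding a model equation in SQL, the Python sketch below turns the coefficients of a simple linear model into a SQL expression and runs it against an in-memory SQLite table. The coefficients, column names and table name are all hypothetical, and this is not the scoring code that JMP Pro generates.

```python
import sqlite3

def linear_model_to_sql(intercept, coefficients, table, score_name="score"):
    """Build a SELECT that scores each row with intercept + sum(coef * column).

    Column and table names are pasted into the SQL text as-is, so they
    should come from a trusted source such as the model definition.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(coef)!r} * {column})"
              for column, coef in coefficients.items()]
    equation = " + ".join(terms)
    return f"SELECT *, {equation} AS {score_name} FROM {table}"

# Hypothetical fitted model: these numbers are made up for illustration.
coefficients = {"monthly_spend": 0.02, "support_calls": -0.5}
query = linear_model_to_sql(1.5, coefficients, "customers")
print(query)

# Try the generated query on a tiny in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, monthly_spend REAL, support_calls REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 40.0, 2.0), (2, 10.0, 5.0)],
)
scores = {row[0]: row[-1] for row in conn.execute(query)}
print(scores)
conn.close()
```

Scoring inside the database this way means the data never have to leave it, which is often the point of deploying a model as SQL.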