We have had such a favourable response to our seminar on Building Better Models that we held our third one on 10 April, with nearly 100 people attending. It has become a global success, having been delivered 20 times throughout the world. The seminar is based on George Box's concept that "all models are wrong, but some are useful," and it investigates how different predictive modelling methods can be used to make the models built by JMP Pro as useful as possible. George Box was one of the great statisticians of the last century and, sadly, passed away earlier this month, which added poignancy to the event.
The seminar was kicked off by Sam Gardner, who gave the audience an introduction to statistical modelling and explained why it is important to avoid overfitting: an overfitted model may end up "predicting" the noise rather than the underlying signal. You can hold back data to help you select the best model using the "Validation" sample in JMP Pro; the best model is the one for which the RMSE on this held-back data is at a minimum. JMP Pro also allows you to evaluate how good the chosen model is by holding back a third "Test" sample from the data. It has a range of measures to assess this, including R² (the proportion of variability in the response that is explained by the model), AIC, BIC, confusion matrices, and ROC and lift curves.
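To make the holdback idea concrete outside JMP Pro, here is a minimal Python sketch. It is not how JMP Pro implements validation; the simulated data, the 60/20/20 split, and the choice of a nearest-neighbour smoother (with hypothetical helper names `rmse`, `knn_predict` and `score`) are all assumptions made purely for illustration. The model is fitted on the training rows, its complexity is chosen where the validation RMSE is lowest, and the test rows are used only once at the end.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def knn_predict(train_rows, x, k):
    """Predict y at x as the mean y of the k training rows nearest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Simulated data (an assumption): a smooth signal plus random noise.
random.seed(1)
xs = [random.uniform(0, 6) for _ in range(300)]
data = [(x, math.sin(x) + random.gauss(0, 0.3)) for x in xs]

# Hold back data: 60% training, 20% validation, 20% test.
random.shuffle(data)
train, validation, test = data[:180], data[180:240], data[240:]

def score(rows, k):
    """RMSE on `rows` of the k-nearest-neighbour model fitted to the training set."""
    return rmse([y for _, y in rows], [knn_predict(train, x, k) for x, _ in rows])

# Small k follows the noise closely (overfitting); large k smooths the signal away.
candidates = [1, 3, 5, 10, 20, 40, 80]
validation_rmse = {k: score(validation, k) for k in candidates}
best_k = min(validation_rmse, key=validation_rmse.get)

# With k = 1 each training point predicts itself, so the training error is zero.
training_rmse_k1 = score(train, 1)

# The test sample is touched only once, to assess the chosen model.
test_rmse = score(test, best_k)

print("validation RMSE by k:", validation_rmse)
print("chosen k:", best_k, "test RMSE:", test_rmse)
print("training RMSE with k = 1:", training_rmse_k1)
```

The training RMSE for k = 1 is zero because that model reproduces the noise exactly, which is why the training error alone cannot be used to pick a model.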
Sam gave an introduction to decision trees, and Robert Anderson of the JMP UK team then showed how these, coupled with holding back data to validate the model, can help allocate engineering resources to the right tasks, using a semiconductor process engineering case study. The upshot was a more productive use of engineering effort, saving time and cost, compared with traditional statistical modelling methods.
Sam followed this with a research and development example, in which he built a linear regression model to predict the melting point of pharmaceutical compounds from their molecular structure. He combined this with other techniques, such as exploratory data analysis to identify important compounds, and variable reduction using principal components analysis and clustering.
JMP Pro allows you to apply boosting and bagging techniques to build better decision trees. Robert showed how these could be used to build a more accurate model predicting customer churn in a telecommunications company, so that customers could be marketed to appropriately. He also showed how you can create an interactive visualisation of the model, the Prediction Profiler in JMP (see image below), so that you can use scenario analysis to communicate with your executives and reach better business decisions.
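For readers who have not met bagging before, the idea can be sketched in a few lines of Python. This is an illustration only, not JMP Pro's implementation: the simulated data, the one-split "stump" trees, and the hypothetical helper names `fit_stump` and `bagged_model` are all assumptions. Each small tree is fitted to a bootstrap resample of the training rows, and the bagged prediction is the average over all the trees.

```python
import random
import statistics

def fit_stump(rows):
    """Fit a one-split regression tree: choose the split on x with the lowest squared error."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        left = [y for _, y in rows[:i]]
        right = [y for _, y in rows[i:]]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (rows[i - 1][0] + rows[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def bagged_model(rows, n_trees, rng):
    """Bagging: fit each stump to a bootstrap resample, then average the predictions."""
    stumps = [fit_stump(rng.choices(rows, k=len(rows))) for _ in range(n_trees)]
    return lambda x: statistics.fmean([stump(x) for stump in stumps])

# Simulated data (an assumption): a linear trend plus noise.
rng = random.Random(7)
rows = []
for _ in range(100):
    x = rng.uniform(0, 1)
    rows.append((x, x + rng.gauss(0, 0.1)))

single = fit_stump(rows)
bagged = bagged_model(rows, n_trees=25, rng=rng)

grid = [i / 20 for i in range(21)]
single_values = {single(x) for x in grid}
bagged_values = {bagged(x) for x in grid}

print("distinct predictions, single stump:", len(single_values))
print("distinct predictions, bagged stumps:", len(bagged_values))
```

A single stump can only ever return two values, one on each side of its split. Averaging stumps fitted to different resamples gives a smoother response, which is the mechanism by which bagging reduces the variance of an individual tree. Boosting differs in that each new tree is fitted to the errors left by the trees before it, rather than to an independent resample.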
Neural networks are very flexible models and so are prone to overfitting. Sam showed how to build better neural net models and compare them, using a financial risk case study. He also talked about how neural nets are starting to be used to build "models of models" using legacy data, so that scientists can conduct experiments on those models, saving time and resources in the lab. You can find an example of how Goodyear uses this to design better tyres on our website.
Following on from the popularity of this seminar, we will be offering two webcasts on building better models on 1 and 7 May. You and your colleagues can register via our webcasts page. We will also be holding a one-day, hands-on workshop on 25 June for new users of JMP Pro. Places on this are strictly limited, so if you would like further details, you can contact me via email.