bernard_mckeown


Webcasts show how to build better statistical models

We have two upcoming webcasts on Building Better Models presented at times convenient for a UK audience:

  • If you are new to JMP Pro, you will want to view the webcast on 21 October 2014.
  • If you are already using JMP Pro, the webcast on 31 October 2014 will suit you best.
These webcasts will help you understand techniques for predictive modelling. Today's data-driven organisations find that they need a range of modelling techniques, such as bootstrap forest (a random-forest technique) and partial least squares (PLS), both of which are particularly suitable for variable reduction with numeric data containing many correlated variables. For example, some organisations deal with a multitude of potential predictors of a response, sometimes numbering into the thousands. Bootstrap forest and PLS can help analysts separate the signal from the noise and find the handful of important variables.

Other organisations deal with the problem of customer segmentation. They may need to employ techniques including cluster analysis, decision trees and principal component analysis (PCA). Decision trees are particularly good for variable selection. Using a variety of modelling techniques can result in a different selection of variables, which can provide useful insight into the hidden drivers of behaviour.
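A minimal sketch of that segmentation workflow (scikit-learn on hypothetical customer data, standing in for the JMP equivalents): PCA to check the underlying structure, k-means to form segments, then a decision tree to reveal which variables drive the split:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 300 customers, 8 behavioural variables,
# generated around 3 latent segments.
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=1)

# PCA: how much of the structure do a couple of components capture?
pca = PCA(n_components=2).fit(X)
print("Variance explained by 2 components:",
      round(pca.explained_variance_ratio_.sum(), 3))

# Cluster into 3 segments, then ask a shallow tree which variables
# separate them -- the tree doubles as a variable-selection tool.
segments = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, segments)
print("Variables driving the segmentation:",
      np.argsort(tree.feature_importances_)[::-1][:3].tolist())
```

Running several of these techniques side by side, as the post suggests, often surfaces different candidate variables, and the disagreements themselves can be informative.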

Consumer data is notoriously messy, with missing values, outliers and, in some cases, correlated variables. Missing values can be a real problem because common regression techniques exclude incomplete rows when building models. This "missingness" itself can be meaningful, so using informative missing techniques to understand its importance can help you create better models. Some techniques, such as bootstrap forest and generalised regression, handle messy data seamlessly.
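One scikit-learn analogue of the "informative missing" idea (a sketch, not JMP's implementation) is to impute missing values while appending indicator columns, so a downstream model can learn from the pattern of missingness rather than silently dropping incomplete rows:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# A tiny table with missing values in both columns.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])

# add_indicator=True appends one 0/1 column per feature that had
# missing values, preserving the missingness signal after imputation.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imputer.fit_transform(X)
print(X_out)  # imputed values followed by the missingness indicators
```

Tree ensembles in scikit-learn such as `HistGradientBoostingRegressor` can also accept NaN values directly, much as the post says bootstrap forest and generalised regression handle messy data without row deletion.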

A critical step in building better models is to use holdback techniques, so that the models not only describe the data used to build them but also give good predictions for new data. Holding back data to validate a model keeps it honest: it exposes overfitting and leads to a more accurate model.
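The effect of a holdback split is easy to see in a small sketch (scikit-learn here, though the idea is the same in JMP Pro's validation columns): an unpruned tree fits its training data perfectly while scoring noticeably worse on the held-back rows, which is exactly the overfitting the holdback is there to catch:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=0)

# Hold back 25% of the rows for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# An unpruned tree memorises the training data, noise and all.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("Training R^2:", round(tree.score(X_train, y_train), 3))
print("Holdback R^2:", round(tree.score(X_test, y_test), 3))
```

The gap between the two scores is the signal to prune, regularise or try a different model class before trusting the predictions.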

Analysts face a major hurdle in explaining their models to executives in a way that enables them to do "what if" or scenario analysis, exploring decisions before committing to them. A powerful way to do this is by dynamically profiling the models. Once a company has selected the best model, it often wants to deploy that model to score existing and new data so that different departments can take appropriate actions.
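A bare-bones version of that "what if" profiling (a scikit-learn sketch, not JMP's interactive Profiler) sweeps one input across its range while holding the others fixed, then reuses the same fitted model to score new rows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Profile the first input: sweep it over its observed range while
# holding the other inputs at their means -- a static "what if" trace.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
scenarios = np.tile(X.mean(axis=0), (5, 1))
scenarios[:, 0] = grid
profile = model.predict(scenarios)
print("Predicted response across scenarios:", profile.round(1))

# Deployment: score new rows with the same fitted model.
new_data = np.random.default_rng(0).normal(size=(3, 4))
print("Scores for new data:", model.predict(new_data).round(1))
```

An interactive profiler does the same thing continuously across all inputs at once, which is what makes it effective for exploring decisions with executives.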

I hope you can join us for one of these live presentations, in which we will use case studies to demonstrate these predictive modelling techniques.