Through a variety of case studies, you'll learn to build better and more useful models with advanced predictive modeling techniques such as regression, neural networks, and decision trees. You'll learn to partition your data into training, validation (tuning), and test sets to prevent overfitting. And you'll see how to use comparison techniques to find the best predictive model. This tutorial is for analysts, scientists, engineers, and researchers interested in learning how predictive modeling can help them use the data they have today to better predict tomorrow. Although summarized in the Building Better Models slide deck, the two case studies below are available as separate stand-alone presentations.
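The train/validation/test partition mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the tutorial; the 60/20/20 proportions and the `partition` helper are illustrative choices, not prescribed splits.

```python
import random

def partition(rows, seed=42, train=0.6, valid=0.2):
    # Shuffle once with a fixed seed so the split is reproducible,
    # then carve off training, validation (tuning), and test sets.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (rows[:n_train],                      # fit model parameters here
            rows[n_train:n_train + n_valid],     # tune/compare models here
            rows[n_train + n_valid:])            # touch only once, at the end

train_set, valid_set, test_set = partition(range(100))
```

The key discipline is that the test set is held out until a final model has been chosen on the validation set; evaluating repeatedly against the test set reintroduces the overfitting the partition is meant to prevent.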
Data for identifying insurgents from a stochastic computer simulation of a helicopter flying surveillance for a convoy are modeled using several different methods. The six factors affecting Proportion Insurgents Identified (the response) are Helicopter Height, Helicopter Speed (relative to convoy), Helicopter Distance (from convoy), Convoy Speed, Number of Insurgents with AK47s, and Insurgent Camouflage Level. Models employed include several types of regression, decision tree, and neural network models. The models' relative strengths, weaknesses, and prediction accuracy are compared.
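Comparing prediction accuracy across competing models typically comes down to a held-out-error metric such as RMSE. A minimal sketch, assuming hypothetical held-out responses and predictions (the numbers and model names below are invented for illustration, not from the case study):

```python
import math

def rmse(y_true, y_pred):
    # Root-mean-square error over a held-out set.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical Proportion Insurgents Identified values on held-out runs,
# and two candidate models' predictions for those same runs.
actual     = [0.50, 0.62, 0.71, 0.80]
regression = [0.48, 0.66, 0.70, 0.77]
tree       = [0.55, 0.60, 0.65, 0.90]

# Pick the candidate with the lowest held-out RMSE.
best_name, best_preds = min(
    {"regression": regression, "tree": tree}.items(),
    key=lambda kv: rmse(actual, kv[1]),
)
```

RMSE is only one lens; the tutorial's point about relative strengths and weaknesses is that accuracy should be weighed alongside interpretability and robustness of each model family.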
In 1998 DARPA developed a representative cyber-attack data set with over 20 attack types, 41 potentially causal factors, and nearly 5 million rows of data. These and derivative data are analyzed using a variety of predictive models, including nominal logistic, decision tree, and neural models. It will be shown that the ability to predict attacks can be further improved by averaging models. Both simple algebraic averaging of model probabilities as well as 'ensemble modeling' - where models are used as inputs to other models - will be demonstrated.
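The simple algebraic averaging of model probabilities can be sketched as follows. The per-row probabilities and the 0.5 decision threshold below are illustrative assumptions, not values from the DARPA analysis:

```python
def average_probs(prob_lists):
    # Simple algebraic average of per-model attack probabilities,
    # row by row across all models.
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

# Hypothetical attack probabilities from three fitted models
# (nominal logistic, decision tree, neural) on three rows of data.
logistic = [0.90, 0.20, 0.60]
tree     = [0.80, 0.30, 0.40]
neural   = [0.85, 0.10, 0.55]

avg = average_probs([logistic, tree, neural])
predicted_attack = [p >= 0.5 for p in avg]
```

Ensemble modeling goes a step further than this averaging: the individual models' predicted probabilities become input columns to a second-stage model, which learns how much weight to give each one rather than weighting them equally.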
This presentation will show how Design of Experiments (DOE) methods can be used to extract the most useful information from the smallest number of computer simulation runs. By sequentially running blocks of simulations, computer experimenters can conduct the overall fewest trials necessary to do sensitivity analysis of the factors being studied without over-utilizing high performance computing resources. The greatest benefit occurs when a fast-running (seconds) surrogate model can be developed for long-running (hours, days or weeks) simulations.
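A surrogate model in its simplest form is a cheap fitted function standing in for the expensive simulation. As a minimal one-factor sketch (the three response values below are hypothetical, and a real surrogate would span many factors), a quadratic can be fit exactly through simulation runs at the coded low, mid, and high settings:

```python
def quadratic_surrogate(y_lo, y_mid, y_hi):
    # Fit y = b0 + b1*x + b2*x^2 through runs at coded x = -1, 0, +1.
    # The returned function evaluates in microseconds, standing in for
    # a simulation that may take hours per run.
    b0 = y_mid
    b1 = (y_hi - y_lo) / 2
    b2 = (y_hi + y_lo) / 2 - y_mid
    return lambda x: b0 + b1 * x + b2 * x * x

# Hypothetical responses from three long-running simulation trials.
surrogate = quadratic_surrogate(4.0, 3.0, 6.0)
```

Once such a surrogate reproduces the simulation's behavior adequately, sensitivity analysis and optimization can be run against it at negligible cost, with the expensive simulator reserved for confirming the most promising settings.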
This tutorial is meant to expose testers to the latest and most effective Design of Experiments (DOE) screening methods. Attendees will learn about recently published methods not only for efficiently screening factors but for using the screening data to more rapidly develop second-order predictive models. Definitive Screening Designs will be shown to detect not only main effects but also curvature in each factor - and to do so in fewer trials than traditional fractional-factorial designs, which, even when a center point is included, can only detect curvature globally.
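The structure of a Definitive Screening Design can be illustrated for four factors: each run of a conference matrix, its fold-over (sign-reversed) mirror, and one overall center run, giving 2m + 1 = 9 runs. The particular skew conference matrix below is one standard choice among several; this is an illustrative construction, not necessarily the exact design presented in the tutorial:

```python
# One skew conference matrix of order 4: zero diagonal, +/-1 elsewhere,
# with mutually orthogonal columns.
C = [
    [ 0,  1,  1,  1],
    [-1,  0,  1, -1],
    [-1, -1,  0,  1],
    [-1,  1, -1,  0],
]

# Definitive Screening Design: conference-matrix runs, their fold-over
# mirrors, and a single center run where every factor sits at its mid level.
design = C + [[-v for v in row] for row in C] + [[0, 0, 0, 0]]
```

Note that every non-center run holds exactly one factor at its mid level - this is what lets the design estimate curvature for each factor individually, rather than only detecting that curvature exists somewhere, as a two-level design with a lone center point does.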