Tutorials and Presentations: (These documents are available for download at the bottom of the page.)
Building Better Models – Tutorial 597
Through a variety of case studies, you'll learn to build better and more useful models with advanced predictive modeling techniques, such as regression, neural networks and decision trees. You'll learn to partition your data into training, validation (tuning) and test sets to prevent overfitting, and you'll see how to use comparison techniques to find the best predictive model. This tutorial is for analysts, scientists, engineers and researchers interested in learning how predictive modeling can help them use the data they have today to better predict tomorrow. The two case studies below are summarized in the Building Better Models slide deck and are also available as separate stand-alone presentations.
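The partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own procedure: the function name `partition`, the 60/20/20 split, and the fixed seed are all assumptions made for the example.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation shares becomes the test set.
    """
    if not (0 < train < 1 and 0 < validation < 1 and train + validation < 1):
        raise ValueError("train and validation must be fractions summing to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # used to fit each candidate model
        shuffled[n_train:n_train + n_val],  # used to tune and compare models
        shuffled[n_train + n_val:],         # held back for a final, unbiased check
    )

train_set, validation_set, test_set = partition(range(100))
```

Keeping the test set out of both fitting and tuning is what lets it give an honest estimate of how the chosen model will predict new data.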
Case Study 1: Helicopter Flying Surveillance – Surrogate Modeling of Computer Simulation Data - WG28
Data from a stochastic computer simulation of a helicopter flying surveillance for a convoy, in which the goal is to identify insurgents, are modeled using several different methods. The six factors affecting Proportion Insurgents Identified (the response) are Helicopter Height, Helicopter Speed (relative to convoy), Helicopter Distance (from convoy), Convoy Speed, Number of Insurgents with AK47s, and Insurgent Camouflage Level. Models employed include several types of regression, decision trees, and neural networks. The relative strengths, weaknesses and prediction accuracy of the models are compared.
Case Study 2: Cyber Attack Data – Improving Prediction with Ensemble Modeling - WG5
In 1998 DARPA developed a representative cyber-attack data set with over 20 attack types, 41 potentially causal factors, and nearly 5 million rows of data. These and derivative data are analyzed using a variety of predictive models, including nominal logistic, decision tree, and neural models. It will be shown that the ability to predict attacks can be further improved by averaging models. Both simple algebraic averaging of model probabilities and 'ensemble modeling', where models are used as inputs to other models, will be demonstrated.
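Simple algebraic averaging of model probabilities can be sketched as follows. The probabilities are made-up numbers for two rows and two classes (normal, attack), not values from the DARPA data, and the function name `average_probabilities` is an assumption for illustration.

```python
def average_probabilities(*model_probs):
    """Average per-row class probabilities across several models.

    Each argument is one model's output: a list of rows, where each row
    is a list of class probabilities in the same class order.
    """
    n_models = len(model_probs)
    return [
        [sum(p[c] for p in row) / n_models for c in range(len(row[0]))]
        for row in zip(*model_probs)  # row = that observation's output from every model
    ]

# Hypothetical predicted probabilities [P(normal), P(attack)] for two rows.
logistic = [[0.5, 0.5], [0.25, 0.75]]
tree = [[0.25, 0.75], [0.5, 0.5]]
neural = [[0.75, 0.25], [0.0, 1.0]]

averaged = average_probabilities(logistic, tree, neural)
print(averaged)  # [[0.5, 0.5], [0.25, 0.75]]
```

The second approach mentioned above would instead treat the three models' probability columns as predictors and fit another model to them, ideally on data not used to fit the original models.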
Efficient Modeling & Simulation Using Design of Experiments Methods – Tutorial 595
This presentation will show how Design of Experiments (DOE) methods can be used to extract the most useful information from the smallest number of computer simulation runs. By sequentially running blocks of simulations, computer experimenters can conduct the fewest trials necessary for a sensitivity analysis of the factors being studied, without over-utilizing high-performance computing resources. The greatest benefit occurs when a fast-running (seconds) surrogate model can be developed for long-running (hours, days or weeks) simulations.
The fast surrogate model enables testers and analysts to interactively query the modeled process to find optimal operating conditions or the frontiers of the acceptable operating window. These conditions of high interest can then be run using the full simulation, both to validate the surrogate model and to increase the accuracy of prediction. Design solutions demonstrated will include the application of traditional DOE methods to discrete event and agent-based simulations, and of modern space-filling designs to more complex physics-based simulations such as Computational Fluid Dynamics (CFD). When to use traditional linear regression approximation methods or spatial regression interpolation methods, and how to choose between them, will be discussed. The effective practice of using checkpoint simulations to determine the accuracy of surrogate model predictions will be demonstrated.
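The workflow described above (space-filling design, surrogate fit, checkpoint runs) can be sketched with NumPy. Everything specific here is an assumption for illustration: `slow_simulation` is a toy two-factor function standing in for a real long-running simulation, the Latin hypercube is one common space-filling design, and the surrogate is a full quadratic regression.

```python
import numpy as np

def latin_hypercube(n_runs, n_factors, rng):
    """Space-filling design on [0, 1): one point per stratum in every factor."""
    # Each column is an independent random permutation of the strata,
    # jittered uniformly within each stratum.
    strata = np.column_stack([rng.permutation(n_runs) for _ in range(n_factors)])
    return (strata + rng.random((n_runs, n_factors))) / n_runs

def quadratic_terms(X):
    """Model matrix for a full quadratic in two factors."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def slow_simulation(X):
    """Hypothetical stand-in for a long-running simulation (not from the source)."""
    x1, x2 = X[:, 0], X[:, 1]
    return (x1 - 0.3) ** 2 + 2.0 * (x2 - 0.7) ** 2 + 0.5 * x1 * x2

rng = np.random.default_rng(0)

# 1. Run the expensive simulation at a small space-filling design.
design = latin_hypercube(12, 2, rng)
response = slow_simulation(design)

# 2. Fit the fast surrogate by least squares.
coef, *_ = np.linalg.lstsq(quadratic_terms(design), response, rcond=None)

def surrogate(X):
    return quadratic_terms(X) @ coef

# 3. Checkpoint runs: new settings where surrogate and simulation are compared.
checkpoints = latin_hypercube(5, 2, rng)
max_error = np.max(np.abs(surrogate(checkpoints) - slow_simulation(checkpoints)))
```

Because the toy function is itself quadratic, the checkpoint error here is essentially zero. With a real simulation, a large checkpoint error would be the signal to add runs or switch to a more flexible surrogate, such as an interpolating spatial model.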
Using Definitive Screening Designs to Get More Information from Fewer Trials – Tutorial 594
This tutorial is meant to expose testers to the latest and most effective Design of Experiments (DOE) screening methods. Attendees will learn about recently published methods not only for efficiently screening factors but also for using the screening data to more rapidly develop second-order predictive models. Definitive Screening designs will be shown to detect not only main effects but also curvature in each factor, and to do so in fewer trials than traditional fractional-factorial designs, which, when a center point is included, can detect curvature only globally.
When first published in 2011, these new designs could support only continuous factors. Improvements published in March 2013 now enable them to support categorical factors with two levels. When the number of significant factors is small, a Definitive Screening design can collapse into a 'one-shot' design capable of supporting a response-surface model with which accurate predictions can be made about the characterized process.
A case study will be shown in which a 10-factor process is optimized in just 24 trials. Checkpoint trials at predicted optimal conditions show the process yield increased by more than 20%. In cases where more than a few factors are significant and the design can't collapse into a one-shot design, the existing trials can economically be augmented to support a response-surface model in the important factors. Graphical comparisons between these alternative methods and traditional designs will show that the new designs yield more information, often in fewer trials.
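To make the structure of such a design concrete, here is a minimal sketch of one construction described in the DOE literature: stack a conference matrix, its foldover (mirror image), and a single center run. The choice of a Paley conference matrix of order 6 and the function names are assumptions for illustration. Whether the tutorial or any particular software builds its designs this way is not stated in the source.

```python
import numpy as np

def paley_conference_matrix(q=5):
    """Symmetric conference matrix of order q + 1 for a prime q with q % 4 == 1."""
    squares = {(a * a) % q for a in range(1, q)}

    def chi(a):
        # Quadratic character mod q: 0 for zero, +1 for squares, -1 otherwise.
        a %= q
        return 0 if a == 0 else (1 if a in squares else -1)

    C = np.ones((q + 1, q + 1), dtype=int)
    C[0, 0] = 0
    for i in range(q):
        for j in range(q):
            C[i + 1, j + 1] = chi(i - j)
    return C

def definitive_screening_design(C):
    """Stack C, its foldover -C, and one overall center run."""
    m = C.shape[1]
    return np.vstack([C, -C, np.zeros((1, m), dtype=int)])

C = paley_conference_matrix(5)
D = definitive_screening_design(C)  # 13 runs, 6 factors at levels -1, 0, +1

main_vs_main = D.T @ D              # off-diagonal zeros: main effects are orthogonal
main_vs_quadratic = D.T @ (D ** 2)  # all zeros: main effects unbiased by curvature
```

Each factor appears at three levels, which is what lets curvature be estimated factor by factor, and the foldover structure keeps the main-effect estimates clear of second-order terms.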