
Comparing and Selecting Predictive Models


Learn more in our free online course:
Statistical Thinking for Industrial Problem Solving

In this video, we show how to compare and select predictive models in JMP Pro. We use the Bodyfat 07 data set to fit predictive models for the continuous response %Fat using all of the available predictors.

 

The column Validation 2 partitions the data into training, validation, and test sets.
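
JMP builds this partition for you. If you prefer a scripted workflow, the sketch below shows one way to create an equivalent three-way split in Python with pandas. The file name, column names, and 60/20/20 split are illustrative assumptions, not the actual contents of Bodyfat 07.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2020)

# Hypothetical stand-in for the Bodyfat 07 table; the file name and
# column names are assumptions.
df = pd.read_csv("bodyfat.csv")

# Assign each row to training, validation, or test (roughly 60/20/20),
# mimicking what the Validation 2 column encodes in JMP.
df["Validation 2"] = rng.choice(
    ["Training", "Validation", "Test"], size=len(df), p=[0.6, 0.2, 0.2]
)

train = df[df["Validation 2"] == "Training"]
valid = df[df["Validation 2"] == "Validation"]
test  = df[df["Validation 2"] == "Test"]
```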

 

We have built three linear models using generalized regression. We’ve also built a regression tree and a simple neural network.
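
Continuing the sketch above, here are rough scikit-learn stand-ins for those five models. These approximate, rather than replicate, JMP's generalized regression, partition, and neural platforms, and the response name pct_fat is an assumption.

```python
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# The response and predictor names are assumptions; substitute the
# actual Bodyfat 07 columns.
X_train = train.drop(columns=["pct_fat", "Validation 2"])
y_train = train["pct_fat"]

# Loose analogues of the five JMP models: three linear fits standing in
# for the generalized regression models, a regression tree, and a small
# neural network.
models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Elastic Net": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "Regression Tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Neural Net": MLPRegressor(hidden_layer_sizes=(3,), max_iter=5000,
                               random_state=0),
}
for model in models.values():
    model.fit(X_train, y_train)
```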

 

To determine which model performs best, we look at the fit statistics for the test set. But gathering these statistics from several separate reports can be challenging when you are comparing several models.

 

Instead, we’ll save the formulas for all of the models to the data table, and then use the Model Comparison platform in JMP Pro to compare the models.

 

To save a generalized regression model, we select Save Columns, and then Save Prediction Formula from the red triangle for the model. This saves a prediction formula column to the data table.

 

To save the regression tree model to the data table, we also select Save Columns, and then Save Prediction Formula from the top red triangle for the model.

 

To save the neural network model, we select Save Profile Formulas from the red triangle for the model. Rather than saving several columns to the data table, this saves the entire neural model in one prediction formula column.
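
In the scripted sketch, the analogue of all three save steps is simply storing each model's predictions as a new column in the table:

```python
# Script analogue of the saved prediction formula columns: one
# predicted column per model, stored back in the data table.
X_all = df.drop(columns=["pct_fat", "Validation 2"])
for name, model in models.items():
    df["Pred " + name] = model.predict(X_all)
```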

 

To compare the performance for these models, we use the Model Comparison platform. To do this, we select Analyze, Predictive Modeling, and then Model Comparison.

 

We select the five predicted columns as Y, Predictors. You can also leave the Y, Predictors field blank, and JMP will find all of the columns with prediction formulas for you.

 

We want to report separate fit statistics for the training, validation, and test sets. So we select Validation 2 as the By variable and click OK.
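
The scripted analogue of this launch is to compute the same fit statistics for each model within each partition. The sketch below uses the standard definitions: RASE is the square root of the average squared prediction error, AAE is the average absolute prediction error, and RSquare is one minus the ratio of the error sum of squares to the total sum of squares.

```python
import numpy as np
import pandas as pd

def fit_stats(y, yhat):
    """Standard fit statistics: RSquare, RASE, and AAE."""
    resid = y - yhat
    return {
        "RSquare": 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum(),
        "RASE": np.sqrt((resid ** 2).mean()),   # root average squared error
        "AAE": resid.abs().mean(),              # average absolute error
    }

# One row of statistics per model per partition, mirroring the
# Model Comparison report with Validation 2 as the By variable.
rows = []
for part, grp in df.groupby("Validation 2"):
    for name in models:
        rows.append({"Partition": part, "Model": name,
                     **fit_stats(grp["pct_fat"], grp["Pred " + name])})
report = pd.DataFrame(rows)
```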

 

You see separate reports for each partition of the data.

 

The models were fit using the training data, and the model complexity was determined by the validation data. So we use the statistics for the test set, which played no role in fitting or tuning, to select the best performing model.

 

To make it easier to identify the best model, we’ll sort on RASE (root average squared error) for the test set. To do this, we right-click on the table and select Sort by Column. Then we select RASE, check the Ascending box, and click OK.
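
In the scripted sketch, the equivalent sort takes one line:

```python
# Rank models by test-set RASE, smallest (best) first, mirroring
# the Sort by Column step in the report.
test_stats = report[report["Partition"] == "Test"]
print(test_stats.sort_values("RASE").to_string(index=False))
```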

 

Of the models that we built for these data, the best performing model is the pruned forward selection model. This model has the lowest RASE, the lowest AAE (average absolute error), and the highest RSquare.

 

An alternative to saving formula columns to the data table is to publish them to the Formula Depot. For example, let’s look at the generalized regression models. 

 

Instead of selecting Save Prediction Formula from the Save Columns menu for the model, we’ll select Publish Prediction Formula.

 

This writes the model to the Formula Depot.

 

From here, we can use the top red triangle to launch the Model Comparison platform, apply the model to a different data table, generate scoring code for the model in different programming languages, and much more.
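
There is no single Python equivalent of the Formula Depot, but persisting the chosen model and applying it to a new table captures the scoring idea. The model key and file names below are hypothetical.

```python
import joblib
import pandas as pd

# Persist the selected model so it can score new data later, loosely
# analogous to exporting a scoring formula from the Formula Depot.
joblib.dump(models["Lasso"], "best_bodyfat_model.joblib")

new_rows = pd.read_csv("new_measurements.csv")   # hypothetical new table
best = joblib.load("best_bodyfat_model.joblib")
new_rows["Pred %Fat"] = best.predict(new_rows)   # assumes matching predictors
```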

 

The Model Comparison platform makes it easy to compare and select the best performing predictive model. The Formula Depot provides access to Model Comparison, but it also enables you to easily apply your best model to new data.