

Jun 13, 2011

Making better predictive models quickly with JMP

There are many ways of generating a model, such as basic linear regression, decision trees, neural nets, and generalized linear models. JMP Pro can be a great tool for data miners, those who want to get more information out of their data and build more accurate predictive models. It incorporates traditionally complicated algorithms that, in true JMP fashion, even a novice can harness to quickly build powerful models. Bootstrap forests, boosted trees and multilayered-boosted neural nets are just a sample of the powerful tools in JMP Pro. One thing that makes JMP Pro a powerful data mining tool is its ability to build, tune and test your model in one step.


A while back, this blog featured a great post on the concept of training, validation and test sets to build your models. To build a model that is not only descriptive but also predictive, validating your model and subsequently testing it is essential.


The first step, as described in that blog entry, is to split your data into three groups: training, validation and test portions. Next, you build a model on the training portion that captures the behavior of the data (your system) and not just the noise. Once you've come up with a plausible model, you apply it to the validation set. If the model has captured the underlying behavior, it should fit the validation set about as well as it fits the training set. If it does not, you have to go back and build another model; this process may go back and forth a few times. Once a model built with the training set performs similarly on the validation set, you expect it to be repeatable, so you apply it to the test set.
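Outside JMP, the workflow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the 60/20/20 split is an arbitrary choice, and the helper names (`fit_line`, `rmse`, `take`) are hypothetical, not part of JMP.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(model, xs, ys):
    """Root mean square error of the fitted line on the given rows."""
    a, b = model
    squared = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))

# Synthetic data (an assumption for illustration): y = 2 + 3x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Shuffle the row indices, then split them 60/20/20.
rows = list(range(len(xs)))
rng.shuffle(rows)
train, valid, test = rows[:180], rows[180:240], rows[240:]

def take(subset):
    """Pull the x and y values for a subset of row indices."""
    return [xs[i] for i in subset], [ys[i] for i in subset]

# Build the model on the training rows only.
model = fit_line(*take(train))

# Similar error on all three portions suggests the model captured
# the underlying behavior rather than the noise.
errors = {
    "training": rmse(model, *take(train)),
    "validation": rmse(model, *take(valid)),
    "test": rmse(model, *take(test)),
}
for name, value in errors.items():
    print(f"{name:>10}: RMSE = {value:.3f}")
```

If the validation error were much larger than the training error, that would be the signal to go back and try a different model before touching the test rows.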


JMP Pro is a huge breakthrough in simplifying this process. The first enhancement is a much easier way of generating a validation column. From JMP, choose Cols -> New Column, then Initialize Data -> Random -> Random Indicator. The Random Indicator column defaults to the values 0 (Training), 1 (Validation) and 2 (Test). You can then choose what portion of the data you want to use for each step.
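For readers who want to see the idea behind such a column, here is a minimal Python sketch. It is not JMP's implementation: the function name `random_indicator` and the 60/20/20 default proportions are assumptions made for illustration. Each row is labeled independently, so the realized proportions only approximate the requested ones.

```python
import random

def random_indicator(n_rows, fractions=(0.6, 0.2, 0.2), seed=None):
    """Return n_rows labels: 0 (Training), 1 (Validation), 2 (Test).

    Each row is drawn independently with the given probabilities.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values that sum to 1")
    rng = random.Random(seed)
    return rng.choices([0, 1, 2], weights=fractions, k=n_rows)

labels = random_indicator(1000, seed=1)

# Count how many rows landed in each portion.
counts = {code: labels.count(code) for code in (0, 1, 2)}
print(counts)
```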


Validation Column Generation




Most of the modeling platforms in JMP Pro now have an option to input a validation column. This allows you to complete the long process of building, tuning and testing your models, as I just described, in one easy step. As you're inputting your variables, choose your newly made column as the validation option.


JMP Pro Model Validation




One fact of statistics is that not all modeling techniques work well for all data sets. Each technique has its strengths and weaknesses; each one can teach you something different about your data. One strength of JMP Pro is that it offers multiple modeling techniques under one roof. You can try each algorithm, evaluate its performance and even save the model (prediction formula) to the data table as a new column. JMP Pro Version 10 now allows you to compare your models once they've been built.


Compare your models




Run the Model Comparison platform and choose the models you want to compare (or let JMP choose them).
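To make the idea of a model comparison concrete, here is a small Python sketch that scores several saved prediction columns against the actual response. It is an illustration only: the column names and numbers are made up, and RSquare and RMSE are used as example fit statistics without claiming that they match the exact statistics JMP reports.

```python
import math

def compare_models(actual, predictions):
    """Score each prediction column; return rows sorted by RMSE (best first)."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    rows = []
    for name, predicted in predictions.items():
        ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
        rows.append({
            "model": name,
            "r_square": 1 - ss_resid / ss_total,
            "rmse": math.sqrt(ss_resid / len(actual)),
        })
    return sorted(rows, key=lambda row: row["rmse"])

# Made-up response values and prediction columns (hypothetical names).
actual = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "pred_model_a": [3.5, 4.5, 7.5, 8.5],
    "pred_model_b": [4.0, 4.0, 8.0, 8.0],
    "pred_model_c": [3.1, 5.2, 6.8, 9.1],
}

table = compare_models(actual, predictions)
for row in table:
    print(f"{row['model']:<14} RSquare={row['r_square']:.3f}  RMSE={row['rmse']:.3f}")
```

For an honest comparison, the scores should be computed on rows that were not used to fit the models, such as the test portion.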


Compare Multiple Statistical Models




In this example, the Bootstrap Forest seems to have generated the best model. If multiple models have similar statistics, another way to examine the models is to use the interactive Profiler platform. It not only allows you to develop an intuitive understanding of how the model works, but it also shows how varying multiple inputs can affect the output. You can download the data I used in this blog post from the JMP File Exchange and try it out yourself! (Note: Download requires a free SAS profile.)


Arka wrote:

Thanks Aashish. This looks really useful. I will ask why you need a validation dataset and then a test set. They appear to serve the same function. Is it a matter of passing the model through 2 filters to see its predictive value?


Craig Burkhart wrote:


The training, validation and test sets normally come into play for techniques like neural networks and their associated techniques. In standard neural network regression, the training set is actively used to create the actual model. The validation set's purpose is to tell the model-building step when to stop the training. Typically, a metric of the quality of the model, say RMS error, is sampled on the validation set. Once the RMS error as a function of the number of iterations (or epochs in neural net speak) flattens out for the validation set, the typical neural network algorithms watch for the point where the RMS error begins to rise. If the rise continues for some stated epoch length--for example, 5-10 epochs--then the optimization is stopped and taken back to the point just before the rise occurred. Since the regression algorithm used the validation set to sense this endpoint, it technically cannot be used as a test of model predictive value. That's where the standard test set comes into play.
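The stopping rule described above can be sketched in Python as follows. This is a simplified illustration, not the algorithm of any particular neural network package: the function name `early_stopping_epoch`, the `patience` parameter and the error values are all assumptions made for the example.

```python
def early_stopping_epoch(validation_errors, patience=5):
    """Return (best_epoch, best_error) under a simple patience rule.

    Training stops once the validation error has failed to improve
    for `patience` consecutive epochs; the model is then rolled back
    to the epoch with the lowest validation error seen so far.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_error

# Made-up validation RMS errors per epoch: they fall, flatten, then rise.
errors = [1.0, 0.8, 0.6, 0.5, 0.45, 0.47, 0.5, 0.55, 0.6, 0.7, 0.9, 0.2]

best_epoch, best_error = early_stopping_epoch(errors, patience=5)
print(best_epoch, best_error)
```

Here training stops at epoch 9, five epochs after the minimum at epoch 4, so the later value 0.2 is never seen. Because the validation errors decided where to stop, a separate test set is still needed to judge predictive value.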


Mohammad Mirwais wrote:

Dear Sir or Madam, I am a new student and I want to build a model for a data set that I have. I am new to JMP. How can I find something that will teach me, step by step, how to build a model for my data?

Your cooperation will be much appreciated.

Arati Mejdal wrote:

Hi, Mohammad,

These videos on building models may help you:

Good luck!


Mrinal wrote:

Is it possible to do cross-validation in JMP for regression, neural network and Gaussian process models?