clay_barker (Staff; joined May 27, 2014)

Supervised binning add-in for predictive modeling

When building a prediction model, there are a variety of ways to model the response as a function of our predictors. The Fit Model platform in JMP lets us model the response as a linear function of our predictors. The Nonlinear platform lets us model the response as a nonlinear function of the predictors, for example an exponential or sigmoidal curve. Another option is to model the response as a step function of our continuous predictors. We can build this kind of model by discretizing, or binning, our continuous columns. This blog post gives a brief description of the “Supervised Binning” add-in for building this type of model. Here “supervised” means that we use the response variable to help us choose the best binning scheme. The add-in is available for download from the JMP File Exchange as part of the “Predictor Binning” add-in (download requires a free SAS profile).
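The post does not spell out the search the add-in performs, so the following is only a rough illustration of what “using the response to choose the bins” can mean. This minimal Python sketch picks cut points greedily, in the style of a regression tree: each new cut is the one that most reduces the within-bin sum of squared errors (SSE) of the response. The function names, the `max_bins` and `min_size` parameters, and the splitting rule are assumptions for illustration, not the add-in's algorithm.

```python
# Hypothetical sketch of supervised binning by greedy recursive splitting.
# NOTE: an illustration of the general idea, not the add-in's algorithm.
from typing import List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float],
               min_size: int) -> Optional[Tuple[float, float]]:
    """Best single cut as (cut, SSE reduction), or None if no valid cut.

    Observations with x <= cut form the lower bin; each side must keep
    at least min_size (>= 1) observations.
    """
    pairs = sorted(zip(x, y))
    ys = [b for _, b in pairs]
    total = sse(ys)
    best: Optional[Tuple[float, float]] = None
    for i in range(min_size, len(pairs) - min_size + 1):
        # Only cut between distinct x values so tied x values stay together.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        reduction = total - sse(ys[:i]) - sse(ys[i:])
        if best is None or reduction > best[1]:
            best = (pairs[i - 1][0], reduction)
    return best


def supervised_cuts(x: Sequence[float], y: Sequence[float],
                    max_bins: int = 4, min_size: int = 5) -> List[float]:
    """Repeatedly apply the cut with the largest SSE reduction."""
    segments = [(list(x), list(y))]
    cuts: List[float] = []
    while len(cuts) + 1 < max_bins:
        best = None  # (reduction, segment index, cut)
        for idx, (sx, sy) in enumerate(segments):
            found = best_split(sx, sy, min_size)
            if found is not None and (best is None or found[1] > best[0]):
                best = (found[1], idx, found[0])
        if best is None or best[0] <= 0:
            break  # no remaining cut improves the fit
        _, idx, cut = best
        sx, sy = segments.pop(idx)
        segments.append(([a for a in sx if a <= cut],
                         [b for a, b in zip(sx, sy) if a <= cut]))
        segments.append(([a for a in sx if a > cut],
                         [b for a, b in zip(sx, sy) if a > cut]))
        cuts.append(cut)
    return sorted(cuts)


# Invented example: the response steps up after x = 10 and again after x = 20.
x = [float(i) for i in range(1, 31)]
y = [0.0] * 10 + [10.0] * 10 + [20.0] * 10
print(supervised_cuts(x, y, max_bins=3, min_size=2))  # [10.0, 20.0]
```

On this made-up data the two cuts land exactly where the response steps. The `min_size` guard keeps any bin from becoming too small to estimate; a practical tool would likely also need a stopping rule such as a minimum improvement or a validation check, which this sketch omits.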


As an example, the Corn data table in JMP’s sample data folder contains corn yield measurements along with the concentration of nitrate added to the soil. Plotting these data, we see that the relationship between the two variables is highly nonlinear, and it is not obvious which parametric model would be appropriate. Instead, we can bin the nitrate values, creating a step function to predict corn yield. To do this, we launch the Supervised Binning add-in and specify the “yield” column as the Response and the “nitrate” column as the Explanatory Variable.


The add-in appends a binned version of the nitrate column to the original data table. This new column, “nitrate binned,” is a four-level categorical column in which each level represents a different range of nitrate values. For example, the first bin contains the observations with a nitrate level less than or equal to 10.58. Now we can use the binned column (in either the Fit Model or the Fit Y by X platform) to predict yield. The figure below compares the binned prediction function to a disjoint quadratic model for predicting yield. The binned model is easier to interpret than the nonlinear model, and it has a lower mean squared error.
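To make the “less than or equal to 10.58” convention concrete, here is a hedged Python sketch of turning cut points into a binned column and a step-function prediction, where each bin predicts its mean response. The cut points and data are invented for illustration and are not the Corn results; `assign_bins`, `fit_bin_means`, and `mse` are hypothetical helpers, not part of JMP.

```python
# Hypothetical sketch: cut points -> binned column -> step-function fit.
# Cut points and data are invented; they are not the Corn results.
from bisect import bisect_left
from typing import Dict, List, Sequence


def assign_bins(x: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """0-based bin index for each value, given sorted cut points.

    bisect_left sends a value equal to a cut into the lower bin, so
    bin 0 is x <= cuts[0], bin 1 is cuts[0] < x <= cuts[1], and so on.
    """
    return [bisect_left(cuts, v) for v in x]


def fit_bin_means(bins: Sequence[int], y: Sequence[float]) -> Dict[int, float]:
    """Mean response per bin; these means define the step function."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for b, v in zip(bins, y):
        sums[b] = sums.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def mse(y: Sequence[float], pred: Sequence[float]) -> float:
    """Average squared residual over all observations."""
    return sum((a - p) ** 2 for a, p in zip(y, pred)) / len(y)


cuts = [10.0, 20.0]
x = [5.0, 10.0, 12.0, 20.0, 25.0, 30.0]
y = [1.0, 3.0, 6.0, 8.0, 9.0, 11.0]
bins = assign_bins(x, cuts)          # [0, 0, 1, 1, 2, 2]
means = fit_bin_means(bins, y)       # {0: 2.0, 1: 7.0, 2: 10.0}
pred = [means[b] for b in bins]
print(mse(y, pred))                  # 1.0
```

Using `bisect_left` makes each bin right-closed, so a value exactly equal to a cut point falls into the lower bin. The `mse` helper divides by the number of observations; regression reports usually divide by the error degrees of freedom instead, so the value will differ somewhat from what a JMP report shows.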


The corn example is a nice way to see how predictor binning works in a simple setting, but binning is also very useful when building more complicated models. Consider the Boston Housing data from the JMP sample data folder, where we are trying to predict median home values using a variety of features of each town. Suppose we want to use all of the available data to build a linear model, but we have reason to believe that several of the predictors have a nonlinear relationship with home value. For example, maybe we believe that home values are relatively constant for low crime rates but drop dramatically above a certain crime rate. We could make similar arguments to justify binning the “rooms” column. We can use the Supervised Binning add-in to bin the “crim” and “rooms” columns, as well as any other columns that seem appropriate.
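The crime-rate argument describes a threshold effect. The following Python sketch uses invented data (flat at low `x`, then dropping to a lower flat level above `x = 5`) to compare the residual sum of squares of a single straight line with that of a two-bin step function. It only illustrates the reasoning; the data, the cut at 5, and the helper names are hypothetical and are not taken from the Boston Housing data.

```python
# Hypothetical illustration of a threshold effect (invented data):
# the response is flat for x <= 5 and drops to a lower flat level after.
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sse_linear(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the straight-line fit."""
    b0, b1 = linear_fit(x, y)
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


def sse_step(x: Sequence[float], y: Sequence[float], cut: float) -> float:
    """Residual sum of squares of a two-bin step function (bin means)."""
    total = 0.0
    for group in ([b for a, b in zip(x, y) if a <= cut],
                  [b for a, b in zip(x, y) if a > cut]):
        if group:
            m = sum(group) / len(group)
            total += sum((v - m) ** 2 for v in group)
    return total


x = [float(i) for i in range(1, 11)]
y = [30.0] * 5 + [15.0] * 5
print(sse_step(x, y, cut=5.0))   # 0.0
print(sse_linear(x, y))          # about 136.4: a line cannot follow the drop
```

The straight line has to compromise between the two flat levels, while the step function matches them exactly. With real data neither fit would be perfect, but the same contrast is what motivates binning a predictor with a suspected threshold.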


The add-in breaks the “crim” column into 10 discrete categories. Once the per capita crime rate goes above 2.0, home values start to drop quickly. Now we can combine our binned columns with the remaining continuous columns to build a model with potentially (hopefully!) much better predictive ability than a model built with the original columns.
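One common way to combine binned and continuous columns in a linear model is indicator (dummy) coding: one 0/1 column per non-reference bin, placed alongside the continuous predictors. The Python sketch below builds such rows and fits them by ordinary least squares through the normal equations. The helper names and the toy data (generated exactly from known coefficients) are assumptions for illustration. JMP's Fit Model reports categorical effects under its own parameterization, so individual coefficients will not match this coding, although the fitted predictions from the same set of effects should agree.

```python
# Hypothetical sketch: indicator-code a binned predictor, combine it with a
# continuous predictor, and fit ordinary least squares via normal equations.
from typing import List, Sequence


def design_row(bin_index: int, n_bins: int,
               continuous: Sequence[float]) -> List[float]:
    """Intercept, 0/1 indicators for bins 1..n_bins-1 (bin 0 is the
    reference level), then the continuous predictors unchanged."""
    dummies = [1.0 if bin_index == k else 0.0 for k in range(1, n_bins)]
    return [1.0] + dummies + [float(c) for c in continuous]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows: List[List[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented data generated exactly as y = 10 + 5*[bin 1] - 3*[bin 2] + 2*z.
bins = [0, 0, 1, 1, 2, 2]
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = [design_row(b, 3, [zi]) for b, zi in zip(bins, z)]
y = [r[0] * 10 + r[1] * 5 - r[2] * 3 + r[3] * 2 for r in rows]
print([round(c, 6) for c in least_squares(rows, y)])  # [10.0, 5.0, -3.0, 2.0]
```

Each indicator coefficient is the shift in the prediction for that bin relative to the reference bin, holding the continuous predictor fixed. For larger problems a numerical library's least-squares routine would be preferable to forming the normal equations by hand.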


So if you have ever found yourself wanting to bin or discretize your continuous predictors, you might want to try out the Supervised Binning add-in. You can find this add-in as part of the Predictor Binning add-in on the JMP File Exchange.

5 Comments
Community Member

Is your data too precise? wrote:

[...] post described an interactive binning tool that you can use to do this sort of binning manually. A third example of binning is to use a supervised approach, where the bins in the predictor are chosen in a way that maximizes the predictive ability of the [...]

Community Member

sandeep pawar wrote:

I am using JMP Pro 11. I ran the add-in, but nothing happened. No new column was created. Does it work in 11?

Community Member

sandeep pawar wrote:

It seemed to work on the Boston Housing data. I'm not sure why it wouldn't work on my data.

Clay Barker wrote:

Hi Sandeep,

Unfortunately, it's hard to guess why it might not be working for your data. My guess is that it should work in 11; it's more likely that something about your data has exposed a bug in the add-in. Things like excluded rows or missing values could trip up the add-in.

Thanks,

Clay

Community Member

Hi Clay,


We are trying to use the add-in "Binning for continuous Predictors". The unsupervised method works fine, but we are having issues with the supervised method. After choosing a response variable and an explanatory variable and clicking OK, nothing seems to happen. We are using JMP 12.2 (64-bit). Do you have any suggestions?
