
Sharing the Ultimate Boon: A Journey From Modeling to Scoring

Abstract

Finding the right model to predict new outcomes given new data is an important accomplishment. But many times it is a step in a journey where the final goal is to share the model’s predictive power with a much larger audience. JMP Pro has many tools to facilitate fitting, comparison, and selection of predictive models. JMP Pro 13 added the Formula Depot: an efficient way to collect models, apply them to new tables, and access the Model Comparison and Profiler platforms. To help with deploying models to production, the Formula Depot can also convert them to scoring code in a number of different programming languages. In this tutorial, we use a real estate case study to illustrate the predictive modeling workflow. We’ll compile data, prepare the data for modeling, generate predictive models, publish models to the Formula Depot, and explore and select the best model(s). Then, we’ll generate scoring code to support the creation of web applications that can calculate housing prices “on the spot.” We will also explore different methods for scoring data, and provide an overview of current deployment architectures.

 

Tutorial Content 

The final result of the tutorial is a web application, hosted on AWS, that predicts housing prices. You can find it here. What follows describes the tutorial materials, which are attached to this page.

 

The client tier (see attached file RedFinWeb.zip) uses Bootstrap for the user interface and OpenLayers for the mapping capabilities. It is implemented as a static website hosted on AWS S3. A customized HTML form collects the user's entries, which become the input values for the model evaluation; the evaluation is triggered by a REST call to the compute layer.

 

The compute layer (see attached file RedFinServerless.zip) uses two Amazon services: API Gateway is the entry point for the REST calls, which are mapped to an AWS Lambda service. Lambda is Amazon's implementation of the Serverless Architecture paradigm, also referred to as Function-as-a-Service (FaaS). Serverless provides performance at scale, with low management costs, for stateless, low-latency, high-throughput applications. That makes it a great fit for scoring applications, which are embarrassingly parallel because each observation is scored independently of the others.

 

The Lambda service was the final deployment destination for the Python scoring code generated by JMP. Deployment of applications to AWS services can be done manually using their management console, but to streamline the process we recommend the use of one of the many open source wrappers available. In this exercise, we used the Serverless application framework.
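As a rough illustration of how scoring code can sit behind API Gateway, here is a minimal sketch of a Lambda handler. It assumes the API Gateway proxy integration, in which the request body arrives as a JSON string in event["body"]. The score function is a hypothetical stand-in with made-up coefficients and made-up field names (SQFT, BEDS, Pred Price); in a real deployment it would be replaced by the scoring function in the Python file generated by JMP, whose exact signature should be checked against that file.

```python
import json


def score(indata, outdata):
    # Hypothetical stand-in for the generated scoring function.
    # Field names and coefficients are invented for illustration only.
    outdata["Pred Price"] = (
        50000.0 + 150.0 * indata["SQFT"] + 10000.0 * indata["BEDS"]
    )
    return outdata["Pred Price"]


def lambda_handler(event, context):
    """Entry point called by AWS Lambda for an API Gateway proxy request."""
    try:
        # With the proxy integration, the body is a JSON string (or None).
        indata = json.loads(event.get("body") or "{}")
        outdata = {}
        score(indata, outdata)
        status, payload = 200, outdata
    except (KeyError, TypeError, ValueError) as err:
        # Missing fields, wrong types, or malformed JSON are client errors.
        status, payload = 400, {"error": f"bad input: {err}"}
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

The handler keeps no state between calls, which is what lets the platform run as many copies in parallel as the request load requires.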

 

Note how the Python code captures both the feature engineering tasks (clustering, binning, imputation) and the model built on top of them. This allows the same raw data sources used to create the model to also be used to score new data in production, an important consideration when building a maintainable analytics pipeline.
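To make the idea concrete, the following sketch shows what it means for one scoring function to carry the feature engineering together with the model. It is not the code generated by JMP: every constant, field name, and coefficient is a hypothetical placeholder, and the "model" is a simple additive formula chosen only to keep the example short.

```python
import math

# All values below are hypothetical placeholders, not taken from the tutorial.
MEDIAN_SQFT = 1800.0                     # imputation value for missing SQFT
AGE_BIN_EDGES = [10, 30, 60]             # years; three edges give four bins
AGE_BIN_EFFECT = [40000.0, 20000.0, 5000.0, 0.0]
CLUSTER_CENTERS = [(35.78, -78.64), (35.99, -78.90)]  # (lat, lon)
CLUSTER_EFFECT = [30000.0, 10000.0]


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_sqft(sqft):
    # Imputation: replace a missing value with the stored median.
    return MEDIAN_SQFT if is_missing(sqft) else float(sqft)


def bin_age(age):
    # Binning: return the index of the first bin whose upper edge exceeds age.
    for i, edge in enumerate(AGE_BIN_EDGES):
        if age < edge:
            return i
    return len(AGE_BIN_EDGES)


def nearest_cluster(lat, lon):
    # Clustering: assign the point to the closest stored cluster center.
    dists = [(lat - c_lat) ** 2 + (lon - c_lon) ** 2
             for c_lat, c_lon in CLUSTER_CENTERS]
    return dists.index(min(dists))


def score_raw(record):
    """Score a raw record; feature engineering happens inside the function."""
    sqft = impute_sqft(record.get("SQFT"))
    age_bin = bin_age(record["AGE"])
    cluster = nearest_cluster(record["LAT"], record["LON"])
    return (50000.0 + 120.0 * sqft
            + AGE_BIN_EFFECT[age_bin] + CLUSTER_EFFECT[cluster])
```

Because the caller passes raw values only, the production system does not need its own copy of the imputation, binning, or clustering rules, so those rules cannot drift away from the ones the model was trained with.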

 

Attached you will also find a Jupyter Notebook (RedFinNotebook.zip) used to test the model before deployment; a collection of Python models generated by JMP for the housing scenario (RedFinModels.zip); and the original JMP table with the data clean-up and the scripts to generate the models (RedFinData.zip). This last .zip file also includes a spreadsheet that illustrates how the REST backend can be called directly for scoring by other applications, in this case from an Excel formula.
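Any application that can issue an HTTP request can use the REST backend in the same way. The sketch below shows one possible Python client using only the standard library. The endpoint URL is a placeholder, and the request shape (a POST with a JSON body of input values) and the field name SQFT are assumptions; the actual contract is defined by the files in RedFinServerless.zip.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL assigned to your own deployment.
SCORING_URL = "https://example.invalid/score"


def build_scoring_request(values, url=SCORING_URL):
    """Build (but do not send) a POST request carrying the inputs as JSON."""
    body = json.dumps(values).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_remote(values, url=SCORING_URL, timeout=10):
    """Send the request and decode the JSON response from the backend."""
    request = build_scoring_request(values, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from the network call makes the payload easy to inspect or test without a live endpoint.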

 



Comments

Hi,

Can you tell me where to find the "jmp_score" for javascript?

Thanks,

Jim Grayson

Hi @jgrayson,

 

It is in your install directory, under Resources/Scoring. For example, on my Windows machine it is under

C:\Program Files\SAS\JMPPRO\16\Resources\Scoring\JavaScript

Hi,

I appreciate your help. You last suggested running inside a browser behind HTML ...

Can you give me more specifics or link to something that would help me achieve something tangible?

Thanks,

Jim

The idea is that the web page is the user interface through which you get the data that needs to be scored, for example a dialog where the user types a few values. Once the values are submitted, the event handler gathers them and calls the JavaScript model code generated by JMP. The result is displayed back to the user. The main benefit is that there is no need to call back to a server, so this is as scalable as it can be. The downside is that you are limited to scoring one or a few observations at a time, but depending on the use case, this is not a problem.

For a concrete example, take a look at this:

http://jmp.valentine.s3-website-us-east-1.amazonaws.com/

Here instead of user input, the web page generates random points that need to be classified as being inside or outside a curve. The classification is done by a JMP Neural Network model. Visualization is done with D3.js.

 

Another example, which shows how to use a web application to collect data from the user, score the input, and present the result, is available in the scoring_examples.zip file attached to the 2016 presentation.

The folder "WebApp" has a complete web application that illustrates the concept. It was designed to be used by a field research botanist, trying to classify new Iris flower specimens based on their measurements (I know, I know, how original...