Scoring Outside the Box: Code Examples - Python

When it comes to deployment, flexibility is another important consideration. Converting the model to scoring code is only the first step; next you have to meet the specific needs of each production scenario. These can lead to very different solutions: an API that remote applications can call, simple tools that score and visualize Excel spreadsheets, or distributed applications that score massive amounts of data.

One answer to this diversity of deployment requirements is to use a language that can tap into a rich ecosystem of supporting libraries and frameworks. This is where Python stands out: there are Python libraries (not to mention books and tutorials) for all of the challenges described above, and many more.

The examples below illustrate how simple solutions can be built in Python by combining JMP-generated scoring code with available libraries, addressing scenarios that would otherwise require a big investment in custom software development. The examples were tested on Windows, using the Anaconda Python distribution from Continuum Analytics™.

Web Service

A common but difficult requirement is to make scoring available to many users over a network. However, you might not want the scoring code to be visible as part of the web application itself (see our JavaScript example). You might also want to log the scoring calls, or combine the input data sent by the user (or automated process) with data retrieved from a database.

All these requirements can be addressed by deploying the Python scoring code as part of a server-side application on a web server. By exposing the scoring code through an API callable over HTTP, you are effectively implementing a "scoring-as-a-service" solution that can serve users (through a web or even mobile application) and automated processes alike.

The code in the WebService directory shows a simple way to implement this solution. The main file, app.py, uses the Flask microframework to create a web server application. That application exposes a single entry point named score that calls a JMP-generated Python model to score the input data provided as URL arguments.
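The pattern behind app.py can be sketched without any third-party packages. The block below is a minimal sketch, not the code shipped in the WebService directory: it swaps Flask for the standard library's WSGI interface so that it runs anywhere, and it uses a hypothetical `score_iris` function with a toy rule in place of the JMP-generated model. The `/score` path and the `success` wrapper follow the example URL and reply shown in this section; the `error` key returned for bad requests is an assumption.

```python
import json
import logging
from urllib.parse import parse_qs

logging.basicConfig(level=logging.INFO)

# Input columns expected by the model, matching the example URL's arguments.
INPUTS = ("Petal length", "Petal width", "Sepal length", "Sepal width")


def score_iris(indata):
    """Hypothetical stand-in for the JMP-generated model (toy rule, not the real model)."""
    petal_length = indata["Petal length"]
    if petal_length < 2.5:
        species = "setosa"
    elif petal_length < 4.9:
        species = "versicolor"
    else:
        species = "virginica"
    return {"Most Likely Species": species}


def app(environ, start_response):
    """WSGI application exposing a single GET /score entry point."""
    if environ.get("PATH_INFO") != "/score":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    # parse_qs decodes '+' to spaces, so "Petal+length" becomes "Petal length".
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        indata = {name: float(query[name][0]) for name in INPUTS}
        status, body = "200 OK", {"success": score_iris(indata)}
    except (KeyError, ValueError) as exc:
        # Assumed error format; the original service may report errors differently.
        status, body = "400 Bad Request", {"error": "missing or invalid input: %s" % exc}
    # Log every scoring call on the server side.
    logging.info("score %s -> %s", query, body)
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To serve it on the port used by the example, one could run `make_server("localhost", 9004, app).serve_forever()` from `wsgiref.simple_server`. In Flask, the same handler would be a function decorated with `@app.route("/score")` that reads `request.args` and returns `jsonify(...)`.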

In the same directory we have both the JMP-generated Python scoring code and a copy of the jmp_score.py support file provided with the JMP install.

The solution also includes a web client application that illustrates how the scoring service can be called from a browser. It is essentially the JavaScript example mentioned above, modified to call the scoring service instead of calculating the score locally.

The provided WebService/run.cmd script starts the scoring web service and then opens two browser tabs. The first contains the web client; try interacting with it and check the server window to see the requests and replies being printed. The second has a URL that points to the scoring service, passing along the encoded input values as arguments:

http://localhost:9004/score?Petal+length=5.1&Petal+width=1.9&Sepal+length=5.8&Sepal+width=2.789
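The query string in this URL is ordinary form encoding. As a sketch, Python's standard library can build the same URL from a dict of inputs. The `build_score_url` and `fetch_score` helpers are hypothetical names, and `fetch_score` only works while the service started by run.cmd is running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Scoring entry point started by run.cmd (host and port taken from the example URL).
BASE_URL = "http://localhost:9004/score"


def build_score_url(inputs):
    """Encode a dict of input values as query arguments for the score entry point."""
    # urlencode quotes with quote_plus by default, so spaces become '+'.
    return BASE_URL + "?" + urlencode(inputs)


def fetch_score(inputs, timeout=10):
    """Call the running service and return its decoded JSON reply."""
    with urlopen(build_score_url(inputs), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_score_url({"Petal length": 5.1, "Petal width": 1.9,
                       "Sepal length": 5.8, "Sepal width": 2.789})
print(url)  # prints the same URL as the example above
```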

Opening the scoring URL should return a page displaying a JSON object with the scoring results:

{
    "success": {
        "Most Likely Species": "virginica",
        "Prob[setosa]": 1.1654444266441007e-25,
        "Prob[versicolor]": 0.000699496836649252,
        "Prob[virginica]": 0.9993005031633508
    }
}
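A client, whether the browser page or an automated process, typically unpacks this reply. The sketch below parses the reply shown above, written as strict JSON with quoted keys, and cross-checks the reported class against the largest `Prob[...]` value. The helper name `most_likely` and the rule of treating any reply without a `success` key as a failure are assumptions, not part of the example code.

```python
import json
import math

# The reply shown above, written as strict JSON (quoted keys).
SAMPLE_REPLY = """{"success": {"Most Likely Species": "virginica",
  "Prob[setosa]": 1.1654444266441007e-25,
  "Prob[versicolor]": 0.000699496836649252,
  "Prob[virginica]": 0.9993005031633508}}"""


def most_likely(reply_text):
    """Return (reported class, argmax class, its probability) from a scoring reply."""
    reply = json.loads(reply_text)
    if "success" not in reply:
        # Assumption: any reply without a "success" key is a failed call.
        raise ValueError("scoring failed: %r" % (reply,))
    result = reply["success"]
    # Gather the Prob[<class>] entries into {class: probability}.
    probs = {key[len("Prob["):-1]: value for key, value in result.items()
             if key.startswith("Prob[") and key.endswith("]")}
    # Sanity check: class probabilities should sum to 1.
    if not math.isclose(sum(probs.values()), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities do not sum to 1: %r" % (probs,))
    best = max(probs, key=probs.get)
    return result["Most Likely Species"], best, probs[best]


print(most_likely(SAMPLE_REPLY))
```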

Jupyter Notebook

When working in a team, or collaborating with other researchers, you often need to share not just the results of your analysis but also the motivation, methods and assumptions behind them. Showing the code behind a numerical analysis allows your peers to validate your steps and even suggest improvements. Literate programming is an approach that supports this mix of programming logic with natural language.

Jupyter is the most popular open source implementation of the literate programming paradigm. Project Jupyter was born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across not just Python but also other programming languages. From the Jupyter site:

"The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text."

Jupyter was covered in a recent Nature article, which found many scientists who are now "publishing their notebooks alongside papers". The IPython GitHub page keeps a list of examples.

The example in the Notebook directory contains a Jupyter notebook that shows how to use a JMP-generated Python scoring model along with other core Python libraries for data analysis and visualization, all in a literate programming context. First, launch the notebook using the provided run.cmd script.

That should launch your browser pointing to the URL:

http://localhost:8888/notebooks/JMP_scoring_Excel_Bokeh.ipynb
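The notebook pairs the scoring model with data analysis libraries such as pandas. The core idea, applying a per-record scoring function to every row of a table, can be sketched as follows. This is not the notebook's code: `score_iris` is a hypothetical toy stand-in for the JMP-generated model, and the three illustrative inline rows replace a spreadsheet that would normally be loaded with `pd.read_excel` (the file name in the comment is hypothetical).

```python
import pandas as pd


def score_iris(indata):
    """Hypothetical stand-in for the JMP-generated model (toy rule, not the real model)."""
    petal_length = indata["Petal length"]
    if petal_length < 2.5:
        species = "setosa"
    elif petal_length < 4.9:
        species = "versicolor"
    else:
        species = "virginica"
    return {"Most Likely Species": species}


# In a notebook the table might come from a spreadsheet, e.g. pd.read_excel("iris.xlsx");
# a few inline rows keep this sketch self-contained.
df = pd.DataFrame({
    "Sepal length": [5.1, 6.4, 5.8],
    "Sepal width": [3.5, 3.2, 2.7],
    "Petal length": [1.4, 4.5, 5.1],
    "Petal width": [0.2, 1.5, 1.9],
})

# Score each row as a dict, expand the returned dicts into columns, and attach them.
scores = df.apply(lambda row: pd.Series(score_iris(row.to_dict())), axis=1)
scored = df.join(scores)
print(scored)
```

Because the scores come back as ordinary DataFrame columns, they can be summarized or plotted with the same tools used for the input data.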

Follow the document, and optionally re-evaluate the associated code cells, to learn how to:

Spark

Sometimes even the largest server is not enough. When your scalability requirements reach the level reserved for "Big Data" applications, you have to start considering distributed solutions.

In recent years, Apache Spark has gained a lot of traction in that space due to its speed, ease of use and generality. Thanks to its Python interface, PySpark, we can use Spark clusters to execute JMP-generated Python models and score large datasets.
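A PySpark scoring job usually distributes a per-record scoring function across the cluster. The sketch below is illustrative only and is not taken from the Spark directory: it assumes headerless CSV input with the column order in `INPUTS`, uses a hypothetical toy `score_iris` in place of the JMP-generated model, and relies only on long-standing RDD methods (`textFile`, `mapPartitions`, `saveAsTextFile`). The per-partition function is plain Python, so it can be tested without a cluster.

```python
import csv

# Assumed column order of the headerless CSV lines being scored (hypothetical layout).
INPUTS = ("Sepal length", "Sepal width", "Petal length", "Petal width")


def score_iris(indata):
    """Hypothetical stand-in for the JMP-generated model (toy rule, not the real model)."""
    petal_length = indata["Petal length"]
    if petal_length < 2.5:
        return {"Most Likely Species": "setosa"}
    if petal_length < 4.9:
        return {"Most Likely Species": "versicolor"}
    return {"Most Likely Species": "virginica"}


def score_partition(lines):
    """Score an iterator of CSV text lines, yielding one output line per input record.

    mapPartitions hands this function a whole partition, so any one-time setup
    (such as importing the model module) could go before the loop.
    """
    for fields in csv.reader(lines):
        if not fields:  # skip blank lines
            continue
        indata = dict(zip(INPUTS, map(float, fields)))
        outdata = score_iris(indata)
        yield ",".join(fields + [outdata["Most Likely Species"]])


def run_scoring_job(sc, input_path, output_path):
    """Distribute the scoring over a cluster; `sc` is an existing SparkContext."""
    sc.textFile(input_path).mapPartitions(score_partition).saveAsTextFile(output_path)
```

Given a SparkContext `sc`, one would first ship the generated model and jmp_score.py to the workers (for example with `sc.addPyFile`) and then call `run_scoring_job(sc, input_path, output_path)`.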

The example in the Spark directory, which targets Windows and Spark 1.6.1, includes data, models and the following scripts, which implement the steps required to set up and run a Spark application: