JMP and Python: The Next Step
This presentation outlines the new and significant updates to JMP 18's support for Python, which provides a much tighter integration with Python and a significantly greater Python-centric user experience. In JMP 18, you will be able to:
- Directly run Python scripts from a Python-aware script editor window, without any JSL needed as a wrapper.
- Run JSL from Python, while retaining the ability to run Python from JSL.
- Maintain a consistent runtime environment so that user scripts don't require any customizations to run on different users' machines.
- Conduct in-memory transfers of data between JMP and Python.
- Call JMP functionality from within the Python script through a JMP import module.
These JMP capabilities exposed to Python will expand over time, but the current focus is critical functionality and in-memory data support. In this presentation, hear first-hand about the changes and the rationale behind them, and see the new functionality provided to JMP users. Extensibility and data import capability through external Python packages are discussed, and access to GPUs via the Python interface using external packages is demonstrated.
Hello, I'm Paul Nelson. I'm here to talk about JMP, Python, and our next step. After one of the previous presentations last year, we decided we could up our game. JMP's Python integration had room for improvement, and that's what I'll be talking about here.
The biggest challenge that customers had with JMP 14 through 17 was just getting the Python support to work. If it worked, it worked fine. All too often that wasn't the case. The numerous Python distributions and different environments also made testing difficult for us.
Anaconda could be made to work on Windows sometimes, and Python virtual environments could be made to work sometimes. Because everybody had a different Python, a script couldn't even depend on a common Python version in the user's environment.
Editing Python code in the JSL editor was difficult and unpleasant. Python is a language in which blocks of code are delimited by spacing, so having to paste it in as a string in JSL makes it challenging. Insufficient error reporting from Python in the old versions of JMP made it difficult to debug your Python scripts.
Finally, data table and matrix transfers were done through the file system. A data table was transferred as a CSV file, which is slow if you've actually got big data. Big Class works fine, but bigger data is slow.
The vision that we've had is to make sure Python integration just works, with no configuration necessary. To that end, we've embedded Python version 3.11.5 in JMP 18.0, and it is installed as part of JMP. It's a common environment, and all users will have the same version of Python with the same version of JMP.
We've gone further and made sure that data flowing between JSL and Python uses in-memory transfers, with zero-copy access to the memory. We've made it an inviting and productive environment for Python programmers. We now also have a Python script editor with code coloring and syntax highlighting.
You can run pure Python scripts directly from the script editor. From within the Python environment, you can get direct access to JMP data and the log, and you can call JSL. We've greatly enhanced the error reporting to the JMP log and the embedded log.
Because it's a common environment, we can now test it and verify it. Our environment is the same one you should see. We've added an entire Python category in the Scripting Index to explain and show samples of what's now available with the import jmp package and a couple of other utility packages.
The other part of this is that it gives us the ability to extend JMP's capabilities through Python's vast library of packages. We're not attempting to make a one-to-one alternative to JSL. Instead, we're trying to provide a familiar and productive environment for Python programming within JMP.
This is a modernization, a revamp and overhaul. Some of the original design decisions for JMP's Python integration proved limiting. We have made every attempt to minimize the breakage due to the change, while providing a strong and stable foundation for the future.
There's every expectation that the breakage is limited to this transition to JMP 18. The key aspect is that we now only support the Python version that we ship with JMP. Think of it as a private JMP virtual environment.
The other thing that will affect scripts is that we've changed the way Python Send of a data table works. It now creates a jmp.DataTable object. This is live access to the JMP data table, the in-memory table itself: you can edit and modify the existing table. It's no longer a copy.
JMP 14 through 17 created a pandas DataFrame object by copying the data through a temporary CSV file in the file system. For large data, exporting to CSV blows up: the file becomes very large and slow. You also have the potential for loss of precision, because you're going from a binary format to a text format and then from text back into a binary format.
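To make the precision point concrete, here is a minimal sketch in plain Python, not JMP's actual export code. It assumes the text file keeps a fixed number of significant digits (the `digits=6` default is an illustrative assumption; the talk does not say how many digits the old CSV path wrote). The helper name `csv_round_trip` is hypothetical.

```python
import csv
import io


def csv_round_trip(values, digits=6):
    """Write floats as text with `digits` significant digits, then parse them back."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for value in values:
        # Binary double -> decimal text; digits beyond `digits` are discarded here.
        writer.writerow([format(value, f".{digits}g")])
    buffer.seek(0)
    # Decimal text -> binary double again.
    return [float(row[0]) for row in csv.reader(buffer)]


original = [0.1 + 0.2, 1.0 / 3.0, 123456.789012345]
restored = csv_round_trip(original)
for before, after in zip(original, restored):
    status = "exact" if before == after else "changed"
    print(repr(before), "->", repr(after), status)
```

With enough digits (17 significant digits is sufficient for any double) the round trip is exact, but the text file gets larger, which is the other cost mentioned above. An in-memory transfer avoids both issues.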
If you really need the same functionality, we've provided a dt2pandas Python script in the sample files that provides a to_pandas function. This works the way JMP used to do it internally. It creates a temporary CSV file and tells pandas to import the CSV file.
There's also a JMP2pandas sample showing how to create a pandas DataFrame in memory from a JMP data table object. It builds each column in pandas from the corresponding JMP column, one column at a time.
Python Send of a matrix was another one that used to write through the file system. It would write a file of doubles and have NumPy read the file. This now happens through memory internally, and it's a live reference to the matrix instead of a copy.
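The difference between a live reference and a copy can be illustrated with the standard library alone. This is a conceptual sketch, not JMP's implementation: an `array` of doubles stands in for the matrix, and a `memoryview` plays the role of the zero-copy reference.

```python
from array import array

# Stand-in for a matrix of doubles owned by the host application.
matrix = array("d", [1.0, 2.0, 3.0, 4.0])

snapshot = array("d", matrix)   # a copy: separate memory, like the old file-based transfer
live = memoryview(matrix)       # a view: shares the same memory, no copy is made

live[0] = 99.0                  # write through the view

print(matrix[0])    # the original sees the change
print(snapshot[0])  # the copy does not
```

The practical consequence is the same as in the talk: code holding a live reference can modify the original data, so scripts that relied on getting an independent copy need to make that copy explicitly.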
The majority of the JSL Python API remains intact. The changes primarily involved removing or deprecating arguments and functions that were used to configure the original Python support to pick a version of Python.
The only API removal was the Python Get Graphics function, which retrieved an image from Matplotlib. In our Scripting Index, we have a sample that shows exactly how to do the same thing that JMP did internally, with just JSL and Python code. Let's show it already. Let's get to the demo.
Here we have JMP. One of the things I want to show is that you'll see Python icons in the toolbars, and in the File > New menu we have JSL Script, but we now also have New Python Script. We'll bring up the Python script editor.
We can show the embedded log. We have a jmp import package: import jmp. We've made an effort to put in Python docstrings so that you can run help(jmp) from Python. We can run that script. You'll see we've documented the objects and functions within this top-level jmp import package.
There are a bunch of constants. Most of the special directories are represented. So now, from Python, I can do jmp.open, jmp... well, I can't type today... jmp.SAMPLE_DATA plus Big Class. And just like in JSL, we can select the line of code and run it in the script editor. And we have Big Class.
Now, this is not a copy. We can get that object. We could have put dt = jmp.open and we'd have a reference. But we can also go dt = jmp.current(). That'll give us a reference to the current data table.
Well, we can do things like print(dt). That tells us it's Big Class, five columns by 40 rows. But even more fun, you can do dt, name, and use Python slice operators. I'm telling it to give us every other name in the name column, starting with zero.
We index in Python just as Python should: we index starting with zero. Our data table index is column by row, since we are a column-major kind of format. If we print every other name, we get Katie, Jane, Lilly, James, Barbara, Susan, Joe, and David.
This is live access, so we can do things like index dt with either the column name or the column index, and then the row. If we print that, it gives us... I picked row one, right, so Louise. Sorry. Zero, zero is Katie. Since the data table is live, you can assign to it: Katie, but in mixed case rather than all uppercase.
Now we see the output in our embedded log, and you'll notice we've changed it in the data table. This gives you the ability to modify data tables and access data tables in a Python way, and you are able to create data tables from Python. We've put in the effort.
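The steps narrated in this part of the demo can be sketched roughly as follows. The `jmp` calls are written as they are named in the talk (`jmp.open`, `jmp.SAMPLE_DATA`, indexing by column then row); exact spellings and behavior should be checked against the Scripting Index. The `jmp` package is only importable inside JMP 18's embedded Python, so the sketch guards the import and falls back to a plain list elsewhere. The helper names `every_other` and `big_class_demo` are hypothetical.

```python
try:
    import jmp  # only available inside JMP 18's embedded Python
except ImportError:
    jmp = None


def every_other(column):
    """Return every other element, starting at index 0, using a slice."""
    return column[0::2]


def big_class_demo():
    """Mirror the narrated steps; meant to be called inside JMP only."""
    dt = jmp.open(jmp.SAMPLE_DATA + "Big Class.jmp")  # live reference, not a copy
    print(dt)                        # table name and dimensions
    print(every_other(dt["name"]))   # slice on a column, zero-based
    print(dt[0][0])                  # column 0, row 0 (column-by-row indexing)
    dt["name"][0] = "Katie"          # writes through to the open data table
    return dt


if jmp is not None:
    big_class_demo()
else:
    # Outside JMP, show the slice on an ordinary list instead.
    print(every_other(["a", "b", "c", "d", "e"]))
```

Because the reference is live, the assignment in the last step of `big_class_demo` changes the table that is open in JMP, not a private copy in Python.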
We now have the Scripting Index. I will show... here, like I said, Get Graphics is deprecated. It actually is no longer a valid function in JMP, but the sample in the Scripting Index shows you how to go from what you used to have to what we actually did internally within JMP: save the figure, open the image, and create a new window with the image.
One of the nice new features of the Python-centric environment is that Python is a category all to itself in the Scripting Index. The jmp import package has data table objects. There are samples for each of the functions: we can create new columns, add rows, and save the table. There are also the properties on it, such as getting table names. All of that.
This one would create a new data table, Powered by Python. We also have jmputils, which we can use to run jpip. Because JMP has its own isolated environment, when we install packages they need to go into JMP's isolated environment.
Hopefully you should not run into any issues with packages you have in JMP conflicting with other things or other installations of Python. We've kept it isolated. In this case, create_jpip will create a script that wraps the Python pip script to allow you to install packages, or you can actually run it from jmputils from within JMP.
Here we'll show the jpip list command. This will go out, call pip, and return the list of packages that I have installed in my Python environment. In another example here, you can actually install packages, and that will run straight from within JMP.
There are a couple of caveats, which is why create_jpip exists. If a package is loaded in memory, like NumPy, since it's got a shared library, Windows may prevent you from updating that package. There are also some packages that require compilation. Those may have to be done from the command line.
Some of them may work from within JMP, but others may need to be run from the command line. That's why you'll want to create a jpip wrapper script. When that's run, it will pop up and ask for a directory in which to write the file. You select the folder, and it will write out a jpip wrapper script that's runnable from the command line.
There, when you run it, you want to use the --user flag after install. We do that automatically for you when it's called from here. There's also a JSL function that will run the install as well. Python Install Packages is even simpler. You can just run the JSL script, and it will call pip in the background, properly setting up all the environment variables necessary to put the package into JMP's isolated environment.
Let's go on and show you some more of the demos. This is the easy, basic stuff. But part of this whole process was to make JMP more friendly to Python and to make it much easier to extend JMP's capabilities through additional Python packages.
One of our frequent support requests is to be able to import Parquet files. This demo pulls a table from my drive, one that I downloaded from the internet. It then prints out the shape of the Parquet file and builds up a JMP data table column by column from the data in the Parquet file. We'll clear the logs so you can actually see me do it.
I didn't run the script. I must have had something highlighted. There we go. This was a Parquet file. We did import pyarrow.parquet as pq and imported it. Here are the Parquet file's properties: it's a thousand rows. The shape is 13 columns by a thousand rows. The schema prints out the column names.
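A rough sketch of the reading half of this demo follows, using the pyarrow package the talk names. pyarrow is a third-party package that has to be installed into the environment first, so the import is guarded. The helper name `read_parquet_columns` is hypothetical, and the step that turns each list into a JMP column is left out because it depends on the `jmp` package inside JMP.

```python
try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:  # pyarrow is not part of the standard library
    pa = pq = None


def read_parquet_columns(path):
    """Read a Parquet file and return {column name: list of values}."""
    table = pq.read_table(path)
    print(table.shape)    # pyarrow reports (rows, columns)
    print(table.schema)   # column names and types
    # One Python list per column, ready to be copied into a data table
    # column by column.
    return {name: table.column(name).to_pylist() for name in table.column_names}
```

Inside JMP, each returned list could then be assigned to a newly created column, which is the column-by-column build described above.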
Here at the very bottom we can show one of the other things. We're not done yet, by a long shot, with the things we want to improve on the Python side, but until we get there you can run JSL from Python.
The Python code does not yet support things like setting display widths or even changing column types. You can create a new column, but you can't change its type from Python. Those are things that we want to get to, but right now you have a workaround: you can run JSL scripts from within Python, just as you were able to run Python from within JSL.
Now with this enhanced environment, you can call back and forth between the two. Python can run JSL scripts, which can in turn call Python, which can then run back in the JSL environment, so you can nest the calls to achieve things that would otherwise not be possible. That's taking a Parquet file, a data type that JMP doesn't natively handle, and handling it by importing a Python package that supports it.
One of the other things is dear to me. I'm not much of a statistician, but I am a graphics and math person. Here's using Open Computing Language, or OpenCL. We've imported the OpenCL package to create a Mandelbrot fractal. Let me show you the CPU first. Doing it with the CPU will actually take about 16 seconds to run.
This is painful, and you'll notice JMP itself is slow and not responsive, because Python has control at this point. If you put a run JSL Wait deep inside tight loops, you will be able to regain control. This is using the CPU to generate this fractal.
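For readers unfamiliar with the computation, here is a small pure-Python escape-time Mandelbrot sketch. It is a stand-in for illustration only, not the demo script and not an OpenCL kernel. The grid size, bounds, and iteration cap are arbitrary assumptions, and the function names are hypothetical. It shows why the CPU version is slow in Python: every pixel runs its own inner loop.

```python
def escape_count(c, max_iter=50):
    """Count iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_rows(width=60, height=24, max_iter=50):
    """Render the set as text rows: '#' for points that never escaped."""
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 3.0 * i / (width - 1)
            inside = escape_count(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


print("\n".join(mandelbrot_rows()))
```

Each pixel is independent of every other pixel, which is what makes this workload a good fit for a GPU: the per-pixel loop can run on many GPU threads at once instead of one after another on the CPU.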
Now let's change it so that it runs on the GPU instead, and run the script. Boom. That takes a fraction of a second to generate the image, save it out, and open it up. Let's look at the log and see what it says. It took a fraction of a second using the GPU versus… if we go back far enough, it actually took 32 seconds to do it on the CPU while I'm here presenting.
This opens things up: if you want to utilize the GPU with JMP, you're now able to do that through Python. We've tried both PyOpenCL and PyCUDA, so if you have an Nvidia card and you're working with CUDA, you can bring in PyCUDA, or the more generic OpenCL. That further enhances JMP's compute capability, not leaving the power of your desktop unused.
We can also connect to SAS over Python. SAS has a SASPy package, and we've installed that into the… and some of this work is stuff that one of my coworkers put together. Let's clear the logs so we actually see what I've got, and we can run the script.
It's connecting, as it should be. Here we go. This one actually uses JSL to try to make sure all of the Python packages that might be needed to run the script are already installed, and you can set that up. That's part of what you see here in the log, where it was saying those requirements are already satisfied. Here are all of the results that came back from running the script connecting to SAS using the SASPy package, and we'll go back through the log.
The script is echoed to the log, like I said. Here we have the connection information where it's connecting to SAS, running PROC LOGISTIC, then returning the output values, and then the end of the SAS session.
We've heard TensorFlow might be a thing that people are interested in. Here's a really simple one. Again, I've installed TensorFlow, and this is running from JSL, but we could have easily just run this code from Python.
It runs TensorFlow and creates the sum. I'm sure you'll come up with much more complicated uses than I have to demo. PyDigest is another one. Here was a sample I created to show several things, including run JSL, which allows me to pick a directory using JMP's Pick Directory, the JSL Pick Directory, and use that in a Python script. This is all pure Python script. It allows me to pick a directory.
It creates a checksum of every file in the directory, puts that into an SQLite database, and then creates a JMP data table from the database. It gives you a bit of database programming, a bit of creating a JMP data table from Python, and a bit of running JSL from Python.
Directory to pick: I've got lots of data here, so let's just pick the Parquet folder from the presentation and select folder. Now we have a table of checksums. It created a checksum for every file in the directory, and gave me their path names and an MD5 checksum of the contents of each file. It built the database, built the JMP data table from Python, and also used JSL to set the column widths.
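The non-JMP part of this sample, hashing every file in a directory and storing the results in SQLite, can be sketched with the standard library alone. This is an illustrative reconstruction under assumptions, not the demo script itself: the table name `checksums`, its column names, and the function names are all hypothetical. Picking the directory through JSL and building the JMP data table are omitted because they need the `jmp` package.

```python
import hashlib
import sqlite3
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_directory(directory, db_path=":memory:"):
    """Store (path, md5) for every regular file in `directory` in SQLite."""
    connection = sqlite3.connect(db_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS checksums (path TEXT PRIMARY KEY, md5 TEXT)"
    )
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            connection.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?)",
                (str(entry), md5_of_file(entry)),
            )
    connection.commit()
    return connection


def fetch_rows(connection):
    """Read the stored rows back, ordered by path."""
    query = "SELECT path, md5 FROM checksums ORDER BY path"
    return connection.execute(query).fetchall()
```

Calling `fetch_rows(checksum_directory(some_folder))` returns a list of `(path, md5)` tuples. Inside JMP, those two sequences would become the two columns of the checksum table shown in the demo.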
Again, that's something we've not implemented yet, but it is on the list to look at. In the meantime, you can do JSL things from your Python if you choose to.
Some people might want to use R. In this case, I have created a connection using Rserve on my machine. In our environment, it was just easier to run Rserve from within a Windows Subsystem for Linux (WSL) environment.
I have an Rserve daemon running over here, and I'm using the pyRserve client. I'm doing just a small thing: creating an R vector on the server and summing the result. Let's run this script. We've connected to Rserve, we get the vector, and we summed all the elements to get six. There's also rpy2, and we have some examples of that. If you want something really simple that will work from the command line, there's a sample on how to use subprocess to shell out and run R from the command line.
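The subprocess approach mentioned last can be sketched like this. It is a minimal illustration under assumptions, not the shipped sample: it assumes an `Rscript` executable is on the PATH (and returns `None` when it is not), the vector 1, 2, 3 is chosen only because it sums to six as in the demo, and the function name `r_sum` is hypothetical.

```python
import shutil
import subprocess


def r_sum(values):
    """Sum numbers by shelling out to Rscript; returns None if R is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    # Build a one-line R expression such as: cat(sum(c(1.0, 2.0, 3.0)))
    numbers = ", ".join(str(float(v)) for v in values)
    expression = "cat(sum(c({})))".format(numbers)
    result = subprocess.run(
        [rscript, "-e", expression],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


print(r_sum([1, 2, 3]))
```

Shelling out like this starts a new R process for every call, so it is the simplest option rather than the fastest. A persistent connection such as the Rserve client in the demo avoids that startup cost.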
Finally, the statistics one, where Jon Saul helped create a sample that uses the HDBSCAN Python package. One thing to note with Matplotlib: Matplotlib on Windows does not have a graphic backend, so if you want to see any output, we found that PyQt5 works best as the backend.
On the Mac, if you use Matplotlib, we will force it to be Agg, the non-graphic backend, if you haven't installed one of PyQt5 or PySide. That's because Matplotlib on the Mac automatically goes for the Cocoa backend, and there is a bug where closing the last window in Matplotlib on the Mac will cause JMP to exit.
To protect against that, if we find Matplotlib on your system and there's a Qt backend, we will load Matplotlib and force it to use the Qt backend. Otherwise, we make sure it's a non-graphic backend. I have PyQt5 on my machine. With this sample we run some clustering with HDBSCAN.
Now let's show the embedded log and run the script. What we get is a figure generated in Matplotlib and a table of the data. Here you can see we did a jmp.open and took the dt reference. Those are the samples I have, I believe.
I'm going to bring back the presentation. Our priorities for the near future are continuing refinement and improvement, and increasing the data interoperability between JMP and Python. Right now, the limitation within column modification and editing is that you can create columns of any type, but you can only modify and get values for numeric and character columns. Expression columns, row state columns, and creation of linked tables are all to be done in the future.
While the Python script editor supports code coloring and syntax highlighting, it doesn't yet support automatic indentation. We will be looking into that as well.
The goals for modernization were to make JMP's Python integration just work out of the box. You should be able to be productive with JMP and Python without any configuration. Before, if it didn't work, it was very challenging. We want to reduce the time for Python programmers to be productive within JMP, and improve JMP's ability to be an analytic hub through the easy expansion of JMP using external Python packages.
We also want to improve our ability to load different data formats that other people have spent time writing support for, like the Parquet format, or HDF5, which we've tested as well. There are lots of formats out there, more than any one development team has time to write internally, but this way you can pull in the Python package, make a JMP data table out of that data, and have that power within JMP.
I appreciate your time. I hope you have enjoyed what we've done, and I'm very interested in feedback. Thanks.