Python parallel processing from JMP JSL


Over the holiday break, I had time to tinker with Python. The issue we have run into with parallel processing from the Python/JSL interface is that Python typically launches multiple copies of JMP instead of multiple copies of Python. The reason is that Python is embedded in JMP, so the executable is JMP rather than Python. I discovered that the multiprocessing package allows setting the executable to launch when running the parallel code. This is the missing piece of the puzzle.

I have tested the script below on JMP 15.2.1 on both Mac and Windows with Python 3.9. On my Windows machine I had to use the Path() argument to Python Init(), because I have multiple versions of Python installed and want to ensure the correct one is picked. On my Mac, Python Init() without arguments works fine for my configuration. I ran into one issue on Windows: I had to hardcode the path to my pythonw.exe. You will need to change that to match your Python location; search the script for FixMe.

Other than the Python Init() parameters and the path to pythonw.exe on Windows, this should run without a problem on JMP 15.2.1 and newer on either Mac or Windows. It may run on older versions of JMP that have Python support, but that has not been tested.

You will need to create a defs.py file in the same directory as this script. The multiprocessing package requires that the code to run in parallel come from importable code, not from definitions local to the script. Most of the work in this script therefore goes into making sure sys.path is set up properly so that the script can simply import defs.py.

The entire contents of the defs.py file are in the script's header comment, making it easy to paste into a new file. They are duplicated here:

def f(x):
   return x*x

The script takes a list of numbers [1,2,3] and calls f(x) in parallel to square each value, giving the results [1,4,9].
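
To see why the function has to live in an importable module: multiprocessing hands work to the workers by pickling it, and pickle stores a function as a reference (module name plus function name) rather than as code. The sketch below is a hypothetical standalone check, not part of pyParallel.jsl. It writes a throwaway defs.py into a temporary directory (an assumption made only so the example is self-contained), puts that directory on sys.path, and inspects the pickled reference.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Create a throwaway defs.py (same contents as above) in a temporary directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "defs.py"), "w") as fh:
    fh.write("def f(x):\n    return x*x\n")

# Make the directory importable, as pyParallel.jsl does for the script directory.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import defs

# pickle records only where to find f (the names "defs" and "f"), not its body,
# so a worker process must be able to import defs on its own.
payload = pickle.dumps(defs.f)
print(b"defs" in payload)

# Unpickling looks the function up by name and returns the same object.
restored = pickle.loads(payload)
print(restored is defs.f, restored(3))
```

A function defined only inside the text passed to Python Submit() has no module that a freshly launched Python worker could import, which is why defs.py is needed.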

pyParallel.jsl

Names Default To Here( 1 );

/* Description: Python multiprocessing from JMP JSL
 * File:   pyParallel.jsl, and defs.py
 * Author: Paul Nelson 
 * JMP Statistical Discovery, LLC
 * 
 * Python multiprocessing from an embedded context adds additional challenges.
 * Just attempting to run Python parallel processing modules, typically launches
 * multiple copies of JMP, not executing the Python code in parallel.
 *
 * The multiprocessing module allows setting the executable to launch, since we 
 * are loading Python into JMP, it's an embedded case thus sys.executable => JMP, not Python. 
 *
 * 
 * Note: Functions to be run in parallel must come from importable code, not local
 * code definitions. 
 * A multiprocessing 'Feature', not bug, See: https://bugs.python.org/issue25053
 *
 * Known Bugs:
 *   On Windows within JMP, sys.exec_prefix is empty, so path to pythonw.exe has to be set explicitly
 *
 * Create a defs.py file with the contents below.  Place in the same directory as this JSL file.
 def f(x):
   return x*x
 */
 
 // On my windows machine since I have multiple versions of Python, I have to specify the 
 // full path to the python39.dll, on my Mac it just works. 
If( Host Is("Windows"), 
	Python Init(Path("C:\\Users\\panels\\AppData\\Local\\Programs\\Python\\Python39\\python39.dll")),
	Python Init()
);

// send the current working directory where we loaded this script, down to Python so we can find defs.py
script_path = Get Default Directory();
Python Send(script_path);

Python Submit("\[

import sys
import os
import multiprocessing
from multiprocessing import Pool
import platform

# The loaded Python does not have the current working directory initialized, so change it 
# to the directory of the script, as sent in from JSL
print('script_path: ' + script_path)
hostPlatform = platform.system()
print(hostPlatform)
try:
	if 'Windows' == hostPlatform:
		# strip leading / off from path if it exists
		if '/' == script_path[0]:
			script_path = script_path[1:]
		norm_path = os.path.normpath(script_path)
		print(norm_path)
		os.chdir(norm_path)
	else:
		os.chdir(script_path)
except:
	print("Unable to change current working directory", sys.exc_info())
finally:
	print(os.getcwd())

# if '.' is not at the beginning of sys.path, prepend so we can look for modules within CWD
if sys.path[0] != '.':
	sys.path.insert(0, '.')
	print( sys.path )
	
# here argv is [''] empty string, doesn't seem to bother us in this use case.
#print( 'argv: ', sys.argv )

# prints the path to JMP, not Python
print('sys.executable: ' + sys.executable)

# build up path prefix to python3 executable
if 'Darwin' == hostPlatform:
	pyPath = os.path.join(sys.exec_prefix, 'bin', 'python3')
elif 'Windows' == hostPlatform:
	# Bug - on windows sys.exec_prefix is empty in Python within JMP!
	pyPath = None
	if sys.exec_prefix:
		pyPath = os.path.join(sys.exec_prefix, 'pythonw.exe')
	else:
		# FixMe: hardcode for now
		pyPath = 'C:\\Users\\panels\\AppData\\Local\\Programs\\Python\\Python39\\pythonw.exe'
		
print ('Python Executable: ' + pyPath)

# Finally down to running something in parallel.  
#
# Import the function code that will be executed in parallel.
# JMP is not reporting ImportError or ModuleNotFound errors, the scripting just returns -1
# on an import failure, so wrap it within our own try: except: block
try:
	import defs as d

	# print( d.f(2) )
	# Tell multiprocessing to use the Python executable, not JMP to spawn the workers.
	multiprocessing.set_executable(pyPath)
	
	# Run the tasks in parallel - (separate Python instances) 
	with Pool(5) as p:
		print(p.map(d.f, [1,2,3]))    # Results in [1,4,9] the square of the numbers

except ModuleNotFoundError as mnf:
	print(mnf)
	print('Check that the module is located on your sys.path:')
	print( sys.path )
except ImportError as error:
	print(error)
except Exception as ex:
	print(ex)

]\");

matteo_patelmo · ‎03-22-2024

Hello @Paul_Nelson , how can the same result (running a python script or function in background) be achieved in JMP 18, using the embedded Python?

thanks!
Matteo

Paul_Nelson · ‎03-22-2024

Parallel processing with the same defs.py as above is a lot easier with JMP 18's Python integration. JMP knows the location of the Python executable and provides it as a 'constant' in the jmp package, and JMP now adds the script's directory to the package search path so that local files can be imported.

# test_multiprocessing.py
# Author: Paul R. Nelson
# JMP Statistical Discovery LLC
#
# Description:  Demonstrate using Python's parallel processing capability.
# The file defs.py contains the code that will execute in parallel.
#
# Contents of defs.py:
#def f(x):
#    return x*x
#
import sys  # needed for the sys.path diagnostic in the except block below
import jmp
import multiprocessing
from multiprocessing import Pool

# jmp.PYTHON_EXE is the file path to JMP's Python executable.
print(jmp.PYTHON_EXE)

try:
	# defs.py is the code to be run in parallel, located in same directory as this file.
	import defs as d
	
	# tell multiprocessing to use Python executable, not spawn JMP for the workers
	multiprocessing.set_executable(jmp.PYTHON_EXE)
	
	# square each number in parallel - separate Python instances
	with Pool(5) as p:
		print(p.map(d.f, [1,2,3]))		# result is the square of the numbers
		
except ModuleNotFoundError as mnf:
	print(mnf)
	print('Check that the module is located on your sys.path:')
	print( sys.path )
except ImportError as error:
	print(error)
except Exception as ex:
	print(ex)

matteo_patelmo · ‎03-25-2024

Thanks for the quick response. Your code works fine as is, but I am having trouble extending it to my case of interest. I am trying to run the cmdstanpy sample method (Bayesian MCMC) from JMP. In its basic form it works fine, but it opens one cmd.exe terminal per MCMC chain and keeps them open until the algorithm is finished (which can take quite a long time), allowing no interaction and displaying no log until the end.

This is the basic code:

import cmdstanpy as cs

model = cs.CmdStanModel(stan_file=stan_file)

fit = model.sample(data=datafile, show_progress=True, show_console=True)
fit_table = fit.draws_pd(vars=genVariables+modelParams)

I want to let it run in the background and display log information while the algorithm runs. I tried this adaptation of your suggestion:

def sample_chain(chain_id):
    return model.sample(data=datafile, chains=1, parallel_chains=1, show_progress=True, show_console=True)

try:
	multiprocessing.set_executable(jmp.PYTHON_EXE)


	with multiprocessing.Pool(4) as pool:
		results = pool.map(sample_chain, range(4))

It does open 4 cmd.exe windows, but it never finishes; moreover, it still does not yield control back to the main program while the 4 windows are open. Any further suggestions?

thanks
Matteo

Paul_Nelson · ‎03-29-2024

I'm not familiar with the package, but I wonder if the parameters show_progress=True or show_console=True could be causing the issue. I would also make sure that the code running in parallel has exception handlers that exit if a failure occurs. It may not be returning because the windows are still open. Try adding an exit to the end of the parallel script (not the one within JMP).
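
A minimal sketch of the exception-handling idea, assuming the worker code lives in an importable module (here a hypothetical safe_defs.py, not part of the original scripts). The names safe_call and summarize and the (ok, value) return shape are illustrative choices, not anything required by multiprocessing.

```python
# safe_defs.py (hypothetical name): worker code with its own exception handling
import traceback


def f(x):
    return x * x


def safe_call(x):
    """Run f(x) and always return, so a failure comes back as data, not an exception."""
    try:
        return (True, f(x))
    except Exception:
        # Send the traceback back as text instead of raising inside the worker.
        return (False, traceback.format_exc())


def summarize(results):
    """Split (ok, value) pairs into successful values and error messages."""
    values = [value for ok, value in results if ok]
    errors = [value for ok, value in results if not ok]
    return values, errors
```

On the JMP side this would be used as p.map(safe_defs.safe_call, [1,2,3]) followed by summarize() on the returned list, so every task reports either a result or the text of its traceback.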

Since JMP loads the Python shared library into its process space, Python has control over the event loop instead of JMP while it is running. It is possible to hand control back to JMP periodically by calling jmp.run_jsl('Wait(0);'), for example within a loop. This briefly returns control to JMP so it can run its event loop. You may be able to experiment with that in the code you run from JMP as well. I am not sure exactly where, since yours isn't a loop but a parallel launch, and the launched parallel code doesn't have any access to the import jmp functionality. I'll play with your code a bit to see if anything stands out.
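
A minimal sketch of the Wait(0) idea for loop-style work, assuming it runs in JMP 18's embedded Python. The helper names yield_to_jmp and process_in_chunks and the chunk size are hypothetical; the try/except around import jmp is only there so the sketch can also be exercised outside JMP, where it skips the hand-off.

```python
try:
    import jmp  # only available inside JMP's embedded Python
except ImportError:
    jmp = None  # lets the sketch run outside JMP for testing


def yield_to_jmp():
    """Hand control back to JMP briefly so its event loop can run."""
    if jmp is not None:
        jmp.run_jsl('Wait(0);')


def process_in_chunks(items, work, chunk_size=10):
    """Apply work() to each item, yielding to JMP after every chunk."""
    results = []
    for start in range(0, len(items), chunk_size):
        for item in items[start:start + chunk_size]:
            results.append(work(item))
        yield_to_jmp()
    return results


print(process_in_chunks(list(range(5)), lambda x: x * x, chunk_size=2))
```

This only helps when the long-running work can be split into pieces inside the JMP-side script; a single blocking call such as a pool.map() gives no opportunity to call Wait(0) until it returns.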

Paul_Nelson · ‎03-29-2024

I find starting small and simple is the best way to debug something. Before trying to 'run' by jumping into multi-threading, 'walk' by running cmdstanpy from within JMP.

I tried the following

import os

from cmdstanpy import CmdStanModel

stan_file = os.path.join('/Users/_me_/Desktop', 'bernoulli.stan')
print(stan_file)

model = CmdStanModel(stan_file=stan_file)

print(model)
print(model.exe_info())

The results:

/*:
//:*/
import os

from cmdstanpy import CmdStanModel

stan_file = os.path.join('/Users/_me_/Desktop', 'bernoulli.stan')
print(stan_file)

model = CmdStanModel(stan_file=stan_file)

print(model)
print(model.exe_info())

/*:

/Users/_me_/Desktop/bernoulli.stan


19:01:04 - cmdstanpy - INFO - No CmdStan installation found.

19:01:04 - cmdstanpy - INFO - Cannot determine whether version is before 2.27.

19:01:04 - cmdstanpy - INFO - No CmdStan installation found.

19:01:04 - cmdstanpy - INFO - Cannot determine whether version is before 2.27.

Traceback (most recent call last):
  File "<string>", line 8, in <module>
  File "/Users/_me_/Library/Application Support/JMP/Python/3.11/lib/python/site-packages/cmdstanpy/model.py", line 212, in __init__
    model_info = self.src_info()
                 ^^^^^^^^^^^^^^^
  File "/Users/_me_/Library/Application Support/JMP/Python/3.11/lib/python/site-packages/cmdstanpy/model.py", line 318, in src_info
    return compilation.src_info(str(self.stan_file), self._compiler_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/_me_/Library/Application Support/JMP/Python/3.11/lib/python/site-packages/cmdstanpy/compilation.py", line 354, in src_info
    [os.path.join(cmdstan_path(), 'bin', 'stanc' + EXTENSION)]
                  ^^^^^^^^^^^^^^
  File "/Users/_me_/Library/Application Support/JMP/Python/3.11/lib/python/site-packages/cmdstanpy/utils/cmdstan.py", line 170, in cmdstan_path
    raise ValueError(
ValueError: No CmdStan installation found, run command "install_cmdstan"or (re)activate your conda environment!

It helps to understand what even this example is asking JMP to do. CmdStan is a command line program, and the cmdstanpy package is a Python wrapper around it. So it appears to do something like a Python subprocess call: it calls out to the OS to launch the command line program and monitors the shell output. Packages installed by JMP's jpip wrapper wind up in JMP's private site-packages directory. We are not running a Conda init, as this is not a Conda distribution, and install_cmdstan may or may not configure the environment sufficiently to run in the JMP virtual environment.

Typically, Conda or venv activation sets up environment variables for package paths and directory paths, and install_cmdstan likely does the same thing. However, all of those assume that the shell environment just configured will also be used to run the Python executable that imports cmdstanpy. Those assumptions do not hold here: JMP has to set up those paths itself in order to isolate the JMP package environment from other Python instances on your system. So while JMP can import and run the cmdstanpy package, the package itself is looking for the CmdStan command line executable and not locating it. Adding multiprocessing on top is in effect asking JMP's Python to launch copies of jmp.PYTHON_EXE, each running a Python script that in turn attempts to launch a subprocess to run the command line program doing the task...
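
One way to check the "not locating it" part is to look for a CmdStan directory directly and point cmdstanpy at it. This is a rough sketch, not something tested inside JMP. It assumes CmdStan may be found through the CMDSTAN environment variable or in versioned folders under ~/.cmdstan (both assumptions about a typical install), and it relies on the bin/stanc layout visible in the traceback above. The helper names are hypothetical; cmdstanpy.set_cmdstan_path() is the package's call for setting the location explicitly.

```python
import os
import platform

# stanc lives in <cmdstan>/bin, as the traceback above shows.
EXTENSION = '.exe' if platform.system() == 'Windows' else ''


def has_stanc(path):
    """True if path looks like a CmdStan directory (contains bin/stanc)."""
    return os.path.isfile(os.path.join(path, 'bin', 'stanc' + EXTENSION))


def find_cmdstan(candidates):
    """Return the first candidate directory that contains bin/stanc, else None."""
    for path in candidates:
        if path and has_stanc(path):
            return path
    return None


def default_candidates():
    """Assumed locations: CMDSTAN variable, then versioned folders under ~/.cmdstan."""
    candidates = [os.environ.get('CMDSTAN')]
    root = os.path.join(os.path.expanduser('~'), '.cmdstan')
    if os.path.isdir(root):
        candidates += sorted(
            (os.path.join(root, name) for name in os.listdir(root)),
            reverse=True,
        )
    return candidates


found = find_cmdstan(default_candidates())
print('CmdStan found at:', found)
if found:
    try:
        import cmdstanpy
        cmdstanpy.set_cmdstan_path(found)
    except (ImportError, ValueError) as err:
        print(err)
```

If nothing is found, CmdStan itself still has to be installed somewhere JMP's Python can reach before CmdStanModel() can compile anything.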

My original examples presented the use case of creating a pure Python function that would be run in parallel by launching multiple Python instances. What you are attempting to run adds the complication of a Python wrapper around a command line executable, plus issues with nested shell environments.

I don't think the use case you are attempting is supportable in its current form. Before trying it in JMP, you should try the same multiprocessing outside of JMP. If you can get it to work there, you may have success within JMP. (Just replace jmp.PYTHON_EXE with the full path to python.exe and don't import jmp.)

I know a parallel script of basic pure Python works. I believe even imported Python packages that do not depend on shared library imports should work in the parallel scripts... (but I have not tested that). It may require passing the JMP environment's site.USER_BASE through to the parallel jmp.PYTHON_EXE and properly setting its environment to match the same directory environment as JMP itself is setting.
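
A rough, untested-in-JMP sketch of the "pass site.USER_BASE through" idea. It assumes that child processes inherit the parent's environment and that the launched Python honors the standard PYTHONUSERBASE environment variable; whether that is sufficient to reproduce JMP's package environment in the workers is an open question. The helper name export_user_base is hypothetical.

```python
import os
import site


def export_user_base(environ=os.environ):
    """Copy this interpreter's user base into PYTHONUSERBASE for child processes."""
    user_base = site.getuserbase()
    environ['PYTHONUSERBASE'] = user_base
    return user_base


print('PYTHONUSERBASE set to:', export_user_base())
```

The call would go before multiprocessing.set_executable() and the Pool is created, so that the workers start with the variable already set.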

As you explore, please post your discoveries back to the community.

JMP is limited by the fact that Python is running on JMP's main event loop. One of the possible solutions for JMP and multi-threading Python is launching independent Python sub-interpreters. That requires Python 3.13, assuming they get all the kinks out of removing Python's Global Interpreter Lock (GIL). I am researching that, but Python 3.13 will not be in JMP 18. JMP 18 will remain with Python 3.11.x throughout its lifecycle. Depending on the timelines and stability of Python 3.13, it is possible such a capability could make its way into JMP 19 or 20.

matteo_patelmo · ‎03-30-2024

Thanks for the explanations. I’ll study further and get back in case of new findings.

matteo
