Over the holiday break, I had time to tinker with Python. The problem we have run into with parallel processing from the Python/JSL interface is that Python typically launches multiple copies of JMP instead of multiple Python workers. The reason is that Python is embedded in JMP, so sys.executable points to JMP rather than to Python. I discovered that the multiprocessing package lets you set the executable it launches when running parallel code. That was the missing piece of the puzzle.
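As a minimal sketch of that idea outside JMP, assuming a regular Python 3 install: multiprocessing.set_executable() tells the spawn start method which interpreter to launch for worker processes. The helper name point_workers_at is my own, for illustration only. In a stand-alone interpreter sys.executable is already Python, so the call changes nothing there; the embedded case is where it matters.

```python
import multiprocessing
import os
import sys
from multiprocessing import spawn


def point_workers_at(python_path):
    """Tell multiprocessing which interpreter to launch for spawned workers.

    Hypothetical helper for illustration; python_path must be a real
    Python interpreter (python3, python.exe or pythonw.exe).
    """
    multiprocessing.set_executable(python_path)
    # spawn.get_executable() reports what workers will be launched with.
    return os.fsdecode(spawn.get_executable())


if __name__ == "__main__":
    # In an embedded host such as JMP, sys.executable would be the host
    # application, so a real interpreter path has to be passed instead.
    print(point_workers_at(sys.executable))
```

The setting only affects start methods that launch a fresh interpreter, which is the case on Windows and, by default, on recent macOS Python builds.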
I have tested the JSL script below on JMP 15.2.1 on both Mac and Windows with Python 3.9. On my Windows machine I had to use the Path() argument to Python Init() because I have multiple versions of Python installed and want to ensure the correct one is picked. On my Mac, Python Init() without arguments works fine for my configuration. I ran into one issue on Windows: I had to hardcode the path to my pythonw.exe. You will need to change that to match your Python location; search for FixMe in the script.
Other than the Python Init() parameters and the path to pythonw.exe on Windows, this should run without a problem on JMP 15.2.1 and newer on either Mac or Windows. It may run on older versions of JMP that have Python support, but that has not been tested.
You will need to create a defs.py file in the same directory as this script. The multiprocessing package requires that the code to run in parallel come from an importable module, not from functions defined locally in the script. So most of the work in this script was making sure sys.path is set up properly so that defs.py can simply be imported by the script.
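The import requirement can be sketched on its own. This is an illustration under my own assumptions, not part of the script: make_importable and load_worker_module are hypothetical helpers, and they prepend the absolute directory rather than '.' so the import does not depend on the current working directory.

```python
import importlib
import os
import sys


def make_importable(directory):
    """Prepend directory to sys.path (only once) so its modules can be imported."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


def load_worker_module(directory, module_name="defs"):
    """Import module_name from directory.

    Raises ModuleNotFoundError if the file is not there, which is the
    failure the script guards against with its try/except block.
    """
    make_importable(directory)
    # Make sure files created after interpreter start-up are seen.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Calling load_worker_module(script_path) would then return the module whose f can be handed to Pool.map.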
The entire contents of defs.py are included in the script's header comment, making it easy to paste into a new file. They are duplicated here:
def f(x):
    return x*x
The script takes the list of numbers [1,2,3] and calls f(x) on each value in parallel to square it, giving the results [1,4,9].
pyParallel.jsl
Names Default To Here( 1 );
/* Description: Python multiprocessing from JMP JSL
* File: pyParallel.jsl, and defs.py
* Author: Paul Nelson
* JMP Statistical Discovery, LLC
*
* Python multiprocessing from an embedded context adds additional challenges.
* Simply running Python parallel processing modules typically launches
* multiple copies of JMP instead of executing the Python code in parallel.
*
* The multiprocessing module allows setting the executable to launch. Because
* Python is loaded into JMP (the embedded case), sys.executable => JMP, not Python.
*
*
* Note: Functions to be run in parallel must come from importable code, not local
* code definitions.
* A multiprocessing 'Feature', not bug, See: https://bugs.python.org/issue25053
*
* Known Bugs:
* On Windows within JMP, sys.exec_prefix is empty, so path to pythonw.exe has to be set explicitly
*
* Create a defs.py file with the contents below. Place in the same directory as this JSL file.
def f(x):
    return x*x
*/
// On my windows machine since I have multiple versions of Python, I have to specify the
// full path to the python39.dll, on my Mac it just works.
If( Host Is("Windows"),
	Python Init(Path("C:\\Users\\panels\\AppData\\Local\\Programs\\Python\\Python39\\python39.dll")),
	Python Init()
);
// send the current working directory where we loaded this script, down to Python so we can find defs.py
script_path = Get Default Directory();
Python Send(script_path);
Python Submit("\[
import sys
import os
import multiprocessing
from multiprocessing import Pool
import platform
# The loaded Python does not have the current working directory initialized, so change it
# to the directory of the script, as sent in from JSL
print('script_path: ' + script_path)
hostPlatform = platform.system()
print(hostPlatform)
try:
    if 'Windows' == hostPlatform:
        # strip leading / off from path if it exists
        if '/' == script_path[0]:
            script_path = script_path[1:]
        norm_path = os.path.normpath(script_path)
        print(norm_path)
        os.chdir(norm_path)
    else:
        os.chdir(script_path)
except:
    print("Unable to change current working directory", sys.exc_info())
finally:
    print(os.getcwd())
# if '.' is not at the beginning of sys.path, prepend so we can look for modules within CWD
if sys.path[0] != '.':
    sys.path.insert(0, '.')
print( sys.path )
# here argv is [''] empty string, doesn't seem to bother us in this use case.
#print( 'argv: ', sys.argv )
# prints the path to JMP, not Python
print('sys.executable: ' + sys.executable)
# build up path prefix to python3 executable
if 'Darwin' == hostPlatform:
    pyPath = os.path.join(sys.exec_prefix, 'bin', 'python3')
elif 'Windows' == hostPlatform:
    # Bug - on windows sys.exec_prefix is empty in Python within JMP!
    pyPath = None
    if sys.exec_prefix:
        pyPath = os.path.join(sys.exec_prefix, 'pythonw.exe')
    else:
        # FixMe: hardcode for now
        pyPath = 'C:\\Users\\panels\\AppData\\Local\\Programs\\Python\\Python39\\pythonw.exe'
print('Python Executable: ' + pyPath)
# Finally down to running something in parallel.
#
# Import the function code that will be executed in parallel.
# JMP is not reporting ImportError or ModuleNotFound errors, the scripting just returns -1
# on an import failure, so wrap it within our own try: except: block
try:
    import defs as d
    # print( d.f(2) )
    # Tell multiprocessing to use the Python executable, not JMP, to spawn the workers.
    multiprocessing.set_executable(pyPath)
    # Run the tasks in parallel - (separate Python instances)
    with Pool(5) as p:
        print(p.map(d.f, [1,2,3])) # Results in [1,4,9] the square of the numbers
except ModuleNotFoundError as mnf:
    print(mnf)
    print('Check that the module is located on your sys.path:')
    print( sys.path )
except ImportError as error:
    print(error)
except Exception as ex:
    print(ex)
]\");
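The interpreter lookup in the script can also be pulled into a small function, which makes the Windows fallback explicit instead of burying it in a FixMe. This is only a sketch under my own assumptions: find_python_executable and its fallback parameter are hypothetical names, any non-Windows platform is treated like the Mac layout, and the fallback is still a path you have to supply yourself.

```python
import os
import platform
import sys


def find_python_executable(system=None, exec_prefix=None, fallback=None):
    """Return the interpreter path to hand to multiprocessing.set_executable().

    system and exec_prefix default to the running interpreter's values;
    they are parameters only so the logic can be exercised without JMP.
    """
    system = platform.system() if system is None else system
    exec_prefix = sys.exec_prefix if exec_prefix is None else exec_prefix

    if exec_prefix:
        if system == "Windows":
            return os.path.join(exec_prefix, "pythonw.exe")
        # Assumption: other platforms follow the <prefix>/bin/python3 layout.
        return os.path.join(exec_prefix, "bin", "python3")

    # sys.exec_prefix was empty (seen on Windows inside JMP): use the
    # caller-supplied path, or fail loudly rather than pass None along.
    if fallback:
        return fallback
    raise RuntimeError("exec_prefix is empty and no fallback path was given")
```

Failing with an exception when no path can be determined keeps a missing interpreter from surfacing later as a confusing error inside Pool.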