I have some Python code that generates about 3 GB of data over 15 minutes. It currently streams the data to a text file, writing one value1|value2|value3... line per "column", and then post-processes the whole text file into a CSV that JMP can open. After the values are split, the CSV can have up to ~30k columns and 100 to 10k rows.
To get the data into JMP faster, I would like to stream the text data directly from the memory of the Python session into a JMP session.
I have tried some OLE automation using pycom, but the calls to pycom seem very slow, which makes it hard to iterate over each line and update the correct row for the couple of values produced in that "line".
Is there another way? A SQLite in-memory table? A JSL socket? The JMP Python API?
I think you could move IEEE floating point data or integers between Python and JMP using the struct module ("Interpret bytes as packed binary data", Python 3.5.0 documentation) on the Python end and blobToMatrix on the JMP end. Skipping the conversion from binary to character and back to binary will speed it up a lot. Use a file or a socket between them. Are they on the same machine? That will make it a little easier. Not the same machine? Then the data might not be IEEE floating point... Character data would need different handling, but should be easier than numeric.
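To make the byte-order point concrete, here is a minimal sketch of what struct produces for a few doubles. The sample values are only an illustration; the format characters (">", "<", "d") are the standard ones from the struct module.

```python
import struct

values = [0.0, 1.0, 1.4142135623730951]  # illustrative sample data

# ">" = big endian, "<" = little endian, "d" = 8-byte IEEE 754 double
big = struct.pack(">%dd" % len(values), *values)
little = struct.pack("<%dd" % len(values), *values)

# Each double occupies 8 bytes regardless of byte order
assert len(big) == len(little) == 8 * len(values)

# Same number, but the 8 bytes of each value are stored in reverse order
assert big[8:16] == little[8:16][::-1]

# Unpacking with the matching format recovers the original values exactly
restored = struct.unpack(">%dd" % len(values), big)
```

Whichever byte order the Python side picks, the reading side has to be told the same one, which is why the format prefix and the blobToMatrix argument need to agree.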
Here's a complete example that runs in about 2 seconds for 1,000,000 doubles.
The Python code builds an array of doubles, makes a binary string of bytes, and writes the bytes to stdout. You could use a file if you like; just make sure it is opened in binary mode (wb). The ">" at the beginning of the pack format means "big endian".
"generate.py"
import array
import struct
import math
import sys

# windows python messes up binary newlines...unless...
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

bigmat = array.array("d")
for x in range(0, 1000000):
    bigmat.append(math.sqrt(x))

# %s is replaced by "1000000" ... d is for double precision, 8 bytes each
binary = struct.pack(">%sd" % len(bigmat), *bigmat)
# print(len(binary))  # 8000000 bytes

# Python 3 only accepts bytes on sys.stdout.buffer; Python 2 writes them to sys.stdout directly
getattr(sys.stdout, "buffer", sys.stdout).write(binary)
The JSL uses runProgram to run the Python program and reads a blob back from its stdout. BlobToMatrix specifies "big" to match the big endian data. You could use LoadTextFile( ...BLOB ) instead of runProgram if you make the Python program run separately and write a file. You'd still need blobToMatrix.
"fetch.jsl"
x = runprogram(executable("C:\Python27\python.exe"),
options("C:\Users\User\Desktop\pythonExample\generate.py"),
readfunction("blob"));
xx = blobToMatrix(x,"float",8,"big");
// verify...
ok="good";
for(i=0,i<nrows(xx),i++,
    if( xx[i+1] != sqrt(i), ok="bad")
);
show(ok);
// the log shows:
// ok = "good";
http://stackoverflow.com/questions/2374427/python-2-x-write-binary-output-to-stdout had the answer for the binary newline issue.
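The file-based variant mentioned above could be sketched roughly as follows, appending each row as it is produced so nothing large has to be held in memory. The file name, the tiny row and column counts, and the append_row helper are all hypothetical, chosen only for illustration.

```python
import math
import os
import struct
import tempfile


def append_row(handle, row):
    """Pack one row of floats as big-endian doubles and append it to the open file."""
    handle.write(struct.pack(">%dd" % len(row), *row))


# Hypothetical output location; a real script would use a path JMP can read
out_path = os.path.join(tempfile.gettempdir(), "stream_demo.bin")

n_rows, n_cols = 4, 3  # tiny sizes, just for the demonstration
with open(out_path, "wb") as handle:  # binary mode, so no newline translation
    for r in range(n_rows):
        row = [math.sqrt(r * n_cols + c) for c in range(n_cols)]
        append_row(handle, row)

# Read the file back to check that the bytes decode to the same numbers
with open(out_path, "rb") as handle:
    raw = handle.read()
flat = struct.unpack(">%dd" % (len(raw) // 8), raw)
```

On the JMP side the file would presumably be read with the LoadTextFile( ...BLOB ) and blobToMatrix calls described above. The result is one flat run of doubles, so the row and column counts have to be known (or sent separately) to reshape it.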
I was having an issue with Run Program that seems to be related to the options and file name I pass to it. If I use pick file(), it crashes for me, and the same happens if I hard-code a directory with a space in it.
Do you have any suggestions to make this dynamic? I'm using JMP 12.0.1.
Example
f = pick file();
//doesn't work
x = runprogram(executable("C:\Python34\python.exe"),
options(f),
readfunction("blob")
);
//doesn't work
x = runprogram(executable("C:\Python34\python.exe"),
options("/C:/Users/User/Desktop/generate.py"),
readfunction("blob")
);
//doesn't work
x = runprogram(executable("C:\Python34\python.exe"),
options("'C:/Google Drive/Work/Scripting/generate.py'"),
readfunction("blob")
);
//works
x = runprogram(executable("C:\Python34\python.exe"),
options("C:/Users/User/Desktop/generate.py"),
readfunction("blob")
);
Very cool way to use Run Program
I don't have the solution but this works in JMP 12.1 on Mac
f = Pick File();
x = RunProgram(
executable("/usr/bin/python"),
options(f),
readfunction("blob")
);
Thanks for looking at it, and good question.
I think you are fighting the Windows command line behavior with embedded blanks in file names. In a DOS box, you'd use quotation marks like this:
dir "c:\Users\chales\Saved Games"
The escaping gets a little ugly, but here it is in JSL:
runprogram(executable("cmd.exe"),
options({"/C","dir \!"c:\Users\chales\Saved Games\!""}),
readfunction("text"))
I don't have Python here at the office, but I anticipate it will have the same behavior. It gets worse if you need to embed quotation marks; another project I'm working on used \" as described in one of the answers in http://stackoverflow.com/questions/7760545/cmd-escape-double-quotes-in-parameter
If you get to choose the path names for your projects, leaving out the blanks will make it easier. If you have a file picker, you need to allow for them.
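As a rough illustration of the quoting Windows expects, Python's standard library can build the command line for you. This sketch uses subprocess.list2cmdline, a helper that exists in the subprocess module but is not part of its formally documented public API, so treat it as a way to inspect the quoting rules, not as a guaranteed interface. The paths are the ones from the question.

```python
import subprocess

# Executable plus a script path that contains a blank
args = [r"C:\Python34\python.exe", "C:/Google Drive/Work/Scripting/generate.py"]

# Applies the Windows (MS C runtime) quoting rules to each argument
command_line = subprocess.list2cmdline(args)
print(command_line)
```

The argument containing a blank comes back wrapped in double quotes, while the one without blanks is left alone. Single quotes, as in one of the failing attempts, are not treated as quoting characters by the Windows command line.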
Craig,
Not much help, but it also works this way in JMP 13. You may have to ask JMP support... Maybe it will get fixed in JMP 14.