Comment / Question on bringing data back to JMP from Python.
Currently we can do this:
dt = Python Get( df )
where 'df' is a Python dataframe.
I have noticed that what the above command actually does is write the dataframe out to a CSV in the $TEMP folder, then invoke JSL to read it in, then delete that temporary CSV. For large files there is obviously an overhead to both writing the file and scanning / reading it back in.
It would be very nice if this could be an in-memory transfer direct from Python to JMP.
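To make the cost concrete, here is a minimal sketch of the write / read / delete pattern described above, using only the Python standard library. It is not JMP's actual implementation; the function name `roundtrip_via_temp_csv` and the sample values are hypothetical.

```python
import csv
import os
import tempfile


def roundtrip_via_temp_csv(columns):
    """Write {name: values} to a temporary CSV, read it back, then delete the file."""
    names = list(columns)
    rows = zip(*(columns[n] for n in names))
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Step 1: serialize every value to text on disk.
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(names)
            writer.writerows(rows)
        # Step 2: scan the whole file again to rebuild the columns.
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            result = {n: [] for n in header}
            for row in reader:
                for n, v in zip(header, row):
                    result[n].append(v)
    finally:
        # Step 3: remove the temporary file.
        os.remove(path)
    return result


data = {"name": ["KATIE", "LOUISE"], "height": [59, 61]}
copy = roundtrip_via_temp_csv(data)
```

Besides the two passes over the data, note that the numbers come back as text, so the reader also has to re-infer each column's type.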
In JMP 14-17, both Python Send( dt ) and Python Get( df ) worked by saving a CSV to a temporary file. This is one of the major changes in JMP 18. With JMP 18, Python Send( dt ) now creates a jmp.DataTable object in Python. This is not a copy; it is a live, in-memory reference to the data table, and it can be read and modified directly from Python. From JSL, Python Get( dt ) gives you the in-memory data table reference. The original Python Get( df ) method, which brings a pandas dataframe into a JMP data table as a copy, was left in place for compatibility.
You now have the power to take a data frame in Python, iterate over its columns, and create the equivalent JMP data table in memory. See jmp.DataTable under the new Python category in JMP 18's Scripting Index. There is also a Python subdirectory in Sample Scripts. In there you will find a JMP2pandas.py script that shows two ways to create a pandas.DataFrame in memory from a jmp.DataTable object, and one example of creating the jmp.DataTable object from the pandas.DataFrame (also in memory):
# create JMP DataTable from pandas DataFrame
dt2 = jmp.DataTable('BC2',df.shape[0])
for j in range( df.shape[1] ):
    dt2.new_column(dt[j].name, dt[j].dtype)
    dt2[j] = list(df.iloc[:,j])
Here df is a pandas dataframe and dt2 is the new JMP DataTable. (Note that dt is also a jmp.DataTable object, and the type parameter is a jmp.DataType enumeration value.) The example file opens 'Big Class.jmp' as dt, creates a pandas.DataFrame from it, then creates dt2 from the data frame.
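The sample takes each column's name and type from the original table dt. If you only have the data frame, you need to pick the name and type yourself. The following is a rough sketch of one way to do that, written in plain Python so it can be tried outside JMP. The helper names `infer_jmp_type_name` and `plan_columns` are hypothetical, the dict stands in for `df.to_dict(orient="list")`, and the commented JMP loop assumes jmp.DataType has Numeric and Character members.

```python
def infer_jmp_type_name(values):
    """Guess a JMP column type name from plain Python values."""
    def is_number(v):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    present = [v for v in values if v is not None]
    if present and all(is_number(v) for v in present):
        return "Numeric"
    # Anything else (strings, mixed values, all-missing) falls back to Character.
    return "Character"


def plan_columns(columns):
    """Turn {name: values} into (name, type_name, values) tuples, in column order."""
    return [(name, infer_jmp_type_name(vals), list(vals))
            for name, vals in columns.items()]


# Stand-in for df.to_dict(orient="list") on a pandas DataFrame.
columns = {"name": ["KATIE", "LOUISE"], "age": [12, 12], "height": [59.0, 61.0]}
plan = plan_columns(columns)

# Inside JMP 18 the plan could then drive the same loop as the sample script
# (assumption: jmp.DataType has Numeric and Character members):
#   dt2 = jmp.DataTable('BC2', len(columns["name"]))
#   for j, (name, type_name, values) in enumerate(plan):
#       dt2.new_column(name, getattr(jmp.DataType, type_name))
#       dt2[j] = values
```

Falling back to Character for anything that is not purely numeric is a conservative choice; dates and other pandas dtypes would need their own handling.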
Formula columns do not support Python code in JMP 18. It is something that is being investigated, and it would be a great feature. An enormous amount of infrastructure had to be added to make Python in JMP 18 robust: working on startup, a Python-aware script editor, enhanced error reporting, Scripting Index entries, and a bridge from Python to JMP. Many things that had been lacking for some time have now been enhanced to make what we believe to be an inviting and productive Python environment within JMP.
Now that the heavy lifting has been done, we are not calling it a day. Much more to come.
Follow-up note on Expression columns. While the jmp.DataTable.Column object does not support reading or writing an Expression column from Python, you can create the column from Python and work around the issue using jmp.run_jsl():
...
# with a data table dt, having columns 'list' and 'np.array'
# add a new expression column with a formula using run_jsl()
# create an expression column from Python
dt.new_column('expression', jmp.DataType.Expression)
jmp.run_jsl('''
dt = Current Data Table();
:expression << formula(:list + :np.array);
''')
...
Getting an Image from a column is a bit more involved. It is possible using run_jsl(), but that is a good topic for a blog post all of its own.