Importing large Parquet files into JMP
Hello,
I need to efficiently import large datasets in Parquet format (possibly multiple GB) into JMP. I saw the instruction video for CSV import here and used the same approach with pd.read_parquet instead. My worry is that this import will be very inefficient, because the table has to occupy memory twice: first as a pandas DataFrame object and then as the JMP data table. Is there a better approach that is better suited to PCs with modest specs?
Thanks a lot.
Re: Importing large Parquet files into JMP
First and foremost: you will need plenty of RAM, at least 2x the binary file size and probably 3x or more, just to do this in memory directly from Parquet to a JMP data table. Use the Python integration and the PyArrow Python package.
import jmp
from jmputils import jpip
jpip('install','pyarrow')
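As a rough pre-flight check for the RAM rule of thumb above, a sketch like the following compares the file's on-disk size, multiplied by a safety factor, with a RAM figure you supply. The function names, the default factor of 3, and the example sizes are illustrative assumptions. Parquet files are usually compressed, so the real in-memory footprint can be well above what the file size suggests.

```python
import os

GIB = 1024 ** 3


def fits_in_ram(file_bytes: int, ram_bytes: int, factor: float = 3.0) -> bool:
    """Return True if factor * file size fits within the given amount of RAM."""
    return file_bytes * factor <= ram_bytes


def check_file(path: str, ram_bytes: int, factor: float = 3.0) -> bool:
    """Same check, but reads the size of an actual file on disk."""
    return fits_in_ram(os.path.getsize(path), ram_bytes, factor)


# Hypothetical sizes: a 2 GiB and a 6 GiB file on a machine with 16 GiB of RAM.
print(fits_in_ram(2 * GIB, 16 * GIB))
print(fits_in_ram(6 * GIB, 16 * GIB))
```

The check is only a lower bound on what you need; it does not account for memory already used by JMP or other programs.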
Here is a sample script for JMP 18 that directly walks a parquet file and builds a data table from Python with the data, all in-memory.
In JMP 18 you then need to fill the data table one column at a time from Python. You can walk the Parquet schema and data directly, or you can convert the Parquet table to a pandas DataFrame. An example of converting from a pandas DataFrame to a jmp.DataTable object can be found in the $SAMPLE_SCRIPTS/Python/dt2pandas2dt.jsl and .py scripts.
# parquet.py
# Author: Paul R. Nelson
# JMP Statistical Discovery LLC
#
# Description:
#   Directly read a pyarrow table and create a JMP data table.
#   The data for this file comes from a repository that is Apache licensed.
#   https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet
#
import jmp
import pyarrow.parquet as pq

ver = jmp.__version__.split('.')
if ver[0] == '0' and ver[1] < '5':
    print('This version requires JMP 18.0 EA-5 or newer.')

pq_Table = pq.read_table('/Your_path/to/userdata1.parquet')
#print(pq_Table)
print(f'Table Rows: {len(pq_Table)}')
print(f'Table Shape: {pq_Table.shape}')
print(f'Table Schema:\n{pq_Table.schema}')
print(f'Column Names: {pq_Table.column_names}')
#print(pq_Table.columns)
print(f'Num Columns: {pq_Table.num_columns}')

dt = jmp.DataTable('From Parquet', pq_Table.num_rows)
dt.new_column('first_name', jmp.DataType.Character)
dt.new_column('last_name', jmp.DataType.Character)
dt.new_column('Salary')

# Set column properties (widths)
jmp.run_jsl('''
// Change column display width: last_name
Data Table( "From Parquet" ):last_name << Set Display Width( 105 );
// Change column display width: Salary
Data Table( "From Parquet" ):Salary << Set Display Width( 90 );
''')

# Create a data table column from a Python list
dt[0] = pq_Table.column(2).to_pylist()
dt[1] = pq_Table.column(3).to_pylist()
dt[2] = pq_Table.column(10).to_pylist()
# ===================================================================================
# Copyright © 2025 JMP Statistical Discovery LLC, Cary, NC, USA. All rights reserved.
#
# JMP STATISTICAL DISCOVERY LLC ("JMP") PERMITS THE USE OF THIS COMPUTER SOFTWARE
# CODE ("CODE") ON AN AS-IS BASIS AND AUTHORIZES YOU TO USE THE CODE SUBJECT TO
# THE TERMS LISTED HEREIN. BY USING THE CODE, YOU AGREE TO THESE TERMS. YOUR USE
# OF THE CODE IS AT YOUR OWN RISK. JMP MAKES NO REPRESENTATION OR WARRANTY,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND TITLE,
# WITH RESPECT TO THE CODE.
#
# You may use the Code solely as part of a software product you currently have
# licensed from JMP, JMP's parent company, SAS Institute Inc. ("SAS US") or one
# of SAS' subsidiaries (together with SAS US, "SAS") or authorized agents (the
# "Software"), and not for any other purpose. The Code is designed to either
# correct an error in the Software or to add functionality to the Software but
# has not necessarily been tested. Accordingly, JMP makes no representation or
# warranty that the Code (1) will operate error-free or (2) will not contain any
# viruses or other applications or executables (including, without limitation,
# any "trap doors," "worms" and "time bombs") that will degrade or infect any
# software product that you license from JMP or any other software or your
# network or systems. JMP is under no obligation to maintain, support, or
# continue to distribute the Code.
#
# Neither JMP nor its licensors shall be liable to you or any third party for any
# general, special, direct, indirect, consequential, incidental, or other damages
# whatsoever arising out of or related to your use or inability to use the Code,
# even if JMP has been advised of the possibility of such damages. Except as
# otherwise provided above, the Code is governed by the same agreement that
# governs the Software. If you do not have an existing agreement with JMP or SAS
# governing the Software, you may not use the Code.
#
# US export laws and regulations apply to the Code and any other JMP-provided
# technology ("Controlled Material"). The Controlled Material originates from the
# United States. Customer agrees to comply with these and other applicable export
# and import laws and regulations, except as prohibited or penalized by law
# ("Trade Law"). Customer warrants that Customer and its users are not: (a)
# prohibited by Trade Law from accessing Controlled Material without US
# government approval; (b) located in or under control of any country or other
# territory subject to general export or trade embargo under Trade Law; or (c)
# engaged in any of the following end-uses: nuclear, chemical or biological
# weapons; nuclear facilities not under International Atomic Energy Agency
# safeguards; missiles or unmanned aerial vehicles capable of long-range use or
# weapons delivery, military training or assistance, military or intelligence
# end-use in Russia or in any country in Country Group D:5 of the United States
# Export Administration Regulations; deep water, Arctic offshore or shale oil or
# gas exploration involving Russia or Russian companies, or Russian energy export
# pipelines. Customer will not import or use any data within the System that is
# subject to the US International Traffic Arms Regulations. United States export
# classification information for JMP software and its affiliates is available at
# jmp.com/export.
#
# JMP and all other JMP Statistical Discovery LLC product or service names are
# registered trademarks or trademarks of SAS Institute Inc. in the USA and other
# countries. ® indicates USA registration. Other brand and product names are
# registered trademarks or trademarks of their respective companies.
#
# ===================================================================================
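The JMP 18 script above names three columns by hand. For files with many columns, the same column-at-a-time approach could be driven by the Parquet schema instead. The following is a minimal sketch, not a tested JMP recipe: the helper names jmp_kind and table_from_parquet are hypothetical, the rule "numeric Arrow types become numeric columns, everything else becomes Character text" is an assumption, and how JMP treats missing values in numeric lists has not been verified here. It uses only the jmp calls that appear in the script above.

```python
# String forms of the Arrow types treated as numeric in this sketch (assumption).
NUMERIC_ARROW_TYPES = {
    'int8', 'int16', 'int32', 'int64',
    'uint8', 'uint16', 'uint32', 'uint64',
    'halffloat', 'float', 'double',
}


def jmp_kind(arrow_type: str) -> str:
    """Map the string form of an Arrow type to 'Numeric' or 'Character'."""
    return 'Numeric' if arrow_type in NUMERIC_ARROW_TYPES else 'Character'


def table_from_parquet(path: str, name: str = 'From Parquet'):
    """Build a JMP data table from every column of a Parquet file.

    Runs only inside JMP's Python environment. The imports are local so
    that jmp_kind can be used and tested without jmp or pyarrow installed.
    """
    import jmp
    import pyarrow.parquet as pq

    pq_table = pq.read_table(path)
    dt = jmp.DataTable(name, pq_table.num_rows)
    for i, field in enumerate(pq_table.schema):
        values = pq_table.column(i).to_pylist()
        if jmp_kind(str(field.type)) == 'Character':
            dt.new_column(field.name, jmp.DataType.Character)
            # Dates, booleans, etc. are stored as text in this sketch.
            values = ['' if v is None else str(v) for v in values]
        else:
            dt.new_column(field.name)  # default column type, as for Salary above
        dt[i] = values
    return dt
```

Inside JMP, a call such as table_from_parquet('/Your_path/to/userdata1.parquet') would then replace the hand-written new_column lines. Storing dates as text is a simplification; converting them to JMP date values would need extra work.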
For JMP 19 EA-4+ the situation is much better.
import jmp
import numpy as np
import pandas as pd
import pyarrow.parquet as pq

# Parquet sample files from
# https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet
pq_Table = pq.read_table('/Your_path/to/userdata1.parquet')
pd_table = pq_Table.to_pandas()

# Create a table in memory from the pandas DataFrame.
dt = jmp.from_dataframe(pd_table)
#Same Disclaimer applies.
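If only some of the columns are needed, one way to lower the peak memory of the JMP 19 route is to read just those columns from the file. The sketch below uses pyarrow's read_schema and the columns argument of read_table, together with the jmp.from_dataframe call shown above. The helper names select_columns and load_subset and the example column names are hypothetical, and this does not help when every column is required.

```python
def select_columns(all_names, wanted):
    """Return the wanted column names that exist in the file, in file order."""
    wanted_set = set(wanted)
    return [name for name in all_names if name in wanted_set]


def load_subset(path, wanted):
    """Read only the wanted columns and hand them to JMP.

    Runs only inside JMP's Python environment (JMP 19 EA-4 or newer for
    jmp.from_dataframe); the imports are local for that reason.
    """
    import jmp
    import pyarrow.parquet as pq

    names = pq.read_schema(path).names          # reads metadata only, not the data
    columns = select_columns(names, wanted)
    pq_table = pq.read_table(path, columns=columns)
    return jmp.from_dataframe(pq_table.to_pandas())


# Hypothetical column names, just to show the selection step.
print(select_columns(['id', 'first_name', 'salary'], ['salary', 'id', 'missing']))
```

Names that are not present in the file are silently skipped here; raising an error instead may be preferable in a real script.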
Re: Importing large Parquet files into JMP
The Python approach is probably better than what follows. But this does work, at least with a toy parquet file.
There is an Apache Drill project that can read Parquet files. It is designed to be accessed through JDBC (Java, unfortunately not ODBC), and the ODBC drivers for it, if they still exist, might not be free. But it has a REST-based API, and JSL can use that.
Steps 1, 2, and 3 of this post pointed in the right direction (thanks @robertspierre). I'm not sure how you'll extract the file on Windows; Linux can do it, and some Windows tool can probably do it too. I used https://drill.apache.org/download/ and picked "Drill for Hadoop 3 and non-Hadoop environments, direct download". I also got microsoft-jdk-21.0.6-windows-x64.msi from https://learn.microsoft.com/en-us/java/openjdk/download .
I left the expanded apache-drill-1.21.2 directory (from the tar.gz) on the desktop and started it like this:
cd into apache-drill-1.21.2, then run bin/drill-embedded.bat. I then played with a query against a supplied Parquet file, which you don't have to do. Instead, start JMP and run this script:
fields = Associative Array();
fields["queryType"] = "SQL";
fields["query"] = "SELECT * FROM `dfs`.`C:\Users\c\Desktop\apache-drill-1.21.2\sample-data\nation.parquet`";
s = New HTTP Request( URL( "localhost:8047/query.json" ), Method( "POST" ), JSON( Fields( fields ) ), Headers( {"Accept: application/json"} ) ) <<Send;
dt = open(chartoblob(s),json,guess("tall"))
and get this table:
When you start the Drill process above, it launches a web server that handles the REST API request from JSL. It also has a web interface, which I don't think is going to be interesting for reading Parquet files.
https://drill.apache.org/docs/rest-api-introduction/#query might help with the api.
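For comparison, the same REST call can be sketched in Python with only the standard library. It posts the same two fields as the JSL script (queryType and query) to /query.json on the embedded Drill server. The function names are hypothetical, the base URL assumes Drill is running locally on port 8047 as above, and reading the result from a "rows" key is based on the REST documentation linked above rather than on anything shown in this thread.

```python
import json
import urllib.request


def build_drill_request(sql, base_url='http://localhost:8047'):
    """Build (but do not send) the POST request for Drill's /query.json endpoint."""
    payload = json.dumps({'queryType': 'SQL', 'query': sql}).encode('utf-8')
    return urllib.request.Request(
        base_url + '/query.json',
        data=payload,
        headers={'Content-Type': 'application/json', 'Accept': 'application/json'},
        method='POST',
    )


def run_drill_query(sql, base_url='http://localhost:8047'):
    """Send the query and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_drill_request(sql, base_url)) as resp:
        return json.load(resp).get('rows', [])


# Same query as the JSL script; the raw string keeps the backslashes intact.
sql = r"SELECT * FROM `dfs`.`C:\Users\c\Desktop\apache-drill-1.21.2\sample-data\nation.parquet`"
request = build_drill_request(sql)
print(request.full_url)
```

Calling run_drill_query(sql) requires the Drill process to be running. Note that the whole result set travels as JSON text, so this route is unlikely to be memory-friendly for multi-GB files.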
Interesting reads before you build too much on this:
https://www.starburst.io/blog/the-death-of-apache-drill/
https://stackoverflow.com/questions/59754457/mapr-driver-discontinued-for-apache-drill-what-now
and yet there is still activity:
https://github.com/apache/drill
Re: Importing large Parquet files into JMP
Thank you!
Re: Importing large Parquet files into JMP
Thank you!
Re: Importing large Parquet files into JMP
Where can I try the JMP EA version? My Parquet files have too many columns to define each of them explicitly, so from_dataframe() is very alluring.
Re: Importing large Parquet files into JMP
Contact JMP sales; there is an early adopter program. I think that if you have access to JMP 18 through the MyJMP portal, you should also have access to the Early Adopter releases, but there may be an additional non-disclosure form or step.
Re: Importing large Parquet files into JMP
I'm so slow with the grok3 method