
Opening Parquet Files

I would like JMP to be able to open Apache Parquet files and handle them the same way that HDF5, JSON, etc. are handled. Thanks a million.

20 Comments
Ryan_Gilmore
Community Manager
Status changed to: Archived
We are archiving this request. If this is still important please comment with additional details and we will reopen. Thank you!
brae_pete
Level I

Aw man, bummer. For what it's worth, I think this is important!

Parquet files are really efficient in terms of storage and read times, which is particularly nice when you have to transfer data from another platform into JMP or out of JMP into another platform.

John89
Level II

If JMP really wants to do big data properly, it should be able to read Parquet/Feather files and import them directly with appropriate compression and column definitions. There's not enough memory to open a file and then compress it, and this issue will only get worse.

GoodwinJMP
Level I

100% agree with John's comments above. JMP should be able to read Parquet and Feather files directly into a JMP table.

gcarmiol
Level I

I would also really like the capability to read directly from a Parquet file into a JMP table. I regularly work with 2GB CSVs (130K rows, 2.5K columns) on a fairly recent i7 laptop with 64GB of RAM. Opening one of these files takes around 5 minutes for the data scan and then another 4.5 minutes to import the data. That is almost 10 minutes of my time just waiting for JMP to open the file.

Our data system can produce a Parquet file instead of a CSV; the same data is less than 200MB and already has the correct data types encoded in it. When I open one of these Parquet files with a single line of code (read_parquet) in pandas, it loads into an in-memory dataframe in less than 30 seconds. Lately I am very hesitant to open the files in JMP, even though that would be my preference. I can open the file in pandas, run a summary, and get the info I need in half the time it takes just to open it in JMP.

Having this capability would really benefit people doing analysis with bigger files. I hope it can be enabled in the near future.
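
For anyone who has not tried the pandas route described above, here is a minimal sketch. It assumes pandas and a Parquet engine such as PyArrow are installed. The file name measurements.parquet and the helper names file_size_mb and summarize_parquet are hypothetical, chosen only for illustration, and load times will depend on your machine and data.

```python
from pathlib import Path


def file_size_mb(path):
    """Return the on-disk size of a file in MiB, e.g. to compare a CSV with its Parquet twin."""
    return Path(path).stat().st_size / (1024 * 1024)


def summarize_parquet(path, columns=None):
    """Load a Parquet file into a DataFrame and return (DataFrame, summary statistics).

    Passing `columns` reads only those columns, which helps with very wide files.
    """
    # Imported here so the size helper above still works without pandas installed.
    import pandas as pd

    df = pd.read_parquet(path, columns=columns)  # data types come from the file itself
    return df, df.describe()


if __name__ == "__main__":
    source = Path("measurements.parquet")  # hypothetical file name
    if source.exists():
        df, summary = summarize_parquet(source)
        print(f"{source}: {file_size_mb(source):.1f} MiB, "
              f"{df.shape[0]} rows x {df.shape[1]} columns")
        print(summary)
```

Because Parquet is a columnar format, restricting the read with columns=[...] avoids loading columns you do not need, which may matter for tables with thousands of columns.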

 

mia_stephens
Staff

Thank you for the additional comments. We will investigate further.

BayesShark563
Level I

I made an account just to chime in and say that the lack of this feature is currently driving users away from JMP in our organization, since it is unable to open these files.

Hi all, I have recently developed an "Apache Parquet file importer" add-in and posted it in the File Exchange -> Add-ins section.

It interactively imports .parquet file(s) into corresponding JMP table(s). It works both for a single file and for multiple files (all stored in the same folder).

It requires JMP V17 and a supported Python installation with the PyArrow package (and its dependencies).
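
To give a rough idea of what reading a folder of Parquet files with PyArrow can look like on the Python side, here is a minimal sketch. It is not the add-in's actual code, and it does not show how tables are handed over to JMP. The helper names find_parquet_files and read_folder and the folder name exports are hypothetical.

```python
from pathlib import Path


def find_parquet_files(folder):
    """Return the .parquet files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".parquet"
    )


def read_folder(folder):
    """Read every .parquet file in `folder` into a dict of {file stem: pyarrow.Table}."""
    # Imported here so the file listing above works even without PyArrow installed.
    import pyarrow.parquet as pq

    return {p.stem: pq.read_table(str(p)) for p in find_parquet_files(folder)}


if __name__ == "__main__":
    folder = Path("exports")  # hypothetical folder name
    if folder.is_dir():
        for name, table in read_folder(folder).items():
            # The schema lists the column names and types stored in the file.
            print(name, table.num_rows, "rows")
            print(table.schema)
```

Each pyarrow.Table carries the column types stored in the file, and table.to_pandas() converts it to a DataFrame if that is more convenient.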

 

John89
Level II

That's fantastic!  Thanks Massimo