Opening Parquet Files

kevin_lennon · 05-04-2018

I would like JMP to be able to open Apache Parquet files and handle them the same way it handles HDF5, JSON, etc. Thanks a million.

Ryan_Gilmore · 10-26-2022

We are archiving this request. If this is still important please comment with additional details and we will reopen. Thank you!

brae_pete · 10-28-2022

Aw man, bummer. For what it's worth, I think this is important!

Parquet files are really efficient in terms of storage and read times. They're particularly nice when you have to transfer data from another platform into JMP, or out of JMP into another platform.

brae_pete · 10-28-2022

Some extra information for the curious:

https://www.databricks.com/glossary/what-is-parquet#:~:text=What%20is%20Parquet%3F,handle%20complex%....

John89 · 02-03-2023

If JMP really wants to do big data properly, it should be able to read Parquet/Feather files and import them directly with appropriate compression and column definitions. There's not enough memory to open a file and then compress it, and this issue will only get worse.

GoodwinJMP · 03-13-2023

I agree 100% with John's comments above. JMP should be able to read Parquet and Feather files directly into a JMP table.

gcarmiol · 03-13-2023

I would also really like the capability to read a Parquet file directly into a JMP table. I regularly work with 2 GB CSVs (130K rows, 2.5K columns) on a fairly recent i7 laptop with 64 GB of RAM. Opening one of these files takes around 5 minutes for the data scan and then another 4.5 minutes to import the data: almost 10 minutes of my time spent just waiting for JMP to open the file.

Our data system can produce a Parquet file instead of a CSV; the same data is less than 200 MB and already has the correct data types encoded in it. When I open one of these Parquet files with a single line of code (read_parquet) in pandas, it loads into an in-memory DataFrame in less than 30 seconds. Lately I have been very hesitant to open these files in JMP, even though JMP would be my preference. I can open a file in pandas, run a summary, and get the info I need in half the time it takes just to open it in JMP.

Having this capability would really benefit people doing analysis with bigger files. I hope it can be enabled in the near future.

mia_stephens · 03-28-2023

Thank you for the additional comments. We will investigate further.

BayesShark563 · 04-06-2023

I made an account just to chime in and say that the lack of this feature is currently driving users in our organization away from JMP, since it is unable to open these files.

MassimoMartucci · 06-05-2023

Hi all, I have recently developed the "Apache Parquet file importer" add-in and published it in the File Exchange -> Add-ins section.

It interactively imports .parquet file(s) into corresponding JMP table(s). It works both for a single file and for multiple files (all stored in the same folder).

It requires JMP 17 and a supported Python installation with the PyArrow package (and its dependencies).

John89 · 06-12-2023

That's fantastic! Thanks, Massimo.