Apache Parquet file importer

This add-in interactively imports Apache Parquet files (https://parquet.apache.org/) into JMP data tables.

It consists of two commands:

  • Single file: lets you select a single .parquet file and checks its validity; after a confirmation message, it creates and opens the resulting JMP data table, or it issues a warning message if the file is not valid.
  • Multiple files: lets you select a folder of .parquet files and opens the resulting JMP data tables (note: the folder must contain only valid .parquet files, with no other files or subfolders).
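
The two checks described above can be sketched in plain Python. This is a minimal sketch, not the add-in's actual validation logic, which is not shown here. It relies on the fact that an unencrypted Parquet file starts and ends with the 4-byte marker PAR1; the function names looks_like_parquet and check_folder are hypothetical.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of the file


def looks_like_parquet(path):
    """Cheap structural check: file size and magic bytes only, no decoding."""
    p = Path(path)
    # Smallest plausible file: header magic + 4-byte footer length + footer magic
    if not p.is_file() or p.stat().st_size < 12:
        return False
    with p.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-4, 2)  # position 4 bytes before the end of the file
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def check_folder(folder):
    """Return the entries that break the 'only valid .parquet files' rule."""
    problems = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_dir():
            problems.append((entry.name, "subfolder"))
        elif entry.suffix.lower() != ".parquet":
            problems.append((entry.name, "not a .parquet file"))
        elif not looks_like_parquet(entry):
            problems.append((entry.name, "missing PAR1 magic bytes"))
    return problems
```

An empty list from check_folder suggests the folder is safe to pass to the "Multiple files" command. A file that passes the magic-byte test can still be corrupt, so this is only a first filter.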

System requirements:

Use the standard Python update tools to install these packages for your Python configuration.
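
To confirm that the packages are visible to Python, a short check like the one below may help. It is only a sketch: the package list is an assumption taken from the imports in the script posted in the comments (numpy, pandas, pyarrow), and package_report is a hypothetical helper name. Run it with the same Python interpreter that JMP is configured to use, since a different environment can report different results.

```python
import importlib.util
from importlib import metadata

# Assumption: the packages imported by the script shown in the comments
REQUIRED = ("numpy", "pandas", "pyarrow")


def package_report(names=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable from this interpreter
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


if __name__ == "__main__":
    for name, version in package_report().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```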

Note: to test the add-in, you can find sample Parquet files here: https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet

Comments
dmmdiego

Massimo,

I tested the add-in in JMP Pro 17.2 by loading a Parquet file generated with pandas, and it did not work: the add-in does not recognize the file, even though I can read it with Parquet reader programs such as Tad.

Looking forward to any help, thanks!

Hi dmdiego,

Did you install all the required Python packages?

If yes, please share the JMP log so that we can see more clearly what is happening.

By the way, JMP 18 (which is expected to ship in March 2024) will have a more direct JMP-Python link, so chances are that we will update the add-in.

Cheers, Massimo

 

dmmdiego

Massimo,

I think I have all the packages. This is my setup:

Python version 3.11.7

numpy version 1.26.3

pandas version 2.1.4

matplotlib version 3.8.0

scipy version 1.11.4

sqlite3 (included in Python base)

PyQt5 version 5.15.10

pyarrow version 14.0.2

 

The only thing I can think of is that I manage my packages in the base environment through conda; I am not sure whether this causes a conflict. It could also be the way Python was set up on my computer...


This is the log from JMP:

/*:
/**********/

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Read a Table from Parquet format
table = pq.read_table(parquet_file)

# Convert Table to a pandas-compatible DataFrame
df = table.to_pandas()

#print(df)


/**********/

 

Even with simple sample parquet files, I get the same error: 

[Screenshot of the error message: dmmdiego_0-1707920721925.png]

Hi dmdiego,

The configuration seems OK, and conda should not be an issue.

The log is not saying much.
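
One way to get more out of the log is to run the read step in a plain Python session and capture the exception. The sketch below is only an illustration: it uses the same pq.read_table and to_pandas calls as the logged code, while the diagnose_parquet helper and its report fields are hypothetical names.

```python
import traceback
from pathlib import Path


def diagnose_parquet(path):
    """Run each step of the import separately and report where it fails."""
    report = {"path": str(path), "ok": False, "stage": None, "detail": ""}
    if not Path(path).is_file():
        report.update(stage="locate file", detail="file not found")
        return report
    try:
        import pyarrow.parquet as pq
    except ImportError as exc:
        report.update(stage="import pyarrow", detail=repr(exc))
        return report
    try:
        table = pq.read_table(str(path))  # same call as in the logged code
        df = table.to_pandas()
    except Exception:
        # Keep the full traceback: it usually names the real cause
        report.update(stage="read file", detail=traceback.format_exc())
        return report
    report.update(ok=True, stage="done",
                  detail=f"{df.shape[0]} rows x {df.shape[1]} columns")
    return report
```

Printing the returned dictionary for the file in question shows whether the problem lies in locating the file, importing pyarrow, or decoding the data.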

Could you please share with us the parquet file you are using, so that we can test directly with it?

Thanks, Massimo

 

 

dmmdiego

Using the sample files from your original link (https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet) did not work either. I cannot load any of those files; I get the same error message.

Hi, it should not behave like that. The problem is possibly related to the add-in installation, which may conflict with JMP Pro v17.2.

 

Could you please copy the script below into a script window, run it (without the add-in), and tell me if it works?

 

Thanks, Massimo

 

Names Default To Here( 1 );

// Pick Parquet file from directory
parquet_file_pf = Pick File( "Select Parquet File", , {"Parquet Files|parquet"} );

// Create Parquet_file string to pass to Python - exclude initial slash 
parquet_file = Substr( parquet_file_pf, 2 );

// Extract Parquet file name without .parquet extension - to be used as JMP table name
parquet_filename = Word( -2, parquet_file, "/." );

// Show( parquet_file );

// Init Python connection
Python Init();

// Send Parquet complete file string to Python
Python Send( parquet_file );

//Python Submit( "print(parquet_file)" );

// Read Parquet file as a table, then read table as pandas DataFrame
Python Submit(
	"\[
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Read a Table from Parquet format
table = pq.read_table(parquet_file)

# Convert Table to a pandas-compatible DataFrame
df = table.to_pandas()

#print(df)

]\"
);

// Get DataFrame as a JMP Data Table
dt = Python Get( df );

// check dt successfully imported
Try(
	Is Missing( dt );
	table_imported = 0;
	,
	table_imported = 1
);

// If dt is not a valid data table, show an error message, close the Python session, and stop
If( (table_imported == 0),
	nw = New Window( "Import not successful", Modal, Text Box( "File " || parquet_filename || ".parquet is not a valid Parquet file" ) );
	Python Term();
	Stop();
);

// If dt is valid, continue execution
// Rename JMP Data Table as Parquet file name without extension
dt << Set Name( parquet_filename );

// Modal message window to confirm import
import_message = "File " || parquet_filename || ".parquet successfully imported";
nw = New Window( "Import successful", Modal, Text Box( import_message ) );

// Open JMP Data Table
dt << New Data View;

// Save JMP Data Table
// dt << Save();

// Terminate Python session
Python Term();
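
The path handling at the top of the script can be sketched in Python for comparison. This is an illustration under assumptions, not part of the script: it assumes the picked path has the form /C:/folder/file.parquet on Windows, as the "exclude initial slash" comment suggests, and the helper names are hypothetical. As far as I can tell, Word( -2, parquet_file, "/." ) splits on both slashes and dots, so a file named my.file.parquet would give the table name "file"; taking the stem of the file name avoids that.

```python
import re
from pathlib import PurePosixPath


def to_python_path(picked):
    """Drop the leading slash only when it precedes a drive letter ('/C:/...')."""
    if re.match(r"^/[A-Za-z]:/", picked):
        return picked[1:]
    return picked  # e.g. a macOS or Linux path, left unchanged


def table_name(path):
    """File name without its final extension, used as the data table name."""
    return PurePosixPath(path).stem


print(to_python_path("/C:/data/sales.parquet"))  # C:/data/sales.parquet
print(table_name("C:/data/my.file.parquet"))     # my.file
```

Stripping the slash only in front of a drive letter keeps paths such as /Users/me/sales.parquet intact, which an unconditional removal of the first character would break.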

 

 

dmmdiego

Massimo,

The script works very well, and I can successfully load Parquet files. This will be very useful: instead of the add-in, I can just run the script. Thanks for the kind help!

There is still an issue with the add-in, which does not work in my current installation of JMP Pro, but the script works.