cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
ekallou3
Level II

Automate opening an hdf5 file

Hello,

 

I am trying to create a .jsl script that will import the hdf5 files from Tensorflow, identify how many files are there (kernels + bias) and save them as datatables to then translate it to a form readable and usable as a surrogate by JMP. The way I am opening the datatables so far is very manual; I use 

Open( "filename.h5", {"list_of", "dataset_names"});

to import all the dataset names. However, in order to be able to work with the datatables, I am then using this manual way (this is for a Keras 2-layer ANN, the last kernel and bias layer concatenates the others together) : 

layer_0_bias = Data Table( "-model_weights-dense-dense-bias-0" ) << Get As Matrix();
layer_0_kernel = Data Table( "-model_weights-dense-dense-kernel-0" ) << Get As Matrix();
layer_1_bias = Data Table( "-model_weights-dense_1-dense_1-bias-0" ) << Get As Matrix();
layer_1_kernel = Data Table( "-model_weights-dense_1-dense_1-kernel-0" ) << Get As Matrix();
layer_2_bias = Data Table( "-model_weights-dense_2-dense_2-bias-0" ) << Get As Matrix();
layer_2_kernel = Data Table( "-model_weights-dense_2-dense_2-kernel-0" ) << Get As Matrix();

 

What I want to do is automate the process of importing, saving the datatables, and renaming them and understanding how many layers are in the Keras model (number of datatables / 2). Is there a way to understand how many datatables are there and also renaming each one separately without having to do the manual process shown above?

 

Thanks! 

 

 

12 REPLIES 12
hogi
Level XII

Re: Automate opening an hdf5 file

2024 discussion, similar topic:
Help on opening hdf5 files 

Re: Automate opening an hdf5 file

I had a little time to tinker.  Here is a Python script utilizing jmp and h5py import packages to open 'Big Class.jmp' create a 'Big Class.h5' file then from that create a dt2 data table from the .h5 file.

 

# File: h5_example.py
# Author: Paul R. Nelson
# Description: Example showing use of h5py and jmp packages to 
#    create a .h5 file from a JMP data table, and recreate the JMP 
#    table from the .h5 file.  Note: the HDF5 file doesn't preserve 
#    column order so the new table comes back in alphabetic order
#
# Layout of my .h5 file - using a generic layout so it can always be easily read
#                         regardless of name.
#                         Other attrs could be added as needed.
# /table/                 group 'table' has attrs['name'] = table name
# /table/col_name
# /table/...
# /table/nth_col_name
#
# /scripts/script_name
# ...
# /scripts/nth_script_name
# 

import jmp
#from jmputils import jpip
#jpip('install', 'h5py')

import numpy as np
import h5py as h5

#callable function to use with h5py file.visit()
def printname(name):
    print(name)

# Build a hdf5 file from JMP data table. HDF5 is like NumPy and each dataset 
# must be homogeneous, so a JMP like table needs to be multiple datasets.  
# To make it easy, a dataset per column.
def h5_from_dt(file_path, dt):
    """Build and save an .h5 file at at file_path/dt_name.h5 from dt"""
    try:
        f = h5.File( file_path + dt.name + '.h5', 'w')
        h_tbl = f.create_group( 'table' )     # table data
        h_script = f.create_group('scripts')  # table scripts
        
        h_tbl.attrs["name"] = dt.name
        for col in dt:
            if col.dtype == jmp.DataType.Character:            
                ds = h_tbl.create_dataset( col.name, shape=len(col), dtype=h5.string_dtype())
                # I'm cheating - I'm saving the JMP data type so read can determine char type
                ds.attrs['jtype'] = 'jmp_Character'
            elif col.dtype == jmp.DataType.Numeric:
                ds = h_tbl.create_dataset( col.name, shape=len(col), dtype='f8' )
                ds.attrs['jtype'] = 'jmp_Numeric'
                
            ds[:] = dt[col.name]              # assign dt values to dataset
            
    except Exception as err:
        print(f'Error: {err}')
        f.close()
        return None
        
    return f
       
def dt_from_h5(file_path):
    """Create a data table from an .h5 file turning datasets to JMP columns"""
    f = h5.File(file_path, 'r')
    if f and 'table' in f.keys():
       print('A dt like h5.')
       tbl = f['table']
       cols = tbl.keys()
       if cols:
           print(cols)
           col_name = list(cols)[0]
           dt = jmp.DataTable('dt2', rows=len( tbl[ col_name ] ) )
       
           for cname in cols:
               c = tbl[cname]
               print(c)
               print(c.dtype)
               # My cheat on knowing utf-8 char columns
               if c.attrs['jtype'] and c.attrs['jtype'] == 'jmp_Character':
                   print('char col')
                   dt.new_column(cname, jmp.DataType.Character)
                   for x in range( len( c )):
                       # need to turn byte object back to utf-8 string
                       dt[cname][x] = c[x].decode('utf-8')
               else:
                   dt.new_column(cname, jmp.DataType.Numeric)
                   dt[cname] = c                  
           f.close()
           return dt
       else:
           f.close()
           return None
    f.close()  
    return None
         
# open 'Big Class'
dt = jmp.open(jmp.SAMPLE_DATA + 'Big Class.jmp')
try:
    # create 'Big Class.h5' file in my home directory
    hdf = h5_from_dt(jmp.HOME, dt)
    if hdf:
        print(list(hdf.keys()))
        hdf.visit(printname)

        h_tbl = hdf['table']
        print( f'Table Name: {h_tbl.attrs['name']}')

        print( hdf['table/name'])
        print( [x.decode('utf-8') for x in hdf['table/name'] ])

        print( hdf[ 'table/weight'])
        print( [x.item() for x in hdf['table/weight'] ] )

    else:
        print('Unable to create .h5 file from data table.')    

except Exception as err:
    print(f'Error: {err}')
    
finally:
    if hdf:
        hdf.close()

dt2 = dt_from_h5( jmp.HOME + 'Big Class.h5')
print(dt2)

Re: Automate opening an hdf5 file

Up through JMP 17 saves pandas data frame as a CSV file.  JMP 18 with its builtin Python support can create a Data Table in memory from a pandas data frame.  In 18 you have to iterate across the data frame and build up a JMP data table column at a time.  

 

Better data frame support is present in the upcoming in JMP 19 EA-5.

 

The Python package for working with HDF5 files is h5py.