ekallou3
Level II

Automate opening an hdf5 file

Hello,

 

I am trying to create a .jsl script that will import an HDF5 file from TensorFlow, identify how many datasets it contains (kernels + biases), and save them as data tables so they can be translated into a form that JMP can read and use as a surrogate model. The way I am opening the data tables so far is very manual; I use

Open( "filename.h5", {"list_of", "dataset_names"});

to import all of the datasets by name. However, in order to work with the resulting data tables, I then pull each one out by hand (this is for a Keras 2-layer ANN; the last kernel and bias layer concatenates the others together):

layer_0_bias = Data Table( "-model_weights-dense-dense-bias-0" ) << Get As Matrix();
layer_0_kernel = Data Table( "-model_weights-dense-dense-kernel-0" ) << Get As Matrix();
layer_1_bias = Data Table( "-model_weights-dense_1-dense_1-bias-0" ) << Get As Matrix();
layer_1_kernel = Data Table( "-model_weights-dense_1-dense_1-kernel-0" ) << Get As Matrix();
layer_2_bias = Data Table( "-model_weights-dense_2-dense_2-bias-0" ) << Get As Matrix();
layer_2_kernel = Data Table( "-model_weights-dense_2-dense_2-kernel-0" ) << Get As Matrix();

 

What I want to do is automate the process of importing the datasets, saving them as data tables, renaming them, and determining how many layers are in the Keras model (number of data tables / 2). Is there a way to find out how many data tables there are, and to rename each one separately, without going through the manual process shown above?
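
For illustration, here is a rough Python/h5py sketch (outside of JSL, just to show the idea) of the enumeration I am after; the file name 'model.h5' is a placeholder and the kernel/bias naming follows the Keras layout of the table names above. It walks the file, collects every kernel and bias dataset as a matrix, and infers the layer count.

# Sketch only: enumerate kernel/bias datasets in a Keras HDF5 weights file.
# 'model.h5' and the name filter are assumptions based on the data table
# names above; adjust them to match the actual file.
import h5py as h5

weights = {}                                   # dataset path -> array
with h5.File('model.h5', 'r') as f:
    def collect(name, obj):
        # visititems() walks every group and dataset in the file
        if isinstance(obj, h5.Dataset) and ('kernel' in name or 'bias' in name):
            weights[name] = obj[()]            # read the dataset into memory
    f.visititems(collect)

n_layers = len(weights) // 2                   # one kernel + one bias per dense layer
print(n_layers, sorted(weights.keys()))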

 

Thanks! 

 

 

12 REPLIES
hogi
Level XII

Re: Automate opening an hdf5 file

2024 discussion, similar topic:
Help on opening hdf5 files 

Re: Automate opening an hdf5 file

I had a little time to tinker. Here is a Python script that uses the jmp and h5py packages to open 'Big Class.jmp', create a 'Big Class.h5' file, and then recreate the table as a dt2 data table from that .h5 file.

 

# File: h5_example.py
# Author: Paul R. Nelson
# Description: Example showing use of h5py and jmp packages to 
#    create a .h5 file from a JMP data table, and recreate the JMP 
#    table from the .h5 file.  Note: the HDF5 file doesn't preserve 
#    column order so the new table comes back in alphabetic order
#
# Layout of my .h5 file - using a generic layout so it can always be easily read
#                         regardless of name.
#                         Other attrs could be added as needed.
# /table/                 group 'table' has attrs['name'] = table name
# /table/col_name
# /table/...
# /table/nth_col_name
#
# /scripts/script_name
# ...
# /scripts/nth_script_name
# 

import jmp
#from jmputils import jpip
#jpip('install', 'h5py')

import numpy as np
import h5py as h5

#callable function to use with h5py file.visit()
def printname(name):
    print(name)

# Build a hdf5 file from JMP data table. HDF5 is like NumPy and each dataset 
# must be homogeneous, so a JMP like table needs to be multiple datasets.  
# To make it easy, a dataset per column.
def h5_from_dt(file_path, dt):
    """Build and save an .h5 file at at file_path/dt_name.h5 from dt"""
    try:
        f = h5.File( file_path + dt.name + '.h5', 'w')
        h_tbl = f.create_group( 'table' )     # table data
        h_script = f.create_group('scripts')  # table scripts
        
        h_tbl.attrs["name"] = dt.name
        for col in dt:
            if col.dtype == jmp.DataType.Character:            
                ds = h_tbl.create_dataset( col.name, shape=len(col), dtype=h5.string_dtype())
                # I'm cheating - I'm saving the JMP data type so read can determine char type
                ds.attrs['jtype'] = 'jmp_Character'
            elif col.dtype == jmp.DataType.Numeric:
                ds = h_tbl.create_dataset( col.name, shape=len(col), dtype='f8' )
                ds.attrs['jtype'] = 'jmp_Numeric'
                
            ds[:] = dt[col.name]              # assign dt values to dataset
            
    except Exception as err:
        print(f'Error: {err}')
        f.close()
        return None
        
    return f
       
def dt_from_h5(file_path):
    """Create a data table from an .h5 file turning datasets to JMP columns"""
    f = h5.File(file_path, 'r')
    if f and 'table' in f.keys():
       print('A dt like h5.')
       tbl = f['table']
       cols = tbl.keys()
       if cols:
           print(cols)
           col_name = list(cols)[0]
           dt = jmp.DataTable('dt2', rows=len( tbl[ col_name ] ) )
       
           for cname in cols:
               c = tbl[cname]
               print(c)
               print(c.dtype)
               # My cheat on knowing utf-8 char columns
                if c.attrs.get('jtype') == 'jmp_Character':   # .get() avoids a KeyError if 'jtype' is missing
                   print('char col')
                   dt.new_column(cname, jmp.DataType.Character)
                   for x in range( len( c )):
                       # need to turn byte object back to utf-8 string
                       dt[cname][x] = c[x].decode('utf-8')
               else:
                   dt.new_column(cname, jmp.DataType.Numeric)
                   dt[cname] = c                  
           f.close()
           return dt
       else:
           f.close()
           return None
    f.close()  
    return None
         
# open 'Big Class'
dt = jmp.open(jmp.SAMPLE_DATA + 'Big Class.jmp')
hdf = None   # make sure 'hdf' exists for the finally block even if creation fails
try:
    # create 'Big Class.h5' file in my home directory
    hdf = h5_from_dt(jmp.HOME, dt)
    if hdf:
        print(list(hdf.keys()))
        hdf.visit(printname)

        h_tbl = hdf['table']
        print( f'Table Name: {h_tbl.attrs["name"]}')

        print( hdf['table/name'])
        print( [x.decode('utf-8') for x in hdf['table/name'] ])

        print( hdf[ 'table/weight'])
        print( [x.item() for x in hdf['table/weight'] ] )

    else:
        print('Unable to create .h5 file from data table.')    

except Exception as err:
    print(f'Error: {err}')
    
finally:
    if hdf:
        hdf.close()

dt2 = dt_from_h5( jmp.HOME + 'Big Class.h5')
print(dt2)

Re: Automate opening an hdf5 file

Up through JMP 17, the route was to save the pandas data frame out as a CSV file and open that in JMP. JMP 18, with its built-in Python support, can create a data table in memory from a pandas data frame, but in 18 you have to iterate across the data frame and build up the JMP data table one column at a time.
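
As a rough sketch (assuming a pandas data frame df is already in hand, and using the same jmp calls as the script above), that column-at-a-time approach in JMP 18 looks something like this:

# Sketch only: build a JMP data table from a pandas data frame, one column
# at a time, mirroring the jmp calls used in the script above. 'df' is
# assumed to already exist (e.g. read from HDF5 or CSV with pandas).
import jmp
import pandas as pd

def dt_from_dataframe(name, df):
    dt = jmp.DataTable(name, rows=len(df))
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            dt.new_column(col, jmp.DataType.Numeric)
            dt[col] = list(df[col])                # numeric column assigned in one go
        else:
            dt.new_column(col, jmp.DataType.Character)
            for i, val in enumerate(df[col]):      # character cells set one at a time
                dt[col][i] = str(val)
    return dt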

 

Better data frame support is present in the upcoming JMP 19 EA-5.

 

The Python package for working with HDF5 files is h5py.
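
If h5py is not already available in JMP's embedded Python environment, the commented-out lines near the top of the script above show the install route; a minimal version:

# One-time install of h5py into JMP's embedded Python environment,
# using the same jpip helper that appears (commented out) in the script above.
from jmputils import jpip
jpip('install', 'h5py')

import h5py as h5
print(h5.__version__)    # quick check that the package imports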