hogi
Level XII

Open multiple Files

I want to open multiple measurement files and concatenate them into a single file.

Unfortunately, due to the strange header of the files, it's not straightforward to use Import Multiple Files ... 

So I set up a For loop that loads each file via Open(), processes the files one by one, and then concatenates them.

 

It's astonishing how much slower for( open(...))  is compared to MFI.

Some tricks that I tried:

- load all files, concatenate once --> helps

- option "invisible" (to get rid of table updates) --> helps a lot

but still, no chance to reach the performance of MFI.
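
For reference, the loop approach with both tricks applied might look like the sketch below. It is only an illustration under assumptions: the folder path, the ".csv" filter, and the output table name are placeholders, the per-file processing step is left as a comment, and at least two files are assumed to be present.

```jsl
// Sketch only: folder, file filter, and table name are placeholders.
dir = "$TEMP/testfiles/";
files = Files In Directory( dir );

// open every file invisibly (no table windows to update) and collect the tables
tables = {};
For Each( {f}, files,
	If( Ends With( Lowercase( f ), ".csv" ),
		dtOne = Open( dir || f, invisible );
		// ... per-file processing of dtOne would go here ...
		Insert Into( tables, dtOne );
	)
);

// concatenate once at the end instead of once per file
// (assumes tables holds at least two data tables)
dtAll = tables[1] << Concatenate(
	tables[2 :: N Items( tables )],
	Output Table( "All files" )
);

// close the invisible source tables so they do not linger in memory
For Each( {t}, tables, Close( t, NoSave ) );
```

Concatenating once keeps the number of table rebuilds constant, and the invisible option avoids window updates, but every Open() call still goes through the full import path for each file.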

 

Any further tricks?

It should help to use MFI (without stacking), then process the data tables separately and concatenate them.

Is there a command to tell MFI to open the files invisibly?

 

 

11 REPLIES
Craige_Hales
Super User

Re: Open multiple Files

Good catch, the setup cost for the data table matrix columns is pretty steep. This version does one setup for all 40 rows of a file instead of one for every row (see the temp list). I see about a 4X speedup for the Big Class test case:

// use MFI to stack a mixed bag of tables...
// assumes "text" and "file name" are not columns in the data AND assumes
// no commas in the data values. Not sure how you would work around that.
dir = "$temp/testfiles/"; // make sample files here
Open( dir );
Delete Directory( dir );
Create Directory( dir );
dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );

For Each( {i, idx}, 1 :: 300,
	dt0:height[1] = 1000 + i;
	dt0 << Save( dir || Char( 1000+i ) || ".csv" );
);

t1 = HP Time();

dtList = Multiple File Import(
	<<Set Folder( dir ),
	<<Set Add File Name Column( 1 ),
	<<Set Import Mode( "Row Per Line" ),
	<<Set Stack Mode( "Stack Similar" )
) << Import Data;

dt = dtList[1];

t2 = HP Time();

//dtList = dtList << subset(private);

// clean up "junk". It might not be this simple for your data...
dt << selectwhere( Contains( text, "junk" ) );
dt << deleterows;

// hunt for unique column names
headerrows = dt << getrowswhere( Row() == 1 | FileName[Row()] != FileName[Row() - 1] );
uniquecols = Associative Array( Words( Concat Items( Transform Each( {header}, dt[headerrows, {text}], header[1] ), "," ), "," ) ) << getkeys;

// add the unique column names, character data for now...
For Each( {cname}, uniquecols, dt << New Column( cname, character ) );

// transfer data from text to unique cols.
headerrows |/= N Rows( dt ) + 1; // append sentinel
For( iheader = 1, iheader < N Rows( headerrows ), iheader += 1,
	cols = Words( dt:text[headerrows[iheader]], "," );
	startrow = headerrows[iheader] + 1;
	stoprow = headerrows[iheader + 1] - 1;
	temp = {}; // create a bigger block of rows...
	For( irow = startrow, irow <= stoprow, irow += 1,
		temp[N Items( temp ) + 1] = Words( dt:text[irow], "," )
	);
	dt[startrow :: stoprow, cols] = temp; // ...and do the column lookup once
);

// remove the headers and left over bits
dt << selectrows( headerrows[1 :: N Rows( headerrows ) - 1] );
dt << deleterows;
dt << delete columns( {text, file name} );
dt << delete scripts( "files" );
dt << delete scripts( "source" );

//dtList << subset(visible);

t3 = HP Time();

Show( (t2 - t1) / 1000000, (t3 - t2) / 1000000 );

(t2 - t1) / 1000000 = 2.105032;
(t3 - t2) / 1000000 = 2.65675;
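
The effect of the temp list can be isolated in a small timing sketch. This is an illustration under assumptions, not a benchmark from the thread: the table, the column names "a" and "b", the row count, and the dummy "x,y" data are all made up. It uses the same list-of-lists block assignment as the script above, once per row and once for the whole block.

```jsl
// Sketch only: table, column names, and row count are made up for illustration.
dt = New Table( "block write demo", invisible,
	New Column( "a", Character ),
	New Column( "b", Character )
);
n = 2000;
dt << Add Rows( n );
cols = {"a", "b"};

// one subscript assignment per row: the column lookup is repeated n times
t0 = HP Time();
For( i = 1, i <= n, i++,
	dt[i :: i, cols] = Eval List( {Words( "x,y", "," )} )
);
t1 = HP Time();

// collect the rows in a list first, then write the whole block once
temp = {};
For( i = 1, i <= n, i++,
	temp[N Items( temp ) + 1] = Words( "x,y", "," )
);
dt[1 :: n, cols] = temp;
t2 = HP Time();

// seconds for the per-row writes, then for the single block write
Show( (t1 - t0) / 1000000, (t2 - t1) / 1000000 );
Close( dt, NoSave );
```

The actual ratio will depend on the JMP version and on the number of columns, so the timings are best measured on your own data.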

 

Craige
hogi
Level XII

Re: Open multiple Files

What an improvement from collecting the data in a temporary list and block-writing it to the data table!

nice!

 

thanks @Craige_Hales