Discussions

hogi · May 15, 2023 06:02 PM

I want to open multiple measurement files and concatenate them into a single file.

Unfortunately, due to the strange header of the files, it's not straightforward to use Import Multiple Files (MFI) ...

So, I set up a for loop, load every file via open() ..., process the files one by one, and then concatenate them.

It's astonishing how much slower for( open(...)) is compared to MFI.

Some tricks that I tried:

- load all files, concatenate once --> helps

- option "invisible" (to get rid of table updates) --> helps a lot

but still, no chance to reach the performance of MFI.

Any further tricks?

It should help to use MFI (without stacking), then process the data tables separately and concatenate them.

Is there a command to tell MFI to open the files invisibly?
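
For reference, a minimal sketch of the loop-based approach with both tricks applied (open each file invisibly, concatenate once at the end). The folder path, the ".csv" filter and the output table name are placeholders, not taken from the real data. The sketch assumes at least two files and assumes that Concatenate accepts a list of tables in your JMP version.

```jsl
// Sketch only: dir, the .csv filter and the table name are hypothetical.
dir = "$temp/testfiles/";
files = Files In Directory( dir );

dtList = {};
For Each( {f}, files,
	If( Ends With( Lowercase( f ), ".csv" ),
		dt = Open( dir || f, invisible ); // no window, so no table updates
		// ... per-file handling of the strange header would go here ...
		Insert Into( dtList, dt );
	)
);

// concatenate once at the end instead of once per file
// (assumes at least two tables in dtList)
dtAll = dtList[1] << Concatenate(
	dtList[2 :: N Items( dtList )],
	Output Table Name( "all files" )
);

// close the invisible source tables so they do not stay in memory
For Each( {d}, dtList, Close( d, No Save ) );
```

This only removes the display overhead and the repeated concatenation. Each file is still opened through a separate Open() call.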

Craige_Hales · May 17, 2023 06:53 PM

Good catch, the setup for the data table matrix columns is pretty steep. This way uses one setup for all 40 rows instead of one for every row. (see the temp list.) I see ~4X better for the big class test case:

// use MFI to stack a mixed bag of tables...
// assumes "text" and "file name" are not columns in the data AND assumes
// no commas in the data values. Not sure how you would work around that.
dir = "$temp/testfiles/"; // make sample files here
Open( dir );
Delete Directory( dir );
Create Directory( dir );
dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );

For Each( {i, idx}, 1 :: 300,
	dt0:height[1] = 1000 + i;
	dt0 << Save( dir || Char( 1000+i ) || ".csv" );
);

t1 = HP Time();

dtList = Multiple File Import(
	<<Set Folder( dir ),
	<<Set Add File Name Column( 1 ),
	<<Set Import Mode( "Row Per Line" ),
	<<Set Stack Mode( "Stack Similar" )
) << Import Data;

dt = dtList[1];

t2 = HP Time();

//dtList = dtList << subset(private);

// clean up "junk". It might not be this simple for your data...
dt << selectwhere( Contains( text, "junk" ) );
dt << deleterows;

// hunt for unique column names
headerrows = dt << getrowswhere( Row() == 1 | FileName[Row()] != FileName[Row() - 1] );
uniquecols = Associative Array( Words( Concat Items( Transform Each( {header}, dt[headerrows, {text}], header[1] ), "," ), "," ) ) << getkeys;

// add the unique column names, character data for now...
For Each( {cname}, uniquecols, dt << New Column( cname, character ) );

// transfer data from text to unique cols.
headerrows |/= N Rows( dt ) + 1; // append sentinel
For( iheader = 1, iheader < N Rows( headerrows ), iheader += 1,
	cols = Words( dt:text[headerrows[iheader]], "," );
	startrow = headerrows[iheader] + 1;
	stoprow = headerrows[iheader + 1] - 1;
	temp = {}; // create a bigger block of rows...
	For( irow = startrow, irow <= stoprow, irow += 1,
		temp[N Items( temp ) + 1] = Words( dt:text[irow], "," )
	);
	dt[startrow :: stoprow, cols] = temp; // ...and do the column lookup once
);

// remove the headers and left over bits
dt << selectrows( headerrows[1 :: N Rows( headerrows ) - 1] );
dt << deleterows;
dt << delete columns( {text, file name} );
dt << delete scripts( "files" );
dt << delete scripts( "source" );

//dtList << subset(visible);

t3 = HP Time();

Show( (t2 - t1) / 1000000, (t3 - t2) / 1000000 );

(t2 - t1) / 1000000 = 2.105032;
(t3 - t2) / 1000000 = 2.65675;

Craige
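
The block-write idea from the script above, isolated in a small sketch: collect the parsed rows in a temporary list and assign them to the table in one step, so the column lookup happens once per block rather than once per row. The table, column names and sample lines are made up for illustration.

```jsl
// Sketch only: table name, columns and data are hypothetical.
dt = New Table( "block demo",
	New Column( "name", Character ),
	New Column( "age", Character ),
	New Column( "height", Character ),
	Add Rows( 3 )
);
cols = {"name", "age", "height"};
lines = {"KATIE,12,59", "LOUISE,12,61", "JANE,12,55"};

// build a list of lists, one inner list per row
temp = {};
For Each( {line}, lines,
	temp[N Items( temp ) + 1] = Words( line, "," )
);

// one assignment for the whole block of rows
dt[1 :: 3, cols] = temp;
```

As in the script above, this relies on Words() and therefore assumes there are no commas inside the data values.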

hogi · May 19, 2023 04:58 PM

What an improvement by collecting the data in a temporary list - and block-writing to the data table!

nice!

thanks @Craige_Hales
