deliaaa
Level II

Using MFI across multiple folders

Hi, I'm trying to concatenate files with MFI (Multiple File Import) from several folders under the same main path. Because the main path contains a massive number of files, I would like to point MFI at specific folders chosen through user input. However, MFI seems able to take only a single folder as its path.

 

Another workaround I was considering is to find the files with specific keywords in their names, copy them to a new folder, and work from there. I'm not sure whether this would be faster, or whether there's a way to script it.

 

Thanks~

Accepted Solutions
Craige_Hales
Super User

Re: Using MFI across multiple folders

MFI uses a single starting point. Specifying multiple starting points would be a good Wish List item.

 

Using the recursive option from a folder that contains all of the interesting folders, and a file name pattern with the keywords, should be about as fast as other tools that could find the files and move them to a common folder. I've seen it gather one million (short) files in a few minutes.

 

If you are dealing with many more than a million files in the recursive folders, and only care about a few (5?) directories, you could always run 5 separate MFIs and then combine those results using JMP's Tables > Concatenate operation.
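
As a rough illustration of the "several separate MFIs, then combine" idea, here is a minimal JSL sketch. The root path, folder names, and file pattern are hypothetical placeholders, and it assumes that each MFI run stacks its files into exactly one table and that MFI settings left unspecified fall back to their defaults.

```jsl
Names Default To Here( 1 );

// Hypothetical values: replace with the real root, folder names, and pattern.
root = "C:/data/";
folders = {"toolA", "toolB", "toolC"};
pattern = "*EB*.csv";

tables = {};
For( i = 1, i <= N Items( folders ), i++,
	// One MFI run per folder; Import Data returns a list of tables.
	dtList = Multiple File Import(
		<<Set Folder( root || folders[i] ),
		<<Set Subfolders( 0 ),
		<<Set Name Filter( pattern ),
		<<Set Name Enable( 1 ),
		<<Set Add File Name Column( 1 ),
		<<Set Import Mode( "CSVData" ),
		<<Set Stack Mode( "Stack Similar" )
	) << Import Data;
	// Assumption: the files in one folder stack into a single table.
	Insert Into( tables, dtList[1] );
);

// Append every later table to the first one, then rename the result.
final = tables[1];
For( i = 2, i <= N Items( tables ), i++,
	final << Concatenate( tables[i], Append to first table );
);
final << Set Name( "Final" );
```

Each run only scans one folder, so the discovery phase never has to walk the full main path.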

Craige

deliaaa
Level II

Re: Using MFI across multiple folders

Thanks @Craige_Hales for the prompt reply! If you happen to have a sample script, or can direct me to some resources for the method you suggested, that would help me a lot. Much appreciated!

 

"recursive option from a folder that contains all of the interesting folders, and a file name pattern with the keywords"

Craige_Hales
Super User

Re: Using MFI across multiple folders

Instead of recursive I should have said Include subfolders. Check the box to get the filter for filenames. The size and time filters can also be used, one or more together. At the right you can see how many files remain. Below I'm using .gif rather than .csv because I found a bunch of old ones in a folder tree.

 

Wild cards: * matches 0 or more characters and ? matches exactly one.
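
For a scripted equivalent of those dialog settings, something like the following JSL sketch may help. The folder and keyword are hypothetical, and the mapping of Set Subfolders to the Include subfolders option and of Set Name Enable to the file name filter check box is an assumption based on the message names, not something confirmed in this thread.

```jsl
Names Default To Here( 1 );

// Hypothetical values: a top-level folder and a keyword to look for.
topFolder = "C:/data/";
keyword = "EB";

// "*" matches 0 or more characters, "?" matches exactly one.
// With keyword = "EB" the filter below is "*EB*.csv", which would match
// a name such as "123_EB_01.csv" (an illustrative file name).
// A filter such as "run?.csv" would match "run1.csv" but not "run10.csv".
dtList = Multiple File Import(
	<<Set Folder( topFolder ),
	<<Set Subfolders( 1 ),      // assumed to correspond to Include subfolders
	<<Set Name Filter( "*" || keyword || "*.csv" ),
	<<Set Name Enable( 1 ),     // assumed to turn the file name filter on
	<<Set Import Mode( "CSVData" ),
	<<Set Stack Mode( "Stack Similar" )
) << Import Data;
```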

Craige
deliaaa
Level II

Re: Using MFI across multiple folders

Hi @Craige_Hales ,

 

As my directory has too many files, it was taking way too long (more than 30 minutes, and it still hadn't finished the "Discovering files" portion), so I decided to run MFI multiple times and then concatenate the results. But I noticed that it ran "Discovering files" (3?) times before outputting 'file1' and another (3?) times before outputting 'file2'. Ultimately it did give me file1 and file2 and then concatenated them into "Final", but it took longer than it should have. When I removed the second MFI, it worked fine and output file1 within 2 minutes. Below is my script; do you know what went wrong?

 

Multiple File Import(
	<<Set Folder( "\\fsftp1\ftp\"||tool1_input||"\reports" ),
	<<Set Show Hidden( 0 ),
	<<Set Subfolders( 0 ),
	<<Set Name Filter( num_input ||"*EB*.CST" ),
	<<Set Name Enable( 1 ),
	<<Set Size Filter( {14753758, 16813862} ),
	<<Set Size Enable( 0 ),
	<<Set Date Filter( {3702288653.475, 3702734626.833} ),
	<<Set Date Enable( 0 ),
	<<Set Add File Name Column( 1 ),
	<<Set Add File Size Column( 0 ),
	<<Set Add File Date Column( 1 ),
	<<Set Import Mode( "CSVData" ),
	<<Set Charset( "Best Guess" ),
	<<Set Stack Mode( "Stack Similar" ),
	<<Set CSV Has Headers( 1 ),
	<<Set CSV Allow Numeric( 1 ),
	<<Set CSV First Header Line( 15 ),
	<<Set CSV Number Of Header Lines( 1 ),
	<<Set CSV First Data Line( 16 ),
	<<Set CSV EOF Comma( 0 ),
	<<Set CSV EOF Tab( 0 ),
	<<Set CSV EOF Space( 1 ),
	<<Set CSV EOF Spaces( 1 ),
	<<Set CSV EOF Other( "" ),
	<<Set CSV EOL CRLF( 1 ),
	<<Set CSV EOL CR( 1 ),
	<<Set CSV EOL LF( 1 ),
	<<Set CSV EOL Semicolon( 0 ),
	<<Set CSV EOL Other( "" ),
	<<Set CSV Quote( "\!"" ),
	<<Set CSV Escape( "" ),
) << Import Data;
DT1 = current data table();
DT1 << set name("file1");

Multiple File Import(
	<<Set Folder( "\\fsftp1\ftp\"||tool2_input||"\reports" ),
	<<Set Show Hidden( 0 ),
	<<Set Subfolders( 0 ),
	<<Set Name Filter( num_input ||"*EB*.CST" ),
	<<Set Name Enable( 1 ),
	<<Set Size Filter( {14753758, 16813862} ),
	<<Set Size Enable( 0 ),
	<<Set Date Filter( {3702288653.475, 3702734626.833} ),
	<<Set Date Enable( 0 ),
	<<Set Add File Name Column( 1 ),
	<<Set Add File Size Column( 0 ),
	<<Set Add File Date Column( 1 ),
	<<Set Import Mode( "CSVData" ),
	<<Set Charset( "Best Guess" ),
	<<Set Stack Mode( "Stack Similar" ),
	<<Set CSV Has Headers( 1 ),
	<<Set CSV Allow Numeric( 1 ),
	<<Set CSV First Header Line( 15 ),
	<<Set CSV Number Of Header Lines( 1 ),
	<<Set CSV First Data Line( 16 ),
	<<Set CSV EOF Comma( 0 ),
	<<Set CSV EOF Tab( 0 ),
	<<Set CSV EOF Space( 1 ),
	<<Set CSV EOF Spaces( 1 ),
	<<Set CSV EOF Other( "" ),
	<<Set CSV EOL CRLF( 1 ),
	<<Set CSV EOL CR( 1 ),
	<<Set CSV EOL LF( 1 ),
	<<Set CSV EOL Semicolon( 0 ),
	<<Set CSV EOL Other( "" ),
	<<Set CSV Quote( "\!"" ),
	<<Set CSV Escape( "" ),
) << Import Data;
DT2 = current data table();
DT2 << set name("file2");

Data Table( "file1" ) <<
Concatenate(
	Data Table( "file2" ),
	Output Table( "Final" ),
	Create source column
);
Craige_Hales
Super User

Re: Using MFI across multiple folders

Hi, sorry, just saw this.

 

I think there are about three progress indicators if it takes long enough. One of them is for the file discovery phase, one of them is for collecting the file attributes (size, date), and one of them is for reading the files. If you are seeing three with the same file discovery title, something else is going on.

When you run it from a script (nice job on building the path!), you can see up to three progress indicators, though some of them might be brief or not shown at all.

 

You might want to use something like this rather than current data table():

 

dtList = MultipleFileImport(...)<<ImportData;
if( nitems(dtList) != 1, throw("unexpected tables returned from MFI") );
DT1 = dtList[1];
DT1<<setname(...

MFI's ImportData method returns a list of tables, and you are hoping for exactly one because you expect all of the files to stack. If anything goes wrong, the if( ..., throw(...) ) will help you identify the problem. Using current data table works if nothing goes wrong. dtList[1] is the first and only item in the list.
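
To make that pattern concrete for the two-folder script earlier in the thread, here is a minimal sketch. The helper name importOne and the placeholder values for tool1_input, tool2_input, and num_input are hypothetical, and the CSV settings from the original script are omitted for brevity and would need to be added back.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-ins for the values collected from user input.
tool1_input = "tool1";
tool2_input = "tool2";
num_input = "123";

// Hypothetical helper: run one MFI, insist on exactly one table, name it.
importOne = Function( {folder, pattern, tableName},
	{Default Local},
	dtList = Multiple File Import(
		<<Set Folder( folder ),
		<<Set Subfolders( 0 ),
		<<Set Name Filter( pattern ),
		<<Set Name Enable( 1 ),
		<<Set Add File Name Column( 1 ),
		<<Set Import Mode( "CSVData" ),
		<<Set Stack Mode( "Stack Similar" )
		// Add the remaining CSV settings from the original script here.
	) << Import Data;
	If( N Items( dtList ) != 1,
		Throw( "unexpected tables returned from MFI for " || folder )
	);
	dt = dtList[1];
	dt << Set Name( tableName );
	dt;  // return the table reference
);

dt1 = importOne( "\\fsftp1\ftp\" || tool1_input || "\reports", num_input || "*EB*.CST", "file1" );
dt2 = importOne( "\\fsftp1\ftp\" || tool2_input || "\reports", num_input || "*EB*.CST", "file2" );

// Work with the returned references instead of looking tables up by name.
final = dt1 << Concatenate( dt2, Output Table( "Final" ), Create source column );
```

Holding the references returned by Import Data avoids depending on which table happens to be current when the next statement runs.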

 

Craige