Solved: Multi file Import: Discovering Files takes too long

Sep 8, 2023 06:02 AM

MFI is great - I really love it.

But sometimes the low speed of the file search hurts.
If the data is on a network drive and I access the folder from my home office, Discovering Files can take more than 10 minutes to find my 4 matching files among thousands of folders, subfolders and files.

In that case it's faster to open Python in parallel, write a short glob search, and load the files one by one in a for loop with open(filename). That is much faster than waiting for MFI to populate the Files list with all the files that fail the filename match.
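For reference, the Python workaround looks roughly like the sketch below. The function names and the example pattern are placeholders made up for illustration, not anything from JMP or MFI, and the files are simply read as text.

```python
import glob
import os


def find_matching_files(root, pattern):
    """Return sorted paths below root whose file name matches pattern."""
    # "**" together with recursive=True descends into all subfolders.
    # glob.escape keeps special characters in the root path literal.
    search = os.path.join(glob.escape(root), "**", pattern)
    hits = glob.glob(search, recursive=True)
    # isfile() is only called on the few matches, not on every file.
    return sorted(p for p in hits if os.path.isfile(p))


def load_files(paths):
    """Read each file as text and return a dict of path -> content."""
    contents = {}
    for path in paths:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            contents[path] = handle.read()
    return contents
```

Usage would be something like `load_files(find_matching_files(folder, "run_*.csv"))`, where `folder` and the pattern stand in for the real network path and file name filter.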

The Python/glob example shows that the file search can be orders of magnitude faster. Is there any trick in JMP/MFI to speed this up?

Craige_Hales · Sep 11, 2023 6:34 AM

The glob search is probably faster because it does not retrieve date and size.

https://stackoverflow.com/questions/23430395/glob-search-files-in-date-order

There is no way to tell MFI to skip retrieving date and size (but it could be a wish.)

edit: this is much less noticeable on the machine's local file system; the network disk communication is probably 3X greater to get the name+date+size rather than just the name.

Also, you might be able to use the recursive filesInDirectory, which should be similar to the glob search time, and pick the files you want from that full list with a pattern match.

And be careful benchmarking; the second run may be much faster if caching is involved, easily 100X if the communication time goes to zero.
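To see the difference outside JMP, here is a rough Python sketch of the two approaches: a names-only recursive listing followed by a pattern match, versus the same listing with date and size fetched per file. It is not how MFI works internally; all function names are hypothetical, and the actual cost of the extra metadata call depends on the operating system and the network share.

```python
import fnmatch
import os
import time


def list_names_only(root):
    """Recursively collect file paths without asking for date or size."""
    paths = []
    for folder, _subdirs, files in os.walk(root):
        paths.extend(os.path.join(folder, name) for name in files)
    return paths


def list_with_metadata(root):
    """Same walk, but also fetch modification time and size per file."""
    rows = []
    for path in list_names_only(root):
        info = os.stat(path)  # one extra metadata request per file
        rows.append((path, info.st_mtime, info.st_size))
    return rows


def pick_by_pattern(paths, pattern):
    """Keep only the paths whose file name matches a wildcard pattern."""
    return [p for p in paths if fnmatch.fnmatch(os.path.basename(p), pattern)]


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

When comparing `timed(list_names_only, folder)` against `timed(list_with_metadata, folder)`, run each more than once and in both orders; as noted above, a cached second run can look far faster than the first.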

(And change date+time to date+size.)

Craige

hogi · Sep 11, 2023 11:43 AM

I posted a wish to optimize the speed of MFI:
Multi File import: add a fast mode

hogi · Jun 26, 2024 11:38 AM

Amazingly useful: the new fast mode of MFI in JMP 18.

Craige_Hales · Jun 26, 2024 01:30 PM

@hogi Is that a button somewhere?

Craige

hogi · Nov 5, 2024 02:58 AM

I often used the Name Filter in MFI. The idea: the user uses pick files to select some files, then the workflow uses MFI to import them.

This is now possible via Select Files Manually.
