Get and Filter Files in Directory
I have a JMP script that prompts the user for a directory and then filters for specific file types within that directory. My problem is that the filtering step often takes longer than getting the files. If the number of files is large (>100,000 files), the total time to collect and filter them can be more than 30 minutes. Is there a faster or more efficient way to do this? I am using JMP 11. Thanks!
// Example.
Names Default To Here( 1 );
dir = Pick Directory( "Select a directory", "/C:/Program Files/SAS/" );
t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );
t3 = Tick Seconds();
For( i = N Items( files ), i >= 1, i--,
If( !Ends With( files[i], ".jmp" ),
Remove From( files, i )
)
);
t4 = Tick Seconds();
t_filterfiles = t4 - t3;
n_filterfiles = N Items( files );
Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );
Accepted Solutions
Re: Get and Filter Files in Directory
Yes, rework the filtering loop to remove the N^2 behavior. JSL { lists } reach an element by walking from the front of the list. You started at the back so that deleted elements would not shift the indexes of the ones still to be checked, but that gives pretty much the worst-case time for this list: every access walks i elements to reach the i'th element. Here's a reworked version that removes the front-most element from the list of files, checks it, and inserts it as the front-most element of the filtered list.
dir = "c:\";
t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );
filteredFiles = {};
t3 = Tick Seconds();
while( (testname = Remove From( files, 1 )) != {},
If( Ends With( testname[1], ".jmp" ),
insertinto(filteredFiles,testname,1);
)
);
t4 = Tick Seconds();
t_filterfiles = t4 - t3;
n_filterfiles = N Items( filteredFiles );
Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );
t_getfiles = 4.0333333333333;
n_getfiles = 171630;
t_filterfiles = 258.5;
n_filterfiles = 975;
About 4.4 minutes in total, almost all of it in the filtering loop.
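For a rough sense of scale on the original back-to-front loop, here is a back-of-the-envelope count. It assumes, as described above, that reaching element i costs about i steps, and it uses the N from my run (171630 files) rather than your directory, so treat it as an estimate and not a measurement.

```latex
\sum_{i=1}^{N} i = \frac{N(N+1)}{2},
\qquad
N = 171630 \;\Rightarrow\; \frac{171630 \times 171631}{2} = 14{,}728{,}514{,}265 \approx 1.5 \times 10^{10}
```

That is on the order of ten billion element steps just to index the list, which is why always working at the front of the list helps.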
Another idea: you can load a data table like this:
dir = "c:\";
t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );
t3 = tickseconds();
dt = New Table( "directory",
Add Rows( 0 ),
New Column( "filename", Character, Nominal, Set Values( files ) ),
New Column( "isTable",
Numeric,
Continuous,
Format( "Best", 12 ),
Formula( Ends With( :filename, ".jmp" ) )
)
);
dt<<runformulas;
dt<<selectwhere(isTable==1);
dtFiltered = dt<<subset(selectedrows(1));
t4=tickseconds();
t_filterfiles = t4 - t3;
n_filterfiles = N rows( dtFiltered );
Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );
t_getfiles = 4.01666666666642;
n_getfiles = 171635;
t_filterfiles = 0.716666666666697;
n_filterfiles = 975;
About 5 seconds in total; the filtering itself takes under a second. The <<runFormulas is required: without it, the data table will still be evaluating the formula for the isTable column when selectwhere runs, so selectwhere won't find anything and the subset will be empty.
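If you want the result back as a JSL list instead of a subset table, a variation on the data table idea might look like the sketch below. This is my assumption, not something timed in this thread: it swaps the formula column for Get Rows Where, so there is no formula to wait on, and it pulls the names out by subscripting the column with the matching row numbers. The four-item files list is a made-up stand-in for the Files In Directory result, and the Invisible option is only there to keep the helper table off the screen. I have not checked it on JMP 11.

```jsl
Names Default To Here( 1 );
// Hypothetical stand-in for: files = Files In Directory( dir, Recursive );
files = {"a.jmp", "notes.txt", "sub/b.jmp", "script.jsl"};
// Helper table with one character column holding the file names.
dt = New Table( "directory", Invisible,
	New Column( "filename", Character, Nominal, Set Values( files ) )
);
// Row numbers (as a matrix) where the name ends with ".jmp".
rows = dt << Get Rows Where( Ends With( :filename, ".jmp" ) );
// Subscript the column with those row numbers to get the names as a list.
filteredFiles = dt:filename[rows];
// Discard the helper table once the names are extracted.
Close( dt, No Save );
Show( N Items( filteredFiles ), filteredFiles );
```

For the sample list, only the two names ending in ".jmp" should come back. Note that Ends With is case sensitive here, just as in the examples above, so "X.JMP" would not match.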
Re: Get and Filter Files in Directory
The first example is much faster in JMP 12 (similar speed to the data table example); it appears to still be showing some N^2 behavior in JMP 11.
Re: Get and Filter Files in Directory
Thanks Craige, that works great!