robot
Level VI

Get and Filter Files in Directory

I have a JMP script that prompts the user for a directory and then filters for specific file types within that directory.  My problem is that the filtering step often takes longer than getting the files in the first place.  If the number of files is large (>100,000 files), the total time to collect and filter them can be more than 30 minutes.  Is there a faster or more efficient way to do this?  I am using JMP 11.  Thanks!


// Example.
Names Default To Here( 1 );

dir = Pick Directory( "Select a directory", "/C:/Program Files/SAS/" );

// Time the recursive directory listing.
t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );

// Time the filtering step.
t3 = Tick Seconds();
// Walk the list from the back so removals do not shift the remaining indices.
For( i = N Items( files ), i >= 1, i--,
  If( !Ends With( files[i], ".jmp" ),
    Remove From( files, i )
  )
);
t4 = Tick Seconds();
t_filterfiles = t4 - t3;
n_filterfiles = N Items( files );

Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );

1 ACCEPTED SOLUTION

Craige_Hales
Super User

Re: Get and Filter Files in Directory

Yes, rework the filtering loop to remove the N^2 behavior.  JSL { lists } access elements by starting at the front.  You started at the back to prevent the deleted elements from messing up the indexing.  That leads to pretty much the worst case time behavior for manipulating the list, walking i elements to reach the i'th element.  Here's a reworked version that removes the front-most element from the list of files, checks it, and inserts it as the front-most element of the filtered list.


dir = "c:\";

t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );

filteredFiles = {};

t3 = Tick Seconds();
// Pop the front-most element until the list is empty; keep the .jmp files.
While( (testname = Remove From( files, 1 )) != {},
  If( Ends With( testname[1], ".jmp" ),
    Insert Into( filteredFiles, testname, 1 )
  )
);
t4 = Tick Seconds();
t_filterfiles = t4 - t3;
n_filterfiles = N Items( filteredFiles );

Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );

t_getfiles = 4.0333333333333;
n_getfiles = 171630;
t_filterfiles = 258.5;
n_filterfiles = 975;

About 4.4 minutes in total.
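To put the N^2 behavior of the original back-to-front loop in rough numbers: if reaching element i costs about i steps (a simplified cost model assumed here for illustration, not a measurement of JMP internals), then visiting all N elements costs

```latex
\sum_{i=1}^{N} i = \frac{N(N+1)}{2},
\qquad
N = 171630 \;\Rightarrow\; \frac{171630 \cdot 171631}{2} \approx 1.47 \times 10^{10} \text{ steps}
```

compared with roughly 171630 steps for a single linear pass over the same list.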

Another idea:  you can load a data table like this:


dir = "c:\";

t1 = Tick Seconds();
files = Files In Directory( dir, Recursive );
t2 = Tick Seconds();
t_getfiles = t2 - t1;
n_getfiles = N Items( files );

t3 = Tick Seconds();
// Load the file names into a table and flag the .jmp files with a formula column.
dt = New Table( "directory",
  Add Rows( 0 ),
  New Column( "filename", Character, Nominal, Set Values( files ) ),
  New Column( "isTable",
    Numeric,
    Continuous,
    Format( "Best", 12 ),
    Formula( Ends With( :filename, ".jmp" ) )
  )
);

// Make sure the formula has finished evaluating before selecting rows.
dt << Run Formulas;
dt << Select Where( :isTable == 1 );
dtFiltered = dt << Subset( Selected Rows( 1 ) );
t4 = Tick Seconds();
t_filterfiles = t4 - t3;
n_filterfiles = N Rows( dtFiltered );

Show( t_getfiles, n_getfiles, t_filterfiles, n_filterfiles );

t_getfiles = 4.01666666666642;
n_getfiles = 171635;
t_filterfiles = 0.716666666666697;
n_filterfiles = 975;

About 5 seconds in total.  The << Run Formulas message is required: without it, the data table may still be evaluating the formula for the isTable column when Select Where runs, so Select Where finds nothing and the subset is empty.
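If the rest of a script needs the matches as a list rather than a table, the filtered column can be read back out. The following is a minimal sketch, not tested on JMP 11: the small files list is a hypothetical stand-in for the output of Files In Directory, and it relies on Get Values returning a list for a character column.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-in for the list returned by Files In Directory.
files = {"a.jmp", "notes.txt", "sub/b.jmp", "c.csv"};

// Same table-based filter as above, on the small sample list.
dt = New Table( "directory",
	New Column( "filename", Character, Nominal, Set Values( files ) ),
	New Column( "isTable", Numeric, Continuous, Formula( Ends With( :filename, ".jmp" ) ) )
);
dt << Run Formulas;
dt << Select Where( :isTable == 1 );
dtFiltered = dt << Subset( Selected Rows( 1 ) );

// Get Values returns a list of strings for a character column.
filteredFiles = Column( dtFiltered, "filename" ) << Get Values;
Show( filteredFiles );

// Close the helper tables without saving once the list has been extracted.
Close( dtFiltered, No Save );
Close( dt, No Save );
```

Closing the two helper tables at the end keeps them from piling up when the script is run repeatedly.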

Craige

Craige_Hales
Super User

Re: Get and Filter Files in Directory

The first example is much faster in JMP 12 (similar speed to the data table example); it appears to still be showing some N^2 behavior in JMP 11.

Craige
robot
Level VI

Re: Get and Filter Files in Directory

Thanks Craige, that works great!