cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
StarfruitBob
Level VI

Multiple File Import, file filtering

Hello everyone,

 

JSL newbie here! I need to Import Multiple Files, where the file names that are imported have some parameters that need to be met.

 

  • Cannot contain certain characters.  These characters are always at the end of the file name.
  • Must include a certain string
  • If there are identical string segments within the file names found, only the one with the most recent system date & time stamp are imported. I need a way to chop up and compare parts of file names.

 

Ex: I want all fruit smoothie drinks containing “Apple”, but not “Tomato”. (file name, date & time YYYYMMDD HHMM)

 

Kiwi_Apple_Mango.csv                                            20220125 1421

Apple_Grape_Kiwi.csv                                              20220125 1243

Durien_Tomato.csv                                                  20220126 1607

Apple_Grape_Kiwi_Tomato.csv                              20220126 1537

Mango_Banana_Durien.csv                                     20220126 0915

Grape_Pineapple_Apple_Avocado.csv                   20220126 0711

Apple_Grape_Kiwi.csv                                              20220127 1111

 

The naming of the files does have structure.  The string I’m looking for is after multiple delimiters.

 

Additional asks

  • Are there any tips for speeding up the search for these files in a folder where many thousands of files are stored?  I can only filter by extension after the initial search. Currently, it takes many minutes to even search and find all the files in this folder.
  • Any tips for importing a growing number of files? The number of imported files will likely top 1000 in a few months.  All of them have the same data structure.  I’m aware I need to “Stack Similar Files”.  These are all data tables.

 

I’ve looked up the functions Words and Munger, but I’m not sure they’re what I need to complete the file import script.

Learning every day!
1 ACCEPTED SOLUTION

Accepted Solutions
Craige_Hales
Super User

Re: Multiple File Import, file filtering

Yes, but it is hard to say which one will be best for you.

 

You might be able to use contains(), and ignore the _ delimiters completely. I don't know if the tokens pineapple and apple are a problem or not, but they could be since one contains the other. Grape and Grapefruit too. Maybe uppercase/lower case resolves it?

 

You might want to learn about regex. (I used regular-expressions.info .) This function returns the character match, or a numeric missing value

regex("applepearorange","carrot|pear")

You can use it with isstring(...) or ismissing(...) to get a true/false answer. The | in the regex is for alternatives.

 

You can also use a regex with contains_item(). I've never used it except for this quick test, and this is the only way I found to pass it a pattern:

Contains Item( "apple_grapefruit_lime", Pat Regex( "carrot|grape" ), "_" )

Which means split the input on _ and test each segment with the regex.

 

Craige

View solution in original post

12 REPLIES 12
jthi
Super User

Multiple File Import, file filtering

First I would get a list of files. There are many methods of doing this in JMP and which to use depends on the speed requirements (and possible data requirements) you have. To get a list of files you can use can use operating system commands (in Windows command-line there is dir and in powershell for example get-childitem (docs.microsoft.com)  to get the file list or directly with JMP you can use Files In Directory() or Multiple File Import() (see Directory Tree: Explore Space Used by Folders  for MFI example). 

 

After you have a list of files in a list or in JMP data table you can start filtering it down with some algorithm. Here are some ideas:

  • Use Ends With() to check for end of string
  • Use Contains() to check for certain string and !Contains() to exclude other strings
  • For identical segments (is it just one common word, combination of words, words in specific order...) Words() and Associative Arrays() are most likely helpful here
  • See Scripting Indexes section for Character functions for more ideas

Without knowing how the datatable will be used, it is a bit difficult to give good tips on the additional asks, besides optional ways of loading the initial list of files. One option would be to create the datatable with some scheduled tasks so it would be ready for users when they need it.

-Jarmo
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

Thank you for your reply, jthi!  I'll be able to test this by early next week and get back to you.

Learning every day!
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

Hello jthi,

 

As soon as I add in Contains to the script, it keeps looping to find files in the folder and never imports the data. Any tips?

 

Multiple File Import(
	<<Set Folder("C:\Data\"),
	<<Set Show Hidden( 0 ),
	<<Set Subfolders( 0 ),
	<<Set Name Filter( "*.csv; " ),
	<<Set Name Enable( 1 ),
	<<Set Size Filter( {456, 1862046} ),
	<<Set Size Enable( 0 ),
	<<Set Date Filter( {3734756690.048, 3734950423.726} ),
	<<Set Date Enable( 0 ),
	<<Set Add File Name Column( 1 ),
	<<Set Add File Size Column( 0 ),
	<<Set Add File Date Column( 0 ),
	<<Set Import Mode( "CSVData" ),
	<<Set Charset( "Best Guess" ),
	<<Set Stack Mode( "Stack Similar" ),
	<<Set CSV Has Headers( 1 ),
	<<Set CSV Allow Numeric( 1 ),
	<<Set CSV First Header Line( 1 ),
	<<Set CSV Number Of Header Lines( 1 ),
	<<Set CSV First Data Line( 2 ),
	<<Set CSV EOF Comma( 1 ),
	<<Set CSV EOF Tab( 0 ),
	<<Set CSV EOF Space( 0 ),
	<<Set CSV EOF Spaces( 0 ),
	<<Set CSV EOF Other( "" ),
	<<Set CSV EOL CRLF( 1 ),
	<<Set CSV EOL CR( 1 ),
	<<Set CSV EOL LF( 1 ),
	<<Set CSV EOL Semicolon( 0 ),
	<<Set CSV EOL Other( "" ),
	<<Set CSV Quote( "\!"" ),
	<<Set CSV Escape( "" ),
	<<Set XML Method( "guess" ),
	<<Set XML Guess( "huge" ),
	<<Set XML Settings( XML Settings() ),
	<<Set JSON Method( "guess" ),
	<<Set JSON Guess( "huge" ),
	<<Set JSON Settings( JSON Settings() ),
	<<Set Import Callback( Empty() ),
	<<!Contains(part, "Tomato"),
	<<Contains(part, "Apple")
) << Import Data
Learning every day!
Craige_Hales
Super User

Re: Multiple File Import, file filtering

(de-quarantined, sorry, hope I picked the right one, not obvious why it was flagged.)

contains() is a function you could use, element-by-element, on the list of files returned from FilesInDirectory(...).

You can pass filenames to the filter that look like this:

??apple*pear*.csv

Each ? matches 1 and each * matches 0 or more.  w9apple_fig_pear_grape.csv would match.

You could probably use something like this

list = files in directory("$desktop",recursive(1));
list = filtereach({name},list,contains(name,".txt"));
namefilterdata = concatitems(list,";");

to create the value to pass to the MFI name filter.

 

The <<methodname list in the MFI parens can only set the values in the interface for MFI.

Craige
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

Just wanted to say that I'll come back to this thread. My focus needs to be elsewhere at work at the moment. I haven't forgotten.

Learning every day!
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

Hello, I've successfully populated a list of file names using Files In Directory().  I've investigated  Filter Each(), but I'm not able to add as many conditions as I'd like, so I'm looking to chain some conditional If statements nested in a For loop. In the background I have a dt of each of the file names, one in each row, all in a single column.

Basically, I want to inspect each element using !contains() or contains() and delete the row, or keep it, accordingly.  Yes, I'm aware this will cause problems with the " i " counter.  The counter is easy enough to adjust. One step at a time though. When I run this script, no error pops up, and nothing happens.  I checked the debugger and wasn't able to find any type of counter for contains(), so I'm a bit confused as to why this isn't working.  Any insight?

files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();

For( i = 1, i <= N Rows( dt ), i++,
	if( :Column 1[i] == contains(:Column 1[i], "abc"),
	dt <<delete row(i),
	continue()
	);	
);
Learning every day!
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

I've realized part of the problem is with the condition of my if statement, so I've revised to the code below.  It always continues, unless I edit the code to search for the beginning substring of the whole. What I'm searching for is within the string and if I search backwards or forwards, only testing the beginning or end of the string isn't what I'm looking for. 

files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();

For( i = 1, i <= N Rows( dt ), i++,
	if( contains(:Column 1[i], "abc") == 1,
	dt <<delete row(i),
	continue()
	);	
);
Learning every day!
StarfruitBob
Level VI

Re: Multiple File Import, file filtering

I've now modified my code as below, and it's working, but not optimal. "abc" is not always alone and separate between delimiters "_".  For example, sometimes I may have "_xyzabc_", or "_abcxyz_". Is there a function that iterates through the entire string to find this set of characters?  The string is always in a certain position within the string.

 

files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();

For( i = 1, i <= N Rows( dt ), i++,
	if( Contains Item(:Column 1[i], "abc", "_") == 1,
	dt <<Delete Row(i),
	continue()
	);	
);
Learning every day!
Craige_Hales
Super User

Re: Multiple File Import, file filtering

Yes, but it is hard to say which one will be best for you.

 

You might be able to use contains(), and ignore the _ delimiters completely. I don't know if the tokens pineapple and apple are a problem or not, but they could be since one contains the other. Grape and Grapefruit too. Maybe uppercase/lower case resolves it?

 

You might want to learn about regex. (I used regular-expressions.info .) This function returns the character match, or a numeric missing value

regex("applepearorange","carrot|pear")

You can use it with isstring(...) or ismissing(...) to get a true/false answer. The | in the regex is for alternatives.

 

You can also use a regex with contains_item(). I've never used it except for this quick test, and this is the only way I found to pass it a pattern:

Contains Item( "apple_grapefruit_lime", Pat Regex( "carrot|grape" ), "_" )

Which means split the input on _ and test each segment with the regex.

 

Craige