- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Multiple File Import, file filtering
Hello everyone,
JSL newbie here! I need to Import Multiple Files, where the file names that are imported have some parameters that need to be met.
- Cannot contain certain characters. These characters are always at the end of the file name.
- Must include a certain string
- If there are identical string segments within the file names found, only the one with the most recent system date & time stamp are imported. I need a way to chop up and compare parts of file names.
Ex: I want all fruit smoothie drinks containing “Apple”, but not “Tomato”. (file name, date & time YYYYMMDD HHMM)
Kiwi_Apple_Mango.csv 20220125 1421
Apple_Grape_Kiwi.csv 20220125 1243
Durien_Tomato.csv 20220126 1607
Apple_Grape_Kiwi_Tomato.csv 20220126 1537
Mango_Banana_Durien.csv 20220126 0915
Grape_Pineapple_Apple_Avocado.csv 20220126 0711
Apple_Grape_Kiwi.csv 20220127 1111
The naming of the files does have structure. The string I’m looking for is after multiple delimiters.
Additional asks
- Are there any tips for speeding up the search for these files in a folder where many thousands of files are stored? I can only filter by extension after the initial search. Currently, it takes many minutes to even search and find all the files in this folder.
- Any tips for importing a growing number of files? The number of imported files will likely top 1000 in a few months. All of them have the same data structure. I’m aware I need to “Stack Similar Files”. These are all data tables.
I’ve looked up the functions Words and Munger, but I’m not sure they’re what I need to complete the file import script.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Yes, but it is hard to say which one will be best for you.
You might be able to use contains(), and ignore the _ delimiters completely. I don't know if the tokens pineapple and apple are a problem or not, but they could be since one contains the other. Grape and Grapefruit too. Maybe uppercase/lower case resolves it?
You might want to learn about regex. (I used regular-expressions.info .) This function returns the character match, or a numeric missing value
regex("applepearorange","carrot|pear")
You can use it with isstring(...) or ismissing(...) to get a true/false answer. The | in the regex is for alternatives.
You can also use a regex with contains_item(). I've never used it except for this quick test, and this is the only way I found to pass it a pattern:
Contains Item( "apple_grapefruit_lime", Pat Regex( "carrot|grape" ), "_" )
Which means split the input on _ and test each segment with the regex.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Multiple File Import, file filtering
First I would get a list of files. There are many methods of doing this in JMP and which to use depends on the speed requirements (and possible data requirements) you have. To get a list of files you can use can use operating system commands (in Windows command-line there is dir and in powershell for example get-childitem (docs.microsoft.com) to get the file list or directly with JMP you can use Files In Directory() or Multiple File Import() (see Directory Tree: Explore Space Used by Folders for MFI example).
After you have a list of files in a list or in JMP data table you can start filtering it down with some algorithm. Here are some ideas:
- Use Ends With() to check for end of string
- Use Contains() to check for certain string and !Contains() to exclude other strings
- For identical segments (is it just one common word, combination of words, words in specific order...) Words() and Associative Arrays() are most likely helpful here
- See Scripting Indexes section for Character functions for more ideas
Without knowing how the datatable will be used, it is a bit difficult to give good tips on the additional asks, besides optional ways of loading the initial list of files. One option would be to create the datatable with some scheduled tasks so it would be ready for users when they need it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Thank you for your reply, jthi! I'll be able to test this by early next week and get back to you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Hello jthi,
As soon as I add in Contains to the script, it keeps looping to find files in the folder and never imports the data. Any tips?
Multiple File Import(
<<Set Folder("C:\Data\"),
<<Set Show Hidden( 0 ),
<<Set Subfolders( 0 ),
<<Set Name Filter( "*.csv; " ),
<<Set Name Enable( 1 ),
<<Set Size Filter( {456, 1862046} ),
<<Set Size Enable( 0 ),
<<Set Date Filter( {3734756690.048, 3734950423.726} ),
<<Set Date Enable( 0 ),
<<Set Add File Name Column( 1 ),
<<Set Add File Size Column( 0 ),
<<Set Add File Date Column( 0 ),
<<Set Import Mode( "CSVData" ),
<<Set Charset( "Best Guess" ),
<<Set Stack Mode( "Stack Similar" ),
<<Set CSV Has Headers( 1 ),
<<Set CSV Allow Numeric( 1 ),
<<Set CSV First Header Line( 1 ),
<<Set CSV Number Of Header Lines( 1 ),
<<Set CSV First Data Line( 2 ),
<<Set CSV EOF Comma( 1 ),
<<Set CSV EOF Tab( 0 ),
<<Set CSV EOF Space( 0 ),
<<Set CSV EOF Spaces( 0 ),
<<Set CSV EOF Other( "" ),
<<Set CSV EOL CRLF( 1 ),
<<Set CSV EOL CR( 1 ),
<<Set CSV EOL LF( 1 ),
<<Set CSV EOL Semicolon( 0 ),
<<Set CSV EOL Other( "" ),
<<Set CSV Quote( "\!"" ),
<<Set CSV Escape( "" ),
<<Set XML Method( "guess" ),
<<Set XML Guess( "huge" ),
<<Set XML Settings( XML Settings() ),
<<Set JSON Method( "guess" ),
<<Set JSON Guess( "huge" ),
<<Set JSON Settings( JSON Settings() ),
<<Set Import Callback( Empty() ),
<<!Contains(part, "Tomato"),
<<Contains(part, "Apple")
) << Import Data
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
(de-quarantined, sorry, hope I picked the right one, not obvious why it was flagged.)
contains() is a function you could use, element-by-element, on the list of files returned from FilesInDirectory(...).
You can pass filenames to the filter that look like this:
??apple*pear*.csv
Each ? matches 1 and each * matches 0 or more. w9apple_fig_pear_grape.csv would match.
You could probably use something like this
list = files in directory("$desktop",recursive(1));
list = filtereach({name},list,contains(name,".txt"));
namefilterdata = concatitems(list,";");
to create the value to pass to the MFI name filter.
The <<methodname list in the MFI parens can only set the values in the interface for MFI.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Just wanted to say that I'll come back to this thread. My focus needs to be elsewhere at work at the moment. I haven't forgotten.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Hello, I've successfully populated a list of file names using Files In Directory(). I've investigated Filter Each(), but I'm not able to add as many conditions as I'd like, so I'm looking to chain some conditional If statements nested in a For loop. In the background I have a dt of each of the file names, one in each row, all in a single column.
Basically, I want to inspect each element using !contains() or contains() and delete the row, or keep it, accordingly. Yes, I'm aware this will cause problems with the " i " counter. The counter is easy enough to adjust. One step at a time though. When I run this script, no error pops up, and nothing happens. I checked the debugger and wasn't able to find any type of counter for contains(), so I'm a bit confused as to why this isn't working. Any insight?
files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();
For( i = 1, i <= N Rows( dt ), i++,
if( :Column 1[i] == contains(:Column 1[i], "abc"),
dt <<delete row(i),
continue()
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
I've realized part of the problem is with the condition of my if statement, so I've revised to the code below. It always continues, unless I edit the code to search for the beginning substring of the whole. What I'm searching for is within the string and if I search backwards or forwards, only testing the beginning or end of the string isn't what I'm looking for.
files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();
For( i = 1, i <= N Rows( dt ), i++,
if( contains(:Column 1[i], "abc") == 1,
dt <<delete row(i),
continue()
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
I've now modified my code as below, and it's working, but not optimal. "abc" is not always alone and separate between delimiters "_". For example, sometimes I may have "_xyzabc_", or "_abcxyz_". Is there a function that iterates through the entire string to find this set of characters? The string is always in a certain position within the string.
files_list = Files In Directory( "C:\Data\" );
dt = Current Data Table();
For( i = 1, i <= N Rows( dt ), i++,
if( Contains Item(:Column 1[i], "abc", "_") == 1,
dt <<Delete Row(i),
continue()
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Multiple File Import, file filtering
Yes, but it is hard to say which one will be best for you.
You might be able to use contains(), and ignore the _ delimiters completely. I don't know if the tokens pineapple and apple are a problem or not, but they could be since one contains the other. Grape and Grapefruit too. Maybe uppercase/lower case resolves it?
You might want to learn about regex. (I used regular-expressions.info .) This function returns the character match, or a numeric missing value
regex("applepearorange","carrot|pear")
You can use it with isstring(...) or ismissing(...) to get a true/false answer. The | in the regex is for alternatives.
You can also use a regex with contains_item(). I've never used it except for this quick test, and this is the only way I found to pass it a pattern:
Contains Item( "apple_grapefruit_lime", Pat Regex( "carrot|grape" ), "_" )
Which means split the input on _ and test each segment with the regex.