- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Extract and concatenate csv files from folders that have the same structure
Hi,
I have a whole bunch of csv files in different folders that have common name and csv file is in the same structure, for instance
<1st folder>dfgdwgssd
<2nd folder which have common name> commonname
xxx-commonname.csv
I'm thinking to use script to extract and concatenate all csv files. I found an easy way with bash commands ("for" loop together with "find"), could some one give me a hand? Thank you so much!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Here is a script that should give you a good start:
Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
// Go through all of the files and read in append those that meet your
// requirement. You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
dt = Open( dirpath || TheFileList[i] );
dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
Close( dt, nosave );
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Here is a script that should give you a good start:
Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
// Go through all of the files and read in append those that meet your
// requirement. You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
dt = Open( dirpath || TheFileList[i] );
dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
Close( dt, nosave );
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Hi Jim,
I got it working, but it listed all CSV files, I'm trying to find the way to sort out or specify the 2nd layer. could you please give a bit more tips?
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Do you know ahead of time what the "common name" is? Or do you need to have the script figure that out? And in that case, will there be more than one possible common name found in the folder?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
yes, I do know what the "common name" is for both 2nd layer and the csv files. Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
// Go through all of the files and read in append those that meet your
// requirement. You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
If( Contains ( TheFileList[i], "<2nd folder common name"),
If( Contains (TheFileList[i],"<common name of the csvfile>"),
print(TheFileList[i]);
dt = Open( dirpath || TheFileList[i] );
dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
Close( dt, nosave );
);
);
);
);
add two more lines to get it working, the alignment/indent needs to be adjusted though.
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Hi jyw,
In addition to Jim's suggestion you could try the following generalized approach for selecting the files you want. See below for example:
allFiles = FilesInDirectory("$Documents", recursive);
selectedFiles = {};
commonName = "BrownP";
For( i = 1, i<=NItems(allFiles), i++,
If( !IsMissing( Regex(allFiles, EvalInsert(".*<commonName>.*\.csv","<",">" )) ),
InsertInto(selectedFiles, allFiles);
)
);
1. Retrieve all filenames in the My Documents folder and its subfolders. (Replace with your top level folder path).
2. Create an empty list to store the filenames you want.
3. Define the commonName string.
4. Loop through all filenames, selecting only the ones that match the pattern and store in the selectedFiles list. In this example I'm retrieving all the CSV files that contain "BrownP" in the filename. (Replace with your own commonName string).
So, the full implementation (using Jim's code from the previous post):
Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
// Define common string to search for
commonName = "file";
// Go through all of the files and read in append those that meet your
// requirement. You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
If( !Is Missing( Regex( TheFileList, Eval Insert( ".*<commonName>.*\.csv", "<", ">" ) ) ),
dt = Open( dirpath || TheFileList );
dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
Close( dt, nosave );
)
);
Hope this helps!
Best,
Phil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Hi Phil,
Thank you so much for your tips! I didn't run you script though.
I was using small amount of folders to try out and realized later on that it took ages with large amount of folder (say >1,300), since JMP will go to 2nd directory and extract *.csv file and so on, very very time-consuming though.
In order to improve the efficiency, I thought about using bash to move all csv file to a folder and then concatenate them. A quick check to use bash to do this (extract and concatenate csv files) with >1,300 folders took me 3 seconds with a concatenated csv file of 13MB, 3 seconds!!! I will need to look into the possibility to run bash command inside JSL.
btw,
dt = Open( dirpath || TheFileList,invisible );
will not pop up the data table during processing
Thank you again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Sure my pleasure! I was trying to point out that you could still use JMP to retrieve the entire list of files and then use REGEX to search the for the filenames that contain <commonName> . Indeed I can imagine with 1300+ folders it could take some time.
A couple pointers on speed:
- Printing to the log does slow things down.
- Also, I would open all the files first before concatenating and closing each one.
*If you're using a Mac you can launch a shell script from JMP. On Windows, unless you're using Cygwin or something like that, Powershell is a good option to write utility procedures that can be called from JMP.
Phil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Extract and concatenate csv files from folders that have the same structure
Thank you again, Phil!
I'm using Mac, and just searched on Google, found little information about running shell from JMP, do you have some examples?