Subscribe Bookmark RSS Feed

Extract and concatenate csv files from folders that have the same structure

jyw

Occasional Contributor

Joined:

Jul 4, 2016

Hi,

I have a whole bunch of csv files in different folders that have common name and csv file is in the same structure, for instance

<1st folder>dfgdwgssd

     <2nd folder which have common name> commonname

          xxx-commonname.csv

I'm thinking to use script to extract and concatenate all csv files. I found an easy way with bash commands ("for" loop together with "find"), could some one give me a hand? Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Here is a script that should give you a good start:

Names Default To Here( 1 );

// Setup an empty table to start to concatenate other tables to

dtbase = New Table( "Base Table" );

// Have the user select the starting folder

dirpath = Pick Directory( "Pick the Top Level Directory" );

// Get all of the files in all of the folders that are under the picked folder

TheFileList = Files In Directory( dirpath, "recursive" );

// Go through all of the files and read in append those that meet your

// requirement.  You will need to amend the IF statement to meet your criteria

For( i = 1, i <= N Items( TheFileList ), i++,

       If( Uppercase( Word( -1, TheFileList, "." ) ) == "CSV" & Is File( dirpath || TheFileList ) == 1,

              dt = Open( dirpath || TheFileList );

              dtbase = dtbase << concatenate( dt, append to first table( 1 ) );

              Close( dt, nosave );

       );

);

Jim
9 REPLIES
Solution

Here is a script that should give you a good start:

Names Default To Here( 1 );

// Setup an empty table to start to concatenate other tables to

dtbase = New Table( "Base Table" );

// Have the user select the starting folder

dirpath = Pick Directory( "Pick the Top Level Directory" );

// Get all of the files in all of the folders that are under the picked folder

TheFileList = Files In Directory( dirpath, "recursive" );

// Go through all of the files and read in append those that meet your

// requirement.  You will need to amend the IF statement to meet your criteria

For( i = 1, i <= N Items( TheFileList ), i++,

       If( Uppercase( Word( -1, TheFileList, "." ) ) == "CSV" & Is File( dirpath || TheFileList ) == 1,

              dt = Open( dirpath || TheFileList );

              dtbase = dtbase << concatenate( dt, append to first table( 1 ) );

              Close( dt, nosave );

       );

);

Jim
jyw

Occasional Contributor

Joined:

Jul 4, 2016

Hi Jim,

I got it working, but it listed all CSV files, I'm trying to find the way to sort out or specify the 2nd layer. could you please give a bit more tips?

Thank you!

txnelson

Super User

Joined:

Jun 22, 2012

Do you know ahead of time what the "common name" is?  Or do you need to have the script figure that out?  And in that case, will there be more than one possible common name found in the folder?

Jim
jyw

Occasional Contributor

Joined:

Jul 4, 2016

yes, I do know what the "common name" is for both 2nd layer and the csv files. Thank you!

jyw

Occasional Contributor

Joined:

Jul 4, 2016

Names Default To Here( 1 );

// Setup an empty table to start to concatenate other tables to

dtbase = New Table( "Base Table" );

// Have the user select the starting folder

dirpath = Pick Directory( "Pick the Top Level Directory" );

// Get all of the files in all of the folders that are under the picked folder

TheFileList = Files In Directory( dirpath, "recursive" );

// Go through all of the files and read in append those that meet your

// requirement.  You will need to amend the IF statement to meet your criteria

For( i = 1, i <= N Items( TheFileList ), i++,

      If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,

If( Contains ( TheFileList[i], "<2nd folder common name"),

          If( Contains (TheFileList[i],"<common name of the csvfile>"),

  print(TheFileList[i]);

  dt = Open( dirpath || TheFileList[i] );

  dtbase = dtbase << concatenate( dt, append to first table( 1 ) );

  Close( dt, nosave );

);

);

        );

);


add two more lines to get it working, the alignment/indent needs to be adjusted though.


Thank you!

Phil_Brown

Super User

Joined:

Mar 20, 2012

Hi jyw​,

In addition to Jim's suggestion you could try the following generalized approach for selecting the files you want. See below for example:

allFiles = FilesInDirectory("$Documents", recursive);

selectedFiles = {};

commonName = "BrownP";

For( i = 1, i<=NItems(allFiles), i++,

If( !IsMissing( Regex(allFiles, EvalInsert(".*<commonName>.*\.csv","<",">" )) ),

      InsertInto(selectedFiles, allFiles);

)

);

1. Retrieve all filenames in the My Documents folder and its subfolders. (Replace with your top level folder path).

2. Create an empty list to store the filenames you want.

3. Define the commonName string.

4. Loop through all filenames, selecting only the ones that match the pattern and store in the ​selectedFiles​ list. In this example I'm retrieving all the CSV files that contain "BrownP" in the filename. (Replace with your own commonName string).

So, the full implementation (using Jim's code from the previous post):

Names Default To Here( 1 );

// Setup an empty table to start to concatenate other tables to

dtbase = New Table( "Base Table" );

// Have the user select the starting folder

dirpath = Pick Directory( "Pick the Top Level Directory" );

// Get all of the files in all of the folders that are under the picked folder

TheFileList = Files In Directory( dirpath, "recursive" );

// Define common string to search for

commonName = "file";

// Go through all of the files and read in append those that meet your

// requirement.  You will need to amend the IF statement to meet your criteria

For( i = 1, i <= N Items( TheFileList ), i++,

If( !Is Missing( Regex( TheFileList, Eval Insert( ".*<commonName>.*\.csv", "<", ">" ) ) ),

dt = Open( dirpath || TheFileList );

dtbase = dtbase << concatenate( dt, append to first table( 1 ) );

Close( dt, nosave );

)

);

Hope this helps!

Best,

Phil

PDB
jyw

Occasional Contributor

Joined:

Jul 4, 2016

Hi Phil,

Thank you so much for your tips! I didn't run you script though.

I was using small amount of folders to try out and realized later on that it took ages with large amount of folder (say >1,300), since JMP will go to 2nd directory and extract *.csv file and so on, very very time-consuming though.

In order to improve the efficiency, I thought about using bash to move all csv file to a folder and then concatenate them. A quick check to use bash to do this (extract and concatenate csv files) with >1,300 folders took me 3 seconds with a concatenated csv file of 13MB, 3 seconds!!! I will need to look into the possibility to run bash command inside JSL.

btw, dt = Open( dirpath || TheFileList,invisible ); will not pop up the data table during processing

Thank you again!

Phil_Brown

Super User

Joined:

Mar 20, 2012

Sure my pleasure! I was trying to point out that you could still use JMP to retrieve the entire list of files and then use REGEX to search the for the filenames that contain  <commonName> . Indeed I can imagine with 1300+ folders it could take some time.

A couple pointers on speed:

     - Printing to the log does slow things down.

     - Also, I would open all the files first before concatenating and closing each one.

*If you're using a Mac you can launch a shell script from JMP. On Windows, unless you're using Cygwin or something like that, Powershell is a good option to write utility procedures that can be called from JMP.

Phil

PDB
jyw

Occasional Contributor

Joined:

Jul 4, 2016

Thank you again, Phil!

I'm using Mac, and just searched on Google, found little information about running shell from JMP, do you have some examples?