cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
View Original Published Thread

Extract and concatenate csv files from folders that have the same structure

jyw
jyw
Level II

Hi,

I have a whole bunch of csv files in different folders that have common name and csv file is in the same structure, for instance

<1st folder>dfgdwgssd

     <2nd folder which have common name> commonname

          xxx-commonname.csv

I'm thinking to use script to extract and concatenate all csv files. I found an easy way with bash commands ("for" loop together with "find"), could some one give me a hand? Thank you so much!

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User


Re: Extract and concatenate csv files from folders that have the same structure

Here is a script that should give you a good start:

 

 

Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
 
// Go through all of the files and read in append those that meet your
// requirement.  You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
       If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
              dt = Open( dirpath || TheFileList[i] );
              dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
              Close( dt, nosave );
       );
);

 

Jim

View solution in original post

9 REPLIES 9
txnelson
Super User


Re: Extract and concatenate csv files from folders that have the same structure

Here is a script that should give you a good start:

 

 

Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
 
// Go through all of the files and read in append those that meet your
// requirement.  You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
       If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
              dt = Open( dirpath || TheFileList[i] );
              dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
              Close( dt, nosave );
       );
);

 

Jim
jyw
jyw
Level II


Re: Extract and concatenate csv files from folders that have the same structure

Hi Jim,

I got it working, but it listed all CSV files, I'm trying to find the way to sort out or specify the 2nd layer. could you please give a bit more tips?

Thank you!

txnelson
Super User


Re: Extract and concatenate csv files from folders that have the same structure

Do you know ahead of time what the "common name" is?  Or do you need to have the script figure that out?  And in that case, will there be more than one possible common name found in the folder?

Jim
jyw
jyw
Level II


Re: Extract and concatenate csv files from folders that have the same structure

yes, I do know what the "common name" is for both 2nd layer and the csv files. Thank you!

jyw
jyw
Level II


Re: Extract and concatenate csv files from folders that have the same structure

 

Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
 
// Go through all of the files and read in append those that meet your
// requirement.  You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
      If( Uppercase( Word( -1, TheFileList[i], "." ) ) == "CSV" & Is File( dirpath || TheFileList[i] ) == 1,
If( Contains ( TheFileList[i], "<2nd folder common name"),
          If( Contains (TheFileList[i],"<common name of the csvfile>"),
  print(TheFileList[i]);
  dt = Open( dirpath || TheFileList[i] );
  dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
  Close( dt, nosave );
);
);
        );
 
);

 

 

add two more lines to get it working, the alignment/indent needs to be adjusted though.

 

Thank you!

Phil_Brown
Super User (Alumni)


Re: Extract and concatenate csv files from folders that have the same structure

Hi jyw​,

 

In addition to Jim's suggestion you could try the following generalized approach for selecting the files you want. See below for example:

 

 

allFiles = FilesInDirectory("$Documents", recursive);
selectedFiles = {};
commonName = "BrownP";
For( i = 1, i<=NItems(allFiles), i++,
If( !IsMissing( Regex(allFiles, EvalInsert(".*<commonName>.*\.csv","<",">" )) ),
      InsertInto(selectedFiles, allFiles);
)
 
);

 

 

 

1. Retrieve all filenames in the My Documents folder and its subfolders. (Replace with your top level folder path).

2. Create an empty list to store the filenames you want.

3. Define the commonName string.

4. Loop through all filenames, selecting only the ones that match the pattern and store in the ​selectedFiles​ list. In this example I'm retrieving all the CSV files that contain "BrownP" in the filename. (Replace with your own commonName string).

 

 

So, the full implementation (using Jim's code from the previous post):

 

 

 

Names Default To Here( 1 );
// Setup an empty table to start to concatenate other tables to
dtbase = New Table( "Base Table" );
// Have the user select the starting folder
dirpath = Pick Directory( "Pick the Top Level Directory" );
// Get all of the files in all of the folders that are under the picked folder
TheFileList = Files In Directory( dirpath, "recursive" );
 
// Define common string to search for
commonName = "file";
 
// Go through all of the files and read in append those that meet your
// requirement.  You will need to amend the IF statement to meet your criteria
For( i = 1, i <= N Items( TheFileList ), i++,
If( !Is Missing( Regex( TheFileList, Eval Insert( ".*<commonName>.*\.csv", "<", ">" ) ) ),
dt = Open( dirpath || TheFileList );
dtbase = dtbase << concatenate( dt, append to first table( 1 ) );
Close( dt, nosave );
)
 
);

 

 

 

Hope this helps!

 

Best,

Phil

PDB
jyw
jyw
Level II


Re: Extract and concatenate csv files from folders that have the same structure

Hi Phil,

 

Thank you so much for your tips! I didn't run you script though.

 

I was using small amount of folders to try out and realized later on that it took ages with large amount of folder (say >1,300), since JMP will go to 2nd directory and extract *.csv file and so on, very very time-consuming though.

 

In order to improve the efficiency, I thought about using bash to move all csv file to a folder and then concatenate them. A quick check to use bash to do this (extract and concatenate csv files) with >1,300 folders took me 3 seconds with a concatenated csv file of 13MB, 3 seconds!!! I will need to look into the possibility to run bash command inside JSL.

 

btw,

dt = Open( dirpath || TheFileList,invisible );

will not pop up the data table during processing

 

Thank you again!

Phil_Brown
Super User (Alumni)


Re: Extract and concatenate csv files from folders that have the same structure

Sure my pleasure! I was trying to point out that you could still use JMP to retrieve the entire list of files and then use REGEX to search the for the filenames that contain  <commonName> . Indeed I can imagine with 1300+ folders it could take some time.

A couple pointers on speed:

     - Printing to the log does slow things down.

     - Also, I would open all the files first before concatenating and closing each one.

*If you're using a Mac you can launch a shell script from JMP. On Windows, unless you're using Cygwin or something like that, Powershell is a good option to write utility procedures that can be called from JMP.

Phil

PDB
jyw
jyw
Level II


Re: Extract and concatenate csv files from folders that have the same structure

Thank you again, Phil!

I'm using Mac, and just searched on Google, found little information about running shell from JMP, do you have some examples?