BookmarkSubscribeRSS Feed
Byron_JMP

Staff

Joined:

Apr 26, 2012

Combine multiple txt or csv files

This script will combine multiple txt or csv files into one file. The script asks for a target directory, opens the first file and then iteratively adds the subsequent files to the first using the Table Concatenate function.  One new column is created in the final file: this column will contain the name of the file that each row came from. If all the the text files have the same (identical) column names, those columns will be stacked (concatenated) all in one column. Unique columns will also be transferred to the final file.

 

The script has several sections block commented out, these sections of the script could be used for combining multiple Excel files. When combining Excel files, there is an assumption that these .xls or .xlsx files will only have one worksheet.

 

 

-------Updated 01Jun2016

I had an interesting question about this script. Someone had a pile of CSV files where the column names were on the second row, and to make matters worse, the column names were semicolon delimited while the text was comma delimited.  One way to deal with this is to add Import Settings to the script.  This is one of those things that I'm happy to have JMP write for me.  I manually opened and imported the file (using File>Open>Open with Preview, then in the dialog set the columns to start on the second row, and added an "other" delimiter of  ";".  After importing the file I went to the source script and found the Import Settings argument.

 

To make the script open this files, I added an expression, importset=expr(import settings(...paste script here...)); up near the top of the script. (its an expression because its not useful to have the import settings evaluated until I need them later when I open the files.)

Then down near the bottom, I added the expression to the open argument, open(filename,importset). 

 

IMPORTANT NOTE:  If you're going to download this second script with the import filter and expect it to work automagically on your unusually formatted CSV tables..  its just not going to work, you have to make you own import setting and substitute them into the settings that are here for and example. 

 

--------There was also a question about how to make this work on a MAC a while back.

I added one other bit so that it works on a mac too (which means I didn't test this on a PC)

 

 

If( Host is( Mac ),
  filepath="/"||prefilepath,
filepath = Convert File Path( prefilepath, Windows )
  );

 

 

-----Updated 10Oct2016

Justin C noticed that file names with multiple "."s didn't work so well and suggested changing the "item" to be -1 so that the file extensions rather than the second item are used for file extensions. Nice fix, and an important idea to keep in mind when working with file names. Thanks Justin!

 

Comments
dale_lehman

Where is the script?  I can't seem to find it anywhere.

Byron_JMP

Dale,

Would you mind checking the attachment again?

I re-attached it.

-Byron

dale_lehman

Thanks, I got it now.

seanlee

Bryan,


I use the JMP 10.0.0 on OS X 10.10.2. There will be a window launched when I run the script for selecting the files. However, I can not select any .txt or .csv file in the launched window. I am not sure if the script needs further revision for JMP (OS X ver.). Could you help for solving this issue? Thank you.

Sean

Byron_JMP

Sean,

 

On the MAC, the dialog window to select a directory or to select a file are identical. The scrip is looking for you to identify a directory containing csv and/or txt files. The script currently allows both csv and txt files in the list of files to concatenate. If you want it to be more restrictive, then just edit out the file type you would like excluded from this part of the script.

 

 

//filter out any non-txt or csv files
For( i2 = 1, n2 >= i2, i2++,
  file=(prefilelist[i2]);
  If( Item( 2, prefilelist[i2], "." ) == "txt" | Item( 2, prefilelist[i2], "." ) == "csv",
  Insert Into( filelist,file),
  show(file)
  )
);

 

 

What would be kind of cool is if the script would return a list of the file extensions in the selected directory and then let the user check off the types of files to add to the filelist variable.

 

If you have time to do some scripting, please post the upgraded script!

 

Best regards,

Byron

kaushik_pushpen

Hi Byron,

Thank you for this valuable script. I am using this script to add multiple .txt files. But it is running after copying the same script into new script. I want to ask that why it is not running when opening the same script in first attempt. Thank you for the answer in advance. This script is very helpful.

Byron_JMP

I'm not sure I understand the question. Is this behavior replicated with other scripts?

kaushik_pushpen

Hello Byron,

It is happening only in this script. It is showing "NItem()" blank error in log window after running the script. So i just copy the whole script and paste it in new script window, then it is running ok.

Thanks

mhegazy

Hi Byron,

Nice script and works like a charm in JMP 12. Was wondering about a good way to begin reading the source data files from a specific row, truncate header info if you will.  For instance, begin with row that starts with the word "CH" if all source files format will be the same?

mhegazy

Hi Bryon and all,

I was able to add a small update: remove trailing file extensions like .csv or .txt from the source column name :

so in the last section instead of :

set each value (filenow)

 

I added :

set each value (remove (filenow,-4,-1))

I was surprised that the negative reference worked !

Byron_JMP

yes, the negative ref is handy, and it works in nearly all the functions that reference character position, as well as in some places you might not expect ;like,  imagine having -1 decimal places.

jiangrong_cao_g

This script doesn't run on my system: Mac OS X 10.10.5 + JMP Pro 12.1.0

tsolomon

Hi Byron, this script works great. I modified slightly to select a subset of the data from the files opened, add columns, change characters to numeric, etc. One question I had, is there a simple way to modify the script to specify the folder where the files are and not have a prompt pop up?

Thanks

TS

smap10060

Is there any easy way to combine multiple txt or csv files side by side using JOIN instead of vertically combining with Concatenate in Combine txt tables.jsl?

Byron_JMP

The same approach will work with "join". Just try substituting the concat argument with a join argument.  (do the join manually, and then copy and the Source script in the resulting table for an example of what the join argument would look like

Diag

Just want to say thank you Byron. That's a great and practical example.

This script saves me so much time !!!

That script does within seconds what was taking me hours and hours of work.

Thank you!!!

KST-CPT

I am having some trouble getting this to run.  I think the issue is I am using files that are essentially comma delimeted text files but with a .log extension. 

I changed the txt to a log in your code at:

Item( -1, prefilelist[i2], "." ) == "log"

 

I get an error of

Send Expects Scriptable Object in access or evaluation of 'Send' , dt <<  /*###*/Run Formulas() /*###*/

at: 

dt<< run formulas();

 

If I change the extensions of my files to .txt, it works.  However I have thousands of files I want to combine and dozens are added daily. Is there a way to get this to work with files that have a .log extension?

 

 

2nd question.

My files have no headers in them.  Is there a way to combine the files then add headers to the data table?  It currently takes the top row of the first file and makes that a header, which is all data.

 

Thanks for any help.

 

 

 

 

David_

I've created a work around for this that involves copying the files to the temp directory and renaming them. I can't see any means to attach a file so here's the code

 

names default to here (1);
Clear Symbols();

//Import settings, sometimes CSV files aren't exaclty formatted as CSV. Add import settings
//This is an expression to use later in the Open argument.
//Column names start on the second row, and are semicolon delimited
importset=expr(	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, Other( ";" ), CSV( 0 ) ),//use ";" as a delimiter too
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 2 ),//starting on second row
		Data Starts( 3 ),//data starts on 3rd row
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	));


//set up variables with starting values
filelist={};   //empty list

//launch dialog to get the file location

prefilepath = munger(Pick Directory( "Browse to the Directory of the .txt / .csv files " ),1,"/","");
If( Host is( Mac ),
	filepath="/"||prefilepath,
filepath = Convert File Path( prefilepath, Windows )
	);
prefilelist = Files In Directory( filepath );
n2=nitems(prefilelist);

//filter out any non-txt or csv files
For( i2 = 1, n2 >= i2, i2++,
	file=(prefilelist[i2]);
	If( Item( 2, prefilelist[i2], "." ) == "txt" | Item( 2, prefilelist[i2], "." ) == "csv",
		 Insert Into( filelist,file),
		 show(file)
	)
);
nf=nitems(filelist); //number of items in the working list tempDir = "$TEMP\JMP\"; Create Directory(tempDir); //create the temp folder in the "%temp%" windows directory //for each .csv file found on the server For( i3 = 1 , i3 <= nf, i3++, copyFileName = filelist[i3]; csvPos = Contains(filelist[i3],".csv"); //determine where in the filename .csv is newFileName = Left(copyFileName, csvPos + 3 ); //determine the filename without the extra bits after .csv Copy File (filepath||filelist[i3], tempDir||copyFileName);//copy the file to the temp folder Rename File (tempDir||copyFileName, newFileName);//rename the file without the extra bits after .csv filelist[i3] = newFileName; //replace the reference in the array to this file );//end of the for loop cctable= New Table( "Combined data table ");//make an empty table cctable<<New Column( "Source", Character, Nominal ); For( iii = 1 , iii <= nf, iii++, //this starts the first loop filenow = ( filelist[iii] ); fileopen=(tempDir||filenow); //dt=open(fileopen,private); dt=open(fileopen,importset,private);//Import settings used in the open argument New Column( "Source", Character, Nominal ); :Source << set each value( filenow ); //dt<<new column("Source", character, nominal)<<set each value(9999); dt << Run Formulas(); //add the current table to the bottom of the combined data table cctable << Concatenate( Data Table( dt ), Append to first table ); //don't use "Create Source Column" argument Close( dt, NoSave );//after concatenating the table, close it and move on );//end of the first for loop delete directory(tempDir); //delete the temp directory
Article Labels
Contributors