rverma
Level III

Parallel Assign concatenating JMP tables inside a for loop

I have several folders, each containing several CSV files of around 500,000 rows each. I use a JSL script to concatenate all the CSV files inside a folder into one JMP table. This is repeated for each folder, so in the end there are as many concatenated JMP tables as there are folders. But it takes a lot of time.

I searched for parallel programming in JSL and learned about Parallel Assign. I need some guidance on how to put the script below inside Parallel Assign to speed up the concatenation. I would greatly appreciate any help.

path = munger(Pick Directory( "Browse to the Directory of the .txt / .csv files " ),1,"/","");
Print(dirList(path));
folderlist={};
folderlist = dirList(path);
count=nitems(folderlist);
//prefolderlist = Files In Directory( path );
//filepath = Convert File Path(path, Windows )
For( j2 = 1, count >= j2, j2++,
	folderpath= path || folderlist[j2] || "/" ;
	Print(folderpath);
	prefilelist = Files In Directory( folderpath );
	n2=nitems(prefilelist);
	
	filelist = {};
	
	//filter out any non-txt or csv files
	For( i2 = 1, n2 >= i2, i2++,
		file=(prefilelist[i2]);
		If( Item( 2, prefilelist[i2], "." ) == "txt" | Item( 2, prefilelist[i2], "." ) == "csv",
			Insert Into( filelist,file),
			show(file)
		)
	);
	nf=nitems(filelist); //number of items in the working list
	cctable= New Table( "Combined data table ");//make an empty table
	cctable<<New Column( "Source", Character, Nominal );
	For( iii = 1 , iii <= nf, iii++, //loop over the files in this folder
		filenow = ( filelist[iii] );
		fileopen=(folderpath||filenow);
		//dt=open(fileopen,private);
		dt=open(fileopen,importset,private);//Import settings used in the open argument

		//send New Column to dt explicitly rather than relying on the current data table
		dt << New Column( "Source", Character, Nominal, Set Each Value( filenow ) );
		//dt<<new column("Source", character, nominal)<<set each value(9999);
		dt << Run Formulas();
		//add the current table to the bottom of the combined data table
		cctable << Concatenate( Data Table( dt ), Append to first table );
		//don't use "Create Source Column" argument
		Close( dt, NoSave );//after concatenating the table, close it and move on
	);//end of the inner (file) loop
);

Accepted Solutions
Craige_Hales
Super User

Re: Parallel Assign concatenating JMP tables inside a for loop

Take a look at File > Import Multiple Files (Multiple File Import, or MFI). It can dig through nested directories, select files by filename pattern, and concatenate similar files. It is also faster than using Open() on a CSV, usually even for a single file. It generates a script that you can modify and reuse, and it can add a column with source file information.

Start interactively: look for the checkbox that keeps the window open, then run a couple of experiments.

The GUI has four filters that determine which files are selected. The folder/hidden/recursive filter is always visible, and three checkboxes show and enable the filename, file time, and file size filters. The file list shows what is selected; it is not for selecting files.

Worst case, you can use MFI on each directory, one at a time, and probably get a nice speed up. Best case it might do what you want on the entire nest of directories.
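
The one-directory-at-a-time approach could be scripted roughly as follows. This is a minimal sketch, not a tested solution: the root path and the table naming are hypothetical, the messages are the kind that the Import Multiple Files dialog writes into its generated script, and settings not listed here are left at whatever the platform defaults to. The return value of Import Data is assumed to be a list of data tables. Generating the script from the dialog for your own files is the safer starting point.

```jsl
Names Default To Here( 1 );

// Hypothetical root folder; replace with your own or use Pick Directory().
root = "C:/data/";

// Files In Directory() lists both files and subfolders of root.
entries = Files In Directory( root );

For( i = 1, i <= N Items( entries ), i++,
	folder = root || entries[i] || "/";
	// Skip plain files; only subfolders are imported.
	If( Directory Exists( folder ),
		// One MFI run per folder, stacking the CSV files found there.
		dtList = Multiple File Import(
			<<Set Folder( folder ),
			<<Set Subfolders( 0 ),
			<<Set Name Filter( "*.csv;" ),
			<<Set Name Enable( 1 ),
			<<Set Add File Name Column( 1 ),
			<<Set Import Mode( "CSVData" ),
			<<Set Stack Mode( "Stack Similar" )
		) << Import Data;
		// Assumption: Import Data returns a list of tables;
		// similar files should stack into a single table.
		If( N Items( dtList ) > 0,
			dtList[1] << Set Name( "Combined " || entries[i] )
		);
	);
);
```

Setting Set Subfolders( 1 ) on the root folder instead would correspond to the best case, where a single run handles the entire nest of directories, though the result would then be stacked across folders rather than one table per folder.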

Parallel Assign is designed to fill in an array AND to run the JSL that fills in each array element in an isolated environment that prevents errors between threads. It tries to avoid things like accessing a data table at the same time from two different threads. If you find a way to do it, it will probably crash.
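
For contrast, the kind of job Parallel Assign is meant for looks more like the sketch below: each matrix element is computed from its own indices plus values copied in through the first argument, with no data tables involved. The variable names are illustrative, and the exact syntax should be checked against the Scripting Index for your JMP version.

```jsl
Names Default To Here( 1 );

scale = 10;        // outer value, copied into the isolated environment as s
m = J( 3, 4, 0 );  // result matrix, allocated before the call

// Each element m[r, c] is filled independently from r, c, and the copied-in s.
Parallel Assign( {s = scale}, m[r, c] = r * s + c );

Show( m );
```

Opening files and concatenating tables does not fit this element-by-element pattern, which is why the import step is better handed to MFI.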

Craige

rverma
Level III

Re: Parallel Assign concatenating JMP tables inside a for loop

Thank you so much. I was able to incorporate Import Multiple Files into my script, and it now concatenates the tables a lot faster.