Parallel Assign concatenating JMP tables inside a for loop
I have several folders, each containing several CSV files of around 500,000 rows apiece. I use a JSL script to concatenate all the CSV files inside each folder into one JMP table. The process is repeated for every folder, so in the end there are as many concatenated JMP tables as there are folders. But it takes a lot of time.
I searched for parallel programming in JSL and learned about Parallel Assign. I need some guidance on how to put the script below inside Parallel Assign to speed up the concatenation. Any help would be greatly appreciated.
path = munger( Pick Directory( "Browse to the Directory of the .txt / .csv files " ), 1, "/", "" );
Print( dirList( path ) );
folderlist = {};
folderlist = dirList( path );
count = nitems( folderlist );
//prefolderlist = Files In Directory( path );
//filepath = Convert File Path(path, Windows )
For( j2 = 1, count >= j2, j2++,
	folderpath = path || folderlist[j2] || "/";
	Print( folderpath );
	prefilelist = Files In Directory( folderpath );
	n2 = nitems( prefilelist );
	filelist = {};
	//filter out any non-txt or csv files
	For( i2 = 1, n2 >= i2, i2++,
		file = (prefilelist[i2]);
		If( Item( 2, prefilelist[i2], "." ) == "txt" | Item( 2, prefilelist[i2], "." ) == "csv",
			Insert Into( filelist, file ),
			show( file )
		)
	);
	nf = nitems( filelist ); //number of items in the working list
	cctable = New Table( "Combined data table " ); //make an empty table
	cctable << New Column( "Source", Character, Nominal );
	For( iii = 1, iii <= nf, iii++, //this starts the first loop
		filenow = (filelist[iii]);
		fileopen = (folderpath || filenow);
		//dt=open(fileopen,private);
		dt = open( fileopen, importset, private ); //Import settings used in the open argument (importset is defined elsewhere)
		//send New Column to dt explicitly rather than relying on the current data table
		dt << New Column( "Source", Character, Nominal, Set Each Value( filenow ) );
		//dt<<new column("Source", character, nominal)<<set each value(9999);
		dt << Run Formulas();
		//add the current table to the bottom of the combined data table
		cctable << Concatenate( Data Table( dt ), Append to first table ); //don't use "Create Source Column" argument
		Close( dt, NoSave ); //after concatenating the table, close it and move on
	); //end of the first for loop
);
Accepted Solutions
Re: Parallel Assign concatenating JMP tables inside a for loop
Take a look at File > Import Multiple Files (Multiple File Import, or MFI). It can dig through nested directories, select by filename patterns, and concatenate similar files. It is also faster than using open() on a CSV, usually even for a single file. It generates a script that you can modify and reuse, and it can add a column with source file information.
Start interactively: look for the checkbox that keeps the window open, then do a couple of experiments.
The GUI has four filters that choose the files selected. The folder/hidden/recursive filter is always visible, and there are three checkboxes to show and enable the filename, file time, and file size filters. The file list shows what is selected; it is not for selecting files.
Worst case, you can use MFI on each directory, one at a time, and probably get a nice speedup. Best case, it might do what you want on the entire nest of directories.
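For the "one directory at a time" case, a rough sketch of what the loop could look like follows. It is only a sketch under stated assumptions: the root path is hypothetical, the option values are guesses at what suits these files, and the option names are the ones I would expect in a script saved from the Import Multiple Files window. Generating the script from the GUI and pasting it in is the safer way to get the exact messages for your JMP version.

```jsl
// Sketch only: root path and option values are assumptions, not the poster's settings.
root = "C:/data/";                        // hypothetical parent directory, with trailing slash
entries = Files In Directory( root );     // lists both files and subfolders
combined = {};                            // collects one stacked table per folder
For( i = 1, i <= N Items( entries ), i++,
	folder = root || entries[i] || "/";
	If( Is Directory( folder ),
		// Import Data returns a list of the tables it created
		tables = Multiple File Import(
			<<Set Folder( folder ),
			<<Set Subfolders( 0 ),                  // stay inside this one folder
			<<Set Name Filter( "*.csv;*.txt;" ),    // filename filter
			<<Set Name Enable( 1 ),
			<<Set Add File Name Column( 1 ),        // source-file column
			<<Set Import Mode( "CSVData" ),
			<<Set Stack Mode( "Stack Similar" )     // concatenate files with matching columns
		) << Import Data;
		Insert Into( combined, tables );
	)
);
```

With Stack Similar, files whose columns match should end up in a single table per folder; files with different layouts would come back as separate tables in the same list.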
Parallel Assign is designed to fill in an array AND to run the JSL that fills in each array element in an isolated environment that prevents errors between threads. It tries to avoid things like accessing a data table at the same time from two different threads. If you find a way to do it, it will probably crash.
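For contrast, here is the kind of job Parallel Assign is meant for, as a minimal sketch: every element of a preallocated matrix is computed independently from its indices and from values copied in through the first argument. The names result, offset, and k are made up for illustration, and no data tables are touched.

```jsl
// Sketch only: fills a 3 x 4 matrix, each element computed independently.
result = J( 3, 4, 0 );    // preallocated output matrix
offset = 100;             // hypothetical value to hand to the workers
// {k = offset} copies the value into each worker's isolated environment as k;
// r and c are the row and column index of the element being filled
Parallel Assign( {k = offset}, result[r, c] = k + 10 * r + c );
Show( result );
```

Opening files and concatenating tables does not fit this pattern, which is why MFI is the better route here.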
Re: Parallel Assign concatenating JMP tables inside a for loop
Thank you so much. I was able to incorporate Import Multiple Files into my script, and it now concatenates the tables a lot faster.