
Chunking and concatenating files

Nalla (Occasional Contributor, joined Feb 3, 2017)

Hi!

I am working with big files that use a lot of memory even when I open just one of them, which makes my PC sluggish and sometimes causes it to hang. Is there a JSL solution that splits a single file into smaller files before opening it? For example, File_A (2 GB) would be split into 4 smaller files of 500 MB each. I would then automate the loop: open a chunk, aggregate/compute on it, close it, and concatenate its results with those of the next chunk, and so on.

Thanks in advance.

3 REPLIES
uday_guntupalli (Community Trekker, joined Sep 15, 2014)

@Nalla:
Hello, this can certainly be done.

Look at the "Subset" function in the Scripting Index; it lets you split your large data table into smaller tables. Here is an example. You can then use "Concatenate" to stack the resulting tables back together.

// Assumption: dt is your master table (e.g. 1000 rows) and you want n rows in each chunk
n = 250;
nChunks = Ceiling( N Rows( dt ) / n );
For( i = 1, i <= nChunks, i++,
	LL = 1 + (i - 1) * n;               // first row of chunk i
	UL = Min( i * n, N Rows( dt ) );    // last row of chunk i
	dt << Clear Select;                 // drop the previous chunk's selection
	dt << Select Rows( Index( LL, UL ) );
	dt1 = dt << Subset( "Private", Selected Rows( 1 ), Selected Columns( 0 ) );
	// ... process dt1 here, before the next iteration overwrites it ...
);
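To also cover the "aggregate each chunk, then concatenate the results" part of the question, here is a minimal sketch. It assumes the whole table fits in memory (the Subset approach requires that), uses the Big Class sample table as a stand-in, and picks a hypothetical aggregation (total weight by sex) and chunk size of 10 rows purely for illustration.

```jsl
// Stand-in for the large table (Big Class has 40 rows)
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
n = 10;                                   // rows per chunk (hypothetical)
nChunks = Ceiling( N Rows( dt ) / n );
For( i = 1, i <= nChunks, i++,
	LL = 1 + (i - 1) * n;
	UL = Min( i * n, N Rows( dt ) );
	// Copy rows LL..UL into their own table
	dtChunk = dt << Subset( Rows( Index( LL, UL ) ), Selected Columns Only( 0 ) );
	// Aggregate the chunk (hypothetical example: total weight by sex)
	dtSum = dtChunk << Summary(
		Group( :sex ),
		Sum( :weight ),
		Link to original data table( 0 )  // unlinked, so the chunk can be closed safely
	);
	Close( dtChunk, NoSave );
	// Stack this chunk's results onto the running results table
	If( i == 1,
		dtAll = dtSum,
		dtAll << Concatenate( dtSum, Append to first table );
		Close( dtSum, NoSave )
	)
);
```

The same group can appear once per chunk in dtAll, so a final Summary over dtAll is needed to combine the partial sums into overall totals.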
Best
Uday
uday_guntupalli (Community Trekker, joined Sep 15, 2014)

@Nalla:
I would also add that splitting the table is not the only way to speed up processing of this data. Making the table invisible or private can also speed up the process significantly. For a table that is already open:

          dt << Show Window(0); 
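If the file is not open yet, the window can be suppressed from the start instead of hidden afterwards. A minimal sketch, assuming the Invisible option of Open() in your JMP version and using the Big Class sample table as a stand-in for your file:

```jsl
// Open the table without drawing a window (Big Class stands in for your file)
dt = Open( "$SAMPLE_DATA/Big Class.jmp", Invisible );
nr = N Rows( dt );      // the table is fully usable from the script
Show( nr );
Close( dt, NoSave );    // a hidden table stays in memory until it is closed
```

Hiding the window saves drawing time but not memory, so each chunk still needs to be closed once you are done with it.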

 

 

Best
Uday
ih (Community Trekker, joined Sep 30, 2016)

If you don't want to load the whole table into memory before splitting it up, you can be creative with how you load the data. For flat files, specify the starting line and the number of lines to read inside a loop:


dtProbe = Open( "$SAMPLE_DATA/Probe.jmp" );

// Write the table out as a flat CSV file to simulate a large file on disk
dtProbe << Save( "$TEMP/deleteme123.csv" );
Close( dtProbe, NoSave );

// Read the file back 100 data lines at a time
For( i = 1, i <= 2, i++,
	dt = Open(
		"$TEMP/deleteme123.csv",
		Import Settings(
			End Of Line( CRLF, CR, LF ),
			End Of Field( Comma ),
			Strip Quotes( 1 ),
			Use Apostrophe as Quotation Mark( 0 ),
			Use Regional Settings( 0 ),
			Scan Whole File( 0 ),
			Treat empty columns as numeric( 0 ),
			CompressNumericColumns( 0 ),
			CompressCharacterColumns( 0 ),
			CompressAllowListCheck( 0 ),
			Labels( 1 ),
			Column Names Start( 1 ),
			Data Starts( 2 + 100 * (i - 1) ),   // line 1 holds the column names
			Lines To Read( 100 ),
			Year Rule( "20xx" )
		)
	);
	// ... aggregate dt here, then Close( dt, NoSave ) before reading the next chunk ...
);
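The loop above only reads the chunks. Below is a sketch that also aggregates each chunk and stacks the results, closing each raw chunk before the next one is read so that only one chunk is open at a time. Assumptions: Big Class saved as CSV stands in for the large flat file, the file name is hypothetical, the chunk size (10 lines) and chunk count (4) are known in advance, and "total weight by sex" is just an illustrative aggregation.

```jsl
// Hypothetical file; Big Class (40 data rows) stands in for a large flat file
path = "$TEMP/big_class_chunks.csv";
dtSrc = Open( "$SAMPLE_DATA/Big Class.jmp" );
dtSrc << Save( path );
Close( dtSrc, NoSave );

n = 10;        // data lines per chunk (hypothetical)
nChunks = 4;   // assumed known in advance: 40 data lines / 10
For( i = 1, i <= nChunks, i++,
	// Read only data lines 1 + (i-1)*n .. i*n; file line 1 holds the column names
	dtChunk = Open(
		path,
		Import Settings(
			End Of Line( CRLF, CR, LF ),
			End Of Field( Comma ),
			Strip Quotes( 1 ),
			Labels( 1 ),
			Column Names Start( 1 ),
			Data Starts( 2 + n * (i - 1) ),
			Lines To Read( n )
		)
	);
	// Aggregate this chunk, unlinked so the chunk can be closed right away
	dtSum = dtChunk << Summary( Group( :sex ), Sum( :weight ), Link to original data table( 0 ) );
	Close( dtChunk, NoSave );
	// Append this chunk's results to the running results table
	If( i == 1,
		dtAll = dtSum,
		dtAll << Concatenate( dtSum, Append to first table );
		Close( dtSum, NoSave )
	)
);
```

As with the in-memory version, a group can appear once per chunk, so run one more Summary on dtAll to get the overall totals.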