jerryspilTC
Level III

Can JMP concatenate 3500 CSV files of ~20 MB each, totaling 5 million rows when combined?

Hello,

I need to build a weekly data set from 3500 CSV files of about 20 MB each (identical columns).

Once combined, the data will have around 5 million rows and around 100 columns.

I use JMP 11 64-bit on Windows, with 8 GB of RAM and, I believe, a Core i5 processor.

Question: If I use a query builder and connect to an ODBC database to access these 3500 CSV files, can JMP concatenate them into one data table containing 5 million rows?

BR,

Jerry

3 REPLIES
Craige_Hales
Staff (Retired)


If all of the data is numeric, maybe. 5,000,000 rows * 100 cols * 8 bytes/cell -> 4 GB, just for the table. That will be tight on an 8 GB machine and will likely run slowly if you do much with the table (sorting, analysis, ...). Text cells have additional overhead.

(3500 files of 20 MB each is quite a bit larger, about 70 GB, so I suspect there are some text columns. The data will need subsetting, by rows or columns or both, as the files are loaded, not at the end.)
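The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, 1 MB is taken as 1,000,000 bytes, and the 8 bytes per numeric cell is the figure quoted in this reply, not a measurement of JMP itself.

```python
# Rough size estimates; all names here are illustrative, not part of JMP.
BYTES_PER_NUMERIC_CELL = 8  # figure quoted in the reply


def numeric_table_bytes(rows: int, cols: int) -> int:
    """Lower bound for an all-numeric table at 8 bytes per cell."""
    return rows * cols * BYTES_PER_NUMERIC_CELL


def raw_files_bytes(n_files: int, mb_per_file: int) -> int:
    """Total size of the CSV files on disk, assuming 1 MB = 1,000,000 bytes."""
    return n_files * mb_per_file * 1_000_000


table_gb = numeric_table_bytes(5_000_000, 100) / 1e9
raw_gb = raw_files_bytes(3500, 20) / 1e9
print(f"all-numeric table in memory: {table_gb:.1f} GB")
print(f"raw CSV files on disk:       {raw_gb:.1f} GB")
```

The gap between the two numbers is what suggests that the files hold more than 5 million rows of plain numeric data.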

 

JMP 14 has a multiple file import option (if there is enough memory!) that is designed to load CSV files in parallel from a local drive. I'm not an ODBC expert and not familiar with using ODBC to read CSV files.

 

Craige

jerryspilTC
Level III


Hi Craige,

Thanks. If I reduce my data to a third, that is 5M/3 = 1.67M rows * 100 cols * 8 bytes/cell = 1.33 GB. The data has some text columns, so I need to allow for a little overhead. My system could then process everything in 3 runs, including analysis such as Tabulate and pivots. I just need to count pass/fails out of the 5M rows. Is it doable?

 

Thanks
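If only pass/fail counts are needed, the full table may never have to exist in memory at once. The sketch below is plain Python, not JMP or JSL, and is offered only to illustrate the idea of reducing the data while the files are read. The column name `Result` and the folder layout are hypothetical; the thread does not say what the real columns are called.

```python
import csv
from collections import Counter
from pathlib import Path


def count_results(folder, column="Result"):
    """Tally the values of one column across every CSV file in a folder.

    Files are read one row at a time, so memory use stays small no matter
    how many files there are. `column` is a hypothetical column name.
    """
    totals = Counter()
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                totals[row[column]] += 1
    return totals


# Example call (the folder name is a placeholder):
# print(count_results("weekly_data"))
```

The same reduce-as-you-load approach applies inside JMP: keep only the rows or columns that the pass/fail tally needs as each file is imported.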

Craige_Hales
Staff (Retired)


Probably; give it a try.

The text column overhead is roughly 32 bytes per cell. Put another way, each text column uses at least as much memory as four numeric columns; long strings in the cells will need still more memory.
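To see how text columns change the estimate, here is a small Python sketch using the figures from this thread: 8 bytes per numeric cell and roughly 32 bytes per text cell as a minimum. The split of 90 numeric and 10 text columns is purely hypothetical, since the actual mix is not stated, and the function name is invented for illustration.

```python
NUMERIC_CELL_BYTES = 8   # figure used earlier in the thread
TEXT_CELL_BYTES = 32     # rough minimum per text cell; long strings need more


def mixed_table_bytes(rows: int, numeric_cols: int, text_cols: int) -> int:
    """Lower-bound table size for a mix of numeric and text columns."""
    bytes_per_row = numeric_cols * NUMERIC_CELL_BYTES + text_cols * TEXT_CELL_BYTES
    return rows * bytes_per_row


rows_per_run = 5_000_000 // 3  # one third of the data per run

# Hypothetical mix: 90 numeric columns and 10 text columns.
estimate = mixed_table_bytes(rows_per_run, numeric_cols=90, text_cols=10)
print(f"one-third subset, mixed columns: {estimate / 1e9:.2f} GB")
```

Changing `text_cols` shows how quickly text columns dominate: each one costs at least as much as four numeric columns.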

Keeping 1/3 of the data seems a good start.

You might also find Cols > Utilities > Compress Selected Columns useful; it converts columns to representations that use less memory.

Craige
