
JMP took 2hrs for 2GB CSV file

sivabalanp

Occasional Contributor

Joined:

Jun 6, 2017

My user has an issue: querying a CSV file of about 2 GB took 2 hours. Is there a recommended way to speed this up, or is there any data showing how long a file of a given size should take (e.g., 1 GB CSV: 30 min)?

8 REPLIES
briancorcoran

Joined:

Jun 23, 2011

I don't have performance data, I will let others share that.

 

Please know, though, that you can often speed up the text import process if you are fairly confident of the layout of the data ahead of time. Opening the file using the "Text Import Preferences" rather than "Best Guess" will speed up the import. Also, in the Preferences panel for Text Data Files, selecting "Scan for 5 seconds" under "When determining column types" can help, as long as you don't have thousands of columns.

 

Brian Corcoran

JMP Development

sivabalanp

Occasional Contributor

Joined:

Jun 6, 2017

Yes, we ran with the 5-second option. Only the scan went fast; the import remained slow.

Also, I can't see the "Text Import Preferences" option in JMP Pro 13.

 

[Attached image: pref.PNG]


briancorcoran

Joined:

Jun 23, 2011

I was referring to File->Preferences->Text Data Files->Import Settings for the 5-second preference. For the open operation, under File->Open on Windows, you can select between "Data, using Text Import Preferences", "Data, using best guess", and others.



Brian


Craige_Hales

Staff

Joined:

Mar 21, 2013

You can use the source script from a previous import to import the file (or similar file) again and it will be faster than having JMP re-discover everything. The source script knows how many columns to expect and what the data types are.

A 2GB text file is likely to be even larger in memory; an 8GB machine may end up using the disk heavily for paging memory.
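As a rough illustration of why the in-memory table can be larger than the text file, here is a back-of-the-envelope sketch for a synthetic file of 2e8 rows of "1,2,3,4,5". It assumes each numeric value is stored as an 8-byte double when numeric compression is off; that storage size is an assumption for this estimate, and a real file with character columns or compression enabled will behave differently.

```jsl
// Rough estimate only; the 8 bytes per value is an assumption
// (uncompressed numeric columns stored as doubles).
rows = 2e8;             // rows in the synthetic file
cols = 5;               // numeric columns per row
bytesPerTextRow = 10;   // "1,2,3,4,5" plus one newline character
bytesPerValue = 8;      // assumed in-memory size of one numeric value

textGB = rows * bytesPerTextRow / 1e9;          // size of the CSV on disk
memoryGB = rows * cols * bytesPerValue / 1e9;   // size of the table in memory
Show( textGB, memoryGB );
```

Under these assumptions, about 2 GB of text becomes about 8 GB of table data, which alone would fill the memory of an 8GB machine.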

There is a significant performance improvement in JMP 13.1 for Mac. If you are using JMP on a Mac, this may be what you need.

The script below should run in under 15 minutes on a machine with sufficient memory.

start=tickseconds();
data = repeat("1,2,3,4,5\!n",2e8); 
stop=tickseconds();
show(stop-start,length(data));

start=tickseconds();
file = savetextfile("$temp\deleteme.csv",data);
stop=tickseconds();
show(stop-start,filesize(file));

data = 0; // release 2GB before opening file
start=tickseconds();
dt = Open(
	file,
	columns(
		New Column( "a", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "b", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "c", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "d", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "e", Numeric, "Continuous", Format( "Best", 12 ) )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 1 ) ),
		Strip Quotes( 0 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 0 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 0 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);
stop=tickseconds();
show(stop-start,nrows(dt));
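To get timing numbers for several file sizes on your own machine, the same idea can be wrapped in a loop. This is a minimal sketch, not a reference benchmark: the row counts are arbitrary choices, the forward-slash path is an assumption about your setup, and the import settings are a trimmed subset of the ones used above. Real data with more columns or character fields will time differently.

```jsl
// Hypothetical benchmark: time Open() on synthetic CSV files of increasing size.
// Each row is 10 bytes, so these are about 10 MB, 100 MB, and 500 MB.
rowCounts = {1e6, 1e7, 5e7};

For( i = 1, i <= N Items( rowCounts ), i++,
	n = rowCounts[i];
	// Write the test file; it is overwritten on each pass.
	file = Save Text File( "$temp/deleteme.csv", Repeat( "1,2,3,4,5\!n", n ) );

	start = Tick Seconds();
	dt = Open(
		file,
		columns(
			New Column( "a", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "b", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "c", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "d", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "e", Numeric, "Continuous", Format( "Best", 12 ) )
		),
		Import Settings(
			End Of Line( CRLF, CR, LF ),
			End Of Field( Comma, CSV( 1 ) ),
			Scan Whole File( 0 ),
			Labels( 0 ),
			Column Names Start( 0 ),
			Data Starts( 1 ),
			Lines To Read( "All" )
		)
	);
	elapsed = Tick Seconds() - start;

	// Report the row count, file size in bytes, rows imported, and seconds taken.
	Show( n, File Size( file ), N Rows( dt ), elapsed );
	Close( dt, No Save ); // release memory before the next pass
);
```

If the elapsed time grows much faster than the file size, that points toward memory pressure rather than the parsing itself.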
Craige
sivabalanp

Occasional Contributor

Joined:

Jun 6, 2017

The PC specs are below. Is the information you provided applicable only to Mac?

If it also applies to Windows, what platform is required to run this script?

Please advise, as we are new to JMP and are assisting a user who reported the issue. Thank you.

 

 

[Attached image: system info.PNG]


sivabalanp

Occasional Contributor

Joined:

Jun 6, 2017

Since we are new to JMP, I can't identify this myself; I have asked my user to check. Thank you.

I will update with the outcome.

Craige_Hales

Staff

Joined:

Mar 21, 2013

The hardware list looks fine; the script above should run on JMP on either Windows or Mac. You should contact our technical support staff if the problem continues. They will need:

  • a small sample of the data from the CSV file to better understand the problem
  • the source script from a successful import
  • the version of JMP you are using (12.1, 13.1, etc.) and the platform (Windows or Mac)

If the CSV contains characters that are not Unicode, fields that are quoted incorrectly, or other odd things, please include that in the CSV sample.

Craige
sivabalanp

Occasional Contributor

Joined:

Jun 6, 2017

Since it's confidential data, we may not be able to share the file.

Is there a recommended clock speed, or a way to improve performance from within the JMP application?