I just had a request for a way to move data from an external program into JMP. Generally I'd recommend CSV files for this, but if compression matters, something else may be needed. Here are a couple of proof-of-concept scripts (you'll need to make sure they do what you need). Since I don't have an external program, each script has two parts: create a zip file from a table, then recreate the table from the zip file. This assumes you control how the external process packages and presents the data to JMP.
Key points: Zip files can hold binary (first example) or printable (second example) data. JSL's matrixToBlob and BlobToMatrix are fast. Don't loop, or loop over only a few columns; avoid looping over every row when there are a lot of rows. JMP's zip file API appends new members, possibly changing the actual name to avoid colliding with earlier members…notice the deleteFile() in the second example and the actualname returned by write in the first. Clearing the zip (za=0) is NOT required; it is just a hint that the file now lives outside of JMP (on disk). Converting numbers to printable and back is slow, and might be lossy as well if the formatted values don't have enough digits.
The first script puts raw data in a zip file member in "little-endian" format, a row at a time. It is fast because it doesn't convert back and forth to printable.
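As a minimal sketch of the binary round trip on its own (no zip file involved), the snippet below converts a tiny matrix to a blob of 8-byte little-endian floats and back. The matrix values are made up for illustration; the calls and arguments are the same ones used in Example 1.

```jsl
// Hypothetical 2 x 3 sample matrix (commas separate rows).
m = [1 2 3, 4 5 6];
// Pack every element as an 8-byte little-endian float.
b = Matrix To Blob( m, "float", 8, "little" );
// Unpack; the last argument is the number of columns to rebuild.
m2 = Blob To Matrix( b, "float", 8, "little", 3 );
// Compare element by element; All() is true only if every element matches.
Show( All( m == m2 ) );
```

The column count is not stored in the blob, so whoever reads the data has to know it (or get it some other way).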
The second script puts formatted data for a column into a zip file member, one column per member. Numeric data goes in a matrix, character in a list.
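To check whether the printable form loses digits for your data, a quick comparison like the following may help. It is a sketch only: the random test values are an assumption, and Eval( Parse() ) is used here to turn the text back into a matrix.

```jsl
// Hypothetical test values: a 5 x 1 matrix of random normals.
x = J( 5, 1, Random Normal() );
// Printable form, the same conversion Example 2 writes to a zip member.
txt = Char( x );
// Turn the text back into a matrix.
y = Eval( Parse( txt ) );
// Largest absolute difference; 0 means the text kept every digit.
Show( Max( Abs( x - y ) ) );
```

If the difference is not zero and that matters for your application, the binary approach from the first script avoids the conversion entirely.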
You can combine ideas from both scripts.
Example 1
This example assumes binary numeric data. It should be really fast. Character data won’t work like this…
// (this code is untested!) make sample numeric data
dt=New Table( "people", Add Rows( 1e7 ),
New Column( "fred", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) ),
New Column( "ralph", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) ),
New Column( "george", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) )
);
// make sample zip file with binary data
datamat = (dt:fred<<getasmatrix) || (dt:ralph<<getasmatrix) || (dt:george<<getasmatrix);
blobmat = matrixtoblob(datamat,"float",8,"little");
za = open("$temp/deleteme2.zip", "zip");
actualname = za<<write( "data", blobmat );
// clear zip
za = 0;
// re-open zip
za = open("$temp/deleteme2.zip", "zip");
show(za<<dir); // check members
start = tickseconds();
blobextract = za<<read(actualname,format(blob));
dataextract = blobtomatrix( blobextract, "float", 8, "little", 3 /*columns*/);
dtextract = newtable();
dtextract << setmatrix(dataextract);
stop=tickseconds();
show(stop-start);
// log output: stop - start = 2.01666666666279;
// decompressed+loaded 10,000,000 rows x 3 columns in 2 to 3 seconds
Example 2
Here’s another variation, slower but flexible (handles numeric and character):
// (this code is untested!) make sample numeric and character data
dt=New Table( "people", Add Rows( 1e6 ),
New Column( "fred", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) ),
new column("fred char", character, formula(char(randominteger(1000,99999)))),
New Column( "ralph", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) ),
new column("ralph char", character, formula(char(randominteger(1000,99999)))),
New Column( "george", Numeric, Continuous, Format( "Best", 12 ), Formula( Random Normal() ) ),
new column("george char", character, formula(char(randominteger(1000,99999))))
);
dt<<runformulas();
// make sample zip file with printable data, one member per column
try(deletefile("$temp/deleteme.zip"));
za = open("$temp/deleteme.zip", "zip");
collist = dt<<getcolumnreference;
for(i=1,i<=nitems(collist),i++,
data = collist[i]<<getvalues; // matrix for numeric columns, list for character columns
name = collist[i]<<getname;
za<<write(name,char(data));
);
// clear zip
za = 0;
// re-open zip
start = tickseconds();
za = open("$temp/deleteme.zip", "zip");
colnames = za<<dir; // check members
dtextract = newtable("extracted");
for(i=1,i<=nitems(colnames),i++,
txt = za<<read( colnames[i]);
dtextract<<newcolumn( colnames[i], values(parse(txt)));
);
stop=tickseconds();
show(stop-start);
// log output: stop - start = 12.3500000000349;
// decompressed+loaded 1,000,000 x 6 in 12 to 14 seconds
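One way the two ideas might be combined is sketched below: the numeric block goes in one binary member, and the column names go in a second, printable member so the reader can rebuild the column count and names. This is untested and rests on assumptions: the path deleteme3.zip, the member names "names" and "data", and the sample values are all hypothetical.

```jsl
// Hypothetical scratch file; delete it first so members are not appended to an old zip.
path = "$temp/deleteme3.zip";
Try( Delete File( path ) );
// Hypothetical sample data: three column names and a 2 x 3 numeric block.
names = {"fred", "ralph", "george"};
datamat = [1 2 3, 4 5 6];
za = Open( path, "zip" );
// Printable member holding the list of names; keep the actual member name.
nameMember = za << Write( "names", Char( names ) );
// Binary member holding the numbers as 8-byte little-endian floats.
dataMember = za << Write( "data", Matrix To Blob( datamat, "float", 8, "little" ) );
za = 0;

// Read it back.
za = Open( path, "zip" );
names2 = Parse( za << Read( nameMember ) );
blob2 = za << Read( dataMember, Format( blob ) );
// The number of names tells Blob To Matrix how many columns to rebuild.
mat2 = Blob To Matrix( blob2, "float", 8, "little", N Items( names2 ) );
dt2 = New Table( "combined" );
dt2 << Set Matrix( mat2 );
// Loop over columns only (a few iterations), never over rows.
For( i = 1, i <= N Items( names2 ), i++,
	Column( dt2, i ) << Set Name( names2[i] )
);
```

Character columns could travel the same way as the names, one printable list per member, while the numeric columns stay in the fast binary member.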