JMP Table & Column Compression

clausa

Community Trekker

Joined:

Jan 16, 2014

Does anyone have any experience with JMP Table and/or Column compression?

Any concerns or watch-outs with using either? I have some JMP tables that are >1 GB, so it would be nice to compress them.

As far as I can tell, there are 2 types of compression built into JMP:

  1. Compress Tables
    1. How to Access:
      1. Open file
      2. Click red arrow next to table name on the left, above where the scripts are located, in the table panel
      3. Click "Compress Table When Saved"
      4. Can also be set as default by "Preferences > General > Save Data Table Columns GZ Compressed"
    2. Purpose: This is essentially a zip-style compression of the whole file (though I think it is actually GZ), so the file takes up less space on disk. The effect is the same as zipping the file, except that it remains a .jmp rather than a .zip with a .jmp inside it. I have seen up to 10x compression here.
  2. Compress Selected Columns
    1. How to Access:
      1. Select columns in file
      2. Click red arrow for columns (or Cols from the top)
      3. Select "Compress Selected Columns"
      4. Can also be auto-enabled
    2. Purpose: Uses List Check and compressed integers where available to make columns/cells actually take less memory. This reduces the file size and speeds up analysis.
1 ACCEPTED SOLUTION

You've outlined the two options very well.

The main difference is that Compress Tables affects only how big the file is on disk. JMP still uses the uncompressed version in memory, so this option is mainly useful for keeping your drive from filling up with JMP data tables.
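
To illustrate the disk-versus-memory distinction, here is a minimal Python sketch. It is not JMP code and does not read or write .jmp files. It assumes gzip-style compression (as the preference name suggests) and uses made-up sample data to show that the compressed bytes are smaller on disk, while the data expands back to its full size once loaded.

```python
import gzip

# Hypothetical, highly repetitive sample data standing in for a table's contents.
raw = ("lot_id,site,result\n" + "A123,Fab1,PASS\n" * 100_000).encode("utf-8")

# What would be written to disk: the gzip-compressed bytes.
compressed = gzip.compress(raw)

# What is used in memory after opening: the fully decompressed bytes.
restored = gzip.decompress(compressed)

print("size on disk (compressed):", len(compressed))
print("size in memory (restored):", len(restored))
print("round trip is lossless:", restored == raw)
```

How much the file shrinks depends entirely on how repetitive the data is, so the ratio from this sample says nothing about a real table.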

Compress Selected Columns results in smaller files on disk as well as using less memory. Unfortunately, not every column can benefit here.

Here's what Compress Selected Columns does:

  • It adds a List Check (default order) to a character column if the column has fewer than 255 distinct values.
  • It changes a numeric column to the smallest integer type (1-byte, 2-byte, or 4-byte) that can store every value in the column. Only columns that contain nothing but integer values are checked.
    • For a 1-byte integer, the range of numbers that you can store is from -126 to 127.
    • For a 2-byte integer, the range of numbers that you can store is from -32,766 to 32,767.
    • For a 4-byte integer, the range of numbers that you can store is from -2,147,483,646 to 2,147,483,647.
  • It does not change columns that already have a List Check.
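
The rules above can be sketched in Python as a rough way to predict which columns would benefit. This is only an illustration of the stated ranges and the 255-value threshold, not JMP's actual implementation. The function names are hypothetical, and the sketch assumes columns with no missing values.

```python
# Ranges as listed above: (byte width, lowest value, highest value).
INTEGER_RANGES = [
    (1, -126, 127),
    (2, -32_766, 32_767),
    (4, -2_147_483_646, 2_147_483_647),
]

def smallest_integer_width(values):
    """Return the smallest byte width (1, 2, or 4) that holds every value, or None."""
    def is_whole(v):
        return isinstance(v, int) or (isinstance(v, float) and v.is_integer())

    # Only columns made up entirely of integer values are considered.
    if not values or not all(is_whole(v) for v in values):
        return None
    lo, hi = min(values), max(values)
    for width, low, high in INTEGER_RANGES:
        if low <= lo and hi <= high:
            return width
    return None  # Values do not fit even the 4-byte range.

def can_add_list_check(values, has_list_check=False):
    """Character column rule: fewer than 255 distinct values and no existing List Check."""
    return not has_list_check and len(set(values)) < 255

print(smallest_integer_width([0, 1, 127]))       # fits the 1-byte range
print(smallest_integer_width([-127, 5]))         # -127 is below -126, so 2 bytes
print(smallest_integer_width([1.5, 2]))          # not all integers, so None
print(can_add_list_check(["PASS", "FAIL"] * 10)) # 2 distinct values
```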

HTH,

-Jeff
3 REPLIES
bswedlove

Community Member

Joined:

Nov 2, 2016

I run "dt<<compress selected columns();" on tables with hundreds of columns and the command fills up my log with all the changes. Can I run the command but stop it from writing in the log?

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

Unfortunately, I don't see any way to keep it from writing to the log. I'll enter an enhancement request to see if we can add this in a future release.

In the meantime, I can only come up with some unsatisfying hacks that involve saving the log before the call to compress the columns and clearing it afterward.

-Jeff