Performance of Tables > Tabulate?

pmroz

Super User

Joined:

Jun 23, 2011

I've never really had trouble with the tabulate platform until now.  I'm doing a tabulation of a table that can be anywhere from 40,000 to 800,000 rows that looks like this:

Database  Drug     Event  Number of Events
1         Drug001  AE001  42
1         Drug001  AE002  13
1         Drug001  AE003  81
1         Drug001  AE004  81
1         Drug002  AE001  11
1         Drug002  AE002  54

The database column can go from 1 to 20.  I'm creating a tabulation that looks like this:

2012_Tabulation.png

Tabulate gets really slow when there are a lot of rows, perhaps because of the double nesting of the row variables Database and Event.

I compared the performance to building a pivot table in Microsoft Excel, and it was no contest: Excel won hands down!  A 400,000-row table took 3-6 minutes in JMP, while the same thing in Excel took less than a second!

Here's a chart showing the response times on two different PCs.  JMP 9 vs. JMP 10 made no difference.

2013_Response Times.png

Are there any tricks to speeding up tabulate?  I've attached some code you can play around with.  Just random numbers but you'll get the idea.  The first program creates the table, and the second one runs the tabulation.

Thanks!

Peter

1 REPLY
mpb

Super User

Joined:

Jun 23, 2011

I just took the briefest look at this. For reference, I generated a 500,000-row table and used Tables > Summary with grouping variables Database and Event, subgrouping variable Drug, and Sum variable Drug. This took about 4 seconds on a T410 / i5 64-bit system running 32-bit JMP. I started the Tabulate script but cancelled it when I saw it was running long, so I don't have a result, but it would probably be as you said. I don't know if a Summary-based solution would be helpful to you, but it's interesting to see the difference. I wonder if the slowness of Tabulate is due to the formatting.

Michael
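
For anyone following along, here is a rough sketch of the aggregation being discussed: one output row per (Database, Event) pair, one column per Drug, each cell holding the summed Number of Events. It's written in Python rather than JSL purely to illustrate the logic, and it uses only the six sample rows from the original post. The function name `summarize` and the printed layout are my own assumptions, not anything JMP does internally.

```python
from collections import defaultdict

# Sample rows from the original post:
# (Database, Drug, Event, Number of Events)
rows = [
    (1, "Drug001", "AE001", 42),
    (1, "Drug001", "AE002", 13),
    (1, "Drug001", "AE003", 81),
    (1, "Drug001", "AE004", 81),
    (1, "Drug002", "AE001", 11),
    (1, "Drug002", "AE002", 54),
]


def summarize(rows):
    """Sum event counts per (Database, Event) group, split by Drug.

    Hypothetical helper for illustration only.
    """
    table = defaultdict(lambda: defaultdict(int))
    for database, drug, event, count in rows:
        table[(database, event)][drug] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(cols) for key, cols in table.items()}


summary = summarize(rows)

# Print a simple pivot: rows are (Database, Event), columns are drugs.
drugs = sorted({drug for _, drug, _, _ in rows})
print("Database Event  " + "  ".join(drugs))
for (database, event), cols in sorted(summary.items()):
    cells = "  ".join(f"{cols.get(d, 0):>7}" for d in drugs)
    print(f"{database:<8} {event:<6} {cells}")
```

The grouped sum is a single pass over the rows, so the arithmetic itself should scale roughly linearly with row count. That says nothing definite about where Tabulate spends its time, but it fits the guess that the overhead lies elsewhere.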