I have a data table with approximately 100,000 rows and 2000 columns. Dozens of these columns contain formulas. When I make a change somewhere I notice the status bar says "evaluating ##column_xyz" and it proceeds to evaluate every formula in the data table. Since this is a rather large table and there are many formulas, it takes up to 60 seconds to evaluate all of them. Consequently, I am very sensitive to performing actions which trigger formula evaluation, because the user interface will become unresponsive until it finishes and this can be quite annoying when it happens unexpectedly. On the other hand, I do not want to "Suppress Formula Eval" all the time because sometimes I will forget about it and data won't be updated when it should have been.
So on this large data table I have a JSL script that generates a set of charts using the Graph Builder platform, applies some local data filters, performs some selective highlighting of the data. Essentially the purpose of the script is to dynamically generate charts and prepare them for cut-and-paste into an email or report.
I noticed that this script seems to take a long time to run. Longer than it should. The UI goes unresponsive for almost 60 seconds before the graphs finally come up. I spent some time trying to figure out what is happening during those 60 seconds. Then it dawned on me that somehow formula evaluation is being triggered by this script. This was not immediately obvious because the data table status bar doesn't show the normal "evaluating..." message; it remains blank, presumably because the evaluation is being triggered by a script instead of by user action. When I ran the script while the Script Log window was open, the log status bar said "evaluating ##column_with_formula". The log status bar was frozen on that message even though the duration of the delay leads me to believe it was, in fact, evaluating every formula column in the table. And then I confirmed that formula evaluation was the culprit by enabling "Suppress Formula Eval" for the entire table and running the script again - the graphs came up almost instantly.
Through trial and error I narrowed it down to one line in the script - a "Select Where" is used to highlight some of the data on the generated graphs. It looks something like this:
myDT << Select Where( :Name("##column_with_formula") == "value" );
I thought this was strange; maybe it had something to do with the "where" part. So I replaced this line with something simpler:
myDT << Select Rows( {1,3,5,7} );
Even this line still seems to trigger formula evaluation behind the scenes. Then I did some more experimentation:
- With task manager open, I triggered formula evaluations the normal way (change a value in a column). The status bar showed formulas being evaluated and instantly my CPU usage shot up to 30% (one CPU core maximized).
- I waited for the activity to stop, and I tried selecting a row on the data table using the UI. The status bar did not change; it still showed "evaluations done". But task manager showed exactly the same CPU activity as if formula evaluation were happening.
- I enabled "Suppress Formula Eval" for the data table and selected a different row on the data table. The CPU usage did not spike up.
To summarize, through experimentation I have demonstrated the following:
- JMP performs column formula evaluation superfluously; basic interactions with the user interface cause calculation to occur in the background even though none of the input data has actually been modified.
- JMP is not transparent as to when it is performing these formula evaluations. The status bar sometimes displays a message but most of the time gives no indication that this is occurring. This often manifests as user interface lag / stuttering / unresponsiveness for no apparent reason.
- It seems like if one column is triggered for evaluation, ALL columns are evaluated, regardless of whether or not they needed it.
With smaller data sets and fewer, less-complex formulas, the excessive formula evaluation doesn't seem to be an issue. But with enough complex formulas and a large data set, this really becomes a hindrance. In my case, I guess I will have to adapt to leaving formula eval suppressed all the time and only evaluating formulas manually. But this is prone to mistakes, just like in Excel when workbook recalculation is set to manual. At least Excel has a dependency tree that allows it to evaluate only the necessary formulas for the data that has changed, and if the user designs the formulas appropriately the calculation can be left in Automatic all the time with minimal overhead. Admittedly I am not as familiar with JMP as I am with Excel, so there may be ways to deal with this in JMP that I have not learned yet.
Does anyone have any suggestions on how to manage large numbers of complex formulas WITHOUT suppressing eval all the time?
Also, as stated in the title of my post, I would be interested to know WHY a change in row selection trigger formula eval. And also, why does does JMP sometimes tell you it's evaluating formulas, and sometimes hides it?