JMP Challenge 8: My solutions

DonMcCormack · Jun 16, 2021 12:38 PM

What is the fastest way to turn the elements in a list of number lists into characters? This challenge is an extension of the question found here in the Community. I decided to test eight approaches: three iterative methods (For, For Each, and Transform Each), four non-iterative methods (table operations, expressions, Words + expressions, and Regex) and one hybrid (Words + Transform Each). Five of these methods, those not involving Regex or expressions, were considered in the linked post. The methods using Words and Regex are a bit of a cheat in that they exploit a side effect of string operations -- specifically, any data structure can be turned into a character string and parsed accordingly. I will leave it up to the reader whether they feel this approach is legitimate. I know a few folks who would frown on it!

Using a space-filling design, I looked at four factors: number of lists, number of list elements, whether the lists had an equal or unequal number of elements, and number type: integer or float. The design is in List of lists – Experimental Runs.JMP. Fractional values from the original design generation have been rounded. Ten replications were made for each run. The code I used for running the experiment is in Run list of lists experiment.JSL and the results in List of lists results.JMP. Preliminary runs using Regex were extremely slow, and it was removed from consideration.

To compare methods, I calculate the rank order of each method by run and replication and the difference between the run time and the median run time for all replications within an experimental run. Smaller is better. The graphs below suggest Transform Each, table operations, and the Words methods were superior. While the actual time difference are small (values are in microseconds), these methods are consistently ranked higher and have less variability.

Median Diff by Method.png

Rank by Method.png

To see if there were differences among the four methods, I reran the experiment with bigger lists (List of lists big – Experimental Runs.JMP and List of lists big results.JMP). Looking at the time differences from the median time within run, it appears that as the total number of items increases, Transform Each and table operations perform best. This is a fortunate result, since Transform Each was only just introduced in JMP 16, and table operations provides a good solution for those folks still on JMP 15.

Median diff by total items.png

For the second challenge, I asked you to transpose a list of lists. A transposed list of lists collects all the first elements from the input lists into the first output list, 2^nd input elements into the 2^nd output list, etc. If the input had n lists then each output list would have n elements. The number of output lists would equal the maximum of the number of items in each list. While the number of input lists is easy to determine, the number of output lists requires counting the items in each input list. Is this necessary? Does it lead to a faster process?

To answer this question and to test specific methods, I ran another space-filling design. In it, I looked at three iterative methods (For, For Each, and Transform Each) each using two approaches: one calculating the number of output lists before building the output, necessitating looping over the input twice, and the other building the output using a single loop over the input. Additionally, I looked at using table operations. Six different list element types were considered (integer, float, 1‑character, 8‑character, 15‑character, and variable width character (maximum length 15) as well as fixed and variable (inner) list lengths.

During my initial coding, I discovered that it is faster to build the output list starting with an empty list and using Insert Into rather than to prepopulate the output with missing values using Repeat or expressions. You can use the code in Prebuild or Insert.JSL to check this out for yourself. A second discovery had to do with Transform Each and For Each. If you’re unfamiliar with these functions, they behave a bit differently than your standard For or While loop in that they allow you to iterate over lists, matrices, or associative arrays, without having to specify their size. This can make the code a bit more compact and cleaner. The structure of a basic call to either of these functions is

fnName({indValue,ind},container,body)

where fnName is For Each or Transform Each, indValue is the value of the container at index ind, container is a matrix, list, or associative array, and body is the body of the code. Care must be exercised when working with large containers. Using large data structures as container or indValue will slow down execution considerably. An example of this is given in Using For Each and Transform Each.JSL. In it, we are counting the number of items in each list of a list of lists. The first two blocks of code correspond to how you might expect to code the problem, the first using each sublist as an input value, the second indexing the sublists. Compare this to the third block where the indices to aList are explicitly created and used in the loop. Depending on the size of aList, the improvements to processing speed can be 100X or better. Even faster is Transform Each, allowing it to return the resulting values. Blocks 4 and 5 show that the output data structure will be the same as the data structure used for the input container. There is an optional Output() argument to control the type of output data structure.

The code for testing the different transposition methods is in Transpose lists.JSL. As above, a space-filling design was used to compare the output under different conditions. Unlike the code above, a new data structure is created for every repetition. The first three methods read the input and built the output in a single block. The next three predetermine the size of the output lists first, before processing the input. The second to last method used table operations and the last Transform Each, structured to use its output values.

Screen Shot 2021-06-15 at 10.52.42 AM.png

Predetermining the size of the output lists proved to be faster than attempting to do this in a single block. The different implementations of Transform Each show the speed improvements gained by working with its output values. The results are in Transpose results.JMP. An experiment with larger lists comparing For Each, Transform Each (2^nd implementation), and table operations can be found in Transpose results big.JMP. The results suggest that table operations, in general, outperform the other two approaches.

Screen Shot 2021-06-15 at 11.09.54 AM.png

To recap what we’ve discovered:

Timewise: Transform Each < For Each < For
Table operations can be super-fast if column formulas aren’t involved. In earlier posts we saw that depending on their complexity, column formulas can slow operations to the point where iteration or expressions are favorable.

I didn’t get the typical feedback for this challenge that I’ve had in the past, so I’m sure there’s a better approach. I'd love to hear your comments. Look for a new challenge next month.