Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Jul 29, 2020 1:10 PM
| Last Modified: Aug 1, 2020 12:37 PM
Hopefully, you've had fun with Challenge 4. In the future, if you have an idea for something that would make a good Challenge, please let me know. I’m always looking for good material!
Congratulations, @landon! You had the fastest (non-staff) entries for both the 100 pairs of 1,000-digits (9.88 sec) and longest digit (7,150) challenges. Not only was it the only script capable of reading all 100 pairs of 1,000-digit numbers in under 10 seconds (the next closest could only manage 10), but it could also multiply two numbers 3.5 times larger than the next closest entry! In addition, it was the shortest of all the scripts (staff and non-staff) at just over 1,000 characters. I have included it in the scripts below. The point totals after four challenges are:
Before we talk about multiplying big numbers, let’s talk about unpacking or vectorizing a string of digits – turning a string of digits into a vector of numbers. As we saw last month, parsing a string of digits one at a time using Substr in a loop is painfully slow. This leaves us two general alternatives: 1) deconstructing the string into individual units, then putting them back together into a form that can be turned into a vector (using Parse, for example) or, 2) using pattern matching. My initial inclination would be to use approach 1. For years, I have resisted learning about the JSL pattern matching functions. The effort was too great, the information and examples too scant, and I was too lazy. I figured I had all I needed with the Regex function, something I knew very well. No more.
These challenges have opened my eyes to the speed and efficiency of pattern matching. As explained to me by @Craige_Hales, the developer responsible for these functions and regular entrant of the challenges, the pattern matching routines in JSL are the basis on which Regex is built. To demonstrate their efficiency, run the code in Vectorization Example.JSL, which compares four different methods of vectorization. I have also included a short function to generate a long string of random numbers in the script. For a string of 10,000 numbers, parsing the digits one at a time in a loop is, by far, the slowest, taking about 11 seconds on my machine. Using Words and Concat List to deconstruct and reconstruct the digits into a string that can be parsed into a vector is worlds better (0.0965 sec).
Doing the same thing using Regex and Parse is slightly better than that (0.08543). Pattern matching reduces this time by an amazing 4X (0.02179)! While the time differences are minuscule for 50,000 digits, if we consider 5,000,000 digits, both parsing approaches take about 10 minutes whereas pattern matching takes 2 seconds, scaling nearly linearly, unlike the other methods! If you want to learn more about pattern matching in JSL, take a look at the scripts written by @Craige_Hales or @David_Burnham (who has a few blog posts of his own with more information and examples.)
Onward to the multiplication routines. Pattern matching is used for all vectorization. In addition, the fastest addition algorithms from the last challenge is used in the first example. The first approach (multiplyBig Slow.JSL) uses nothing beyond what we have learned in the challenges so far. It multiplies digits two at a time, making sure to add zeros depending on the position of the multiplicand and multiplier (e.g., adding five zeros when a digit from the hundreds place is multiplied by a digit from the tens place). Despite the function ordering the inputs so the first is larger, this is of no benefit. The same number of operations is performed. This will change in the final example, however. The function is slow, taking nearly 30 seconds to get through a set of 100 100-digit numbers.
In multiplyBig Faster.JSL the number of digits in a register is increased from one to seven, approximately half of the number digits JSL can represent without overflow. The reason for this choice is apparent when we multiply two large seven-digit numbers, the result of which can be as large as fourteen digits. Since a seven-digit register may contain 14 digits, the addition routine used in the last example must be altered so that anything beyond the 7th digit is carried to the next register. Despite this, the resulting code is considerably shorter since only positive numbers need be considered. Additionally, the results improve substantially for multiplying 100 100-digit number pairs (0.04375 sec). This implementation also does a reasonable job multiplying 100 1000-digit pairs (1.54 sec).
With one simple change, you can go even faster. Rather than multiplying one register at a time, multiplyBig Fastest.JSL uses JSL matrix operations to multiply each register in the shorter number with the entire the larger number. This approach is faster than the previous, clocking in at 0.3015 seconds for 100 1,000-digit pairs, a 5X improvement.
And you can go even faster. The file multiplyBig – Brady.JSL, while slightly slower for multiplying 1,000-digit numbers, was superior when the size of the numbers reached about 2,000-digits and beyond. This was achieved by changing the order of operations so that the multiplications and additions could be done in the same loop. The details behind this are beyond what I care to cover, but @brady_brady tells me he’d be glad to do a short write-up that I can add to this post if there is interest. Faster? Maybe. There are two notable multiplication algorithms typically used in cryptology that reduce the number of multiplication operations from O(n2) to O(n1.58) or lower. Because they require more addition operations with large numbers, it’s hard to say whether they would be faster in JSL. As of this post, I tried but was unable to implement a fast JSL version of either of these algorithms.
1Aug2020: Thanks to @Craige_Hales for catching a bug in multiplyBig Fastest.JSL. I have corrected the problem. I have also added Craige's multiplyBig script, which I intended to do but forgot. He uses lookup tables as a way to speed the algorithm (with a few caveats). His script is thoroughly comment, as usual.