Comparing polynomial multiplication routines

Each of the two inputs is a pair of exponents and coefficients.  The
exponents are necessarily single-word integers.  The coefficients are
potentially arbitrary precision, but for these tests are set to numbers
less than 2^16, so that none of the examples requires an actual bignum.
Each routine returns such a pair, representing the product.

  D  dense multiplication [our program: mul-dense]
  F  fast Fourier transform (FFTW double-float), subject to roundoff [mulfft3]
  K  Karatsuba (divide and conquer) [polymultk]

K was doing so badly that we decided to give it a head start and did not
charge it for converting its inputs to an array of double-floats.  Even
so, it didn't do well.

Results.

For small polynomial problems, e.g. dense 10-term by 50-term, or even
50-term by 50-term, D is best, typically by 5X or more.  The rest of the
results are grouped according to the sum of the number of terms in the
inputs.  For instance, the first section indicates which program is best
when multiplying polynomials with a total of about 100 terms.

spacing [1,1], meaning ``dense'' -- the space between adjacent exponents
is in the range [1,1].
  10x90  D (2.5X faster)
  20x80  D
  30x70  F
  40x60  F
  50x50  F

What if about half the coefficients are zero?  (spacing randomly in [1,2])
  10x90  D (3X faster)
  20x80  D
  30x70  D
  40x60  F (close)
  50x50  D (close)

spacing [1,4]
  10x90  D (2.5X faster)
  20x80  D
  30x70  D
  40x60  D
  50x50  D

spacing [1,8]
  10x90  D (5X faster)
  20x80  D
  30x70  D
  40x60  D
  50x50  D (2X faster)

In this section the two inputs have a total of about 200 terms.

spacing [1,1]
  50x150   K (! surprise: 1.4X faster than F, 3.3X faster than D)
  100x100  K (! surprise: 1.4X faster than F, 3.2X faster than D)

spacing [1,2]
  50x150   F
  100x100  F

spacing [1,4]
  50x150   D
  100x100  D

spacing [1,8]
  50x150   D
  100x100  D

spacing [1,16]
  50x150   D (3X faster)
  100x100  D (2.4X faster)

spacing [1,32]
  50x150   D (4X faster)
  100x100  D (3.5X faster)

For larger problems, the advantage for the FFT is considerable.

spacing [1,1]
  100x900  F (5X faster)
  200x800  F (9X faster)
  300x700  F (10X faster)
  400x600  F (15X faster)
  500x500  F (15X faster)

Even if the polynomials are slightly sparse:

spacing [1,2]
  100x900  F (3X faster)
  200x800  F (5X faster)
  300x700  F (7X faster)
  400x600  F (9X faster)
  500x500  F (9X faster)

spacing [1,4]
  100x900  F (2X faster)
  ...

spacing [1,8]
  100x900  F (almost a tie)
  200x800  F (2X faster)
  ...

The advantage disappears for polynomials that are yet sparser, with
exponent gaps averaging 8.5.

spacing [1,16]
  100x900  D (2X faster)
  200x800  D (almost a tie)
  300x700  F (1.2X faster)
  400x600  F (1.3X faster)
  500x500  F (1.5X faster)

For very large dense problems the FFT advantage is enormous, though it
starts to pale if the problems are sparse.

  [1,1]   1000x2000  F (300X faster)
  [1,2]   1000x2000  F (15X faster)
  [1,4]   1000x2000  F (15X faster)
  [1,32]  1000x2000  F (3X faster)

Sometimes Karatsuba beats D, but usually in those cases F beats
Karatsuba.  For small cases Karatsuba simply calls D, so it can never
beat D there.  Here are two additional Karatsuba winners we found:

spacing [1,1]
  100x200  K (almost a tie with F)
  50x350   K (almost a tie with F)

and, as an example, on 5000x5000 K beats D by 4X, but F beats D by 25X.

What about a naive list-based program (N)?  It competes effectively, and
handily beats the FFT when the data become really sparse.  Thus

  [1,512]  50x50  N (1.5X faster than D, 31X faster than FFT)

This is multiplying a polynomial of degree 12,649 by one of degree
13,982.  (Simplified sketches of the D, K, and N routines follow.)
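As a concrete illustration of the representation and of routine D, here
is a minimal Common Lisp sketch, assuming a polynomial is a cons of two
parallel vectors: exponents in increasing order, then coefficients.
The names poly->dense, dense->poly, and mul-dense-sketch are ours for
this note; the actual mul-dense is more careful about declarations and
storage.

  ;; A polynomial such as x^5 + 3x^2 + 7 is (#(0 2 5) . #(7 3 1)).

  (defun poly->dense (p)
    ;; Spread a sparse exponent/coefficient pair into a flat array
    ;; indexed by exponent.
    (let* ((exps (car p))
           (cofs (cdr p))
           (a (make-array (1+ (aref exps (1- (length exps))))
                          :initial-element 0)))
      (dotimes (i (length exps) a)
        (setf (aref a (aref exps i)) (aref cofs i)))))

  (defun dense->poly (a)
    ;; Collect the non-zero entries back into the pair form.
    (let ((exps '()) (cofs '()))
      (dotimes (i (length a))
        (unless (zerop (aref a i))
          (push i exps)
          (push (aref a i) cofs)))
      (cons (coerce (nreverse exps) 'vector)
            (coerce (nreverse cofs) 'vector))))

  (defun mul-dense-sketch (p q)
    ;; Classical O(n*m) coefficient-by-coefficient product.
    (let* ((a (poly->dense p))
           (b (poly->dense q))
           (c (make-array (1- (+ (length a) (length b)))
                          :initial-element 0)))
      (dotimes (i (length a))
        (dotimes (j (length b))
          (incf (aref c (+ i j)) (* (aref a i) (aref b j)))))
      (dense->poly c)))

For example, (mul-dense-sketch '(#(0 2) . #(7 3)) '(#(1) . #(2)))
returns (#(1 3) . #(14 6)), i.e. (3x^2+7)(2x) = 6x^3 + 14x.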
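Routine K follows the usual Karatsuba recurrence.  polymultk itself is
not reproduced here; this sketch (our names, and integer coefficients
rather than the double-float arrays K was actually given) shows the
split, the three recursive products, and the cutoff below which it just
calls the classical product, which is why K can never beat D on small
inputs.

  (defparameter *karat-cutoff* 16)

  (defun classical-mul (a b)
    ;; Classical product on dense coefficient vectors.
    (let ((c (make-array (1- (+ (length a) (length b)))
                         :initial-element 0)))
      (dotimes (i (length a) c)
        (dotimes (j (length b))
          (incf (aref c (+ i j)) (* (aref a i) (aref b j)))))))

  (defun vec+ (x y)
    ;; Elementwise sum, padding the shorter vector with zeros.
    (let ((s (make-array (max (length x) (length y)) :initial-element 0)))
      (dotimes (i (length x)) (incf (aref s i) (aref x i)))
      (dotimes (i (length y) s) (incf (aref s i) (aref y i)))))

  (defun karatsuba (a b)
    ;; a, b: dense coefficient vectors.  Returns their product, dense.
    (if (or (<= (length a) *karat-cutoff*) (<= (length b) *karat-cutoff*))
        (classical-mul a b)                    ; small: fall back to D
        (let* ((m  (floor (min (length a) (length b)) 2))
               (a0 (subseq a 0 m)) (a1 (subseq a m))
               (b0 (subseq b 0 m)) (b1 (subseq b m))
               (z0 (karatsuba a0 b0))                      ; low * low
               (z2 (karatsuba a1 b1))                      ; high * high
               (z1 (karatsuba (vec+ a0 a1) (vec+ b0 b1)))  ; cross terms
               (c  (make-array (1- (+ (length a) (length b)))
                               :initial-element 0)))
          ;; c = z0 + x^m (z1 - z0 - z2) + x^(2m) z2
          (dotimes (i (length z0)) (incf (aref c i) (aref z0 i)))
          (dotimes (i (length z1)) (incf (aref c (+ i m)) (aref z1 i)))
          (dotimes (i (length z0)) (decf (aref c (+ i m)) (aref z0 i)))
          (dotimes (i (length z2)) (decf (aref c (+ i m)) (aref z2 i)))
          (dotimes (i (length z2)) (incf (aref c (+ i (* 2 m))) (aref z2 i)))
          c)))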
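The naive list-based N is close to what one writes first: each term of
one input is distributed over the other and merged into an accumulating
sorted list, so a very sparse answer never touches a big flat array.  A
sketch, assuming terms are (exponent . coefficient) conses in
increasing exponent order (again, our names and layout):

  (defun add-terms (p q)
    ;; Merge two sorted term lists, combining equal exponents.
    (cond ((null p) q)
          ((null q) p)
          ((< (caar p) (caar q))
           (cons (car p) (add-terms (cdr p) q)))
          ((> (caar p) (caar q))
           (cons (car q) (add-terms p (cdr q))))
          (t (let ((c (+ (cdar p) (cdar q))))
               (if (zerop c)                 ; drop terms that cancel
                   (add-terms (cdr p) (cdr q))
                   (cons (cons (caar p) c)
                         (add-terms (cdr p) (cdr q))))))))

  (defun mul-naive (p q)
    ;; Distribute each term of p over q; merge the partial products.
    (let ((acc '()))
      (dolist (term p acc)
        (setf acc
              (add-terms acc
                         (mapcar (lambda (tq)
                                   (cons (+ (car term) (car tq))
                                         (* (cdr term) (cdr tq))))
                                 q))))))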
What about a heap-based program (H)?

We never got a correct implementation that incorporated all of the
Monagan/Pearce improvements, so the method may be better than we
indicate here.  Yes, it does appear superior, but only for a class of
extremely sparse cases.  In our testing, it first becomes superior to
all the alternatives here:

  [1,512]    500x500  H (1.1X faster than D, 5X faster than F)
             (polynomials of degree 125,000)
  [1,1024]   500x500  H (1.1X faster than D, 8.5X faster than F)
             (polynomials of degree 250,000)
  [1,16384]  500x500  H (3.6X faster than D)
             (polynomials of degree 4.1 million)

Somewhat surprisingly, D is still competitive.  (A sketch of a basic
heap-based product appears at the end of this note.)

More notes on the FFT.

In our tests here we have already given the FFT an advantage, in that
we ignore the possibility of overflow, roundoff error, or other
problems in converting to floats and back to exact integers; D and K
automatically take care of this.  Even so, it is worth exploring how we
could give the FFT still more benefits; perhaps in a particular
application these are justified.

1. Provide an array of double-floats for each input, so that the
   exponent/coefficient processing to load a sparse representation into
   an array is unnecessary.

2. Allow the answer to be an array of double-floats, without rounding
   to the nearest integer and without converting coefficients that
   round to zero into actual zeros.

3. Provide slightly more benefit by pre-computing "optimal" FFT "plans"
   for each anticipated transform size in advance of its use; tools for
   this are provided by the FFTW library.  In our limited testing, we
   found that plans based on what the FFTW programmers call "estimates"
   were quick and adequate.  At one point we did compute more elaborate
   plans, but ultimately left this out, since the plans interacted with
   memory buffer locations in an inconvenient manner and the speedups
   were only in the range of 0 to 20 percent.

We found that benefits 1 and 2 reduced the time for the F algorithm by
a factor of between 3.6 and 5.2.  The larger number was for the sparse
case, in which the FFT program was returning a substantially
unattractive result: a dense array most of whose coefficients were
merely close to zero.

There is also the issue of the FFT's finite precision: what is the cost
if the FFT implementation is used as the basis for a modular, exact,
multi-word answer?  (If an alternative "bignum" FFT is used, it would
probably be a factor of 10 or more slower than the double-float FFT,
probably making it unattractive.  A finite-field FFTW might be
possible.)  (A toy sketch of the F pipeline also appears below.)
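As promised, here is a sketch of a basic heap-based product in the
spirit of Monagan and Pearce, but without their chaining and
exponent-packing refinements (which, as noted above, our own attempt
also lacked; all names here are ours).  A binary min-heap keyed on
exponent sum carries one live (i,j) candidate per term of the first
input, so the heap stays small and the product emerges in increasing
exponent order with no dense intermediate array.

  (defun heap-push (heap item)
    ;; heap: vector with fill pointer; item: (key i j).  Sift up.
    (vector-push-extend item heap)
    (do ((k (1- (fill-pointer heap))))
        ((zerop k))
      (let ((parent (floor (1- k) 2)))
        (if (< (car item) (car (aref heap parent)))
            (progn (rotatef (aref heap k) (aref heap parent))
                   (setq k parent))
            (return)))))

  (defun heap-pop (heap)
    ;; Remove and return the item with the smallest key.  Sift down.
    (let ((top  (aref heap 0))
          (tail (vector-pop heap))
          (n    (fill-pointer heap)))
      (when (plusp n)
        (setf (aref heap 0) tail)
        (do ((k 0)) (nil)
          (let ((s k) (l (+ (* 2 k) 1)) (r (+ (* 2 k) 2)))
            (when (and (< l n) (< (car (aref heap l)) (car (aref heap s))))
              (setq s l))
            (when (and (< r n) (< (car (aref heap r)) (car (aref heap s))))
              (setq s r))
            (when (= s k) (return))
            (rotatef (aref heap k) (aref heap s))
            (setq k s))))
      top))

  (defun mul-heap-sketch (ea ca eb cb)
    ;; ea/ca and eb/cb: exponent and coefficient vectors, exponents
    ;; increasing.  Returns the product in the same two-vector form.
    (let ((heap (make-array 8 :adjustable t :fill-pointer 0))
          (exps '())
          (cofs '()))
      (dotimes (i (length ea))          ; seed one candidate per row i
        (heap-push heap (list (+ (aref ea i) (aref eb 0)) i 0)))
      (loop while (plusp (fill-pointer heap)) do
        (destructuring-bind (e i j) (heap-pop heap)
          (let ((c (* (aref ca i) (aref cb j))))
            (if (and exps (= (car exps) e))
                (incf (car cofs) c)     ; exponents pop in order, so
                (progn (push e exps)    ; equal ones are adjacent
                       (push c cofs))))
          (when (< (1+ j) (length eb))  ; advance row i to its next term
            (heap-push heap
                       (list (+ (aref ea i) (aref eb (1+ j))) i (1+ j))))))
      ;; (A real implementation would also drop terms that cancel to 0.)
      (cons (coerce (nreverse exps) 'vector)
            (coerce (nreverse cofs) 'vector))))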
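Finally, the shape of the F pipeline.  mulfft3 calls FFTW through a
foreign-function interface, which we do not reproduce; to keep this
sketch self-contained it substitutes a toy recursive radix-2 transform.
It shows the same steps (load into double-float arrays, transform,
multiply pointwise, inverse transform, round back to integers) and the
same roundoff exposure, at nothing like FFTW's speed.

  (defun interleave (v start)
    ;; Every other element of v, beginning at index start (0 or 1).
    (let ((out (make-array (floor (length v) 2))))
      (dotimes (i (length out) out)
        (setf (aref out i) (aref v (+ start (* 2 i)))))))

  (defun toy-fft (v &optional inverse)
    ;; v: vector of (complex double-float); length must be a power of 2.
    ;; Unscaled transform; the caller divides by n after the inverse.
    (let ((n (length v)))
      (if (= n 1)
          v
          (let ((evens (toy-fft (interleave v 0) inverse))
                (odds  (toy-fft (interleave v 1) inverse))
                (out   (make-array n))
                (sign  (if inverse 1 -1)))
            (dotimes (k (/ n 2) out)
              (let ((w (* (aref odds k) (cis (/ (* sign 2 pi k) n)))))
                (setf (aref out k) (+ (aref evens k) w)
                      (aref out (+ k (/ n 2))) (- (aref evens k) w))))))))

  (defun mul-fft-sketch (a b)
    ;; a, b: dense integer coefficient vectors.  Rounding the convolution
    ;; back to integers is where accuracy can be lost for large inputs.
    (let* ((m  (1- (+ (length a) (length b))))     ; product length
           (n  (ash 1 (integer-length (1- m))))    ; power of 2 >= m
           (fa (make-array n :initial-element #c(0d0 0d0)))
           (fb (make-array n :initial-element #c(0d0 0d0))))
      (dotimes (i (length a))
        (setf (aref fa i) (complex (float (aref a i) 1d0))))
      (dotimes (i (length b))
        (setf (aref fb i) (complex (float (aref b i) 1d0))))
      (let* ((tc  (map 'vector #'* (toy-fft fa) (toy-fft fb)))
             (c   (toy-fft tc t))
             (out (make-array m)))
        (dotimes (i m out)
          (setf (aref out i) (round (realpart (aref c i)) n))))))

For example, (mul-fft-sketch #(7 0 3) #(1 2)) yields #(7 14 3 6), i.e.
(3x^2+7)(2x+1) = 6x^3 + 3x^2 + 14x + 7.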