Method 1: Computer, multiply these two numbers X and Y!
Method 2: Computer, remember this number. When I give you X and Y, tell me what that number was. This requires no multiplication! Let us tell SCIENCE about our discovery!
So this misses a few aspects of why the method works:
- You can't actually get a speedup from the proposed approach. You'd need a lookup table with 2^(2b) entries to multiply two arbitrary b-bit numbers--about 1.8 x 10^19 entries for 32-bit operands--which will be much slower than just doing the multiplication.
- Mostly, what we do is perform fewer operations (see the sketch after this list). We're actually slower per op than multiply-adds, because those have better hardware support; we just compress the matrices so much that it's worth it. Relatedly, with sufficiently few codebooks, we're actually sublinear in the input size--i.e., we don't even read all of it.
- We have some lightweight machine learning in there to make the sublinearity and other approximations not suck. Getting the ML fast enough to beat a BLAS GEMM is far from trivial.
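To make the "fewer operations via compression" point concrete, here's a minimal product-quantization sketch in the same family as the paper's approach. Everything here is illustrative: the centroids are random rather than learned, and the real method uses learned hash functions and fused 8-bit lookup tables, so treat this as the shape of the idea, not the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D, C, K = 64, 8, 16   # vector dims, codebooks, centroids per codebook
d = D // C            # dims covered by each codebook
centroids = rng.standard_normal((C, K, d))  # toy; real ones are learned

def encode(a):
    # Compress `a` to C small codes: each d-dim chunk becomes the
    # index of its nearest centroid in that chunk's codebook.
    chunks = a.reshape(C, d)
    dists = ((centroids - chunks[:, None, :]) ** 2).sum(-1)  # (C, K)
    return dists.argmin(-1)                                  # (C,)

def build_luts(b):
    # Precompute each chunk of `b` dotted with every centroid; after
    # this, an approximate dot product is C table reads plus adds.
    return np.einsum('ckd,cd->ck', centroids, b.reshape(C, d))  # (C, K)

a, b = rng.standard_normal(D), rng.standard_normal(D)
codes, luts = encode(a), build_luts(b)
approx = luts[np.arange(C), codes].sum()  # no multiplies at query time
print(approx, a @ b)  # coarse with random centroids; learned ones do better
```

The point is that encoding happens once, after which every product against `b`'s tables costs C lookups and adds instead of D multiply-adds, and C can be much smaller than D.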
Method 1: Computer, tell me the angle between these two 3D lines!
Method 2: Computer, look at this (second) 3D line from a few different perspectives (determined by the first line) and then tell me how often it looks more vertical than horizontal. That'll give me a good enough idea of the angle between the two.
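That analogy maps closely onto the classic random-hyperplane trick (the SimHash/LSH fact that two vectors land on opposite sides of a random hyperplane with probability theta/pi). A minimal sketch, assuming random rather than learned "perspectives"--the paper itself learns its quantization, so this only illustrates the intuition:

```python
import numpy as np

def angle_exact(u, v):
    # Method 1: one dot product and an arccos.
    cos_t = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

def angle_by_views(u, v, n_views=20_000, seed=0):
    # Method 2: for a random direction r, sign(r @ u) != sign(r @ v)
    # with probability theta / pi, so the disagreement rate across many
    # "perspectives" recovers the angle without ever computing u @ v.
    rng = np.random.default_rng(seed)
    views = rng.standard_normal((n_views, len(u)))
    disagree = np.sign(views @ u) != np.sign(views @ v)
    return np.pi * disagree.mean()

u, v = np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])
print(angle_exact(u, v), angle_by_views(u, v))  # both near pi/4 ~ 0.785
```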
How about using a look-up table for small numbers (say up to 12x12), left shifts for powers of ten, and decomposing larger multiplications into a sum of those operations?
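For what it's worth, that scheme is exactly schoolbook long multiplication. A sketch (the 0-12 table bound comes from the comment; the names and everything else are illustrative):

```python
# Small times table for single digits, decimal left-shifts for place
# value, and a sum of the shifted partial products.
TABLE = [[a * b for b in range(13)] for a in range(13)]  # 0..12, built once

def table_multiply(x, y):
    total = 0
    for i, xd in enumerate(reversed(str(x))):      # decimal digits of x
        for j, yd in enumerate(reversed(str(y))):  # decimal digits of y
            # one table lookup, shifted left by i + j decimal places
            total += TABLE[int(xd)][int(yd)] * 10 ** (i + j)
    return total

assert table_multiply(1234, 5678) == 1234 * 5678
```

It works, but it still costs O(d^2) digit operations for d-digit inputs, and each table lookup is slower than a hardware multiply-add--which is the author's first bullet point in practice.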