
Automatic Algorithms Optimization via Fast Matrix Exponentiation (2015) - zawerf
https://kukuruku.co/post/automatic-algorithms-optimization-via-fast-matrix-exponentiation/
======
DannyBee
So, compilers can do this, and do do this, depending on the language and its
guarantees.

GCC and LLVM use a chains-of-recurrences algebra and representation for
symbolic analysis that could easily represent this (or be extended to). But
they probably won't bother any time soon.

To understand why requires a little more detail first:

The statement "However, compilers do not replace the computational algorithm,
written by a human, by a more asymptotically efficient one."

is mostly wrong.

They generally do where they are allowed. They just aren't often allowed, or
it's not worth it.

There are languages that are playing with this even more heavily.

But whether it's a good idea, eh.

In numerics in particular, people often pick their algorithms very carefully
for stability/etc reasons.

For more generic things like sorts, a lot of this gets hidden in standard
libraries anyway.

For things nobody could really complain about, the compilers do idiom
recognition of things like memcpys, string compares, etc and replace them.

In any case, the problem is not recognizing and replacing the algorithm or
idiom. The problem is deciding when doing so actually serves your users well.

Going back up, when we first implemented scalar evolutions in GCC, we
supported everything you can think of - closed form polynomial functions,
matrices, mixers.

In the end though, we were spending a lot of time handling cases that were
very rare, mostly optimal as written anyway (so not useful for the optimizer),
but still required exponential time solving/etc to do anything with.

So a lot of that code was removed.
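
To make the simplest case concrete, here's a sketch in Python (not GCC's
actual SCEV code) of the kind of loop that scalar evolution recognizes as a
polynomial chain of recurrences and can replace with its closed form:

    def sum_loop(n):
        s = 0
        for i in range(n):
            s += i                  # accumulator evolves as the chrec {0, +, {0, +, 1}}
        return s

    def sum_closed_form(n):
        return n * (n - 1) // 2     # closed-form polynomial an optimizer can substitute

    assert all(sum_loop(n) == sum_closed_form(n) for n in range(100))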

Could you handle matrix exponentiation because it's fast? Sure.

But it's unlikely to be worth it.

~~~
AstralStorm
It is not about being allowed at all. The compiler almost never understands
the required high-level result.

There are some attempts to patch typical general algorithms (e.g. tree
searches and their implementations) with faster equivalents. They typically
fail because recovering such a structure safely from low-level code is a
really hard problem.

~~~
jcranmer
There's always been a sort of pie-in-the-sky goal of "let people just say what
the algorithm does, and let the compiler figure out the best way to implement
that algorithm" with the idea being that the programmer doesn't have to
concern himself/herself with details about the hardware that bear on the
implementation.

The main problem with that idea is that you have to find some language to
describe the design, which often ends up being _an_ implementation, and when
you look at the invariants that must be maintained for correctness, it's
difficult to deviate from that implementation substantially.

Floating point arithmetic, which is the core of most numerical algorithms, is
not associative (it is commutative). If someone asks you to sum up a list
going from left to right, it is not legal to transform that into a sum that
goes from right to left. And unfortunately, in a mathematical sense,
associativity is the property that's often the most useful.
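
A quick illustration with ordinary IEEE-754 doubles (in Python here, but the
same holds in C):

    x, y, z = 0.1, 0.2, 0.3
    print((x + y) + z)   # 0.6000000000000001  (summed left to right)
    print(x + (y + z))   # 0.6                 (reassociated: a different result)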

So, in practice, the most useful cases for this kind of heavy-mathematics
optimization are the ones where it's not legal in the first place. Even
beyond that, many of these kinds of algorithms are already implemented in a
core library that's been around for decades and has been highly tuned for any
architecture you want to run on, so the actual prospect of seeing real gains
from a compiler is rather small.

~~~
Darmani
> The main problem with that idea is that you have to find some language to
> describe the design, which often ends up being an implementation, and when
> you look at the invariants that must be maintained for correctness, it's
> difficult to deviate from that implementation substantially.

Can we please stop repeating this line? It was false 50-60 years ago when
garbage collection and register allocation were invented. It was false when
yacc was built. And it's false today.

The gap in detail between specification and implementation is huge. If you can
change a function without changing its callers, then your program has some
slack between the assumptions placed on that function and its implementation.
From low-level performance things like changes to improve cache locality, to
high-level ones like scheduling of data pipelines, there's a lot of space for
synthesizers and search-based software improvement tools to work with.

So, are you curious about program synthesis for floating-point accuracy? Have
a look at Herbie.

~~~
jcranmer
> Can we please stop repeating this line? It was false 50-60 years ago when
> garbage collection and register allocation was invented. It was false when
> yacc was built. And it's false today.

No, it's still very true today. We have pushed the bounds of what
implementation details we are free to ignore, but those bounds are still very
much on the order of "you have to program the algorithm I implemented" as
opposed to "you have the freedom to choose any algorithm so long as the output
obeys these mathematical constraints." Program synthesis techniques have
improved and are improving, but they're still only really applicable to
kernel-style transformations.

~~~
Darmani
What's a kernel-style transformation? That term is unfamiliar to me. Are you
referring to, e.g., optimizing stencil kernels?

You're correct that classical semantics-preserving compiler optimizations are
very restricted, when not user guided as in the OP's tool. They do not have
any form of specification weaker than the input code. This is why most
synthesis tools do take something higher level than code.

Then again, some really do have the workflow "figure out existing code's
algorithm, and pick a better one." The first thing in this space that comes to
mind is verified lifting, being pursued by my collaborators at UW.

------
Darmani
This was a hot topic 10-15 years ago. Then, over a few years, we went from
"there has been little work applying computer algebra methods to program
analysis and optimization" to "everything under the sun has already been
done."

It's been about 3 years since I looked at this stuff, and I'm having trouble
remembering the names of the people in this area (lots of Europeans). Most of
the action is in finding ever-larger fragments of programming languages they
can solve. The one in this article handles what are called "arithmetic
programs;" a bunch of people have worked on handling restricted classes of
recursive programs.

------
ychen306
Unfortunately it only works for toy examples.

> cpmoptimize.recompiler.RecompilationError: Can't optimize loop: Unsupported
> instruction (LOAD_ATTR, 'b') at line 12 in fib, file "test.py"
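
The unsupported-instruction error shows up whenever the loop body falls
outside the small bytecode subset the recompiler models. A hypothetical
reconstruction (not the actual test.py behind that traceback) of the kind of
code that triggers it, namely an attribute access inside the loop:

    # Hypothetical reconstruction, not the actual test.py from the traceback.
    # The attribute access compiles to a LOAD_ATTR bytecode, which the
    # recompiler reports as an unsupported instruction.
    from cpmoptimize import cpmoptimize   # decorator usage as in the article

    class State:
        def __init__(self):
            self.b = 1

    @cpmoptimize()
    def fib(n, state):
        a = 0
        for _ in range(n):
            a, state.b = state.b, a + state.b   # state.b -> LOAD_ATTR 'b'
        return a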

------
taeric
It is funny to see the repeated squares algorithm every time it is
rediscovered. Knuth, unsurprisingly, had one of his earlier papers involving
it. He presented it as related to addition chains and showed that just
repeated squaring is not always superior.
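
A small Python sketch of both points: binary square-and-multiply, and the
classic x**15 case where a shorter addition chain beats it:

    def pow_by_squaring(x, n):
        """Left-to-right binary (square-and-multiply) exponentiation, n >= 1."""
        result = x
        for bit in bin(n)[3:]:            # bits of n after the leading 1
            result = result * result      # one squaring per remaining bit
            if bit == '1':
                result = result * x       # one extra multiply per set bit
        return result

    # For n = 15 (binary 1111) this costs 3 squarings + 3 multiplies = 6,
    # but the addition chain 1, 2, 3, 6, 12, 15 reaches x**15 in only 5:
    #   x2 = x*x; x3 = x2*x; x6 = x3*x3; x12 = x6*x6; x15 = x12*x3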

I'm curious what all came of this effort, seeing it is marked 2015. It is also
fun to see benchmarks of this method used in executing fib.

~~~
zawerf
It's not _just_ the repeated squares algorithm though. The novel part about
this post is the application of it.

For most languages, if you write a for loop, the compiler is not smart enough
to generate anything that runs in sublinear time. It might do some loop
unrolling, reordering, or just skip running the whole loop altogether if it's
dead code.

But this is the first time I have seen something that can rewrite a loop to
run in a different runtime complexity (specifically logarithmic time).

He relies on rewriting Python bytecode, but there's no reason why, say, a C
compiler couldn't automatically identify a loop that can be rewritten as a
linear recurrence relation and thus has a matrix exponentiation form.
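
To make that concrete, here's a hand-written sketch (plain Python, not the
article's automatic bytecode rewriting) of turning the linear-time loop into
an O(log n) matrix power:

    def fib_loop(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b            # linear recurrence: (a, b) -> (b, a + b)
        return a

    def mat_mul(X, Y):
        return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
                [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

    def fib_matrix(n):
        # raise the recurrence's 2x2 transition matrix to the n-th power by squaring
        result = [[1, 0], [0, 1]]      # identity
        M = [[0, 1], [1, 1]]
        while n:
            if n & 1:
                result = mat_mul(result, M)
            M = mat_mul(M, M)
            n >>= 1
        return result[0][1]            # this entry of M**n is fib(n)

    assert all(fib_loop(n) == fib_matrix(n) for n in range(200))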

~~~
jcranmer
Clang will optimize basic loops away:

[https://godbolt.org/g/GMjKZm](https://godbolt.org/g/GMjKZm)

