

Using the Quick Raise of Matrices to a Power to Write a Fast Interpreter - skazka16
http://kukuruku.co/hub/algorithms/using-the-quick-raise-of-matrices-to-power-to-write-a-very-fast-interpreter-of-a-simple-programming-language

======
tel
This is an instance of the "Maximally Powerful, Minimally Useful" principle
from a few days ago [0]. Because the language being described is sufficiently
minimal, it admits multiple valid, meaningful interpretations: the direct
"just execute it" interpretation, a theoretical "compile this to some other
kind of language and interpret it there" interpretation, and, most
interestingly, interpreting these relations as a linear algebra problem.

It should be quite clear that a more "powerful" language would not admit this
last meaning/interpretation/denotation.

[0] [http://blog.higher-order.com/blog/2014/12/21/maximally-power...](http://blog.higher-order.com/blog/2014/12/21/maximally-powerful/)

------
Animats
Arithmetic confined to addition, subtraction, and multiplication by constants
is Presburger arithmetic. It's decidable; there are automatic theorem provers
for statements in it. The classic Nelson-Oppen prover converts the theorems to
be proven into matrix form, then uses linear programming over the rationals as
a solution mechanism. That may be isomorphic to this approach.

~~~
Chinjut
The problem of interest to the OP is not determining the truth value of
sentences in Presburger arithmetic (that is, sentences built from addition,
equations/inequalities, Boolean operations, and the quantifiers "there exists
an integer such that" and "for all integers, it is the case that"). That
problem, although algorithmically decidable, is in general very
computationally taxing: the time to decide whether such a sentence is true or
false is at least doubly exponential in the size of the sentence.

Rather, the OP is simply interested in carrying out arithmetic calculations of
a sort which are trivially computable (after all, the inputs the OP is dealing
with are the very programs which would compute the answer!), but notes that
one can in a principled way transform naive specifications of these
computations (including loops like "Do this a million times") into faster
methods of attaining the same answer (by thinking of such loops as matrix
exponentiations and then using quick, rather than naive, exponentiation).
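To make that concrete, here is a minimal sketch of the transformation (mine,
not the OP's, in Python; the loop body "x += y; y += 1" is a made-up example).
The state vector (x, y, 1) is updated by a fixed linear map M, so running the
loop n times is the same as applying M^n once:

    # Sketch: "repeat n times: x += y; y += 1" as a single matrix power.

    def mat_mul(A, B):
        m = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
                for i in range(m)]

    def mat_pow(M, p):
        # Binary (square-and-multiply) exponentiation: O(log p) multiplications.
        m = len(M)
        R = [[int(i == j) for j in range(m)] for i in range(m)]  # identity
        while p:
            if p & 1:
                R = mat_mul(R, M)
            M = mat_mul(M, M)
            p >>= 1
        return R

    # One iteration of the loop: x' = x + y, y' = y + 1, constant stays 1.
    M = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]

    n = 10**6
    Mn = mat_pow(M, n)
    # Initial state (x, y, 1) = (0, 0, 1), so the final x is Mn[0] . (0, 0, 1).
    x = Mn[0][2]
    assert x == n * (n - 1) // 2  # x accumulated 0 + 1 + ... + (n - 1)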

~~~
Animats
Statements in Presburger arithmetic are equivalent to systems of linear
equations/inequalities, which are representable as matrices. That's why both
the provers and this exponentiation scheme work. If you have two variables
multiplied together, you're beyond Presburger arithmetic and linear equations,
and the simple matrix representation no longer applies.

------
Someone
_" The exponentiation algorithm is quite simple:

If the power is zero, we will return the identity matrix. If the power is even
(2N), we can recursively compute M^N, and then return the square of the
obtained matrix. If the power is odd (2N+1), it’s enough to recursively
compute M^2N value, and return the obtained matrix, that has been multiplied
by M."_

Tidbit: that likely is the best approach in practice, but perhaps
surprisingly, it is not optimal in the number of matrix multiplications. See
[http://en.m.wikipedia.org/wiki/Addition-chain_exponentiation](http://en.m.wikipedia.org/wiki/Addition-chain_exponentiation)
and [https://oeis.org/A003313](https://oeis.org/A003313). Wikipedia has a link
mentioning addition-subtraction chains, so it may even be beneficial to
compute the inverse of the matrix in some cases (M^1023 likely is such a case,
if inverting M isn't too hard).
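
To make the gap concrete (my own worked example, not from the linked pages):
the binary method reaches M^15 with 6 multiplications (M^2, M^3, M^6, M^7,
M^14, M^15), while the addition chain 1, 2, 3, 6, 12, 15 needs only 5:

    # M^15 via the addition chain 1, 2, 3, 6, 12, 15: five multiplications.
    # `mul` is any associative product, e.g. a matrix multiply.
    def pow15(M, mul):
        M2 = mul(M, M)         # 1: M^2
        M3 = mul(M2, M)        # 2: M^3
        M6 = mul(M3, M3)       # 3: M^6
        M12 = mul(M6, M6)      # 4: M^12
        return mul(M12, M3)    # 5: M^15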

~~~
Sniffnoy
Well -- the thing about using addition chains is, which addition chain do you
use? If we're talking about a general algorithm for any n, rather than for a
specific n, then we need to algorithmically generate our addition chain. So
going with the shortest addition chain is a bad idea, because it will take you
too long to find it. Any method of finding short addition chains based on
factoring is similarly a bad idea.

(It's worth noting that while finding short addition chains isn't known to be
NP-complete, a slight variant of the problem is.)

I think there are methods of finding short addition chains that reliably do
better than the binary method, though; in TAOCP volume 2, "Evaluation of
powers" section (which is a very good reference on addition chains in
general!), Knuth talks about the power tree method, which from a brief skim
seems like it might be better even when you include the time to find the
chain. I think there might also be better methods based on continued
fractions, going from memory.

~~~
bmh100
Would an optimizing compiler be able to understand these chains such that it
would be a net gain on execution time?

~~~
Sniffnoy
I'm not certain I understand what's being asked. I'm going to assume you mean,
"Could an optimizing compiler for a language with a built-in exponentiation
operator use short addition chains to speed up exponentiation at runtime?" In
which case, again, the answer depends on whether the exponent is fixed or is
allowed to vary.

For a fixed exponent, certainly; you could just spend additional time in the
compiler finding short addition chains for all the exponents that are used.
When exponents are allowed to vary, then we're in the case I described in my
post above -- maybe using the power-tree method, or certain other methods,
would improve on the binary method; but finding the _shortest_ addition chain
at runtime would not. (By which I mean, not with existing methods, and it
seems unlikely that there's any fast way.)

------
ivan_ah
How about computing any Fibonacci number in _constant_ time? This can be
done using the eigendecomposition trick for computing powers of matrices:
[http://minireference.com/static/excerpts/fibonacci_in_consta...](http://minireference.com/static/excerpts/fibonacci_in_constant_time.pdf)

caveat: will fail for large N due to limited precision of floats... works fine
on paper though.

[update] caveat 2: assumes a model of computation where ϕ^N can be computed in
constant time, which only works on paper.

~~~
ajkjk
Raising numbers to large powers still takes O(log n) operations, though, even
if it's written as a single term in an equation.

~~~
ivan_ah
Thx for pointing this out. I updated the comment. It seems O(log n) is as low
as it will go; still, I find it a neat (theoretical) trick.

------
aruss
The author needs to be careful about which model they are using to state
asymptotic runtime. Sure, this reduces to O(log n) matrix multiplications, but
matrix multiplication is not a constant-time operation: it is between
quadratic and cubic in the dimension of the matrix, assuming arithmetic
operations are performed in constant time, which is itself not always a great
assumption. With schoolbook multiplication of d-by-d matrices, for instance,
the whole scheme costs on the order of d^3 log n arithmetic operations.

------
okigan
Link bait?

Original article:
[http://habrahabr.ru/post/148901/](http://habrahabr.ru/post/148901/) Post
date: 2012

More from the same author:
[http://habrahabr.ru/users/skidanovalex/topics/](http://habrahabr.ru/users/skidanovalex/topics/)

~~~
SkidanovAlex
Articles on habrahabr are published under CC-BY-3.0, a very permissive
license that allows articles to be shared and adapted (in particular,
translated), so these guys aren't really doing anything that the author (in
this particular case, myself) didn't allow them to do by publishing the
material on habrahabr in the first place.

~~~
dllthomas
I'm not sure "this is permitted under the license" and "this is clickbait" are
mutually exclusive.

------
jmount
More good ideas in this direction: Hashlife
[http://en.wikipedia.org/wiki/Hashlife](http://en.wikipedia.org/wiki/Hashlife)

------
carlob
> […] will calculate the 1 000 001st and 1 000 002nd numbers in 4 seconds.
> Classic version would take much more time, as the program has to operate
> with multi-thousand-digit numbers.

I'm not sure which classical algorithm for computing Fibonacci numbers the OP
means, but Mathematica on my machine takes about 4 ms to compute the
1,000,001st Fibonacci number exactly…

Maybe by classical he means naive; in that case, doesn't it ever get old to
pick as an example an algorithm that will be optimized away by any half-decent
compiler?

~~~
ekimekim
There's a closed formula for computing the nth fibonacci number - however it
remains a good example in numeric contexts, a Hello World equivalent.

------
grondilu
So basically this interpreter checks whether a loop contains only linear
transformations of the variables in the lexical scope, and if so it builds
the corresponding matrix and computes the appropriate power of it.

If so, that's fine, but it's probably a rare use case, since loops usually
have conditional branches, don't they? And if they don't, the algorithm was
arguably poorly written in the first place, since the author should have
noticed that a linear transformation was involved.

------
TheLoneWolfling
(Slightly tangential)

> I have recently read an article about calculating the Nth Fibonacci number
> in O(log N) arithmetic operations.

Either you assume fixed-precision integers, in which case you just use a
lookup table and call it O(1), or you assume arbitrary-precision integers, in
which case you _cannot_ have anything less than an O(N) algorithm, since the
number of digits of the Nth Fibonacci number is O(N) (the Fibonacci sequence
grows exponentially).

I see this all over the place, and it's frustrating every time.

~~~
zodiac
I believe OP is aware of this as he specifically uses the phrase "arithmetic
operations" instead of "time", so adding two very large numbers only counts
once. It's true that you can't have an O(log N) time algorithm, as arithmetic
operations cannot be performed in constant time.

A similar situation arises in, say, mergesort, when we say the algorithm
takes O(n log n) comparisons. That's not the same as an O(n log n) time
algorithm; for instance, if you are sorting distinct integers, each integer
takes at least log n bits to represent, and comparing two integers can
require reading all of those bits.

~~~
TheLoneWolfling
In which case, as I said, just use a lookup table, which is constant-time
under those same constraints, and has the bonus of automatically caching
frequently used values for you.

And as the Fibonacci sequence is exponential, any lookup table will be
relatively small. (The maximum number of elements in your lookup table is
linear w.r.t. the number of digits of your (fixed-precision) integers. I mean,
even a 128-bit return value only requires 185 elements in the table, for a
total of 2960 bytes.)
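
A sketch of what I mean (my own illustration, using unsigned 64-bit results
for concreteness):

    # Precompute every Fibonacci number that fits in an unsigned 64-bit word:
    # 94 entries, F(0) through F(93). Each query is then a constant-time load.
    FIB64 = [0, 1]
    while FIB64[-1] + FIB64[-2] < 2 ** 64:
        FIB64.append(FIB64[-1] + FIB64[-2])

    def fib(n):
        return FIB64[n]  # IndexError means the result would overflow 64 bits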

> A similar situation arises in, say, mergesort

This is why you have to specify exactly what you mean by the big-O notation
you use. Do you mean comparisons? Time? Time assuming everything but the
specified variables is constant? Etc.

> It's true that you can't have an O(log N) time algorithm, as arithmetic
> operations cannot be performed in constant time.

Yep. You can perform some operations in sublinear time w.r.t. the number of
bits, however, by exploiting hardware-level parallelization. It would be
interesting to see what the best (in terms of big-O) possible implementation
of fib(n) would be, given constant-time operations for a fixed-length machine
word.

~~~
zodiac
> This is why you have to specify exactly what you mean by the big-O notation
> you specify.

He did specify it: it's O(log N) arithmetic operations.

Note: I don't mean "assuming arithmetic operations take constant time, we can
compute F(N) in O(log N) time". In that case you could correctly say that,
since arithmetic operations don't take constant time in arbitrary-precision
arithmetic, we must be using fixed precision (or something essentially
equivalent), in which case a lookup table takes O(1) time and O(precision)
space. I mean "F(N) can be computed with O(log N) arithmetic operations". If I
perform those arithmetic operations over arbitrary-precision integers with
schoolbook algorithms, then the operands grow to O(N) digits, each
multiplication takes up to O(N^2) time, and computing F(N) takes
O(N^2 log N) time overall.

------
crb002
I believe this technique is by Valiant?

Leslie G. Valiant: General Context-Free Recognition in Less than Cubic Time.
J. Comput. Syst. Sci. 10(2): 308-315 (1975)

------
mjcohen
How about

  a += 1; b += 1/a;

