

Matrix multiplication in O(n^2.373) - davepeck
http://www.scottaaronson.com/blog/?p=839

======
wbhart
Note that the bound (as given in the paper) is now even better than the one
mentioned in this blog (and the title).

This is of course huge news after twenty years of people trying to crack this.

You need matrices of dimension about 1.66x10^91 before this result yields half
the number of steps. But if an algorithm existed which made the effective
omega = 2, it could have huge implications. So any improvement in techniques,
no matter how slight, is very welcome, as it may lead to much more significant
improvements later on.
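
For a rough sanity check of that crossover figure (a back-of-the-envelope
sketch of my own, assuming both algorithms share the same constant factor,
which they certainly don't in practice):

    # Find the n at which n**2.3727 is half of n**2.376, i.e. the n
    # for which n**(2.376 - 2.3727) == 2.  Constant factors ignored.
    old_exp, new_exp = 2.376, 2.3727
    n = 2.0 ** (1.0 / (old_exp - new_exp))
    print("crossover dimension: %.3g" % n)   # ~1.7e91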

~~~
Rickasaurus
So now we know that the bound is at least 2.3727. Is it safe to assume that as
they use higher and higher tensor powers the bound will shrink asymptotically
toward the real bound?

~~~
wbhart
You mean at most 2.3727. It's not really a safe assumption that the bound will
shrink asymptotically to the real omega as far as I know. But it is suspected
that higher tensor powers may drop the exponent slightly.

I believe there are approaches related to other constructions in group theory,
such as the wreath product, which may eventually lead to better results. It's
just a huge unknown at this point. (My knowledge of these techniques can be
written on the back of a postage stamp otherwise I'd try and provide more
detail.)

One certainly hopes that higher and higher tensor powers are not the way to
get omega = 2, as the algorithms just become more and more complex.

Someone may correct me if I am wrong, but I think it is still not known what
the best algorithm is that breaks matrices up into 3x3 blocks, or whether one
exists with better complexity than Strassen's (which breaks the matrix up into
2x2 blocks). A _huge_ number of CPU cycles have been spent trying to find the
best algorithm using a 3x3 decomposition.
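
As an aside, Strassen's scheme is short enough to sketch. A minimal recursive
version in Python (my own sketch, assuming square matrices whose size is a
power of 2, with numpy purely as a convenience):

    import numpy as np

    def strassen(A, B):
        n = A.shape[0]
        if n <= 64:   # below some threshold the naive product wins in practice
            return A @ B
        h = n // 2
        a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # The seven block products that replace the naive eight:
        m1 = strassen(a11 + a22, b11 + b22)
        m2 = strassen(a21 + a22, b11)
        m3 = strassen(a11, b12 - b22)
        m4 = strassen(a22, b21 - b11)
        m5 = strassen(a11 + a12, b22)
        m6 = strassen(a21 - a11, b11 + b12)
        m7 = strassen(a12 - a22, b21 + b22)
        return np.vstack([
            np.hstack([m1 + m4 - m5 + m7, m3 + m5]),
            np.hstack([m2 + m4, m1 - m2 + m3 + m6]),
        ])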

~~~
jules
Is it known that 2x2 matrices can't be multiplied with fewer than 7
multiplications?

~~~
wbhart
Sorry, I got confused there for a minute. The answer is in fact just "yes".
I'm not sure why this was voted down.

The reason is that the border rank of 2x2 matrix multiplication is seven. This
was proved 5 years ago by Landsberg, though it took quite some effort for me
to track this down. Sometimes "yes" is the best one can do at short notice.
Perhaps individuals who enjoy taking points away from people might consider
this!

However, Winograd's algorithm can do nxn matrix multiplication with n^3/2 +
2n^2 multiplications (and many more additions than usual). This is only useful
if the coefficients are _very_ large, because of the cost of the extra
additions. Moreover, this is not a block algorithm, i.e. it cannot be applied
recursively.

Anyhow, for 2x2 matrices, n = 2, so the total number of multiplications is
2^3/2 + 2x2^2 = 12, which is more than the usual 8 multiplications (or 7 if
you use Strassen). Similarly, for n = 3 it is less efficient than the naive 27
multiplications (and even less efficient than Laderman's algorithm, with 23
multiplications, and Bard's algorithm, with 22). But for n = 4 we get 4^3/2 +
2x4^2 = 64, the same as the naive algorithm, and for n = 6 we get 6^3/2 +
2x6^2 = 180, which is well under the 216 multiplications required by the naive
algorithm and even fewer than one level of Strassen requires. By n = 8,
Strassen is again better if you go by the number of multiplications alone. Of
course, in practice the coefficients need to be really large (perhaps tens of
machine words) before either Winograd or Strassen beats the naive algorithm.
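
The pairing identity Winograd's method is built on is easy to state in code.
A minimal Python illustration of my own (the exact multiplication count
depends on how the bookkeeping is done):

    def winograd_dot(u, v):
        # Even-length dot product: each `mixed` term expands to
        # u0*u1 + u0*v0 + u1*v1 + v0*v1; subtracting the u-only and
        # v-only sums leaves u0*v0 + u1*v1.  In a full matrix product,
        # xi is shared by a whole row and eta by a whole column, which
        # is where the multiplications are saved.
        assert len(u) == len(v) and len(u) % 2 == 0
        half = len(u) // 2
        xi = sum(u[2*j] * u[2*j+1] for j in range(half))
        eta = sum(v[2*j] * v[2*j+1] for j in range(half))
        mixed = sum((u[2*j] + v[2*j+1]) * (u[2*j+1] + v[2*j])
                    for j in range(half))
        return mixed - xi - eta

    assert winograd_dot([1, 2, 3, 4], [5, 6, 7, 8]) == 70  # 5 + 12 + 21 + 32

Note that expanding the `mixed` terms silently swaps factors (it uses v1*u1 =
u1*v1), which is exactly why the commutativity point below matters.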

Note that if you break a larger matrix up into 2x2, i.e. into 4 blocks, then
the smaller submatrices no longer commute with each other (matrix
multiplication is not commutative), and the identity above relies on being
able to swap factors. This is why Winograd's algorithm is not a block
algorithm like the Strassen method.

------
pantaloons
A bit of a shallow point, but will anyone ever actually implement this
algorithm?

I'm having trouble seeing the value of such a paper -- the size of matrices
required for this result to have a clear advantage is significant, and that is
completely ignoring constant factors, real world performance and
parallelization considerations.

~~~
jey
No, these algorithms are impractical, but their utility is in shedding more
light on the matrix multiplication problem. They enable further study, which
could lead to more practical advances.

~~~
dj_axl
Right, from what I can tell the Strassen algorithm has been implemented but
the others are typically pseudocode only...

[http://stackoverflow.com/questions/1920031/strassens-algorit...](http://stackoverflow.com/questions/1920031/strassens-algorithm-for-matrix-multiplication)

~~~
wbhart
Yes, Strassen is straightforward to implement and makes a huge difference.
Another practical algorithm, applicable over small Galois fields, is the
"Method of Four Russians" (none of whom were Russian). It's often used as a
base case for Strassen when multiplying matrices over GF(2), which has a
number of applications in the real world, including crypto research.
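
A rough sketch of the idea in Python (my own illustration, not a reference
implementation): pack each row of the matrices into an integer bitset, then
process blocks of t columns at a time with a precomputed XOR table, so each
row costs one table lookup per block instead of t bit tests.

    def gf2_multiply(A, B, n, t=8):
        # A, B: lists of n ints; bit j of A[i] is entry (i, j) over GF(2).
        # C[i] is the XOR of the rows of B selected by the set bits of A[i].
        C = [0] * n
        for lo in range(0, n, t):
            width = min(t, n - lo)
            # XOR of every subset of this block of B's rows, built
            # incrementally: each mask extends a smaller, known mask.
            table = [0] * (1 << width)
            for mask in range(1, 1 << width):
                low = mask & -mask
                table[mask] = table[mask ^ low] ^ B[lo + low.bit_length() - 1]
            for i in range(n):
                C[i] ^= table[(A[i] >> lo) & ((1 << width) - 1)]
        return C

    B_identity = [0b001, 0b010, 0b100]   # multiplying by I returns A itself
    assert gf2_multiply([0b011, 0b101], B_identity[:2], 2, t=2) == [0b011, 0b101]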

------
alex-g
The author, Virginia Vassilevska Williams, is married to Ryan Williams, who
last year proved a major result in circuit complexity lower bounds (NEXP is
not contained in ACC0). It's interesting to see that both of them are making
important discoveries, in different areas of computational complexity.

~~~
antics
A bit more context: Scott Aaronson called Ryan Williams' discovery "one of the
best of the decade". He was recently hired as a professor at Stanford, and his
advisor was the legendary Manuel Blum, who teaches at CMU, where Blum's wife
and son are also tenured CS professors.

~~~
greeneggs
A bit more context: Stanford did _not_ hire Virginia Williams, and she is
looking for a job. Academic hiring in CS departments is driven by tweets.
Hence the sensational blog post. /cynic (I do wish her luck.)

~~~
antics
I wouldn't call it sensationalist. This is an important paper.

And CS is not particularly unique in hiring people who are popular. A lot of
being a good professor is gathering money, which enables you to do good work
and brings prestige to the university, and the sad fact is that this is easier
the more famous you are. Also, part of the job is convincing people that you
have actually done significant work, and the people who are good at that are
usually well-known.

While it's true that things like blog posts and tweets help people realize
this, IMO most of the work happens around those things; a good blog post will
not replace the other factors in most cases.

------
spaznode
Forgive my ignorance, but does anyone know any real world examples where
something like this might improve a certain type of work?

I'm sort of fuzzily half-assedly thinking this might be applicable to image -
and by extension video - work but maybe that's not the case at all. I want to
be more excited, please enlighten me. :)

~~~
mattdeboard
You are probably about as excited about this as people in the 20s and 30s were
about quantum theory. "Immediate practical impact" is not how one judges
research, at least not from my perspective.

~~~
spaznode
That's a perfectly good response. Just wanted to know if this was a "people
quietly shuffling and re-implementing some core algo" kind of announcement or
more general scientific theory.

So, thank you. By all means keep researching and advancing.

------
FaceKicker
Based on skimming the "our contribution" section of the paper (page 2), it
seems like this isn't a new algorithm, but rather a new analysis of the same
algorithm (Coppersmith-Winograd) that proves a tighter upper bound on its
runtime.

Not that that's not a valuable contribution, but the linked article seems kind
of misleading, unless I'm misunderstanding the paper...

~~~
wbhart
That is sort of true. The _approach_ to constructing an algorithm is not new,
being essentially due to Coppersmith-Winograd. They used an algorithm A (which
doesn't actually multiply matrices, but from which a matrix multiplication
algorithm can be constructed) which yielded omega < 2.39. They noted in their
own paper that if the second tensor power of algorithm A is used instead of A
when constructing the matrix multiplication algorithm, they get the bound
2.376.

This paper analyses the 8th tensor power of the original _algorithm_ A in what
is really a tour de force and shows that it leads to a better bound. So
technically the algorithm (the eighth tensor power of the original algorithm
that CW used) was "known". The innovation here is showing that this is
actually better for constructing a matrix multiplication algorithm than the
original or second tensor power algorithms.

This paper is also of interest because it allows analysis of tensor powers of
other algorithms. It's probably just the beginning of a slew of new records.

There is no question this is a landmark paper. There has been an enormous
amount of work for a very long time on this subject.

For those with infinite patience, there is a slightly simplified version of CW
presented here:

[http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/MAnd...](http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/MAndersonSBarman2009.pdf)

~~~
FaceKicker
> There is no question this is a landmark paper. There has been an enormous
> amount of work for a very long time on this subject.

I don't doubt that at all (though I personally know nothing about the field),
I was only criticizing the linked article.

~~~
wbhart
Yes, what you stated is not incorrect. The linked blog is not terribly clear
on what has been done. Unfortunately what has been done is very subtle, so I
can't really envisage a good blog article announcing this.

------
pluies_public
For those of us a bit light on the theoretical CS aspect, is this article
ironical? It sounds like _quite_ a small improvement...

~~~
_mrc
The difference between n^2.373 and n^2.376 (i.e. n^0.003) is about 1.3% for
n=100, 2% for n=1000, and keeps getting better for large n.

[http://www.wolframalpha.com/input/?i=n%5E2.373%2Fn%5E2.376%2...](http://www.wolframalpha.com/input/?i=n%5E2.373%2Fn%5E2.376%2C+n%3D100)
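
Or check the arithmetic directly in Python (constant factors still ignored,
of course):

    # Speedup factor n**(2.376 - 2.373) at a few sizes:
    for n in (100, 1000, 10**6):
        print(n, n ** (2.376 - 2.373))   # ~1.014, ~1.021, ~1.042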

~~~
LokiSnake
When you start applying real numbers to n, using O() doesn't make sense. When
comparing the two in a slightly practical sense, the constant factor has to be
taken into account.

~~~
regularfry
If I understand correctly, in this case you can directly compare them - the
algorithm is the same, so the constant factors are the same.

However, this is in _so_ many ways not my field, so I could easily have
completely misunderstood.

~~~
dalke
You do not understand correctly. Compare the ratio of the quadratic f(n) =
1E100 + n^2 to the linear g(n) = 1E100 + n. Using n=100 and n=1000 the ratio
is effectively 1.0. You need to get to
n=100000000000000000000000000000000000000000000000000 before the ratio is 2.
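
Python's arbitrary-precision integers make this trivial to check:

    f = lambda n: 10**100 + n * n   # "quadratic" with a huge constant term
    g = lambda n: 10**100 + n       # "linear" with the same constant term
    for n in (100, 1000, 10**50):
        print(f(n) / g(n))          # ~1.0, ~1.0, ~2.0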

~~~
regularfry
I see your point - we can compare the powers that n is raised to, but without
knowing the constant factor we can't know the proportion of the result that it
is responsible for.

------
amund
A lower bound of Omega(n^2 lg n) is known, but we are still waiting for an
algorithm that matches it (if one exists). Ref: Ran Raz -
<http://dl.acm.org/citation.cfm?id=944299>. Matrix inversion has similar
complexity, due to the relationship between matrix inversion and
multiplication (ref Introduction to Algorithms, 1st ed, pp 762-765); I wrote
up the proof here:
[http://amundtveit.info/publications/2003/ComplexityOfMatrixI...](http://amundtveit.info/publications/2003/ComplexityOfMatrixInversion.pdf)

------
ot
For those interested in the details, the (excellent, as always) blog of
Richard Lipton sheds some light

[http://rjlipton.wordpress.com/2011/11/29/a-breakthrough-on-m...](http://rjlipton.wordpress.com/2011/11/29/a-breakthrough-on-matrix-product/)

------
flourpower
I couldn't find this by googling - if A is an n by n matrix, can you get A^k
strictly faster than you can get a product of k arbitrary n by n matrices?

~~~
Rhapso
Just spending 30 seconds looking at it: think of it like the bit-shifting
solution to multiplication - do a related, faster operation, then correct.

You can compute A^k much faster by repeated squaring: calculate A^2, then A^4,
then A^8, and so on up to A^N, where N is the largest power of 2 not exceeding
k, then multiply A^N by A^(k-N) (which you can speed up using the same method
if k-N is large enough).

That was a bit rough and quick, but the basic idea is that computing (nxn)^k
reduces to O(log(k) * n^2.373) versus the more naive O(k * n^2.373), a speedup
by a factor of roughly k/log(k), which is decent. I am sure there is a better
solution out there.
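
In code, that is the standard square-and-multiply loop (a sketch, with numpy's
@ standing in for whichever fast multiplication routine you like):

    import numpy as np

    def matrix_power(A, k):
        # Computes A**k with O(log k) matrix multiplications: square the
        # base each round, folding it into the result whenever the
        # current low bit of k is set.
        result = np.eye(A.shape[0], dtype=A.dtype)
        base = A.copy()
        while k > 0:
            if k & 1:
                result = result @ base
            base = base @ base
            k >>= 1
        return result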

~~~
wbhart
Multiplying k arbitrary matrices can be done in less time than O(kn^omega).
But certainly for k > 3 you wouldn't multiply naively k times as you've
pointed out.

------
dbbo
So, it's a blog post that announces a paper. Why not just post the link to the
paper?

~~~
wging
As smart as the Hacker News set may be, most people are not experts in the
field and could do with a little context.

Also, Aaronson's a well-known computer scientist, so this helps establish that
the result's both credible and important in the eyes of people who may not
(forgive me) all be qualified to judge the paper on its merits.

~~~
dbbo
Context: fair point. I can't say I'm familiar with Aaronson, however (of
course I studied math and not CS).

~~~
wging
"Well-known" in a particular field doesn't necessarily mean "household
name"... do you know who Tim Gowers is, for example?

<http://en.wikipedia.org/wiki/Scott_Aaronson>

A lot of people might know him from
<http://www.scottaaronson.com/writings/bignumbers.html>, which is _the_
article about that topic.

edit: Okay, rereading this article I recovered an insight that points to a
truer reason Aaronson is worth linking to. It is not so much that he's a
famous professor (he is) as that he is, in general, a _great expositor_. Read
that big-number article and tell me he's not.

