This is of course huge news after twenty years of people trying to crack this.
You need matrices of dimension 1.66x10^91 before this result yields half the number of steps. But if an algorithm existed which made the effective omega = 2, then this could have huge implications. So any improvement in techniques, no matter how slight, is very welcome, as it may lead to much more significant improvements later on.
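For context, that figure is just the crossover where the exponent gap halves the operation count: solve n^(2.376 - 2.3727) = 2 for n, taking 2.376 as the old Coppersmith-Winograd exponent and 2.3727 as the new bound (my numbers, so treat this as a back-of-envelope check):

    # Crossover where the new exponent halves the step count:
    # n^2.376 / n^2.3727 = 2, i.e. n^delta = 2
    delta = 2.376 - 2.3727
    n = 2 ** (1 / delta)
    print(f"{n:.2e}")  # ~1.6e91, consistent with the figure above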
I believe there are approaches related to other constructions in group theory, such as the wreath product, which may eventually lead to better results. It's just a huge unknown at this point. (My knowledge of these techniques can be written on the back of a postage stamp otherwise I'd try and provide more detail.)
One certainly hopes that higher and higher tensor powers are not the way to get omega = 2, as the algorithms just become more and more complex.
Someone may correct me if I am wrong, but I think it is still not known what the best algorithm is that breaks matrices up into 3x3 blocks, nor whether one exists with better complexity than even Strassen (which breaks the matrix up into 2x2). A huge number of CPU cycles have been spent trying to find the best algorithm using a 3x3 decomposition.
This (http://www.siam.org/pdf/news/174.pdf) short, highly readable paper gives a high-level overview of the group-theoretic approach. Here are the key points, quoted from the paper:
* "What Cohn and Umans realized is that under certain conditions, matrix multiplication, too, can be embedded into the group algebra of a finite
group G, although the embedding is more complex and the group must be non-abelian to yield a nontrivial bound."
The conditions that must be satisfied are as follows:
* "...the group has three subgroups, H_1, H_2, H_3, with the property that whenever h_1 ∈ H_1, h_2 ∈ H_2, and h_3 ∈ H_3 with h_1 h_2 h_3 = 1, then h_1 = h_2 = h_3 = 1" (AKA the "triple-product property,") and
* "|H1||H2||H3| > Σd_i^3," where the d_i's are the dimensions of the square submatrices into which we decompose the original matrix. (AKA "beating the sum of cubes.")
The groups that have been found to have these properties are "wreath products of abelian groups with the symmetric group S_n."
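To make the first condition concrete, here is a toy brute-force check of the triple-product property (my own sketch; compose and has_tpp are hypothetical helper names, and permutations are encoded as tuples):

    from itertools import product

    def compose(p, q):
        # Permutation product: apply q first, then p.
        return tuple(p[q[i]] for i in range(len(p)))

    def has_tpp(H1, H2, H3):
        # True iff h1 h2 h3 = 1 forces h1 = h2 = h3 = 1.
        identity = tuple(range(len(next(iter(H1)))))
        for h1, h2, h3 in product(H1, H2, H3):
            if compose(compose(h1, h2), h3) == identity:
                if (h1, h2, h3) != (identity, identity, identity):
                    return False
        return True

    # Subgroups of S_3: H1 = <(0 1)>, H2 = <(1 2)>, H3 trivial.
    e = (0, 1, 2)
    print(has_tpp({e, (1, 0, 2)}, {e, (0, 2, 1)}, {e}))  # True

Note this tiny example satisfies the TPP but fails the second condition: |H_1||H_2||H_3| = 4, while S_3 has irreducible representations of dimensions 1, 1, 2, so Σd_i^3 = 10. Beating the sum of cubes needs much larger constructions, hence the wreath products.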
The reason is that the border rank of 2x2 matrix multiplication is seven. This was proved 5 years ago by Landsberg, though it took quite some effort for me to track this down. Sometimes "yes" is the best one can do at short notice. Perhaps individuals who enjoy taking points away from people might consider this!
However, Winograd's algorithm can do nxn matrix multiplication with n^3/2 + 2n^2 multiplications (and many more additions than usual). This is only useful when the coefficients are very large, because of the cost of the extra additions. Moreover, this is not a block algorithm, i.e. it cannot be applied recursively.

Anyhow, for 2x2 matrices, n = 2 and so the total number of multiplications is 2^3/2 + 2x2^2 = 12, which is more than the usual 8 multiplications (or 7 if you use Strassen). Similarly, for n = 3 it is less efficient than the naive 27 multiplications (and even less efficient than Laderman's algorithm at 23 multiplications and Bard's algorithm at 22 multiplications). But for n = 4 we get 4^3/2 + 2x4^2 = 64, which is the same as the naive algorithm, and for n = 6 we get 6^3/2 + 2x6^2 = 180, which is well under the usual 216 multiplications required by the naive algorithm and even less than the number required if you use one level of Strassen. Already by n = 8 Strassen is better if you go by the number of multiplications alone. Of course, in practice the coefficients need to be really large (perhaps tens of machine words) before either Winograd or Strassen beats the naive algorithm.
Note that if you break a larger matrix up into a 2x2 grid, i.e. 4 blocks, then the submatrices no longer commute (matrix multiplication is not commutative). This is why Winograd's algorithm is not a block algorithm like the Strassen method.
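To make the trick concrete, here is a minimal Python sketch of the 1968 Winograd scheme for even n (my own rendering; per-step multiplication counts are in the comments, and exact totals in the literature vary a little depending on how the precomputation is tallied). You can see the commutativity dependence directly: the cross terms only cancel because scalar products a*b and b*a are equal.

    def winograd_mult(A, B):
        # Winograd (1968) for even n, with matrices as lists of lists.
        n = len(A)
        assert n % 2 == 0, "sketch handles even n only"
        half = n // 2
        # Precompute pairwise products along rows of A and columns of B.
        row = [sum(A[i][2*k] * A[i][2*k+1] for k in range(half))
               for i in range(n)]                       # n^2/2 multiplications
        col = [sum(B[2*k][j] * B[2*k+1][j] for k in range(half))
               for j in range(n)]                       # n^2/2 multiplications
        # Each entry costs n/2 multiplications; the cross terms cancel
        # against row[i] and col[j] only because scalars commute.
        return [[sum((A[i][2*k] + B[2*k+1][j]) * (A[i][2*k+1] + B[2*k][j])
                     for k in range(half)) - row[i] - col[j]
                 for j in range(n)] for i in range(n)]  # n^3/2 multiplications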
I'm having trouble seeing the value of such a paper -- the size of matrices required for this result to have a clear advantage is significant, and that is completely ignoring constant factors, real world performance and parallelization considerations.
And CS is not particularly unique in hiring people who are popular. A lot of being a good professor is gathering money, which enables you to do good work and brings prestige to the university, and the sad fact is that this is easier the more famous you are. Also, part of the job is convincing people that you have actually done significant work, and the people who are good at that are usually well-known.
While it's true that things like blog posts and tweets help people come to that realization, IMO most of the work is put in around these things; a good blog post will not replace the other factors in most cases.
I'm sort of fuzzily half-assedly thinking this might be applicable to image - and by extension video - work but maybe that's not the case at all. I want to be more excited, please enlighten me. :)
So, thank you. By all means keep researching and advancing.
The Strassen algorithm (http://en.wikipedia.org/wiki/Strassen_algorithm) is used in practice though, and is faster than the naive multiplication approach for moderately large matrices, so this kind of research sometimes produces practical speedups. Not in this case without significant further breakthroughs to really bring down the size of matrices that it wins on, though.
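For the curious, a minimal recursive sketch of Strassen's seven-product scheme (my own, assuming numpy and power-of-two dimensions; real implementations pad odd sizes and tune the cutoff below which they fall back to the naive method):

    import numpy as np

    def strassen(A, B, cutoff=64):
        n = A.shape[0]
        if n <= cutoff:
            return A @ B  # fall back below the cutoff, as practical codes do
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # Seven block products instead of the naive eight.
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        C = np.empty_like(A)
        C[:h, :h] = M1 + M4 - M5 + M7
        C[:h, h:] = M3 + M5
        C[h:, :h] = M2 + M4
        C[h:, h:] = M1 - M2 + M3 + M6
        return C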
But I always wondered what FEA would look like if we removed the constraint of model size and computing power.
Not that that's not a valuable contribution, but the linked article seems kind of misleading, unless I'm misunderstanding the paper...
This paper analyses the 8th tensor power of the original algorithm A in what is really a tour-de-force, and shows that it leads to a better bound. So technically the algorithm (the eighth tensor power of the original algorithm that CW used) was "known". The innovation here is showing that this is actually better for constructing a matrix multiplication algorithm than the original or second tensor power algorithms.
This paper is also of interest because it allows analysis of tensor powers of other algorithms. It's probably just the beginning of a slew of new records.
There is no question this is a landmark paper. There has been an enormous amount of work for a very long time on this subject.
For those with infinite patience, there is a slightly simplified version of CW presented here:
I don't doubt that at all (though I personally know nothing about the field), I was only criticizing the linked article.
However, this is in so many ways not my field, so I could easily have completely misunderstood.
You can compute A^k a lot faster if you calculate A^2, then A^4, then A^8, and so on up to A^N, where N = 2^floor(log2(k)) is the largest power of two not exceeding k, then multiply A^N by A^(k-N) (which you can speed up using the same method if the remaining exponent is larger than 4).
That was a bit rough and quick, but the basic idea is that computing (nxn)^k is reduced to O(log(k) n^2.373) versus the more naive O(k n^2.373), a speedup factor of O(k/log(k)), which is decent. I am sure there is a better solution out there.
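Here is that idea as a short Python sketch (assuming numpy; each @ below is one matrix multiplication, and the loop runs O(log k) times):

    import numpy as np

    def mat_pow(A, k):
        # A^k by repeated squaring: O(log k) multiplications instead of k - 1.
        result = np.eye(A.shape[0], dtype=A.dtype)
        base = A.copy()
        while k > 0:
            if k & 1:             # this bit of k is set: fold in A^(2^i)
                result = result @ base
            base = base @ base    # advance to A^(2^(i+1))
            k >>= 1
        return result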
 - http://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem
However, algorithms that actually improve the complexity of powering when this technique isn't applicable do exist. Here is a paper on a recent one:
Also, Aaronson's a well-known computer scientist, so this helps establish that the result's both credible and important in the eyes of people who may not (forgive me) all be qualified to judge the paper on its merits.
A lot of people might know him from http://www.scottaaronson.com/writings/bignumbers.html, which is the article about that topic.
edit: Okay, rereading this article I recovered an insight that leads to a truer reason Aaronson's worth linking to. It is not so much that he's a famous professor (he is) as that he is, in general, a great expositor. Read that big-number article and tell me he's not.