To be clear, I can see why coding it purely recursively would be a bad idea. Same for using linked structures, if that was the concern with the quad tree.
However, it seems that you could still use the general recursive definition to find some optimizations. For example, define a recursive structure that breaks the problem down until it is small enough for a more traditional method to take over. (So, don't drop down to the size of a pixel, but do drop down until you have things at around cache-line size?)
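To make that concrete, here is a minimal sketch in C of the "recurse until it fits" idea: split C += A*B on its largest dimension until the blocks fall below a cutoff, then fall back to a plain triple loop. The cutoff value and the naive base-case kernel are placeholders I'm assuming for illustration; a real implementation would tune the cutoff to the actual cache sizes and use a vectorized kernel at the base case.

    #include <stddef.h>

    /* Made-up tuning knob, not a measured value: recurse until every
     * dimension of the subproblem is at most this size. */
    #define CUTOFF 64

    /* Row-major storage with explicit leading dimensions, so submatrices
     * are just pointer offsets into the original arrays.  Computes C += A*B
     * where A is m x k, B is k x n, C is m x n. */
    void matmul_rec(size_t m, size_t n, size_t k,
                    const double *A, size_t lda,
                    const double *B, size_t ldb,
                    double *C, size_t ldc)
    {
        if (m <= CUTOFF && n <= CUTOFF && k <= CUTOFF) {
            /* Base case: naive kernel.  A tuned implementation would use a
             * blocked, vectorized kernel here instead. */
            for (size_t i = 0; i < m; i++)
                for (size_t p = 0; p < k; p++)
                    for (size_t j = 0; j < n; j++)
                        C[i * ldc + j] += A[i * lda + p] * B[p * ldb + j];
            return;
        }

        /* Split the largest dimension in half; both halves accumulate into C. */
        if (m >= n && m >= k) {
            size_t m2 = m / 2;
            matmul_rec(m2,     n, k, A,            lda, B, ldb, C,            ldc);
            matmul_rec(m - m2, n, k, A + m2 * lda, lda, B, ldb, C + m2 * ldc, ldc);
        } else if (n >= k) {
            size_t n2 = n / 2;
            matmul_rec(m, n2,     k, A, lda, B,      ldb, C,      ldc);
            matmul_rec(m, n - n2, k, A, lda, B + n2, ldb, C + n2, ldc);
        } else {
            size_t k2 = k / 2;
            matmul_rec(m, n, k2,     A,      lda, B,            ldb, C, ldc);
            matmul_rec(m, n, k - k2, A + k2, lda, B + k2 * ldb, ldb, C, ldc);
        }
    }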
Perusing the Wikipedia article on Strassen, I see that it scoffs at a ~10% benefit for large matrices. Curious how that holds up in light of "deep learning" environments, especially since 1k x 1k matrices are probably not that uncommon nowadays. Seems like every little bit would help.
Edit to add: Thinking of the context of the original post. Yeah... probably not needed for anything in a "snappy" user application. I was thinking more of very large data sets.
The mention of "around 10% or less" is based on a 2005 paper, and even just in the past ten years there has been a lot of change in computer architecture (the machines used in the paper were mostly UltraSPARCs, and the authors write, "Note that for the (pentium 3 system), we could find no problem size for which Strassen’s is faster than ATLAS’s").
That is an interesting note. I remember reading that the number of multiplications could be brought down by swapping a lot of them for additions. Knuth indicated that this did not matter in practice, since few people are multiplying really large matrices.
My interest here is really just whether that is still a good assumption. With all the multiplications that happen in "deep" learning, I'm curious if there are now reasons to look at these algorithms.
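For what it's worth, the trade looks like this when written out for a single 2x2 product: eight multiplications become seven, at the price of extra additions. This is only an illustrative sketch with scalar doubles standing in for the matrix blocks; applying the same identities recursively to blocks is what gives Strassen its asymptotic edge.

    #include <stdio.h>

    /* One level of Strassen's identities: 7 multiplications instead of 8,
     * paid for with 18 additions/subtractions.  The same formulas apply when
     * a11..b22 are matrix blocks rather than scalars. */
    void strassen_2x2(double a11, double a12, double a21, double a22,
                      double b11, double b12, double b21, double b22,
                      double c[2][2])
    {
        double m1 = (a11 + a22) * (b11 + b22);
        double m2 = (a21 + a22) * b11;
        double m3 = a11 * (b12 - b22);
        double m4 = a22 * (b21 - b11);
        double m5 = (a11 + a12) * b22;
        double m6 = (a21 - a11) * (b11 + b12);
        double m7 = (a12 - a22) * (b21 + b22);

        c[0][0] = m1 + m4 - m5 + m7;
        c[0][1] = m3 + m5;
        c[1][0] = m2 + m4;
        c[1][1] = m1 - m2 + m3 + m6;
    }

    int main(void)
    {
        double c[2][2];
        /* Sanity check: [1 2; 3 4] * [5 6; 7 8] = [19 22; 43 50] */
        strassen_2x2(1, 2, 3, 4, 5, 6, 7, 8, c);
        printf("%g %g\n%g %g\n", c[0][0], c[0][1], c[1][0], c[1][1]);
        return 0;
    }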
We might be talking at cross-purposes; my angle is that of dense linear algebra, single or double precision, on NUMA architectures with vector operations, and, with all respect to Knuth, that requires a different model of the machine.
If that changes, for example if you're computing over a different ring where arithmetic ops are expensive compared to cache misses and there are no stability issues, then yes, the question is open again and Strassen is back on the table.
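To put a rough number on that regime: counting only the recursive multiplications for n a power of two (and ignoring the extra additions, on the assumption that a single ring multiply dwarfs everything else), the classical block recursion does 8 sub-products per level while Strassen does 7. A throwaway program to print the counts:

    #include <stdio.h>

    /* Multiplication counts for n = 2^k under full recursion: classical
     * block multiplication does 8 sub-products per level, Strassen does 7.
     * Additions are deliberately ignored here. */
    int main(void)
    {
        unsigned long long classical = 1, strassen = 1;
        for (int k = 1; k <= 10; k++) {   /* n = 2, 4, ..., 1024 */
            classical *= 8;
            strassen  *= 7;
            printf("n = %4d: classical %12llu, Strassen %12llu (%.2fx fewer multiplies)\n",
                   1 << k, classical, strassen,
                   (double)classical / (double)strassen);
        }
        return 0;
    }

For n = 1024 that works out to about 1.07 billion multiplications versus roughly 282 million, close to a 3.8x reduction, which is why the trade can make sense when each multiplication is genuinely expensive.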
Knuth seems to be in agreement with you, from my understanding. My comments come only from general interest and curiosity, not from expertise. (If that makes sense.)
That is, I certainly was not trying to second-guess what you are saying, just noting my curiosity about whether this has changed, or will. The Knuth view I'm quoting is from Vol. 2, which is pretty old at this point. I'd be a little surprised if it were outdated, but not shocked.
"Very large" here means matrices of >50k x 50k (maybe even bigger). Strassen is then just a tiny bit better, when the implementation is both cache-aware and allocation-aware. (Tested in Fortran.)
It is always interesting to see just how valuable a "tiny bit" can be, though.
Granted, I confess most of this is almost always navel gazing. Improvements to the "best" sorting/map algorithms have probably not led to any noticeable gains in modern codebases. Usually, codebases benefit more from picking the low-hanging fruit than from attacking the most computationally complex parts. (That is, I'm sure that in these large matrix problems there is other low-hanging fruit that will yield better improvements.)