> [re:matrix multiplication] However, an O(n^3) algorithm will always lose out to an O(n^2*log(n)) algorithm when the input gets big enough.
You have to be very careful about what 'big enough' means. In practice, Strassen multiplication is not faster than the naive algorithm until you get to the point where you're multiplying matrices with hundreds of rows/columns. Additionally, naive matrix multiplication is well suited to GPUs, while Strassen multiplication on the GPU requires temporary buffers and multiple jobs and sequencing and whatnot.
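For reference, the "naive algorithm" being discussed is just the textbook triple loop; here's a minimal sketch on plain Python lists (function name is mine, not from any particular library):

```python
def matmul_naive(a, b):
    """Textbook O(n^3) matrix multiply: c[i][j] = sum_k a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):          # i-k-j loop order walks b row-wise, which is cache-friendlier
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c

# [[1,2],[3,4]] @ [[5,6],[7,8]] -> [[19, 22], [43, 50]]
```

This regular, branch-free access pattern is exactly why the naive algorithm maps so well onto GPUs and vectorized BLAS kernels.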
As a general rule, matrix multiplication with complexity better than the naive algorithm should probably not be used. Do naive matrix multiplication on the CPU. If you need it to be faster, do naive matrix multiplication on the GPU. If you need it to be faster, the numerical stability of your problem has probably already come a gutser and will get worse if you switch to Strassen or any of the other asymptotically faster algorithms.
And the algorithms faster than Strassen? Forget about it. After Strassen multiplication was invented, a dozen or so other algorithms came along, slowly reducing that O(n^2.8) to about O(n^2.37188) (most recently in 2022; this is still an area of active research). The problem is that for any of these algorithms to beat Strassen, you need matrices larger than anything you can keep in memory. No input big enough will fit in the RAM of a modern computer. One estimate I've heard is that even if you converted every atom in the observable universe into one bit of RAM, and used that RAM to multiply two 10^38 by 10^38 matrices to get a third 10^38 by 10^38 matrix, you'd still be better off using O(n^2.8) Strassen multiplication instead of the state-of-the-art O(n^2.37188) algorithm. The constant factors in those other algorithms really are that bad.