A basic implementation is easy enough, but getting both good numerical stability and good performance is hard (and in JavaScript it's pretty much impossible with current implementations). See the matrix multiplication benchmarks in the post: JS is 60x slower than Matlab, even though it's already using typed arrays. A naive triply nested for loop in C would probably perform about the same as the JS, which is to say a lot slower than code tuned to the characteristics of the processor: mostly the number of registers, vectorized floating point, and cache sizes. (I'm not sure whether Matlab is using multiple cores here.)
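To make the contrast concrete, here is a rough sketch of the naive triple loop next to a cache-blocked (tiled) variant. It's in Python purely for readability, so it shows the loop structure and not the speedup; the names `matmul_naive` and `matmul_blocked`, the flat row-major layout, and the `block` size are my own assumptions for illustration, not anything taken from the benchmark or from a real BLAS.

```python
def matmul_naive(a, b, n):
    """Naive triple loop: C = A * B for n x n matrices stored as flat row-major lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                # b is walked down a column here, i.e. with stride n,
                # which is what hurts cache behaviour in a compiled language.
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c


def matmul_blocked(a, b, n, block=2):
    """Same product computed tile by tile (hypothetical block size, not tuned)."""
    c = [0.0] * (n * n)
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Work on one block x block tile so the touched parts of
                # a, b and c are reused while they are still close together.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + block, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c
```

Note that the two versions add the partial products in a different order, so on general floating point input they can differ in the last bits. An optimized library layers register blocking and vector instructions on top of this kind of restructuring, and that is where most of the tuning effort goes.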
There are a lot of them, they're all very picky, detailed inner loops, and they're already written, highly tested, and optimized. People rewrite it all the time, only to find that their versions are incomplete, slow, and buggy, and nobody who wants to use LAPACK has patience for any of those three things.
Is it because it would take a long time or because it's inherently hard?