Hacker News new | past | comments | ask | show | jobs | submit login

The GEMM is O(n^3) actually. Transformers are quadratic in the size of their context window.





I read that as a typo given the next sentence.

I’m on the fence about cubic time. I was mostly thinking of exponential and factorial problems. I think some very clever people can make cubic work despite my warnings. But most of us shouldn’t. General advice is to be ignored by masters when appropriate. That’s also the story arc of about half of kung fu movies.

Did chess solvers really progress much before there was a cubic approximation?


> I read that as a typo given the next sentence.

Thank you for the courtesy.

> I think some very clever people can make cubic work despite my warnings.

I think you're selling yourself short. You don't need to be that clever to make these algorithms work, you have all the tools necessary. Asymptotic analysis is helpful not just because it tells us a growth, but also because it limits that growth to being in _n_. If you're doing matmul and n is proportional to the size of the input matrix, then you know that if your matrix is constant then the matmul will always take the same time. It does not matter to you what the asymptotic complexity is, because you have a fixed n. In your program, it's O(1). As long as the runtime is sufficient, you know it will never change for the lifetime of the program.

There's absolutely no reason to be scared of that kind of work, it's not hard.


Right but back up at the top of the chain the assertion was that if n grows as your company does then IME you’re default dead. Because when the VC money runs out you can’t charge your customers enough to keep the lights on and also keep the customers.

I mean, the general advice is that if you actually understand what you're doing, then you don't need general advice.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: