Have you heard of Modular, Mojo, and MAX?

viraptor · 2024-06-16T04:35:28.000000Z

They're designed for fast math and Python similarity in general. Llama.cpp, on the other hand, is designed for LLMs as we use them right now. But Mojo is general-purpose enough to support many other "fast Python" use cases, and if we completely change the architecture of LLMs, it's still going to be great for them.

It's more of a generic system focused on performance for specific applications, rather than a system designed to cater to current LLMs.

dcow · 2024-06-16T07:00:03.000000Z

No. MAX is an entire compute platform designed around deploying LLMs at scale. And Mojo takes Python syntax (it's a superset) but reimplements the entire compiler so you (or the compiler on your behalf) can target all the new AI compute hardware that has almost literally popped up overnight. Modular is the company that raised $130MM in under two years to make these two plays happen. And Nvidia is on fire right now. I can assure you without a sliver of doubt that humans are most certainly redesigning entire computing hardware, and the systems atop it, to accommodate AI. Look at this year's WWDC keynote if you need more evidence.

viraptor · 2024-06-16T08:33:35.000000Z

Sure, it's made to accommodate AI, or more generally fast vector/matrix math. But the original claim was that "people surely aren’t going to design systems that cater to the current versions of LLMs." Those solutions are way more generic than current or future versions of LLMs. Once the LLM hype dies down a bit, the same setups will be used for large-scale ML and research unrelated to language.