
It’s basically that their minimum cluster size for serving a reasonable model is around eight racks of compute.

SemiAnalysis did some cost estimates, and so did I: you’re likely paying somewhere in the $12 million range for the equipment just to serve a single query against llama-70b. Compare that to a couple of GPUs, and it’s easy to see why they’re struggling to sell hardware: they can’t scale down.

Since they didn’t use HBM, you need to stitch together enough cards to get the memory to hold your model. It takes a lot of 256 MB cards to get to 64 GB, and there isn’t a good way to try the tech out, since a single rack really can’t serve an LLM.
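To make the scale-down problem concrete, here’s a minimal back-of-envelope sketch in Python. It uses the 256 MB / 64 GB figures above; the cards-per-rack and per-card price are my own illustrative assumptions, not Groq’s or SemiAnalysis’s numbers.

    import math

    MODEL_BYTES = 64 * 1024**3     # ~64 GB of weights (llama-70b-ish)
    SRAM_PER_CARD = 256 * 1024**2  # ~256 MB on-card SRAM, per the figures above
    CARDS_PER_RACK = 64            # assumed rack density -- illustrative only
    PRICE_PER_CARD = 20_000        # assumed USD per card -- illustrative only

    cards = math.ceil(MODEL_BYTES / SRAM_PER_CARD)  # cards for weights alone
    racks = math.ceil(cards / CARDS_PER_RACK)
    print(f"{cards} cards minimum, ~{racks} racks, ~${cards * PRICE_PER_CARD:,}")
    # -> 256 cards minimum, ~4 racks, ~$5,120,000

And that’s only a floor: KV cache, activation memory, and interconnect topology push real deployments well past it, which is how you get to eight-ish racks and the ~$12M figure.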

The cloud provider path sounds riskier, since that means running two capital-intensive businesses at once: chip design and production, and operating a cloud service provider.




This is some fantastic insight. That would also inflate the opex (power/space/staffing) needs, and people need to consider all the networking gear to hook this stuff together. 400G NICs/switches/cables aren’t cheap and are, in some cases, very hard to obtain in any sort of quantity.

It does seem like an odd move in that case. I liken this to a company like Bitmain: why sell the miners when you could just run them yourself? Well, the fact is they do both. But in this case, Groq is turning off the sales. Who knows, maybe it just ends up being a temporary thing until they can sort all of the pieces out.



