I'll agree with you, and add that inference speed is a big factor too.

SDXL-Lightning/Cascade can generate images in ~200ms, which is fast enough to fit inside a web request, and that same speed also makes each image cheaper to generate (less GPU time per image).
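
For reference, the few-step setup that gets you into that range looks roughly like this with Hugging Face diffusers (a sketch following the ByteDance/SDXL-Lightning model card; actual latency depends on your GPU and resolution):

    import torch
    from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    base = "stabilityai/stable-diffusion-xl-base-1.0"
    repo = "ByteDance/SDXL-Lightning"
    ckpt = "sdxl_lightning_4step_unet.safetensors"  # 4-step distilled UNet

    # Load the distilled UNet into an otherwise stock SDXL pipeline.
    unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
    unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
    pipe = StableDiffusionXLPipeline.from_pretrained(
        base, unet=unet, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    # Lightning checkpoints expect trailing timesteps and no classifier-free guidance.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )

    image = pipe("a lighthouse at dusk", num_inference_steps=4, guidance_scale=0).images[0]
    image.save("out.png")

The speed comes almost entirely from the 4-step distilled UNet; everything else is the standard SDXL pipeline.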

And using Groq at 500 tokens/s is wild compared to any of the other platforms.
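
If you want to check the throughput yourself, Groq exposes an OpenAI-compatible endpoint, so a rough tokens-per-second number is just a timed chat completion. A sketch (the model name is an assumption, check Groq's docs for what's currently served; this also measures end-to-end time including time-to-first-token, so it slightly understates pure generation speed):

    import os, time
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="llama3-8b-8192",  # assumption: any model Groq currently serves
        messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    )
    elapsed = time.perf_counter() - start

    out_tokens = resp.usage.completion_tokens
    print(f"{out_tokens} tokens in {elapsed:.2f}s -> {out_tokens / elapsed:.0f} tok/s")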

500 t/s is uncomfortably fast to me. Generating high-quality answers at speeds faster than I can read is the point at which I feel like LLMs are magic.

I’m glad people are doing it though, and I’ll happily adapt to accessing inference at that speed.


That's important for new applications to emerge that run over lots of data. You can't run LLMs at scale on tasks like Google's (processing every webpage) when the cost of processing each document is so high. Interactive chatbots are just the tip of the iceberg.
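
A back-of-envelope sketch of why (every number below is a made-up placeholder, not a real quote):

    # Rough cost of a single LLM pass over a web-scale corpus.
    num_documents = 1_000_000_000        # assumed: pages to process
    tokens_per_doc = 1_500               # assumed: average prompt tokens per page
    price_per_million_tokens = 0.50      # assumed: USD, input-token pricing

    total_tokens = num_documents * tokens_per_doc
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    print(f"{total_tokens / 1e12:.1f}T tokens -> ${cost:,.0f}")
    # 1.5T tokens -> $750,000 for one pass, before output tokens or retries.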
