
500 t/s is uncomfortably fast to me. Generating high-quality answers faster than I can read is the point at which LLMs start to feel like magic.

I’m glad people are doing it though, and I’ll happily adapt to accessing inference at that speed.
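For a rough sense of scale (my numbers here are assumptions, not measurements: ~250 wpm adult reading speed, ~1.3 tokens per English word), 500 t/s is close to two orders of magnitude past reading speed:

    # Back-of-envelope: generation speed vs. human reading speed.
    # Reading rate and tokens-per-word ratio are assumed, not measured.
    GEN_TPS = 500          # generation speed, tokens/sec
    READ_WPM = 250         # typical adult reading speed (assumed)
    TOKENS_PER_WORD = 1.3  # rough English tokenization ratio (assumed)

    read_tps = READ_WPM / 60 * TOKENS_PER_WORD
    print(f"reading ~{read_tps:.1f} t/s; generation ~{GEN_TPS / read_tps:.0f}x faster")
    # -> reading ~5.4 t/s; generation ~92x faster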

That's important for new applications that run inference over lots of data. You can't run LLMs at scale on tasks like Google's (processing every webpage) when each document is so expensive to process. Interactive chatbots are just the tip of the iceberg.
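A quick sketch of why per-document cost dominates at corpus scale (corpus size, tokens per document, and price are all assumed for illustration, not taken from any real workload):

    # Back-of-envelope: running an LLM over a web-scale corpus.
    # All constants below are illustrative assumptions.
    DOCS = 1_000_000_000    # documents in the corpus (assumed)
    TOKENS_PER_DOC = 1_000  # prompt + output tokens per document (assumed)
    TPS_PER_STREAM = 500    # tokens/sec for one inference stream
    PRICE_PER_MTOK = 0.50   # dollars per million tokens (assumed)

    total_tokens = DOCS * TOKENS_PER_DOC
    stream_years = total_tokens / TPS_PER_STREAM / (3600 * 24 * 365)
    cost = total_tokens / 1e6 * PRICE_PER_MTOK
    print(f"{total_tokens:.2e} tokens, ~{stream_years:.0f} single-stream years, ~${cost:,.0f}")
    # -> 1.00e+12 tokens, ~63 single-stream years, ~$500,000

Even at 500 t/s, a single stream would take decades; the economics only work with massive parallelism and cheap per-token pricing.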
