I'll agree with you, and add that inference speed is a big factor too.

SDXL-Lightning/Cascade can generate images in ~200ms, which is fast enough to fit inside a web request, and that same speed also makes each image cheaper to generate (less GPU time per image).
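
For reference, the few-step setup that gets you into that range looks roughly like this with Hugging Face diffusers (a sketch following the ByteDance/SDXL-Lightning model card; actual latency depends on your GPU and resolution):

    import torch
    from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    base = "stabilityai/stable-diffusion-xl-base-1.0"
    repo = "ByteDance/SDXL-Lightning"
    ckpt = "sdxl_lightning_4step_unet.safetensors"  # 4-step distilled UNet

    # Load the distilled UNet into an otherwise stock SDXL pipeline.
    unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
    unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
    pipe = StableDiffusionXLPipeline.from_pretrained(
        base, unet=unet, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    # Lightning checkpoints expect trailing timesteps and no classifier-free guidance.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )

    image = pipe("a lighthouse at dusk", num_inference_steps=4, guidance_scale=0).images[0]
    image.save("out.png")

The speed comes almost entirely from the 4-step distilled UNet; everything else is the standard SDXL pipeline.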

And using Groq at 500 tokens/s is wild compared to any of the other platforms.
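
If you want to check the throughput yourself, Groq exposes an OpenAI-compatible endpoint, so a rough tokens-per-second number is just a timed chat completion. A sketch (the model name is an assumption, check Groq's docs for what's currently served; this also measures end-to-end time including time-to-first-token, so it slightly understates pure generation speed):

    import os, time
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="llama3-8b-8192",  # assumption: any model Groq currently serves
        messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    )
    elapsed = time.perf_counter() - start

    out_tokens = resp.usage.completion_tokens
    print(f"{out_tokens} tokens in {elapsed:.2f}s -> {out_tokens / elapsed:.0f} tok/s")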

500 t/s is uncomfortably fast to me. Generating high-quality answers at speeds faster than I can read is the point at which I feel like LLMs are magic.

I’m glad people are doing it though, and I’ll happily adapt to accessing inference at that speed.


That's important for new applications to emerge that run over lots of data. You can't run LLMs at scale on tasks like Google's (processing every webpage) when the cost of processing each document is so high. Interactive chatbots are just the tip of the iceberg.
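
A back-of-envelope sketch of why (every number below is a made-up placeholder, not a real quote):

    # Rough cost of a single LLM pass over a web-scale corpus.
    num_documents = 1_000_000_000        # assumed: pages to process
    tokens_per_doc = 1_500               # assumed: average prompt tokens per page
    price_per_million_tokens = 0.50      # assumed: USD, input-token pricing

    total_tokens = num_documents * tokens_per_doc
    cost = total_tokens / 1_000_000 * price_per_million_tokens
    print(f"{total_tokens / 1e12:.1f}T tokens -> ${cost:,.0f}")
    # 1.5T tokens -> $750,000 for one pass, before output tokens or retries.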
