I don't know, I've been working with LLMs a lot recently, and for the first time in a while I'm wishing I had access to much more compute than I do. Imagine having the power of an H100 locally without having to pay thousands of dollars a month.
For inference, at least locally, the bottleneck is usually memory bandwidth (and capacity, of course).
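Rough back-of-envelope to illustrate why: during decoding, each generated token has to stream essentially all of the active weights through memory, so tokens/sec is roughly bandwidth divided by model size in bytes. A minimal sketch with illustrative numbers (not benchmarks; real throughput also depends on KV cache, batching, kernels, etc.):

```python
# Back-of-envelope decode throughput: tokens/sec ~= memory bandwidth / bytes read per token.
# Illustrative assumption: weights are streamed once per token; KV cache and overhead ignored.

def est_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# 70B model at 4-bit quantization (~0.5 bytes/param):
print(est_tokens_per_sec(70, 0.5, 3350))  # H100 SXM, ~3.35 TB/s -> ~96 tok/s
print(est_tokens_per_sec(70, 0.5, 1000))  # ~1 TB/s consumer GPU -> ~29 tok/s
```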
I hope the AI hype leads us to more memory and more memory bandwidth, because they've really been lagging behind compute power increases for like 15 years already.
Oh, 100%. But you can do some pretty amazing things with fine-tuning LLMs too, and that is very compute-intensive. Not to mention it's ridiculously hard even getting access to a cloud GPU instance nowadays.