The primary push that OpenAI and most other AI-hype companies are now talking about, to improve performance, is to do the same thing but bigger. No doubt there will be architectural changes (many are in fact needed), but those are hard to predict, and "make it bigger" is easier to plan for.
So, going from GPUs to some more focused chip architecture could reduce the power usage somewhat, but it is more likely that total power usage will go up, not down, as these companies try to address every problem with bigger scale. I'm not saying it will work, but it is the most likely thing to be tried, and it will take several years before they give up on that route.
Valid point. Scaling up seems to be the default approach for many AI companies to enhance performance, albeit with potential environmental concerns. While transitioning to more specialized chip architectures might mitigate power usage to some extent, the overarching trend seems to be towards increased power consumption in pursuit of larger scale. It's a trajectory worth monitoring closely.
Hard to fully predict the future, of course, but chances are the answer is "no".
Assuming you are comfortable with the current model performance rather than always trying to have the most performant model...
1. Lots of researchers are looking at ways to reduce model size, inference time, etc. We see smaller models beating older benchmark results. Look into model distillation to see how this is done for specific benchmarks, or newer approaches like Mixture of Experts (MoE) that reduce the compute burden but get results similar to older models.
2. If your concern is the cost of running a model, then GPU tech is also getting faster/better, so even if compute requirements stay the same, the user will get an answer faster and it will be cheaper to run thanks to the new hardware.
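To make the MoE point in item 1 a bit more concrete, here is a minimal sketch of top-k expert routing. Everything in it is an assumption for illustration: the toy "experts" just scale their input, the gate scores are made up, and the names top_k_route and moe_forward are hypothetical, not from any library. The only point is that just k of the experts run for a given token.

```python
import math

def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = [math.exp(gate_scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    chosen, weights = top_k_route(gate_scores, k)
    out = sum(w * experts[i](x) for i, w in zip(chosen, weights))
    return out, chosen

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)]
# Made-up router outputs for a single token.
gate_scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

y, used = moe_forward(1.0, experts, gate_scores, k=2)
active_fraction = len(used) / len(experts)
print(used, active_fraction)
```

With 8 experts and k=2, only a quarter of the expert compute is spent per token, even though all 8 experts' parameters still have to be stored somewhere.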
According to Karpathy's LLM talk, the limitation right now is compute, and there are no signs otherwise. So in a sense, it is Moore's law all over again for GPUs + these models.
Sounds terrific. We can see that the model companies currently treat improving model performance as the priority, so more computing power is needed for now. I do agree with your two points, but I am quite concerned about whether the current GPUs are fully utilized, or whether they truly need more GPUs to improve performance...
For AI research groups: funding (venture capital, mostly) -> hardware that can be bought -> amount of compute that can be done. Efficiency improvements likely mean: 'heavier' models deployed.
For end-user applications: as determined by that application's constraints, like TDP or battery life on a smartphone. Or (for subscription-based online services): there's a price tag attached, and someone picks up the bill.
So, long-term it'll probably depend on how many useful applications are developed. If they change society completely, then sure, expect a continuing stream of $$ to be thrown at the required compute. If it's mostly hype, that $$ will quickly dry up.
The main issue with current LLMs is that they're autoregressive, i.e. you feed in the whole previous text to generate the next token, which seems extremely wasteful.
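A rough sketch of why this looks wasteful, counting how many token positions get read if the whole context really is re-processed for every new token. This is the naive picture only; real implementations commonly cache intermediate state between steps, which this sketch deliberately ignores. The function name and the numbers are made up for illustration.

```python
def naive_positions_processed(prompt_len, new_tokens):
    """Count positions read when the full context is re-fed for each new token."""
    total = 0
    context = prompt_len
    for _ in range(new_tokens):
        total += context  # the whole context is read to produce one token
        context += 1      # the generated token is appended to the context
    return total

# Hypothetical sizes: a 100-token prompt, 50 generated tokens.
total = naive_positions_processed(100, 50)
print(total)  # 6225, versus only 150 distinct positions in the final text
```

The count grows as n*p + n*(n-1)/2 for a prompt of length p and n generated tokens, so it is quadratic in the amount of generated text.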
There's an email that gets posted on HN periodically that says something like "To the chagrin of AI researchers, the best way to improve AI is to throw more computing power at it".
So yes, we still need new ideas like transformers, but the real enabler of more and more powerful AI is more and more powerful computers.
Training a deep network using gradient descent is what takes all the compute. Actually using the trained network takes far, far less compute, and can be done on home computers with a good video card.
I suspect training algorithms will get better over time; however, ever more ambitious models will likely keep things compute-constrained for the next decade or so.
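For a feel of the training-versus-inference gap described above, here is a back-of-the-envelope sketch. It assumes the widely quoted rule of thumb for dense transformers (roughly 6 FLOPs per parameter per training token, roughly 2 FLOPs per parameter per generated token), which is not from this thread, and the model size and token count are hypothetical, not figures for any real model.

```python
def training_flops(n_params, n_train_tokens):
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_train_tokens

def inference_flops_per_token(n_params):
    """Rule-of-thumb estimate: ~2 FLOPs per parameter per generated token."""
    return 2 * n_params

# Hypothetical model: 7e9 parameters trained on 1e12 tokens.
n_params = 7e9
n_train_tokens = 1e12

train = training_flops(n_params, n_train_tokens)
per_token = inference_flops_per_token(n_params)

# How many generated tokens cost as much as the whole training run.
breakeven_tokens = train / per_token
print(f"training: {train:.2e} FLOPs, inference: {per_token:.2e} FLOPs/token")
print(f"training ~= generating {breakeven_tokens:.2e} tokens")
```

Under these assumptions a single answer is tiny next to the training run, but the break-even is only about 3x the training token count, so total inference across many users is not automatically negligible.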