Whenever I see discussion of stuff like ChatGPT, it seems like there’s a common assumption that it will get better every year, and that in 10-20 years it’ll be capable of some crazy stuff.
I might be ignorant of the field, but why do we assume this?
How do we know it won’t just plateau in performance at some point?
Or that, say, the compute requirements won’t become impractically high?
No one has hit a model/dataset size where the curves break down, and they're fairly smooth. Simple models that predict performance (the scaling laws) usually extrapolate well near existing scales, so I expect trillion- or ten-trillion-parameter models to be on the same curve.
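Roughly, the fit behind those curves looks like the sketch below. The data points are made up purely to show the shape of the extrapolation (real scaling-law fits use many more training runs and also model data and compute, not just parameter count):

```python
# Minimal sketch of a scaling-law fit with made-up (params, loss) points.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical observations: validation loss at 10M .. 100B parameters.
params = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
loss = np.array([4.2, 3.6, 3.1, 2.7, 2.4])

def power_law(n, a, alpha, floor):
    # L(N) = a * N^(-alpha) + floor : a smooth curve with an irreducible floor
    return a * n ** (-alpha) + floor

(a, alpha, floor), _ = curve_fit(power_law, params, loss, p0=[10.0, 0.1, 1.0])

# Extrapolate to 1T and 10T parameters -- the "same curve" assumption.
for n in (1e12, 1e13):
    print(f"{n:.0e} params -> predicted loss {power_law(n, a, alpha, floor):.2f}")
```

The whole bet is that nothing breaks that power-law shape between where we've measured and where we haven't.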
What we haven't seen yet (that I'm aware of) is whether the specializations to existing models (LoRA, RLHF, different attention methods, etc.) follow similar scaling laws, since most of the effort so far has gone into matching performance with smaller/sparser models rather than spending large amounts of money on huge experiments. It will be interesting to see what DeepMind's Gemini reveals.
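For context, LoRA is a good example of why those specializations are cheap to study at small scale: it freezes the pretrained weights and trains only a small low-rank update on top, so the trainable parameter count is tiny compared to the base model. A rough sketch in PyTorch, with hypothetical layer sizes:

```python
# Rough sketch of the LoRA idea: frozen base weight W plus a trainable
# low-rank update (alpha/r) * B @ A. Sizes here are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False               # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # ~65k vs ~16.8M in the frozen base
```

Whether updates that small keep paying off as the base model grows by another couple orders of magnitude is exactly the open question.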