TLDR: bigger models don't necessarily mean better models, and future training data may already be polluted with AI-generated content.
This seems like a big leap from current “known problems” to “doomed”.
Smaller models seem more desirable anyway (faster, less resource usage, etc.), so a system that distills models down is more attractive than ever-increasing model sizes. Additionally, there's some evidence that random internet data isn't as high quality as professionally written text (e.g. books, journalism), so I wouldn't be surprised to see future models move away from internet scraping for everything but actual fact gathering. I think most people realize that relying entirely on "knowledge" baked into the model, instead of a hybrid approach where the model handles the NLU/NLP side but farms out facts and computations to dedicated systems/APIs, leads to worse hallucinations and results anyway.
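To make the hybrid idea concrete, here's a minimal sketch of that pattern: the model layer only does the NLU part (classify intent, extract arguments), while arithmetic and facts are delegated to dedicated handlers instead of being "recalled" from weights. Everything here (the `parse_intent` heuristic, the toy fact table) is a hypothetical stand-in, not any particular product's design.

```python
import ast
import operator

def parse_intent(query: str) -> tuple[str, str]:
    """Stand-in for the model's NLU step: classify and extract."""
    if any(op in query for op in "+-*/"):
        return "compute", query
    return "lookup", query

def compute(expr: str) -> str:
    """Delegate arithmetic to a real evaluator, not the model."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))

# Toy stand-in for a search engine or knowledge-base API call.
FACTS = {"capital of france": "Paris"}

def lookup(query: str) -> str:
    return FACTS.get(query.lower().strip(), "unknown; defer to search")

def answer(query: str) -> str:
    intent, payload = parse_intent(query)
    return compute(payload) if intent == "compute" else lookup(payload)
```

The point is the routing: `answer("2+2")` returns "4" from the evaluator and `answer("capital of France")` returns "Paris" from the lookup table, so neither answer depends on what happened to be in the training data.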
What I want to read is the doom theory about copyright, cost, or energy usage. Those are the open questions. There was a recent article claiming GitHub Copilot costs twice what GitHub charges for it; if true, that spells doom for the product's sustainability. I want to hear that Google thinks keeping Bard current on daily facts is too expensive compared to search engines. Those are the warning signs of "doom".