TLDR: bigger models don't necessarily mean better models, and future training data may already be polluted with AI-generated content.
This seems like a big leap from current “known problems” to “doomed”.
Smaller models seem more desirable anyway (faster, less resource usage, etc.), so a system that distills models down is more attractive than ever-increasing model sizes. Additionally, there's some evidence that random internet data isn't as high quality as professionally written text (e.g. books, journalism), so I wouldn't be surprised to see future models move away from internet scraping for everything but actual fact gathering. I think most people realize that relying entirely on "knowledge" baked into the model, instead of a hybrid approach where the model handles the NLU/NLP side but farms out facts and computations to dedicated systems/APIs, leads to worse hallucinations and results anyway.
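To make the hybrid idea concrete, here's a minimal sketch of that pattern: the model layer only does the NLU part (classify intent, extract arguments), while arithmetic and facts are delegated to dedicated handlers instead of being "recalled" from weights. Everything here (the `parse_intent` heuristic, the toy fact table) is a hypothetical stand-in, not any particular product's design.

```python
import ast
import operator

def parse_intent(query: str) -> tuple[str, str]:
    """Stand-in for the model's NLU step: classify and extract."""
    if any(op in query for op in "+-*/"):
        return "compute", query
    return "lookup", query

def compute(expr: str) -> str:
    """Delegate arithmetic to a real evaluator, not the model."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expr, mode="eval").body))

# Toy stand-in for a search engine or knowledge-base API call.
FACTS = {"capital of france": "Paris"}

def lookup(query: str) -> str:
    return FACTS.get(query.lower().strip(), "unknown; defer to search")

def answer(query: str) -> str:
    intent, payload = parse_intent(query)
    return compute(payload) if intent == "compute" else lookup(payload)
```

The point is the routing: `answer("2+2")` returns "4" from the evaluator and `answer("capital of France")` returns "Paris" from the lookup table, so neither answer depends on what happened to be in the training data.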
What I want to read is the doom theory about copyright, cost, or energy usage. Those are the open questions. There was a recent article claiming GitHub Copilot costs twice what GitHub charges for it; if true, that spells doom for the product's sustainability. I want to hear that Google thinks keeping Bard current on daily facts is too expensive compared to search engines. Those are the warning signs of "doom".