
So much old, out-of-date, factually incorrect, racist, sexist and even illegal information. I think it is already clear that training systems on everything is not the way forward, and is about as reliable as the set of '80s encyclopedias my mother refuses to throw out. The current tech needs to be trained on good data to produce good results, as it can't reason, gauge reliability, or even notice when its output is self-contradictory.



The geniuses and stewards of our civilization of just a couple of decades ago were trained on this very data. We don't yet know what outcome we'll get by handing the world over to people trained on "new, up to date, factually correct, egalitarian and legal" data.


We hope they used their reason to maintain their knowledge over the years, or at least updated their poor fashion choices. Or maybe not, given how much effort goes into enforcing moral opinions from biblical times.


I’d be willing to bet the proportion that is factual is roughly the same as in today’s data. We love to think we’re better, but most people aren’t expert scientists who know all of the latest literature. Also, what information is illegal?


I'd bet that today's data is worse. Data used to be expensive, because you needed to print it or store it on expensive media. Now, bullshit is cheaper than ever to reproduce. And more publishing than ever is about making money rather than disseminating information. But I think the point holds that adding even more, different bullshit into the training data with a few new facts isn't going to improve the quality of output.

Illegal information depends on jurisdiction. Even before you get to governments restricting access to facts, unflattering opinions, or information they consider immoral, you have information considered fraudulent, defamatory and perhaps even treasonous. Secret information might also count here, since governments don't want that ending up in your training data even if they do trust your cloud storage. Illegal information can get you thrown in prison if you seek it out or publish it or sometimes even just read it.


I'd argue that in an era of hyper-competition, the incentives to lie are higher than before.



