There's a schlocky Victorian pulp novel that's of no use to anyone - except that it happens to contain a fantastically detailed description of an abandoned saltings in my hometown that nobody ever thought to record in any way. For me, those two paragraphs are gold.
If the novel hadn't been digitised as part of Google's Books Archive Project, I wouldn't have been able to find those two paragraphs. Digitisation not only creates backups, it enables completely new ways of interacting with those texts (eg Google's Ngram Viewer).
Well I guess your one valuable paragraph that matters only to you justifies backing up millions (billions?) of human and soon to be AI generated books, because someone, somewhere, at some time will find a line or two valuable. Maybe.
I think that's the case. IIRC The British Library has copies of all published material in the UK, including flyers and such.
What seems banal and useless to you might be extremely important for future historians, and to be honest, books are pretty compressible and storage is cheap.
I think it's a law in almost all nations, in fact, that forces publishers to send a copy of everything they publish to a national archive like that (the US equivalent is the Library of Congress). If you bring up the topic of preservation, most people won't understand why, or will even be opposed to the idea. Goes to show that sometimes it's a good idea to ignore the ignorant public.
A rule that dates back to when books were rare, expensive, and useful I suspect.
Many books are just electronic garbage at this point, and backing them all up is like going to a landfill and saying "We should make another one, exactly like this one, in case this landfill proves to be valuable to someone, someday."
It might be useful for training LLMs to produce garbage. Although many say they already do a good job of that.
I don't think you seriously suggest that there aren't books worth saving published even today, so the argument left over is who determines what is worth saving? The only reasonable answer to that question is: nobody.
I think that there are books published today - especially published today - that aren't worth "saving".
I'd start with every single AI generated book that's said to be available on Amazon (300 or so iirc).
And people can and do judge things all the time: Nobel prize committees, juried contests, review boards, Rotten Tomatoes. Movies, music, and yes - even books! - get judged as worthwhile or garbage.
So yeah, I think your answer is not the only reasonable one. And maybe 41% is way too low.
Let’s say there are ten billion such marginally-useful books published over the next few decades. Many epub books are a couple of MB each, say 3 MB. So about 30 petabytes total. That’s something you could fit in one room. One rich guy could buy enough hard drives to do that today. Why not?
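For anyone who wants to check the back-of-envelope numbers, here's a rough sketch. The book count comes from the comment above; the 3 MB per book is just one reading of "a couple MB", and the 20 TB drive size is my own assumption, not a claim about any particular hardware. It ignores compression, redundancy, and filesystem overhead.

```python
# Back-of-envelope storage estimate. Every input here is an assumption.
BOOKS = 10_000_000_000   # ten billion books, the figure used in the comment
MB_PER_BOOK = 3          # "a couple MB" per epub, read here as 3 MB
DRIVE_TB = 20            # hypothetical drive capacity, not from the comment

# Decimal (SI) units, as drive vendors use them.
BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def total_petabytes(books, mb_per_book):
    """Raw size of the whole collection in petabytes."""
    return books * mb_per_book * BYTES_PER_MB / BYTES_PER_PB


def drives_needed(books, mb_per_book, drive_tb):
    """Number of drives for one uncompressed copy, rounded up."""
    total_bytes = books * mb_per_book * BYTES_PER_MB
    drive_bytes = drive_tb * BYTES_PER_TB
    return -(-total_bytes // drive_bytes)  # ceiling division on integers


print(total_petabytes(BOOKS, MB_PER_BOOK))          # 30.0
print(drives_needed(BOOKS, MB_PER_BOOK, DRIVE_TB))  # 1500
```

Under those assumptions that's 1,500 drives for a single copy, before any redundancy. Change the constants if you think the average book is smaller or the drives are bigger.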