Stop using the Internet Archive as the sole host for preservation projects (reddit.com)
87 points by leotravis10 7 months ago | 27 comments



I can't imagine any other mission-driven nonprofit entity with a real operating budget disregarding the law to such a degree that it threatens the very existence of its operations, including important services that have nothing to do with the CDL case.

The way that nonprofits usually respond to major lawsuits are board meetings and discussions with legal counsel to evaluate the relevant issues and come up with a plan of action that can reduce legal costs (including settlements) and mitigate risks to the organization's ongoing operations and status. Board members also have a fiduciary responsibility to the organization that may go against the wishes of the chair or CEO or founder.

So I have to ask: Why is the IA board different? How come this crisis wasn't averted years ago when the first C&Ds were issued, and presumably, legal counsel gave input and the board deliberated a response? Is it a true independent board or a rubber stamp board? Or, are they idealists who hew to the founder's values despite the existential threat to the organization? Does anyone have insights to share?

Or is there some other explanation, such as a plan B or an ace in the hole that will bring in funding or transfer of assets if and when the final appeals fail?

I am truly worried for the wonderful public domain archives that could disappear.


Do you remember the day after 9/11, when it felt like everything changed, the rules were being rewritten, and anything was on the table?

I felt something similar during the height of the early pandemic and lockdown measures, and I suspect that IA's decision was made in the same spirit. It's hard to look back on it now because so much shitty and divisive stuff happened since, but there was a genuine feeling at the time: Anything is possible and everything must be done, to lessen the blow of lockdowns. Do what's morally right and hope the details get worked out later. Giving people books seemed morally right. I'm personally still confident in that, even if the laws of intellectual poverty say otherwise. We'll see what the courts say.


Internet Archive's mission is to provide universal access to all knowledge. Short-term, they could fold, and provide less access to knowledge. But long-term, a legal precedent that helps not only the Internet Archive, but others to provide more access to knowledge is surely a preferable outcome for everyone involved. The publishers literally don't sell these books anymore. Come on.


Isn’t it fundamental in the nature of copyright holders and producers to want to destroy and limit access to past content, especially ones that they don’t profit from? There is a finite amount of attention and consumer money, and the attention particularly directed to old content they don’t own is something they will always view negatively and try to suppress. At some point, we must accept that the entire copyright system doesn’t really incentivise creation but control, even if it means destroying content and restricting access to it.



The OP has every reason to be concerned about what happens if/when they lose this case. As mentioned, it opens the door to even more lawsuits (they're already fighting the record labels, for instance), and I don't know if they can survive those.

It's best to take action now and back up what you want rather than wait until it's too late.

The OP did post Brewster's response at the end of the post.


> it opens the door to even more lawsuits

At least on HN, the principal way I see them used is for paywall circumvention. That’s tangible revenue lost to news organisations. I love the Archive’s mission, but they go about it in a risky way.


That's not the Internet Archive (archive.org). The paywall circumvention links are at archive.is


I don't think you can call it a kneejerk reaction. Brewster Kahle's reply doesn't really address the concerns in the original post; if anything, it seems to validate them. Specifically:

"The Archive is an ongoing evolution towards 'What is a Library in the 21st Century going to be?'"

This seems to confirm that the Internet Archive's primary goal is not the continued preservation of the archive.


That doesn't really confirm that, IMHO. The line before claims the opposite, in fact: "The Internet Archive has been around since 1996, and while that does not guarantee anything, it shows continuity of support and strong commitment to digital preservation with as much access as possible."


"stop relying ... SOLE ..."

I don't think the OP is wrong to advocate for more copies; they're not telling anyone to stop filling up the IA (from a quick skim).


It's tragic that this lawsuit is causing such a headache for the archiving community. But it was rather brazen for IA to do what they did during COVID. That's not to say the appeals court won't see it differently. Since the IA is a nonprofit, it's possible things end up okay regardless, but who knows. Copyright is a tricky beast.


While the books are a great project, I'm stunned that IA didn't take the step to do it as a separate corporate entity, so as to not put the whole enterprise at risk. Surely someone must have seen that there is a risk in going against the deepest pockets in publishing? Or were they just so convinced of their own righteousness that they ignored it?

As my grandfather said, "Just being right isn't always the end of it; you also don't want to be dead right." It was about driving, and how sticking to your idea of "being right/having the right of way" is pretty meaningless if you get run over by the other guy being wrong. It seems to apply here.

I hope IA doesn't become dead right, but it looks like they are on that path.


Internet Archive is great, but it does make me nervous about how much would be lost if it goes down.


Especially since the Internet Archive - which is in San Francisco - has no full backup anywhere else.


What leads you to believe this?


Because Jason Scott said so? They worked on it briefly back in 2016, but nothing ever came of it.


Your statement is incorrect, then and now.


Well that’s good to hear! How much of it is replicated outside of SF? How often?


Probably because they had a fire a few years ago, and weren't able to fully recover everything.

https://blog.archive.org/2013/11/06/scanning-center-fire-ple...

"We lost maybe 20 boxes of books and film, some irreplaceable"


Also from that link: "No servers were affected. If some had been damaged, we have backups in different locations."

The fire happened in a scanning center, and the irreplaceable items were things they were right about to scan. Definitely unfortunate, but you can't digitally back up physical media that you haven't scanned yet, anyway.

Also, that was eleven years ago, which I would say is more than "a few" (I know, time's moving too quickly these days).


Legislation needs to be crafted to specifically give IA legal protections for many of its activities.

In the event IA loses its case, we must quickly step in to safeguard their data and establish a successor entity.


Legislation is crafted in the interests of those who would prefer IA didn't exist.

We need a BitTorrent-style P2P system to ensure IA content survives and remains shareable in some form if their main storage goes down or ends up acquired/controlled by some untrustworthy actor.


Every item in the Internet Archive has a torrent file served by the Archive, and a list of every item can be retrieved. One could, if motivated, build the equivalent of an ArchiveTeam warrior: instead of grabbing content for archival, it would start up, query for the least available items, retrieve them, and then seed the swarm (limited by the local disk space available). I maintain a copy of the Internet Archive catalog/item metadata in Backblaze B2; it’s fairly lightweight (comparatively).

You’ll need ~500PB of coordinated decentralized storage to ensure high durability, but can still achieve a somewhat favorable outcome with ~220PB.
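A minimal sketch of the selection step such a warrior-style client might run, assuming you already have each item's identifier, a current seeder count, and its total size (how you obtain seeder counts, e.g. via a tracker scrape, is left out). The `Item` type and the `plan_seeding` / `torrent_url` helpers are hypothetical, and the `https://archive.org/download/<id>/<id>_archive.torrent` URL pattern is an assumption worth verifying before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    identifier: str  # hypothetical IA item identifier
    seeders: int     # current seeders in the swarm (assumed already known)
    size_bytes: int  # total size of the item's files


def torrent_url(identifier: str) -> str:
    # ASSUMPTION: the Archive serves each item's torrent at this path.
    return f"https://archive.org/download/{identifier}/{identifier}_archive.torrent"


def plan_seeding(items: list[Item], disk_budget_bytes: int) -> list[Item]:
    """Greedily pick the least-available items that fit in the local disk budget."""
    chosen: list[Item] = []
    used = 0
    # Rarest first; ties broken by smaller size, then identifier, for determinism.
    for item in sorted(items, key=lambda i: (i.seeders, i.size_bytes, i.identifier)):
        if used + item.size_bytes <= disk_budget_bytes:
            chosen.append(item)
            used += item.size_bytes
    return chosen


if __name__ == "__main__":
    # Made-up example items, not real Archive data.
    catalog = [
        Item("mag-a", seeders=5, size_bytes=10),
        Item("mag-b", seeders=0, size_bytes=50),
        Item("mag-c", seeders=1, size_bytes=30),
        Item("mag-d", seeders=0, size_bytes=200),
    ]
    for item in plan_seeding(catalog, disk_budget_bytes=100):
        print(item.identifier, torrent_url(item.identifier))
```

Greedy rarest-first is just one simple policy: it skips an item that doesn't fit and keeps trying smaller ones, so an unseeded but huge item can be passed over in favour of better-covered small ones. A real client would also need to re-poll availability over time as other volunteers join the swarm.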


Good to know! Their magazine archive is amazing, so I may try to maintain a seed for some of those categories.


> Legislation needs to be crafted to specifically give IA legal protections for many of its activities

Why?

Within certain scopes, sure. But the IA seems keen on pushing boundaries. That’s fine, but it is a strategic decision to stand on perilous ground.


> Why?

Many things that we do are not protected by constitutional rights. Because of weak protections, those activities are subject to attack or could go away entirely at any time. LGBT marriage and abortion are two examples that get significant press coverage, but many other activities deserve protection too.

Archiving the history of our species, as long as it fits within certain parameters, should be fundamentally protected. It's important for the future to understand us [1]. The internet is one of the most observable means of taking measurement of our lives and zeitgeist. Keeping snapshots is important.

[1] I don't think this is a self-important delusion. I also don't think it's impossible for this reasoning to coexist with statements that decry social media as drivel and garbage. Our outputs can be both valuable and "worthless" at the same time.



