Hacker News

If there's copyrighted material on the Archive you find valuable, download and save it now before it's the target of the next lawsuit. I expect it won't be long before you can't get old magazines or other nostalgic and niche material. This is exactly what we all said would happen when the IA gave a giant "Fuck You!" to copyright holders with the Emergency Library. The era of copyright holders ignoring the IA if they're not actively making money off the material is officially over.



The IA managed for years to cruise under the radar by mostly hosting things no publisher or label cared about and removing it if someone did care. But now that the ant’s nest is kicked up it’s easy to imagine a lot of things are now seen as fair game.


It would seem prudent for them to partition their business into two separate entities - one which handles non-contentious archiving (the wayback machine, etc) and another one which handles the stuff which seems to attract repeated lawsuits.


In other words, the archive itself needs to be archived, preferably in a highly distributed and fault-tolerant fashion.


Torrents and pinned IPFS stashes stored on machines in countries with "free-er" copyright laws would probably be an easy way to start.


There are hundreds of people on r/DataHoarder who would gladly mirror it many times over.


Do they have the space? From what I've read, the Internet Archive is huge.


How big are we talking?


90 PB as of a couple years ago: https://www.protocol.com/internet-archive-preserving-future

> The web archive alone is about 45 petabytes — 4,500 terabytes — and the Internet Archive itself is about double that size (the group has other collections, like a huge database of educational films, music and even long-gone software programs).


How the fuck do you get that unit conversion wrong and not fix it in the 3 years since the article was published?


Oh damn, didn't notice the conversion issue. Not sure which one is correct (9PB or 90PB). I'd expect it to be 90, but either one is a _lot_ of data.
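
For reference, here is the arithmetic behind the two readings as a rough sketch. It assumes decimal SI units (1 PB = 1,000 TB), and the function names are just illustrative:

```python
# Assumption: decimal (SI) storage units, where 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def pb_to_tb(petabytes: float) -> float:
    """Convert petabytes to terabytes."""
    return petabytes * TB_PER_PB


def tb_to_pb(terabytes: float) -> float:
    """Convert terabytes to petabytes."""
    return terabytes / TB_PER_PB


# The two figures the article quotes for the web archive.
quoted_pb = 45
quoted_tb = 4_500

print(pb_to_tb(quoted_pb))  # 45 PB expressed in TB
print(tb_to_pb(quoted_tb))  # 4,500 TB expressed in PB
```

So the two numbers in the quote differ by a factor of ten: if the 45 PB figure is right, the total is about 90 PB; if the 4,500 TB figure is right, the web archive is 4.5 PB and the total is about 9 PB.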


Yeah, the NEL will probably go down as one of the dumbest decisions in Internet history. The IA got away with a lot of stuff and then just decided throwing out all semblance of copyright credibility was the right call, for some reason. That's a cat they can't put back in the bag either: as long as the same people are running the show, there's no reason to believe the IA won't pull more stunts in the future.


Counterpoint:

It wasn't like IA people woke up one day and decided to give publishers the middle finger because they felt like it. Trump had declared a national emergency due to COVID-19, under the guise of which Biden later tried to forgive hundreds of billions of dollars of student loans because he felt like it. Libraries were shut down and inaccessible to a lot of people during pandemic lockdowns. The NEL involved loaning people DRM'd books, not giving them out without controls. NEL also offered an email address copyright holders could contact to remove their content from the NEL; granted, visibility into that process wasn't great, but if you're a publisher or make a living as a self-published author and you aren't keeping tabs on major book-related news, isn't that on you?

IA's lending library is made up of scans. No casual reader wants to read scanned digital books.

I don't know what overtures were made to the publishers who ended up suing in part over the NEL (the lawsuits were more over digital lending of format-shifted works not specific to the NEL), but IA didn't try to blindside publishers in general. They had extensive dialogue with university presses, documented here: https://blog.archive.org/2020/04/27/forging-a-cooperative-pa...


> if you're a publisher or make a living as a self-published author and you aren't keeping tabs on major book-related news, isn't that on you?

Most people would think they don't have to worry about it because there are these things called laws that say you can't make unlimited copies of copyrighted material. Which is what the IA did, independent of the fact they didn't allow their users to also make unlimited copies.

Don't get me wrong, I hate the very concept of intellectual property. But there are right and wrong people to engage in political activism through flagrant lawbreaking and the thirty year old forty million dollar nonprofit is the wrong frickin' person. Being morally justified doesn't make it not apocalyptically stupid and now the stupendously obvious result is happening and I don't even feel bad anymore.


It's ok. They did the right thing, and continue to do so even without your approval or sympathy. You can move on with your life, they will continue to fight the good fight as they always have.


If that was guaranteed I'd be more than happy. But it isn't, because their foolish actions have endangered the existence of the Archive. These first two lawsuits are just the beginning. There's blood in the water now.


If they are going to continue this weird strategy of tanking lawsuits that any competent counsel would advise they're going to lose, they should split off the more-irreplaceable and more frequently used part of the archive (that is, the web archive) into a separate legal entity, so that attacks on one don't endanger the other.

As you seem concerned about the existence of the archive being endangered I presume you agree.


Oh certainly, if IA wants to split off an independent entity for bulldog IP freedom advocacy I'll open my wallet right now. But putting all the legal and semi-legal content I care about at risk because you want to swing around a big freedom dick doesn't make you a hero; it makes you a jackass.


The thought did cross my mind that being destroyed by lawsuits might actually be their objective...


Nobody can make unlimited copies of anything. There isn't an unlimited amount of bandwidth or storage capacity or users.

Did the finite number of copies they actually made at any given time even exceed the number of copies locked up in closed libraries everywhere?


> That's a cat they can't put back in the bag either

Trivial fix here - split off the non-contentious archiving (the web archive / wayback machine) into a separate legal entity from the contentious archiving.

That preserves the more-irreplaceable material (the web archive.)


> I expect it won't be long before you can't get old magazines or other nostalgic and niche material.

I will point out that the Internet Archive is among the foremost sources of warez[1] today.

Being brutally honest, what Internet Archive is doing these days isn't archiving or academic fair use anymore. They are flagrantly violating copyright, or enabling violations of copyright, and that is straight up not okay.

As an aside, do not be fooled into thinking all those ISOs[1] were provided by the rightsholders. They were not, even if they seem so at first glance. I would say the way they present the information is disingenuous at best, deliberate obfuscation at worst.

[1]: https://archive.org/details/cdromimages?tab=collection


Yeah, and if you read the ruling in the controlled digital lending case, it was a mess. IA wasn't making sure the print copy came off the shelf, and they were linking to their own site (BWB) to sell copies of the book. Not a well-built case to take to court.


Isn't linking to a place to buy a licensed copy an argument in favor? It implies that they sincerely believe that the "free download" doesn't hurt the market for the book, given that they expect people to still be willing to pay for it.

And not making sure the book is removed from the shelf seems kind of irrelevant when the book is on a shelf in a closed library where nobody can borrow it anyway. Are we really supposed to believe that the number of copies they lent out during COVID exceeded the number of copies locked up in libraries everywhere?


If I put up a copyrighted work and link from that work to my own company to buy a copy, that is not going to look very good in court. It also undermines my claimed altruistic reasons. Linking to the publisher site or even Amazon to purchase would have been better to show increasing marketshare.

Ignore the COVID emergency library. The entire CDL was never set up to do what was claimed. Libraries uploaded their holdings list, the books were made available digitally, and nothing was done to verify the books came off the shelf when a digital copy was checked out. The emergency library simply pushed publishers to stop looking the other way.

What IA did here actually hurt the possibility of CDL, or of an interesting court case challenging various pieces of copyright.


> If I put up a copyrighted work and link from that work to my own company to buy a copy, that is not going to look very good in court. It also undermines my claimed altruistic reasons.

How is a free copy less altruistic when you also provide one for sale? Does the free copy make it more likely to buy the paid one? What would it imply about the alleged damages to the copyright holder if that were true?

> Libraries uploaded their holdings list, the books were made available digitally, and nothing was done to verify the books came off the shelf when a digital copy was checked out.

Wasn't this the difference between CDL and the emergency library?

And the argument for the latter is presumably something like this: They could go contact every closed library and inventory their books, but the emergency is happening right now and in many cases contacting them has high latency or isn't possible because they're closed, so they're going to temporarily guesstimate that there are more books in libraries everywhere than they're lending out. Which isn't a bad guess, and if they went over by a slim margin in some specific case, it's a trivial amount of harm that only occurs during a temporary emergency, i.e. the effect of that on the market for the book is negligible.


The first is I'm now trying to profit from someone else's copyrighted work. That's always going to be an issue.

For the latter, read the judge's ruling. The EL was not the issue, but what came out is that the CDL was never what they claimed. By being so cavalier they ruined what could have been a great test case.


> The first is I'm now trying to profit from someone else's copyrighted work. That's always going to be an issue.

It's an authorized copy. Is Amazon in trouble because they sell used books, i.e. are trying to profit from someone else's copyrighted work?

Doesn't the copyright holder also profit from selling the used books, by taking them off the market so the next customer has to buy a new one?

> The EL was not the issue, but what came out is that the CDL was never what they claimed. By being so cavalier they ruined what could have been a great test case.

It seems clear that the judge in that case was intent on finding against the Internet Archive, and explicitly stated that they wouldn't have been allowed to win regardless:

> Even full enforcement of a one-to-one owned-to-loaned ratio, however, would not excuse IA’s reproduction of the Works in Suit.

See also the conclusion that the use wasn't non-commercial, despite IA being a non-profit that didn't charge for it, because members of the public might have liked that they did this and made a donation. Which likewise moots the implications of them selling used books (as they're indisputably allowed to do), because the next excuse was already lined up.

One wonders how a use could ever be non-commercial under this line of reasoning.


Like the rest of the internet, it is not the IA's job to verify content users post isn't a copyright violation.



