
Activists rally to save Internet Archive as lawsuit threatens site - KANahas
https://decrypt.co/31906/activists-rally-save-internet-archive-lawsuit-threatens
======
maddyboo
I'm surprised and disappointed that the IA ever thought this would go over
well, though I do respect the reasons for which they took these actions.

Circumstances around the COVID-19 lockdowns have been very unusual. If you
consider the hundreds of millions (+?) of books in libraries around the world
that were temporarily inaccessible to people, including children and students,
I feel the IA's actions were somewhat justified, perhaps not legally, but at
least morally. Young people losing months' worth of time that could have been
spent reading and learning constitutes an emergency in my mind, and the IA
stepped up to help lessen the societal damages.

You could consider that because these many millions of book licenses were
temporarily "invalidated" by circumstance, the Internet Archive simply
rebalanced the scales by providing the same people who would have been
deprived of books at their local/school library a similar means to access
them. Such a scenario probably never even crossed the mind of anyone involved
in IP law before COVID-19, but if it had there might be some provisions on the
books for cases like this.

On another note, I would actually expect publishers' sales to increase during
the lockdowns for the above reasons of books being held in purgatory at
libraries. That publishers should profit from a global crisis in this way
seems wrong.

Again, none of this is to say that the Internet Archive's actions were legally
justified, but I think to equate them to pure piracy is ignoring the nuance
and context of this extraordinary situation.

I desperately hope the publishers drop the suit and come to an agreement with
the IA that doesn't result in the loss of this treasure trove of knowledge or
the end of the organization.

~~~
gigama
My thoughts exactly, thanks for summing up so nicely. (upvoted)

IA's decision was apparently made with good intentions and under extraordinary
circumstances. The "open sharing" was not meant to be permanent, and anyway
has been discontinued. IA never sold the content for profit so any "harm"
should be similarly limited.

Publishers would generate nothing but good karma by dropping their lawsuit. As
an author it is more valuable to me to know my words and ideas are being
preserved so they can be read and absorbed by future generations. Trying to
extract every last dollar from such works undermines the pursuit of knowledge
and human potential.

------
jacquesm
I have three favorite properties on the web: Wikipedia, the Khan Academy and
the Internet Archive.

Whoever thought this was a good idea should own up to it and step down. This
was a dumb move if there ever was one, and risking one of the prime - and very
fragile - properties on the web like this was highly irresponsible. Keep in
mind that IA already has plenty of enemies who are continuously monitoring it
and hoping to find a way to shut it down, and now they have been handed this
sort of thing on a golden platter. Beyond stupid, really.

~~~
Vinnl
That's a good list, but OpenStreetMap should really be added to it :)

~~~
ris
I'm an old-time OSMer, but really I have had zero use for any kind of map in
the last 2 months.

------
jedberg
The more interesting thing here is the "controlled digital lending" test case
that will almost certainly come of all of this.

If they somehow prevail on that aspect and set a legal precedent there, that
would resolve a lot of issues for a lot of companies.

For example, Netflix could stream every movie ever, as long as they bought
enough DVDs for peak demand. Or those companies that would set up 1000 TV
antennas in a datacenter and then stream the signal.

I just hope the IA doesn't get CDL shot down because of this and ruin it for
everyone.

~~~
wriq
"Or those companies that would set up 1000 TV antennas in a datacenter and
then stream"

I believe Aereo tried just that and failed -
[https://www.npr.org/sections/thetwo-way/2014/06/25/325488386...](https://www.npr.org/sections/thetwo-way/2014/06/25/325488386/tech-firm-aereo-performs-an-illegal-service-supreme-court-says)

~~~
JoshTriplett
They didn't fail, they were sued out of business. They had a good, innovative
product that people wanted, and they got killed by a lawsuit. This is the
tradeoff we're making when we grant more exclusive privileges over works: we
kill innovations like these, and we get...probably not any more creative works
than we would have gotten otherwise.

~~~
ghaff
>They had a good, innovative product

They had a product that involved giving away other people's stuff. Yes,
copyright terms are too long. But I'm pretty sure that a vanishingly small
amount of that stuff they were giving away would be out of copyright even with
a copyright term of 20 years.

~~~
TaylorAlexander
Well I really take issue with the idea that IP is "other people's stuff". I
might say they were giving away copies of data that the government had granted
others a monopoly on. I am being intentionally pedantic because copyright is
much more harmful than people give it credit for. The internet archive is
desperately trying to give people free books and others are trying to stop
them.

~~~
rgbrenner
Are you planning to make all your code on GitHub public domain? Or are you
just referring to other people’s copyrights?

~~~
TaylorAlexander
Yes. All of the writing on my website is CC0 licensed (public domain). The
robot I have been designing for the last 2.5 years is CC0 licensed. I design
PCBs for fun and license them CC0. I have been licensing my youtube videos as
CC0, and I just started a new youtube channel for my own CC0 licensed 4k
videos of nature. Most of the content on my github is already licensed with
some kind of permissive open license either CC0 or MIT/BSD licensed, though
some of my older work is licensed GPL.

Intellectual property restrictions raise costs and keep people poor. 3D
printers were $50,000+ until the patents expired, and now you can get a decent
printer for $300. And books of course could be distributed for free. Every
person on earth could be born into great wealth if we simply allowed it.

Of course I do not advocate that we take income from hard working artists.
Instead I advocate for a world where we make living so cheap that artists need
little in the way of income for survival. Reducing intellectual property
restrictions is one important factor in lowering the cost of living for all,
and would make it easier for billions to benefit from the technologies people
like me in the USA enjoy.

Supporting copyright means supporting the idea that we do not build a
comprehensive library of all books accessible to all people. In contrast to
the idea that every child born on earth should have access to a complete
library of the written word, intellectual property advocates push for the
impoverishment of the billions on earth who cannot pay tithing to book
publishers and movie studios. It's quite the bold position to take.

~~~
salawat
Great people deserve great recognition. Thank you.

May your works act as a seed to a tree whose shade you may one day enjoy.

~~~
TaylorAlexander
Thanks! As long as the seeds grow, someone will enjoy the result. :)

------
slim
I stand behind Internet Archive on this. It was the right opportunity to try
and push our freedoms forward. Internet Archive is political by design; its
very existence is merely tolerated by the Intellectual Property Feudalists. It
was the good fight to fight.

~~~
nix23
Same here! It is a digital lending system, just like the ones every other
public library runs...

~~~
dhosek
No, it really isn't. Library contracts for digital lending vary based on
publisher and service provider, but aside from some very small publishers (if
that), there is no publisher that allows for unlimited unmetered lending of
e-books and certainly not for the price of a print book. Note also that
physical books face physical wear and usually have to be retired after 25–40
(at most) lends.

~~~
nix23
>allows for unlimited unmetered lending of e-books

It is metered: you can borrow a maximum of 5 books for a maximum of 1 month.

>Note also that physical books face physical wear and usually have to be
retired after 25–40 (at most) lends

Is that something positive? Should we go back to DVD because publishers would
love that, or is streaming still ok?

~~~
dhosek
It's an observation that points out that physical books are not being lent out
hundreds or thousands of times, so the physical book argument (or the digitized
physical book argument) doesn't hold water.

~~~
nix23
I never said anything about physical books; many public libraries already
have e-books, such as research papers and so on.

~~~
dhosek
Sorry, lots of commentary. The IA lending is premised on the idea that a
physical book can be lent an unlimited number of times. Public library lending
of e-books is premised on a limited number of lends under most programs
(Libby/Media on Demand being the most common of these, but there are other
programs that work the same way, such as 3M's digital library offering). With
Hoopla, there's a per-lend charge.

------
swiley
At least the internet archive usually follows the law. When IA is gone
everyone will just go to libgen which actively opposes copyright. Not only
that but a bunch of history will be lost and source links on Wikipedia will
break.

IMO: most publishers are worthless anyway. I feel bad for authors but it’s
hard to sympathize with publishers.

------
toss1
I had not realized that the Internet Archive had 'expanded' into archiving &
re-publishing books.

While that endeavor certainly has value, the risks - exactly these risks - are
so large & obvious that it really should have been done as a separate
corporate organization. That is what the corporate structure is for - to
separate liability.

Now, the entire, and very valuable, core mission is threatened by this one
project.

~~~
sp332
They already had 20,000,000 books (yes that's a real number) available. The
ones with Controlled Digital Lending allowed a single DRM-encrusted copy to be
loaned out for each physical copy that they owned. This isn't explicitly
allowed by any law but publishers hadn't challenged its claim of fair use and
a lot of libraries use this technique. What was new with the NEL was just
removing the one-copy-at-a-time limit. But the lawsuit is challenging the
whole concept, including what other libraries are doing.
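
The difference between the two lending modes described above can be sketched in a few lines of Python. This is only a toy model: the class and method names are made up for illustration and say nothing about how the Internet Archive's software actually works.

```python
class LendingLedger:
    """Toy model of lending a single title (hypothetical, illustration only)."""

    def __init__(self, owned_copies, controlled=True):
        self.owned_copies = owned_copies  # physical copies held by the library
        self.controlled = controlled      # True = CDL, False = NEL-style, no waitlist
        self.on_loan = set()              # readers currently holding a loan

    def checkout(self, reader):
        # Under CDL, concurrent loans may never exceed the number of owned copies.
        if self.controlled and len(self.on_loan) >= self.owned_copies:
            return False  # the reader would go on a waitlist instead
        self.on_loan.add(reader)
        return True

    def checkin(self, reader):
        # Returning a loan frees the copy for the next reader.
        self.on_loan.discard(reader)


cdl = LendingLedger(owned_copies=1)
print(cdl.checkout("alice"), cdl.checkout("bob"))

nel = LendingLedger(owned_copies=1, controlled=False)
print(nel.checkout("alice"), nel.checkout("bob"))
```

With `controlled=True` the second reader is refused until the first returns the book; with `controlled=False` both loans succeed, which is the one-copy-at-a-time limit being removed.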

------
pixxel
> The Internet Archive removed waitlists for books in its "National Emergency
> Library" so that multiple readers could simultaneously download the same
> digital copy.

>Four major book publishers have responded by suing the Internet Archive. If
successful, they could bankrupt the nonprofit.

>The publishers take issue not only with the National Emergency Library, but
with the Internet Archive as a whole.

~~~
dehrmann
I'm not sure why they thought this would be OK without getting permission from
rights holders.

~~~
themodelplumber
It seems more like they thought it was worth a try, for prospective reasons.
From that perspective I find it hard to fault them based on the potential
reward ratio for opening such knowledge, much of it no longer in stores or
libraries at all, to all of society.

The Internet Archive is broadly improving on the library concept by leveraging
their technology to push for innovation in intersecting areas of law and
culture that are far from settled. While traveling this course, they have made
day-to-day life and future prospects better for ordinary people who lack a
voice in this domain in the first place. For this bold approach among other
things they have what meager donation money I can give.

~~~
AmericanChopper
This reward ratio concept that you invented to frame this only accounts for
one side of the transaction. For starters, to give something away for free,
you have to own it first. There’s no moral high ground in deciding to give away
other people’s things. But if you want to only focus on the potential for
public good, then where would you put the “reward ratio” for the authors whose
work was given away by the internet archive? They struggle enough as it is to
make money without a global financial crisis, and without large organisations
taking it upon themselves to give their content away. What’s the “reward
ratio” for all the book retailers going out of business at the moment, and all
the staff they’re laying off? Even if you hate publishers, and for some reason
think they’re not entitled to property rights, what’s the “reward ratio” for
their employees, or the people whose retirements are invested in their
publicly traded companies?

It’s easy to sympathise with the motivation behind the decision, but really
their decision was to summarily strip people of their rights, and the payoff
for their hard work. It wasn’t a noble decision, because they didn’t own what
they were giving away, and an attempt to rationalize the deprivation it
contributed to is really just sad. It was truly a monumentally stupid decision
on their part, and they’ve jeopardized their entire mission because of it.

~~~
matheusmoreira
> There’s no moral high ground in deciding to give away other people’s things.

Copyright does not have the moral high ground either. The fact is most books
should already have entered the public domain. However, the public was cheated
out of its rights due to ever increasing copyright durations. Why should
authors and publishers retain monopoly rights to works _for over a hundred
years?_ It should be a few years at most and that's being exceptionally
generous.

There's absolutely nothing moral about copyright. When properly constrained to
a reasonable duration, it could be considered a necessary evil _at best_. In
its current form, it is equivalent to rent seeking and should be straight up
abolished.

~~~
AmericanChopper
Copyright is neither moral nor immoral. It simply gives people who create
content the ability to exercise certain property rights over that content for
a certain period of time. There would be nothing immoral about abolishing
copyright, but if you did then you could be sure that a lot fewer people would
be willing to spend years writing the books you enjoy reading, or millions of
dollars creating the TV shows and movies you enjoy watching.

Whether or not you think the existing copyright laws have exceeded their
purpose is a completely different discussion, and one that is not related to
this particular topic at all, because the Internet Archive did not
only violate copyright for content benefiting from whatever your opinion of
excessive copyright is, they violated it for all the content they had.

------
RcouF1uZ4gsC
> In doing so, it essentially allowed for a single copy of a book to be
> downloaded an infinite number of times.

The publishers' response to doing this was completely predictable. Maybe the
leadership needs more diversity of background. Having someone who has been in
the publishing business in that meeting when this was deemed a good idea maybe
could have prevented this disaster.

~~~
dehrmann
I used to work for Spotify, and pretty much everything we did with music,
lyrics, or album art had to go through legal, and projects often had to get
label approval. This is just how you do things when you're working with IP.

It is a bit surprising that a non-profit dedicated to archiving knowledge
doesn't have more experience with IP law. Not even lawyer-level, just enough
to know when to CC general counsel.

~~~
zucker42
[https://techcrunch.com/2018/12/20/spotify-settles-the-1-6b-c...](https://techcrunch.com/2018/12/20/spotify-settles-the-1-6b-copyright-lawsuit-filed-by-music-publisher-wixen/)

> The complaint had alleged that “Spotify brazenly disregards United States
> Copyright law and has committed willful, ongoing copyright infringement,” it
> said.

Note that I'm not saying Spotify did anything wrong in this case, but it does
show two things.

1. Big companies can manipulate the system better than nonprofits or
individuals.

2. Publishers sue to exploit the unreasonable monopoly the law gives them.

~~~
dehrmann
This is a _much_ better article on the Wixen lawsuit (be careful when you come
across iTunes in it; it might refer to downloads, not streaming):
[https://www.theverge.com/2018/3/14/17117160/spotify-mechanic...](https://www.theverge.com/2018/3/14/17117160/spotify-mechanical-license-copyright-wixen-explainer)

IIRC, the problem was that Spotify didn't know who to pay because there isn't
a list mapping songs to songwriters, unlike songs to labels. Apple Music had
the same problem.

------
sxates
I've got a synology NAS with at least 10TB spare capacity I could donate... If
only there were a way to use it, and thousands more, to back up important
projects like this in a decentralized way.

~~~
Cthulhu_
I keep thinking that the Bittorrent protocol would be an ideal fit for this;
specifically, it would need a way to have all peers agree on what chunks of a
file they themselves host and share to ensure the most complete copies
available. And it would need a system for keeping it updated.

I wouldn't mind hosting a NAS in my closet and some bandwidth + storage to
donate to the IA.

I also think cloud providers - all of them - have the capacity to host backups
of the IA; I'd prefer if companies like Amazon, Google would donate their
storage and bandwidth capacity to the IA instead of just money.

~~~
rbanffy
Bittorrent would require the totality of the archive to be preserved in
multiple copies. I can't have a copy of that ledger here at home. We'd need a
distributed highly redundant filesystem for that.

~~~
toomuchtodo
44 million items in the Internet Archive are available as torrents. It's
pretty straightforward to enumerate them, download them, and serve them up in
a swarm.

~~~
rbanffy
That doesn't ensure the torrented data is always available. The index is
important, but the data and its metadata are much more important.

~~~
toomuchtodo
Right, which is why you want to start getting the data served by seeders
outside of the Internet Archive proper.

The Pirate Bay is all magnet links. A model to be considered.

~~~
rbanffy
That still doesn't ensure all the data is available all the time.

A solution should be one where whoever hosts the data can't pick which pieces
they want to host. Otherwise content that's not popular will disappear.
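
One way to get that property, sketched here purely as an assumption (it is not something BitTorrent or the Internet Archive does), is to assign chunks to hosts by hash, so that placement follows from the IDs and not from what each host would like to keep. The host names, chunk IDs and replica count below are invented for the example; the technique is usually called rendezvous hashing.

```python
import hashlib


def score(host, chunk_id):
    # Deterministic pseudo-random weight for a (host, chunk) pair.
    digest = hashlib.sha256(f"{host}:{chunk_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(chunk_id, hosts, replicas=3):
    # Rendezvous hashing: the highest-scoring hosts must store the chunk.
    ranked = sorted(hosts, key=lambda h: score(h, chunk_id), reverse=True)
    return ranked[:replicas]


hosts = [f"nas-{i}" for i in range(10)]  # hypothetical volunteer machines
for chunk in ["item-a/0", "item-a/1", "item-b/0"]:
    print(chunk, assign(chunk, hosts))
```

Every participant can compute the same assignment independently, and unpopular chunks get the same number of replicas as popular ones. Checking that hosts really keep what they were assigned is a separate problem that this sketch does not address.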

------
musicale
> Controlled digital lending is a legal framework, developed by copyright
> experts, where one reader at a time can read a digitized copy of a legally
> owned library book.

And digital downloading is a communication framework, developed by technology
experts, where an unlimited number of readers can read the same digital
library book at the same time.

~~~
arpa
One of the benefits of the printing press was a decreased price per copy and
increased availability of printed material. Look where it got us. The digital
medium is no different from the press, but of course the papyrus industry is
going to fight it, because profits are more important, always.

------
Santosh83
As for saving their data, well, about 60 million people need to donate 1 GB of
their hard drive space, or just 6 million people, 10 GB of their HDDs.

Or more soberingly, any one of the world's millions of multi-millionaires
could write a single cheque to back up all their info, but will anyone do so?
Probably not, as collective resources and knowledge are of no benefit to them,
indeed even detrimental.

~~~
andromeduck
If I did the math right, it actually only costs about 72k per year to archive
6PB on Google cloud. That's well within the range of one Googler's annual
discretionary income.

~~~
mkl
I'm not sure that's right. It's 60PB [1], not 6PB. According to
[https://cloud.google.com/storage/pricing](https://cloud.google.com/storage/pricing)
Google's cheapest (?) non-redundant archive storage seems to be USD
$0.0012/GB/month. 60 PB = 60e6 GB, so 1 year would cost 12 * 60e6 * 0.0012 = 864e3
= USD $864 thousand. That's not counting the cost of getting the data in or
getting any of it out to use.

[1]
[https://news.ycombinator.com/item?id=23485594](https://news.ycombinator.com/item?id=23485594)
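
For anyone who wants to redo the arithmetic with other numbers, here is a minimal sketch. The function name is made up, the price is the per-GB monthly rate quoted above (not re-checked against current pricing), and decimal units are assumed (1 PB = 1e6 GB).

```python
def annual_storage_cost(petabytes, usd_per_gb_month):
    """Yearly storage-only cost in USD (no ingress, egress or retrieval fees)."""
    gigabytes = petabytes * 1e6  # decimal units: 1 PB = 1e6 GB
    return gigabytes * usd_per_gb_month * 12


for size_pb in (6, 60):
    cost = annual_storage_cost(size_pb, 0.0012)
    print(f"{size_pb} PB: about ${cost:,.0f} per year")
```

As noted above, this covers storage only, not the cost of getting the data in or out.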

~~~
andromeduck
Derp! Yeah I guess that's still reasonable for a few dozen donors to pick up.
I hope the data survives for posterity.

------
sergeykish
> Libraries pay three-to-five times more than retail price for eBook access.
> If an individual is charged $15 for an eBook license, a library often pays
> $50 or even $84 for one license.

Why? I should be able to donate a book to the library. Is that possible with
an eBook?

~~~
zozbot234
Because an eBook is not at all like a physical book. It's more like a
_service_ that interacts with DRM to provide an illusion of physicality.
Libraries should be enabled to digitally lend their physical holdings and not
have to rely on these eBook licenses, and CDL is a great model to that effect.

~~~
matheusmoreira
> Libraries should be enabled to digitally lend their physical holdings and
> not have to rely on these eBook licenses, and CDL is a great model to that
> effect.

Why must we come up with increasingly complicated ways to hide the fact
unlimited copies are available? The right thing to do is to abolish copyright.
It's time to stop pretending copyright makes sense in the 21st century.

~~~
manigandham
Just because the marginal cost of a copy is zero does not mean there's no
value in the information, and it's perfectly reasonable to gate access to that
value.

Copying a video file is free too but I'm sure you wouldn't argue that anyone
can watch a movie for free just because someone else made it. People have a
right to own and control the distribution of their works that they invested in
creating.

~~~
matheusmoreira
> it's perfectly reasonable to gate access to that value

It's also perfectly reasonable to distribute that information widely and
without limits. The fact information is valuable to someone doesn't make it
scarce. The harsh reality that creators need to face is that only the first
copy need be paid for.

> I'm sure you wouldn't argue that anyone can watch a movie for free just
> because someone else made it

I would. Instead of charging money for copies of a movie, film makers need to
figure out how to get paid _before_ the movie is made. Creation must act like
an investment, not a product. Maybe the answer is crowdfunding? Whatever it
is, it needs to pay the creators _before_ they start working so that the final
result can be released into the public domain immediately.

> People have a right to own and control the distribution of their works that
> they invested in creating.

That's nothing but an illusion. Once the information is out there, it can no
longer be controlled. People will copy it, distribute it, edit it, create
derivative works, memes... And there's next to nothing creators can do to stop
it. The work becomes part of mankind's culture. People infringe copyright
every day without even realizing it.

"Creators have the right to control..." sounds like a neat idea on paper but
it completely breaks down when put into practice. When authors try to
"exercise control over their content", we end up with websites which disable
right click and create annoying pop ups when we try to copy paste. It's
completely ineffective and serves only to annoy people.

The only way to control information is to control _all_ the computers that
process it. Currently, it's impossible but not for lack of trying. In order to
prevent infringement, the copyright industry is prepared and willing to
sacrifice computing freedom: their ultimate goal is to prevent us from running
"unauthorized" software. Programs that do subversive things like copy movies
or play movies without checking for a valid license first would not be signed
by the authorities and the processor would then refuse to execute such code.
Therefore, the copyright industry is an existential threat to hackers and the
free and open source software community. I'd rather sacrifice the entire
copyright industry than computing freedom.

~~~
ericathegreat
Why would someone pay in advance for something that they will get for free at
the same time as everything else? A fundamental limitation of capitalism is that
one of its goals is to acquire the maximum amount of value for yourself, while
losing the minimum amount. Even the most successful Patreon users rely on
"Patreon exclusive content" for their supporters to be able to make some kind
of livable money.

~~~
zozbot234
> Why would someone pay in advance for something that they will get for free
> at the same time as everything else?

Because the content wouldn't get created unless they _do_ pay. That's why
worthwhile Kickstarter projects tend to reach their funding threshold, even
when they're to be released for no added cost.

------
jroseman93
This wasn't the smartest move on behalf of the Internet Archive. One could
argue they were pushing boundaries to get courts to rule in their favor but in
the process they are putting the whole project in jeopardy. Taking current
copyrighted works and just giving them away en masse is obviously not going to
be seen favorably by many.

------
notatoad
Good. the internet archive does important work, and i'd hate to see a stupid
idea like this put their larger mission at risk. i'm not sure how they ever
thought they could get away with giving away other people's property, just
because pandemic.

------
mikequinlan
I would love to support the Internet Archive but the people behind it
deliberately took steps that were guaranteed to bankrupt them once the
publishers filed their (inevitable) lawsuit. Why would I support an
organization that is determined to commit suicide?

~~~
Scientificx
> I would love to support the Internet Archive but the people behind it
> deliberately took steps that were guaranteed to bankrupt them once the
> publishers filed their (inevitable) lawsuit. Why would I support an
> organization that is determined to commit suicide?

Because they hold a part of our digital history.

~~~
shard
You make it sound like this is the tech version of "too big to fail".

~~~
polytely
Well it's the only place of its kind, it's more like it's too important to
allow to be destroyed. Because you know that these copyright holders do not
actually care about history or culture. Look what happened to what.cd, they
will burn down everything to protect their profits

~~~
s1artibartfast
Maybe someone can buy their assets when they go through bankruptcy

~~~
vulcan01
The fear is that the "someone" will be the selfsame publisher companies.

~~~
s1artibartfast
Why would they want 30 years of archived websites?

~~~
vulcan01
To prevent others from buying it, causing another lawsuit?

~~~
s1artibartfast
Why would there be another lawsuit? Book publishers don't care about archived
websites. They care about IA's side activity of digitizing books and ignoring
copyright. I can see them bankrupting the IA to stop their copyright
infringement, but I don't think they have an interest in stopping the sale of
the archive. They may not even have a legal ability to stop a sale during
bankruptcy.

------
CaliforniaKarl
I wish that the Internet Archive had tried to work around this by reaching out
to public libraries, and having those public libraries agree to "loan" their
still-on-the-shelf books to the Internet Archive. That might not cover all the
copies that were simultaneously checked out by people (I don't know the
numbers involved), but it would have helped.

------
aurizon
Remember (the few of you who are over ~600 years old might :) ): Gutenberg
invented movable type and a publication explosion followed. Before that, books
were hand copied, with errors etc. A medieval ~Xerox room was full of scribes,
hard at work. Picture a Xerox salesman who promised to double the speed of your
copying: he walked in a hulking slave with a whip, and all the monks visibly
and hurriedly sped up. Back to Gutenberg
([https://www.livescience.com/2569-gutenberg-changed-world.htm...](https://www.livescience.com/2569-gutenberg-changed-world.html)):
the authors fought printing presses, and they even fought lending libraries.
Books were often chained to the shelves to limit access as well as theft
([https://www.amusingplanet.com/2015/04/the-last-surviving-cha...](https://www.amusingplanet.com/2015/04/the-last-surviving-chained-libraries.html)).

So here we are now. Progress is hampered by the old farts, the authors'
guilds, the Enslaviers (intentional typo on Elsevier), who want us to be in
permanent economic thralldom to them. They are mere pebbles in the rivers of
progress, so we pay them to go away, or break them up. MIT has the right idea.
I wish the Nobel Committee would announce that they will only consider openly
published knowledge for future prizes. I wish all governmental and other
funders of research would mandate open publication. I wish all past published
work was declared open NOW!!

------
headalgorithm
See announcement:
[https://blog.archive.org/2020/06/10/temporary-national-emerg...](https://blog.archive.org/2020/06/10/temporary-national-emergency-library-to-close-2-weeks-early-returning-to-traditional-controlled-digital-lending/)

------
mcguire
"_Many open-Internet activists have been discussing how to back up the
archive and make it more resilient for years. The temptation would be to
employ a distributed system, such as a blockchain, that would be
censorship-resistant and couldn’t be legally shut down._"

A blockchain? Oy, vey.

~~~
rodiger
Enjoy the multi-petabyte full node :)

------
csense
Is there any easy way to download large chunks of the Internet Archive? Maybe
IPFS mirror?

~~~
Maxious
"Let's Say You Wanted to Back Up The Internet Archive" by Jason Scott
[https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...](https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_say_you_wanted_to_back_up_the_internet/)

~~~
fouc
Very interesting discussion here. Apparently you can get 16TB drives at $100
each with enterprise bulk pricing. An entire backup would be 3,125 drives for
$312,500.

But apparently the wayback machine itself is only about 2 Petabytes.. so if
you don't need the collections perhaps only 125 drives needed, or $12,500.
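
A quick sketch of that arithmetic, for anyone who wants to plug in their own numbers. The drive size and price are just the figures quoted in this comment (unverified), the function name is made up, and it counts raw capacity only: no redundancy, chassis, power or people.

```python
import math


def drives_needed(total_tb, drive_tb=16, usd_per_drive=100):
    """Return (drive count, total drive cost in USD) for one raw copy."""
    count = math.ceil(total_tb / drive_tb)  # round up to whole drives
    return count, count * usd_per_drive


# 3,125 drives of 16 TB implies roughly 50 PB for the full archive.
print(drives_needed(50_000))
# The ~2 PB figure quoted for the Wayback Machine alone.
print(drives_needed(2_000))
```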

~~~
jonah-archive
Disclaimer: I run the infrastructure/ops team at the Internet Archive.

Unfortunately none of those numbers are really even close to correct (the
discussion is always fun, but the folks in r/datahoarder are often not
correctly informed. textfiles has more patience for it than I do). It would
probably cost around 1.5M in drives, even at reasonable current enterprise
volume pricing, to back up the 60+ PB of unique data in the Internet Archive
(plus, as someone does note in that thread, the cost of running them -- even
if it were a static backup to cold disks, you still need chassis to run them
in for the backup process, space and infra for them, electricity, people, &c).
I don't know offhand how much space the contents of the Wayback currently take
up, but it's definitely an order of magnitude more than that number as well.

~~~
DaiPlusPlus
Does that 1.5m figure include redundancy (e.g. RAID)? What about if the data
is compressed (either trivial gzipping each "file" the archive has, or using a
compression window that spans multiple files)? I imagine the HTML/plaintext
content of the Internet Archive would compress very well.

~~~
jonah-archive
No, that's for a single raw copy (it could go lower based on implementation --
the sweet spot on $/bit pricing is around 8TB/disk right now, but that would
actually be more expensive for us in total cost because of the increased
infra/space/power necessary to run them). We could probably get a relatively
trivial 20-30% savings on space for Wayback contents via compression, maybe
more with work (various projects are underway to do this), but much of the
rest of the contents are difficult to compress, or already compressed (music,
imagery for books, video, software archives, etc). We have also historically
been very reluctant to deduplicate heavily, though we are experimenting with
it for certain types of content -- one principle of operation is that as an
archive of last resort, we're unable to have a true deaccession plan as some
other archives have. A compromise we make is that our hard drives are
"landfill-ready" -- that is, the contents of a drive (assuming you can read
the filesystem) are inherently meaningful, content is housed with its
metadata, and so forth. This produces some unusual restrictions on how far we
can take compression and certain types of bitwise redundancy.

~~~
jakeogh
By unique data, is that excluding generated data? Can you please estimate the
space for just the wayback machine? It's the actual target.

What's the wayback machine with and without images? Is it possible that we
could distribute the ASCII/Unicode content now?

~~~
sp332
_It's the actual target._

It may be your target, but I would not be so dismissive of the other data in
the Archive. The Software Library, the tens of millions of scanned books, the
music, etc etc. On top of that, the raw scrape data driving the Wayback
Machine is not currently made available to download from the archive. It's
stored in WARC files, which would include both the images and text of all
scrapes and would not be trivial to disentangle.

~~~
jakeogh
I'm not dismissing any other section of the archive. The Wayback Machine
has the big red target on it. It's a snapshot of recent history; I have lost
track of how many times I have needed it for things that would otherwise be
memoryholed.

Please, consider making a subscription service for the warc files, let us pay
to get access to a query interface. archive.org could raise significant
defense funds.

------
slaymaker1907
I’m a big supporter of IA, but they really goofed on this one. I donated after
this incident but mostly because I don’t want this lawsuit to shut them down.

------
ascorbic
The publishers should stop now that they've taken them down, but it was
grossly irresponsible of the Internet Archive to do this, and they brought it
on themselves. Releasing all of these copyrighted books for free hurt authors
who rely on royalties to live, and it was inevitable that publishers would
sue. I honestly don't understand why they did it. They have a responsibility
as custodians of the archive to not put it at risk of destruction.

~~~
MrStonedOne
>I honestly don't understand why they did it.

It's right there in the article.

>In March, as the COVID-19 pandemic _led to the shutdown of public libraries_,
the Internet Archive created the National Emergency Library and temporarily
suspended book waitlists

Public libraries got shut down, they opened a replacement. Makes perfect sense
to me.

~~~
lordlimecat
>Public libraries got shut down, they opened a replacement

And they did it in the most blatantly illegal way possible, with a press
release that removed any doubt as to whether the infringement was willful.
Brilliant!

IA could have, for instance, reached out to local libraries to see if IA could
"use" their physical copies as proxies for the digital ones IA was loaning.
This would likely have been illegal, but far more palatable, justifiable, and
importantly not willful infringement.

~~~
shervinafshar
Not an IP lawyer here, but I speculate that when and if this lawsuit goes to
higher courts, the fact that IA was acting in favour of the public in a
national emergency would be in their favor.

~~~
Shivetya
That won't fly because it is just as easy to lay claim that ignoring copyright
is against the public because it devalues private property rights, which the
creator did not assign to the IA.

~~~
lordlimecat
The argument being put forward here is as if IA had just broken into people's
houses and redistributed 3M masks, and tried to justify it because they're
doing good.

~~~
Dylan16807
I agree! Many people are grossly exaggerating the harm so that they can scoff
at the justification of "doing good".

~~~
afastow
Fine, but the good has been exaggerated too. Some people who heard about this
were able to read a book or two for free during a time when everyone was stuck
in their house. That's a good gesture that I'm sure was appreciated, but they
didn't change anyone's life here.

And it was a gesture that was not theirs to make. Did they ever consider just
asking the publishers for permission? They might actually have gotten it.
Companies were quite eager to do things to show they were trying to help back
in March and April. But I've looked and found nothing saying they asked, so
I'm guessing they didn't. Probably because they thought asking would look bad
if they were told "no" and did it anyway.

------
omginternets
I wonder if there’s a turnkey(ish) way to crowdsource a backup of the archive
on IPFS.

IPFS seems ideally suited for two reasons:

- it has decent censorship-resistant properties

- content addressing is ideal for partial backups because individuals can
mirror as little (or as much) as they want.

I’m guessing the simplest approach would be to somehow get access to the
archive’s database? Is this something they’d be willing to consider?

~~~
erk__
There are probably not enough people on IPFS to store all of it in a safe way
that makes it always available. I think all the items on the page already have
torrents so it would likely be better to share them around.

~~~
omginternets
How many people are on IPFS?

------
jborichevskiy
I really don't know much about the intersection of IP and legal entities, but
in the worst case scenario would IA be able to move its web captured assets to
a separate company instead of tanking the whole thing? In other words, just
leave the "library" division to tank and take the heat while everything else
remains safe elsewhere?

------
kazinator
The Internet Archive should do the lawful thing and remove all the content
that the publishers want removed, to prevent the collateral damage to the
important resource for those who just want to see old versions of pages that
have changed or disappeared.

Leave the books and whatever to pirate torrent sites and just do the Wayback
Machine thing.

------
ComputerGuru
I warned about this very outcome when they announced the initiative [0]. The
Internet Archive is too big and too important to gamble on such a short-term
win (giving everyone free and unlimited access to all books for a few months
at most). Assuming the organization continues to exist after this ordeal,
their entire board should step down for putting at risk one of the most
invaluable archives in existence.

I’m an executive board member for a much smaller IRL non-profit and could
never imagine opening us up to such liabilities. I honestly cannot fathom how
this came to pass.

[0]:
[https://news.ycombinator.com/item?id=22732640](https://news.ycombinator.com/item?id=22732640)

~~~
AnthonyMouse
If you want to establish a legal precedent, you can't just go to a court and
ask them to. Courts only hear "cases and controversies" -- to establish what
the law is, you have to get somebody to sue you.

The reason this could bankrupt them is mostly because they don't have a lot of
money. Their net assets are only a couple million dollars. They have to raise
more than that every year just to keep operating.

But it's also because the plaintiffs are vindictive. They know this is about
setting a precedent. They don't like the precedent, so they're out for blood.
They could have been civilized and only asked for an injunction.

~~~
nordsieck
> If you want to establish a legal precedent, you can't just go to a court and
> ask them to. Courts only hear "cases and controversies" -- to establish
> what the law is, you have to get somebody to sue you.

> The reason this could bankrupt them is mostly because they don't have a lot
> of money. Their net assets are only a couple million dollars. They have to
> raise more than that every year just to keep operating.

Are you trying to claim that Internet Archive purposely tried to set a
precedent, knowing they don't have enough money to actually do it? That sounds
like a pretty damning accusation of gross incompetence.

~~~
saikia81
They can count on support from many. Maybe they have a plan.

~~~
nordsieck
> They can count on support from many. Maybe they have a plan.

I've looked at the complaint, and I've lived through the Napster trial. It
would be one thing if they had a good case and just needed funds to make it
through the trial. That's not what's going on here.

When you mix statutory copyright violation with digital technology, you get
infinite fines. Any money you donate is going to the book publishers at the
end of the day.

------
echelon
Could the Internet Archive donate its holdings to a new nonprofit? If the
servers were transferred to a new custodian and the anticipated bandwidth
charges were paid forward, then what would they have to lose?

~~~
jacquesm
Adding bankruptcy fraud to the list of items on the docket is likely not the
best way forward.

~~~
mindslight
The data stored on the disk drives is what the world values, but it actually
has very little economic value. Internet Archive should start selling off the
storage/server setup to a different nonprofit at _fair market value_
("Internet Cloud"), and renting continued access. If some eventual creditor
wants to claim that the specific data collection on the disks had a value that
was improperly sheltered from bankruptcy, they would be free to make a copy.

------
EamonnMR
I'd be disappointed if they end up dying on this particular hill.

------
downerending
Seems like a good week to be erasing history.

------
6510
Yeah, it's only 10 PB, why don't we p2p a copy? 1000 people with 10 TB to spare
wouldn't work but 50 000 storing 100-400 GB each would be manageable. 50 k pp
is ofc nothing and probably not even enough to get it done before everyone
gets bored. (you know who you are) Tubing it into the boxes would be the hard
part. Most wouldn't need to do much more than contribute bandwidth and temp
storage.

Why isn't Filecoin done?

~~~
kempbellt
The archive is closer to 60PB -
[https://news.ycombinator.com/item?id=23485594](https://news.ycombinator.com/item?id=23485594)

So, more like 300,000 people storing 200GB, which is quite a tall order to
fill.
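
A quick sketch of that headcount arithmetic, for anyone who wants to try other numbers. The `copies` parameter (how many full replicas the swarm keeps) is an assumption added here for illustration and is not something the thread specifies. Decimal units are assumed (1 PB = 1,000,000 GB), and the function name is hypothetical.

```python
import math

GB_PER_PB = 1_000_000  # assumption: decimal units


def participants_needed(archive_pb, per_person_gb, copies=1):
    """People required if each stores per_person_gb and the swarm
    keeps `copies` full replicas of an archive_pb-sized archive."""
    total_gb = archive_pb * GB_PER_PB * copies
    return math.ceil(total_gb / per_person_gb)


print(participants_needed(60, 200))            # the 60 PB figure, one copy
print(participants_needed(10, 200))            # the earlier 10 PB guess
print(participants_needed(60, 200, copies=3))  # hypothetical 3x replication
```

With a single copy, any participant dropping out loses data, so a real effort would need some replication factor above 1; the value 3 is only a placeholder.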

~~~
myself248
> There are 237,000 members of this subreddit. If each of us on average
> contributed 1TB (I know many people, myself included, would give a more than
> that for IA), we'd have 237PB, which feels like it's the right ballpark of
> raw storage to host 30PB in a reasonable, redundant, "not ideal but at least
> functional" manner.

[https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...](https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_say_you_wanted_to_back_up_the_internet/ftk6p59/)

I'm small-potatoes for a datahoarder, but I could chip in a couple tens of TB
to a project like this. If only I knew what button to push to make it do the
thing.

------
mosburger
Sorry for my ignorance on this matter, but why do safe harbor provisions not
apply to the Internet Archive? Shouldn't they have been served a DMCA takedown
request by the publishers, and then been in violation of those rules and paid
the applicable penalties had they not complied and removed the offending
content?

~~~
watwut
Safe harbor applies when other people upload books on your service and you are
taking it down as you go. It does not apply when you publish books on your
service.

------
zatel
I wonder if this rhymes with whatever happened to the library of Alexandria.

------
hnaccy
Not sure why IA thought it was smart to do unlimited lending and risk the
organization.

Did they just get caught up wanting to do something during virus stuff?

------
linuxhansl
"Hachette, HarperCollins, Penguin Random House, and Wiley [...] sued the
organization."

Noted. I'll spend my money elsewhere.

~~~
Ensorceled
Well, you were not spending money there anyway if you were downloading your
books from IA ...

~~~
shervinafshar
Ad hominem; Not everyone standing in support of IA in this case is even a user
of their emergency library.

~~~
Ensorceled
I guess it's an ad hominem, except the original comment made no argument, just
implied they were boycotting those publishers going forward. I made no
argument, just implied their boycott would probably be ineffective because they
probably weren't buying books anyway.

Ad hominem is a logical fallacy where the debater's character is attacked
rather than their point addressed. There is no argument being made here on
either side.

~~~
shervinafshar
I beg to differ. I see at least two main arguments here:

Argument 1: Since these publishers are suing IA (premise), I find it justified
for myself to boycott them (conclusion)

Argument 2: Well, you were not spending money there anyway (conclusion) if you
were downloading your books from IA (premise).

Why did I feel that Argument 2 was unsubstantiated and ad hominem? I asked
myself, how did the person making Argument 2 know that the OP...

(a) ...has used IA National Emergency Library at all? Maybe they are just
unhappy with these publishers bullying an NGO?

(b) ...has used IA NEL to download books from these particular publishers?
Maybe they are buying everything Penguin publishes, but enjoyed using NEL for
reading other books from other publishers?

Assuming things about the characters and actions of the person you are
discussing with and making your argument about those rather than the topic of
discussion, is a sign of a logical fallacy to me.

------
bosswipe
Losing the Wayback Machine alone as a free resource would be a tragedy of
bigger magnitude than the destruction of the Library of Alexandria. It must be
saved!

~~~
jariel
"would be a tragedy of bigger magnitude than the destruction of the Library of
Alexandria."

No, it would not even be close to that scale.

~~~
Ygg2
Well yes, the library was smaller.

~~~
readarticle
And it was hardly the only one around at the time, many other libraries in
many other Greek cities with Pergamum probably being the most notable.

~~~
kalium-xyz
The same can be said about the internet archive though.

~~~
drngdds
Can it? I don't know of any other public internet archive anywhere near the
scale of the Wayback Machine.

~~~
kalium-xyz
It's not an argument on the scale of the operation but rather if alternatives
exist. I am merely stating that burning down a library is not less of a loss
if an alternative exists, especially seeing that you'd need to find the new
archive and get awareness out about its new location.

------
jwilk
See also:
[https://news.ycombinator.com/item?id=23491229](https://news.ycombinator.com/item?id=23491229)

~~~
dang
That appears to mostly overlap with the current story, so we'll merge the
threads. Thanks!

------
ulfatufo
Hello

------
xhkkffbf
To appease publishers? How about so they can do right by the many authors out
there who are still able to sell books through Powell's, Amazon or other local
book sellers?

~~~
intopieces
When America fixes copyright to be life of the author only, I will agree with
your concerns.

~~~
miles
Life of the author seems potentially too short. What if an author is struck
down early? Shouldn't their family left behind be able to enjoy some of the
profits for a period of time?

~~~
JoshTriplett
Indeed. A fixed duration would be preferable for that and many other reasons.

~~~
votepaunchy
The constitutional requirement is for “limited times”, and current law is not.

