
Help Us Keep the Archive Free, Accessible, and Private - aaronbrethorst
https://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/
======
bane
The Internet Archive is one of the crown jewels of the Internet. It's one of
the things that I feel we were promised in the early days of technology, and
it actually has managed to exist despite the massive commercialization of the
Internet. In many ways it's the future we were promised, and it's an infinite
pile of stuff so deep and wide that you could never buy another piece of
entertainment and survive almost entirely off of the holdings in the archive
and still not even scratch what's in there.

~~~
devoply
Pretty much. I think it's infinitely more important than anything else on the
internet. Yet probably the least used considering the amount of content
available. But if in a thousand years you want a record of what happened here
now, that's what you really really need. Everything else is superfluous.

------
flashman
The Internet Archive is a modern Library of Alexandria. The latter was
destroyed intentionally or accidentally, nobody knows for sure, but the point
is that we have the technology to ensure it doesn't happen again.

Jason Scott has more on the backup:
[http://ascii.textfiles.com/archives/5110](http://ascii.textfiles.com/archives/5110)

~~~
nandhp
Note that Jason Scott is part of Archive Team, which has its own project to
backup the Internet Archive:

[http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK](http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK)
[http://iabak.archiveteam.org/](http://iabak.archiveteam.org/)

~~~
flashman
If there was a simple all-in-one device you could buy to become part of IABAK,
I'd probably do it. Embedded firmware, off-the-shelf terabyte hard drive...
they could double or triple their userbase.

~~~
padraic7a
I wonder if the NextCloud Box could be adapted for this?
[https://nextcloud.com/box/](https://nextcloud.com/box/) It's basically a hard
drive in an enclosure, space for a raspberry pi 2 and some software on an sd
card [Snappy Ubuntu Core as OS, nextcloud etc installed].

~~~
pfg
I've thought about building an IA.BAK extension for typical NAS software like
Synology's DSM, QNAP's QTS, etc. Unfortunately, I've found that these
platforms don't come close to the developer friendliness we've come to expect
from app or extension stores - probably one of the reasons why there aren't
really all that many extensions available on those devices.

Anyway, I'm hoping that situation will improve - unused NAS space would be a
pretty big addition to IA.BAK, and you can't beat the UX of just installing an
extension on a device you might already own.

------
Lukas_Skywalker
Let's appreciate the comments section on the Archive page for a second. So
many kind people. What a beautiful corner of the internet.

~~~
welly
Went to the blog expecting a shit fight (I assumed your comment was
sarcastic), left pleasantly surprised.

------
pasbesoin
Things aren't going so well for me, this year. Not for the first year, in a
row.

I'm going to respond by making donations I've been deferring, such as to
archive.org .

Tomorrow...? Who knows?

Waiting doesn't work. I'm going to do what I can, now. Maybe it'll help, and
hopefully I'll feel a little better about myself.

My thanks to Jason and all, for every time I've found a resource I was seeking
mirrored and preserved, for my use and for posterity.

I'll add that a lot of "older" pages seem to -- still -- be more useful than
many newer ones. The archive isn't just about maintaining some record "for
posterity". It proves useful in current circumstances, daily.

And... No one should be able to make our history "go away."

~~~
Jaymoon85
No one _should_... but that doesn't mean they don't/can't.

[http://www.usnews.com/news/articles/2016-08-17/wayback-machi...](http://www.usnews.com/news/articles/2016-08-17/wayback-machine-wont-censor-archive-for-taste-director-says-after-olympics-article-scrubbed)

------
tetraodonpuffer
from the linked page

> so no one will ever be able to change the past just because there is no
> digital record of it. The Web needs a memory, the ability to look back.

I thought the issue is/was that site owners (original, or purchasing the
domain afterwards) could via a robots.txt remove their site from the archive?

or has this changed and now no matter what happens if the archive crawls a
site on a date it stays no matter if 10 years down the road somebody buys the
domain and decides to retroactively erase everything?

~~~
db48x
Adding a robots.txt file to a domain doesn't cause them to delete their
archive of it, only to hide it.
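
To make the hide-versus-delete distinction concrete, here is a minimal Python sketch. It assumes the crawler is matched under the user agent `ia_archiver`, and the `SnapshotStore` class is made up for illustration; neither is taken from the Archive's actual code. The stored captures are never touched. Only the answer to "may this be shown?" changes with the site's current robots.txt.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def is_hidden(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text excludes `agent` from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class SnapshotStore:
    """Hypothetical store: captures are kept forever, visibility is computed."""

    def __init__(self) -> None:
        self._captures: Dict[str, List[str]] = {}

    def save(self, url: str, body: str) -> None:
        self._captures.setdefault(url, []).append(body)

    def stored_count(self, url: str) -> int:
        # What is physically kept, regardless of what the public may see.
        return len(self._captures.get(url, []))

    def public_view(self, url: str, current_robots_txt: str) -> List[str]:
        # Retroactive policy: today's robots.txt governs every past capture.
        if is_hidden(current_robots_txt, url):
            return []
        return list(self._captures.get(url, []))
```

Under this sketch, removing the exclusion from robots.txt would make the old captures visible again, because nothing was deleted in the first place.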

~~~
eljimmy
Which is one and the same to the public.

IMO, they need to stop applying robots.txt retroactively if they want to be
considered a valid archive.

~~~
ghaff
The problem is that the Internet Archive exists on legally shaky ground.
Neither they nor anyone else has a right to archive copyrighted web content
and display it to the public. They manage to continue doing so in part because
they're clearly non-commercial. They also manage to continue doing so because
they voluntarily respond to robots.txt, even retroactively.

Libraries/archives have no special exemption from copyright law, which is
actually a good thing, because otherwise libraries would presumably need to be
licensed in some way by the government to qualify for special treatment.

~~~
CM30
Why not look at WHOIS information when getting an update, and then class a
site as 'different' based on whether that changes? In most cases, a new domain
owner usually means the site isn't the same as the earlier versions.

You'd then just have to stop the archive indexing/showing content after the
WHOIS information changed, while leaving the stuff before it intact. Maybe
you'd then have a nice form to report pages you want removed/hidden (for the
edge cases), or even a separate robots.txt/meta declaration you can make
confirming you're the same person that owns the site. After all, most of the
reasons why sites go missing aren't deliberate attempts to rewrite history,
but domain squatters not wanting holding pages indexed.

Feels like it'd be so easy to implement robots.txt in a more logical way on
the Internet Archive.
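
A rough Python sketch of that idea, under heavy assumptions: each capture carries a `registrant` string, which is a hypothetical fingerprint of the WHOIS record at capture time, and `visible_snapshots` is an invented name. It ignores the hard part, which is getting a trustworthy fingerprint at all.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Snapshot:
    captured: date
    registrant: str  # hypothetical fingerprint of the WHOIS record at capture time


def visible_snapshots(snapshots: List[Snapshot], exclusion_seen: date) -> List[Snapshot]:
    """Keep captures made before the owner who posted the exclusion took over."""
    ordered = sorted(snapshots, key=lambda s: s.captured)

    # Number the ownership epochs: a new epoch starts when the registrant changes.
    epochs = []
    epoch = 0
    for i, snap in enumerate(ordered):
        if i > 0 and snap.registrant != ordered[i - 1].registrant:
            epoch += 1
        epochs.append(epoch)

    # The excluding epoch is the one in force at the last capture on or before
    # the date the robots.txt exclusion was first seen.
    excluding_epoch = 0
    for snap, ep in zip(ordered, epochs):
        if snap.captured <= exclusion_seen:
            excluding_epoch = ep

    # Earlier owners' captures stay public; the excluding owner's are hidden.
    return [snap for snap, ep in zip(ordered, epochs) if ep < excluding_epoch]
```

If the registrant never changes, the exclusion hides everything, which matches how a robots.txt from the original owner is treated today.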

~~~
db48x
It's been suggested, but there's no way to automatically do it correctly. The
whois info might be anonymized, in which case a change means nothing at all.
It might just be someone's name and address, with no way of verifying who that
someone works for. Also meaningless. Better just to default to something safe,
and spend your manpower on something more important.

------
webaholic
I wonder if Wikimedia can do something to help the Internet Archive. They are
sitting on huge amounts of money and their goals are somewhat similar.

~~~
baby
I don't fully agree with this. I like that Archive.org exists, but I don't
really mind if most of the archive would come to disappear. There is a lot of
garbage being generated on the web and I really don't think there is sense in
saving it.

On the other side, Wikipedia is knowledge, pure knowledge, and this is worth
preserving in my opinion.

~~~
anigbrowl
I agree about the tide of garbage, but on the other hand garbage is intensely
interesting to historians. Think what one could do with a decent sentiment
analysis AI and billions of comments on news stories for example. By
themselves many of them are just nonsensical ranting for one or other
political viewpoint, but in the aggregate you could probably identify
significant historical tipping points that inflected much earlier than
'official' indicators.

~~~
ghaff
This comment from a long-ago article [1] about saving many Usenet postings
that would otherwise have been lost applies:

That’s why not only the very earliest Usenet posts, before Spencer started
archiving in 1981 (Usenet began in 1979) but even some of the posts in the
1980s are still lost. It’s too bad; today, wouldn’t more of us rather see what
was being said about abortion in 1984 than sift through the arcana of bug
fixes in systems that have probably been long since retired? “It was perfectly
reasonable from the viewpoint of stuff that we might want to use again, but a
little sad from today’s viewpoint,” Spencer admits.

[1]
[http://www.salon.com/2002/01/08/saving_usenet/](http://www.salon.com/2002/01/08/saving_usenet/)
A great read BTW.

------
dirkg
I wish one of the ultra rich people who are in tech would donate a huge amount
and keep essential projects like this alive.

------
qwertyuiop924
The Archive is absolutely a worthy cause. Most people know the Wayback Machine
(although I wonder how many know where the name comes from), but that's not
all the Archive's got. Music, Audio, Video, so much incredible content.

And that's not to mention their software library. Sketch (Jason Scott) seems
to be the driving force behind it. As much as it's backed by ugly hacks
(emulators compiled to JS. Yuck) it's pretty magical to be able to boot up,
say, Fantasy World Dizzy in a web browser, and just play it, no install
required.

~~~
textfiles
It's hard to reconcile "ugly hack" and "magical".

All magic is ugly hacks!

~~~
qwertyuiop924
Fair enough.

Anyways, thanks for uploading Fantasy World Dizzy. Now I can experience the
horror the same way that the children of the '80s and '90s did.

------
mark_l_watson
Internet Archive is great, both for the fun of having frequent copies of my
web site going back almost 20 years, and it is important for preserving
digital history.

Fortunately storage and bandwidth costs will keep decreasing so more replicas
can be built over time. I just made a contribution.

BTW, I was in their building in SF in June for the Decentralized Web
conference - a fantastic location, and I recommend that you visit.

~~~
dajohnson89
Have they recovered fully from the fire?

~~~
textfiles
Jason Scott, Internet Archive. We've recovered from the fire in terms of book
scanning and operations. We have not rebuilt on the spot where the building
that burned down was - several possibilities have been floated but nothing has
happened in that direction. And of course there were some nice awards and
mementos in that building that are gone. But on the whole, we're good
regarding that. People were kind and we worked hard to replace the lost
resources.

~~~
dajohnson89
That's really good to hear.

On a side note, I love textfiles.com. thanks for providing such a cool and
important resource.

------
int_19h
Canada seems a weird choice for this sort of thing, if freedom of speech is a
worry, given stuff like this:

[https://en.wikipedia.org/wiki/Canadian_Human_Rights_Commissi...](https://en.wikipedia.org/wiki/Canadian_Human_Rights_Commission_free_speech_controversy)

~~~
mthoms
Care to expand on this? The wiki page describes some legislation that has
since been repealed and I can't even deduce what the legislation exactly said
from that article.

~~~
int_19h
The brief story is that Canada does not have free speech guarantees to the
same extent that the US, for example, does. The Canadian Charter of Rights and
Freedoms says:

"2\. Everyone has the following fundamental freedoms:

(a) freedom of conscience and religion;

(b) freedom of thought, belief, opinion and expression, including freedom of
the press and other media of communication;

(c) freedom of peaceful assembly; and

(d) freedom of association."

So far, so good. But it also has a section, the so-called "limitations
clause", that states:

"The Canadian Charter of Rights and Freedoms guarantees the rights and
freedoms set out in it subject only to such reasonable limits prescribed by
law as can be demonstrably justified in a free and democratic society."

The Charter does not define what constitutes "reasonable" or "demonstrably
justified", so it was left up to the courts to rule on that. The current
interpretation is known as the Oakes test, and is actually fairly sensible.

However, the problem remains that this basically gives the government the
ability to restrict freedom of speech, if such restriction can be
"demonstrably justified". Consequently, for a long time, Canadian law
prohibited a fairly broad category of speech labeled as "hate speech", and
said prohibition was found by the courts to be consistent with the Charter.

It had also created a special tribunal to deal with the purported violations
of one of the laws in question (specifically, Section 13), which operated
under principles somewhat different from the regular court system. The article
I linked to was about that. You can read the law here:

[http://laws-lois.justice.gc.ca/eng/acts/H-6/section-13-20021...](http://laws-lois.justice.gc.ca/eng/acts/H-6/section-13-20021231.html)

This particular law was, indeed, repealed by the Harper government. However,
it only dealt with Section 13 law. There are other laws in Canada that are
still in force that regulate "hate speech"; in particular:

[http://laws-lois.justice.gc.ca/eng/acts/C-46/section-319.htm...](http://laws-lois.justice.gc.ca/eng/acts/C-46/section-319.html)

[http://laws-lois.justice.gc.ca/eng/acts/C-46/section-320.htm...](http://laws-lois.justice.gc.ca/eng/acts/C-46/section-320.html)

Furthermore there's nothing precluding any future government from enacting a
law to restore Section 13 and reinstate the Commission - all it takes is a
simple majority in the legislature. Some people have called for the Trudeau
government to do just that, although it has not indicated any desire to do so
thus far.

The other issue is that the Charter can be circumvented by both the federal
and the provincial governments by their use of the Notwithstanding Clause,
which is as follows:

"(1) Parliament or the legislature of a province may expressly declare in an
Act of Parliament or of the legislature, as the case may be, that the Act or a
provision thereof shall operate notwithstanding a provision included in
section 2 or sections 7 to 15.

(2) An Act or a provision of an Act in respect of which a declaration made
under this section is in effect shall have such operation as it would have but
for the provision of this Charter referred to in the declaration.

(3) A declaration made under subsection (1) shall cease to have effect five
years after it comes into force or on such earlier date as may be specified in
the declaration.

(4) Parliament or the legislature of a province may re-enact a declaration
made under subsection (1).

(5) Subsection (3) applies in respect of a re-enactment made under subsection
(4)."

In other words, the legislature can effectively limit _any_ fundamental
freedom (this is Section 2, the one that includes freedom of speech and
expression), and the only thing that they need to do so is 1) declare that
they're doing it, and 2) renew that declaration every 5 years.

So far, the only instance of the Notwithstanding Clause used to limit freedom
of speech that I'm aware of is its use by the legislature of Quebec in the 80s
to pass their language protection laws (that mandated use of French in certain
public signage etc). However, it could, in theory, also be used for "hate
speech" laws and other similar restrictions.

The general point is that, in terms of both actual and potential curtailment
of the freedom of speech, Canada offers far fewer guarantees than the US does.
While the Trump administration has expressed some hostility towards the
concept of free speech already, actually acting on it would put them on
a collision course with the Supreme Court and its currently standing
Brandenburg v. Ohio ruling interpreting the First Amendment, which provides
extremely broad free speech protections, far exceeding anything that Canada
has in the Charter, even ignoring the Notwithstanding Clause.

In terms of other countries that have laws and legal checks and balances
comparable to those in the US, the only one that I happen to know of is Estonia.
But I'm sure there are others, it just needs researching. For something like
the Internet Archive, which is archiving materials that can be contentious, I
would expect legal freedom of speech to be a very strong consideration when
picking jurisdictions in which to operate.

~~~
mthoms
Thanks for the thoughtful response. I'd actually forgotten about the
notwithstanding clause entirely.

Still though, Canada's legislation seems no more restrictive than most other
(Non-U.S.) democracies. That's according to my quick read of
[https://en.wikipedia.org/wiki/Freedom_of_speech_by_country](https://en.wikipedia.org/wiki/Freedom_of_speech_by_country)
(so take it for what it's worth). I mean, surely there are some that are
marginally better but it doesn't seem like there are any obvious leaders here.
Maybe I'm missing something though.

Given that, I don't see how Canada would be a bad choice for a mirror.
Especially given the other distinct advantages. Physical proximity being an
obvious one (it's probably much more cost effective to build some servers pre-
loaded with data and drive them up versus almost any other option). Same time
zone, same language, and general political/social/economic stability are
probably also pretty key. And then there are other threat considerations (eg.
the Baltics being so close to Russia) that come into play.

~~~
int_19h
I mentioned Estonia before. So far as I know, their level of protection is the
same as in the US - restricting speech requires imminent danger stemming from that
speech. So no political speech, no matter how hateful, can be restricted,
unless it is inciting imminent violence. It also has fairly lax libel laws,
which is also a benefit.

Geographic proximity has both upsides and downsides - the downside is that
something that affects the US is also more likely to affect Canada than any other
nation (except, perhaps, Mexico).

As far as threat consideration, you have a point there - but I think that
having a distributed network of server mirrors is part of mitigating any such
_sudden_ threats against any particular one. In a sense, something like a
Russian invasion can probably be treated similarly to, say, a possibility of a
major earthquake on the West Coast disrupting infrastructure.

But yes. I do see how Canada is probably the easiest to set up for someone in
the US. If they just want something done _right now_ , as immediate mitigation,
and consider better options later, it makes sense.

------
SnowingXIV
What do you do about things you "don't" want backed up? Say old portfolio or
social media sites tied to your name? If you can wayback any site doesn't this
present some issues to sanitizing your online footprint?

~~~
ghaff
If you control the site, you can use robots.txt if you really want to. Though
I'd think carefully about if you _really_ want to do this.

If someone else owns the site or if it's a social media site, you'd have to
see what the site owner will do. There's probably not much you can do on your
own to prevent the site from being archived.

------
jmuguy
I've donated, and bought some stickers. You guys should get some cooler swag
:)

~~~
dajohnson89
I donated. It feels really good to help such a good cause, however little it
may be.

------
intopieces
I don't have a lot of money to give (the holidays are expensive), is there a
way to make a continuing contribution monthly? Is there a way I can volunteer
my time? I live in the bay area.

~~~
donohoe
Yes - they do have options for a monthly recurring donation. I just went for a
one-time donation, but it is definitely an option.

Link: [https://archive.org/donate/](https://archive.org/donate/)

------
pgl
Why is it important to keep it "Reader Private" (as in the article title)? What
does that actually mean?

------
astrostl
I almost never use this service, but I'm happy to throw $5/month at it.

------
agumonkey
I prefer to donate to them rather than wikipedia these days.

------
aq3cn
Is there Internet Archive of YouTube?

~~~
textfiles
Jason Scott, Internet Archive. We do archive YouTube videos but not, you know,
every single one.

------
tmptmp
The Internet Archive is a great project. It was allowed to be created in the
first place, and it has been allowed to keep existing. This is what I like about
the modern, freer, liberal western democratic nations.

Then there are the great people who spend their time, energy and resources to
make such things tick. A great thank you to all those philanthropic people
behind the Internet archive and similar such projects. It's because of you,
people like me have a hope to learn something significant and with a
relatively low cost footprint.

I learned many things thanks to FSF, GNU, Gutenberg, Wikipedia, the Internet
Archive and currently Sci-Hub. I spent only about $10 per month on
internet access. Could I even imagine getting such high-class knowledge at such
a low cost? I did not pay ridiculously high college fees and still could learn
a lot in history, economics, and some things from science, math, technology,
engineering and many other fields of knowledge. In fact, most of my significant
education happened on the Internet, thanks to such projects.

I love the USA and the modern liberal western world who made such things
happen. Hats off.

Disclaimer: I am from a third world country. $10 p.m. was an expensive thing
for me for a long time.

PS: I hope to be able to contribute more to such projects soon. I do
contribute a rather insignificant amount as compared to the scale of things.

------
icantdrive55
They want to preserve the data.

How about this:

1\. Program an app that asks the user whether archive.org could store, say, 1 gig
of encrypted data on their hard drive. It wouldn't be mandatory, but you could
help if desired. It would just sit on your hard drive. That gig of data would
be changing on a regular basis. (Big data centers could offer to take in data.
Hell, they could have another tax write-off at the end of the year.)

2\. After all the data has been distributed around the world, the data
transfer would start over again, but on different computers. In a short amount
of time you might have millions of computers with part of The Internet Archive
sitting idle on users' hard drives. The end result is the users would be worker
bees, waiting for the queen to call them home. (In the end, you might have
1000 computers with the same block of data on their hard drive. Why? Because
computers don't last forever.)

3\. If we had a catastrophe, once the new Internet Archive was
repaired/restored, the data lying dormant on millions of hard drives would
come home to papa in an orderly manner.

4\. It would remind people of the importance of preserving history. It would
bring more attention to The Internet Archive. It would bring in a sense of
team. Why not try it until this 501(c)(3) gets their donations?

5\. Yes--this is off the top of my head. I would need to put more thought into
it.
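
A minimal sketch of the placement part of this, in Python. It uses rendezvous (highest-random-weight) hashing to pick which volunteers hold each block, which is my substitution and not something the proposal specifies. The names `assign_replicas` and `recoverable_blocks` are invented, node and block IDs are plain strings, and encryption and the actual data transfer are left out entirely.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def _score(block_id: str, node_id: str) -> int:
    """Deterministic pseudo-random weight for a (block, volunteer) pair."""
    digest = hashlib.sha256(f"{block_id}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_replicas(block_id: str, nodes: Iterable[str], copies: int = 3) -> List[str]:
    """Pick `copies` distinct volunteers for a block via rendezvous hashing."""
    # Ties on score are broken by node name so the result is fully deterministic.
    ranked = sorted(set(nodes), key=lambda n: (_score(block_id, n), n), reverse=True)
    return ranked[:copies]


def recoverable_blocks(placement: Dict[str, List[str]], alive: Set[str]) -> Set[str]:
    """Blocks that can still come home: at least one holder is reachable."""
    return {b for b, holders in placement.items() if alive.intersection(holders)}
```

The reason for this choice is the "computers don't last forever" point: when a volunteer disappears, only the blocks that volunteer held need a new home, and every other block keeps its existing holders.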

~~~
agumonkey
Distributed archiving feels like oral tradition. I'm smiling.

~~~
intopieces
How about a distributed, social archive of music, dedicated to high quality
and bit-level verification, with metadata and user discussion? With a ratio
economy that rewarded contributions?

French police recently raided the infrastructure of such a place, and now it's
gone. It was around for 8 years.

~~~
agumonkey
what.cd ? it seems like it wasn't that distributed. Also it was illegal; I
love music, it appears to me they were music lovers before pirates, am sure
there was a lot of rare and valuable content, but it was illegal, I can't
really be too sad if they're busted.

Who knows, maybe users will organize in a different way to make a more legal
repository of music.

~~~
icebraining
What's the legal distinction between what.cd and the Internet Archive?

~~~
agumonkey
What archive.org backs up is already public content ? not paid one. There are
exceptions (books and videos) but I assume they are negotiated with rights
owners. Did what.cd do this too ? I don't know how they operate, I only heard
about them last week.

~~~
icebraining
_What archive.org backs up is already public content ? not paid one._

Legally, that's irrelevant (except maybe for calculating damages). Publicly
available content is just as copyrighted, and paid content may be in the
public domain (e.g. printed copies of Oliver Twist).

Barring an explicit license, one can't copy any content on any website, except
for simply displaying it (there's an implicit license). And you certainly
can't re-distribute it.

 _There are exceptions (books and videos) but I assume they are negotiated
with rights owners._

Why do you assume that, when anyone can upload them?

[https://archive.org/details/HarryPotterAndTheCursedChild_201...](https://archive.org/details/HarryPotterAndTheCursedChild_201607)

~~~
agumonkey
I assume that because archive.org is a massive open public fucking website,
not a closeted circle like what.cd, requiring invitations to even log in
apparently.

I'll also assume that it's as easy for the rights owner to remove copyrighted
material as it is to upload it.

You're totally right about the license of publicly available content. I
handwaved over it, assuming that people still wouldn't mind a backup by a third
party as long as it doesn't damage them (and again I'll assume archive.org accepts
removal when demanded... which I'm gonna check right now).

pse:
[https://archive.org/about/faqs.php#Rights](https://archive.org/about/faqs.php#Rights)

[https://archive.org/about/faqs.php#Movies](https://archive.org/about/faqs.php#Movies)
(search for Who owns the rights to these movies?)

------
slaveofallah93
I agree with the idea of what archive.org claims to be doing but it doesn't
seem right the way that they are going about it.

The exclusion specifically of any ISIS supporting articles and videos makes it
seem that archive.org is not truly interested in creating an archive for
future generations but is instead interested in creating an archive which
supports their political/religious beliefs.

Cataloging and archiving Islamic State videos doesn't mean that one endorses
their beliefs or supports the organization.

It's a shame that what could've been an organization for good has become an
Islamophobic political organization.

~~~
greglindahl
You're confusing archiving and public access. Most archives don't have public
access to all of their materials.

