
The People Behind the Wayback Machine - vinhnx
http://www.motherjones.com/media/2014/05/internet-archive-wayback-machine-brewster-kahle
======
Smerity
The team at the Internet Archive, responsible for the Wayback Machine,
ArchiveTeam, the TV News Archive, and many other projects, are true gems. I
had the chance to meet Brewster and many others whilst last in San Francisco,
and their passion is infectious. If anyone is interested, they have an open
lunch on Fridays[1] where you get to see the church, the tech, and meet the
team. Each team member and guest gives a few sentences about who they are and
what they do, which really gives you a feel for how much work is going on under
the covers.

I feel humanity will look back on this period -- lost diskettes, CDs, DVDs,
game consoles, Betamax, VHS -- and lament the comparative black hole of
information. Myspace history deleted with no warning, Justin.tv deleting
history with only 8 days warning[2], ... and a million more examples. The
Internet Archive are fighting that one bit at a time.

[1]:
[https://news.ycombinator.com/item?id=7826313](https://news.ycombinator.com/item?id=7826313)

[2]:
[https://twitter.com/internetarchive/status/315655572919828480](https://twitter.com/internetarchive/status/315655572919828480)

~~~
voltagex_
#justouttv on EFNet has just been set up for the justin.tv effort.

------
sp332
A lot of people here are confusing the Archive Team with the Internet Archive.
They are not the same. IA is more polite: they always respect robots.txt, and
they will sometimes remove data if you ask politely. AT are self-described
"rogue archivists" and their motto is "we are going to rescue your shit".

------
Paul12345534
I love the Wayback Machine. I wish they'd archive _all_ pages, though, even
those that don't wish to be archived, keeping them away from public view
until the copyrights expire someday.

~~~
ekr
I don't know about this; to me, archiving everything seems like a gross
inefficiency. Most of the internet is spam and advertising, and of the rest,
less than 5% is actually useful information or knowledge.

Archiving books, scientific journals and the like would seem much more
useful, but obviously you'd run into copyright issues.

~~~
6cxs2hd6
I agree that the highest priority should go to the "serious" stuff. However, the
most interesting part of a really old magazine or newspaper, for me, is the
advertising: for example, an early-'80s computer ad, or a '50s railroad or
airline ad. I find that stuff really fascinating, and it gives more of the
flavor of the era. It might have a surprising amount of value to a historian
or anthropologist.

------
hkmurakami
I love what the Archive Team does. I used their VM when the Posterous backup
effort was happening last year, and today I sent a link to my friend's now
defunct Posterous blog.

But I know that the owner of that Posterous page had no intention of keeping
the page a going concern, and was happy to see it disposed of. In light of the
recent Google "right to be forgotten" ruling, will there come a day when the
right to be forgotten will extend to archive.org?

~~~
Springtime
>will there come a day when the right to be forgotten will extend to
archive.org?

Sites can at any time opt out of being archived via a robots.txt exclusion (IA
still keep their previous archives privately). However, for public blogging
sites operated by a third party, that's another matter.
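To make the robots.txt opt-out concrete: the exclusion is just a rule block addressed to the archive's crawler, and a compliant crawler checks it before fetching. Below is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the crawler identifies itself as `ia_archiver` (the token the Wayback Machine historically asked site owners to use; treat it as an assumption, since crawler names and IA's policy can change). The robots.txt contents, the URL, and the `is_archivable` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the archive's crawler.
# ASSUMPTION: "ia_archiver" is the user-agent token the archive honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_archivable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/blog/post-1"
print(is_archivable(ROBOTS_TXT, "ia_archiver", url))   # False: excluded
print(is_archivable(ROBOTS_TXT, "SomeOtherBot", url))  # True: empty Disallow allows all
```

This only models how a well-behaved crawler reads the file going forward; as noted above, it says nothing about captures the archive already holds.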

~~~
ewillbefull
The right to be forgotten is way different than a robots.txt file...

------
jpswade
Don't forget to donate:

[https://archive.org/donate/](https://archive.org/donate/)

------
pcocko
It sounds as if this project is Hari Seldon's psychohistory. These
archivists are awesome!

------
butwhy
All this great content, and their website is designed in a way that discourages
people from perusing it. They really need a redesign and "relaunch" of their
brand to flaunt the great things that they're doing.

~~~
weland
Unlike most of the web 2.0 world, there is substantially more value in their
content than in their design wizardry. It's a team with limited resources
that can barely keep up with the information it archives, and it is doing
impressive work, not at all devalued by the absence of some precious
yetanothercrap.js.

------
jshb
What's the difference between these folks and the Pirate Bay?

~~~
Crito
Among many, many other differences, assuming your angle isn't to denigrate the
Archive team, one of the important differences is that TPB makes no guarantees
about content availability. Information that you can find through TPB _(they
host magnet links, not content. Magnet links can be used to find others who
host content)_ is only available as long as those who are interested in the
content are interested in hosting it. Conversely, the Archive people seek to
ensure that content remains available _even after_ everyone else seemingly
loses interest in it.

It's _something_ like the difference between your local used book store, and
the Library of Congress. Or maybe the difference between the display cases of
your local natural history museum, and the basement of the Smithsonian.

~~~
Springtime
That, and they primarily archive public domain material and abandonware (apart
from their web archiving project). They really couldn't be more different.

~~~
morsch
I think simply disregarding the web archiving is a bit of a cop-out. It's
interesting, though, that for the most part nobody minds them redistributing
loads of copyrighted material. Here are some reasons that come to mind:

The web material was distributed for free in the first place. They're
redistributing ad-supported content, not stuff behind a paywall. (The same can be
said of some TV shows, and indeed I think TV show piracy is often met with a
comparatively cavalier attitude.)

It's used as a measure of last resort. If I want to read an article from
Wired, I'm going to try to find it on Wired -- or more likely, I'm going to
Google it and get a link to Wired, and not the archive. It's only when it's
unavailable from the original publisher or when I have specific historic
interest that I end up using the web archive. The result is that publishers
aren't denied their ad revenues as long as they host their material. Your
abandonware argument translates neatly to the web archiving efforts.

They're archiving. This gives them a touch of academia and altruism that
casts them in a totally different light.

~~~
ghaff
And they also respect robots.txt.

None of these things in isolation necessarily makes what the IA does entirely
legit under current copyright law; they effectively operate in something of a
legal grey area. But add it all together and not many people are going to get
upset--especially given that they'll remove material if asked to do so.

There have been a few legal cases
[http://en.wikipedia.org/wiki/Wayback_Machine](http://en.wikipedia.org/wiki/Wayback_Machine),
but not many considering the scope of what they archive.

