
Can the Internet Be Archived? (2015) - edward
http://www.newyorker.com/magazine/2015/01/26/cobweb?mbid=social_twitter
======
vervas
The importance of archiving is huge, taking into account the numerous times
libraries have been burnt, footage has been lost, and other parts of history
have been buried in the ocean with shipwrecks.

But more importantly, think of Orwell's 1984, where all printed records were
manipulated on a daily basis to reflect a different state of reality as events
changed. In the days of the internet that is not just a tedious sci-fi process
but one that can be performed in a fairly efficient and methodical manner, to
the extent that it cannot be told what was real and what was not.

~~~
aethertron
To guard against manipulation is why we should have our own personal archives.
Decentralisation. Also, cryptography to validate authorship. (Basically, I
still want Xanadu.)

~~~
EGreg
I personally think that HTTP should be gradually supplanted by things like
IPFS. All documents would be content-addressable, and both accessing and
archiving would be done in a decentralized manner.
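
For anyone unfamiliar with "content-addressable", here is a minimal sketch of
the idea, assuming a plain SHA-256 hex digest as the address. The
`ContentStore` class and its methods are hypothetical names for illustration;
this is not the IPFS API, and real IPFS uses its own content identifiers and
splits large files into chunks.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not the IPFS API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Any holder of a copy can re-hash it to check it was not altered.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address depends only on the content, identical documents stored
by different people get the same address, and a modified copy can be detected
by re-hashing it.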

------
ry_ry
Should the internet be archived?

Seriously. It's a transient medium, somewhere between a beautifully
illustrated book and a shopping list scrawled on the back of a napkin in
crayon, recited down a megaphone by a drunk.

Let content live or die on its merit, rather than on the desire to arbitrarily
preserve it.

~~~
pmoriarty
_"Should the internet be archived?"_

This would be a great question to ask researchers in archaeology,
anthropology, linguistics, history, etc.

My own impression is that the more data there is, the better for researchers.
They can always weed out what they don't need later, but there's no way to get
back information that has been destroyed or lost.

Also, in the past people have made some horrible choices regarding what's
valuable and what's not. For all sorts of reasons, from political to
psychological to social, they've done things like burn, discard, or destroy
texts and artwork we now consider valuable but they did not. Often, what was
considered valuable (like sacred texts) was not really as revealing about
their daily lives as things they did not value and discarded.

So I'd really think twice before we discard any records. If there's some kind
of serious pressing need to be selective (such as lack of space) that's one
thing. But if keeping more would not be a huge burden, I'm much more of an
inclusionist than a deletionist.

~~~
mindcrime
_Also, in the past people have made some horrible choices regarding what's
valuable and what's not. For all sorts of reasons, from political to
psychological to social, they've done things like burn, discard, or destroy
texts and artwork we now consider valuable but they did not._

Case in point:

[https://en.wikipedia.org/wiki/Doctor_Who_missing_episodes](https://en.wikipedia.org/wiki/Doctor_Who_missing_episodes)

~~~
Consultant32452
Don't forget the original moon landing recordings. How could anyone have
thought so little of those that we lost them?

~~~
tdburn
That's because we never really landed on the moon!

~~~
ant6n
I see what you did here: trying to start one of those flame wars that future
anthropologists will be as excited about as when they found that village
garbage!

~~~
awqrre
We need ultra-HD-3D footage anyways, let's land on the moon again.

------
delegate
Archived - maybe.

But it would be much more fun to _transcribe_ it.

You know, the way they used to copy books back in the days before the printing
press.

Lock yourself in a cave with some (hemp) oil candles, paper and an iPad and
... you'd be doing humanity a huge favor.

When, in your monumental quest, you come across this message, know that
somebody thought of this moment a long long time ago and hereby officially
thanks you for your effort.

~~~
DavidFlint
Possibly archiving the internet is what transcribing was to speech and events
in old times.

------
jorgec
Years ago, somebody asked me if I could copy the internet onto a 3 1/2"
floppy.

~~~
hexane360
Well I sent an internet to my friend the other day and he just got it. How big
can it really be?

~~~
basicplus2
Isn't it a little black box with a red light on top? I'm sure I saw it on an
IT show...

------
ComputerGuru
The Washington Post has published a yearly list of the most challenging
schools in the nation since 2013. This year, they redirected all links to the
old lists to the 2017 version. They block Google from caching their content,
and the Internet Archive caches only the front-end HTML and not the final DOM,
which means the list, populated via JS, is not in the IA's cache.

I needed to reference that list and spent hours scouring the internet for
individual mentions of awards and placements to piece together a partial view
of the results for one state for some of the years. It was horrible, but the
worst part of it was the realization that this is but a single example, that
the impermanence of the internet is going to lead to a very sad loss of some
very important data that we will dearly regret in the years to come.

It's also no longer sufficient to cache text and HTML; sites like NYT and WaPo
have put massive work and countless man-hours into web apps that contain
valuable data that relies on the presence of a back end server to populate the
front-end, and rich JS apps to portray that data. It's going to be a
challenge.
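
As a rough illustration of the problem, here is a sketch of a heuristic that
flags a captured page as probably populated by script: it contains scripts
but almost no visible text in the raw HTML. The function name
`looks_script_populated` and the 200-character threshold are assumptions made
up for this example, not anything the Internet Archive is known to use.

```python
from html.parser import HTMLParser


class _TextAndScripts(HTMLParser):
    """Counts <script> tags and visible characters outside script/style."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.script_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text a reader would see in the un-rendered capture.
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def looks_script_populated(html, min_visible_chars=200):
    """Heuristic: scripts present but very little text in the static HTML."""
    parser = _TextAndScripts()
    parser.feed(html)
    parser.close()
    return parser.script_count > 0 and parser.visible_chars < min_visible_chars
```

A capture flagged this way would need a crawler that actually executes the
page's JS and saves the resulting DOM, which is the hard part.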

------
jxramos
Let's submit HackerNews comments to the Library of Congress. Seriously!

~~~
xj9
IA is better! [https://archive.org/donate/](https://archive.org/donate/)

~~~
jxramos
Touché, but my comment was precipitated not by any particular practicalities
but by the article's text: "Sites hosted by corporations tend to die with
their hosts.... Twitter is a rare case: it has arranged to archive all of its
tweets at the Library of Congress."

~~~
greglindahl
That was the intent, but I think it ended up being too big for the budget
available.

------
aethertron
I'm interested in the possibility of individual internet archives. Web
browsers (or something like those) could automatically save all of a user's
own stuff (article comments, blog posts, tweets, emails) and everything
they're interested in (bookmarks, rss feeds). Users could connect p2p and make
bigger, more public archives.
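
A minimal sketch of what the local half of that could look like, assuming the
browser (or some other tool) hands over each page body as bytes. The names
`save_snapshot`, `snapshots_for` and the `index.jsonl` layout are hypothetical
and invented for this example; the p2p sharing part is not covered.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(archive_dir, url, body, when=None):
    """Store one capture of `url` and record it in a JSON-lines index."""
    when = when or datetime.now(timezone.utc)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    # Name the blob after its hash so identical captures are stored once.
    digest = hashlib.sha256(body).hexdigest()
    blob_path = archive / digest
    if not blob_path.exists():
        blob_path.write_bytes(body)
    entry = {"url": url, "captured": when.isoformat(), "sha256": digest}
    with open(archive / "index.jsonl", "a", encoding="utf-8") as index:
        index.write(json.dumps(entry) + "\n")
    return entry


def snapshots_for(archive_dir, url):
    """Return every recorded capture of `url`, oldest first."""
    index_path = Path(archive_dir) / "index.jsonl"
    if not index_path.exists():
        return []
    with open(index_path, encoding="utf-8") as index:
        entries = [json.loads(line) for line in index if line.strip()]
    return [e for e in entries if e["url"] == url]
```

Keeping the index append-only means earlier captures are never overwritten,
so the archive records how a page changed over time.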

------
Overtonwindow
(2015)

~~~
nervousvarun
Why are you repeating a section of the linked title?

~~~
eric_h
Likely it wasn't in the title when this poster posted it, 2 hours before you
replied.

It's common on HN for older stories to get posted and get upvoted to the front
page, and commenters will make a note in the comments that it is an older
story and should be marked as such in the hopes that the mods will notice and
add e.g. (2015) to the title.

The tersest way to do this is to simply post a top level comment with the year
the article was written, surrounded by parentheses.

~~~
nervousvarun
That makes perfect sense, thank you.

------
nippples
Archiving is a great tool against those who like to employ memory holes as a
strategy.

------
basicplus2
Don't forget to do two separate backups...

~~~
avuserow
The Archive Team (not affiliated with the Internet Archive) is working on it
(and you can help):
[http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK](http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK)

