
Raiders of the Lost Web - danso
http://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/?single_page=true
======
vezzy-fnord

        “At this point, if you mean the web when Tim Berners-Lee
        invented it, right now that web does not exist,” Scott 
        said. “Not really. News organizations kill old articles, 
        YouTube’s old videos go away. And while the Archive and 
        other entities are saving—quote-unquote saving—these sites, 
        even those will go to new URLs. They won’t be in the same 
        place. You’ll have to search for them... There are success 
        stories. But meanwhile, silently, thousands of useful things
        are disappearing. As time goes on, I have even less and less
        hope for how long it will last.”

I can't find any evidence of TimBL ever having any particularly grand
intentions regarding the WWW, especially not anything along the lines of
content replication. In fact, if you read TimBL and Robert Cailliau's 1990
proposal for the "WorldWideWeb" [1], you will find it to be a textbook example
of worse-is-better with a limited scope, mostly a way of unifying CERN's
documents from a physical hierarchy into a minimum viable networked web of
nodes. The project
objectives were indeed rather humble, and it overall appeared to be a way of
bringing the general gist of TimBL's earlier hypertext research with ENQUIRE
over the by-then established TCP/IP and DNS in the absolute quickest way
possible.

Compare and contrast with Ted Nelson and Xanadu.

[1] [http://www.w3.org/Proposal.html](http://www.w3.org/Proposal.html)

~~~
williamcotton
You make a good point by showing that worse-is-better is probably the way to
get there rather than trying to solve everything at once, but I bet when all
is said and done we have something that looks a lot closer to Xanadu than the
WorldWideWeb!

~~~
vezzy-fnord
_You make a good point by showing that worse-is-better is probably the way to
get there rather than trying to solve everything at once_

I did not make any such claim, and I do not advocate for any view resembling
that in the slightest.

 _I bet when all is said and done we have something that looks a lot closer to
Xanadu than the WorldWideWeb_

If you have read Nelson's writings (e.g. _Xanalogical Structure_ ), you will
know the two models are fundamentally irreconcilable.

~~~
williamcotton
Or perhaps the web has always been a broken model, and in practice it has
begun to turn into a hypertext system more along the lines of Xanadu!

I see projects like Open Publish and IPFS as piecemeal attempts to build
Xanadu-like systems on top of the existing web infrastructure.

[https://github.com/blockai/openpublish](https://github.com/blockai/openpublish)

[https://github.com/ipfs/ipfs](https://github.com/ipfs/ipfs)

 _"The lesson to be learned from this is that it is often undesirable to go
for the right thing first. It is better to get half of the right thing
available so that it spreads like a virus. Once people are hooked on it, take
the time to improve it to 90% of the right thing."_ - The Rise of "Worse is
Better"

I've always thought that Xanadu was the "right thing" and that the web was
"worse-is-better", and that's why the web was successful. I'm sorry that I
put words in your mouth; I was just trying to be agreeable!

~~~
vezzy-fnord
IPFS looks good, but it's quite antithetical to Xanadu. It's more akin to a
Plan 9 or Gopher idea of networked file systems, but backed by content-
addressable storage and a name service. Xanadu is anathema to that.

OpenPublish just looks like a simple insertion tool for a distributed public
ledger (i.e. the blockchain). There's not an actual "system" to it.

Worse-is-better is (ironically) hard to get right, because, as it turns out,
designing systems so that they can be properly evolved from 50% of the right
thing to 90% of the right thing is almost as difficult as building it
correctly in the first place.

(I don't think IPFS is even about hypertext or hypermedia per se, so much as
content distribution which then would be interpreted by whatever local reader
is relevant. I do not think it has any intrinsic understanding of hypertext.)

~~~
williamcotton
Open Publish has a lot more going on than just that single repo!

There's a whole suite of supporting services for interfacing with the
blockchain, parsing it, and creating transactions.

[https://github.com/blockai](https://github.com/blockai)

One piece that we're using in production and hopefully publicly releasing soon
is the state engine that turns blockchain transactions into a state of asset
ownership. We run our own public-access state engine endpoint behind this
module, so anyone can query it about published documents or ownership here:

[https://github.com/blockai/openpublish-state](https://github.com/blockai/openpublish-state)

We built another service called Bitstore, a content-addressable cloud storage
web service that uses Bitcoin public key infrastructure for authentication and
payment, and it pairs nicely.

[https://github.com/blockai/bitstore-client](https://github.com/blockai/bitstore-client)

I'm on Freenode as "williamcotton". You can msg me and I'll give you a link to
our staging server to show you the full system.

As for how IPFS relates to Xanadu, IPFS is very much interested in the
"permanent web":

[https://neocities.org/permanent-web](https://neocities.org/permanent-web)

 _"Where links do not break as versions change; where documents may be
closely compared side by side and closely annotated; where it is possible to
see the origins of every quotation; and in which there is a valid copyright
system-- a literary, legal and business arrangement-- for frictionless, non-
negotiated quotation at any time and in any amount."_

[http://www.xanadu.com.au/ted/XUsurvey/xuDation.html](http://www.xanadu.com.au/ted/XUsurvey/xuDation.html)

Open Publish is an attempt to build a content-addressable copyright system
that pairs with the content-addressable delivery system IPFS is building, in
the hope of creating something along the lines of what Ted Nelson was
envisioning.

BTW, BitTorrent is another great example of a related content-addressable
delivery system.

Content-addressable systems are at the root of Xanadu-like designs, and it
seems like the web is headed that way!
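The common thread can be sketched in a few lines (an illustrative TypeScript
sketch of the general idea, not code from any of these projects; the in-memory
map stands in for a real network of peers):

```typescript
// Content addressing: a document's address IS the hash of its bytes,
// so an address can never silently point at changed content.
import { createHash } from "crypto";

type Address = string;

// In-memory stand-in for a content-addressed store (IPFS, BitTorrent,
// Bitstore, etc. replace this map with a network of peers).
const store = new Map<Address, string>();

// Storing content returns its content-derived address.
function put(content: string): Address {
  const addr = createHash("sha256").update(content).digest("hex");
  store.set(addr, content);
  return addr;
}

// An address either resolves to exactly the bytes it was derived from,
// or to nothing; it cannot resolve to something else.
function get(addr: Address): string | undefined {
  return store.get(addr);
}

// A new version gets a new address; the old address keeps resolving to
// the old content, so old links do not break as versions change.
const v1 = put("Xanadu draft, version 1");
const v2 = put("Xanadu draft, version 2");
```

That last property is exactly the "links do not break as versions change" half
of Nelson's wish list; the copyright/transclusion half is what it leaves open.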

------
zeveb
> “The interesting thing is that, at that time [in the 1990s], it was easier
> to archive the web because everything was flat web pages,” said Alexander
> Rose, the executive director of the Long Now Foundation, an organization
> dedicated to establishing a framework for long-term thinking on a scale of
> 10,000 years. “So if you did save something, your chances of being able to
> see it and use it would be vastly better than if a company folded today,
> with deep back-ends of content-management systems like Drupal and Ruby and
> Django and all these things. The pages are not actual pages.”

This right here is something I wish all the folks building web-apps-to-
display-pages would think about. It's straightforward to archive HTTP
resources; it's not at all easy (nor, really, practical) to archive every API
response used in building these dynamic sites. Web apps are fine, for what
they are, but please, _please_, for posterity's sake: don't use dynamic
content if you can use static content instead.

~~~
cableshaft
I used to be all gung-ho about dynamic websites for my personal projects (I
made a custom CMS for my site once), but then I started thinking about what
would happen if I disappeared, and about leaving my projects for other people
to maintain easily (assuming they'd even want to), and I realized static
content is the way to go.

Now I try to do things client side and powered by data in JSON format instead
of creating relational databases and requiring web servers for everything. If
I can't copy it to a DVD and have it run in a browser without connecting to
the internet, I don't consider it to be good enough.
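The whole approach boils down to a pure render function over a JSON data file
(an illustrative TypeScript sketch; the projects.json file name and record
shape are made up for the example):

```typescript
// Hypothetical record shape for a projects.json data file.
interface Project {
  title: string;
  url: string;
  description: string;
}

// Pure render function: JSON data in, HTML fragment out. Because there
// is no server-side step, the same page works from a local file, a DVD,
// or any static host.
function renderProjects(projects: Project[]): string {
  return projects
    .map((p) => `<li><a href="${p.url}">${p.title}</a>: ${p.description}</li>`)
    .join("\n");
}

// In a browser this would be wired up with fetch():
//   fetch("projects.json")
//     .then((r) => r.json())
//     .then((data) => {
//       document.querySelector("#projects")!.innerHTML = renderProjects(data);
//     });
```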

I do have a lot of Flash games I need to figure out what to do with now that
Flash is being phased out, though :/.

~~~
digi_owl
As best I recall, the first time I ran into HTML was as a content index on a
demo CD attached to a magazine.

Basically each demo was presented with a screenshot, a text blurb, and a link
that would start the installer.

All of it static pages using frames.

------
danso
I would love it if news orgs agreed to at least contribute to Pulitzer.org and
put their winning entries online, even in PDF format... Just reading the brief
summaries of the winners, it seems like the proper nouns might differ, but the
best journalism back then is still as relevant today. There are a few entries
from 1986 that I've been trying to find for research purposes but that are
nowhere to be found... It's hard to find not just the full-text content, but
sometimes even the actual headlines and bylines for the team projects:

[http://www.pulitzer.org/awards/1986](http://www.pulitzer.org/awards/1986)

I've thought about going to the library and getting things off of microfiche,
but the thought of having to deal with a takedown notice kind of makes me less
enthusiastic to try.

------
jccalhoun
Link rot is pervasive. A study found that more than 70% of the URLs within the
Harvard Law Review and other journals, and 50% of the URLs found within United
States Supreme Court opinions, no longer link to the originally cited
information.
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161)

A year or two ago I was doing a project about late-1990s gaming, and so many
of those sites are gone; they came and went before archive.org was around. And
even if they are on archive.org, they are often in a broken state. For
example, there were all these RealAudio interviews with people like John
Carmack and John Romero, and they are just gone. (I actually tracked down one
of the people who owned the site in question, and he said he had them on a
computer in storage and wanted to post them some day...)

------
muddi900
I think comic book scanners, especially those scanning long out-of-print work,
can be the perfect example of the trials and tribulations of digital
preservation:

-The works they are trying to preserve are copyrighted and still trademarked, making it illegal to copy and distribute them, but many of the rights holders themselves see no value in preserving or even reprinting them.

-They use P2P technologies to distribute the scanned copies, since that does not require them to host the content on a server. The tech they primarily use is DC++, a stopgap between Gnutella and BitTorrent, which has the advantage that as long as people keep connecting to each other (even momentarily), the comics can be distributed. However, it requires a network of 'hubs' to connect users, creating pseudo-centralization. The secondary tech is of course BitTorrent, which is far more widespread and quite a bit easier to use than DC++, but as with every torrent, attrition rates are huge, and a lot of preservation comics are sitting with very few or no seeds at all. Most of them have even lost their trackers, Demonoid being the biggest one.

-There is a tertiary level of distribution: cyber-locker sites. People upload these scans to Zippyshare, Tusfiles, Mega, etc. and post them on forums. Most of these forums have blocked crawlers in their robots.txt. The biggest source of these is the 4chan board /co/. 4chan threads have very short half-lives, doubly so for threads with these links, because the admins want to pay lip service to copyright holders, and, to be frank, most of the links posted there have no preservation value but are really a source of pure piracy. 4chan does have third-party archives, but just last week I learned that the archive I used to use has folded. Most cyber lockers themselves don't have long life-spans, and link rot is even more prevalent in their case.

-The fourth level of distribution is imageboards/chans themselves. Some users will 'storytime' a comic if they feel like it, posting each jpeg of a scanned comic in sequence, but they too are subject to the imageboard thread half-life.

Despite the effort and complexity, DC++ remains the best way to discover old
comics lost to time, but one day maintaining a hub will become too expensive,
or a crackdown on IP infringement will shut them down.

This is just for American comics. French/Franco-Belgian comics, a far bigger
market, face similar problems. Same goes for manga.

~~~
angersock
This is the frustrating thing, right?

 _We have the means and mechanisms to save all written work, for everyone, for
forever, for negligible cost, and yet we aren't allowed to do it._

:(

------
PavlovsCat
Even in the short term, link rot can be so maddening. For example, does anyone
know where to find the demo described in this article?
[https://www.unrealengine.com/news/epic-games-releases-epic-c...](https://www.unrealengine.com/news/epic-games-releases-epic-citadel-on-the-web)

Never mind the technical challenges; there hardly seems to be a culture of
this even mattering, other than transferring PageRank to pages' new locations
for SEO purposes :/

~~~
starshadowx2
It only took one click to find that demo from your link.

[https://www.unrealengine.com/showcase/epic-citadel/](https://www.unrealengine.com/showcase/epic-citadel/)

~~~
PavlovsCat
I know that, but it links to an app, not the WebGL browser version that the
page I linked, and numerous articles all over the web, are referring to. The
words "WebGL" don't even appear on the page.

~~~
technomalogical
[https://web.archive.org/web/20141024042516/https://www.unrea...](https://web.archive.org/web/20141024042516/https://www.unrealengine.com/news/epic-games-releases-epic-citadel-on-the-web)

Has the article, but the link seems to point to different content (Tappy
Bird, not Citadel).

