
There’s a simple alternative to the current web - mgunes
http://hapgood.us/2014/08/14/the-web-is-broken-and-we-should-fix-it/
======
idlewords
Clearly I'm a biased observer, but I really think people should take steps to
archive stuff that is important to them. Of course it's terrible when large
sites go offline and take vast swaths of the Internet with them, and we should
continue to shame the ones that do it. At the same time, if something is
really important to you, you shouldn't store it in the form of links to random
third-party servers.

One problem we need to solve as coders is giving people better tools for
saving stuff. It's really hard right now to save a webpage (or worse, series
of connected pages) with any confidence that you've captured everything you
need to see it again if the original server disappears.
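The hard part is enumerating dependencies. As a minimal sketch (the page and names here are invented), the first step an archiver has to take is walking the HTML for every external resource the page pulls in; real pages add CSS @imports, fonts, and script-fetched URLs on top of this, which is why confident capture is so hard:

```python
# Sketch: list the external resources a saved page would also need.
# A page is only fully archived once each of these is fetched too.
from html.parser import HTMLParser

class ResourceLister(HTMLParser):
    # tag -> attribute that points at a dependent resource
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted:
            for name, value in attrs:
                if name == wanted and value:
                    self.resources.append(value)

html = """<html><head><link href="style.css" rel="stylesheet">
<script src="app.js"></script></head>
<body><img src="photo.jpg"></body></html>"""
parser = ResourceLister()
parser.feed(html)
print(parser.resources)  # ['style.css', 'app.js', 'photo.jpg']
```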

A project that I think has struck a really good balance between permanence and
retaining authors' control over their writing is the Archive of Our Own (AO3).
A bunch of fanfic authors got tired of sites falling out from under them, and
decided to implement their own system, along with sensible governance and a
way to fund its ongoing operations. The only broken links I've ever seen to
AO3 are ones where the author consciously decided to take the material
offline.

~~~
DanBC
Many people just don't have the know-how to make usable local archives.

You're also proposing mass copyright infringement, which, stupidly in this
case, is not legal.

~~~
anon4
If the system itself is federated the same way as the rest of the data, then
it doesn't matter if it's legal or not.

You can't make an omelette without breaking a few eggs and you can't fix the
world without breaking a few laws.

~~~
krapp
>You can't make an omelette without breaking a few eggs and you can't fix the
world without breaking a few laws.

Heh. The NSA should put this on a t-shirt and sell it.

------
spindritf
_It’s interesting that Andreessen can’t see the solution, but perhaps
expected._

What a weird dig. It's neither expected nor established that he can't see a
solution. I'm not as smart as Andreessen, and I could come up with half a
dozen solutions.

The author's favourite is fine but far, far from obvious. How viable is it to run
your own federated wiki anyway? Are there packages for popular systems? Are
there plugins for major browsers? Is there any federation actually happening?
I skimmed the resources[1] and don't know. Does anyone here run one? That
would be a solution, this seems more like an idea.

And it's not like no one's doing anything. There are services like Pocket or
Readability to store an article until you want to read it, Evernotes, Google
Keeps... Our very own 'idlewords will archive the contents of your bookmarks
for a fee[2]. Finally, there's archive.org.

[1] [https://github.com/WardCunningham/Smallest-Federated-Wiki#how-to-participate](https://github.com/WardCunningham/Smallest-Federated-Wiki#how-to-participate)

[2] [https://pinboard.in/tour/#archive](https://pinboard.in/tour/#archive)

~~~
can09
What he seems to be getting at is a much larger, more revolutionary approach
to not just "the web", but "the internet" as we know it:

[https://en.wikipedia.org/wiki/Named_data_networking](https://en.wikipedia.org/wiki/Named_data_networking)

~~~
MrBuddyCasino
That's a very academic and static view of content. I don't see how that would
work in today's hyper-dynamic environment, where the ads displayed on a site
are priced by millisecond real-time auctions before they are delivered to the
user, and websites are single-page apps with REST APIs in the background. How
would that work?

~~~
tonyg
By transclusion. You'd cache the big, reusable chunks of content, and serve up
a fresh transient little document that transcluded both the larger content
chunks and the dynamically-included advertisements.
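As a sketch of that assembly step (the chunk names and the stub auction are invented for illustration): the big chunks are cached and archivable, and only a small transient wrapper is generated per request.

```python
# Transclusion sketch: long-lived content chunks cached by name, plus a
# fresh ad fetched per request, composed into a transient document.
chunk_cache = {
    "site-chrome-v3": "<header>...navigation...</header>",
    "article-body-v1": "<article>...long-lived content...</article>",
}

def run_ad_auction() -> str:
    # Stand-in for the millisecond real-time ad auction.
    return '<aside class="ad">winning creative</aside>'

def render(chunk_names):
    parts = [chunk_cache[name] for name in chunk_names]  # cached, archivable
    parts.append(run_ad_auction())                       # fresh, transient
    return "\n".join(parts)

page = render(["site-chrome-v3", "article-body-v1"])
```

The cacheable chunks are exactly the part worth federating; the ad slot is the part nobody needs to preserve.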

------
purplerails
I've been thinking about this for a while now. Please check out my web app to
solve this problem:
[https://www.purplerails.com/](https://www.purplerails.com/)

The main idea is to use a browser extension to _automatically_ save pages that
you read to the cloud (including the images, stylesheets etc) in the
background. Saved pages are searchable and sharable.

~~~
zargon
This sounded really great until I went to the website and saw that I can't use
my own cloud storage, only purplerails'. As soon as purplerails disappears all
my saved pages are gone. I already have this functionality with diigo and it
makes me very uncomfortable not to have a copy of the data.

~~~
purplerails
Excellent point. The ability to download your data in a well-documented format
is coming soon. See also my reply to hollerith on a native client.

Time limitations are the only thing preventing me from doing this.

Thanks for your feedback! Hope you will use Purplerails. :)

------
dilap
"The Tyranny of Print" has a nice ring to it, but mediums that give the
creator more control over appearance+behavior are going to lend themselves to
crafting more compelling experiences.

Sure, not disappearing in 10 years (or whenever the original server goes poof)
would also be nice, but it's of little benefit if no one ever sees the thing
in the first place.

And disappearing is the default, natural state of things.

If I see some people playing music on a corner and return the next night to
see they've left, I may be wistful, but it would be silly to argue "playing
live music is broken and we should fix it".

If you think of web sites as performances put on for a limited time by the
server, it doesn't seem so terrible that they disappear after a while.

~~~
mgunes
> _And disappearing is the default, natural state of things._

Books, clay tablets, scrolls, engraved stone, to which humans owe their entire
knowledge of their premodern history, seem to have held up pretty well against
entropy. The same is not the case for information disseminated in a controlled
manner from privately owned servers.

> _If I see some people playing music on a corner and return the next night to
> see they've left, I may be wistful, but it would be silly to argue "playing
> live music is broken and we should fix it"._

> _If you think of web sites as performances put on for a limited time by the
> server, it doesn't seem so terrible that they disappear after a while._

Thankfully, the generations who produced and preserved knowledge on paper,
clay and stone before the onset of digital technology - that is, every
generation of humans that has ever lived, except ours - did not think of books
and libraries as throwaway pamphlets. And it would take more than an arbitrary
interchange of modes of cultural production to argue that we should be doing
otherwise in the technological circumstance we find ourselves in.

The "tyrants of the server" are not thinking of server-centric aggregation and
dissemination of as a performance put on for a limited time: they are betting
on it as the future of all human literary activity. Google doesn't want to
read you a paragraph, take your money and say goodbye; it wants to swallow all
the world's books and information, chop it to tiny pieces, store and own it
forever, and extract the maximum profit from each tiny piece, without having
you pay a penny. And it wants you to come back for more. The persistence of
the server-centric model of content dissemination is not an accident; it is
dictated by the political economy of the web brought about by the Googles of
the world.

~~~
al2o3cr
"Books, clay tablets, scrolls, engraved stone, to which humans owe their
entire knowledge of their premodern history, seem to have put up pretty well
against entropy."

Only the ones that have survived. For every book or tablet we have, there are
certainly tens of THOUSANDS of which every copy ever published has been lost.
Most of those are ephemera that wouldn't mean much to us anyway, but the lost
also include things that would be nice to have: the majority of Livy, any of
the original source material for the Gospels, etc.

Even considerably more modern material has been vanishing at a significant
rate; for instance, most of the output of the silent film era has _already_
been lost.

~~~
sroerick
You're absolutely right. Maybe it's a fool's errand to try to hold onto the
past.

But many people consider those losses to be an immeasurable tragedy.

------
hackaflocka
I will pay good money for a Chrome extension that does the following:

1) I can select (or do select all) Chrome bookmarks that I want to keep
offline page backups/archives of (saved to google drive or dropbox or some
such).

2) Whenever I want, instead of seeing the current online version of that
bookmarked page, I can look up the originally bookmarked archived page.

3) It allows me to choose the level of links to the bookmarked page to also
backup/archive (e.g., every single page that is linked to that page, x links
deep, is also automatically archived -- think httrack or wget).

As someone on Hacker News once said to me: my bookmarks are my knowledge
graph. As important to me as any library.
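The crawl in (3) is close to what wget's mirroring flags already provide. A sketch of assembling such an invocation (the URL is a placeholder; the flags are standard wget options):

```python
# Build a wget command that mirrors a bookmarked page plus everything
# it links to, `depth` links deep, in a form that works offline.
def archive_command(url, depth):
    return [
        "wget",
        "--page-requisites",    # also fetch images, CSS, JS the page needs
        "--convert-links",      # rewrite links so the copy works offline
        "--adjust-extension",   # save with .html extensions for local viewing
        "--recursive",
        "--level", str(depth),  # follow links this many hops from the page
        url,
    ]

cmd = archive_command("https://example.com/article", 2)
print(" ".join(cmd))
```

What wget alone doesn't give you is item (2): keeping the dated snapshot addressable next to the live bookmark. That's the part an extension would have to add.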

~~~
Detrus
Pinboard archiving costs about $25 a month. Not sure it does the deep link
archiving.

------
TelmoMenezes
Coming up with architectures to decentralize servers is the fun part.
Convincing people outside of our bubble to use the new system is the very hard
part. It has to be able to do something the regular person really wants that
the previous system didn't allow. This is why Linux never took off on the
desktop.

Now excuse me while I go curate my sock collection.

------
nb13
This wouldn't work for any web page that has dynamic content stored in a
database. If the database no longer exists a decade from now, this doesn't
solve that problem.

Also, wouldn't this break analytics and reporting for most websites too? It'll
be much tougher to track user behavior to improve user experience. And
debugging using log data? I get what the author is suggesting but "fixing the
web" this way would break more things that large websites and companies rely
on.

~~~
idlewords
Well, as the Internet once said to the music companies, it's not our fault if
our new technology breaks your business model. People would find new ways to
solve these problems.

~~~
corin_
It's fine to be that stubborn if you can win. But against the world of
business that needs analytics... I suspect their business model trumps your
technology for now.

------
Houshalter
Link rot is a serious problem: [http://www.gwern.net/Archiving%20URLs#link-rot](http://www.gwern.net/Archiving%20URLs#link-rot)

>In a 2003 experiment, Fetterly et al. discovered that about one link out of
every 200 disappeared each week from the Internet. McCown et al. (2005)
discovered that half of the URLs cited in D-Lib Magazine articles were no
longer accessible 10 years after publication [the irony!], and other studies
have shown link rot in academic literature to be even worse (Spinellis, 2003,
Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital
libraries and found that about 3% of the objects were no longer accessible
after one year.

>Bruce Schneier remarks that one friend experienced 50% linkrot in one of his
pages over less than 9 years (not that the situation was any better in 1998),
and that his own blog posts link to news articles that go dead in days; the
Internet Archive has estimated the average lifespan of a Web page at 100 days.
A Science study looked at articles in prestigious journals; they didn’t use
many Internet links, but when they did, 2 years later ~13% were dead. The
French company Linterweb studied external links on the French Wikipedia before
setting up their cache of French external links, and found - back in 2008 -
already 5% were dead. (The English Wikipedia has seen a 2010-2011 spike from a
few thousand dead links to ~110,000 out of ~17.5m live links.) The dismal
studies just go on and on and on (and on). Even in a highly stable, funded,
curated environment, link rot happens anyway. For example, about 11% of Arab
Spring-related tweets were gone within a year (even though Twitter is -
currently - still around).

~~~
idlewords
My own research (which I hope to publish soon) shows a slightly better link
rot rate for bookmarked URLs (which are presumably ones people are most
interested in keeping). The attrition rate I see so far is roughly linear and
about 5% a year. Which is still shocking by any non-web standard, but a little
better than the figures cited above.
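For a rough comparison of the figures in this thread (assuming, for the Fetterly number, that weekly losses are independent):

```python
# Fetterly et al.: about one link in 200 disappears each week.
weekly_survival = 199 / 200
annual_survival = weekly_survival ** 52
print(f"implied annual loss: {1 - annual_survival:.1%}")  # 22.9%

# A roughly linear 5%/year attrition for bookmarked URLs, over a decade:
decade_survival = 1 - 0.05 * 10
print(f"bookmarks surviving 10 years: {decade_survival:.0%}")  # 50%
```

So even the "better" bookmark rate loses half the collection in ten years.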

~~~
sillysaurus3
You do research using bookmarks on Pinboard as your dataset? May I ask how
this data is used and disclosed to others?

~~~
idlewords
[https://blog.pinboard.in/2014/08/researching_link_rot/](https://blog.pinboard.in/2014/08/researching_link_rot/)

~~~
sillysaurus3
_To run the experiment, I am going to be drawing a few thousand links at
random from the entire pool of Pinboard bookmarks. This will include private
bookmarks, which make up about half the Pinboard collection._

You chose to include everyone's private bookmarks in your research without
asking their consent? What?

_I will publish some aggregate information about what I find, and use it to
seek glory, and persuade people to sign up for archiving. But I won't release
anything that could lead back to specific users or links._

There is roughly a boatload of evidence that anonymized datasets can be
deanonymized in unexpected ways.

Even if you don't release any anonymized datasets, it's really not good that
you decided to take such liberties with people's private links in the first
place.

~~~
idlewords
Why would I need consent to study the global link rot rate? Publishing it
reveals no information about users, either individually or in the aggregate.

I've made an effort to let anyone who wants to opt out of the research,
because I know people can have strong feelings about privacy.

I agree with you that publishing an 'anonymized' dataset would be a violation
of privacy guarantees. I wouldn't even do it for public bookmarks.

~~~
sillysaurus3
"Private" means "private," not "private unless Maciej wants to study them."

I didn't get an email saying "There's a chance I might select your private
bookmarks and examine them." A blog post doesn't count when you're messing
with people's private data. Certainly it should be opt-in, and not opt-out?

You're doing this for a noble purpose, but for what it's worth, this is the
first time my trust in you has ever felt violated.

People have entrusted you with years' worth of private data, and you just
asked, "Why should I ask permission to study their private links?"

Actually, as far as I can tell, your comment seems to implicitly assume that
you already have consent to examine all private links, and that asking consent
would only be necessary if you were planning on publishing something that
might reveal some of their private links. Isn't that the opposite of privacy?

~~~
idlewords
I think the tension between us is that you think 'private' means 'visible only
to me', and I think private means 'never displayed to any other user, or on a
public page'.

There are a thousand routine tasks that require me to have unrestricted access
to bookmarks and URLs. I try to be as uninvasive as I can about it, but you
have no way of verifying that.

If you want something to remain truly private to you—and I say this in full
sympathy to your feelings—don't put it on a stranger's computer. Where there's
a server, there's an admin.

~~~
sillysaurus3
It seems like "private" could be defined as, "This is my stuff, and if you
want access to it, then come ask me. It's okay if you accidentally access it,
but if you want to intentionally look through it, ask me first." It seems hard
to argue that most people wouldn't feel that way about their own private
stuff.

I fully understood the implications of giving you the data. I'm a fan of your
work and your writing, and I had full confidence in your stewardship of my
data. Essentially, I was totally okay with _you_ being the admin, or anyone
you decided to hire, and I trusted you to take reasonable steps not to look
through your users' private data unless it was to track down some bug, test
some new feature, or some other incidental task that was unrelated to
analyzing that private data.

What I didn't expect was that you'd specifically and intentionally create a
program whose sole purpose was to analyze private user data and report on the
results.

Why didn't I expect that? The only answer is that I should have expected that.
I just didn't realize you were that type of developer. It was a bit shocking
that someone who has trumpeted the benefits of sticking with businesses that
haven't taken VC investment would explicitly break their users' trust like
this.

In this case, you have both the legal right and the moral high ground. But
intentionally seeking through your own users' private data without getting
consent isn't something that can easily be forgotten.

~~~
tptacek
You are on tilt. You lost it, completely, when you wrote "specifically and
intentionally create a program whose sole purpose was to analyze private user
data", a statement that is not only self-evidently false but deceptive. You've
gone from "aggressive good-faith commenter whose points I often do not get" to
"alarming and hostile" all in the span of a single comment thread.

Re-evaluate. You can often dig your way out of these stupid message board
holes by simply apologizing. It's worked for me repeatedly.

~~~
sillysaurus3
You're right. I apologize. Both to you, for getting heated, and to Maciej, for
misrepresenting his actions in this matter. The misrepresentation was
accidental, but it happened nonetheless. My comments were also extremely
disrespectful and totally uncalled for. I'm truly sorry.

------
magila
The fact that he thinks a federated wiki would be "simple" or "easy" leads me
to believe he has not actually thought through the details of how it would
work in practice.

~~~
michaelchisari
It would not be simple or easy, but crypto-currency blockchains make it more
possible than ever.

~~~
grkvlt
What does this mean? I can't understand the relationship between the two
ideas?

~~~
michaelchisari
In the most rudimentary form, this is how:

[http://www.righto.com/2014/02/ascii-bernanke-wikileaks-photographs.html](http://www.righto.com/2014/02/ascii-bernanke-wikileaks-photographs.html)

More reasonable ways of utilizing a blockchain for undeletable data are
emerging, though.

------
grmarcil
I find Bret Victor's comparison between the internet and the LOC a little
weird. I've always thought of the internet as a publishing/sharing medium, not
an archive.

There are plenty of books that go out of print within ten years, we just
happen to have infrastructure beyond publishers (libraries) that preserve
published copies.

~~~
idlewords
I think it's significant that the Library of Congress has funding, an official
mandate, employees, a clear legal status, and stores complete copies of the
works in its catalog. A similar model would work great for the Internet (and
archive.org is doing its best to fill the role).

------
wyager
This would not work with dynamic content.

We already have systems like this (bittorrent, freenet, etc.), and almost no
one sees them as a viable replacement for the web because they can't do 99.9%
of the things we want (social networks, forums, email, etc.)

------
lutusp
> There’s actually a pretty simple alternative to the current web. In
> federated wiki, when you find a page you like, you curate it to your own
> server (which may even be running on your laptop). That forms part of a
> named-content system, and if later that page disappears at the source, the
> system can find dozens of curated copies across the web.

This is a simple and very bad idea. If it were the norm, instead of one or no
copies of a particular work online, you would have any number of "curated"
copies of uncertain vintage, downloaded at different times in the lifetime of
an original whose content might well have changed as time passed. You would
have curations of curations, and curations of those, _ad infinitum_.

Pages that depended on remote Web content (increasingly common) and/or linked
to online references would gradually become unreadable or incomprehensible as
their links vanished into other offline "curations".

Not to mention the copyright issues. And I'm not crazy about the term
"curation" either -- it's obviously meant to try to elevate the practice of
downloading anything we please, without regard to copyright.

------
pjbrunet
I'd rather a page go offline than have it taken out of context. As if
plagiarism wasn't bad enough already. (Yahoo Answers _cough_ )

These crooks will even steal your copyright notice. It's quite possible the
original content producers are offline _because_ scraper thieves stole so much
content that it's no longer possible to earn a living.

As an artist, this reminds me of the condescending attitude that gave us fake
Rolexes, Facebook & North Korea's 28 state-approved haircuts. Either it's
"just content" to stuff in a database somewhere or you understand the medium
is the message too.

~~~
sroerick
As someone with a teensy bit of film background, I have to disagree. The
number of early Hollywood films that were lost is astounding. This is a
massive part of our visual history that is completely gone. It will never be
restored.

With the current environment on the internet, with DRM'd video, music and
text, I have to assume that we will lose far more from this time period than
we ever had before.

While I don't pirate things (I'd rather just consume Creative Commons and
Public Domain content), I wholeheartedly support people who are trying to
archive the things that are part of our collective culture. When I have kids,
I'd like to be able to show them where they came from.

~~~
pjbrunet
I'm not against reproduction if care is taken to preserve what can be
preserved, as close to the original as possible while giving credit,
compensating creators, etc. In your example, I think reproduction/conservation
technology was available but the studios couldn't justify spending the
time/money it would cost to preserve their entire library. Who would have paid
for that? I don't know. Supposedly, half of Van Gogh's entire lifetime output
was burned because he was too poor to find a place to store it. At the same
time, I'd rather see a bug-eaten Van Gogh with fugitive reds long faded away,
than a flat, lifeless high-def copy. I suppose it's a complex subject and each
work is unique. Sometimes I'm thrilled when I can find an old page in "Wayback
Machine" but what they manage to save is typically broken and low quality.

~~~
sroerick
I'm having a little trouble understanding your perspective. Copying films is
good, letting bugs eat Van Gogh is good, but copying websites is bad?

Well, luckily, with digital technology, we can copy things flawlessly with
very little cost. Unfortunately, most content creators are still stuck trying
to adapt physical distribution models to the information age, which is why
we're stuck with DRM. You can't say "The Medium is the Message" and then get
angry because you're producing content for a medium that is infinitely
copyable.

We need to move to a model of perceiving data as holographic. Especially with
the advent of blockchain technology, we're increasingly moving to a model
where every node in a network contains the entire network. Trying to adapt
20th century Disney copyright to that paradigm is stupid.

------
mark_l_watson
I like the idea of the federated wiki, but search engines rank copies of pages
poorly, so it is not clear how visible copies would be after the original
content disappears.

I used Evernote for years, but recently canceled the service because I spent
too much time curating compared to reading old material.

One option that I am considering is archiving really good web content as web
archive files and saving them locally in folders indicating the year of
capture. Local file search would quickly find old stuff and if I stored the
yearly web archive folders in Dropbox, I would have them available on
different systems.

------
CCs
Hosting your own server might not be a scalable solution either. There's a
reason why SaaS is popular: it's not that easy.

On the downside: stuff hosted by others might go away. Web pages, web
services, apps requiring server side support...

Investing a lot in a service makes it more painful to lose, like the
apparently discontinued Amazon Cloud Drive (supposed to be a cheaper Dropbox):
[https://news.ycombinator.com/item?id=8219257](https://news.ycombinator.com/item?id=8219257)

------
ilaksh
Named data networking of some kind is likely to become popular at some point.
This is that kind of idea, but it doesn't look like a really general protocol,
since he mentions a specific wiki.

I wonder if there are browser extensions that do p2p caching/distribution of
content. Then you could standardize a protocol used for that type of
communication.

I believe there are many efforts along these lines. The trick is as usual
getting everyone on the same page or at least working together more.

------
twoodfin
I'd love it if browsers natively supported URIs derived from cryptographic
hashes of content by looking them up in a distributed store _a la_ BitTorrent.
Imagine if Chrome supported such a thing, for example. Perfectly reliable
cache-ability (or archive-ability), P2P hosting, ... All the good stuff for
any web content that its creator wants to so expose, albeit at the price of
immutability.
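As a sketch of the mechanism (a plain dict stands in for the distributed store, and the URI scheme is invented for illustration): the address is derived from the content itself, so a copy fetched from any peer can be verified, and the content is immutable by construction.

```python
# Content-addressed publish/fetch: the key is the SHA-256 of the bytes.
import hashlib

store = {}  # stand-in for a BitTorrent-style distributed store

def publish(content: bytes) -> str:
    uri = "hash://sha256/" + hashlib.sha256(content).hexdigest()
    store[uri] = content
    return uri

def fetch(uri: str) -> bytes:
    content = store[uri]  # in practice: fetched from any peer holding the key
    # The URI itself says what the bytes must hash to, so any copy,
    # from any host, is verifiable.
    expected = uri.rsplit("/", 1)[1]
    if hashlib.sha256(content).hexdigest() != expected:
        raise ValueError("tampered or corrupted copy")
    return content

uri = publish(b"<html>an archived page</html>")
assert fetch(uri) == b"<html>an archived page</html>"
```

The price of immutability falls out directly: changing one byte of the content changes its address, so "updating a page" means publishing a new URI.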

~~~
ottonomy
Especially under the current copyright regime, finding some solution that
preserves the creator's intent to publish in a fixed format would be a
great component of a distributed publishing system. I don't think this
proposal has as good a fair use defense as the Internet Archive wayback
machine does.

In the world today, we often think of publishing online as providing access to
something under our control. I think a technology that aims to solve these
problems should embody a different spirit, one closer to "making public". The
word "mine" doesn't need to imply ownership in the sense of exclusive control.
I mean, "My children" is at least as meaningful a relationship as "my
property". Some kind of copyright license ability built into a distributed
document publishing system would be nice.

------
bajsejohannes
I think the federated wiki is a neat idea, but in its current incarnation, I
find it exceedingly unlikely that a page I'm looking at there will stay
around 10 years.

Even if I'm making a copy of every page I see, I'm not sure I'll still run a
federated wiki on my server in 10 years.

I don't think this is a real solution to the problem posed by Bret Victor.

------
ricardolopes
It's a great idea for a real problem that needs to be solved. Still, for
dynamic pages, what would be the desired behaviour? Updating whenever
possible, which could mean the specific info we wanted to save disappears or
changes? Leaving it outdated? It's really something I can't answer.

------
unicornporn
Museums, libraries and cultural heritage institutions are working hard on
digital preservation. I think there's a lot to learn from them. Check out
LOCKSS, for instance: [http://www.lockss.org/about/how-it-works/](http://www.lockss.org/about/how-it-works/)

------
steele
Right to be forgotten?

~~~
wyager
>Right to be forgotten?

A ridiculous concept of a "right". I do not recognize that anyone has a right
to force other people to forget things.

------
blablablaat
Y U no archive.org?

~~~
pbhjpbhj
Archive.org doesn't have everything and their archives can be cleared of
specific content on request of the copyright holder.

There is a big copyright issue here as in the UK we don't have the relatively
liberal Fair Use exceptions that are in the USC - we only just got permission
to format shift and make personal-use backups (so MP3 players [using tracks
ripped from CD] are legal as of June this year!). Copying down a website,
beyond caching, is generally speaking copyright infringement for those in the
UK.

~~~
CyberShadow
> their archives can be cleared of specific content on request of the
> copyright holder.

Not only that. If a domain expires and is picked up by a squatter, the
squatter can instruct Archive.org to delete ALL copies of content archived
from that domain. Unfortunately many do so.

------
dang
We changed the title to the first sentence of the article because it is less
linkbaity.

