Immutable URLs (medium.com)
91 points by swombat 1424 days ago | 45 comments

Freenet's Content Hash Key URIs are one example of this idea in practice.


BitTorrent's "magnet" URIs could be seen as another. I always liked the idea of using torrents to host static web content. There are downsides, but they would be worth it in many cases.

If you used the torrent info hash as the primary identifier of the web content, but also embedded an HTTP URL that the data could be served directly from, you could have secure immutable content with almost the same performance as a regular website. The torrent data could be used to verify the HTTP data, and the browser would fall back to downloading from the torrent network if the website was unavailable or served invalid content.

(This would probably require a bit of original design, since I don't think there's an existing convention for getting the actual torrent data over HTTP instead of from peers (DHT), but that's minor.)
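The verify-then-fall-back flow described above could be sketched roughly like this. This is a simplification, not real BitTorrent: it assumes a plain SHA-256 hash of the whole payload as the immutable identifier (actual torrents hash bencoded metadata and individual pieces), and `fetch_verified` and its arguments are hypothetical names.

```python
import hashlib
import urllib.request

def fetch_verified(url: str, expected_sha256: str, fallback=None) -> bytes:
    """Fetch `url` over HTTP and verify the body against a content hash.

    If the HTTP copy is unavailable or fails verification, fall back to
    `fallback` (a callable standing in for a torrent/DHT download)."""
    try:
        body = urllib.request.urlopen(url).read()
        if hashlib.sha256(body).hexdigest() == expected_sha256:
            return body  # fast path: plain HTTP, content verified
    except OSError:
        pass  # server down, DNS failure, etc. -- try the slow path
    if fallback is not None:
        body = fallback()  # slow path: peer-to-peer retrieval
        assert hashlib.sha256(body).hexdigest() == expected_sha256
        return body
    raise ValueError("no verified copy available")
```

The key property is that the HTTP URL is only a performance hint; the hash is the identity, so a tampered or stale HTTP copy is simply ignored.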

Yes, the article is basically talking about turning the web into something it currently isn't. In a strange way, this was what the web was when it was very young; a bunch of inter-linked documents written in static HTML that rarely moved around.

But now we have something of a hodgepodge bazaar. For URLs to truly not move around and survive the creator and his/her circumstances, there needs to be a distributed repository. I don't know if Freenet will be that repository (the one time I tried it, it was glacially slow). Maybe BitTorrent's Sync project will pave the way to a truly universal, persistent content repository with permanent URI(L)s.

> I don't know if Freenet will be that repository (the one time I tried it, it was glacially slow).

Freenet is only slow because of all the indirection it needs to do to guarantee anonymity. You could get the same distributed-data-store semantics "in the clear" for a much lower cost, and then layer something like Tor on top of them if you wanted the anonymity back.

As you say, the early web came close to this ideal. What happened to it was almost entirely political and social: censorship, copyright claims, DMCA takedowns, and so on (only occasionally would a "webmaster" die, or a system fall into disrepair). Freenet-style anonymity (and the early web had the appearance of anonymity) is one approach to preventing pointer breakage, by simply making censorship impractical. Another would be to accept that the web should be an append-only distributed database, like Bitcoin's blockchain or a de-duplicating filesystem in which additions depend on prior content, making censorship all-or-nothing (and hopefully we wouldn't throw out the baby with the bathwater). BitTorrent sits somewhere in between: anonymity through numbers, and high availability through independent mirroring (without interdependence between torrents, which discourages censorship).

While magnet URIs come close I think this would actually be a better match for a purely functional data store, like Datomic for example.

If you namespaced each Datomic database and added a transaction ID, you would get a reference to an immutable snapshot of that entire data store, or of pieces of it, like datomic://myhost:<transaction-id>/path-or-query-into-db
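A minimal sketch of building and parsing such a reference. Note the `datomic://` scheme here is purely the convention from the comment above, not Datomic's real connection-string format:

```python
from urllib.parse import urlsplit

def immutable_ref(host: str, tx_id: int, path: str) -> str:
    # Hypothetical scheme: the transaction ID pins the reference
    # to one immutable snapshot of the database.
    return f"datomic://{host}:{tx_id}/{path.lstrip('/')}"

def parse_ref(uri: str) -> tuple[str, int, str]:
    parts = urlsplit(uri)
    host, _, tx = parts.netloc.partition(":")
    return host, int(tx), parts.path.lstrip("/")
```

Because the transaction ID is part of the identifier, two people dereferencing the same reference at different times are guaranteed to see the same data.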

The disadvantage is that this gives you data, not a website; however, it's possible to use Functional Reactive Programming to auto-generate the site from that data store, giving you the 'website view' again.

That of course still leaves room for your program to be lost, but if you add the program to the purely functional data store itself, and thus version your program too, then that is no longer a problem either.

And once you've done that call me, since you'll have built what I've been dreaming of for the past decade.

Hah! I am working on something similar to that. I don't really think of it as a 'website view', though--it's more like a distributed database of versioned hypercard stacks that can contain hyperlinks. It also runs the stacks on a distributed system of mutually-untrustworthy resource-competing agents, built as a multitenancy patch to Erlang's BEAM VM, and each stack-instance is by default accessed "collaboratively", mediated between users transparently using Operational Transformations.

I'm calling the platform Alph--after the river that runs through Xanadu ;)

This cuts against the nature of real life.

I regularly speak with groups of high school pupils about privacy and one of my main points to them is that once they commit their latest brain-fart to the internet, there is a very real chance that it becomes immutable - should it go viral, for instance. If it were absolutely guaranteed that it would become immutable though, that would be a game changer.

Can you imagine if everything that you had ever said at any point in your life was permanently journaled and indexed and searchable? I personally find that to be a horrific concept from a privacy point of view.

From a purely technical point of view I can see the benefit of this idea: I hate it when an old article that I've bookmarked no longer exists (even when by "article" I mean a gif that made me chuckle). But seriously, there's a world of difference between a newspaper article being permanently available and a MySpace profile, Facebook post or tweet being there forever.

I wonder if it is better to warn the young about the potential permanence of online expression... or let them take those risks, and then as they grow, manage the world that results.

They might negotiate new norms, of forgiveness and understanding towards prior selves.

Sounds lovely in theory...

I have to say, a service I would love would be a website that mirrors content but only if the original source went down, and otherwise redirects to the original site. Imagine something like a URL shortener, but the URL it gives you will redirect to a cached copy of the original page in the event that the original page disappears. That way, you can link and give credit to the people who made the content, but if something happens it isn't lost from the internet for good. It would, in a sense, be a "permanent URL" service. It'd be great for citations too, e.g. wikipedia, academia, etc. I'm not sure if that's what the OP is getting at here, or if he's suggesting something else?

Either way, too bad rights issues would probably stop something like that ever being made.

As you say, rights issues would probably stop something like this, and I have a few stories that show two sides of the rights issues.

1) Back in 1998, the company hosting my website received a cease and desist letter from a company that held the "Welcome Wagon(TM)" trademark because of a page I had on my website. That prompted me to get my own domain and move the content over (and I was able to get proper redirects installed on the company webserver). I was happy (I had my own domain, a ".org" and apparently, that was enough to keep the lawyers at bay). The hosting company was happy (they didn't have to deal with the cease and desist letter) and the trademark holding company was happy (they protected their trademark like they're legally required to). I'm sure that the trademark company would be upset if their trademark was still "in use" at [redacted].com (the hosting company, long gone by now).

2) I hosted a friend's blog on my server. A few months later he asked me to take the blog down, for both personal and possibly legal reasons (he was afraid of litigation from his employer, who had a known history of suing employees, but that's not my story to tell). I'm sure he would be upset (and potentially a lot poorer) had his content remained online for all to see.

3) I've received two requests to remove information on my blog. The first time (http://boston.conman.org/2001/08/22.2) someone didn't quite grasp the concept that domain name registration information is public, but I didn't feel like fighting someone whose grasp of English wasn't that great to begin with, and removed the information. The second time (http://boston.conman.org/2001/11/30.1) was due to a mistake, so I blacked out identifying information. I didn't want to remove the page, because, you know, cool URLs don't change (http://www.w3.org/Provider/Style/URI.html); yet the incident was a mistake. There's no real point in seeing the non-redacted version, nor do I really want people to see it.

There are a ton of corner-cases like these to contend with. Just one reason why Ted Nelson's version of hypertext never got off the ground.

W3C's Permanent Identifier Community Group maintains https://w3id.org/ which performs a similar service.

Does anyone know the current status of this effort? I love the idea, but the mailing list has no activity and I didn't see any evidence that w3id URLs are currently usable.

I really like your idea, but what about content being changed/updated, instead of deleted?

For some use cases it would make sense to show the cache (when the original quote is no longer there), while for others it'd make sense to forward (some style update, or an important addition).

How do you think such a service could handle this?

I imagine there'd be a few options:

- Allow someone to manually view the cached version at any time if they wanted to.

- Show a splash page giving the user the choice of viewing the original or the new version (complete with a diff highlighting changes?)

- Code some heuristics, similar to Instapaper and the like: if the content of the page changes, display the cached version, but if it's just the layout that changes then display the new version. Or look for dates on blog posts, or words like "Updated: " or similar.

- Give the website owners control: let them submit their site to be linked against, and give them some metadata tag that they can use to flag updates. This sidesteps the rights issues too (the website owner gives permission) and it could also be used as a CDN essentially, or a backup in case of server failure, or if the website is hosted in an unfriendly country etc.

I think it's definitely doable in theory at least.
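The content-vs-layout heuristic from the list above could be sketched roughly like this. The regex-based text extraction and the 0.9 threshold are arbitrary illustrative choices, not a proven recipe:

```python
import difflib
import re

def strip_markup(html: str) -> str:
    # Crude text extraction: drop tags, collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def prefer_cached(cached_html: str, live_html: str, threshold: float = 0.9) -> bool:
    """Return True when the visible text has drifted enough to show the cache.

    Layout-only changes leave the extracted text nearly identical, so the
    live page is still shown; real content changes push the similarity
    ratio below the threshold."""
    ratio = difflib.SequenceMatcher(
        None, strip_markup(cached_html), strip_markup(live_html)
    ).ratio()
    return ratio < threshold
```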

It occurs to me--this is exactly what http://semver.org/ is for! Content changes are new "major versions"; edits are "minor versions"; errata are "bugfixes." You could link to a page @14.1.2, or @14.x, or @latest.
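A toy resolver for such page-version specs might look like this. The `latest` / `14.x` / `14.1.2` grammar is just the convention sketched in the comment above, not any real standard:

```python
def resolve(spec: str, versions: list[str]) -> str:
    """Resolve a semver-ish page spec against the available versions."""
    parsed = sorted(tuple(map(int, v.split("."))) for v in versions)
    if spec == "latest":
        chosen = parsed[-1]                       # highest version overall
    elif spec.endswith(".x"):
        major = int(spec[:-2])
        chosen = max(v for v in parsed if v[0] == major)  # newest in major line
    else:
        chosen = tuple(map(int, spec.split(".")))  # exact pin
        if chosen not in parsed:
            raise KeyError(spec)
    return ".".join(map(str, chosen))
```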

I think you can achieve that effect with auto-scaling, for example on Elastic Beanstalk. If AWS goes down though, that won't help (but most probably such a service would run on AWS anyway :)).

I think he's talking more about linking to third-party sites that don't go down, not links to his own. :)

See webcitation.org - copies the page at time of linking.

Thinking of the first content I published on the web as a teenager some 15 years ago, I'm happy it's gone now.

Sure. But wait, you might change your mind in the future. Nostalgia perhaps.

"A cool URI is one which does not change." http://www.w3.org/Provider/Style/URI

Once upon a time, I drank this koolaid, but no more. Many things in life are ephemeral, including information. To suggest that a webmaster's responsibility is to hoard data for eternity is both scatological and counterproductive. As the Web matures, it is threatened far more by the growing mountain of obsolete information that must be ignored in order to find anything timely and relevant. I would much rather see these pages deleted if they aren't going to be updated, even if it means broken URIs, which will eventually fade away.

I think people have been confused for a long time about what the Cool URIs essay means by "change".

This is a particular resource:

  http://example.com/latest/
It has representations that change over time, because the conceptual resource "the latest thing at example.com" itself changes over time. This is perfectly fine: the resource that the URI refers to stays the same, but that resource's state is mutable, and the changes in this state are reflected by changes in the resource's representation (what you get by retrieving it.)

This is another resource:

  http://example.com/2012/01/02/news/
This representation at this resource probably shouldn't change very much; not nearly as much as the one at /latest/. It's still allowed to be mutable, though! If there's a typo, or a retraction, you're allowed to reach back through time and fix that resource, to make it "the way it should have been" at that date.

A webmaster's responsibility is to make sure his URLs continue to refer to the same things they originally referred to. Conceptually, if you only want to store "the latest news", then you should only have a /latest/, and not a /2012/01/02/news/. Creating the latter is creating a promise that it will stick around, continuing to refer to "the news at 2012-01-02"--a permalink, in the real sense.
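The distinction could be sketched as a toy store with one mutable resource and dated permalinks that keep referring to the same thing (the names here are illustrative):

```python
# /latest/ is one resource whose state changes over time;
# each dated permalink is a separate resource frozen at publication.
archive: dict[str, str] = {}
latest: str = ""

def publish(date: str, body: str) -> None:
    global latest
    latest = body                      # mutable state of /latest/
    archive[f"/{date}/news/"] = body   # permalink: same referent forever

def get(path: str) -> str:
    return latest if path == "/latest/" else archive[path]
```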

I think another important aspect of this is that the (relatively unchanging) resource at /2012/01/02/news/ shouldn't be moved to /archive/2012/01/02/news/ without at least a redirect.

The point being that if the resource still exists, the old URI would preferably still point to it. If you, as the arbiter of the resource, decide to remove it, of course the URI will break.

There are as many kinds of websites as there are types of paper publications. Nobody cares about old TV magazines, but there are plenty of books that are older than me and still relevant. The online version of SICP, for example, has incoming links from many universities that I hope will never break.

Yeah.. I think the desire to keep everything around forever, untouched, reflects an inability to organize and distill, or maybe an unwillingness to at least try and find out whether it's possible. The web as an endless roll of toilet paper rather than a library is a sad development.

This is basically the idea that Julian Assange was putting forth in that article a few weeks ago about his secret meeting with Larry Page.

It's interesting to see that people are already saying this is a bad idea when they were praising his version of it.

Seems like a bad idea. People think that once something is on the internet it's there forever but that's simply not the case. Hard drives develop errors, servers get shut down, backups get corrupted, etc. etc. Your stuff may be around for a long while but there's no guarantee that it will be permanently accessible. If you want the contents of a web page to be available to you then download said page to your computer and do proper backups, etc. This will increase the likelihood that said data will survive. This is not a problem with URLs.

Sounds like the same kind of idea from this Assange interview - he presents an idea for a naming system where something's name (url) is intrinsically tied to its content: http://wikileaks.org/Transcript-Meeting-Assange-Schmidt?noca...

Didn't Julian Assange suggest this? There was an interview published last week with Eric Schmidt, where this was suggested.

I've since started work on a side project that does this - to be integrated into Fork the Cookbook - since our target audience seems to be very up-in-arms about original recipes.

I've had this thought before but it seems like the natural key for a web resource has to be the URL (location) plus the time that the resource was accessed for practical reasons.

Pages are expected by the end user to change over time, but they also expect to access them at the same location each time.

> I've had this thought before but it seems like the natural key for a web resource has to be the URL (location) plus the time that the resource was accessed for practical reasons.

These are called Dated URIs/DURIs: http://tools.ietf.org/html/draft-masinter-dated-uri-10

No browser currently implements them, but a viable resolution mechanism probably involves keeping a default store of Memento Time-Gates (http://www.mementoweb.org/guide/quick-intro/) and querying them to see if any of them have a copy of your resource for that date.
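Per that draft, a duri packs a timestamp and an embedded URI into a single identifier; a rough parser (error handling kept minimal, and assuming the `duri:<timestamp>:<uri>` shape described in the draft) might be:

```python
def parse_duri(duri: str) -> tuple[str, str]:
    """Split a 'duri' identifier (draft-masinter-dated-uri) into its
    timestamp and the embedded URI: duri:<timestamp>:<uri>."""
    scheme, _, rest = duri.partition(":")
    if scheme != "duri":
        raise ValueError("not a duri URI")
    # The embedded URI may itself contain colons, so split only once more.
    timestamp, _, embedded = rest.partition(":")
    return timestamp, embedded
```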

This is a cute, facile idea, but not thought through. It's not a problem of technology per se - content itself doesn't want or need to live forever. I reserve the right to alter or remove content that I publish.

It's trendy to think of the web as completely stateless, distributed etc, but the reality is that it's not. The state of resources changes over time because the world changes - and URIs are only around to reflect that.

The problem with HTTP is that a 404 mostly doesn't let you tell the difference between 'it's not there (and was never there)' and 'it's not there (but used to be, and has gone away)'. Servers should send a 410 to reflect that.

After that, I thought: someday I'll have to reinvent the web. Another engine, another piece of software, not even called the "web". The web's structure is so old-fashioned. Have you ever thought about how different every page is from every other? It's bad for the end user! Think about it: Android, Windows and Mac OSes usually establish standards so that users don't have to think again to perform repetitive tasks. Different layouts impose unnecessary mental effort. I know this is the web's beautiful anarchy, but it isn't practical. It's possible to be beautiful and still follow minimum standards. The iPhone is there to prove it.

It's very interesting that an immutable web could make current real world immutable objects (printed books, etc.) appear more flexible, more mutable. With a book, you can write in the margins of a particular copy, every copy could be lost, but a distributed system of permanent content would persist without marginalia or utter destruction.

The web is amazing because of its participatory potential and its archival abilities. What might be more interesting than simply having immutable content is palimpsested content, where the original object always exists beneath additional changes and additions.

"Immutable URLs" already exist, they're called URNs [1] and it's a standard since 1997.

[1] http://en.wikipedia.org/wiki/Uniform_Resource_Name

I think people are re-discovering/re-inventing Berners-Lee's semantic web and "linked data" descriptions (having the epiphany on their own) in part because Berners-Lee, for all his excitement, often fails at presenting the very basic idea, and I think it's because he takes the idea of a static URI for granted.

By using a URI like a globally-unique primary key - a symbolic link - into "the database of the web," in place of the content itself (not just as a pointer to the next page with cats), you can begin to use all of the web as the data set and something like XPath/XQuery as the query language.

Before any of that can happen, people need to really accept that URIs/URLs can't change their semantic content and rarely-if-ever go away. That's a big problem with the current approaches to displaying content: the references they generate are presumed to be forgettable.

A URN is still a pointer. An ISBN is not the book.

"Immutable URLs" is a poor headline then.

It should read "immutable content" instead, since the OP intends that the content a URL points to never be lost or changed.

URIs are pretty much immutable. My impression is that what the OP suggests is a guaranteed lifetime of the content associated with the URI.

As for this second part, "once it's published it should always remain out there", I'm not very sure it's a good idea. In many cases I'd actually like to be able to say that a piece of content has expired (the content is not relevant anymore).

There's an interesting proposal for a 'duri' URI scheme that means "what this URI was at this date":


It doesn't actually freeze the contents... but it provides a language/key for talking about permanent URI+content bindings.

This reminds me of Van Jacobson's Google Talk several years ago: http://www.youtube.com/watch?v=oCZMoY3q2uM

Basically a sketch of what it would look like if the entire HTTP-based web worked like BitTorrent.

The memento project (led by Herbert Van de Sompel) attempts to solve some of this: http://www.mementoweb.org/

If there aren't enough interested readers to encourage a maintainer to keep the content available, is it really worth the effort to auto-archive all of it?
