
Permanent Identifiers for the Web - r0muald
https://w3id.org/
======
btrask
Adding layers of indirection isn't going to give us a permanent addressing
system. This service just adds new points of failure: the service itself, or
the creator of an identifier who fails to update it or changes it maliciously.

The real solution is some form of content addressing. Whichever one you want:
URNs, magnet links, the "ni:" RFC[1], IPFS paths[2], or my own hash link
system[3].

[1] [http://tools.ietf.org/html/rfc6920](http://tools.ietf.org/html/rfc6920)

[2] [https://ipfs.io/](https://ipfs.io/)

[3]
[https://github.com/btrask/stronglink/](https://github.com/btrask/stronglink/)

~~~
javajosh
Content-addressing suffers the exact same problems; they are just a little
more abstract. Content addressing also involves indirection, because TCP/IP is
most certainly location-based. So, mapping from a content hash to one or more
IP addresses requires (drumroll) a service that is a new point of failure.

Edit: So, there is certainly an algorithm that lets you go from hash to IP
directly in a way that evenly distributes things, blah blah - but then you'll
probably end up bugging a lot of hosts that don't participate in that
algorithm! I pity the poor fools who get "lucky" IP addresses for certain
content. :)

~~~
danieldk
_Content-addressing suffers the exact same problems; they are just a little
more abstract._

Actually, content-addressing already works pretty well. If you paste the SHA
hash of a widely distributed file (e.g. an OS ISO or a commonly used source
package) into a search engine, you typically find many places to download the
content. It's often more reliable than URIs for that kind of data.

It is surprising that it already works this well on the web, which was not
built for CAS. Peer-to-peer CAS systems, such as IPFS, make this much more
usable.
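
A minimal sketch of the verification side of that, assuming you have already
found a mirror by searching for a published hash (the file name and expected
digest below are placeholders):

    import hashlib

    EXPECTED_SHA256 = "<published digest goes here>"  # hypothetical value

    def sha256_of(path, chunk_size=1 << 20):
        """Hash a file in chunks so a large ISO need not fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of("some-os.iso") == EXPECTED_SHA256:
        print("content matches the address, no matter which mirror served it")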

_but then you'll probably end up bugging a lot of hosts that don't
participate in that algorithm!_

??? Trackers/DHTs were invented exactly to make this efficient and doable.

~~~
gioele
DJB puts random IDs in his papers and uses them as persistent (work)
identifiers.

An example from
[http://cr.yp.to/bib/documentid.html](http://cr.yp.to/bib/documentid.html)

> Date: 2004.04.02. Permanent ID of this document:
> 46d904d0613f360904e88a85dcdaa52b.

> When I cite a paper with a document ID, I list the paper's document ID in
> the bibliography entry:

> [7] Daniel J. Bernstein. Sharper ABC-based bounds for congruent polynomials.
> Journal de Theorie des Nombres de Bordeaux, to appear. ISSN 1246-7405.
> Document ID: 1d9e079cee20138de8e119a99044baa3. URL:
> [http://cr.yp.to/papers.html#abccong](http://cr.yp.to/papers.html#abccong).
> Date: 2004.02.10.

> The idea is that, even if the URL changes for some (probably bad) reason,
> you can still search the Internet for the words

> Permanent ID of this document: 46d904d0613f360904e88a85dcdaa52b
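
Reproducing the scheme is trivial; a sketch (not DJB's actual tooling, just
128 random bits rendered as 32 hex digits, the same shape as the IDs above):

    import secrets

    # 16 random bytes -> 32 lowercase hex digits
    document_id = secrets.token_hex(16)
    print("Permanent ID of this document:", document_id)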

~~~
javajosh
Interestingly, there's some ambiguity about which document is meant: is it the
one that includes the hash, or is the hash inserted after it is computed?

------
themartorana
I suppose it's a good project with good intentions, but it's still only as
useful as the content it points to. Take, for instance, Geocities - the
amazing free platform where everyone was experimenting with websites at the
same time.

It's all gone.

A permanent URL helps not at all. True, the Web Archive has a lot of old
Geocities pages cached, but most of the time when I find something missing,
the content is simply gone. It has nothing to do with having the wrong URL.

That said, it's a nice way to claim a "permanent" URL for, say, yourself. It
lets you change domain names and whatnot in the future.

Still, I don't mean to downplay the good intentions. I just wonder what the
half-life of the database is - that is, how long before half of all claimed
perm URLs point to nothing.

~~~
firasd
Right. The main source of linkrot isn't that a webpage was carefully moved
from place A to place B (in that case you could just use Google to find it
again); it's that the webpage content isn't maintained anymore. It's just
gone.

------
detaro
Aren't DOIs already pretty established for that, at least for scientific
sources? [http://www.doi.org/](http://www.doi.org/)

Granted, they don't have nice names, but the namespace will get cluttered and
full anyway.

~~~
tejtm
Yes, and they work but are too heavy a solution. Try approaching them and
saying: Hi! I would like to reserve two billion DOIs :)

yea, NOPE!

There are lighter schemes out there but none are household names yet.

ARK IDs
[https://confluence.ucop.edu/display/Curation/ARK](https://confluence.ucop.edu/display/Curation/ARK)

~~~
cetacea
> Try approaching them and saying: Hi! I would like to reserve two billion
> DOIs :)

If you've already registered a prefix with them (say it's 10.8888), then you
have an essentially unlimited namespace of possible suffixes for forming DOIs.
Representing two billion DOIs requires only 6 suffix digits in base 36.
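
The arithmetic checks out:

    >>> 36 ** 6   # distinct 6-character base-36 suffixes
    2176782336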

What exactly is so difficult about this?

~~~
tejtm
As I recall, first they were excited; then calmer heads prevailed and pointed
out that it was orders of magnitude more DOIs than they had minted since the
beginning of time, that they were not prepared to make such a sudden jump, but
that they could do a tiny fraction of it for several times our entire grant
budget.

~~~
cetacea
I have no idea who "they" or "our" refers to, or what "sudden jump" you're
talking about.

~~~
tejtm
Sorry, I have a bit of a cold, so it all made sense to me. "They" were a DOI
representative. "We" were a research team working on a grant involving
persistent resolvable identifiers. The "sudden jump" would be the difference
between how many DOIs they were responsible for resolving (in 2012) without
any assigned to us and how many they would be responsible for if they did what
we had asked.

------
rictic
IPFS solves this problem far better.

It marries content addressing, DHT+bittorrent-style distribution, and public
key cryptography.

In the low-level ipfs namespace, content is identified by its hash. Content
may be either a file or a directory, which is a map of names to hashes.

In the higher-level ipns namespace, a name is associated with the hash of its
current content and with the public key of the agent that's allowed to update
that hash.

A hypothetical that I found motivational:

The New York Times writes an article that includes a link to a source, say a
scientific paper. As an organization, they decide to care about their articles
remaining available with the necessary context, so they link to the paper's
hash within IPFS and configure their servers to maintain and serve all IPFS
content that they ever link to. In this way, even when the original host of
the paper goes offline, the New York Times can ensure that their article's
links continue to work.
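
A toy sketch of that two-layer model (not the real IPFS API, just the data
shapes described above: immutable content addressed by its hash, directories
as name-to-hash maps, and a mutable name that only a key holder may update):

    import hashlib
    import json

    store = {}  # hash -> bytes; stands in for the DHT/bittorrent layer

    def put(data: bytes) -> str:
        """The 'ipfs' layer: content is stored under its own hash."""
        digest = hashlib.sha256(data).hexdigest()
        store[digest] = data
        return digest

    # A "directory" is just a map of names to hashes, itself content-addressed.
    paper = put(b"PDF bytes of the cited paper")
    article = put(b"HTML of the article that links to the paper")
    directory = put(json.dumps({"paper.pdf": paper, "article.html": article}).encode())

    # The 'ipns' layer: a mutable pointer that only the matching key may update.
    ipns = {"hash-of-publishers-public-key": directory}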

------
mbleigh
We already have permanent identifiers for the web...they're called URLs. This
seems like a super-unnecessary level of indirection.

~~~
colanderman
This isn't for _your_ URLs. It's for URLs you're linking to that you don't
control and that are liable to change because _other_ people (typically large
content sites) don't understand how URLs should work.

~~~
xrstf
So how exactly is this supposed to make things better? I want to save, let's
say, www.newspaper.com/import-article.html. So I must create a PR and wait for
it to be merged so that I get w3id.org/my-project/newspaper-article as a
redirect to the URL mentioned earlier.

Now what? What if the newspaper.com URL changes? Am _I_ supposed to keep all
my URLs up-to-date (i.e. send pull requests every now and then to update my
"bookmarks")?

web.archive.org seems much more useful: it copies the entire page at the time
the snapshot is created, producing a true, durable copy.

~~~
endergen
The system linked to is terrible. This should be solved with content-
addressable schemes and decentralized routing algorithms/networks; see IPFS et
al., mentioned elsewhere in this thread. The poster's link is no better than
link shortening.

------
quotemstr
We've had unique ID technology for decades: the GUID. You can generate one
with uuidgen. A unique _name_ is not the same as an enduring, universal set of
directions for reaching the _named content_.
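
For example (a name, not an address):

    import uuid

    # Globally unique with overwhelming probability, but it says nothing about
    # where the named content lives or how to fetch it.
    print(uuid.uuid4())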

------
gue5t
This is another global centralized namespace. I thought we had learned to
avoid this mistake when we all saw how bad the domain-name system was.

------
johansch
Most pretentious URL shortener ever? :)

~~~
colanderman
URL shorteners don't generally allow link targets to be updated. This is not a
typical URL shortener.

~~~
blowski
So it's a URL shortener with an added feature?

Part of the reason URL shorteners don't offer this feature is the potential
for abuse.

------
benologist
This is a cool idea, but making links via pull requests through GitHub isn't
very enticing. A more efficient/faster API for automation and complete
anonymity are minimum required features, in my opinion.

------
cetacea
We already have the DOI system. Why is a new system needed?

~~~
bitserf
[https://xkcd.com/927/](https://xkcd.com/927/)

~~~
dexterdog
When you think you know which xkcd is being linked to from context, you are
nearly always correct.

~~~
such_a_casual
Dexter's Law?

~~~
ta0967
Dexter's Dog's Law

------
bmn_
There's a lot of misunderstanding going on in the comments. Let me clear up
the confusion.

w3id.org is basically the same as purl.org; see
[https://en.wikipedia.org/wiki/Persistent_uniform_resource_lo...](https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator)
for background. These are services that promise to be extremely stable and
long-lived, and where you coin permanent URLs for certain Web technologies
(e.g. link relations, RFC 5988 §4.2; XML namespaces; …) that require an
identifier that never changes.

In theory you can put any well-formed URI there, because most of the time
software will just compare for URI equivalence (RFC 3986 §6). But if users
want to, they can also dereference the resource identifier and possibly arrive
at a human-readable document describing what's going on; for example, visit
[http://www.w3.org/2001/XMLSchema](http://www.w3.org/2001/XMLSchema) in your
Web browser. You cannot do this with content-addressable IDs (named hashes/ni
scheme, IPFS, DHT), URNs (guid/uuid scheme), etc. To achieve that practical
goal, the dereferenced document needs to be published on a Web host, and the
domain name associated with that host needs to be under your control.

Now, for coining purls, you put in an indirection: if you lose control of your
domain name, you simply redirect to a new one. In practice this eliminates
link rot. Other things on the Web that make use of redirection are mentioned
among these comments, like archives and link shortening, but they're out of
scope for purls – you are not supposed to coin purls for general Web documents
like news articles (millions a year), but for specific documents whose URI
serves as an identifier for a schema description or the like (dozens a year).

The difference between the various purl services is their governance model.
IMO w3id is best aligned with the interests of hackers who make use of Web
technology.

------
jrochkind1
So on the one hand, we're already supposed to have permanent identifiers, and
they're called URIs/URLs.

On the other hand, what is it that makes URLs fail even when the content still
exists on the web?

Generally, a change in the platform hosting the content, such that it
addresses the content differently.

It is not technically hard to provide redirection yourself from the old URLs
to the new ones. If the new platform still uses the same internal identifiers
for each piece of content, it may be as simple as a one-line apache httpd or
nginx redirect, from http://example.com/get/thing/$ID to
http://example.com/find/it/here/$ID or whatever. If the internal identifiers
have changed, it's a pain to list the mapping -- but that pain doesn't
diminish much at all with this w3id.org service; you're still going to have to
update all the URLs individually with their service.
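
For the hypothetical example.com move above, the nginx version might be little
more than this (a sketch, not a drop-in config):

    # Inside the relevant server block: send old /get/thing/$ID URLs to the new path.
    rewrite ^/get/thing/(.*)$ /find/it/here/$1 permanent;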

Another possibility is that your hostname has changed; as long as you haven't
lost ownership of the old domain, though, it is still not technically hard to
point it to the same place as your new one, and then you're reduced to the
same situation as above.

So it's not technically hard to provide the URL redirection yourself, locally.
If you want to provide your own platform-independent "permanent identifiers"
from the start, there are even several open source packages meant to help you
do it yourself, locally.

On the other hand, it is another thing to think about, another thing to
maintain and monitor. Nearly everything else one can think of, even things
that are not that hard to do locally (especially if they might require running
another service), is being 'outsourced' to "X as a Service" platforms.

So, okay, why not 'permanent' identifiers too? I wish people would just take
care of it themselves, the way the web was intended. And I wish w3id.org
actually identified themselves as "permanent identifiers as a service" or
whatever, instead of implying that they're doing something fundamentally
different from plain old URL redirection you can do, without too much
difficulty, yourself.

And it is important to note that you are relying on the continued existence
and maintenance of the w3id.org hostname and the service behind it for
"permanence". When "permanent" is in the name, the risks of relying on an "_
as a service" provider are higher: you can't really switch to a different
provider later, you're stuck with them literally forever, and you're counting
on them existing as long as you need your identifiers to.

But it's not too surprising that people are looking for "permanent identifiers
as a service"; they're looking for nearly everything as a service. On the
other other hand, most entities don't seem to care about permanence in their
URIs _at all_ -- if you are at the point where you realize it's important, I'd
think you'd be the kind of entity with the technical capacity to implement it
yourself locally too, and then have true local control over the 'permanence'
of your identifiers, rather than relying on a third party continuing to be
maintained.

