> non-hierarchy-revealing URLs that must be resolved using a database they control
What do you mean? There is a journal prefix (Nature has 10.1038) and the link usually translates fairly directly to a URL for new articles.
EDIT: Just looked it up in the DOI rules. The registrant chooses the suffix. So the journals are free to maintain a 1:1 mapping to URLs if they wish. I assume DOI is popular because people trust that the central entity might do a bit better job with regard to link rot than the journals themselves.
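To make the prefix/suffix point concrete, here is a minimal sketch of splitting a DOI at its first slash. The function name `split_doi` and the suffix `example-article-id` are made up for illustration; only the `10.1038` prefix comes from the comment above.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    The prefix identifies the registrant; the suffix is whatever
    string the registrant chose, and may itself contain slashes.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# Hypothetical suffix; only the 10.1038 prefix is taken from the thread.
prefix, suffix = split_doi("10.1038/example-article-id")
print(prefix)  # 10.1038
print(suffix)  # example-article-id
```

Nothing in the suffix has to reveal the site hierarchy, but nothing stops a registrant from reusing its URL slug there either.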
To take your point to the next step, how do two otherwise unrelated journals (of which there are tens of thousands) link to each other and solve link rot? Answer: shared, open, nonprofit metadata infrastructure.
Theoretically, the journals could publish a canonical URL for each article. But apparently organizations are not good at publishing and maintaining such URLs, especially in the case of mergers and rebrandings.
That canonical URL is literally what the DOI system is. Publishers getting together to agree on a shared identifier system, regardless of each publisher's business model.
The resolvability of a DOI is obviously important, and you have found a few examples where the steward of the content (i.e. the publisher) hasn’t updated the links. But in that case DOIs can in theory be redirected to archived versions. That's the price you pay for a large, diverse community of publishers. Don't imagine that publishers would magically be any better at fixing their own websites.
But more broadly, as identifiers, having an agreed scheme helps link publications, datasets, and other entities such as funders and institutions. It makes the metadata discoverable by everyone in the same way.
Someone will want to follow those links in 100 years and whatever state DNS or WWW is in, the metadata will be in an open archive somewhere.
Taking you up on your AMA, is there a best practice for discovering datasets belonging to articles and vice-versa? I see that Web of Science has a lot of this but they don't make that linkage explicitly available as an export. I have tried querying isSupplementTo and chasing cited works, and I don't expect a smoking gun.
Just now I tried searching Scholix via the web interface for IDs that are linked in Dryad, which is supposed to use the same linking system, and in either direction came up empty.
Scholix[0] is a collaboration between a few orgs, including Crossref (DOI registration agency for scholarly content, e.g. articles) and DataCite (DOI registration agency for data sets).
If a DataCite member registers a data set, and mentions a link to a Crossref DOI, that's almost certainly a dataset-article link. Vice versa.
So the data in Scholix is the union of "what citations do article publishers and dataset publishers think exist".
We're working on improving how we process the data citation data, but ultimately we can't improve on what publishers provide.
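As a rough sketch of the "union" idea: treat each asserted link as an (article DOI, dataset DOI) pair and merge what each side reports. All DOIs below are made up, and the variable and function names are hypothetical; this is not how Scholix stores or serves its data.

```python
# Each link is an (article DOI, dataset DOI) pair. All DOIs are made up.
asserted_by_article_publishers = {
    ("10.1234/article-a", "10.5678/dataset-x"),
    ("10.1234/article-b", "10.5678/dataset-y"),
}
asserted_by_dataset_publishers = {
    ("10.1234/article-a", "10.5678/dataset-x"),  # both sides agree on this one
    ("10.1234/article-c", "10.5678/dataset-z"),  # only the dataset side knows
}

# The union keeps every link either side asserted, without duplicates.
all_links = asserted_by_article_publishers | asserted_by_dataset_publishers


def datasets_for(article_doi, links):
    """Return the dataset DOIs linked to one article, sorted."""
    return sorted(d for a, d in links if a == article_doi)


print(len(all_links))  # 3
print(datasets_for("10.1234/article-c", all_links))  # ['10.5678/dataset-z']
```

A link that neither publisher deposited never shows up in the union, which would explain a search coming up empty in both directions.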
Journals, even prominent ones, change names for all sorts of reasons. It doesn't happen often but it does happen. The meaningfulness issue can cut both ways.
I think the real problem is the centralized vs federated vs distributed nature of it. IPFS is a good example of how that could have looked; not sure if it could be moved into that space somehow (I'm sure it could in theory, but in practice?)
> I assume DOI is popular because people trust that the central entity might do a bit better job
First, I don’t know how popular it is.
Second, look at what an absolutely poor job doi.org itself has done of keeping the showcased example links in its demonstration document from breaking over the years. See my comment elsewhere here about what I found on archive.org. What am I missing here? It looks like their link rot is close to 100%.
Very. The doi is now the unique key to identify a scientific article.
Researchers like it because it is an easy way to get canonical metadata without having to go through Google Scholar or (dog forbid) something clunky like Science Direct. You just put in the doi and bang, you’ve got the article. No issue with a typo in the volume number, or authors who go by different names (or the same name written differently), or anything like that. It has made managing bibliography databases massively easier.
Editors like it because it’s a very efficient way of checking the information in the bibliographies of submitted manuscripts.
I have no clue about their examples, but in real life I have never had a link rot issue in ~10 years of handling thousands of references.
Easily fetching the canonical metadata is so useful when doing a literature search and collecting sources, or trying to put together a bibliography. In Zotero (and presumably other bibliography managers), you can just plug the DOI in and you instantly have the title, authors, abstract, etc.
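For anyone who wants to do the same lookup outside a bibliography manager, here is a minimal sketch using DOI content negotiation: ask `https://doi.org/<doi>` for CSL JSON via the `Accept` header. The helper names are made up, the example DOI suffix is hypothetical, and I'm assuming the DOI's registration agency supports content negotiation (Crossref and DataCite do); this is not a claim about how Zotero does it internally.

```python
import json
import urllib.parse
import urllib.request

# Media type for CSL JSON, as used in DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    # Percent-encode the DOI but keep the slash between prefix and suffix.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str) -> dict:
    """Send the request and parse the JSON body. Needs network access."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=10) as resp:
        return json.load(resp)


# Hypothetical DOI, so the request is only built here, not sent.
req = build_metadata_request("10.1038/example-article-id")
print(req.full_url)  # https://doi.org/10.1038/example-article-id
```

With a real DOI, `fetch_metadata(doi)` should return a dict of whatever fields the publisher deposited, so it is only as complete as their metadata.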