
> non-hierarchy-revealing URLs that must be resolved using a database they control

What do you mean? There is a journal prefix (Nature has 10.1038) and the link usually translates fairly directly to a URL for new articles.

EDIT: Just looked it up in the DOI rules. The registrant chooses the suffix, so the journals are free to maintain a 1:1 mapping to URLs if they wish. I assume DOI is popular because people trust that the central entity will do a somewhat better job with link rot than the journals themselves.



To take your point to the next step: how do two otherwise unrelated journals (of which there are tens of thousands) link between each other and solve link rot? Answer: shared, open, non-profit metadata infrastructure.

Disclosure: I’m at Crossref. AMA.


Theoretically, the journals could publish a canonical URL for each article. But apparently organizations are not good at publishing and maintaining such URLs, especially in cases of mergers and rebrandings.


That canonical URL is literally what the DOI system is. Publishers getting together to agree on a shared identifier system, regardless of each publisher's business model.


Compare the readability and editability of:

nature.com/2021/11/17/news/

versus:

10.1038.123.366.345643

And you ask what I mean. What I mean is that the first one is transparently better in several ways.

If I got the syntax wrong, just focus on the first part then: nature.com versus 10.1038. One is more readable than the other.


Neither your URL nor your DOI resolves. I meant rather the correspondence of

    https://www.nature.com/articles/1771046b0

    to

    10.1038/1771046b0
or

    https://www.nature.com/articles/s41586-021-04114-w
    
    to
    
    10.1038/s41586-021-04114-w
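For these Nature examples the correspondence is mechanical: the DOI suffix equals the last path segment of the article URL. A minimal sketch of that mapping (assuming Nature keeps this convention; it's a registrant choice, not a DOI rule):

```python
# Nature's registrant prefix, as given earlier in the thread.
NATURE_PREFIX = "10.1038"

def nature_url_to_doi(url: str) -> str:
    """Derive the DOI from a nature.com article URL by taking the last path segment."""
    article_id = url.rstrip("/").rsplit("/", 1)[-1]
    return f"{NATURE_PREFIX}/{article_id}"

def doi_to_nature_url(doi: str) -> str:
    """Derive the nature.com article URL from a 10.1038 DOI."""
    prefix, suffix = doi.split("/", 1)
    if prefix != NATURE_PREFIX:
        raise ValueError(f"not a Nature DOI: {doi}")
    return f"https://www.nature.com/articles/{suffix}"
```

This only works because the publisher chose corresponding identifiers; nothing in the DOI system guarantees it.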


Look at the metadata of those two articles.

    https://www.nature.com/articles/1771046b0
    has
    <meta name="dc.identifier" content="doi:10.1038/1771046b0"/>
and

    https://www.nature.com/articles/s41586-021-04114-w
    has
    <meta name="dc.identifier" content="doi:10.1038/s41586-021-04114-w"/>
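That `dc.identifier` tag can be scraped with nothing but the standard library. A sketch using Python's `html.parser`, keyed on the tag name and `doi:` prefix shown in the examples above:

```python
from html.parser import HTMLParser

class DoiMetaParser(HTMLParser):
    """Collect DOIs from <meta name="dc.identifier" content="doi:..."> tags."""
    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        if d.get("name") == "dc.identifier" and d.get("content", "").startswith("doi:"):
            self.dois.append(d["content"][len("doi:"):])

def extract_dois(html: str) -> list:
    """Return every DOI declared in the page's Dublin Core metadata."""
    parser = DoiMetaParser()
    parser.feed(html)
    return parser.dois
```

Feeding it the head of either Nature page above would yield the article's DOI.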


They aren’t meant to resolve, but to show the difference between friendly and unfriendly ways to construct a URL (or URI if you prefer).

But yes that’s interesting and good to know that sometimes they can choose to use corresponding components in the URL versus DOI.


>... Compare the readability and editability

Indeed, I was wondering why not adopt an approach similar to a Git repo, where each leaf (document) is identified with a ref hash.

It could be used with a publisher prefix to segment the hash-space.

Basically, it's a repository of metadata. Each publisher could equally host the whole repo, but extend only their corresponding sub-repo.

The central entity could then maintain the publisher's name-to-id convenience xref and some search facility.


> 10.1038/nature.com/2021/11/17/news/xxxxxxx

can be a valid DOI; 10.1038 plays the same role as the nature.com domain, but under different governance
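The DOI spec allows almost any suffix, including one full of slashes like the example quoted above. A sanity-check sketch, using a pattern based on Crossref's published guidance for matching modern DOIs (it is a heuristic, not a validator):

```python
import re

# Pattern adapted from Crossref's blog guidance on matching modern DOIs;
# the spec itself is far more permissive about suffix characters.
MODERN_DOI = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

# A conventional DOI matches:
assert MODERN_DOI.match("10.1038/s41586-021-04114-w")

# Slashes are legal inside a suffix, so this is also well-formed:
assert MODERN_DOI.match("10.1038/nature.com/2021/11/17/news/xxxxxxx")
```

Note that everything after the first `/` is opaque to the DOI system; structure inside the suffix is purely a registrant convention.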


That’s why the display guidelines stipulate that DOIs are always expressed as URLs.


Sure, but 10.1038 is still a shitty link compared to nature.com, independent of which URL prefix is used.


There are different timescales involved.

The resolvability of a DOI is obviously important, and you have found a few examples where the steward of the content (i.e. the publisher) hasn’t updated the links. But in that case DOIs can in theory be redirected to archived versions. That's the price you pay for a large, diverse community of publishers. Don't imagine that publishers would magically be any better at fixing their own websites.

But more broadly, as identifiers, having an agreed scheme helps link publications, datasets, and other entities such as funders and institutions. It makes the metadata discoverable by everyone.

Someone will want to follow those links in 100 years, and whatever state DNS or the WWW is in, the metadata will be in an open archive somewhere.

(I'm at Crossref, AMA)


Taking you up on your AMA: is there a best practice for discovering datasets belonging to articles, and vice versa? I see that Web of Science has a lot of this, but they don't make that linkage explicitly available as an export. I have tried querying isSupplementTo and chasing cited works, and I don't expect a smoking gun.

Just now I tried searching Scholix via the web interface for IDs that are linked in Dryad, which is supposed to use the same linking system, and came up empty in both directions.

TIA for any clues!


Scholix[0] is a collaboration between a few orgs, including Crossref (DOI registration agency for scholarly content, e.g. articles) and DataCite (DOI registration agency for data sets).

If a DataCite member registers a data set and mentions a link to a Crossref DOI, that's almost certainly a dataset-article link. And vice versa.

So the data in Scholix is the union of "what citations do article publishers and dataset publishers think exist".

We're working on improving how we process data citations, but ultimately we can't improve on what publishers provide.

Sorry, a bit of a non-answer.

[0] https://www.scholix.org/
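For what it's worth, those article-dataset links can also be queried programmatically. A sketch against what I believe is ScholeXplorer's public Scholix endpoint; the base URL and the `sourcePid` parameter name are assumptions here and should be checked against the current Scholix/ScholeXplorer API docs:

```python
from urllib.parse import urlencode

# Assumed endpoint: ScholeXplorer's Scholix links API. Verify the path,
# version, and parameter names against the official documentation.
SCHOLIX_BASE = "https://api.scholexplorer.openaire.eu/v3/Links"

def scholix_query_url(source_pid: str) -> str:
    """Build a query URL asking for all links from one identifier (e.g. a DOI)."""
    return f"{SCHOLIX_BASE}?{urlencode({'sourcePid': source_pid})}"

# Actually fetching is a network call, so it is left commented out:
#   import json, urllib.request
#   with urllib.request.urlopen(scholix_query_url("10.1038/s41586-021-04114-w")) as r:
#       links = json.load(r)
```

If a Dryad dataset DOI comes back with no links in either direction, that usually means neither publisher deposited the relationship, which matches the "union of what publishers think exists" caveat above.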


The https link doesn't work for me, working link: http://www.scholix.org/


Journals, even prominent ones, change names for all sorts of reasons. It doesn't happen often but it does happen. The meaningfulness issue can cut both ways.

I think the real problem is the centralized vs federated vs distributed nature of it. IPFS is a good example of how that could have looked; not sure if it could be moved into that space somehow (I'm sure it could in theory, but in practice?)


DOIs are not supposed to replace URLs. They serve very different purposes. A DOI is a unique identifier for a scientific article; that’s all it is.


The sole purpose of DOIs is that they never change. The moment Nature decides to update their CMS, the first link is going to 404.


As is the second link, as many DOIs have.


> I assume DOI is popular because people trust that the central entity might do a bit better job

First, I don’t know how popular it is.

Second, look at what an absolutely poor job the doi.org site itself has done keeping the showcased example links in its demonstration document from breaking over the years. See my comment elsewhere here about what I found on archive.org. What am I missing here? It looks like their link rot is close to 100%.


> First, I don’t know how popular it is.

Very. The DOI is now the unique key used to identify a scientific article.

Researchers like it because it is an easy way to get canonical metadata without having to go through Google Scholar or (dog forbid) something clunky like Science Direct. You just put in the DOI and bang, you’ve got the article. No issues with a typo in the volume number, or authors who go by different names (or the same name written differently), or anything like that. It has made managing bibliography databases massively easier.

Editors like it because it’s a very efficient way of checking the information in the bibliographies of submitted manuscripts.

I have no clue about their examples, but in real life I have never had a link-rot issue in ~10 years of handling thousands of references.


Easily fetching the canonical metadata is so useful when doing a literature search and collecting sources, or trying to put together a bibliography. In Zotero (and presumably other bibliography managers), you can just plug the DOI in and you instantly have the title, authors, abstract, etc.
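What Zotero does here is essentially DOI content negotiation: the doi.org proxy returns structured metadata instead of redirecting when you ask for it in the `Accept` header. A sketch requesting CSL JSON, which Crossref and DataCite both serve for registered DOIs:

```python
import urllib.request

# Content type for Citation Style Language JSON, the format most
# reference managers consume.
CSL_JSON = "application/vnd.citationstyles.csl+json"

def metadata_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request against the doi.org proxy."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": CSL_JSON},
    )

# Actually fetching is a network call, so it is left commented out:
#   import json
#   with urllib.request.urlopen(metadata_request("10.1038/s41586-021-04114-w")) as r:
#       meta = json.load(r)
#   # meta["title"], meta["author"], meta["issued"], ...
```

The nice part is that the same request shape works regardless of which registration agency or publisher is behind the DOI.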



