
Now I am become DOI, destroyer of gatekeeping worlds - jmnicholson
https://thewinnower.com/papers/now-i-am-become-doi-destroyer-of-gatekeeping-worlds
======
dalke
> The injunction against citing DOI-less documents is unfortunate, because
> people deserve to get credit for the interesting things they say–and it
> turns out that they have, on rare occasion, been known to say interesting
> things in formats other than the traditional peer-reviewed journal article.

Ain't that the truth. There's a paper that came out a few years ago which said
that new method B is 3x faster than old method A. That's well and good. But
there were two widely-used free software projects which had done the same
thing, blogged about it, and for one case had the old code still available as
a #define option. While the published article was based on a proprietary in-
house package.

I pointed out the two previous reports to one of the authors, who had heard of
one, but because it wasn't an academic paper, it wasn't worth citing. Grr!
Because free software developers really have the time and money to write a
paper for the $1,000 per pop open access journal in the field.

I've been reading old literature, from the 1980s and older. There used to be a
"Letters to the Editor" section, which was sometimes used to point out or
correct these sorts of issues. No more. That same open access journal says my
only options are to write a comment on their comments section (which unlike an
old Letter to the Editor, no one reads, which is not indexed, and is not
citable), or submit an entirely new article, at $1,000.

The other thing a requirement for a DOI does is prevent citing the older
literature, which doesn't always have a DOI.

> [Quoting Neuron's author guidelines] "References should include only
> articles that are published or in press. For references to in press
> articles, please confirm with the cited journal that the article is in fact
> accepted and in press and include a DOI number and online publication date.
> Unpublished data, submitted manuscripts, abstracts, and personal
> communications should be cited within the text only."

This would make certain publications impossible. For example, in my historical
research, I tracked down the paper "Ciphering Structural Formulas – the
Zatopleg System", Zator Technical Bulletin No. 59 (1951). This was cited by
many chemical information papers in the 1950s and 1960s, and in some patents.
However, the only copy I could find was in the Mooers archive at the Charles
Babbage Institute. (Mooers founded Zator.) The archive controls the copyright,
and lets people make copies for research purposes, so long as the
archive/box/folder information is included in the citation.

Which is why if you look at a paper like
[http://trafficways.org/ascii/ascii.pdf](http://trafficways.org/ascii/ascii.pdf)
it has citations like:

> Thomas E. Kurtz, letter to Secretary, X3, December 21, 1965, Calvin N.
> Mooers Papers (CBI 81), Charles Babbage Institute, University of Minnesota,
> Minneapolis, box 20, folder 1.

Try doing that "within the text only".

~~~
greglindahl
My PhD advisor had a paper which was rejected by a few major journals, so he
decided to keep it as a preprint and see how many citations it could acquire.
It has hundreds now.

These days he'd stick it on arxiv. Arxiv doesn't appear to issue DOIs; you're
supposed to add the DOI from a real journal after the preprint gets published.

------
JBiserkov
[http://zenodo.org](http://zenodo.org) will issue a free DOI for any GitHub
release you tell it to. :)

See [https://guides.github.com/activities/citable-code](https://guides.github.com/activities/citable-code)

~~~
techdragon
I was about to suggest the building of this very sort of thing. Glad to know
it already exists.

~~~
IanCal
There's also figshare.com where you can publish data, papers or whatever
really. I published code there so my wife could cite it and we do public data
releases on there too.

Disclaimer: I work for Digital Science, a parent company.

------
verisimilidude
Rules against citing DOI-less content are easier to understand once you
understand a bit more about life within an academic publishing house.

An article with a DOI has more longevity. This is mentioned in the post, but I
don't think the author fully grasps the importance of this. Plain URLs break
down all the damn time; publishers/journals redesign their sites and migrate
between technology partners often. It only takes a few Very Important People
to flip out about a missing piece of data or a broken link before a publisher
will start to put safeguards into the production workflow to prevent such
complaints. Mandating DOIs alongside citations is an easy policy toward that
goal. An article with a DOI will almost surely be supported by a technology
back-end that can keep that DOI up-to-date, generate manifests for (C)LOCKSS,
submit materials to archival services like Portico, etc.

DOI-bound citations are much more efficient throughout the publishing process.
Editors can use DOIs to quickly verify and proof citations. If a publisher's
workflow requires sending papers to an XML composition vendor for tagging,
DOIs can make that process more automated and accurate. Some of the more
sophisticated XML vendors can generate all the citations given only the DOIs.
This might not sound like much, but it's notoriously easy to make mistakes on
citations throughout the editorial process, even with NLP and other more
enlightened software in the mix. If you ever see an article with a citation
list like this...

* Suzuki, et al. 2004...

* Suzuki, et al. 2006a...

* Suzuki, et al. 2006b...

* Suzuki, et al. 2006d...

* Suzuki, et al. 2006f...

* Suzuki, et al. 2006g...

* Suzuki, et al. 2007...

* Suzuki, et al. 2008b...

...think about all the double/triple/quadruple checking bullshit that goes
into making sure all the text and links in that list are accurate, and know
that DOIs take most of that headache away. It's substantially more efficient
at scale when DOIs are included.
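
To make "generate all the citations given only the DOIs" concrete, here is a minimal Python sketch of one way it can work: DOI content negotiation, where the resolver at doi.org is asked for a formatted citation instead of a landing page. This is an assumption about how such a workflow might be built, not a description of any particular vendor's system. The function name `citation_request` and the default `apa` style are made up for illustration, and the `text/x-bibliography` media type is only honored for DOIs whose registration agency supports it (Crossref and DataCite do, to my knowledge). The sketch builds the request but does not send it.

```python
from urllib.parse import quote
from urllib.request import Request

# Public DOI resolver; the DOI is appended as the URL path.
DOI_RESOLVER = "https://doi.org/"


def citation_request(doi: str, style: str = "apa") -> Request:
    """Build (but do not send) a request for a formatted citation.

    Hypothetical helper: the Accept header asks the resolver for a
    bibliography entry rather than a redirect to the landing page.
    """
    # Percent-encode anything unusual in the DOI, but keep the slash
    # that separates the prefix from the suffix.
    url = DOI_RESOLVER + quote(doi, safe="/")
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return Request(url, headers=headers)


# The DOI used as an example elsewhere in this discussion.
req = citation_request("10.1021/c160038a601")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would return the citation text, assuming the network is available and the registration agency supports that media type. Whatever comes back is only as good as the metadata registered for the DOI.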

Recently I've been thinking about how some of these problems could potentially
be solved via new ideas like IPFS. A Handle-based system like DOI might not be
as necessary in IPFS (given IPNS), and perhaps you could start creating and
citing more diverse sources of content with some confidence in
preservation/longevity. It could be a great fit for open access content.

~~~
dalke
I can understand why a publishing house would find the DOI to be useful. I can
understand why a publishing house would want to encourage people to use a DOI.

I do not understand why there are rules against citing DOI-less content, as
that would seem to place the needs of the publishing house far above the needs
for good scholarship.

Elsewhere in this discussion I gave an example of a citation without a DOI
from
[http://trafficways.org/ascii/ascii.pdf](http://trafficways.org/ascii/ascii.pdf)
:

> Thomas E. Kurtz, letter to Secretary, X3, December 21, 1965, Calvin N.
> Mooers Papers (CBI 81), Charles Babbage Institute, University of Minnesota,
> Minneapolis, box 20, folder 1.

That paper has dozens of DOI-less references, like meeting minutes and memos,
which are stored in archives like the CBI or the Smithsonian.

How does a paper like that get published in a journal which has rules against
citing DOI-less content?

~~~
verisimilidude
> I do not understand why there are rules against citing DOI-less content, as
> that would seem to place the needs of the publishing house far above the
> needs for good scholarship.

Absolutely. Welcome to the industry!

But the publisher's concerns are not without merit. Your cited reference is
useless if nobody can find it in a couple years. I'd argue that any article
which does not take steps to preserve the availability and connectivity of its
sources is neglecting a critical component of "good scholarship", even though
this often has nothing to do with the authors or research methodology. (There
are, of course, countless justifiable exceptions to this.)

There are creative solutions though. A good journal would take that PDF in
your comment, get permission from the author, and package it up with the
article as "supplementary data" or some other auxiliary material. A good
publisher will find ways to otherwise publish critical outside materials.

We are stuck at an awkward time in digital scholarship. In the old days, you
could cite any published material because everything was on paper, and
everything was distributed to a thousand libraries, and you could reasonably
expect the availability of everything, and there was no expectation that
everything would be available instantly. You can't put any of that confidence
in naked web pages; URLs are fatally unreliable for this purpose. DOIs, and
confidence in the tech brokers that use them, help. And with time, as more and
different types of online content become more reliable, it will become easier
to cite anything like the old days.

~~~
dalke
Welcome to the world of scholarship!

I thought I went out of my way to say the publisher's concerns have merit, so
I feel somewhat insulted by your bringing it up as you did.

> Your cited reference is useless if nobody can find it in a couple years.

Nonsense. I told you exactly where to find it. It's in a box in the archives
at the Charles Babbage Institute. Even more, I just visited that archive last
month, and if I didn't look through that box I looked through its neighbor
boxes. You can also request a copy for yourself. The CBI charges $0.25/page.

Do you believe that the Smithsonian and other archives are not a valid part of
"preserv[ing] the availability and connectivity of its sources"? Not only did
I pay a copy fee, but I gave extra money to the CBI as a gift - does that not
count?

> ... if nobody can find it in a couple years

You speak from the point of view of a publishing company.

Many organizations publish besides publishing companies. IBM has (or had)
their own corporate journals, both in-house and public. One of the common
references in my field is: Tanimoto, T. (17 Nov 1958). "An Elementary
Mathematical theory of Classification and Prediction". Internal IBM Technical
Report 1957. That's certainly not a peer-reviewed journal!

Google Scholar says it knows of 222 citations to this report. I have a copy
that I ordered from the Niedersächsische Staats- und Universitätsbibliothek
Göttingen. Up until I changed it a few weeks ago, the relevant Wikipedia
entry said the report was 'unavailable'
([https://en.wikipedia.org/w/index.php?title=Jaccard_index&typ...](https://en.wikipedia.org/w/index.php?title=Jaccard_index&type=revision&diff=704793261&oldid=688763411)).
Yes, I wonder how many of the authors of those 222+ papers actually read
the original paper.

If there were DOIs in the 1950s, do you think this document would be more
findable now? Perhaps with IBM, yes. But there are many small companies which
also publish their own papers as their own press. (I can provide examples if
needed.)

So if my company starts its own press, publishes papers with its own DOIs,
then goes under, shuts down the servers, and doesn't bother to transfer
copyright to a new provider ... just how will anyone be able to resolve those
DOIs and find those documents in a couple years?

> A good journal would take that PDF in your comment, get permission from the
> author, and package it up with the article as "supplementary data"

Please, yes! I would love all of the journals to take charge of digitizing the
world's archival data and make it more accessible! Many of the documents I
looked at were untitled, and in cursive, and I can't begin to estimate the
cost of indexing all of those properly.

Which journals do this, so I can publish in them? ... Or are there no good
journals?

To be more specific, I want to cite pages 61-63 of the Work Book I for Calvin
N. Mooers, located in box 13 folder 4 of archive CBI 81. This 1947 entry is a
precursor to his 1951 Zatopleg paper and shows that there's no connection to
Wheland's 1949 connection matrix proposal. The sketch on p62 shows that he
thought it had practical application. The double counter-signature on p63
shows that Mooers likely considered this a patentable idea. For a historian of
science, this is quite useful.

Will the journal arrange the scans and the copyright licensing details for me?
What happens if they and the archive can't come to an agreement on the
copyright terms?

Will they scan only those pages, or the entire notebook? (Since there's more
I'll want to cite in the future, it makes sense that they make a bulk
request.)

If they scan the entire work book, will they make sure to remove the Social
Security numbers of living people and other potentially sensitive information?

How much will this service cost me, vs. a standard fair-use quote that nearly
all journals now would accept?

However, _that doesn't help with the citation I gave_, which is to a letter
by Thomas E. Kurtz which is in the CBI archive for Mooers' papers.

The CBI archived Mooers' papers. These include works by Mooers and works by
others. They control the copyright to Mooers' own works, but not those of the
others. They do not have the ability to let people use Kurtz's letter beyond
the research purposes allowed by fair use.

Which means no journal in the world (or at least those parts signatory to the
Berne Convention) can do as you suggest. Not even good ones.

Should a paper not be published unless it's possible to do what you want,
regarding DOI and some incompletely specified definition of what counts as
sufficient permanence?

> We are stuck at an awkward time in digital scholarship

All interesting points to consider, but when haven't we been at an awkward
time in publishing for the industry and for scholarship? I mean, I read a
paper recently from the 1930s concerning the question of scientific priority
when the paper is published only on microfilm. It asked questions like, does a
microfilm-only publication really count as a publication?

At [https://www.asis.org/Bulletin/Bulletin-50thAnniversary-Watso...](https://www.asis.org/Bulletin/Bulletin-50thAnniversary-Watson_Davis.pdf)
you can see the founder of one such microfilm service.
People could send in camera-ready auxiliary publications to be microfilmed and
published on demand for those who want them. If someone sent in a document, which
was listed on the ADI list of available documents, does that count as being
published?

Replace "auxiliary publication" with "blog" and you'll see it's pretty much
the same debate.

~~~
dalke
>> We are stuck at an awkward time in digital scholarship

I'll add to that. It will still be an awkward time in 20-40 years. Let's say
that "Professor X" is a pioneer in a field, and her computer, which has her
last 40 years of activity (email, source code, draft documents, downloads,
etc.) on it is placed in an archive.

Now I, as an historian of science, go through that blob and figure out the
timeline for some event, which is based on recorded IRC sessions, git commit
messages, etc.

How do I cite those individual locations in the archival data?

------
IanCal
I see there being two main arguments, the first we agree on, permanence.
That's really important when it comes to looking things up.

The second however is not quality. I really don't understand that point at
all. The reason it all sounds so ridiculous as they continue to describe it
is, I think, because it doesn't make sense. Anyone can get a doi for anything
they want really.

No, the second thing is for analysis. DOIs mean you can automatically resolve
papers and data. They're unique and unambiguous and easy to detect.
Handwritten ones are not always possible to figure out even with a human and
certainly not automatically at 100%.

Imagine trying to build something like Google when pages don't have URLs but
just describe how a person might go and find what they're looking for
manually. That's what people are trying to deal with now.
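
As a rough illustration of the "easy to detect" point, here is a minimal Python sketch that pulls DOIs out of free text with a regular expression. The pattern is a commonly used heuristic for modern DOIs, not an official grammar, and it will miss some older DOIs that contain characters such as `<` and `>`. The names `DOI_PATTERN` and `find_dois` are made up for this example.

```python
import re

# Heuristic: "10.", a 4-9 digit registrant code, a slash, then a suffix
# made of characters commonly seen in modern DOIs. This is an assumption,
# not the formal DOI syntax, which allows almost any printable character.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text: str) -> list[str]:
    """Return the distinct DOIs found in text, in order of first appearance."""
    found: list[str] = []
    for match in DOI_PATTERN.finditer(text):
        # Trim sentence punctuation that the suffix pattern may have swallowed.
        doi = match.group(0).rstrip(".,;")
        # DOIs are case-insensitive, so compare them in lower case.
        doi = doi.lower()
        if doi not in found:
            found.append(doi)
    return found


sample = (
    "Consider 10.1021/c160038a601, available from "
    "http://pubs.acs.org/doi/abs/10.1021/c160038a601."
)
print(find_dois(sample))
```

The bare DOI and the one embedded in the URL collapse to a single entry. Detection only finds the identifier; it says nothing about whether the metadata registered behind it is correct.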

~~~
dalke
I think you interpreted the essay as an argument for or against DOIs, which it
isn't.

The two arguments in the essay deal with the sentence: "I think there are two
main reasons for the disdain many people seem to feel at the thought of
allowing authors to freely cite DOI-less objects in academic papers."

As you say, "Anyone can get a doi for anything they want really." This essay
describes a service for doing exactly that.

The essay argues why "quality" is not relevant, which I think you agree with.

Switching topics, you write that DOIs are "unique and unambiguous and easy to
detect."

They are _supposed_ to be unambiguous. In practice, there are errors. Consider
10.1021/c160038a601 available from
[http://pubs.acs.org/doi/abs/10.1021/c160038a601](http://pubs.acs.org/doi/abs/10.1021/c160038a601)
. It's a letter to the editor titled "A Utility Analysis for the MCC
Topological Screen System" and supposedly by F. Bruce Sanford. If you look at
the page you'll see there are two letters to the editor. Hamilton is the
actual author of that letter, while Sanford is the author of a different
letter to the editor.

They are also _supposed_ to be unique. However, given that "Anyone can get a
doi for anything they want really" this means someone can use multiple DOI
services to get multiple DOIs for the same thing.

------
haddr
"The right perspective, I would argue, is to embrace the benefits of
technology and seek out new evaluation models that emphasize open,
collaborative review by the community as a whole instead of closed pro forma
review by two or three semi-randomly selected experts."

Can't agree more with that. The problem is that it's hard to change the system
that has already been built around gaming this status quo.

Maybe some sort of blockchain for publications & peer reviews that could
decentralize and break with the current world ruled by a few publishers? Anyway I
have a feeling that any system here could be gamed too...

------
obelisk_
Worth reading. Interesting perspective and useful information, conveyed in
good writing.

I think having the feedback at the bottom of the page say reviews instead of
comments was a nice touch.

------
woodandsteel
This is a perfect job for ipfs.

