
I can understand why a publishing house would find the DOI to be useful. I can understand why a publishing house would want to encourage people to use a DOI.

I do not understand why there are rules against citing DOI-less content, as that would seem to place the needs of the publishing house far above the needs for good scholarship.

Elsewhere in this discussion I gave an example of a citation without a DOI from http://trafficways.org/ascii/ascii.pdf :

> Thomas E. Kurtz, letter to Secretary, X3, December 21, 1965, Calvin N. Mooers Papers (CBI 81), Charles Babbage Institute, University of Minnesota, Minneapolis, box 20, folder 1.

That paper has dozens of DOI-less references, like meeting minutes and memos, which are stored in archives like the CBI or the Smithsonian.

How does a paper like that get published in a journal which has rules against citing DOI-less content?




> I do not understand why there are rules against citing DOI-less content, as that would seem to place the needs of the publishing house far above the needs for good scholarship.

Absolutely. Welcome to the industry!

But the publisher's concerns are not without merit. Your cited reference is useless if nobody can find it in a couple years. I'd argue that any article which does not take steps to preserve the availability and connectivity of its sources is neglecting a critical component of "good scholarship", even though this often has nothing to do with the authors or research methodology. (There are, of course, countless justifiable exceptions to this.)

There are creative solutions though. A good journal would take that PDF in your comment, get permission from the author, and package it up with the article as "supplementary data" or some other auxiliary material. A good publisher will find ways to otherwise publish critical outside materials.

We are stuck at an awkward time in digital scholarship. In the old days, you could cite any published material because everything was on paper, and everything was distributed to a thousand libraries, and you could reasonably expect the availability of everything, and there was no expectation that everything would be available instantly. You can't put any of that confidence in naked web pages; URLs are fatally unreliable for this purpose. DOIs, and confidence in the tech brokers that use them, help. And with time, as more and different types of online content become more reliable, it will become easier to cite anything like the old days.


Welcome to the world of scholarship!

I thought I went out of my way to say the publisher's concerns have merit, so I feel somewhat insulted by your bringing it up as you did.

> Your cited reference is useless if nobody can find it in a couple years.

Nonsense. I told you exactly where to find it. It's in a box in the archives at the Charles Babbage Institute. Even more, I just visited that archive last month, and if I didn't look through that box I looked through its neighboring boxes. You can also request a copy for yourself. The CBI charges $0.25/page.

Do you believe that the Smithsonian and other archives are not a valid part of "preserv[ing] the availability and connectivity of its sources"? Not only did I pay a copy fee, but I gave extra money to the CBI as a gift - does that not count?

> ... if nobody can find it in a couple years

You speak from the point of view of a publishing company.

Many organizations publish besides publishing companies. IBM has (or had) their own corporate journals, both in-house and public. One of the common references in my field is: Tanimoto, T. (17 Nov 1958). "An Elementary Mathematical theory of Classification and Prediction". Internal IBM Technical Report 1957. That's certainly not a peer-reviewed journal!

Google Scholar says it knows of 222 citations to this report. I have a copy that I ordered from the Niedersächsische Staats- und Universitätsbibliothek Göttingen. Up until I changed it a few weeks ago, the relevant Wikipedia entry said the report was 'unavailable' (https://en.wikipedia.org/w/index.php?title=Jaccard_index&typ...). Yes, I wonder how many of the authors of those 222+ papers actually read the original paper.

If there were DOIs in the 1950s, do you think this document would be more findable now? Perhaps with IBM, yes. But there are many small companies which also publish their own papers as their own press. (I can provide examples if needed.)

So if my company starts its own press, publishes papers with its own DOIs, then goes under, shuts down the servers, and doesn't bother to transfer copyright to a new provider ... just how will anyone be able to resolve those DOIs and find those documents in a couple years?

> A good journal would take that PDF in your comment, get permission from the author, and package it up with the article as "supplementary data"

Please, yes! I would love all of the journals to take charge of digitizing the world's archival data and make it more accessible! Many of the documents I looked at were untitled, and in cursive, and I can't begin to estimate the cost of indexing all of those properly.

Which journals do this, so I can publish in them? ... Or are there no good journals?

To be more specific, I want to cite pages 61-63 of the Work Book I for Calvin N. Mooers, located in box 13 folder 4 of archive CBI 81. This 1947 entry is a precursor to his 1951 Zatopleg paper and shows that there's no connection to Wheland's 1949 connection matrix proposal. The sketch on p62 shows that he thought it had practical application. The double counter-signature on p63 shows that Mooers likely considered this a patentable idea. For a historian of science, this is quite useful.

Will the journal arrange the scans and the copyright licensing details for me? What happens if they and the archive can't come to an agreement on the copyright terms?

Will they scan only those pages, or the entire notebook? (Since there's more I'll want to cite in the future, it makes sense that they make a bulk request.)

If they scan the entire work book, will they make sure to remove the Social Security numbers of living people and other potentially sensitive information?

How much will this service cost me, vs. a standard fair-use quote that nearly all journals now would accept?

However, that doesn't help with the citation I gave, which is to a letter by Thomas E. Kurtz which is in the CBI archive for Mooers' papers.

The CBI archived Mooers' papers. These include works by Mooers and works by others. They control the copyright to Mooers' own works, but not those of the others. They do not have the ability to let people use Kurtz's letter beyond the research purposes allowed by fair use.

Which means no journal in the world (or at least those parts signatory to the Berne Convention) can do as you suggest. Not even good ones.

Should a paper not be published unless it's possible to do what you want, regarding DOI and some incompletely specified definition of what counts as sufficient permanence?

> We are stuck at an awkward time in digital scholarship

All interesting points to consider, but when haven't we been at an awkward time in publishing for the industry and for scholarship? I mean, I read a paper recently from the 1930s concerning the question of scientific priority when the paper is published only on microfilm. It asked questions like, does a microfilm-only publication really count as a publication?

At https://www.asis.org/Bulletin/Bulletin-50thAnniversary-Watso... you can see the founder of one such microfilm service. People could send in camera-ready auxiliary publications to be microfilmed and published on demand for those who wanted them. If someone sent in a document, which was listed on the ADI list of available documents, does that count as being published?

Replace "auxiliary publication" with "blog" and you'll see it's pretty much the same debate.


>> We are stuck at an awkward time in digital scholarship

I'll add to that. It will still be an awkward time in 20-40 years. Let's say that "Professor X" is a pioneer in a field, and her computer, which has her last 40 years of activity (email, source code, draft documents, downloads, etc.) on it, is placed in an archive.

Now I, as an historian of science, go through that blob and figure out the timeline for some event, which is based on recorded IRC sessions, git commit messages, etc.

How do I cite those individual locations in the archival data?



