
Dozens of scientific journals have vanished from the internet - bookofjoe
https://www.sciencemag.org/news/2020/09/dozens-scientific-journals-have-vanished-internet-and-no-one-preserved-them
======
bnewbold
At the Internet Archive, we are working on this exact problem, and have been
in communication with the pre-print's authors. We have built open
infrastructure (open source, open data) tracking "preservation coverage", for
example:

[https://fatcat.wiki/coverage/search?q=is_oa%3Atrue+year%3A%3...](https://fatcat.wiki/coverage/search?q=is_oa%3Atrue+year%3A%3E1945+year%3A%3C%3D2019+%28type%3Aarticle-journal+OR+type%3Aarticle+OR+type%3Apaper-conference%29)

and are working to improve crawling. There is a "save paper now" feature, as
well as an API for bots. Organizations like DOAJ, ISSN, and DOI registrars
(Crossref, DataCite, others) are crucial for this. In the broader ecosystem,
we hope this can complement existing efforts that partner with large
publishers (like LOCKSS, Portico, JSTOR) and institutional repositories. A
natural niche for us is web-native (HTML) content, which we have crawled a lot
of but have only just started to index: for example, publications like
D-Lib, First Monday, and distill.pub.

If folks want to help, it would be great to have a "youtube-dl for open access
papers". There is a lot of content on large platforms and publishers which
have anti-crawling measures (even for gold OA and hybrid content!), as well as
a long tail of small publishers that don't use simple/common mechanisms like
OAI-PMH and the `citation_pdf_url` HTML meta tag to identify fulltext content.
The OAI-PMH ecosystem is sadly not very complete or helpful for the use case
of mirroring.
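As a rough illustration of the `citation_pdf_url` mechanism: a landing page advertises its fulltext with a `<meta name="citation_pdf_url" content="...">` tag in the HTML head. The sketch below uses only Python's standard-library `html.parser` to pull that URL out of a page. It is a minimal, hypothetical helper (the function name and the example journal URL are made up), not the Internet Archive's actual crawler code, and real pages would need more care (redirects, broken markup, several candidate URLs).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CitationPdfFinder(HTMLParser):
    """Collect the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        content = (attr_map.get("content") or "").strip()
        if name == "citation_pdf_url" and content:
            # Relative URLs are resolved against the landing page URL.
            self.pdf_urls.append(urljoin(self.base_url, content))


def find_citation_pdf_urls(html_text, base_url):
    """Hypothetical helper: return the fulltext PDF URLs a landing page advertises."""
    parser = CitationPdfFinder(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.pdf_urls


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<meta name="citation_title" content="An Example Article">'
        '<meta name="citation_pdf_url" content="/articles/42.pdf">'
        "</head><body>...</body></html>"
    )
    print(find_citation_pdf_urls(page, "https://journal.example.org/articles/42"))
```

Small publishers that omit this tag (or OAI-PMH) are exactly the ones where a generic tool like this finds nothing, which is why per-site logic, in the spirit of youtube-dl's extractors, would be needed.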

~~~
cxr
> If folks want to help, it would be great to have a "youtube-dl for open
> access papers".

Zotero already has a set of "translators" (site-specific scrapers for citation
metadata and attachments).

And somewhat related to this request: I scratched out some notes last year
about how to get more out of "zero-obligation communities" (like the pool of
prospective contributors in open source)
<[https://www.colbyrussell.com/2019/02/15/what-happened-in-jan...](https://www.colbyrussell.com/2019/02/15/what-happened-in-january.html#underdeveloped)>.
The long and short of it is that instead of saying something like "if folks
want to help, it would be great[...]", you should provide a place for people
to sign up, take them at their word that they're willing to help, and then lay
out a concrete set of tasks/deliverables. People get weird about not wanting to
seem too demanding of volunteers, but the end result is a lot of unharnessed
human potential. You've got a pool of mechanical turks at your disposal. Take
a break from polishing the arrangement of instructions you give to the
computer and focus some energy on writing the "programs" that you want to be
executed by meatbags.

~~~
garfieldnate
Wintergatan's Martin recently transitioned a lot of the work for his new
marble machine to volunteers, and he essentially followed this pattern,
casting himself into the product owner role to harness the power of
volunteers. It seems to be working pretty well!

------
guerby
In France we have HAL:

[https://en.wikipedia.org/wiki/Hyper_Articles_en_Ligne](https://en.wikipedia.org/wiki/Hyper_Articles_en_Ligne)

"Hyper Articles en Ligne, generally shortened to HAL, is an open archive where
authors can deposit scholarly documents from all academic fields."

I work at a university, and I know the library staff and management carefully
check that every paper we produce is deposited in HAL.

Also:

[https://fr.wikipedia.org/wiki/Hyper_articles_en_ligne](https://fr.wikipedia.org/wiki/Hyper_articles_en_ligne)

"Since 25 September 2018, software deposits on HAL have been connected to
Software Heritage"

[https://en.wikipedia.org/wiki/Software_Heritage](https://en.wikipedia.org/wiki/Software_Heritage)

For recruitment, some French institutions like CNRS will only consider papers
deposited in HAL when doing the evaluation.

~~~
Thlom
In Norway any publication must by law be deposited with the national archive.
That includes scientific journals and in theory even small publications
distributed in a private setting if it's a big enough group of people (I'm not
sure of the details). Not sure how it works for scientific work published in
foreign publications, but I assume it's sent to the national archive as
routine.

However, most of the archive is not publicly accessible due to copyright,
privacy etc. You can request access to specific content both as a private
person and as a researcher.

~~~
acomjean
In the US, for medical/biology papers, there's "The National Center for
Biotechnology Information" (NCBI). They store papers/abstracts in a service
called PubMed:

[https://pubmed.ncbi.nlm.nih.gov](https://pubmed.ncbi.nlm.nih.gov)

It works pretty well. Papers are submitted and get a unique ID. Some are
stored in full and publicly accessible (some US funding sources require
publicly accessible papers).

NCBI has a ton of information. It's a pretty awesome resource.
[https://www.ncbi.nlm.nih.gov](https://www.ncbi.nlm.nih.gov)

They even index submitted papers with a controlled vocabulary of terms
(within a month or two):
[https://www.ncbi.nlm.nih.gov/mesh/](https://www.ncbi.nlm.nih.gov/mesh/)

~~~
mattkrause
Nearly _all_ federally-funded (and unclassified) research needs to be publicly
accessible, as per a 2013 policy memo.

------
homarp
Unfortunately, the article does not discuss Sci-Hub.

The list of the 176 vanished journals is here:
[https://github.com/njahn82/vanished_journals/blob/master/Dis...](https://github.com/njahn82/vanished_journals/blob/master/Dissapeared%20OA%20Journals.xlsx)

~~~
waynecochran
What would be really useful is to know the average citation count of these
journals.

As someone who hates to see this stuff disappear, there is still a cynical
person inside me who knows there is a glut of journals that are often used
to bump publication counts for professors trying to get tenure.

That cynic inside of me also realizes that a subset of the journal business is
a bit of a scam anyway since frequently authors have to pay the journals to
include their paper and then the journals charge an exorbitant rate to get
access. Have you tried to buy a journal article lately ($35 for one paper!) --
yeah, neither have I.

~~~
jszymborski
It's worth noting that citation counts are an increasingly poor metric of
paper quality (and always have been).

Multiple studies have shown that the rise of search engines like Google
Scholar has meant that researchers increasingly cite the same papers,
because their searches all return the same thing.

Meanwhile, there are some "sleeper" papers that are super relevant to a lot of
works, offer great insight, but by virtue of their low search ranking, never
get cited.

That's not to say there isn't a fair amount of unremarkable research. It's
just that it doesn't always correlate with citation count.

~~~
umlautae
To possibly find “sleeper” papers, check out the “show similar” feature of this
arXiv mirror on condensed matter physics:
[https://cond-mat.abbrivia.com](https://cond-mat.abbrivia.com). Search for some
keywords and then surf by “show similar” on relevant articles.

------
amirkdv
I think there is a case to be made for a kind of "public utility"
infrastructure for the distribution and storage of scholarly work given how

1\. cheap it is, considering the size of the institutions that produce and
benefit from them.

2\. absurdly broken the private publishing industry has become.

~~~
random_visitor
That's exactly what Library Genesis and Sci-Hub are, assuming you aren't
expecting this public utility to be 100% lawful, since the other parties
involved here (universities, for instance) don't seem too keen on the idea of
having their work circulate openly.

~~~
marcosdumay
> other parties involved here (universities for instance) don't seem too keen
> on the idea of having their work circulate openly

Hum... What?

Universities at worst don't care. Most really want their work circulating and
will do a lot of things to get it (many useless things that miss the point,
but well, that's how people are).

Universities could push harder. But they are surely pushing in the right
direction.

~~~
Lammy
> Universities at worst don’t care.

I wonder how aaronsw would feel about this statement.

~~~
clankyclanker
Didn’t the university and publisher both request that the case be dropped?
Wasn’t the DA the only one pushing for a conviction?

~~~
Lammy
[http://swartz-report.mit.edu/docs/report-to-the-president.pd...](http://swartz-report.mit.edu/docs/report-to-the-president.pdf)

"Very early in this post-arrest period, MIT decided to “remain neutral,” as
between the government and Aaron Swartz, in the investigation and eventual
prosecution. Initially this meant simply that MIT would not take a public
position on the prosecution. Throughout the following (almost) two years, MIT’s
decisions were mostly guided by this posture of neutrality."

"With regard to substance, MIT would make no statements, whether in support or
in opposition, about the government’s decision to prosecute Aaron Swartz, the
government’s decisions about charges in an indictment, or any possible plea
bargain stances of the prosecution or the defense. [15]"

"[15]: This position of neutrality would not have necessarily extended to the
sentencing phase of the prosecution, where MIT might have been prepared to
advocate on behalf of Aaron Swartz had he been convicted."

------
aksss
This isn’t just a problem with scientific journals, but also with niche
research/enthusiast journals. When a magazine goes under and the copyright
holder's status is murky or unknown, it's still too dangerous to digitize it
and make it available, which is a shame.

~~~
Nemo_bis
Indeed. One part of the problem might be addressed in the EU in the coming
years if art. 8 of the 2019 copyright directive gets implemented well.
[https://www.communia-association.org/2019/12/10/implementing...](https://www.communia-association.org/2019/12/10/implementing-new-eu-provisions-allow-use-commerce-works/)

------
pintxo
The pre-digital, national solution is legal deposit: laws requiring a copy to
be sent to the national library.

What's the digital, post-national solution?

~~~
ghaff
In the US, the mandatory deposit requirement likely worked pretty well with
traditional book/magazine publishers and music labels. Outside of that, I
expect a huge amount slipped through the cracks. There's no real enforcement
AFAIK and I imagine most who are independently publishing or otherwise working
outside of conventional channels don't deposit.

~~~
jumelles
It's not mandatory and not even a requirement for copyright protection.

~~~
ghaff
It's independent of copyright registration.

It is, as far as I can tell, mandatory in theory but not in practice.
[https://www.copyright.gov/help/faq/mandatory_deposit.html#:~...](https://www.copyright.gov/help/faq/mandatory_deposit.html#:~:text=What%20is%20mandatory%20deposit%3F,after%20a%20work%20is%20published).

------
dan-robertson
I feel like a lot of this article is trying to make comparisons between
online-only open access journals and traditional closed publishers. However,
the paper the article is based on does not collect any data about the latter,
so there isn’t any real comparison to make.

I don’t think the solution is to move back towards the old model. There are
already lots of initiatives towards creating online archives of academic work
that could be piggybacked on. In mathematics, perhaps the easiest way to set up
an open access journal is as an arXiv overlay journal, where at the most basic
level each issue of the journal is a list of links to specific versions of
papers on the arXiv. Such a journal would likely be archived sufficiently well.
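To make the overlay idea concrete: an issue can be little more than an ordered list of version-pinned arXiv identifiers (new-style like `2001.01234v2`, old-style like `math/0601001v1`), each resolving to an abstract page under `https://arxiv.org/abs/`. The sketch below is hypothetical (class and field names are invented for illustration, and the IDs are placeholders, not real accepted papers); it only shows why pinning the `vN` suffix matters, since it fixes the issue to the exact refereed version rather than whatever gets uploaded later.

```python
import re
from dataclasses import dataclass, field

# Accepts new-style IDs (e.g. 2001.01234v2) and old-style IDs (e.g. math/0601001v1,
# math.AG/0601001v3), but only when an explicit version suffix "vN" is present.
VERSIONED_ARXIV_ID = re.compile(r"^(?:\d{4}\.\d{4,5}|[a-z-]+(?:\.[A-Z]{2})?/\d{7})v\d+$")


@dataclass
class OverlayIssue:
    """Hypothetical model of one issue of an arXiv overlay journal."""
    volume: int
    number: int
    arxiv_ids: list = field(default_factory=list)

    def links(self):
        """Return abstract-page URLs, rejecting IDs that are not version-pinned."""
        urls = []
        for arxiv_id in self.arxiv_ids:
            if not VERSIONED_ARXIV_ID.match(arxiv_id):
                raise ValueError(f"not a version-pinned arXiv ID: {arxiv_id!r}")
            urls.append(f"https://arxiv.org/abs/{arxiv_id}")
        return urls


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    issue = OverlayIssue(volume=1, number=1, arxiv_ids=["2001.01234v2", "math/0601001v1"])
    for url in issue.links():
        print(url)
```

Because the journal itself stores only links, the archival burden falls on arXiv, which is already widely mirrored.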

For a traditional journal that shuts down to be archived, lots of things need
to happen:

1\. Some library needs to pay some exorbitant fee to get physical or
(permanent, not SaaS-based) digital copies of the journal

2\. That library needs to keep hold of that copy for the 100 years or so until
copyright expires

3\. That library then needs to take the initiative to make its copies
available

This seems like a harder process than finding some public domain digital copy.
And for a lot of journals, the only reason the library gets copies is due to
the bundling systems which universities hate.

I’m curious to know more about these journals which did vanish, and what sort
of quality they were. If a predatory journal offers open access and later
disappears, would it be counted?

------
Kednicma
This sounds like apologia from a big closed publisher (AAAS) explaining why
open-access is supposedly bad. See, sometimes open-access journals fold, and
when that happens, nobody knows what happens to the articles. But they'd like
you to ignore two inconvenient facts: First, that traditional closed
publishers effectively lose _all_ articles by default by this metric! And
second, that the Internet Archive, itself open-access, was essential to
conducting the study in the first place!

~~~
isido
I have been involved in the same projects furthering open access within
Finnish universities as the corresponding author, and I think the aim of
the study is not to make OA look bad, but to make it better by finding its
shortcomings and then fixing them.

~~~
l_matthia
Exactly. We don't see OA as the problem. OA solves many issues that exist with
traditional publishing and also makes it easier to preserve content in the
first place. The problem lies with decreasing library budgets, rising
subscription prices, and the fact that preservation services are often not
suitable for smaller OA journals.

LOCKSS provides a free option for publishers to join, but only accepts a
limited number of OA publishers
([https://www.lockss.org/use-lockss/publishers](https://www.lockss.org/use-lockss/publishers)).
A couple of years ago PKP (the Public Knowledge Project) launched its
preservation service, which we're really excited about, as it also offers free
preservation (for OJS journals) and would especially help smaller journals that
otherwise couldn't afford to enroll in preservation schemes.

------
Jerry2
This article reminded me to donate to Sci-Hub again. I feel like donating to
various archives is one of the best uses of my monthly donations budget.

~~~
dmix
archive.is always seems to be struggling to stay online; they are well
deserving too. It's a thankless job trying to work around all of the pushback
against archiving, like copyright and whatnot.

Having archives is so important in the legal field and plenty of areas of
research.

~~~
Jerry2
> archive.is always seems to be struggling to stay online

Very good point. I will donate to them too. I've been donating to Archive.org
for a long time but I use Archive.is more often these days so they deserve
some love too.

I have a list of places I donate to on my profile page here.

------
afandian
There are archiving schemes in scholarly publishing such as LOCKSS and
CLOCKSS. Not saying that they apply in this case, but YSK.

[https://en.m.wikipedia.org/wiki/LOCKSS](https://en.m.wikipedia.org/wiki/LOCKSS)

------
panic
Legalize Sci-Hub.

------
mensetmanusman
On the bright side, 50% of the vanished content was not reproducible.

------
cycomanic
I can't really say this is a bad thing. The number of journals has exploded
so massively over the last 20 years that it is pretty much impossible to
follow all the literature anymore. I'm not even counting the predatory OA
journals (which I think make up the majority of the list), just the big
societies and publishers creating ever more journals.

------
ramshorns
> The authors defined a vanished journal as one that published at least one
> complete volume as immediate OA, and less than 50% of its content is now
> available for free online.

Well, this exact definition could have some false positives, like a journal
that publishes every third volume as complete open access and keeps the others
behind a paywall. But I'm sure they were a bit more careful than it says here.

~~~
l_matthia
Yes. We checked that all journals were full OA journals (so nothing like the
scenario you just described). So the timeline looked like this: the OA journal
was actively publishing, then became inactive while the content on the journal
website was still accessible, and eventually the website and the content
disappeared/became inaccessible.

In some cases, we found websites (other than the original journal website)
that now host some individual issues, but not all of the content.
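Read literally, the quoted definition plus the full-OA check described in the reply amounts to a simple predicate. A minimal sketch, assuming a per-journal record with hypothetical field names (the study's actual data model is not described in this thread); it deliberately skips the hard part, which is establishing how much content is still "available for free online".

```python
from dataclasses import dataclass


@dataclass
class JournalRecord:
    """Hypothetical per-journal record; field names are illustrative only."""
    fully_oa: bool            # every volume was published as full, immediate OA
    complete_oa_volumes: int  # number of complete volumes published as immediate OA
    total_items: int          # articles the journal ever published
    items_free_online: int    # of those, how many are still freely available online


def is_vanished(journal, threshold=0.5):
    """At least one complete immediate-OA volume, and less than `threshold`
    of the content still freely available online."""
    if not journal.fully_oa:
        # Excludes the mixed scenario raised above (some volumes OA, others paywalled).
        return False
    if journal.complete_oa_volumes < 1 or journal.total_items <= 0:
        return False
    return journal.items_free_online / journal.total_items < threshold


if __name__ == "__main__":
    print(is_vanished(JournalRecord(True, 3, 100, 30)))
```

Note the strict "less than 50%": a journal with exactly half of its content still online would not count as vanished under this reading.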

------
TheUndead96
The regression of civilisation is palpable.

------
peter303
Library of Congress should preserve them.

------
jp1016
I read an article about the Wayback Machine/archive.org on HN a few days
back. Will there be a copy on it?

~~~
toomuchtodo
If it's in Sci-Hub, it's in Archive.org, just not accessible.

~~~
jrochkind1
Say more? All of scihub is mirrored at archive.org? Where do I find out more
about this?

~~~
Nemo_bis
You don't. _shhhh_

~~~
jrochkind1
How do we know it's actually a thing at all, and not just something someone
made up?

~~~
Nemo_bis
It's not quite a secret, you can see yourself if you look closely enough.

------
Nemo_bis
Publishers in general are very bad at preservation. Entire runs of journals
often vanish because the publisher went bankrupt, sold some assets or just
didn't bother to do their homework.

For this reason, customers (mostly libraries and universities) started
demanding around the year 2000 that closed-access publishers have
preservation mechanisms and provide dark archives to ensure access after
subscriptions expire or are otherwise breached.

This has resulted in initiatives like
[https://www.lockss.org/](https://www.lockss.org/) and
[https://clockss.org/](https://clockss.org/) .

> Since CLOCKSS launched in 2008, 53 journals comprising 13,000 articles have
> been triggered

That's 13k articles (mostly from closed-access publishers) saved from
oblivion, but there are many more. The English Wikipedia alone cites thousands
of articles for which even the DOI is broken (usually they're from publishers
like Elsevier, Wiley, T&F, LWW, OUP).
[https://en.wikipedia.org/wiki/Category:Pages_with_DOIs_inact...](https://en.wikipedia.org/wiki/Category:Pages_with_DOIs_inactive_as_of_2020)

On average, fully open access journals present a much lower risk of vanishing
because they're easier to archive, especially if they use a
[free license](https://en.wikipedia.org/wiki/Free_license) like CC BY or
CC BY-SA. However, publishers still need some nudging towards archival.

DOAJ requires a digital preservation plan (with at least one of CLOCKSS,
LOCKSS, PKP PN, PMC, Portico _or_ a national library) to be in place for a
journal to be granted the [DOAJ seal](https://doaj.org/publishers#seal).
At the moment there are [about 1400 journals with the DOAJ seal](https://doaj.org/search?source=%7B%22query%22%3A%7B%22filtered%22%3A%7B%22filter%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22_type%22%3A%22journal%22%7D%7D%2C%7B%22term%22%3A%7B%22index.has_seal.exact%22%3A%22Yes%22%7D%7D%5D%7D%7D%2C%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%7D%7D%2C%22size%22%3A10%2C%22sort%22%3A%5B%7B%22created_date%22%3A%7B%22order%22%3A%22desc%22%7D%7D%5D%7D).

Authors can publish in a journal with the DOAJ seal and be sure that their
work is safe. On the other hand, they have no recourse against a commercial
and closed-access publisher, which can be forced to comply with archival only
by contracts with its paying subscribers.

------
rektide
Welcome to our new dark-aged ultra-tech future.

~~~
dang
Please don't post unsubstantive comments here.

------
Jaxkr
Let’s be real here: was anything of value _really_ lost? Any important work
was likely cited, paraphrased, or duplicated elsewhere.

Anyone disagree?

~~~
nurbl
Part of the point of citing another article is that you don't then have to
repeat all of it. So if B is cited by A, and B is no longer available, it's
not really possible to read and understand A either. And likely A used
information in B to justify some claim, which is now weaker.

~~~
Nemo_bis
That's pretty common either way. See Rekdal (2014), "Academic urban legends".
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4232290](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4232290)

