
How Internet Archive Is Ensuring Permanent Access to Open Access Journals - constantinum
https://blog.archive.org/2020/09/15/how-the-internet-archive-is-ensuring-permanent-access-to-open-access-journal-articles/
======
Hydraulix989
One really good cause for making a donation or charitable gift is the Internet
Archive. Preserving the world's knowledge is a worthwhile endeavor, especially
when people here lament the lost hacker zeitgeist of the '90s GeoCities-era
Internet.

IA isn't raking in large amounts of $$$ like Google, and a rare good actor
like them really could use some support from one of the many fortunate users
on this site who won the RSU lottery by simply doing what they love in this
unprecedented decade of technology-fueled personal wealth.

~~~
zokier
IA would be easier to support and promote if they weren't skirting so close
with copyright law with their initiatives. IA really can be grouped into at
least 4 parts: web archive, public domain (historical) content, user submitted
content, and "other" such as these initiatives. Of course it is not always 100%
clear-cut, but I do wish the different parts were more strongly separated.

~~~
Nemo_bis
Is there any way to do any meaningful preservation work without "skirting so
close with copyright law"?

~~~
zokier
Well, it depends. You can do a lot of preservation/archival work without
publicly distributing the artifacts. Public domain works and works with
explicit grants from copyright holders can also be distributed, but someone
needs to do the hard work of establishing provenance and clearing the
copyright status.

Obviously, neither of these approaches is very useful for web archival
(Wayback Machine), so there they already need to step closer to the fire. But
it seems they can manage it by being very responsive to requests from owners
and providing controls for websites, so that helps there.

The things I personally find more objectionable are projects like this
[https://blog.archive.org/2019/10/13/2500-more-ms-dos-games-p...](https://blog.archive.org/2019/10/13/2500-more-ms-dos-games-playable-at-the-archive/)
where the copyright status is pretty unambiguous and which, unlike websites,
were not previously distributed publicly for free. Nor is public distribution
really as material to the nature of the content as it is for websites.

The sections that are really inviting trouble are the "community audio/video"
collections. As far as I can tell, those are completely uncurated and
unmoderated. Unsurprisingly, the sections are full of crap; I don't know what
they were/are thinking in keeping those open. I find it a bit difficult to see
how absorbing all that is doing anyone any good.

------
willscott
It would be super interesting to overlay the scale of efforts like this one
that follow publisher licenses with those of scihub that ignore them.

The amount of effort we put into curation due to license and the amount of
content lost because of our ownership policies seem to be coming more clearly
into view as we consider the costs those choices incur.

------
Nemo_bis
Thanks Bryan! I'm more interested in the old papers, both public domain and
out of commerce, and we have quite some work left to do...

> 1,150,246 30.22% no known independent preservation

[https://fatcat.wiki/coverage/search?q=year%3A%3E1800+year%3A...](https://fatcat.wiki/coverage/search?q=year%3A%3E1800+year%3A%3C%3D1925+\(type%3Aarticle-journal+OR+type%3Aarticle+OR+type%3Apaper-conference\))

I love the fatcat, this coverage graph is so nifty. :)

------
causality0
This is a good thing to do, but for once, I'm not sure the Internet Archive is
the right entity to do it. I don't think the IA can ensure permanent access to
anything at all right now since it's facing an existential threat as a result
of its "lend copies of books that belong to other institutions" stunt.

~~~
gojomo
Choosing any one entity as "the right entity" is the _wrong_ strategy for truly
robust preservation.

Redundancy in a variety of organizations & technical regimes best ensures
durability against all risks. So the IA's efforts here should be encouraged,
_and_ those of every other project able to take on similar duties.

And if you are vaguely implying that the IA is inadequate, please at least hint
at potential alternatives that are or could be doing the necessary work instead.
Note that anyone concerned about IA's longevity can and should leverage IA's
groundwork – such as by mirroring its collections elsewhere, as is fairly
straightforward either via explicit coordination or even dark/uncoordinated
scraping.

~~~
mellosouls
Your point about avoiding a single point of failure is well made, but one of the
problems with the constant focus on the admirable efforts of the Internet
Archive here and elsewhere is precisely that the many other archiving and
digital preservation efforts go by relatively unheralded and underfunded.

Having said that, the spotlight on the IA does give opportunity to raise the
profile of web archiving and digital preservation generally; we just need to
be careful not to invest everything in any one initiative - as the parent
comment fairly points out.

~~~
Shared404
> one of the problems with the constant focus on the admirable efforts of the
> Internet Archive here and elsewhere is precisely that the many other
> archiving and digital preservation efforts go by relatively unheralded and
> underfunded.

Any chance you could point me to some other initiatives? I'm always interested
in finding more repositories of information.

~~~
mellosouls
Examples (these have followed the trailblazing Internet Archive):

[https://www.wikipedia.org/wiki/List_of_Web_archiving_initiat...](https://www.wikipedia.org/wiki/List_of_Web_archiving_initiatives)

The International Internet Preservation Consortium is an active body linking
many of them.

[https://netpreserve.org/](https://netpreserve.org/)

There are various resources giving information on tools etc., e.g.

[https://github.com/iipc/awesome-web-archiving](https://github.com/iipc/awesome-web-archiving)

[https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Comm...](https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community)

~~~
Shared404
That is an amazing list. Favorited.

------
drallison
LOCKSS: A Permanent Web Publishing and Access System by Vicky Reich and David
S. H. Rosenthal describes a practical system which guarantees survival of
published articles even when the hosting journal disappears.

See
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.7907&rep=rep1&type=pdf#page=430)

------
bnewbold
Link back to recent discussion on this topic:
[https://news.ycombinator.com/item?id=24422593](https://news.ycombinator.com/item?id=24422593)

------
amelius
I'm curious, is IA also archiving YouTube videos?

------
google234123
Maybe they should just make all closed access journals available too! What
could possibly go wrong?

~~~
Shared404
No skin in the game but...

Sounds good to me.

My understanding is that most authors will send you a copy if you contact
them. If the authors are okay with open access, doesn't that make paid
journals just rent seekers?

~~~
hazz99
But journals help support peer review, selecting high quality papers, etc.
Their value proposition exists. Whether it’s worth the premium is up for
debate, but still.

~~~
bonoboTP
Many people outside academia still don't know this, so it's worth repeating:
reviewers get no payment. That's right, they do it for free, or more precisely,
it's considered part of their job and is therefore paid for by their
institution (their normal salary).

If peer review is the value proposition justifying paywalls, then be aware
that none of the money goes to the people providing that value.

When you buy an article through a paywall, you are not paying the author of
the article and you are not paying the peer reviewers. Why does it still cost
so much? Because they can get away with it. Academia is conservative and
changes very slowly because all actors want to preserve their prestige and
status, and any change may upset the current balance of power. People are used
to this system; they've learned how to play by its rules (journal reputations,
number of publications, citations, metrics like that).

