How Internet Archive Is Ensuring Permanent Access to Open Access Journals (archive.org)
235 points by constantinum 4 days ago | 25 comments

One really good cause for a donation or charitable gift is the Internet Archive. Preserving the world's knowledge is a worthwhile endeavor, especially when people here lament the lost hacker zeitgeist of the 90s GeoCities-era Internet.

IA isn't raking in money like Google, and a rare good actor like them could really use support from one of the many fortunate users on this site who won the RSU lottery simply by doing what they love in this unprecedented decade of technology-fueled personal wealth.

> One really good cause for making a donation or charitable gift is the Internet Archive.

I agree. I started donating more than a decade ago. At first, my donations were motivated not by altruism but by self-interest: I was saving a lot of money by downloading out-of-copyright books from the IA rather than buying reprints or used editions, and I wanted to help keep the IA in operation.

I've since come to understand and respect the IA's overall mission more, and my annual donations now are motivated slightly less by selfishness.

IA would be easier to support and promote if its initiatives didn't skirt so close to copyright law. IA can really be grouped into at least four parts: the web archive, public domain (historical) content, user-submitted content, and "other" such as these initiatives. Of course it isn't always 100% clear-cut, but I do hope the different parts become more strongly separated.

Is there any way to do meaningful preservation work without "skirting so close to copyright law"?

Well, it depends. You can do a lot of preservation/archival work without publicly distributing the artifacts. Public domain works and works with explicit grants from copyright holders can also be distributed, but someone needs to do the hard work of establishing provenance and clearing the copyright status.

Neither of these approaches is obviously very useful for web archival (the Wayback Machine), so there they already need to step closer to the fire. But it seems they can manage it by being very responsive to requests from owners and providing controls for website operators, which helps.

The things I personally find more objectionable are projects like this https://blog.archive.org/2019/10/13/2500-more-ms-dos-games-p... where the copyright status is pretty unambiguous and where, unlike with websites, the content was never distributed publicly for free before. Nor is public distribution really as material to the nature of the content as it is for websites.

The sections that are really inviting trouble are the "community audio/video" collections. As far as I can tell, those are completely uncurated and unmoderated. Unsurprisingly, they are full of crap; I don't know what they were/are thinking in keeping those open. I find it a bit difficult to see how absorbing all that is doing anyone any good.

It would be super interesting to overlay the scale of efforts like this one, which follow publisher licenses, with those of Sci-Hub, which ignores them.

The amount of effort we put into curation because of licenses, and the amount of content lost because of our ownership policies, seem to be coming more clearly into view as we consider the costs those choices incur.

Thanks Bryan! I'm more interested in the old papers, both public domain and out of commerce, and we have quite some work left to do...

> 1,150,246 30.22% no known independent preservation


I love the fatcat, this coverage graph is so nifty. :)

This is a good thing to do, but for once, I'm not sure the Internet Archive is the right entity to do it. I don't think the IA can ensure permanent access to anything at all right now since it's facing an existential threat as a result of its "lend copies of books that belong to other institutions" stunt.

Choosing any single "right entity" is the wrong strategy for truly robust preservation.

Redundancy across a variety of organizations and technical regimes best ensures durability against all risks. So the IA's efforts here should be encouraged, along with those of every other project able to take on similar duties.

And if you're vaguely implying that the IA is inadequate, please at least hint at potential alternatives that are or could be doing the necessary work instead. Note that anyone concerned about IA's longevity can and should leverage IA's groundwork, such as by mirroring its collections elsewhere, which is fairly straightforward either via explicit coordination or even dark/uncoordinated scraping.
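To sketch what uncoordinated mirroring could look like: the Internet Archive exposes public endpoints for item metadata (https://archive.org/metadata/{identifier}, which includes a file list with checksums) and direct file downloads (https://archive.org/download/{identifier}/{filename}). A minimal helper might just build those URLs for a given item; the item identifier and filename below are purely illustrative.

```python
# Sketch: build the public URLs needed to mirror an Internet Archive item.
# The /metadata and /download endpoints are IA's public API; the
# identifier "someitem" and filename "paper.pdf" are made-up examples.

BASE = "https://archive.org"

def metadata_url(identifier: str) -> str:
    """URL of the JSON metadata (file list, checksums) for an item."""
    return f"{BASE}/metadata/{identifier}"

def download_url(identifier: str, filename: str) -> str:
    """Direct download URL for one file within an item."""
    return f"{BASE}/download/{identifier}/{filename}"

if __name__ == "__main__":
    # A mirroring job would fetch the metadata, walk its file list,
    # download each file, and verify it against the listed checksum.
    print(metadata_url("someitem"))
    print(download_url("someitem", "paper.pdf"))
```

Verifying each downloaded file against the checksum in the metadata is what makes an uncoordinated mirror trustworthy without any help from the IA itself.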

Your point about avoiding single points of failure is well made, but one of the problems with the constant focus on the admirable efforts of the Internet Archive, here and elsewhere, is precisely that the many other archiving and digital preservation efforts go relatively unheralded and underfunded.

Having said that, the spotlight on the IA does give an opportunity to raise the profile of web archiving and digital preservation generally; we just need to be careful not to invest everything in any one initiative, as the parent comment fairly points out.

> one of the problems with the constant focus on the admirable efforts of the Internet Archive here and elsewhere is precisely that the many other archiving and digital preservation efforts go by relatively unheralded and underfunded.

Any chance you could point me to some other initiatives? I'm always interested in finding more repositories of information.

Examples (these have followed the trailblazing Internet Archive):


The International Internet Preservation Consortium is an active body linking many of them.


There are various resources giving information on tools etc., e.g.



That is an amazing list. Favorited.

Yeah, archives can be affected by multiple threats. Fire, natural catastrophes, etc. are one kind; those can be mitigated within the organization by creating redundant copies in different places. There are organizational threats as well, affecting the structure running the archive: the guardians of the archive being unable or unwilling to preserve parts or all of it. The lawsuit is such an organizational threat.

> please at least hint at potential alternatives who are or could be doing the necessary work instead.

For this particular use case, Perma.cc has significant institutional backing.



> existential threat


"it only asks for a halt to the practice of copying books for loan in the Open Library itself, not the entire IA"

"the lawsuit takes pains to clarify that the publishers aren’t trying to shut down the rest of the Internet Archive"

"the lawsuit seeks financial damages only for the sharing of 127 books under copyright"

"the most the Internet Archive would have to pay would be $19 million — essentially equivalent to one year of operating revenue, according to IA tax documents. That’s a huge setback, but for the IA, a tech nonprofit that relies heavily on grants and public donations, it’s not the major death blow it might seem to be."

After near-term existence is confirmed, long-term existence can be ensured by legally firewalling the global public web archive from more adventurous business units.

I don't think there should be any single entity responsible for archiving. If you want something preserved there needs to be multiple copies across different organisations.

LOCKSS: A Permanent Web Publishing and Access System by Vicky Reich and David S. H. Rosenthal describes a practical system which guarantees survival of published articles even when the hosting journal disappears.
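The core LOCKSS idea ("Lots Of Copies Keep Stuff Safe") is that many libraries each hold a copy and periodically audit each other, repairing any copy that disagrees with the majority. Here is a toy illustration of that audit-and-repair step only; the real protocol uses sampled, rate-limited polls between peers to resist attack, which this sketch does not model.

```python
# Toy LOCKSS-style audit: peers "vote" by content hash; any copy that
# disagrees with the majority is repaired from a majority peer.
import hashlib
from collections import Counter

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def audit_and_repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return copies with every minority (corrupted) copy replaced by
    the majority version. Assumes a simple majority of copies is intact."""
    votes = Counter(digest(d) for d in copies.values())
    winner, _ = votes.most_common(1)[0]
    good = next(d for d in copies.values() if digest(d) == winner)
    return {peer: (d if digest(d) == winner else good)
            for peer, d in copies.items()}

# Three libraries hold the article; one copy has silently rotted.
copies = {"lib-a": b"article v1", "lib-b": b"article v1", "lib-c": b"artic1e v1"}
repaired = audit_and_repair(copies)
assert repaired["lib-c"] == b"article v1"
```

The point of the design is that survival requires no single trusted custodian: as long as a majority of independent copies stays intact, damage at any one institution is detectable and repairable.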

See http://citeseerx.ist.psu.edu/viewdoc/download?doi=

Link back to recent discussion on this topic: https://news.ycombinator.com/item?id=24422593

I'm curious: is the IA also archiving YouTube videos?

Maybe they should just make all closed access journals available also! What could possibly go wrong?

No skin in the game but...

Sounds good to me.

My understanding is that most authors will send you a copy if you contact them. If the authors are okay with open access, doesn't that make paid journals just rent seekers?

But journals help support peer review, selecting high quality papers, etc. Their value proposition exists - if it’s worth the premium is up for debate, but still.

Many people outside academia still don't know, so worth repeating: reviewers get no payment. That's right, they do it for free, or more precisely, it's considered part of their job and is therefore paid by their institution (their normal salary).

If peer review is the value proposition justifying paywalls, then be aware that none of the money goes to the people providing that value.

When you buy an article through a paywall, you are not paying the author of the article, and you are not paying the peer reviewers. Why does it still cost so much? Because they can get away with it. Academia is conservative and changes very slowly, because all actors want to preserve their prestige and status, and any change may upset the current balance of power. People are used to this system and have learned to play by its rules (journal reputations, publication counts, citations, metrics like that).

