Rescue Mission for Sci-Hub and Open Science: We are the library (reddit.com)
408 points by diplodocusaur on May 14, 2021 | 52 comments



It’s amazing how consistently broken our legal and social systems are when it comes to dealing with public goods. You’d think we would have found some way to make all these papers easily accessible in a sustainable way by now.

Torrenting or hosting on ipfs solves a part of the problem, but it pushes the social and legal norms even farther apart. But perhaps the path towards a longer term solution is to lead with that social fix, and then let the law catch up with society? Feels gross, but it also feels substantially more feasible than fighting by someone else’s rules.


This is shaping up to be a real showdown between the People and the type of regulatory capture / institutionalized corruption by private industry that has run amok for many years.


The thing that's quite astonishing to me is the incredible level of involvement by the deep state on this issue in particular. I mean, obviously a country's interests are the interests of its business community, but I wouldn't naturally think that sci-hub falls under the realm of things that, say, the FBI would be concerned with.


The FBI has long been used to enforce intellectual property law; their involvement in something like this isn't terribly novel.


E.g., the FBI anti-piracy notices on VHS, DVD, and Blu-ray content[0].

[0] https://www.slashfilm.com/fbi-warning-front-dvds-updated-uns...


> deep state

Your comment is reasonable, but as an FYI this term makes me picture a tinfoil hat on your head.


[flagged]


Less of this, please.


Why? It's the state of deep ASCII-Art! :-)


There are libgenesis torrents which include 77TB of scihub papers, see

https://www.reddit.com/r/DataHoarder/comments/8ky647/scihub_...

Archivists were asking for help with torrent seeding:

https://www.reddit.com/r/seedboxes/comments/e129yi/charitabl...

Also see https://news.ycombinator.com/item?id=27092466


I'm pretty sure that SciHub and LibGen downloadable archives already contain all the torrent files.


It's worth emphasizing that this is merely the first step in an attempt to provide a long-term backup of the archive. Besides that, the project appears to need help in order to restart downloading new papers, which was apparently discontinued in December 2020, and maybe also a more robust storage solution for its live operation.


According to a recent discussion (https://news.ycombinator.com/item?id=27088530), the halt in adding new papers seems to be voluntary and motivated by complying with a request in an Indian court process.

Under the circumstances I'm sure they could use help and development with that stage anyway, but it's not clear to me that whatever system was in place until last year has been put out of commission more permanently.


Why is sci-hub complying with Indian courts and worried about American law enforcement if sci-hub is in Russia?


Alexandra mentioned that she didn't want this project to be illegal forever.

Why India specifically? Probably because the same court has previously decided that photocopying books for personal use is fair-use, as long as no profit is being made.

Being legal in India would be a huge benefit. sci-hub could be hosted on a .in domain with servers hosted in India. No more cat and mouse game.


I am really eager to have a custom search engine for the content of the papers. What is the cheapest 100TB solution one could get?


I would go for a 4U rack case that fits loads of drives and fill it with shucked [1] external hard drives. Just add a cheap motherboard and some PCIe SATA expansion boards and you're good to go. [1] https://youtu.be/iMpCnIr622M


https://www.hetzner.com/dedicated-rootserver/matrix-sx

~180 EUR a month for 160 TB. Would need RAID so the 224 TB for 300 EUR a month would be better.
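
To make the comparison above concrete, here is a rough sketch of the cost per terabyte for the two quoted offers. The prices and capacities come from the comment. The 75% usable fraction is purely an illustrative assumption, since the comment does not say which RAID level is meant.

```python
def cost_per_tb(monthly_eur: float, raw_tb: float, usable_fraction: float = 1.0) -> float:
    """Monthly cost in EUR per usable terabyte."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return monthly_eur / (raw_tb * usable_fraction)


# Figures quoted in the comment: ~180 EUR for 160 TB, 300 EUR for 224 TB.
small_raw = cost_per_tb(180, 160)
large_raw = cost_per_tb(300, 224)

# Assumption (not from the thread): redundancy leaves 75% of raw capacity usable.
ASSUMED_USABLE_FRACTION = 0.75
large_usable_tb = 224 * ASSUMED_USABLE_FRACTION
large_with_redundancy = cost_per_tb(300, 224, ASSUMED_USABLE_FRACTION)
```

Plugging in a different usable fraction shows how much redundancy either offer can absorb before it falls below the 100 TB asked about.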


These are not available; I don't see any high-storage options for sale at Hetzner.


I know libgen has a desktop client, but it still downloads files from the server. It would be nice if a libgen/scihub client could search and then download from torrents.


This is a great idea. Extending torrents.csv is a potential solution, but it requires metadata.


Downloaded a few TBs worth. Hope it helps!


Pretty sure it's the seeding, not downloading, that helps. :)


Is any part of the library stored in the IPFS network?


IPFS seems to be mostly hype; it's never really worked for me when I've tried to store even a relatively tame dataset.

There's a reason it hasn't taken over the torrenting/piracy world and it's not because those people are slow to adopt new technology.


I just learned about IPFS this week; it sounded really cool and I was planning on learning more about it. But as my time is limited, I'd want to know if it's worth the time investment. Which parts of IPFS didn't work for you? Was it too slow? Missing features? No integrations? Or plain broken and ridden with bugs?


It's not censorship-resistant. They have technical solutions in place to directly ban/blacklist certain content hashes which can get served to the servers directly by national security agencies.

A dystopian heaven.

IPFS had a really cool idea at the start but they quickly bent under legal pressure without even trying to fight back. They have lost all legitimacy in the circles of the people who want to distribute potentially sensitive data that's only deemed illegal by corporate interests (like scientific papers).

---

And as the sibling poster alluded to, the network can be extremely slow. Even for well-known content hashes I often had a 5-10 minute preliminary phase where not a single byte was downloaded, only to finally start downloading 500MB at 20KB/s... And I have a gigabit connection.


Do you have more info on this? What's the specific mechanism used here? And do you have any links about the legal issues?


The first mirror on library genesis often has a link to an IPFS option, which is usually the fastest option.

The “problem” with ipfs is that without incentivizing someone to pin (basically to seed) your file, it will be slower to access/will not necessarily remain alive if you yourself stop hosting it. But it’s the same with torrent. It’s actually really cool that it has public gateways so you don’t need anything specific for access.


I wonder how people working for Elsevier and Nature could look at themselves in the mirror.


I have a distant relative who works for Springer Nature, and previously worked for Elsevier.

She's a nice person who I rather like. I have tried discussing the ethics of her employer's journals business in a polite way. She told me what I felt were the company's "talking points" - "well, publishing is actually quite expensive, we publish many more papers than we used to, etc" - all of which I think are not valid justifications - e.g. look at the huge profit margins of Elsevier. And yet I think she believes them, or at least "chooses" to believe them.

I find it rather depressing that someone, who is otherwise a fine human being, chooses to work there. I know quite a few other people in the same situation, in various other businesses.

Anyway, I thought it was worth mentioning what some of the people "on the dark side" are like, and that in the mirror they look rather like me and you in most respects. My opinion is that at work they are in an environment of people who all choose to believe the same thing, and so what they do is "normal" and "acceptable".


> I wonder how people working for Elsevier and Nature could look at themselves in the mirror.

When they do they don’t see anything.


You could ask the same about anyone who submits their articles there. Because, you know, we could all revolt and submit our manuscripts only to the open-access journals.


Some kind of proof-of-disk-space cryptocoin could be useful for this...


That'd be Filecoin: https://filecoin.io/


We need to store these files forever publicly. Isn't this the exact value-prop of Filecoin? And if not Filecoin, some blockchain solution?


All that is needed here is distributed storage. Whether or not it has anything to do with blockchain is immaterial. The value add, if any, of blockchain is verification of the provenance of the files stored on the blockchain, but if a third party is copying scientific papers in to begin with, the chain of custody has already been disrupted. Theoretically, scientists could digitally sign and upload their own work, but they can already do that. The reason they publish to journals instead is that the actual scientific validity of published work isn't verified by being of known provenance, but by peer review.

So unless you have some idea of how to do peer review by blockchain.

I mean, that's probably not impossible, but good luck. Since the idea would be preventing anybody from making money off of it, it kind of goes against what blockchain is actually used for. Scihub is trying to create abundance, not scarcity.


A site that just publishes metadata and SHA-256 hashes for published papers could bootstrap this effort and might actually be legal, or at least would be an interesting court case to follow. I wonder if there have been any interesting legal decisions for that sort of thing?
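
As a minimal sketch of what such a metadata-plus-hash record could look like, the snippet below hashes a local file and builds one entry. The field names (`doi`, `title`, `size_bytes`, `sha256`) and the JSON-lines layout are assumptions made for illustration, not anything the thread or an existing site specifies.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path: Path, doi: str, title: str) -> dict:
    """Build one record; the schema here is hypothetical."""
    return {
        "doi": doi,
        "title": title,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
    }


def to_json_line(entry: dict) -> str:
    """Serialize a record as one line of a JSON-lines manifest."""
    return json.dumps(entry, sort_keys=True)
```

Anyone holding a copy could recompute the hash and compare it against the published record. Two copies match only if they are byte-for-byte identical, so differently watermarked PDFs of one paper would produce different hashes.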


If nothing else it would be fascinating to see the variation due to watermarking...


Filecoin isn't really useful, is it? It suffers from the same lack of real censorship resistance as IPFS, on which it is based. It also suffers in that the barrier for entry for new nodes is incredibly high (I didn't know this until recently myself when I was considering throwing part of a NAS at it and found out they wanted me to have way more than just 16GB of RAM to participate). Freenet is more appropriate as it provides cover for people holding the information on their nodes that IPFS does not.

Additionally, IPFS has the same problem as torrent files in that it requires additional layers (which aren't all feasible or effective) to obfuscate the source of data. If sharing it has been declared, in some sense, illegal then IPFS will reveal who is sharing it. And once those nodes are forcibly removed (because governments are more capable of doing that than random individuals) then the data can disappear.


> It suffers from the same lack of real censorship resistance as IPFS, on which it is based.

I guess, then I2P should be the solution?


IPFS is what you're looking for, and there's already an IPFS site for LibGen, so SciHub is probably not far behind.

Note: Filecoin despite advertising itself for years as an incentivisation layer for IPFS is not - it's two separate networks [1], so it lacks all the features of the mature IPFS protocol.

[1] https://github.com/filecoin-project/specs/issues/1191


> Isn't this the exact value-prop of Filecoin

Yes, exactly. Would love being able to use Filecoin or any of the alternatives (Storj, Sia and those) but they are all very immature right now, both UX and stability wise. If it's not hard to get the files into the networks, it's hard to get them out again.

IPFS could work as it's not ruined by the whole blockchain mess, but then you might as well use torrents (which people are successfully using already for Sci-Hub).


Filecoin has massive overhead. It also relies on exchange of funds to keep the files around. If you want to VOLUNTEER to store the files for others for free, then Filecoin adds needless overhead!

And yeah, torrents already let you do this.


Torrents are not suited for websites at all. There are however numerous websites which are available on IPFS.


Sci-Hub is not "a website" though. It's an archive of scientific papers that can be served by a static file server, IPFS or torrents. Have the DOI be the filename of the paper, and now you even have DOI-lookup functionality in your static file server.

Sure, you could do that over IPFS. But if you really want scale, censorship-resistance and wide-spread usage/storing of it, you'll use torrents (today, maybe future will be different)
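
The DOI-as-filename idea above could be sketched roughly as follows. DOIs contain a `/`, so some escaping is needed before one can serve as a single filename. Percent-encoding, lowercasing, and the `.pdf` suffix are assumptions chosen for this illustration, and the function names and example DOI are hypothetical.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import quote, unquote


def doi_to_filename(doi: str) -> str:
    """Map a DOI to one flat filename by percent-encoding reserved characters."""
    # DOIs are case-insensitive, so lowercase to get a single canonical name.
    return quote(doi.strip().lower(), safe="") + ".pdf"


def filename_to_doi(name: str) -> str:
    """Invert doi_to_filename (up to the lowercasing)."""
    if not name.endswith(".pdf"):
        raise ValueError("expected a .pdf filename")
    return unquote(name[: -len(".pdf")])


def lookup(root: Path, doi: str) -> Optional[Path]:
    """Return the stored file for a DOI, or None if it is not in the archive."""
    candidate = root / doi_to_filename(doi)
    return candidate if candidate.is_file() else None
```

With this layout a lookup is a single path check and needs no database. How a particular static file server or gateway treats the escaped names is not covered here.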


IPFS is built on torrent technology so it has those same properties. The advantage here is that IPFS additionally handles the front-end - the website, search and any future features that might be added like a blog or forum.


IPFS is not "built on torrent technology", it does borrow some of the ideas that Bittorrent also uses. And no, IPFS won't automatically give you search and other things, you have to build those yourself, same as if you used Bittorrent.



That is more recent and more substantive, so we changed to it from https://www.reddit.com/r/scihub/comments/awlc4s/full_archive.... Thanks!


For those confused, the above is a current post, but not in fact the old.reddit.com link to the submitted article.

That would be https://old.reddit.com/r/scihub/comments/awlc4s/full_archive...

The link lousken suggests seems to me more appropriate and useful however. (I've suggested this to mods via email.)


(2019).


Submitted URL was https://www.reddit.com/r/scihub/comments/awlc4s/full_archive..., since changed to a URL from this year.



