Sci-Hub: Removing barriers in the way of science (sci-hub.io)
760 points by kasbah on Feb 13, 2016 | 217 comments

Previous HN discussion (article has neat details of how sci-hub works): https://news.ycombinator.com/item?id=11074638

There was another shortly before that: https://news.ycombinator.com/item?id=11070192.

We're bending the rules not to treat the current post as a duplicate, but the community interest seems stronger than usual.

I am happy she is doing this. The price that paywalls are charging for accessing research papers is unacceptable: the money does not flow back to the researchers, nor to the institutions. Most of the paywalled research has been sponsored by taxpayer money and hence should be publicly accessible or at a very low "maintenance fee".

I work at a small academic database as a developer. We charge a pittance compared to what a lot of the larger players in this market do. And I'm conflicted.

We are a registered non-profit, and are actively losing money. There are developer costs (for only two developers), building costs (I also shovel snow!), marketing costs, hosting costs, and much more. Our workload would justify a staff three times the size; currently we are at 12. Years ago we had a fully functional publishing operation too.

But here is the thing. These open access places often are pay to publish. This is built into many grants today. Publishing costs. But not so much in the field I'm in. The social sciences. So sure, the result of that work should be open access. I just cannot help thinking this takes away from what you could be paying one more lab or research assistant.

A lot of these paywalled sites are struggling to survive as library budgets dwindle. Maybe there isn't much room left for the little players. Many have been bought up or have folded over the last 20 years.

This is a social sciences problem, not a large player vs small player problem. At Sage, which is one of the largest social science publishers, we also lose money on social science OA publishing. We've tried for years experimenting with different APC prices and have struggled to figure out how author pays will work in the social sciences. Shoot me an email (details in profile) if you want to chat about this stuff more. It's rare to find someone with experience doing social science OA publishing on HN!

What is the unique value proposition in having your two developer team building another database?

I'm struggling to think of a good reason to have journals or databases be anything other than a PDF dump, and peer review already happens on other platforms.

Where are all those hundreds of millions of dollars going, from university subscriptions??

Simply by doing this, she has become one of the greatest single contributors to science as a whole in our era. Heroic.

She? Is there someone publicly affiliated with the project?

Yes, Alexandra Elbakyan, a researcher from Kazakhstan.


>Despite seizure of the websites as ordered by a New York district court on October 28, 2015, the site is still accessible through alternative domains as of December 2015.

So a court in the United States could seize a site just because the plaintiff was based there?

Or because the .org domain registry (PIR) is based in the US?

A little bit of both. A US citizen can sue foreign nationals in US courts, but a judgment can't be enforced unless the foreign national sets foot in the US or has a US-based asset. In this case, the .org domain is controlled on US soil, so that's about all the court can do unless some crazy extradition arrangement is made or she accidentally forgets and comes to the US for a conference or something.

The US says it can seize all domains whose registrars are based in the US, so .com, .org, .net, etc are all vulnerable.

It's amazing that retaliations following this initiative would make it close to impossible to do for anyone in the United States. Kudos to her.

Yes, Russian neuroscientist Alexandra Elbakyan.


Actually, the surname seems to be Armenian. :-)

Attributing the correct ethnicity seems to matter a lot more outside of the US, Canada, and Commonwealth.

I got an earful (screenful?) for calling a historical ethnic Russian painter Ukrainian, just because he was born in current-day Ukraine. This all predates the current conflict, and was from someone I'd consider educated, too.

In my experience US, Canada and Commonwealth are the ones obsessed with (positively or negatively) ethnicity and race.

Race, yes, but nobody born here asks me "where are you from?" upon first meeting, whereas I get this a lot from Europeans and people from more tribal societies.

I'm an American, and I get asked "where are you from?" almost every time I meet a new person here (in America). I look very non-American and have an accent most people find hard to pin down, so I'm thinking that the question is just not posed to people who look and 'speak' very American.

Well, I'm Canadian, so we can't really compare. I look brown, but have a basically Toronto / SF accent.

Ethnic Russian from Kazakhstan.

I've used sci-hub a few times. It's a little buggy and not every article can be accessed, but it works well enough to try when I'm not on campus.

$30 to read a single article is ridiculous anyway and presents a barrier to scientists who don't have, can't afford, or don't want to pay for access. I hope sci-hub stays up and improves for some time.

If you're a student, your university likely has a proxy service that allows you to access journals from anywhere using your credentials.

My previous uni gives access to alumni for life, so I can access journals for free from wherever.

What about poor countries who can't afford to pay for many subscriptions in the field? Should their scientists be banned from making contributions to science?

Spain is not very poor, but when I was doing research in Seville, the department simply could not afford subscriptions to most physics journals, so we had to write directly to the authors with mixed success.

How do you think a scientist in Senegal would feel about spending $6,000 [0] for access to a single journal for 5 researchers?

If the per-capita GNI is $1,000 [1], the equivalent in the US would be for your university to pay $330,000 for 5 people in your department to read _a single_ journal!!

[0] http://store.elsevier.com/product.jsp?issn=03784371 [1] http://data.worldbank.org/indicator/NY.GNP.PCAP.CD/countries...
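A quick sketch of the arithmetic behind the $330,000 figure, for anyone who wants to check it. It scales the subscription price by the ratio of per-capita GNI. The $6,000 price and the $1,000 Senegal GNI come from the comment; the US per-capita GNI of $55,000 is an assumption, chosen because it is the value the comment's own numbers imply. The function name is made up for illustration.

```python
def income_equivalent_price(price_usd, local_gni_per_capita, reference_gni_per_capita):
    """Scale a price by the ratio of per-capita incomes.

    Answers: "what would this price feel like in the reference country
    if it took the same share of per-capita income?"
    """
    return price_usd * reference_gni_per_capita / local_gni_per_capita


# Figures quoted in the comment above.
journal_price = 6_000      # USD, one journal, 5 researchers
senegal_gni = 1_000        # USD per capita

# Assumption: rough US per-capita GNI implied by the comment's result.
us_gni = 55_000            # USD per capita

equivalent = income_equivalent_price(journal_price, senegal_gni, us_gni)
print(f"US-equivalent price: ${equivalent:,.0f}")
```

Swapping in a different reference income changes the result proportionally, so the exact figure depends on which year's World Bank data you use.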

I agree completely. Open access is really the only way to go if we want scientific knowledge to be equally accessible everywhere.

However, if you were fortunate enough to attend a college that does have access, you should take advantage of it.

Probably a researcher in Senegal has access through this: http://www.research4life.org/about/

I specifically chose the journal above, because it's not included in that initiative.

Research4life only gives access to what the first world thinks the third world should concern itself with. Namely <<leading journals and books in the fields of health, agriculture, environment, and applied sciences>>. I think it's a generous initiative, but it misses the point of open access.

If you're poor and passionate about math, theoretical physics, or some branches of computer science ... then, sorry, these journals are only for the rich kids.

Even with that, every uni does not subscribe to everything. You can still find important papers behind paywalls.

In my uni we had pages-long emails for the lab explaining the different procedures (at least 4, e.g. access with some other proxy, etc) to try to get access to an article before requesting it.

I frequently took advantage of this when I was a student. It was amazing that I could bypass any paywall and get access to all the journals/articles that I needed either for my courses or my own curiosity.

Unfortunately, that great privilege was revoked when I stopped paying tuition (aka graduated).

And what are people outside of the U.S. supposed to do?

Agreed. I studied in Germany, and none of the academic institutions I know of make papers accessible to alumni. For students, sure, but for alumni? Never heard of it.

The local hack: get a lifetime membership at the uni computer club. This gets you a shell account on a computer inside the uni network, and Rob's your uncle. Lifetime cost is about $100 here, and that money goes to a good cause.

My uni didn't have any of that. YMMV

I'm in Australia and we have the same service. But it is bullshit that academic papers are hidden behind paywalls.

From a Norwegian IP you can get access to many medical journals, see http://www.helsebiblioteket.no/om-oss/english. This can be achieved through Tor if you choose to exit only through a node in Norway.

The GP is a great person to ask that question to in light of the fact that he is a student in UAE. I am curious why you assumed that GP is from the US?

Let me guess: Because the spoken language is English and the site has a clearly US-centric culture? Even more specifically, an SV culture. Have you ever seen a housing article about Fresno on the front page? New York? One could argue the bias lacked utility, but it's hardly incriminating.

I'm living in the United Arab Emirates. My previous college has subscriptions in most journals. My current institute also has access.

I don't think it is unique to the US... I'm in Canada and also have that.

Not from anywhere. Most major publishers, sure, but some journals are a little esoteric or very specific in their discipline and not part of the main journal subscription collections.

> My previous uni gives access to alumni for life, so I can access journals for free from wherever.

Huh. Need to check on this, thanks. That said I hope sci-hub succeeds.

Instead of paying the publishers, who contribute nothing to the work at all (remember they also charge us researchers hundreds of dollars just to get our work published), I'd rather donate the same $30 to sci-hub.

Another option: pay $5 to actual editors and designers, and $15 to reviewers. And save $10. Remember, the publisher typically doesn't even help with paper formatting, never mind copyediting.

Unpaid reviewers (who allegedly agree to work for some 'reputation'; BS, they're anonymous) delegate the actual scientific review to the least busy student among those capable of writing a syntactic semblance of a positive review.

And then you're forced to pay $30 to have a chance to finally review it for yourself, as you're the only one interested in quality.

Please pay the reviewers, or everybody (in many applied areas at least) will self-publish in blogs and judge quality on HN votes, as has happened with most software research.

Where do you get good editing and design work for $5?

Most reviewers don't want $15 from the author. Quoting http://www.senseaboutscience.org/pages/peer-review-survey-20... :

> Reviewers divided over incentives: Just over half of reviewers think receiving a payment in kind (e.g. subscription) would make them more likely to review; 41% wanted payment for reviewing, but this drops to just 2.5% if the author had to cover the cost. Acknowledgement in the journal is the most popular option.

I suspect the parent meant $5 per sale; if you were selling millions of copies of papers, you'd have a substantial editing and design budget.

Ahh, that makes sense - I missed the context.

When does it end? I'm reading papers from the 1960s and '70s - where should I send the $5? Do I adjust for inflation? And how do I send Deutsche Marks to West Germany?

Most papers have some to all public funding behind them. Even papers from Stanford and MIT.

Shouldn't the public have free access? We paid for them.

The NIH now requires that NIH-funded research publications be made open-access upon acceptance. They're usually on pubmed central. These tend to be the ugly author manuscripts instead of the typeset format but at least they're available.


No doubt $30 is expensive, but many journals do provide free access to their content in developing countries.

Kindly give a few examples. Tired of seeing $$ to read beyond the abstract.

Not in Russia

Elsevier made more than 3.5 billion dollars in revenue last year. They are trying everything possible to destroy open research. They were behind three bills in the US congress to prevent universities from providing access to pre-publication research. This is research that's been paid for by taxpayer dollars.

Companies with attitudes like Elsevier need to be buried.

It paints a bulls-eye on her back, and its use of the term "piracy" to "free knowledge" doesn't sit well with Western sentiments. While I get that the "booty" she is stealing is the fees that the journals would like to charge you, the acts she is committing are more like a librarian letting people check out books without a library card because she has an infinite supply of said books.

I am hoping that the rent-seeking behavior of the science journals can be used as the canonical example of how copyright can harm the common good.

By endorsing and upholding this egregious use of copyright, our elected officials are clearly causing more harm than good, and the perversion of the spirit of copyright, that an author is granted a temporary monopoly so that they might recover some of their investment, portrays this use as indentured servitude at best, and outright theft at its worst.

So while I don't think anyone is really "harmed" because Disney won't release the original Cinderella or employs measures to keep it from being copied, it is very much the case that this barrier to scientific research means a person or group who might change the world in a positive way if they had access is perhaps even unaware that there is relevant work they cannot get. That is definitely a harm in my opinion.

So I hope that the narrative here, which has been dominated by big media for so long, might get some interjection of a more nuanced understanding of why copyright exists, and how to craft laws that embrace that spirit, rather than the rent-seeking interests of the people who live off the work of others.

The narrative on copyright has been dominated by big media because every political narrative has been dominated by big media and big money.

Those of us who get information from relatively unfiltered and uncontrolled sources via the Internet have long had a different perspective on copyright than those who don't.

I too hope that big media's control of the conversation is coming to an end - but they won't lose that control without a fight.

A long time ago (even before Aaron Swartz), when I was still familiar with the active and rapidly growing filesharing community of the time, I vaguely remember reading about an effort by some of the "ebookers" to plant proxies in various universities' networks that would perform much the same function. I wonder what eventually became of it besides the large paper torrents that appeared, but I wouldn't be surprised if SciHub was related to that in some way. Back then, systems were far more open (as opposed to secured), and something like that was easier than it is today.

> an effort by some of the "ebookers" to plant proxies in various universities' networks that would perform much the same function

Yep they did that, it's called ezproxy.


scihub uses this AFAIK.

Good for them. For the last article that I published, the publisher "value added" consisted of highlighting all the all-caps names in my document and asking me to define them as acronyms. Literally the only thing they did, and it wasn't even right.

Oh, I get so much more out of the publishers: 1.) Long waits during which I worry that I'm going to be scooped. 2.) Lots of typos because Elsevier outsources printing to people who don't speak proper English. 3.) PDFs that do not render correctly in some PDF viewers.

Got a source for #2?

Funnily enough, I was just reading an article from the journal SYSTEM (Sciencedirect/Elsevier) yesterday. Very prestigious journal in my field, but it was littered with typos and mistakes - I figured surely it couldn't be the work of the two authors.

Didn't they also hassle people to review it (for free of course). That's the only thing that would be a bit hard to replace by a totally free service.

They do. They route it to the editor (senior academic), who routes it to appropriate reviewers. I think that the editor does it for free too in many cases. I feel like running ads on the site would pay for the bandwidth, only thing a totally free service couldn't replace is the prestige :(

"Prestige" of a peer-reviewed article in a particular journal isn't the most important thing. What matters is how well-cited your work is. Peer-review is done by contributors to a journal, so you could get the same from a free service.

Publishing in the most prestigious journals is hugely important. The first thing many academic hiring/promotion/grant committees look at is how many Nobel Prizes you've won, the second thing is how many publications you have in Nature or Science. Everything else is just for breaking ties. I've even heard that there are fields where having any conference publications counts against you.

That's not the experience I've had in my faculty. My supervisor has referred to Nature and Science as "shiny PR journals". Besides, citations are what actually matter, because they show how popular your work is within the community. Why else would people who write cool software (GNU Parallel, Scipy, Numpy, etc) all ask for citations for their papers? If all that matters is "getting it published" and not "getting it cited", then they shouldn't care, right?

Citations help you out in the long-time limit for sure; "world's 7th most cited chemist" is definitely an improvement on "another dude with a bunch of Nature papers." But I would imagine that when you are going toward important early-career milestones (postdoc/assistant prof/tenure), they are mostly looking at your recent work, which will not have had time to accrue too many citations unless it is the absolute hottest thing. In those cases, journal ranking may be a bigger differentiator. I don't know how one would fix this.

Actually reading papers for content would be a start, and not just assuming that a first-author Nature paper means you are a genius. Nature editors are human beings who use normal criteria for deciding whether a paper is good. They don't even have subject matter expertise all of the time. It makes no sense to trust the Nature brand above your own judgment as a reader.

There is a larger problem at work, however, of university administration and departments using these sorts of signposts to decide who is worthy. I think widespread managerialism in unis and a poor funding climate are both at fault.

Getting the almost parasitic "managers" off universities should be a top priority goal for the well-educated people. Managers should have no role in any funding, selection or other academic activities. The publishing related rent seeking is a result of poor management of university's academic affairs. The parasitic "managers" at universities, either knowingly or unknowingly, help the rent seeking "managers" at publishing businesses plunder money and thus pressurize academics to bow to the publishers. I know, this is very difficult to achieve but not impossible. Efforts like this (sci-hub) are steps in the right direction. Kudos to her.

Plos or NLM or any number of people will host the content for free; bandwidth is not at issue.

But opening and accessing the data used in the analysis is a big problem.

The big problems are the legal structure strangling us (and the greed and power behind it); we can fly to the moon, making a way to download PDFs is nothing.

Welp, I used to get books, papers and software from non-legal sources when I was in undergrad, because I just couldn't afford it. Now that I make some money I buy most of this stuff. The thing is, without all those resources in the past I couldn't have made it to where I am now. Just my 2 cents.

I'm not sure if we're talking about the same thing here?

Say you need to research some scientific algorithm or other, you paid $40 per paper just to do that? There will be 3-5 papers that are must-read, so $120-$200. Then there's another 5-10 papers that are referenced in the former, you might want to check, just for the few paragraphs that are referenced, which may contain crucial elements of the algorithm you are trying to write (often not explained in full in the original papers).

Even if you have that money to spare, wouldn't your research be hampered by the choice you stand whether that one extra paper at $40 is going to be worth the money?
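To put rough numbers on that, here is a small sketch of the cost range, assuming the flat $40 per paper and the paper counts quoted above. The helper name is invented for illustration.

```python
PRICE_PER_PAPER = 40  # USD, the per-article price quoted in the comment


def cost_range(min_papers, max_papers, price=PRICE_PER_PAPER):
    """Return the (low, high) cost in USD of buying between min and max papers."""
    return (min_papers * price, max_papers * price)


must_read = cost_range(3, 5)      # the 3-5 must-read papers
referenced = cost_range(5, 10)    # the 5-10 papers they reference

# Combined bill if you buy everything in both groups.
total = (must_read[0] + referenced[0], must_read[1] + referenced[1])

print(f"must-read: ${must_read[0]}-${must_read[1]}")
print(f"referenced: ${referenced[0]}-${referenced[1]}")
print(f"total: ${total[0]}-${total[1]}")
```

So chasing the references as well roughly triples the bill for a single algorithm.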

Kudos to you, there is little respect these days for personal decisions in the area of information access. Pirating information is wrong, doesn't matter how you twist it. There is a big difference between creating and promoting free content (great) and trying to break the law to access content that already exists and is subject to copyright law. After all, requesting payment is a deal between publishers and writers. By pirating copyrighted works you're not just breaking the rights of publishers, but also the rights of millions of small content creators.

What do you make of researchers who pirate their own books and tell you never, ever to buy them but to go to libgen? I have met at least 5 world-class scientists who say "I don't make any money on it anyway", "it's a scam", and "I wrote this book to be read". And they say "it's wonderful to imagine that this poor student read my book, and she was afraid to say she downloaded it illegally"...

"Piracy" is a very nuanced subject, depending on what/who you are talking about.

There is no nuance there, these people had a personal choice of publishing their books through a traditional publisher and decided for that. It is just a tradeoff that you need to honor. Maybe next time they will just self-publish and have a book that is truly free without the need of pirating.

We have no real choice. Self-publishing means that the book doesn't count at all in the CV for grant applications, tenure applications, etc. Only the very top scientists that no longer have to fight for all these things can afford to make that "personal choice". For the rest, it's suicidal.

I also have a book at a major publisher, with an outrageous price, and when I saw it "pirated" I only felt joy at the fact that more people will get to read it and thus my work is more meaningful. And I also downloaded it myself, because I actually didn't have it in PDF, only in physical form.

I have published academic books like you, and I never had a problem with this once I understood the consequences of traditional publishing. These books are cataloged in good libraries and available on Amazon. If people don't have money they can go to a library and get a free copy. The day I want a book freely available on the web I will just write one and post it on my web page. I like the idea that authors have the option to go one route or another. Pirating books doesn't enter into this equation.

Going to a library and getting a free copy is exactly what people are doing. The library is online, freely accessible without discrimination.

Since when have we asked authors permission to add their book to a library? In many places (including the US) if you publish a book it is mandatory to submit it to a library.

>Since when have we asked authors permission to add their book to a library?

As far as I know, libraries in the US purchase their materials like anyone else. They have the right to lend due to the Doctrine of First Sale[1], because what they lend they legally own.

>In many places (including the US) if you publish a book it is mandatory to submit it to a library.

According to Wikipedia, in the US publishers are required to submit two copies of a published work to the Library of Congress[0], not to distribute copies to public libraries.



I have never seen an illegal library, have you? Don't try to confuse a respectable institution with pirate web sites that didn't ask anyone's permission to do their illegal thing.

What is the difference, aside from someone deeming copying 'illegal'?

There is nothing illegal about copying per se, as long as there are no provisions against it. This is a basic principle of human society. For example, there is nothing illegal about walking without shoes, as long as there is no regulation preventing it as it's the case at some government offices. Your reasoning is just trying to throw away our society principles to justify your behavior. My main contention is not that we shouldn't have free information, but that it is unethical to disregard existing laws just because you don't like them.

There is nothing unethical in breaking the law. It is illegal to disregard existing laws; ethics is a different matter. There were many racist, sexist, oppressive laws in the past that we consider unethical today, and we may even celebrate people who broke those unethical laws in protest against the authorities and 'the society'.

True, there are situations where breaking the law is the ethical thing to do. However, you are trying to put access to a copyrighted book at the same level as fighting against sexism and racial oppression. Unless you can show that these situations are closely comparable (little clue: they're not), you're just creating an excuse to avoid following laws that don't benefit yourself.

In the case of copyrighted books or research papers, it could be a matter of life and death for the user, if we are talking about access to various types of medical research, for example.

Research papers in medicine are written for specialists, who already have access to them by means of employment. I don't know how access to that literature can save lives otherwise. Even if that was the case, it is a very far fetched way to prove that you need generalized civil disobedience with regard to copyright law.

In a lot of countries in the world, medical institutions or individuals don't have enough money to pay for access to research and books (lib-gen, the sister project of sci-hub also serves pirated books). And I agree, that might not be a proof we need civil disobedience in general (with regard to copyright law), but I think it does show that copyright law doesn't work well for medical research. I think similar could be proved for other areas covered by copyright law, but that would be a long discussion.

> requesting payment is a deal between publishers and writers

Not in the case we're talking about here. Sci-hub doesn't provide copies of novels. It provides copies of scientific research papers, for which we, the public, have already paid with our tax dollars. The payments to journals are not deals between those publishers and the scientists; they are deals between those publishers and the government, to get us to pay again for something we've already paid for.

Is there a tarball of the data somewhere that one could download, redistribute and host somewhere (on the darknet, assumedly)?

Library Genesis, which also includes scientific articles and provides storage for sci-hub, provides a large series of dataset dumps as torrents, http://libgen.io/repository_torrent/

There are occasional torrents, AFAIK. Hope you have 10tb of free space handy.

It is absolutely mind blowing to me that you can purchase that amount of storage with 20% to spare for around $500 these days. 12TB, $498 : http://www.amazon.com/Red-Desktop-Hard-Disk-Drive/dp/B00LO3K...

Or around $120 if you just want to archive the data and use tape:


The drive isn't cheap though, but tape is still the cheapest media for long-term archival storage.

I don't think that's feasible -- scihub requests articles directly from publishers (using legit accounts). Each requested article is cached by libgen.io, so articles are not requested twice.
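The "request once, then serve from cache" behaviour described here is the generic cache-aside pattern. A minimal sketch follows, with no networking at all; the class, the fake origin function and the example identifier are all hypothetical and say nothing about how the real sites are implemented.

```python
from typing import Callable, Dict, List


class ArticleCache:
    """Cache-aside lookup: go to the origin only on a cache miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}   # in-memory stand-in for the cache

    def get(self, article_id: str) -> bytes:
        if article_id not in self._store:
            # Miss: fetch once and remember the result.
            self._store[article_id] = self._fetch(article_id)
        # Hit (or freshly filled): serve from the cache.
        return self._store[article_id]


# Stand-in origin that records how often it is called.
origin_calls: List[str] = []


def fake_origin(article_id: str) -> bytes:
    origin_calls.append(article_id)
    return f"contents of {article_id}".encode()


cache = ArticleCache(fake_origin)
first = cache.get("10.1000/example")    # miss: goes to the origin
second = cache.get("10.1000/example")   # hit: served from the cache
print(len(origin_calls))
```

The consequence for the tarball question is that the cache only ever contains what somebody has already asked for, so it grows with demand rather than starting as a complete dump.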

Great to see a massive middle finger to the journal system. It's a disgrace and has to stop. Unfortunately I fear sites like this might entice more stringent protections for future journal published articles. The war continues.

In Norway there is free access to NEJM, JAMA, BMJ, Annals of Internal Medicine and the Lancet (2 month delay). UpToDate, BMJ Best Practice and McMaster Plus are also free. See http://www.helsebiblioteket.no/om-oss/english for information about all included resources. You need a Norwegian IP to get access. From abroad, this can for example be done through Tor if you restrict it to exit only through a Norwegian exit node.

http://www.freefullpdf.com/#gsc.tab=0 is another useful site if you're a lone researcher who doesn't have taxpayers' money funding your literature search and can't afford (in some cases) $30-$40 to look at a published paper.

Surely that's the same corpus?

This is what Aaron Swartz was trying to do, right?

Allegedly. All we actually know is that he downloaded a large number of articles.

Speaking of moral courage, how does one contribute institutional login credentials to sci-hub?

Only guessing here - you don't provide login credentials, you provide a proxy host within the university IP range as IP-based authentication is still the most commonly used method for licensed databases.
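For readers who haven't seen IP-based authentication, this is roughly what the check looks like on the database side: the client address is compared against the network ranges a subscriber has registered. This is only a sketch using Python's standard `ipaddress` module; the range below is a documentation-only block (TEST-NET-1), not a real campus network, and the function name is made up.

```python
import ipaddress

# Hypothetical registered range for one subscribing institution.
# 192.0.2.0/24 is reserved for documentation, so it matches no real network.
LICENSED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_licensed(client_ip: str) -> bool:
    """Return True if the client address falls inside any registered range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in LICENSED_NETWORKS)


on_campus = is_licensed("192.0.2.17")     # inside the registered range
off_campus = is_licensed("203.0.113.5")   # outside it

print(on_campus, off_campus)
```

Because the check sees only the source address, any request relayed through a host inside the range looks identical to one made from a campus desk, which is why a proxy would be enough.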

May it continue. Perhaps science can go the way music has where in practice you can see most stuff for free.

At least some of the time that does happen (not nearly as much as it should IMHO). I moved from one scientific institute to another, and would often get requests from friends in the first place asking if I was able to download a paper for them.

As a student who is studying to become a Theoretical Mathematician, I hope Sci-hub stays open for many years ahead. In Finland, we have pretty good access but only if one is a student. Mathematics is so interconnected, that removing paywalls and any obstacles could help uncover breakthroughs by combining ideas from other fellow Mathematicians. I hope UN exercises Article 27 of Human Rights and aligns itself on the right side of history in order to better Science and to encourage curiosity in today's minds and definitely tomorrow's! Pardon my English. Thank you.

There is another service in the pipeline. I came across it very recently; it is in a public-beta phase. It appears to focus on providing access to all digital libraries and specifically serving the third-world or developing countries, mostly in Africa. It has got a different (business) model and uses some advanced technologies for provisions of the articles. Given that they intend subscribing to the publishers, there is no doubt that they will remain in business for as long as the publishers themselves exist.

Projects like sci-hub.io, library.no and libgen are highly commendable. It is no news that third-world countries are destabilized by war, economic sanctions, etc. perpetuated by the world powers, thereby making them re-prioritize (access to) their resources. And it is not surprising that webrtc/p2p related services are often blocked in the first-world institutions with access to articles from those digital libraries. Such technologies/protocols/tools are defined/shaped (at standardization meetings - IETF, W3C, etc.) by big corporations in order to preserve their own product offerings.

Sometimes we have no option but to break the law until what is considered illegal is made legal.

Storing research papers behind paywalls is absolutely ridiculous. The law literally prevents the development of science.

Having personally seen people benefit directly (for purposes of research) from this initiative solidifies my wholehearted support for sci-hub.

How is this different (better) from http://arxiv.org ?

Arxiv is by its nature open access; Arxiv does not charge for access.

On the other hand, the articles that this site hosts require payment to access. The journals typically charge $30 / article, or roughly $2,000 / year subscription.

Note that for both the open access or paid journals, researchers do NOT receive any compensation when users download articles. That is, despite the research being mostly paid by taxpayers, a PRIVATE company receives compensation for the work done by the researchers. Not only that, but the researchers have to PAY a publication fee, and that fee is higher if they want to allow open-access.

Not only do the researchers not receive any compensation, they actually need to PAY the publishers hundreds of dollars minimum to get their work published.

How come all papers I have come across at the arXiv are readily downloadable as PDF?

Because the arxiv is a place that hosts free pre-prints. By contrast, sci-hub is a way to get normally expensive articles for free. Arxiv stuff is already free.

Arxiv is a voluntary service where authors upload their own papers. It covers a few hundred thousand papers, probably, since authors have to know about it, want to use it, and have copyright service; it covers only a few areas of science where Arxiv use is in vogue, and only from the past decade or two.

Whereas, SH/Libgen acquire copies of tens of millions of papers from everyone everywhere everywhen.

It's the difference between a local library and the Library of Congress.

Arxiv is a place to host scientific articles for free. Journals are institutions that scientists submit articles to, get them reviewed, published, and then charge a fee to access them. Sci hub takes the papers in the closed access journals and distributes them online for free.

Hope that made it clear.

Arxiv doesn't let you access paid articles from Springer, Elsevier and the like

Hope widespread knowledge about its existence does not kill it.

http://libgen.io/, which archives Sci-Hub's newly accessed papers, distributes the copies through BitTorrent. Libgen itself is also mirrored in multiple locations. Even if Sci-hub is taken down, it shouldn't take long for another to pop up.

They encrypt their torrents, so it's pretty useless.

What makes you think that? If the filenames confused you, those are checksums. The metadata is available as a database.
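To illustrate the point about checksum filenames: here is a minimal sketch of how one could check that a downloaded file's name matches a hash of its contents. The choice of MD5 and the "file stem equals hex digest" naming scheme are assumptions for illustration; the thread does not say which hash the archive uses, and the function names are made up.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def name_matches_content(path: Path, algorithm: str = "md5") -> bool:
    """Assumed naming scheme: the file name (minus extension) is the hex digest."""
    return path.stem.lower() == file_digest(path, algorithm)
```

If the name and the digest agree, the file is intact and unencrypted; a human-readable title would then come from the separate metadata database.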

They don't distribute scihub data in their torrents, it's a separate data set :(

It's already at the receiving end of a lawsuit by Elsevier in New York. I doubt that more attention would change anything.

Death to Elsevier. I hope everyone knows the story about how they used to be involved in the international arms trade until outrage at The Lancet forced them to stop:


Partly depends on whether people organize to support it, I guess.

So ordinary people may finally read my papers with reasonable effort? Sounds like an improvement. I am not in the academic content distribution industry though.

Looks like it is piping the queries over to scholar.google.com - getting only timeouts right now though.

It uses scholar as a search engine, but then it replaces the links in the results. e.g.


As far as I can tell, it works for some articles.

Just use DOI of the article, then it's instant and bypasses Google search

It's off and on. Last night I couldn't access it but it came back.

US is prob ddosing it...

So what is the strategy once Springer starts getting their domains taken down? With torrents this was never a big deal once there was the DHT - it didn't matter which search engines were taken down or which trackers, the torrents lived to see another day.

This website at this stage seems particularly easy to take down as it is a centralized weak link.

Centralized services work great when they are legitimate: Netflix, Spotify, but decentralized work best when they are not legal.

Sci-hub isn't going anywhere:

1. The data is stored via LibGen as torrents (for the PDFs), plus a metadata database that is mirrored by hundreds of people.

2. In case of a domain takedown the site can be resurrected at a new domain very quickly - as recently happened when the sci-hub.org domain was taken down after Elsevier sued the Sci-Hub founder.

3. There's an onion site, which can't be disrupted by a centralised domain service.

Bookmark scihub22266oqcxt.onion

There is very little innovation from academic publishers. Most don't even offer a single download that includes the paper and supplementary materials. E-book files are non-existent.

Very expensive publications, like Nature Biotechnology, should at the very least provide a single download (preferably epub) of each issue.

Wonder what the size of the data set is so far?

Likely TB's?

The total size of the torrents is around 38-40 TB.

Wow, that's larger than I was expecting.

Um, is there a source for that info? Didn't see it in the referenced article. :)

I don't know where vortico's getting his/her stats, but you can calculate it yourself from http://libgen.io/ , where sci-hub.io stores the newly accessed papers [1]. They have a torrent archive of all the materials since 2011.

[1]: https://en.wikipedia.org/wiki/Sci-Hub#Website

Nice. Does anyone know if there is something equivalent to this, along the lines of SSRN (Social Science Research Network, http://www.ssrn.com/en/index.cfm/mjensen-20th/), where you can perform full-text search on all the papers?

Server seems down (?)

Also, isn't this better done over bittorrent?

in memoriam: Aaron Swartz

Brain Aggregates: An Effective In Vitro Cell Culture System Modeling Neurodegenerative Diseases.

Can we download only single articles, or can we download the whole 40M+ collection?

While searching it goes to Google Scholar. Am I missing something?

They replace the links with their own, giving access to papers a plain Google Scholar search wouldn't.

I am wondering about the same thing. It simply shows the output of Google Scholar for that query.

(1) Isn't this illegal?

(2) How do they get access to those papers?

(1) Yes. In USA anyway. Creator is not in USA, however. (2) Academics donate their institutional login credentials to the site.

Do you know how I would go about donating my student access to the site? Do I need to give them my login, or can I run a program on my computer that logs in and downloads without giving anyone else the password?

Springer doesn't watermark based on logins? They could easily, and then sue the academics.

Usually being on the general university network is sufficient to get journal access. Using the university network credentials for the academic, it probably goes SciHub -> School -> Journal. So the school would have to block SciHub/discipline the academic. Disciplining people with tenure is hard, although maybe it is just graduate students giving the credentials idk.

There are many easy ways to get access from others. Most students at universities have access to the journal subscriptions as well. They get access by logging in with their university credentials OR using the barcode on their student card. BOTH of these are lost very often. People often enter their details on a phishing site and lose their credentials, but what can also happen is that someone finds a lost student card, takes a picture of it and posts it on Facebook asking if anyone can find x person.

This student card is typically on a public Facebook page and anyone could just use that code and get access to a wealth of journals.

I didn't know that you could raid ships using a website(!).

In all seriousness, the act of "sharing the collective knowledge of mankind publicly" isn't morally equivalent to attacking ships and killing people. We should stop using terms that are clearly propaganda created by the film and music industry to try to muddy the waters.

  > We should stop using terms that are clearly propaganda
  > created by the film and music industry to try to muddy
  > the waters.
This is an instance of reappropriation:


It is similar to the way groups have reappropriated slurs like “slut,” “nigger,” or “queer."

The word pirate has a new and different meaning in the 21st century, I think it's too late to try to fight that. And the title presumably helps with conveying what the website is about, compare it with my submission where I used the original title (and which got one upvote in total): https://news.ycombinator.com/item?id=11093454

I agree completely. We should also start owning the term and framing the debate around it. Cast it in positive anti-establishment terms. Like pirate radio, Robin Hood, V for Vendetta. Lone, beleaguered hero(es) fighting for the people against tyrannical and oppressive regimes.

What might help is if someone started a file sharing site and named it after a place where pirates hung out and incorporated a pirate ship and the word "pirate" into their logo.

Then maybe if it became popular these so-called "pirates" could cast off the shame of being tarnished by an image they clearly don't embrace at all.

I heard about a certain site named "The Pirate Bay". They sort of got some publicity, but a positive reframing did not occur for some reason.

got any ideas for names?

Some people also make the case that the pirates, rather than the navy, were the 'good guys' of the maritime era.

The word "privateer" comes to mind.

Some people have been using the word "freebooting" instead.

Hello Internet is a very funny podcast. But "freebooting" isn't a positive thing. Sharing is a positive thing. The two terms do not equate morally.

Well piracy shouldn't be a positive thing.

<insert funny joke about the Pirates of the Caribbean movies not being positive>

"Piracy" is a negative term for "sharing of media". I don't see how sharing books between friends is a bad thing. Why is sharing music and movies suddenly "evil"?

Pirate is a badge of honor in these days. Think: Pirate Party.

Umm, "pirate website" that uses a secure.sci-hub.io, but not actually a secure connection. Really, Let's Encrypt has made this a no-brainer. Anyone making a site should be expected to be using SSL. Especially those making anything related to anything "pirate" or "secure."

> Really, Let's Encrypt has made this a no-brainer.

Sure, if you have the time and opportunity to check that the certificate renewal has worked properly every 60 days. Once you've written a cron job, of course, to do the renewal.

For someone whose primary task isn't IT, that's probably not something they want to worry about, or they will forget about it until people complain that the SSL is broken.

Probably better to throw $ to her to buy a multi-year SSL cert from a vendor.
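For anyone who does go the cron route, the "did the renewal actually work" check mentioned above can be fairly small. This is only a sketch: the function names and the 30-day threshold are assumptions, and the `not_after` string is expected in the format Python's `ssl` module reports for a peer certificate's `notAfter` field.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate expires.

    not_after looks like 'Jan  5 09:34:43 2018 GMT'; now must be timezone-aware.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate is inside the (assumed) renewal window."""
    return days_until_expiry(not_after, now) < threshold_days
```

A cron job could call `needs_renewal` with the live certificate's expiry date and send an email when it returns True, so a failed renewal is noticed before visitors see a browser warning.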

Because if there's one thing pirates are known for, it's their commitment to your security.

I just checked, the two most popular public torrent sites have COMODO CA default-on https.

These sites do want to maintain security, so their users can keep coming back rather than getting copyright strikes or, worse, fines and getting scared off. That doesn't stop them from putting viruses in the ad banners, but that is not getting their users arrested.

Except that torrent sites are not necessarily completely used for pirating.

Torrent sites, especially the popular ones mentioned above, are almost entirely used for pirating. Bittorrent, on the other hand, is not exclusively used for piracy.

I'm fairly sure that raiding ships isn't very good for the security of the crew. Although, it might teach them to invest in better cannons.

Pirates that don't pay attention to security don't last very long, so I would tend to agree with that statement.

Try a quick survey of the most popular pirate sites and see how many support HTTPS. You might be pleasantly surprised.

That's not caring about security, it's just not wanting to get blocked by firewalls. I don't think you can seriously argue that pirate sites care about security when the biggest thing on a page is an adware-installing, fake "DOWNLOAD" button.

How does HTTPS help circumvent firewalls? I wasn't aware this was a feature. My understanding is that a firewall blocking file-sharing sites would do so whether the site was HTTP or HTTPS.

My point is that by using HTTPS, these sites have demonstrated a higher level of proficiency with security tools than many more popular mainstream sites. I agree the scammy fake download buttons are a problem though, but that's what ad-blockers are for...

> How does HTTPS help circumvent firewalls?

The firewall cannot do deep packet inspection on encrypted connections, since it cannot decrypt the data: https://en.wikipedia.org/wiki/Deep_packet_inspection

A firewall cannot see encrypted layer-7 traffic, true, but you can block sites by IP/hostname. HTTPS does not help you circumvent firewalling.

But AFAIK they can block the whole domain, when you access an https site any proxy or firewall you have can see the domain but nothing more.

For example a firewall would see that you entered Google.com but not what you searched
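The distinction being argued here can be sketched in a few lines. This is a simplified model, not how any particular firewall works: it assumes the hostname leaks through DNS lookups and the unencrypted server name in the TLS handshake, while the path and query string travel inside the encrypted channel. The function names are made up for illustration.

```python
from urllib.parse import urlsplit


def visible_to_firewall(url: str) -> dict:
    """What a passive on-path observer can read, under the assumptions above."""
    parts = urlsplit(url)
    request = parts.path or "/"
    if parts.query:
        request += "?" + parts.query
    if parts.scheme == "https":
        # Hostname is still visible; the request line is encrypted.
        return {"hostname": parts.hostname, "request": None}
    return {"hostname": parts.hostname, "request": request}


def is_blocked(url: str, blocked_hosts: set) -> bool:
    """Hostname-based blocking works the same for HTTP and HTTPS."""
    return visible_to_firewall(url)["hostname"] in blocked_hosts
```

So HTTPS defeats content inspection (the firewall cannot see what you searched for), but it does nothing against a blocklist keyed on the domain or IP address.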

Looking at it another way, a certificate issued by a CA is another thing that can be revoked by the authorities to immediately give browser warnings and cause people to think the site has been "hacked" somehow. Browsers rejecting self-signed certificates also makes that route useless.

We changed the title from "Sci-Hub – Pirate website providing public access to millions of research papers" to what the site itself says.

I don't think this was an egregious title rewrite, but the word "pirate" was becoming the subject of discussion, which a title shouldn't be (and that goes double for extraneous ones).

It's all fun and games, until someone in the open source community wants the same copyright protections from a commercial entity using GNU code without releasing the source.

Silicon Valley and YC don't exactly have a stellar reputation for ethical behavior. Having a "pirate website" at the top of the news page doesn't exactly change that perception.

I totally get that journals are evil, and charging money for research generated with public funds is questionable. It's very frustrating as a small entity needing to view articles, and being asked to cough up $25-50. That said, there are legitimate alternatives (like emailing the corresponding author, or professional society memberships, or alumni library access, or DeepDyve). The linked website is flagrantly violating copyright and that should be cause for concern; not breaking the law is part of every engineering (and professional) ethical code.

Distinguish ethical from legal. Not all laws are ethical (depending on where you live, most laws could easily be unethical), and this website goes out of its way to say why the laws they violate are not.

>One may well ask: “How can you advocate breaking some laws and obeying others?” The answer is found in the fact that there are two types of laws: There are just laws and there are unjust laws. I would be the first to advocate obeying just laws. One has not only a legal but moral responsibility to obey just laws. Conversely, one has a moral responsibility to disobey unjust laws. I would agree with Saint Augustine that “An unjust law is no law at all.” Now what is the difference between the two? How does one determine when a law is just or unjust?

-Dr. King, Letter From a Birmingham Jail

One guidance for the ethical side are the guidelines agreed upon by professional associations. I can't think of any that condone copyright infringement.

I disagree that that is a good source of guidance. Even if we take that as a given, OF COURSE few organizations promote breaking the law (sometimes a crime in itself).

Although, many, many push to change the law / restrictions put on scientific research. I can't spend time to google and list all the references. start here https://en.wikipedia.org/wiki/Open_science#Projects_promotin...

Another guidance for the ethical side is the concrete behaviour of working scientists: If they would send you the paper if you asked them by email, this is a clear statement that copyright violation is perfectly okay.

Way back in the day, we would buy reprints from the journal, and then mail them to requesters. And then Xerox machines appeared, and we made our own copies. Now we just email PDFs.

At least in my discipline, authors either retain copyright to their manuscript and can disseminate that freely, or they are able to personally disseminate the final published article (sometimes including on their own website). No copyright transgression occurs in this case.

Generally speaking: what's legal and what's morally right sometimes diverge. Civil disobedience can be a necessity.

If being a professional means blindly following laws without thinking, I'm happy this place is not just for "professionals".

I think you are making a mistake of equating ethics with legality.

Furthermore, I am not affiliated with YC and have never even been to Silicon Valley. I merely use this news/link aggregator because the content and links interest me.

Linking to a site doesn't even mean you are condoning it. Would you really rather the mods censor content like this because they are worried about their reputation? I think that is the day I would stop reading HN.

Well, I'm not from Silicon Valley or YC, and I avoid torrenting music etc because it takes more directly from content creators, but the publishers' business practices put this on a whole different level to me. Also, half the time for the old-school stuff that one has to cite in the introduction for a paper, the corresponding author is dead but the paper is still under copyright.


edit: Those alternatives will not work 99% of the time. I love the fact that it says pirate site.

Information wants to be free.

Hm, let's investigate how things get to the front page of HN. Ah, here it is, on a page titled "Hacker News FAQ": https://news.ycombinator.com/newsfaq.html under the header "How are stories ranked?"

Maybe perceptions are completely off base sometimes.

Regardless of your personal stance, this is newsworthy for hackers and it belongs on a site named "Hacker News" if the users upvote it as such.

With that said, the site isn't working for me. Pirates better not quit their day jobs.

Does YC want to change that perception? In this other thread from today people suggest civil disobedience regarding copyright: https://news.ycombinator.com/item?id=11092016

And? That's not HN's stance. That's the opinion of people who have a login here, in an ongoing and free wheeling discussion! Sheesh.

If they want to seriously engage regulated industries like finance, medical, energy, etc., they need to. As it stands now, an association with them is suspect.

I too am disappointed to see this here, though the young hotheads are obviously out in force today and relishing sticking it to the man.

Surely we are better than this.

One of the earliest lessons I was taught, and I taught my kids, is that if somebody else has something we want and doesn't want to share it, it's not OK to just take it.

Firstly, copying isn't the same as taking.

More importantly most of the scientists want their research to be read and studied as widely as possible but have their careers to worry about. The journal system is being widely criticised but academics are not in the best position to take action against it.

The dissemination of knowledge, with its potential for reducing inequality and increasing social mobility, is much more important than the profitability of journal publishers and outweighs the risk of hurt feelings due to a sense of ownership of knowledge (which seems like a fallacy in itself) that anyone involved could possibly have.

So why not email the corresponding author? I have yet to not get (or give) a manuscript that way. From my own perspective, each time I respond I'm possibly getting another citation. It's also a great form of networking.

When I do research I go through a lot of papers, and many of them are discarded after the first couple of sentences. It would slow me down a lot if I had to contact all of the authors first.

How useful would google really be, if you had to contact every author before reading the actual website?

What's the practical difference between getting any paper you need from the authors and getting any paper you need from such a website? Except much more work for anyone in the former case without any benefit.

Practically speaking, none I can think of. But I completely disagree that the networking aspect has no benefit.

Because it currently wouldn't be possible to make a site of this scale with that method.

> One of the earliest lessons I was taught, and I taught my kids, is that if somebody else has something we want and doesn't want to share it, it's not OK to just take it.

That is why I don't take it away, but copy it instead.

> One of the earliest lessons I was taught, and I taught my kids, is that if somebody else has something we want and doesn't want to share it, it's not OK to just take it.

Like the absolute power of a dictator?

Of course this is a hyperbolic example, but the real world is not only black and white and simple rules like that cannot cope with the complexity of it. The question is, where we should draw the line. And many people in here agree, that publicly funded research should be made available to the public at no further costs for the greater good.


If the king reserves all the political power for himself, would you not join (and expect your children to join) a revolution against him?

If corrupt government officials and their cronies keep all the food and medical aid for themselves, would you let your family starve, or your sick child die, before stealing what you needed?

My point is that your "lesson" is overly simplistic and naive. Reality is much grayer and messier. Some believe that what would be considered unethical in other contexts is morally justified, even morally required, when it is needed to combat injustice or another unethical situation. But sometimes the ends do not justify the means (messy). That is why you and your kids need critical thinking more than simplistic platitudes.

I think people haven't totally thought this open access thing through.

First, publishing costs money, even in the digital age. It costs money to comb through submissions and decide which ones are worth pursuing. It costs money to hassle scientists into reviewing the submissions. It costs money to convert every submission into the same format. It costs money to develop and host a website to disseminate the articles. All of these things cost money.

Now, who is going to pay for it? Traditionally these costs were put onto the research institutions in the form of library subscription fees. Open access shifts this burden onto the author, and ideally grants would include that into the budget.

Even if grants include that in their budget (and many don't yet), there's a finite amount of money available for research. Shifting the cost of publishing onto grants will make funding available for actual research even smaller than it is now. In some fields publishing costs are entirely negligible compared to the cost of research, but in others it's not.

Also, open access would mean that you have to have funding in order to publish a paper. As it stands right now you don't actually need funding to do research in certain fields. A math professor at a university can devote some of his spare time to a project over several years and publish a paper on it with no costs at all. This happens all the time, not every paper has funding behind it.

I'm not necessarily arguing against open access, I just think people haven't fully explored the downsides of moving away from our current system.

They've been plenty explored, discussed, chewed over, and more, including the scenarios you presented. The costs have been analyzed in dozens of different ways, with many business models proposed and some executed upon.

(Note that your mathematician gets indirect funding by having access to the university library. As a non-academic, I can use the local college library but must pay access fees for some services that are free to staff and students. At a somewhat further away university library, as a visitor I can read journals online but am not permitted to make copies.)

Nor is our "current system", concentrated as it is in the hands of Elsevier (and its 37% profit on revenue), all that old. Most people outside the big publishing companies didn't fully explore the downsides of moving away from the system we had before the 1980s - or at the least, nothing like the ongoing discussions concerning open access.

Distribution companies made sense prior to the advent of the internet. But you can host journals over bittorrent easily, and if they were legal journals, there'd be no reason for people to stop seeding, since there's no penalty to do so.

I think your post has a bit of confusion, because as far as I know, the platforms themselves are not doing the peer review, they're just hosting the content.

The big complaint about these platforms is that the institutions that send out the papers to these platforms do so completely independent of the authors. The platforms do not fund the authors, the institutions do, and for whatever reason, the institutions continue to deal with the platforms.

The platforms themselves are quickly becoming irrelevant. Hosting costs are dropping radically, and the curation methods used by the platforms are very out-dated and more focused on anti-piracy techniques instead of making the information they're hosting accessible. The only reason they're staying relevant is due to the requirement that institutions publish to them, it's not as a result of actual service provided.

The current platforms are a legacy item and they are inhibiting research. Like a lot of old legacy services, at a time they made sense, but more and more they don't.

> Now, who is going to pay for it?

Taxpayers, as it is now. The difference is, with parasitic incumbents gone, the price will fall massively.
