
Sci-Hub: Public access to research papers - lorenzofeliz
http://sci-hub.cc/
======
paulhilbert
For heavy users (like me) I suggest using one of many "implementations" of an
auto-scihub bookmark here:
[https://github.com/nfahlgren/scihub_bookmark](https://github.com/nfahlgren/scihub_bookmark)
It is a great addition to an otherwise clean tor-browser...

This is not mine - but looking at the code you should notice quickly that it
is fairly trivial. Note also that you may need to change the URL.

Lastly it should be said that sci-hub does indeed need donations - it is
essentially a one-woman project and she can always use some lawyer-funds...

~~~
patall
I prefer the custom search approach over bookmarks: I have scihub as custom
search with 'sh' as key word, so I go to the address bar (Ctrl+L or similar),
and type {left}sh{space}{enter}

~~~
paulhilbert
My usual workflow is to check citations for a paper via google scholar or
getting a link from a peer which leads me directly to the paywall. From there
it is a bookmark click away. In other words I always need sci-hub when I
already found what I am looking for (but can't reach it). Does using sci-hub
as a search engine work well?

------
erikbye
This reminds me of Aaron Swartz's fight. People have made great sacrifices for
this. Information that can lead to or enhance knowledge and learning shouldn't
be behind closed doors.

~~~
lawl
Aaron Swartz's death brings tears to my eyes to this day. Even if I have never
met him, or talked to him. Or actually ever really heard of him until he had
to take his life.

There's a great documentary about this extraordinary guy I assume most have
already seen, but I'll post it again anyways:
[https://en.wikipedia.org/wiki/The_Internet%27s_Own_Boy](https://en.wikipedia.org/wiki/The_Internet%27s_Own_Boy)

~~~
erikbye
A terrible and tragic loss. And not the first or last victim of zealous and
overwrought prosecution. This kind of overkill-prosecution is a familiar and
common tactic used to oppress and silence activists.

Though not an activist, Jonathan James was another hacker prosecuted by the
same team as Swartz. He also committed suicide, at the age of 24. When he
was 17 he spent six months in a federal corrections facility for hacking.
Later, in 2007, he was connected to the TJX hack. He denied involvement but
was friendly with some of the hackers who were involved and his past gave him
little credibility. From his suicide note:

"The feds of course would see me as much more appealing target than Chris - if
they could tie me to this case I'd be like Mitnick times 10 to them. I
honestly, honestly had nothing to do with TJX. Unfortunately I don't picture
the feds caring all too much. Read Agent Steal's guide to getting busted. The
feds play dirty. Chris called me the other day. He was in jail and they let
him out. That can only mean that he too is trying to pin this on me. So
despite the fact that he and Albert are the most destructive, dangerous
hackers the feds ever caught, they'll let them off easy because I'm a juicier
target that would please the public more than two random fucks. C'est la vie.

I have no faith in the ‘justice’ system. Perhaps my actions today, and this
letter, will send a stronger message to the public. Either way, I have lost
control over this situation, and this is my only way to regain control.
Remember, it's not whether you win or lose, it's whether I win or lose, and
sitting in jail for 20, 10, or even 5 years for a crime I didn't commit is not
me winning. I die free."

~~~
starbird3000
I agree. He was a great guy. However, I think it's worth pointing out that
JSTOR didn't want him prosecuted and they came to a viable agreement. SciHub
on the other hand is plain theft and it's not being backed by donations, it has
to be more well funded than that based on its uptime and storage.

The general estimate in the community is around $1.6 million a year to run.
That's not cheap.

------
xwvvvvwx
Sci-Hub represents everything that is good about the internet.

~~~
azangru
Well, not really.

Sci-Hub has a founder. A founder who may sometimes be a benevolent
dictator, and sometimes not. For example, very recently, this very founder got
offended by Russian social media, in which some academics said something
unflattering about her, and a biologist apparently named a newly discovered
species of parasitic wasp after her. So she blocked Sci-Hub for all Russian
IPs. For about a week. And lifted that ban after a number of humble petitions
in Russian social media asking her to change her mind.

To represent everything that is good about the internet, Sci-Hub should have
been impersonal. Like archive.org perhaps, or pubmed, or thepiratebay.
Unfortunately, it is not.

But it's definitely a good thing we have it, anyway.

~~~
foo101
Can someone briefly describe how decentralization is achieved in archive.org?

~~~
paulhilbert
As far as I can tell the storage is quite centralized. At least I remember
them allegedly moving all their porcelain to Canada when that elephant entered
the room...

~~~
psittacus
Not moving, but copying. Here's the source:

[https://blog.archive.org/2016/11/29/help-us-keep-the-archive...](https://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/)

------
roystonvassey
ELI5: I'm not from academia so please help me understand. Are researchers who
publish these [firewalled/copyrighted] papers likely to lose out on their
earnings if this is 'open-sourced'? I'm honestly asking as I try to think
through the whole 'open-source' paradigm and how it fits in a (mostly)
capitalist, libertarian world. Thanks!

~~~
beloch
1\. You produce novel research.

2\. You submit novel research to the best journal you think will publish it.

3\. The editor (the only person involved actually paid by the journal) invites
other researchers to _volunteer_ their time to peer review your research.

4\. Assuming your research bears up under scrutiny and the peer reviewers are
not actually competitors who decide to filibuster (this happens), your paper
passes peer review.

5\. Congratulations! The journal agrees to publish your novel research. For
this, you must _pay the journal_. They do not pay you. You pay them. Color
figures in the paper version nobody reads cost thousands extra.

6\. Having been paid by you to publish your work, the journal sells your paper
at exorbitant prices to university libraries (unless you paid them _lots_
extra to just make it available for free).

Taxes pay researchers to pay journals and taxes pay for university libraries
to pay journals. Why on Earth do intelligent researchers (or taxpayers) put up
with this crap? Being published in a good journal bumps up your impact factor
and helps you win more grant money. If high impact factor journals go out of
business because of piracy, others will just take their place.

In short, pirates screwing over journals doesn't hurt researchers in the
least. Shaking up the parasitic journal biz is actually long overdue. Journals
put in only a tiny percentage of the labor involved in putting a paper through
peer-review, but they soak everyone involved for massive amounts of time and
money. It's time they died.

~~~
Athas
Not all fields suffer from this to the same degree. For example, I am a
fledgling computing scientist. I mostly publish in ACM-related conferences and
workshops, where ACM is an organisation run by and for academics and
professionals in the computing field, not a for-profit publisher. My
submission experience is:

1\. You produce novel research.

2\. You submit novel research to the best conference or workshop you think
will publish it.

3\. The editor (a volunteer) invites other researchers to volunteer their time
to peer review your research.

4\. Assuming your research bears up under scrutiny and the peer reviewers are
not actually competitors who decide to reject (this supposedly happens, but is
rare), your paper passes peer review.

5\. Congratulations! The conference or workshop agrees to publish your novel
research. For this, you generally must show up to the conference or workshop
to present your work in person. This costs money (conference registration),
and also travel cost and such. However, if you were going to the conference
anyway, there is no added expenditure.

6\. Having been paid by you to publish your work, ACM sells your paper at
somewhat exorbitant prices to university libraries (unless you paid them lots
extra to just make it available for free). However, at no cost to you, you are
also explicitly allowed to post a "pre-print", identical to the published
paper, on your personal/university website, where it will be promptly found by
Google. However, the hosting is your own concern. (You can also upload the
preprint to arXiv.)

It's a decent procedure. ACM also publishes journals, which do not require you
to present anything in person. I do not know whether any author-borne costs
are involved if you go that route.

~~~
superasn
So does the piracy affect you? Are you in favor of sites like sci-hub?

~~~
lorenzhs
I'm in a very similar position as the OP (Computer Science PhD student).
Piracy doesn't affect me at all. I'm very happy if someone wants to read my
papers, however, you don't have to use sci-hub etc to access them - I've
published preprints of all of them on arxiv.org. Usually I do this right after
I submit to a conference and update it later to reflect any changes made
before publication.

This also allows me to timestamp my results - say my paper gets rejected
because the reviewers think it's a bad fit for the conference, or they didn't
understand my point because my writing was bad, etc. Now I have to improve it
and submit it to another conference. If someone else publishes similar results
in the meantime, then I can always point to the preprint and say that I did it
first ;)

~~~
unixhero
Nice approach!!!

------
eatplayrove
For those who use Telegram, there is a Telegram bot, where you send it the DOI
or the URL and it sends you the PDF -- so much easier than legally accessing
the file.

~~~
isatty
What's the name of the bot?

~~~
sleepychu
@scihubot [0]

[0] -
[https://twitter.com/Sci_Hub/status/731467465973174273?ref_sr...](https://twitter.com/Sci_Hub/status/731467465973174273?ref_src=twsrc%5Etfw&ref_url=https%3A%2F%2Fwww.techdirt.com%2Farticles%2F20160515%2F01471134445%2Fsci-hub-repository-infringing-academic-papers-now-available-via-telegram.shtml)

~~~
dangom
Bot doesn't always find what the website does, though.

~~~
sleepychu
Sounds like a bug :-)

~~~
saladeen
The site returns direct pdf links only for paywalled journals. If you put in a
DOI for an article that's hosted on a free-access journal, the site redirects
you to the article's page on the journal, so you can download the pdf directly
from them. This behavior might be the cause?

------
dogruck
I'm surprised that this isn't old news for every HN reader. Did something
change that I'm overlooking?

~~~
mino
Indeed. How can scihub be on #1 on the HN homepage?

~~~
wallflower
I think it is evidence that there are a lot more new HN users than old HN users
now. I'm not saying this is good or bad, just an observation.

~~~
dogruck
Interesting. Any way to get data on active new user numbers over time?

Regardless, Sci-Hub launched in 2011 and has been frequently discussed in the
media since then.

~~~
stevemclaugh
The only data we have are the 2015-16 download logs posted by Elbakyan and a
reporter for 'Science.' At the time they had 28 million+ downloads over 6
months. I truly hope no more logs are released (at least not with that level
of detail).

------
dredmorbius
What the academic publishing industry calls "theft" the world calls
"research": Why Sci-Hub is so popular

[https://redd.it/4p2rwk](https://redd.it/4p2rwk)

A piece I'd written a year or so back on why Sci-Hub is such a compelling
option for academic and independent researchers. It's been picked up by a
number of OA sites in the past few months.

As for Internet research gems, I'd also like to note a self-created resource
that could use love, the Online Etymology Dictionary, produced by Douglas
Harper. Unlike Sci-Hub, this is original work largely created to support
Harper's own etymological explorations. I've found it tremendously useful, and
very much in the spirit of the original web (of which it is very much a part).

[https://redd.it/75k35m](https://redd.it/75k35m)

------
robertwalsh0
Over at Scholastica ([https://scholasticahq.com](https://scholasticahq.com))
we've been taking on this problem for the last few years. We allow journal
editors to create, manage peer-review, and publish OA journals all in one
place. Sir Tim Gowers, the Fields Medal winner, uses our platform for his
journal Discrete Analysis
([http://discreteanalysisjournal.com/](http://discreteanalysisjournal.com/)).
The journal Internet Mathematics recently came over to the platform after
being on Taylor & Francis for years
([https://blog.scholasticahq.com/post/internet-mathematics-pub...](https://blog.scholasticahq.com/post/internet-mathematics-publishing-solo-using-scholastica/)).

We think journals make a lot of sense and that the problem is that journals
don't control the toolchain.

I notice there's a lot of interest on HN around this subject from time to
time. If you work with a journal and want to get in touch or have questions
feel free to write me at rwalsh [at] scholasticahq.com

------
mlu
Sci-hub is great! I'm a researcher and use it multiple times a week. I wonder
how it works - any insights?

~~~
dotdi
In my understanding, some people have donated their academic login
credentials, effectively giving Sci-hub access to their institution's
subscriptions.

When somebody requests an article, Sci-hub will try and see if it already has
it. If not, it will attempt to use some of the donated credentials to gain
access, download it and store it for further download requests.
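
The lookup described above is essentially a cache-aside pattern. A minimal sketch of that idea in Python, assuming a hypothetical `fetch_upstream` callable and an in-memory dict standing in for storage (this is an illustration of the pattern, not Sci-Hub's actual code):

```python
from typing import Callable, Dict


class ArticleCache:
    """Cache-aside store: serve a stored copy if present, otherwise fetch and keep it."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream is a hypothetical stand-in for whatever retrieves
        # the article from its original source.
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, doi: str) -> bytes:
        # Serve from storage when the article was requested before.
        if doi in self._store:
            return self._store[doi]
        # Otherwise fetch once and keep the result for later requests.
        self.upstream_calls += 1
        data = self._fetch_upstream(doi)
        self._store[doi] = data
        return data
```

The second request for the same identifier never touches the upstream source, which would explain why popular papers come back instantly.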

~~~
dingoonline
I'm still curious as to how this system works. Don't institutions have logs
that would show someone researching topics wildly outside of their field of
study?

~~~
college_library
At my library (smallish US college):

We do keep logs for a period of time, but the library administration's policy
is to only investigate them when a publisher contacts us with a claim of abuse
(we do not proactively monitor for unusual activity, although we do take steps
like limiting the number of concurrent sessions per user and blacklisting IP
addresses/ranges with a history of suspicious activity).

The publisher generally supplies examples of timestamps and URLs that were
part of the alleged abuse. We use that information to identify the "abusing"
user in the log.

Usually there is pretty clear evidence that the user is not conducting
legitimate personal research (e.g. the user is a freshman early childhood
education major at the local rural branch campus, but they're downloading
thousands of chemistry papers from an IP address in China or Russia).
Typically the user does not seem like an information freedom warrior, or even
to have a clue what is going on, so it seems most likely the credentials were
phished.
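
For the curious, matching publisher-supplied timestamps and URLs against a proxy log could look roughly like the sketch below. The `LogEntry` and `Report` shapes and the five-second tolerance are assumptions made for illustration, not our actual log format:

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Set


class LogEntry(NamedTuple):
    timestamp: datetime
    user: str
    url: str


class Report(NamedTuple):
    timestamp: datetime
    url: str


def users_matching_reports(
    log: Iterable[LogEntry],
    reports: Iterable[Report],
    tolerance: timedelta = timedelta(seconds=5),
) -> Set[str]:
    """Return users whose log entries match a reported URL within the time tolerance."""
    report_list = list(reports)
    matched: Set[str] = set()
    for entry in log:
        for report in report_list:
            same_url = entry.url == report.url
            close_in_time = abs(entry.timestamp - report.timestamp) <= tolerance
            if same_url and close_in_time:
                matched.add(entry.user)
                break
    return matched
```

The tolerance exists because publisher and library clocks rarely agree to the second.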

~~~
stevemclaugh
Thanks for this.

These cases may or may not be phishing. When corporations are hacked for their
user credentials, those databases sometimes end up in dark web markets. It
would be easy to extract email addresses with .edu domains ... so if a student
used their university address for some service and reused the password,
there's your login.

Moral of the story: Encourage students to use a password manager and 2FA.

------
Vinnl
Sci-Hub's an important player in the transition to Open Access. Note, however,
that many papers are already available elsewhere (e.g. ArXiv), legally. You
can use the OAButton [1] or Unpaywall [2] extensions if you hit a paywall to
find a free version. They're not perfect, and solve a problem that shouldn't
exist, but it's nice that they're there.

[1] [https://openaccessbutton.org/](https://openaccessbutton.org/) [2]
[http://unpaywall.org/](http://unpaywall.org/)
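
If you'd rather script the lookup than use the extension, here is a rough Python sketch. It assumes Unpaywall's v2 REST endpoint layout and the `best_oa_location`, `url_for_pdf` and `url` response fields as I understand them; check the current API documentation before relying on any of it:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for a DOI (endpoint layout is an assumption, see above)."""
    return "https://api.unpaywall.org/v2/{}?email={}".format(
        urllib.parse.quote(doi, safe="/"), urllib.parse.quote(email)
    )


def best_open_copy(record: dict) -> Optional[str]:
    """Pick a PDF link, or else a landing page, from a parsed response."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")


def find_open_copy(doi: str, email: str) -> Optional[str]:
    """Network call; returns None when no open copy is listed."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=10) as response:
        return best_open_copy(json.load(response))
```

A `None` result just means no legal open copy is listed, not that the paper doesn't exist.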

~~~
dredmorbius
Yes, many papers are available ... somewhere ... if you know precisely where
to look for them and find them.

The great thing about Sci-Hub, and a not-inconsequential element of its
success, is that it is very nearly universal. Content _is simply available in
the archive_ in a tremendous number of cases. And is directly referenceable by
DOI (or, when it's working, direct search).

The size of an archive matters as it reduces search costs _across_ archives
(this is a reason why archives tend to "a single max-size dump" dynamics). If
you look at lists of the world's largest libraries, for example, it's pretty
much the U.S. Library of Congress, and ... everything else. Even at the
university library level, the largest collections tend to be fairly uniform at
about 15-20 million volumes (Harvard, University of California, etc.).

The fact that a scholar can go to one such institution _and have access to
their entire archive_ is a compelling advantage. Shoe-leather adds up when
you're crossing provincial or national borders.

Similar dynamics drove the adoption of single scholarly languages -- Greek,
Latin, Arabic, Latin (again), French, German, and (in a battle with German and
Russian) English following the 2nd world war, given that works had to be
translated only to one language (English) rather than multiple.

Similar arguments, compounded by the insane costs of scribe and codex
formation, applied in the pre-print era.

~~~
Vinnl
Yes, and it's great that people can use Sci-Hub for that. That said, for
papers that are legally available, OAButton and Unpaywall aim to solve exactly
the problem of having to know where to look for them and find them.

They're still only limited to openly available articles ("green" open access),
but might be a solution for those not willing to use a solution of dubious
legality, or for whom perhaps their institution blocks it or something.

But yeah, I'm not trying to trivialise what a great resource Sci-Hub can be
for those who need it.

------
anc84
I wish we as a scientific community could make some secure, anonymous, shared
hosting work for this. With sci-hub as authority and everyone else being able
to allocate as much of their local disk space for this.

edit: Before someone suggests IPFS: IPFS is not suitable. a) it is not
anonymous and b) it duplicates everything for its block storage so you have to
have twice the space...

~~~
jsilence
What about I2P? [https://geti2p.net/en/](https://geti2p.net/en/)

~~~
anc84
Yeah, I love I2P. Tahoe-LAFS is well supported with I2P and vice-versa but I
think it has the same duplication annoyance as IPFS.

------
Tade0
After I graduated I gave the credentials to my university's online library
to my sister, who studied at another university, which didn't have the proper
literature even though its profile (agricultural engineering) was much closer
to her field than mine (computer science/electrical engineering).

She should have had access to that literature without all this.

~~~
raister
That worked great for _your sister_. What about those researchers living let's
say in Brazil, where Universities cannot afford huge subscription prices? The
_only_ alternative is SciHub...

------
Asdfbla
The user experience really is just so good. Put in a paywalled link to a paper
and out pops the PDF. I think the ease of use really contributes to its
popularity a lot.

------
thinkloop
What would an example search be?

~~~
ColanR
I searched "particle" (as in physics etc...) and got "article not found".
What's up with that?

~~~
vertex-four
The point of sci-hub is that it fetches specific articles for you, not to be a
search engine. Find an article you want - for example, the PolderCast paper[0]
- then enter a link or the DOI for that article, and it'll find the paper most
of the time.

[0]
[https://dl.acm.org/citation.cfm?id=2442644](https://dl.acm.org/citation.cfm?id=2442644)
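
To make the distinction concrete: a free-text word like "particle" is neither a link nor a DOI, so there is nothing to fetch. A small Python sketch of telling the three apart follows; the regex is a common heuristic for modern DOIs and an assumption on my part, not the full DOI syntax or the site's own logic:

```python
import re
from typing import Optional

# Heuristic shape: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI-looking substring, without trailing punctuation."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    return match.group(0).rstrip(".,;)")


def looks_like_link(text: str) -> bool:
    """True when the input starts with an http(s) scheme."""
    return text.strip().lower().startswith(("http://", "https://"))
```

Anything that fails both checks is a keyword query, which is what a general search engine or Google Scholar is for.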

~~~
john_minsk
Could you advise any good resource to discover interesting articles?

~~~
vertex-four
Unfortunately not, I tend to find something close to what I want via Google
and then find papers that it references, and papers that reference it. Review
articles are useful, as are benchmark comparisons that you find in some
papers.

------
gexla
It gives you access to a lot of books as well. For example, all the apress.com
books that you can get in ebook format through the site.

------
dmingod
Is there a way to download the whole archive? People can do cool stuff like
visualization etc. on it.

~~~
3131s
Yes, but it's huge. At one time you could torrent it piece by piece but now
the link appears broken...

[http://libgen.io/scimag/repository_torrent_notforall/](http://libgen.io/scimag/repository_torrent_notforall/)

Anyone have the current link?

Good luck extracting much of anything useful out of older PDFs though.

~~~
userbinator
This?
[http://libgen.io/scimag/repository_torrent/](http://libgen.io/scimag/repository_torrent/)

I'm not sure what the numbers mean, but the last-modified dates on those
torrents span a range of 3 years ago to this month.

~~~
3131s
Yes, that's the one! Those numbers refer to the number of papers in each
torrent, so each one contains 100,000 papers giving a current total of 66+
million.

The torrents of 100,000 are broken into 1000-paper zip archives that can be
downloaded individually, so it's pretty manageable if you want to just check
out a random sampling of the papers.
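
A quick back-of-the-envelope check of those numbers in Python, taking 66 million as a lower bound for the "66+ million" total:

```python
PAPERS_PER_TORRENT = 100_000
PAPERS_PER_ZIP = 1_000
TOTAL_PAPERS = 66_000_000  # lower bound; the real total is "66+ million"

torrents = TOTAL_PAPERS // PAPERS_PER_TORRENT
zips_per_torrent = PAPERS_PER_TORRENT // PAPERS_PER_ZIP
total_zips = torrents * zips_per_torrent

print(torrents, zips_per_torrent, total_zips)  # 660 100 66000
```

So a random sample means picking a few of roughly 66,000 zip archives rather than downloading whole torrents.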

I would love to see somebody do some kind of massive scale analysis of the
papers, but just extracting plain text from all those PDFs is a pretty
herculean task considering that many would need to be OCRed, and others end up
pretty garbled / misformatted with pdftotext and the like.
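
One cheap way to triage would be to run a text extractor first and only flag the output that looks empty or garbled. A rough Python sketch of such a filter is below; the function name and both thresholds are arbitrary assumptions that would need tuning on real data:

```python
def needs_ocr_or_review(
    text: str, min_chars: int = 200, min_word_ratio: float = 0.6
) -> bool:
    """Flag extracted text that is nearly empty or mostly non-word tokens."""
    stripped = text.strip()
    # Scanned PDFs without a text layer tend to yield little or no text.
    if len(stripped) < min_chars:
        return True
    tokens = stripped.split()
    # Count tokens that are purely alphabetic once edge punctuation is removed.
    wordlike = sum(1 for t in tokens if t.strip(".,;:()[]\"'").isalpha())
    return wordlike / len(tokens) < min_word_ratio
```

This would wrongly flag formula-heavy or table-heavy pages, so treat it as a first pass only.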

~~~
lsh
[https://elifesciences.org/labs/5b56aff6/sciencebeam-using-co...](https://elifesciences.org/labs/5b56aff6/sciencebeam-using-computer-vision-to-extract-pdf-data)

------
ruc0la
Just wondering, what if I use this way to obtain papers for my thesis? Do I
have to worry about it?

~~~
Vinnl
Well, you might want to worry about citing sources that your readers might not
be able to read. Otherwise, it's mostly the same risks you get when using e.g.
the Pirate Bay. That's if you use the site, by the way, regardless of if you
use them for your thesis.

------
yeukhon
How can I help? Coding, money, infrastructure?

~~~
Vinnl
There's a "donate" link at the bottom.

------
Nazzareno
It scares me actually, when trying a search I got this:

поиск временно недоступен. пожалуйста, используйте DOI или прямые ссылки

search temporarily unavailable, please use DOI or direct links

If you're using Google chrome, you can install Sci-Hub extension to use search.

To do this: Download the extension and unpack it. You get the "Sci-Hub" folder
with code. Open Chrome and navigate to chrome://extensions, or just open the
menu -> settings -> extensions. Check the developer mode in upper right.[...]

~~~
dotdi
I am an avid user of Sci-hub and I never saw this before.

~~~
cooper12
It's the message that shows up when you try searching. It says that "search
temporarily unavailable, please use DOI or direct links" and then under that
"If you're using Google chrome, you can install Sci-Hub extension to use
search", with instructions. I've installed it in the past and it does work,
but I was also wary about enabling developer mode and Chrome would nag me on
startup each time that it was unsafe so I ended up disabling it. Here's the
full text: [https://pastebin.com/6RnRJYUa](https://pastebin.com/6RnRJYUa).

