
The Sci-Hub Effect: Sci-Hub downloads lead to more article citations - lnyan
https://arxiv.org/abs/2006.14979
======
oefrha
Maybe I misunderstood something, but it seems to me their data demonstrate
that less interesting papers that nobody bothered to download from SciHub tend
to have fewer citations than more interesting ones that are downloaded? How
does that support the conclusion “Sci-Hub downloads _lead to_ more article
citations” (emphasis mine)?

(Yes I read the paper and I’ve seen all the other factors they considered, but
those factors simply aren’t as good predictors/indicators of interest as an
organic download number from _any channel_ , whether it’s ScienceDirect or
SciHub.)

Edit: To be clear, papers in their supposedly "limited access" (if we want to
draw the same conclusion) control group appear to be available on Sci-Hub,
i.e. as open access as their Sci-Hub group. Details in this comment:
[https://news.ycombinator.com/item?id=23710992](https://news.ycombinator.com/item?id=23710992)

~~~
pen2l
As someone from a research lab from a big name school that already has access
to every journal: we use sci-hub all the time just because it is so damn
convenient (me, I have a chrome addon wherein I click a button on a page that
has a doi link, and voilà, I'm looking at the pdf. Firewall at work blocks
scihub, but I find ways around it).

As to why my lab might be preferring papers from big-name journals: I think 1)
there's admittedly our own bias of wanting to cite big papers vs. small
papers, 2) if we're doing some basic literature search, Google will also yield
results that are already more popular. Really, at the moment, I struggle to
see how our behavior changes about which papers we're citing with or without
scihub.

~~~
neutronicus
I think without Sci-hub I might have just said "fuck it" and never read
certain papers from smaller journals that I later wound up citing.

One thing I would do a lot with Google Scholar is find a highly-cited,
somewhat recent (past decade or so) paper and then click into its citations to
find the very most recent related work. IIRC this interface was chronological
so there was no big-journal bias and I found a lot of stuff in random
journals. Low-friction quality-filtering through SciHub was very helpful for
me in this process.

I worked in a theoretical field, so it's actually possible to assess the
quality of a paper just by reading it. If you're doing something experimental
you may have to lean on the big-journal filter a little bit more.

------
setgree
The econometric term for what this paper lacks is "identification strategy"
-- a way of assessing the impact of x (downloads on Sci-Hub) on y
(citations) which _identifies_ a source of variation in x that is independent
of y.

A randomized controlled trial would be ideal. Second-best would be something
that, say, suddenly caused some papers to be available or not available on
Sci-Hub, or a "plausibly exogenous" (to y) cutoff around which papers would be
either available on Sci-Hub or not.

This paper offers nothing. I think we can say the relationship is "not
identified." Doesn't mean the conclusion is wrong, just that this paper
doesn't produce meaningful evidence on the question.

For more see Angrist and Krueger (2001)
[https://economics.mit.edu/files/18](https://economics.mit.edu/files/18)

(In Pearl/DAG language, the underlying quality or appeal of the paper is, as
others have noted, an obvious confound/backdoor path).
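
To make the identification point concrete, here is a minimal simulation sketch, not anything taken from the paper. It assumes a hypothetical latent `quality` that drives both whether a paper ends up available and how often it is cited, plus an assumed true availability effect of 0.5. All names and numbers are illustrative assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def diff_in_means(treated, outcome):
    """Difference in mean outcome between treated and untreated units."""
    yes = [y for t, y in zip(treated, outcome) if t]
    no = [y for t, y in zip(treated, outcome) if not t]
    return mean(yes) - mean(no)


def estimate_effect(randomized, true_effect=0.5, n=50000, seed=1):
    """Simulate papers and return the naive difference-in-means estimate.

    Assumption: `quality` is an unobserved confounder. When `randomized`
    is False, better papers are more likely to be available (self-selection).
    When True, availability is a coin flip, independent of quality.
    """
    rng = random.Random(seed)
    treated, outcome = [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        if randomized:
            available = rng.random() < 0.5
        else:
            available = quality + rng.gauss(0, 1) > 0
        citations = true_effect * available + quality + rng.gauss(0, 1)
        treated.append(available)
        outcome.append(citations)
    return diff_in_means(treated, outcome)


naive = estimate_effect(randomized=False)
randomized = estimate_effect(randomized=True)
print(f"observational estimate: {naive:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under these assumptions the observational estimate lands well above the assumed effect of 0.5, while the randomized one stays close to it, because only randomization cuts the backdoor path through quality.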

~~~
barmstrong
This is a great summary of how to show causal effects. Thx for pointing it
out.

I feel like we need a new way to do peer review, that is more real time - so
that papers can be upvoted/downvoted, flaws can be pointed out - and we have
some way to assess the truthiness of what the paper is claiming. Your comment
is a step in this direction (but we're not capturing the wisdom of the crowds
quantitatively around papers today - arxiv is great, but 1990's era web
design).

I'm working with a team that is trying to build better tools for science at
[https://www.researchhub.com/about](https://www.researchhub.com/about)

Rethinking peer review is one item on the roadmap.

~~~
thomasahle
> we need a new way to do peer review ... so that papers can be
> upvoted/downvoted

This one managed to get 700 upvotes on HN and still be meaningless.

~~~
Melting_Harps
> I feel like we need a new way to do peer review, that is more real time - so
> that papers can be upvoted/downvoted, flaws can be pointed out - and we have
> some way to assess the truthiness of what the paper is claiming. Your
> comment is a step in this direction (but we're not capturing the wisdom of
> the crowds quantitatively around papers today - arxiv is great, but 1990's
> era web design).

> This one managed to get 700 upvotes on HN and still be meaningless.

Agreed. This is why I don't think voting with online parameters, like
downvoting, will ever be a viable choice; because aside from things like Sybil
attacks, the reality is that social media has normalized and acutely optimized
groupthink and the tribalism that often follows. For those of us who don't
engage in it, it's incredibly alarming and disconcerting how many succumb to
its practices in real life.

What I will propose as an alternative is something that was first tried with
Andreas Antonopoulos' first book 'Mastering Bitcoin', wherein the chapters are
each individually uploaded to GitHub and follow the same process that OSS
does: commits and corrections are submitted to alter the book's maintained
version, which is never really 'finished' and can be amended and annotated as
needed to suit the updates that follow (e.g. SegWit or Lightning Network) or,
in this case, perhaps a replicated experiment that provides a larger sample
size. Replication in academic papers is nearly nonexistent, despite the notion
that peer review (especially in STEM) was to be the critical component that
made it invaluable.

This could also effectively undo the walled-garden extortionist model that
afflicts academia's peer-review model and foster more interactive and
international work without being present in a university. Samples or specimens
may need to be more tightly controlled and transported, but this could
effectively be done overnight if the will exists, for little to no money in
anything other than training.

If the International University model is undergoing mass disruption and
shifting toward a mainly online learning platform, this too could help
mitigate these glaring problems and have a Global Repository for ongoing
research.

It's almost stupid not to do it at this point given the MANY pitfalls that its
current model forces down our throats.

~~~
thomasahle
> commits and corrections are submitted to alter the books

I think this is a good idea for books, and I wish more scholars would contribute
to Wikipedia and other peer editable overviews.

I'm not sure quite how it would work for new results though, for which it is
still not quite clear how they fit into the current knowledge, or if they are
worthwhile at all.

People also tend to feel really strongly about their own original ideas, and
may try to push them forward when they should rather be forgotten.

I don't know how to prevent something like that, other than having a BDFL or
individual publishing (as in the current system)

~~~
hunter-gatherer
Gitlab has an article on their website somewhere where they address some of
your concerns. I'm on mobile now and can't find it, but the article advocates
for a 'Handbook first' approach to documentation. We did this at my last job
and I found it well worth the effort. The BDFL ended up being the quality of
our work. In other words, the most correct version always wins out.

------
skybrian
It seems like downloads from Sci-Hub and citations could have a common cause,
for example, _being interesting to scientists_. If Sci-Hub downloads are a
good proxy for interest and they happen first, then they could usefully be
used to predict citations, without necessarily being a cause. Different people
could be using SciHub versus making citations.

I couldn't tell from the paper whether they considered this, or what they
think the cause graph looks like.
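
A minimal sketch of that common-cause picture, assuming a hypothetical latent `interest` variable that feeds both downloads and citations, with no arrow from downloads to citations at all. The variable names and noise levels are illustrative assumptions, not the paper's model.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a simple linear regression on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)
n = 20000
# Assumption: interest is the only shared cause; downloads never touch citations.
interest = [rng.gauss(0, 1) for _ in range(n)]
downloads = [i + rng.gauss(0, 1) for i in interest]
citations = [i + rng.gauss(0, 1) for i in interest]

raw = pearson(downloads, citations)
# Partial correlation: correlate what is left after removing interest from both.
partial = pearson(residuals(downloads, interest), residuals(citations, interest))
print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {partial:.2f}")
```

With this setup the raw correlation is clearly positive even though downloads have zero causal effect, and it essentially vanishes once interest is held fixed. In real data interest is not observed, which is why downloads can predict citations without that telling us anything about cause.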

~~~
kgwgk
"After controlling for problems of endogeneity, heteroscedasticity, and
inconsistency, the interpretation of our robust estimates goes beyond claiming
mere correlations or associations."

They think they have shown causality. If only it were so easy...

~~~
hatsunearu
Whoever reviews this is probably going to go ballistic lol

~~~
js8
The reviewers should still hedge their bets and make sure their research is
available on Sci-Hub.

------
PudgePacket
[https://en.wikipedia.org/wiki/FUTON_bias](https://en.wikipedia.org/wiki/FUTON_bias)

> FUTON bias (acronym for "full text on the Net")[1] is a tendency of scholars
> to cite academic journals with open access—that is, journals that make their
> full text available on the Internet without charge—in preference to toll-
> access publications.

------
captn3m0
The source for the Sci-Hub downloads is the "Who's downloading pirated papers?
Everyone" research from 2016 which was based on anonymized server logs from
Sci-Hub.

Dataset:
[https://doi.org/10.5061/dryad.q447c](https://doi.org/10.5061/dryad.q447c)

Research:
[https://doi.org/10.1126/science.352.6285.508](https://doi.org/10.1126/science.352.6285.508)

Article: [https://www.sciencemag.org/news/2016/04/whos-downloading-
pir...](https://www.sciencemag.org/news/2016/04/whos-downloading-pirated-
papers-everyone)

------
newyankee
I mean, nowadays I see a lot of motivated young people in India who have
become very knowledgeable due to the explosion of scientific and other
such YouTube channels in India. Previously, amateur researchers could not get
access to research papers due to the prohibitive cost of access (unless they
were at a really top-notch university, which frankly is highly limited in
India as it is purely a numbers game). Sci-Hub really enables access for many
such folks.

A similar story played out in the last decade (2000 - 2010) when affordable
streaming platforms did not exist. Many folks got familiar with global movies
and shows due to piracy, as the price of even legal DVDs was prohibitively
high.

~~~
fxtentacle
I think it's also a strong psychological thing. I work for a German company,
so paying $40 per paper is no issue. But I still usually close the tab as soon
as I see that it's Elsevier.

Those people who are interested in me reading their work will send an arxiv
link or put the PDF on researchgate or publish open access. Especially for
mathematics, I need a printable PDF so that I can take notes. That makes DRMed
publications impractical to use.

So if someone only links to the paid version of their article, I usually just
assume that they're an arrogant prick and skip to the next paper.

There's now more good new research being published than I could ever read.
Researchers need to adapt to that by reducing friction.

~~~
tannhaeuser
> _There's now more good new research being published than I could ever read.
> Researchers need to adapt to that by reducing friction._

That's encouraging to hear from at least someone, since it doesn't match my
experience. What I rather see in the few fields I still care about is that
we're flooded with a mass of unoriginal and uninspired papers, many using a ML
approach, where the purpose is clearly to get graduation or tenure rather than
advancing the state of the art. It's happening to a degree that even assessing
the major contributions in a field and separating me-too publications from the
few original and foundational works has become impossible, similar to how
general web search has become pointless. I'm all for free access, but 1. major
works have always been published as author's copies with free public access 2.
I really don't see any advancement in scientific quality at all as academic
achievements are becoming just stepping stones and academic institutions
career networks more than anything else.

Edit: also want to mention citeseer as my search engine of choice which seems
to have improved a lot after their rewrite ten years ago (which made it
useless for me)

~~~
Alekhine
I'm interested to hear why you think general web search is pointless. I know
that SEO and Google dropping various search functions has made things a little
more annoying, but it's still easier to find information than it's ever been.

~~~
tannhaeuser
No it's not, like at all. For nearly every topic I can think of in SW dev,
where I usually have a pretty good idea what I'm after, I'm hitting hundreds
of naive content-farm clickbait articles when I used to find posts by experts
in their blogs, in forums or mailing lists not even ten years ago. At first I
blamed Google for sending me to the sites with the most AdWords and
Doubleclick ads on them, but with DuckDuckGo consistently giving me just the
same results, I believe the problem is rather with the incentives for
producing content (or lack thereof), with Google and Facebook having extracted
all value out of what used to be "the web". It's not going to improve with ad
prices going down the toilet, and Google increasing their efforts of
monopolizing every single point of contact as they're struggling to grow.
Today if I'm "researching" (not in an academic sense) a topic, I go straight
to StackExchange sites, and sites like HN. Life's too short to care about the
world of copycat shite that Google indexes; people may find that "searching
456678743 sites in 0.03s" is not, in fact, very useful on the extant web.

~~~
fxtentacle
As odd as that sounds, I believe there's a market opportunity for a company to
start out again like what Google used to be: a search engine used mostly by
technical people and searching within a well-defined small circle of websites.

------
azalemeth
I wish sci-hub had a "donate paper" button. I've published things that are
open access but that the publishers obfuscate or make hard to find yet get SEO'd to
the top of the rankings. I'd love to just upload a pdf, say, of a book
chapter.

~~~
henriquemaia
Submit it directly to Library Genesis. There's an upload option there [0]
where you can submit scientific papers [1].

Edit: I added some additional info.

[0] [https://library.bz/main/upload/](https://library.bz/main/upload/)

[1] [https://i.imgur.com/2YCpTSz.png](https://i.imgur.com/2YCpTSz.png)

------
snicker7
Publishers don't care about citations nearly as much as squeezing every last
penny from taxpayers that they can (via publicly funded research /
libraries).

------
Percnopterus
There is a long known effect that open access articles get more citations
compared to non open access articles (e.g.
doi.org/10.1371/journal.pbio.0040157 ). So I do not see why this should be
very different for Sci-Hub articles. However I agree that the claimed effect
might be exaggerated by confounding. More interestingly proper deposition of
the data corresponding with the article increases also the citation rate (
doi.org/10.1371/journal.pone.0230416 )

------
aglionby
There's some similar work out that analyses the impact on conference paper
acceptance of having deanonymised arXiv versions of papers available before
review. They look at ICLR papers for the last 2 years.

I've not read it in a lot of detail but it looks like there's a positive
correlation between releasing papers and having them accepted. Not sure how
they've controlled for confounders (you only release papers you're confident
in the quality of on arXiv?)
[https://arxiv.org/pdf/2007.00177.pdf](https://arxiv.org/pdf/2007.00177.pdf)

------
jillesvangurp
Not so surprising. When I was still publishing (until about 10 years ago), I
always made sure my papers were easy to find and download. Academic
performance is basically measured through scientific references to your work.
So basically, a smart scientist would want to do SEO to ensure people can
actually find their articles. Even 20 years ago I avoided going to the library
to request copies of articles that somehow weren't available online.

Nothing gets rid of a potentially interested reader faster than a paywall. I
find it surprising that scientists aren't getting smarter about publicizing
themselves. All that effort and you can't be bothered to blog about your
findings, tweet a bit, engage with your peers online, etc.? There's this
notion of spending months or years on something and then expecting people to
actually find it, pay for it, and then read it only to then consider referring
to it. It doesn't work that way if you are just starting out.

------
TekMol
How does one even use Sci-Hub?

The search functionality is "temporarily unavailable" and Google seems to have
not indexed the site.

Find the abstract elsewhere and then use the DOI to find it on Sci-Hub?

~~~
eythian
I actually quite like using the telegram bot. I just send it a URL or title or
whatever and it replies with a PDF.

~~~
BelleOfTheBall
Oh, what's the name of the bot? I like using Sci-hub a lot but it gets pretty
slow sometimes, the bot would be much more convenient.

~~~
lnyan
Perhaps this one [https://telegram.me/scihubot](https://telegram.me/scihubot)

~~~
eythian
That's the one, yep.

------
micelwell0563
Same here. Simply put, this is correlation. The irony is humorous.

------
mmis1000
They do show a positive correlation, but I'm not sure which way it runs.

More downloads lead to more citations (the paper is seen by more
people).

or

More interesting papers get more downloads (people download papers that are
more interesting).

Both directions make sense. Not sure which one contributes more to the
correlation?

~~~
otherme123
I kind of like the other conclusion: Impact Factor of the journal is not
associated with the number of citations. IF is an important point in this
business. I've been in some job screenings in which they aggregate your papers
according to the IF of the journals (like one paper in a 1st quartile journal
equals three papers of a 2nd quartile).

But according to this data you could publish in a 4th quartile journal and, if
your paper is interesting, free to download and has some figures, it will be
read and cited.

------
enjoyyourlife
Appropriately posted on arXiv

------
akramtariqkhan
I have experienced it first-hand due to the fact that referring to paid papers
is not possible unless one has a huge budget and the abstract itself isn't
enough to decipher what lies beyond!

------
Yetanfou
Gee whiz, who'dathunkit that opening up access to papers would lead to more
people having access to papers which they'd then use to write more papers in
which they cite those papers they read.

Even if you have institutional or individual access to the likes of Wiley or
Elsevier it is usually far easier to just feed the DOI to Sci-Hub and read the
paper instead of jumping through all the hoops to get 'official' access. This
goes doubly for those who, like me, use whitelists for cookies and block
third-party content (including cookies) since it generally takes a few
attempts to convince the paywall that you just logged in for the umpteenth
time and can I now please read that paper please? Nope, thou shalt not pass!

_aw shucks, I'll just get the thing off Sci-Hub again_.

------
slim
Researchers should not be allowed to cite a paper unless they demonstrate
proof they actually did pay for the papers they are citing /s

~~~
jkh1
Readers of scientific papers never had to pay. Before papers were available in
digital form, you could request a paper copy from the authors. Of course the
authors usually paid for some extra copies for this but when those ran out,
you'd get a photocopy. Then people started requesting a copy and would get one
by email. Now it's just faster to simply find that copy on the web than wait
for someone to reply to your email.

------
pedro596
It's like saying that if you give free newspapers near the metro it is
possible that more people will read them.

The only problem is that here they are giving for free "paid newspapers". So,
everyone that wanted a paid newspaper but didn't have the money to pay for it
read it more times because they were able to steal them.

By this analogy the conclusion adds that the fact that people are able to
steal newspapers helps to keep everyone more informed.

Now, please do the same analogy for food.

~~~
guerrilla
> The only problem is that here they are giving for free "paid newspapers".

Or the problem is that there are "paid newspapers" in the first place.

> Now, please do the same analogy for food.

No, because information is not a finite resource in the same way that food
is: food can't be cloned at negligible cost after it is produced. If food were
replicable like in Star Trek, then at that point we could make the same
argument for food. (We do waste a lot of food when people can't pay for it or
its transportation and distribution, though.)

------
nxpnsv
Popular papers are more cited! Truly groundbreaking stuff there.

------
red_admiral
From my own experience, putting your papers for free on your discipline's
archive (arXiv or whatever), or at least on your personal webpage, is a must
if you want citations. There's no excuse for having your work behind a paywall
that you don't even get commission from!

------
buboard
I think the most important thing is that these citations are being read in
full text. This is better than adding a reference based on what it says in the
abstract.

------
lowdose
Aka the bandwagon effect.

------
zitterbewegung
I thought I was good at finding information on Google. Then I took a reading
class on quantum computation with a professor. It was like going from a yellow
belt to a black belt after two years.

I don’t think researchers care that much about rules that hinder them to be
publish or perish.

