Hacker News
The Sci-Hub Effect: Sci-Hub downloads lead to more article citations (arxiv.org)
754 points by lnyan 42 days ago | 92 comments



Maybe I misunderstood something, but it seems to me their data demonstrate that less interesting papers that nobody bothered to download from SciHub tend to have fewer citations than more interesting ones that are downloaded? How does that support the conclusion “Sci-Hub downloads lead to more article citations” (emphasis mine)?

(Yes I read the paper and I’ve seen all the other factors they considered, but those factors simply aren’t as good predictors/indicators of interest as an organic download number from any channel, whether it’s ScienceDirect or SciHub.)

Edit: To be clear, papers in their supposedly "limited access" (if we want to draw the same conclusion) control group appear to be available on Sci-Hub, i.e. as open access as their Sci-Hub group. Details in this comment: https://news.ycombinator.com/item?id=23710992


As someone in a research lab at a big-name school that already has access to every journal: we use Sci-Hub all the time just because it is so damn convenient (me, I have a Chrome add-on where I click a button on a page that has a DOI link and, voilà, I'm looking at the PDF. The firewall at work blocks Sci-Hub, but I find ways around it).

As to why my lab might prefer papers from big-name journals: I think 1) there's admittedly our own bias toward wanting to cite big papers vs. small papers, and 2) if we're doing a basic literature search, Google will also surface results that are already more popular. Really, at the moment, I struggle to see how our behavior changes regarding which papers we cite, with or without Sci-Hub.


I think without Sci-hub I might have just said "fuck it" and never read certain papers from smaller journals that I later wound up citing

One thing I would do a lot with Google Scholar is find a highly-cited, somewhat recent (past decade or so) paper and then click into its citations to find the very most recent related work. IIRC this interface was chronological so there was no big-journal bias and I found a lot of stuff in random journals. Low-friction quality-filtering through SciHub was very helpful for me in this process.

I worked in a theoretical field, so it's actually possible to assess the quality of a paper just by reading it. If you're doing something experimental you may have to lean on the big-journal filter a little bit more.


I'm not in research but the same thing happens in embedded development: parts that you need an account to purchase and for which you need to log in to view the datasheet are less popular than those that have clones and for which the PDFs are available right inside my EDA tool.


> I struggle to see how our behavior changes regarding which papers we cite, with or without Sci-Hub.

Under what conditions do you see that changing? You yourself say how useful it is, but discount the effect it has on the citation graph. Could we say the same thing about arXiv?

This is exactly why research needs to be open and freely available. The esoteric, rarely cited papers are locked behind paywalls and away from search engines, and as a result their impact fades over time while the flock centers on a single vein of focus. The future should look like a net, not a river.

I am not an academic and I read papers for their metalessons in areas that interest me. But without Sci-Hub, lots of research would not be accessible outside of its niche, especially if it was unlucky enough to not be in the right journal at the right time.

One hilarious pattern I see is when a bigshot in a field publishes a new paper and there is a race by some nobody to cite it. Basically the academic publishing version of "first post".


I think the title they chose is just misleading (wrong?). Otherwise, their conclusion is

> we found that articles downloaded from Sci-hub were cited 1.72 times more than papers not downloaded from Sci-hub and that the number of downloads from Sci-hub was a robust predictor of future citations

> The results suggest that limited access to publications may limit some scientific research from achieving its full impact.


It’s more than just the title. The conclusion from abstract reads “The results suggest that limited access to publications may limit some scientific research from achieving its full impact.”

My uncharitable read is they slapped an interesting yet unsupported conclusion onto a rather boring observation.

Edit: In particular, it seems to me that papers in their control group are available on SciHub (correct me if I’m wrong), just nobody bothered to download them in the time window they analyzed. Which directly contradicts the “limited access” part of their claim.


> Edit: In particular, it seems to me that papers in their control group are available on SciHub (correct me if I’m wrong), just nobody bothered to download them in the time window they analyzed. Which directly contradicts the “limited access” part of their claim.

I was not sure about this, but if this is the case then their conclusion seems to be unrelated to their findings.


I'm pretty sure about this now.

I just looked into the actual data set[1], and analyzed a few papers from it, e.g. [2] the first paper in scopus_nature.csv. They're definitely available on Sci-Hub at least at the moment [3], and there's no reason to believe they weren't available back then.

Also, here's what they say about the dataset, emphasis mine:

> As a quality control, we performed a random sampling of all the articles retrieved, excluding those already present in the first data set. As in this second data set, the number of Sci-hub downloads is precisely equal to zero, we regard it as a control group (nC = 4,015) from which we are going to estimate comparisons for our experimental group (nE = 4,646).

Downloads equal to zero != not available.

[1] https://osf.io/xb9yn/

[2] https://www.scopus.com/inward/record.url?eid=2-s2.0-85016141...

[3] https://sci-hub.tw/10.1038/541123a


So it seems that this paper's result comes down to "more downloads => more citations".

Of course, that is hardly a surprising result. I think most authors have read most papers they cite (although things like reviewer/editor shenanigans could make this less than 100%).

Of course, such a pedestrian result does not grab attention.


> Downloads equal to zero != not available.

I'm wondering about this too. What could be the possible other reason for the papers not being downloaded? Boring titles? Preprints easily available? Open access?

> and there's no reason to believe they weren't available back then.

Is it possible that the articles really were not available, since they used data from September 2015 to February 2016, an early version of Sci-Hub?


> Is it possible that the articles really were not available, since they used data from September 2015 to February 2016, an early version of Sci-Hub?

That's within the realm of possibilities but sure doesn't sound like the case the way they described their methodology. That would have required a catalog of available papers on Sci-Hub too, but the dataset they used only contained download logs, so they couldn't have known whether a zero-download paper was available or not (unless I misunderstood what the first dataset is about).


I thought that too. But in the "Statistical analyses" section it says: "the interpretation of our robust estimates goes beyond claiming mere correlations or associations." which when you read through the statistical analysis is clearly not true.


I think what they're hinting at is that without Sci-Hub, even a popular paper would have less exposure. Some papers which cite a popular publication can only do so because they could download the referenced one through Sci-Hub.


I don't think anyone questions that some articles have been cited because they were downloaded from scihub.

This was still a garbage attempt to 'prove' this conclusion. And I don't have any faith the authors measured the true magnitude of these effects.


I find that in nearly all cases, if you invert the causality and the results make more sense, you can ignore the paper.

"Papers with more citations are downloaded more frequently on Sci-Hub" makes more sense to me than "Papers downloaded frequently from Sci-Hub get more citations".


Yes this is a huge confound. Their methods state...

> The first data set contained all articles (nE = 4,646) that were downloaded from Sci-hub between September 2015 and February 2016 [24, 25]. The second data set was extracted from the Scopus database and contained all articles published in the selected journals within the same time period. From this data set, we excluded articles already present in the first data set.

Their methods lend themselves to circular reasoning, since the conclusion could also be "papers of high interest (cited more often) are more likely to be downloaded from Sci-Hub".


I guess what you are saying is that correlation does not imply causation, and in this case the causality arrow may go the other way round.

In this case, there's something that can actually help establish the direction of the causality arrow: time. If the citations precede the downloads, then sure, it's more that well-known papers get downloaded more. This paper shows that the downloads precede the citations, though:

"number of downloads from Sci-hub was a robust predictor of future citations"

While it's not a watertight argument that downloads lead to citations, the theory is at least plausible.
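To make the "not watertight" part concrete, here's a purely illustrative simulation (invented numbers, not the paper's data): a latent "interest" level drives both quantities, downloads happen first and citations later, and yet the true causal effect of downloads on citations is exactly zero. Earlier downloads still predict later citations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Latent interest in each paper; nothing else differs between papers.
interest = rng.exponential(1.0, n)

# Downloads happen at t1, citations at t2, but both are driven only by
# interest: the true causal effect of downloads on citations is zero.
downloads = rng.poisson(3 * interest)
citations = rng.poisson(2 * interest)

# Regressing later citations on earlier downloads still finds a strong
# positive "effect of downloads".
slope = np.polyfit(downloads, citations, 1)[0]
print(f"slope: {slope:.2f}")
```

Temporal precedence alone can't separate this confounded world from one where downloads genuinely cause citations; you'd need an experiment or some exogenous source of variation to tell them apart.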


So basically, correlation, speculated (w/o basis) as causation?


Same here. Simply put, this is correlation. The irony is humorous.


The econometric term for what this paper lacks is "identification strategy" -- a way for assessing the impact of x (downloads on sci-hub) on y (citations) which _identifies_ a source of variation in x that is independent of y.

A randomized controlled trial would be ideal. Second-best would be something that, say, suddenly caused some papers to be available or not available on Sci-Hub, or a "plausibly exogenous" (to y) cutoff around which papers would be either available on Sci-Hub or not.

This paper offers nothing. I think we can say the relationship is "not identified." Doesn't mean the conclusion is wrong, just that this paper doesn't produce meaningful evidence on the question.

For more see Angrist and Krueger (2001) https://economics.mit.edu/files/18

(In Pearl/DAG language, the underlying quality or appeal of the paper is, as others have noted, an obvious confound/backdoor path).
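For what it's worth, a toy simulation can sketch why an identification strategy matters (all numbers and the instrument here are invented for illustration). Suppose some exogenous shock z, say a random availability outage, shifts downloads x but is independent of a paper's unobserved quality. A naive regression is biased by the quality backdoor path, while the simple instrumental-variables (Wald) estimator, which uses only the z-induced variation in x, recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

quality = rng.normal(size=n)   # unobserved confound: the paper's appeal
z = rng.normal(size=n)         # hypothetical instrument, independent of quality
x = 1.0 * z + 2.0 * quality + rng.normal(size=n)  # "downloads"
y = 0.5 * x + 3.0 * quality + rng.normal(size=n)  # "citations"; true effect 0.5

# Naive OLS slope is biased upward by the quality backdoor path.
ols = np.cov(x, y)[0, 1] / np.var(x)

# IV (Wald) estimator: only the variation in x induced by z is used.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {ols:.2f}, IV: {iv:.2f}")  # OLS near 1.5, IV near the true 0.5
```

Without something like z, no amount of regression machinery on (x, y) alone distinguishes the two estimates.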


This is a great summary of how to show causal effects. Thx for pointing it out.

I feel like we need a new way to do peer review, that is more real time - so that papers can be upvoted/downvoted, flaws can be pointed out - and we have some way to assess the truthiness of what the paper is claiming. Your comment is a step in this direction (but we're not capturing the wisdom of the crowds quantitatively around papers today - arxiv is great, but 1990's era web design).

I'm working with a team that is trying to build better tools for science at https://www.researchhub.com/about

Rethinking peer review is one item on the roadmap.


> we need a new way to do peer review ... so that papers can be upvoted/downvoted

This one managed to get 700 upvotes on HN and still be meaningless.


> I feel like we need a new way to do peer review, that is more real time - so that papers can be upvoted/downvoted, flaws can be pointed out - and we have some way to assess the truthiness of what the paper is claiming. Your comment is a step in this direction (but we're not capturing the wisdom of the crowds quantitatively around papers today - arxiv is great, but 1990's era web design).

> This one managed to get 700 upvotes on HN and still be meaningless.

Agreed. This is why I don't think voting with online parameters, like downvoting, will ever be a viable choice; because aside from things like Sybil attacks, the reality is that social media has normalized and acutely optimized groupthink and the tribalism that often follows. For those of us who don't engage in it, it's incredibly alarming and disconcerting how many succumb to its practices in real life.

What I will propose as an alternative is something that was first tried with Andreas Antonopoulos's first book, 'Mastering Bitcoin', wherein the chapters are each individually uploaded to GitHub and follow the same process that OSS does: commits and corrections are submitted to alter the book's maintained version, which is never really 'finished' and can be amended and annotated as needed to suit the updates that follow (e.g. SegWit or the Lightning Network), or in this case perhaps a replicated experiment that provides a larger sample size. Replication in academic papers is nearly non-existent, despite the notion that peer review (especially in STEM) was to be the critical component that made it invaluable.

This could also effectively undo the walled-garden extortionist model that afflicts academia's peer-review system and foster more interactive international work without requiring presence at a university. Samples or specimens may need to be more tightly controlled and transported, but this could effectively be done overnight, if the will exists, for little to no money beyond training.

If the international university model is undergoing mass disruption and shifting toward mainly online learning platforms, this too could help mitigate these glaring problems and provide a global repository for ongoing research.

It's almost stupid not to do it at this point, given the MANY pitfalls that the current model forces down our throats.


I agree with this.

I'm currently doing research, and one factor that always makes me go nuts is the actual time it takes to find an article and then download it. I can say that I have used, or rather abused, Sci-Hub to find related papers.

Having something GitHub-style, plus the option to visually view changes and updates (what the connections are, who did what, where something came from), the way the site Connected Papers does, would also be very powerful.

Lastly, my only concern here is: if it's an ever-living document, how can you assess the document as a whole, and whether it's of good quality, at any given time?

So my proposal falls under the "feature request" type: even as changes are made, have the ability to read the "metadata" or quality checks done by, let's say, AI (plagiarism detection, etc.), which combined with feedback from online reviewers would make for a good quality check.


> commits and corrections are submitted to alter the books

I think this is a good idea for books, and I wish more scholars would contribute to Wikipedia and other peer-editable overviews.

I'm not sure quite how it would work for new results though, for which it is still not quite clear how they fit into the current knowledge, or if they are worthwhile at all.

People also tend to feel really strongly about their own original ideas, and may try to push them forward when they should rather be forgotten.

I don't know how to prevent something like that, other than having a BDFL or individual publishing (as in the current system)


Gitlab has an article on their website somewhere where they address some of your concerns. I'm on mobile now and can't find it, but the article advocates for a 'Handbook first' approach to documentation. We did this at my last job and I found it well worth the effort. The BDFL ended up being the quality of our work. In other words, the most correct version always wins out.


> This one managed to get 700 upvotes on HN and still be meaningless.

On the other hand, the top/most salient comments on it are all pretty critical of its causal claims.


Not to mention that it also needs to account for science advancing one funeral at a time.

How do you distinguish between real flaws and a good theory that is not yet taken seriously?


I just uploaded the papers I’ve co-authored — all were licensed CC 4.0 — to Research Hub and emailed to say hello. Looks like a great platform!


Thank you so much for this feedback. My team and I will be more than happy to discuss these ideas, if you want to partner up...


Hi Dr. Correa,

Thanks for writing. I've taken a quick look at your paper and its OSF page, and you all clearly know what you're doing and have some great work here. I am not your target audience (I am not an academic), but if I were peer reviewing this, I'd suggest one of two avenues.

1) As Angrist and Krueger put it, "In our view, good instruments often come from detailed knowledge of the economic mechanism and institutions determining the regressor of interest...progress comes from detailed institutional knowledge and the careful investigation and quantification of the forces at work in a particular setting." With this in mind, is it possible to contact the maintainer of Sci-Hub, email her your paper, and ask if she can think of any plausibly exogenous sources of variation in papers' availability on Sci-Hub? Some possibilities that come to mind for me:

* In my own experience, papers without DOIs are often harder to find on Sci-Hub. Are there closed-access journals that don't mint DOIs on which you could use propensity score matching to create something of an apples-to-apples comparison?

* Is there an identification strategy somewhere in journals with optional APCs for which only some of the papers are open access?

* Were there ever any sci-hub outages that lasted periods of many months, and, however long down the line, did the citations of paywalled articles decline relative to their open access peers?

2) If none of this is possible, I suggest removing all causal language from your article and sticking to a predictive model. Observational research is appropriate for exploring and identifying causal hypotheses rather than confirming them; no statistical techniques that attempt to control for unobserved population heterogeneity will persuade me otherwise (I am admittedly an extremist on this position, but my own views hew closely to those of Gerber, Green and Kaplan (2003): http://www.donaldgreen.com/wp-content/uploads/2015/09/Gerber...). The Lewbel (2012) paper you cite looks interesting but, to paraphrase Gerber et al., statistical techniques can't account for nonstatistical sources of uncertainty (the inherent unknowability of whether you've specified the 'correct' model, in the absence of randomization or exogenous variation, means we can't say anything about the bias of your estimation procedure).

Observational research is great! Generating hypotheses is as important as confirming them. It's just that some research designs license causal inference, and some do not.

Best of luck.


Just for the record my definition of an identification strategy is not quite right -- it should be

> a way for assessing the impact of x (downloads on sci-hub) on y (citations) which _identifies_ a source of variation in x, that is independent of y, except through its impact on x.

There are a bunch of different ways that people express this. One is that there are no backdoor paths between x and y. Another is that X should be uncorrelated with the error term of the model.


It seems like downloads from Sci-Hub and citations could have a common cause, for example, being interesting to scientists. If Sci-Hub downloads are a good proxy for interest and they happen first, then they could usefully be used to predict citations, without necessarily being a cause. Different people could be using SciHub versus making citations.

I couldn't tell from the paper whether they considered this, or what they think the cause graph looks like.


"After controlling for problems of endogeneity, heteroscedasticity, and inconsistency, the interpretation of our robust estimates goes beyond claiming mere correlations or associations."

They think they have shown causality. If only it were so easy...


Yeah, it doesn't look like they consulted a statistician when doing this research. It always amazes me how a paper like this can be 90% statistics-based, yet none of the authors are statisticians.

Without the proper statistical analysis it just becomes a paper on how highly cited papers get downloaded from Sci-Hub which isn't very interesting.


Whoever reviews this is probably going to go ballistic lol


The reviewers should still hedge their bets and make sure their research is available on Sci-Hub.


Similar to many other commenters, I don't really believe the far-reaching conclusions in the paper either.

As a scientist, the papers that I download from Sci-Hub are mostly the ones I am interested in with regards to my current research projects, and as such are quite likely to be cited. Or in other words, the work I do predicts which papers I download and which papers I cite.

There may be some causation of availability to citations, but I believe it would have been more interesting _before_ Sci-Hub. That is, I would guess that papers freely available somewhere on the internet (mostly home-pages for scientists) would have more citations than papers that are just available behind a pay-wall. For me, Sci-Hub has changed my citations from "just those I can get my hands on", to "any paper I'm interested in".


The obvious way to assess if availability is the factor is to look at open access publications. Elsevier (unsurprisingly) say it makes no difference [1] and there are confounding effects which are stronger. For example, papers with code tend to be cited more in ML. And it's very difficult to control for article quality, which is a big factor (MDPI's quality is extremely hit/miss, for example). Open access may improve your paper's reach, but will it actually give you more citations? Unclear. I'm very much in favour of open research though.

On the matter of Sci-Hub. I already have access to all the journals I commonly use via my university. Those journals almost certainly track downloads (Elsevier even has a recommender system when you download a paper). All Sci-Hub does is reduce friction for researchers who can't be bothered to use their institutional login while off campus. Admittedly that includes a lot of people, but for the people who measurably contribute to citation metrics, does Sci-Hub actually improve availability? I doubt it, at least in universities with budgets for subscriptions. Before Sci-Hub there were always routes to get papers: ask the author or ask a colleague in a neighbouring university if they have access. Or you can ask your library who may be able to get it.

My point is that generally it was a bit more work sometimes, but in the grand scheme of publishing a paper, it wasn't so bad to network to get access to things.

It's certainly enabled the general public to access research and no doubt tons of small companies are using it to avoid paying for subscription fees, but those people aren't significantly publishing papers in places where we can measure it.

All this study shows is that paper downloads from anywhere is a proxy for popularity/hype which can be a proxy for citation count. I'm sure Elsevier or Taylor or Nature could provide a similar correlation between downloads and citations. There is also a bias here - if I'm looking for papers, I'm probably going to download the most highly cited stuff first because that's a weak signal that it's a useful paper.

I'd be more interested to see, as you said, temporal analysis of before/after Sci-Hub.

Or alternatively geographical studies - does this affect lower income countries more? I imagine that this is a boon to researchers who didn't have access because their institution couldn't afford it.

[1] https://www.elsevier.com/connect/citation-metrics-and-open-a...


> It seems like downloads from Sci-Hub and citations could have a common cause, for example, being interesting to scientists. If Sci-Hub downloads are a good proxy for interest and they happen first, then they could usefully be used to predict citations

So far so good.

> , without necessarily being a cause.

The logic has fallen apart. In order for the downloads not to be a cause, the same citations would have to occur in their counterfactual absence. You're claiming that anyone who downloads a paper from Sci-Hub would, in the absence of Sci-Hub, obtain that paper by other means.

Compare the currently higher-positioned thread ( https://news.ycombinator.com/item?id=23710896 ), where this paper is getting slammed for concluding that limiting the availability of a paper may prevent the research described in that paper from "achieving its full impact". On your analysis, where Sci-Hub downloads are driven by interest from scientists and that explains the entire more-downloads-means-more-citations effect, that conclusion is 100% correct.


> The logic has fallen apart.

Note the words used in the argument: "could have a common cause", "without necessarily being a cause".

The logic that fails is the one that depends on a common cause being impossible and concludes that one thing is necessarily causing the other.

(It plausibly may do so to some extent, but the paper doesn't prove that, nor does it give a good estimate of the effect.)


No, I specifically agree that they have a common cause. Nobody's depending on a common cause being impossible.

But they cannot be coincidentally related by a common cause! This is clearly a causality chain:

    1. interest in the paper -> reading the paper
    2.   reading the paper   ->  citing the paper
It is not:

    1. interest in the paper -> reading the paper
    2. interest in the paper ->  citing the paper
You can't seriously believe that interest in the paper can lead to citations of the paper without being intermediated through reading the paper!

And that intermediary position of Sci-Hub means that, in the model described, it cannot possibly fail to be a cause of citations. That would be the same argument as "one game downloaded through piracy equals one lost sale". If you don't believe that one illegal download = one lost sale, you've immediately admitted that piracy is a cause of additional gameplay.


OP of higher-positioned thread "slamming" the paper.

> And that intermediary position of Sci-Hub means that, in the model described, it cannot possibly fail to be a cause of citations.

No, downloads don't automatically lead to citations. The possible scenario where every single citation came from another channel (arXiv, ScienceDirect, SpringerLink, JSTOR, what have you) is entirely consistent with the paper's findings. I for one happen to be a scientist, I don't think I ever cited a paper I downloaded from Sci-Hub, but I've certainly grabbed a few papers from other fields on Sci-Hub; those downloads did not contribute to citation stats at all.

In addition, the paper is trying to conclude "limited access to publications may limit some scientific research from achieving its full impact" by comparing papers available on Sci-Hub to, guess what, papers available on Sci-Hub (https://news.ycombinator.com/item?id=23710992). I really need some solid convincing to accept this leap in logic.

I don't necessarily disagree with the conclusion, in fact I think it's in a way kinda obvious (there should be an effect, however small it is), but the paper's analysis adds nothing to support it. I'm just not a fan of shoddy analysis, whether I like the conclusion or not (otherwise I'd better be a politician or lawyer).


> I don't necessarily disagree with the conclusion, in fact I think it's in a way kinda obvious (there should be an effect, however small it is)

Agreed.

> but the paper's analysis adds nothing to support it.

I don't quite agree with this. The paper demonstrates that more realized access is related to more citations. I think you can make a solid argument that more potential access is related to more realized access. That connection is enough to count the analysis as "supporting" the conclusion ("more potential access ~ more citations") at a level which is greater than zero. Support isn't proof.


If the null hypothesis is that research professionals' discovery of and access to papers interesting and relevant enough to cite is independent of those papers also being available on Sci-Hub, then I don't think the paper's demonstration that papers interesting enough to garner future citations are [predictably] also more popular on Sci-Hub than those which aren't does much to support its rejection.

There's no reason to believe the correlation between realized access and future citations isn't exactly the same or stronger for subscription platforms or paper copies in university libraries, and nothing in the paper itself suggests that researchers publishing in top-tier journals might discover and access papers exclusively through those methods rather than Sci-Hub.

I'm sure the frequency with which papers are referenced in undergraduate essays also correlates with future citations in top tier academic journals, but any paper arguing for a causal Undergraduate Essay Effect on citations in top journals would be laughed off, especially if they made claims like 'revealed the importance of undergraduate scholarship in almost doubling the number of [top tier journal] citations of articles mentioned by undergrads'

[FWIW I don't think the direct and indirect impact of papers being accessible Scihub on professional research work is literally zero, but this paper does nothing to indicate otherwise.]


The point is that the paper does not (and can not) distinguish one situation from the other.

Edit: your diagrams are too simplistic anyway. Papers can be read elsewhere. Citations have more causes than readership and its incremental effect will be heterogeneous.


Well, if a paper is available on Sci-Hub I'll download it and I may cite it, while if it's not, I won't, so there is zero chance I'll cite it. My university's system was already annoying to use, but now they switched to a Chrome extension that doesn't work half of the time. Not worth bothering with it.


There is a very obvious cause, which is that people tend not to cite papers that they have never seen.

So if your reviewer tells you that you need more citations, you use the stuff that's easiest to get.


You use the stuff cited in the citations you already have :-)


https://en.wikipedia.org/wiki/FUTON_bias

> FUTON bias (acronym for "full text on the Net")[1] is a tendency of scholars to cite academic journals with open access—that is, journals that make their full text available on the Internet without charge—in preference to toll-access publications.


The source for the Sci-Hub downloads is the "Who's downloading pirated papers? Everyone" research from 2016 which was based on anonymized server logs from Sci-Hub.

Dataset: https://doi.org/10.5061/dryad.q447c

Research: https://doi.org/10.1126/science.352.6285.508

Article: https://www.sciencemag.org/news/2016/04/whos-downloading-pir...


I mean, nowadays I see a lot of motivated young people in India who have become very knowledgeable due to the explosion of scientific and similar YouTube channels there. Previously, amateur researchers could not get access to research papers due to the prohibitive cost of access (unless they were at a really top-notch university, which frankly is highly limited in India, as it is purely a numbers game). Sci-Hub really enables access for many such folks.

A similar story played out in the decade (2000-2010) when affordable streaming platforms did not exist. Many folks got familiar with global movies and shows through piracy, as the price of even legal DVDs was prohibitively expensive.


I think it's also a strong psychological thing. I work for a German company, so paying $40 per paper is no issue. But I still usually close the tab as soon as I see that it's Elsevier.

Those people who are interested in me reading their work will send an arxiv link or put the PDF on researchgate or publish open access. Especially for mathematics, I need a printable PDF so that I can take notes. That makes DRMed publications impractical to use.

So if someone only links to the paid version of their article, I usually just assume that they're an arrogant prick and skip to the next paper.

There's now more good new research being published than I could ever read. Researchers need to adapt to that by reducing friction.


You probably know this already, but for the benefit of others: While it depends on the subfield, a lot of computer science and mathematics papers are freely available.

1. Searching for the title often gives you a PDF in the first few results already. Authors usually have the right to upload a version on their personal website or arXiv.

2. There is Google Scholar [1], which has links to PDFs on the right of the result (if it knows one). It is better at this than regular Google search, in my experience.

3. Manually searching on the author's website (or former website, if they moved) sometimes proves successful, although this is relatively more effort.

4. As mentioned, writing an e-mail to the authors works as well. (If you buy a paper, the authors get nothing, so there is no incentive not to share it.)

[1] https://scholar.google.com/


> There's now more good new research being published than I could ever read. Researchers need to adapt to that by reducing friction.

That's encouraging to hear from at least someone, since it doesn't match my experience. What I see instead, in the few fields I still care about, is that we're flooded with a mass of unoriginal and uninspired papers, many using an ML approach, where the purpose is clearly graduation or tenure rather than advancing the state of the art. It's happening to such a degree that even assessing the major contributions in a field and separating me-too publications from the few original and foundational works has become impossible, similar to how general web search has become pointless. I'm all for free access, but 1. major works have always been published as author's copies with free public access, and 2. I really don't see any advancement in scientific quality at all, as academic achievements are becoming just stepping stones and academic institutions career networks more than anything else.

Edit: I also want to mention CiteSeer as my search engine of choice, which seems to have improved a lot since their rewrite ten years ago (which had made it useless for me).


I'm interested to hear why you think general web search is pointless. I know that SEO and Google dropping various search functions has made things a little more annoying, but it's still easier to find information than it's ever been.


No it's not, like at all. For nearly every topic I can think of in SW dev, where I usually have a pretty good idea what I'm after, I'm hitting hundreds of naive content-farm clickbait articles when I used to find posts by experts in their blogs, in forums or mailing lists not even ten years ago. At first I blamed Google for sending me to the sites with the most AdWords and Doubleclick ads on them, but with DuckDuckGo consistently giving me just the same results, I believe the problem is rather with the incentives for producing content (or lack thereof), with Google and Facebook having extracted all value out of what used to be "the web". It's not going to improve with ad prices going down the toilet, and Google increasing their efforts of monopolizing every single point of contact as they're struggling to grow. Today if I'm "researching" (not in an academic sense) a topic, I go straight to StackExchange sites, and sites like HN. Life's too short to care about the world of copycat shite that Google indexes; people may find that "searching 456678743 sites in 0.03s" is not, in fact, very useful on the extant web.


As odd as that sounds, I believe there's a market opportunity for a company to start out again like what Google used to be: a search engine used mostly by technical people and searching within a well-defined small circle of websites.


I would like to promote the Netherlands here. The university I'm working for aims to have all papers open-access. This is derived from national policy [1].

[1] https://en.wikipedia.org/wiki/Open_access_in_the_Netherlands...


It would be pretty awesome if YouTube channels translated and explained new research in local Indian languages. That could give a significant boost to public scientific awareness.


I wish Sci-Hub had a "donate paper" button. I've published things that are open access, but the publisher obfuscates them or makes them hard to find, yet they get SEO'd to the top of the rankings. I'd love to just upload a PDF of, say, a book chapter.


Submit it directly to Library Genesis. There's an upload option there [0] where you can submit scientific papers [1].

Edit: I added some additional info.

[0] https://library.bz/main/upload/

[1] https://i.imgur.com/2YCpTSz.png


The NIH already has a 'Public Access Policy' for all NIH-funded research projects (which compose the majority of all university research). If only everyone complied...

https://publicaccess.nih.gov/

It mandates that within 1 year of journal publication the authors are to make the article available for download on pubmed...

"Before you sign a publication agreement or similar copyright transfer agreement, make sure that the agreement allows the paper to be posted to PubMed Central (PMC) in accordance with the NIH Public Access Policy. Final, peer-reviewed manuscripts must be submitted to the NIH Manuscript Submission System (NIHMS) upon acceptance for publication, and be made publicly available on PMC no later than 12 months after the official date of publication."


Publishers don't care about citations nearly as much as squeezing every last penny they can from taxpayers (via publicly funded research and libraries).


There is a long-known effect that open-access articles get more citations than non-open-access articles (e.g. doi.org/10.1371/journal.pbio.0040157), so I do not see why this should be very different for Sci-Hub articles. However, I agree that the claimed effect might be exaggerated by confounding. More interestingly, proper deposition of the data accompanying an article also increases its citation rate (doi.org/10.1371/journal.pone.0230416).


There's some similar work out that analyses the impact on conference paper acceptance of having deanonymised arXiv versions of papers available before review. They look at ICLR papers for the last 2 years.

I've not read it in a lot of detail, but it looks like there's a positive correlation between releasing papers and having them accepted. I'm not sure how they've controlled for confounders (you only release papers on arXiv that you're confident in the quality of?): https://arxiv.org/pdf/2007.00177.pdf


Not so surprising. When I was still publishing (until about 10 years ago), I always made sure my papers were easy to find and download. Academic performance is basically measured through scientific references to your work, so a smart scientist would want to do SEO to ensure people can actually find their articles. Even 20 years ago I avoided going to the library to request copies of articles that somehow weren't available online.

Nothing gets rid of a potentially interested reader faster than a paywall. I find it surprising that scientists aren't getting smarter about publicizing themselves. All that effort, and you can't be bothered to blog about your findings, tweet a bit, engage with your peers online, etc.? There's this notion of spending months or years on something and then expecting people to find it, pay for it, and read it, and only then consider citing it. It doesn't work that way if you are just starting out.


How does one even use Sci-Hub?

The search functionality is "temporarily unavailable" and Google seems to have not indexed the site.

Find the abstract elsewhere and then use the DOI to find it on Sci-Hub?


>Find the abstract elsewhere and then use the DOI to find it on Sci-Hub?

That's pretty much it. Suppose you want to read a paper on memes, so you search scholar.google.com for "memes". Then let's say you find https://www.jbe-platform.com/content/journals/10.1075/etc.10... (I just chose one at random).

Going to the link gives you just the abstract. This one happens to have a full text link, but suppose it didn't. Then you'd go to your favorite sci-hub site (e.g. sci-hub.tw) and paste the URL into the "enter URL..." field, and the paper shows up.
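For what it's worth, that flow is scriptable. Here's a rough Python sketch; note that the mirror hostname and the page layout (the PDF being embedded via an iframe/embed src attribute) are assumptions that have held historically but change over time:

```python
import re
from typing import Optional

# Mirrors come and go; adjust as needed. This hostname is an assumption.
SCI_HUB_MIRROR = "https://sci-hub.tw"

def sci_hub_url(doi: str) -> str:
    """Build the Sci-Hub page URL for a given DOI."""
    return f"{SCI_HUB_MIRROR}/{doi}"

def find_pdf_link(page_html: str) -> Optional[str]:
    """Pull the embedded PDF URL out of a fetched Sci-Hub result page.

    Assumes the page embeds the PDF via an <iframe> or <embed> tag
    whose src points at a .pdf URL; returns None if no match.
    """
    m = re.search(r'(?:iframe|embed)[^>]+src\s*=\s*"([^"]+\.pdf[^"]*)"',
                  page_html)
    if not m:
        return None
    url = m.group(1)
    # Sci-Hub pages often use protocol-relative URLs.
    if url.startswith("//"):
        url = "https:" + url
    return url
```

You'd fetch `sci_hub_url(doi)` with your HTTP client of choice and feed the response body to `find_pdf_link`.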


For me, this website usually works and has relatively good search: http://gen.lib.rus.ec/scimag/ Though I often search on Scholar and then copy/paste the title into Sci-Hub. Also, http://z-lib.org has a good search function for both books and articles.


I actually quite like using the telegram bot. I just send it a URL or title or whatever and it replies with a PDF.


Oh, what's the name of the bot? I like using Sci-hub a lot but it gets pretty slow sometimes, the bot would be much more convenient.


Perhaps this one https://telegram.me/scihubot


That's the one, yep.


The few times I've needed it due to my uni not having access to a paper, using the DOI worked. There is also reddit.com/r/scholar


Yes, pretty much. I find an interesting paper, note its DOI, go to sci-hub to download it. Never tried to search on sci-hub itself.


Same here. Simply put, this is correlation, not causation. The irony is humorous.


They do have a positive correlation, but I'm not sure which way the causation runs:

More downloads lead to more citations (the paper is seen by more people),

or

more interesting papers get more downloads (people download papers that interest them).

Both directions make sense. I'm not sure which one contributes more to the correlation.
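The confounding worry is easy to demonstrate with a toy simulation: if an unobserved "interestingness" drives both downloads and citations, the two correlate strongly even with no causal arrow between them. All numbers below are made up for illustration:

```python
import random

random.seed(0)

# Each paper's downloads and citations are driven only by a latent
# "interestingness" factor plus noise -- no direct causal link.
papers = []
for _ in range(10_000):
    interest = random.random()
    downloads = max(int(interest * 100 + random.gauss(0, 5)), 0)
    citations = max(int(interest * 20 + random.gauss(0, 2)), 0)
    papers.append((downloads, citations))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

dl, ct = zip(*papers)
print(f"downloads/citations correlation: {pearson(dl, ct):.2f}")  # high
```

The correlation comes out near 1 despite downloads never influencing citations in the model, which is why controlling for "interest" (hard to measure) matters here.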


I kind of like the other conclusion: the impact factor of the journal is not associated with the number of citations. IF is an important point in this business. I've been in some job screenings in which they aggregated your papers according to the IF of the journals (like one paper in a 1st-quartile journal equals three papers in a 2nd-quartile one).

But according to this data, you could publish in a 4th-quartile journal, and if your paper is interesting, free to download, and has some figures, it will be read and cited.


Appropriately posted on arXiv


I have experienced it first-hand: referring to paid papers isn't possible unless you have a huge budget, and the abstract alone isn't enough to decipher what lies beyond!


Gee whiz, who'd'a thunk it: opening up access to papers leads to more people reading papers, which they then cite in the papers they write.

Even if you have institutional or individual access to the likes of Wiley or Elsevier, it is usually far easier to just feed the DOI to Sci-Hub and read the paper instead of jumping through all the hoops to get 'official' access. This goes doubly for those who, like me, use whitelists for cookies and block third-party content (including cookies), since it generally takes a few attempts to convince the paywall that you just logged in for the umpteenth time. Can I now please read that paper, please? Nope, thou shalt not pass!

aw shucks, I'll just get the thing off Sci-Hub again.


Researchers should not be allowed to cite a paper unless they demonstrate proof they actually did pay for the papers they are citing /s


Readers of scientific papers never had to pay. Before papers were available in digital form, you could request a paper copy from the authors. Of course the authors usually paid for some extra copies for this but when those ran out, you'd get a photocopy. Then people started requesting a copy and would get one by email. Now it's just faster to simply find that copy on the web than wait for someone to reply to your email.


It's like saying that if you hand out free newspapers near the metro, it is possible that more people will read them.

The only problem is that here they are giving away "paid newspapers" for free. So everyone who wanted a paid newspaper but didn't have the money for it got to read it, because they were able to steal it.

By this analogy, the conclusion adds that people being able to steal newspapers helps keep everyone more informed.

Now, please do the same analogy for food.


> The only problem is that here they are giving for free "paid newspapers".

Or the problem is that there are "paid newspapers" in the first place.

> Now, please do the same analogy for food.

No, because information is not a finite resource in the same way that food is: food can't be cloned at negligible cost after it is produced. If food were replicable like in Star Trek, then at that point we could make the same argument for food. (We do waste a lot of food when people can't pay for it or its transportation and distribution, though.)


Popular papers are more cited! Truly groundbreaking stuff there.


From my own experience, putting your papers for free on your discipline's archive (arXiv or whatever), or at least on your personal webpage, is a must if you want citations. There's no excuse for keeping your work behind a paywall that you don't even get a commission from!


I think the most important thing is that these cited papers are being read in full text. That is better than adding a reference based on what the abstract says.


Aka the bandwagon effect.


I thought I was good at finding information on Google. Then I took a reading class on quantum computation with a professor. It was like going from a yellow belt to a black belt after two years.

I don't think researchers care much about rules that hinder them in a publish-or-perish world.



