Sci Hub repository torrents of scientific papers (rus.ec)
502 points by jacquesm on May 20, 2018 | 255 comments

Not being in a scientific field, one of the previous times this came up I asked if any of the working scientists felt like SciHub had positively impacted their work: exposing them to more papers than they would have read, guiding them in different ways, etc. and the answer was a pretty overwhelming "ohmygodyes".

From the outside, it's very difficult to see this as anything but a public good. It certainly seems like literally the entire world is benefitting from this (including the scientists involved in the publishing) and it's being held back by a handful of publishers.

As an undergraduate in the UK whose university doesn’t have the greatest subscription collection, Sci-Hub has literally enabled me to write my dissertation as there is no way I could have afforded the individual cost of the papers I’ve needed to reference.

I can only imagine what it is like in poorer parts of the world in terms of access to subscriptions, or lack thereof.

That's very true. The costs are enormous, and they also keep the research closed off to many non-Western universities. Based on rankings such as [1], research performance is heavily skewed to the US and UK. To gain a greater global perspective, research needs to push outside of these bounds.


Be aware that those rankings are heavily biased.

e.g. they don't count 95% of German research at all.

Yeah if you go to a less than stellar university your data sources and paper sources are not that great.

I went to a poorly ranked undergrad and had most of the standard stuff.

I went to a top 10 for my post grad and we had so much stuff, market research reports that normally only consulting companies can afford, top data sources, virtually every journal subscription imaginable, top level subscriptions to online textbook resources.

Imagine a school which pays for a license so you can download PDFs of the textbooks for free.

Made a huge difference, the fact we all had access to this stuff meant the courses could teach literally from the research papers so we learned cutting edge stuff.

My exams were made up of content from research papers written between 2015-2018. Practically brand new developments.

Compared with the poor university which taught from the standard texts.

I'm a PhD student and my university has a pretty decent subscription. But while downloading a paper from a uni IP is very straightforward, from outside campus one has to jump through so many hoops that sci hub is the most comfortable option.

Not only in poorer countries, but for everyone, including me, doing research before Sci-Hub!

Let me ask you a provocative question. When you build credibility in your field, will you be reviewing papers for free for open access journals?

Given peer review is unpaid and so is authorship, I am curious as to why you think this is a provocative question?

I would imagine that reviewing papers for free is usually considered part of 'service' to the field and the imagined tradeoff here has far more to do with whether the journal is influential or not, and almost nothing to do with pay or access policies.

Paid journals had a point 20 years ago, but not anymore. Most studies are funded by the tax-payers or collaboration with the industries. The authors are not paid. The reviewers are not paid either. Finally the publisher comes along and claims royalty. It doesn't make sense.

Just a reminder that authors have to pay around $800-1000 to publish an article. Then readers have to pay to read the article.

And you have to submit your work edited and almost ready to print, it's not like they are doing heavy work there either. I remember one time I submitted an image as SVG but they wanted EPS and asked me to do the conversion. Yes, that oneliner with "convert" was enough.

The payment is in CV credentials. It is a great payment, by the way.

I'm not sure if I get your question right. Pretty much everyone in the academy will be reviewing some papers (there are reasons why it's nearly unavoidable, long story), and they're doing that for free.

Are you asking if they will be reviewing papers exclusively for open access journals and avoiding non-open access ones? Or something else?

Peer review is unpaid, no matter if the journal has open access or not

I'll answer this as a practicing academic: I do, regularly, and with the same willingness and rigor I review papers for other journals.

One thing that's never been clear to me: what benefit do you get from reviewing articles for paid journals? Phrased another way, what stands in the way of a movement where reviewers "go on strike" against anything that isn't open access?

Let's consider the top three paid journals I review for. This is certainly not all of them of course, but the volume of review work is the highest with them:

* The American Journal of Epidemiology
* Epidemiology
* Infection Control and Hospital Epidemiology

Those are the journals of the Society for Epidemiological Research, the International Society for Environmental Epidemiology, and the Society for Healthcare Epidemiology of America. I am a member of all of those societies. I would, to be frank, rather like to be on the editorial boards of those journals - and certainly wish to be on good terms with their editorial staff. And revenue from those journals does help support the mission of the society.

That's the benefit. "Going on Strike" would harm all that, and in my field, the vast majority of papers will be open access within a year anyway (due to NIH/CDC funding).

There may be other answers to this problem. When someone posts a blog article, it usually gets submitted to an aggregator like HN and it then gets upvoted/downvoted/discussed, all for free. You might say that you need to be an expert in the field to review a paper and, perhaps that is true - or perhaps new publishing constraints will lead to a trend of more readable papers with more background and thorough explanation so as to be understandable to a wider audience. I for one am not afraid to see a little classic SV disruption hitting Academia.

Sorry, but you have to do the work. There's no real point to having every paper revisit the key points of the field that everyone working already knows. The introduction section, when well written, provides starting points for research if you're not completely up to speed.

While cross-domain sharing of insights and knowledge is commendable and important, I'm not sure why it would make sense for review to be outsourced to non-experts in the field, as you describe (even if the paper was "readable" the domain experts would presumably be the most able to evaluate a paper). Maybe one or two reviewers on a panel of many, but otherwise it makes very little sense. All of these suggestions essentially serve to slow down the research process and have benefits for a very small portion of the potential audience.

I think what is more valuable to opening academic research up is greater open education material, besides the traditional models of bachelor->masters->phd->postgrad, and more aggressively written textbooks that work on the cutting edge of the field (with appropriate disclaimers).

Oh, and big disclaimer: tons of papers are badly written. That doesn't affect any of these points.

Agreed. Academia is gradually doing a better job of explaining research to the public, but this is (rightfully) separate from explaining it to other experts in the field. A good example is the LIGO Scientific Collaboration, who among other actions write publicly accessible summaries [1] and infographics [2] of most of their big detections.

Some groups have a (never large enough) budget to fund flyers, posters, videos, websites, etc. for public outreach and visits to public events. It's also usually possible to ask for this kind of fund when applying for research grants - like adding an extra 3% on top of what you ask for in order to publicise the work.

[1] https://www.ligo.org/science/Publication-GW150914/index.php

[2] http://www.astro.gla.ac.uk/~daniel/infographics.html

When I review an article, I do it for free. If it's short, I do it fast. If it's long and complicated, I notify whoever is requesting it that I need that-many-weeks for that. If they agree, great; if not, they will have no problems finding another reviewer.

They will be reviewing them for free or, at best, for peanuts.

(Usually, if peanuts are involved, they're literally peanuts)

This was crowdsourcing before it had a name.

What makes you think they plan to become an academic?

I was a researcher at a reasonably big UK University until about 6 months ago, and I used SciHub fairly regularly.

Sometimes I used it because we didn't have the right subscription for that publisher to get me the journal I needed, but mainly I used SciHub to access papers I already had the "right" to access. The layers of login screens, buggy single-sign-on systems, having to be in the right IP address range, not being able to download a PDF, meant that I almost always attempted to get the paper through SciHub first, and only went through the library systems when I couldn't find what I needed on SciHub.

I have had access to pretty much every journal in existence by working at big R1s for some time now. Still I find myself using SciHub on a daily basis when I am not on campus. Much easier to append '.sci-hub.hk' than authenticate (2-factor now), fire up the proxy, etc.

Piracy has one USP that the competition lacks that is very hard for them to replicate: convenience.

That’s the same excuse that’s trotted out in every field with a piracy problem—up until someone finds a way to do it and it turns out it wasn’t an insurmountable problem after all. That’s how we got iTunes, Spotify, Steam, Netflix, and so on.

It's not really an excuse - it truly is convenient in such a way that you only have to go to one place to get the thing you want.

If I want Man in the High Castle, I have to pay for Amazon. But if I want Ozark, I need to pay for Netflix. If I want to watch Metropolis, or The Raven, or The Maltese Falcon, they're old and not on any streaming service, even if the rights are owned by a studio. Where can I find them? Yup, BitTorrent.

At least piracy more-or-less forced most music to be centralised onto a single service, i.e. in many cases you can get the same songs on Spotify and Apple Music. It's a shame piracy didn't force Hollywood studios to sort theirs out, and allow you to just go to one or two sites. Just look at what Disney are trying to do.

You are of course right: "it wasn’t an insurmountable problem after all" - it's just that studios and publishers squabble. But I'm not holding my breath on Elsevier, McGraw-Hill, Addison-Wesley, OUP, SUP, etc. to come up with a commercial SciHub.

The selection of movies on Netflix is beyond dismal. If it's not from the last 15 years, your chances of finding it are virtually nil. If it is from the last 15 years, your chances are merely terrible. The demise of video stores has left me with no idea how I'd (legally) watch an old movie short of purchasing the DVD. Strange that widespread access to classic movies only lasted about 20 years -- from the mid '80s to the mid '00s.

That's a false dichotomy: a Blockbuster being nearby didn't mean it automatically had every classic movie in stock. I wouldn't be surprised if Netflix's total active catalog is much larger than what was in a Blockbuster in 1999.

There are a number of alternative streaming services with a better selection. One that comes to mind is Filmstruck, which has the entire Criterion Collection, but there was another successful one I can't remember the name of.

If you are into classic movies I'd imagine that you would still subscribe to the Netflix DVD mail service. Millions of others do. I'd say that is widespread access to old movies, just not widespread interest.

If you must stream, try Amazon Prime. (If you are up for physical media, last I checked, Netflix was king and has an incredibly broad catalogue.)

You have to pay a rental fee of like three or four bucks per movie, just like Blockbuster, but they have a much deeper pool of old movies than Netflix Streaming, I think.

The working definition of "excuse" is "a reason I don't agree with".

I think you're agreeing.

That many people stop pirating as soon as something more convenient comes along does not contradict the idea that people used to pirate because it was convenient.

Why is it very hard? Take a look at the most successful platform for games, Steam, to see both things aren't mutually exclusive.

There is a huge difference between Steam and Elsevier and I'm not sure why you feel the one is an example of how the other could evolve. The one sells games for entertainment purposes to the masses, the other sells subscriptions to scientific institutions to give them access to research papers. It's b2c vs b2b and a competitive space versus a captive audience.

I think the parent comment was referring to video game piracy before Steam.

It was a mess of custom launchers, DRM, losing the CD/DVD or key so people would turn to piracy to access stuff they already owned.

Fast-forward a few years and most people are fine with Steam as long as it's the unified launcher for doing things.

Pre-Steam also coincided with pre-Internet / online gaming. Having half your software assets lurking on company servers now makes protecting your IP considerably easier.

Ok, that's a good point. But video games are rarely created with public funds and in general are not crowdsourced to then be placed behind a paywall. Even Steam has plenty of competition.

If there were a disruption of the world of scientific publishing that would have a parallel in entertainment I would have chosen Spotify over Steam.

I'm only trying to refute the claim that "piracy has one USP that the competition lacks that is very hard for them to replicate: convenience". That's disproven by the fact that many companies e.g. Steam have had undeniable success on the back of that convenience.

I think that claim was made in the context of scientific papers, where there is no counterpart to Steam or Netflix.

Is BitTorrent more convenient than Netflix?

Torrents are very inconvenient compared to many streaming services.

Therefore, you would expect most people to opt for Netflix. (And by and large I'd expect Netflix to be more popular en masse than torrenting at this point in places where it's available.)

But what if it's not available where you live (That's inconvenient), doesn't have the content you want (That's inconvenient), or you want a file you can play offline that isn't allowed to be downloaded on the Netflix app?

Convenience wins almost every time, but not having the option to do something (be it for lack of $$$ or availability) isn't convenient so that's when more people turn to piracy.

My statement (as can be seen from its GP post) uses convenience to mean ease of use, not in the titles available sense that you (and the two following posts) do.

Far more convenient when you consider that Netflix's catalog of reasonably mainstream as-seen-in-theaters movies has been shrinking for years.

As the end user I don't care "why" their catalog has been shrinking, I only care that something I watched last year and want to watch again is no longer available on that service.

And no, I'm not going to sign up for a dozen different services just on the off chance that one of them will have what I want to see at any given point.

In my experience in the last five years, Netflix's batting average for actually hosting movies I want to see is around 20%. In terms of convenience, if they don't have the movie at all it's kind of like dividing by zero.

No but it is more convenient than lesser known streaming services like popcorn or certain kodi plugins that don't work well.

Depends on what you want to see. If Netflix has it in your country in your language of choice: Netflix wins, else BitTorrent wins.

No but there are piracy streaming apps that are more convenient than netflix.

"Click the picture of Einstein"

Ugh no thanks. Scihub it is.

I've read this many times. It's almost funny

Same exact reasons for me at a big US university.

Definitely a great resource, but (thankfully) it's almost completely redundant in a lot of fields. I'm willing to bet almost no one in my department has ever actually used it, or even heard of it, thanks to preprint servers like arXiv where almost everyone publishes their work for free on their own (usually either before or after publishing in a real journal, but some subfields have taken to exclusively publishing there).

I think fields outside of physics and math have caught on recently; there's now a bioRxiv, PsyArXiv, and ChemRxiv. That last one, kind of surprisingly, is actually co-owned and run by ACS and RSC, two of the largest chemistry publishing companies (who, iirc, had some weird terms that initially made some people wary about using preprints at all when publishing in their journals). Hopefully more publishers can follow suit and support open access across more fields.

This is my experience in political science as well.

Poli sci is starting to get some really good repositories of data (replication is something we actually care about), but for a long time common practice has been to post your paper on your own site. Nobody is going to stop you.



There are a number of articles in major journals which are closed access if you go through Wiley, but which a quick search will bring up. The only reason that doesn't happen is that the author doesn't want to do it, which is detrimental to them because their research isn't promulgated as easily.

This stuff tends to be working itself out outside of SciHub, it's just the most visible route (probably because the approach is so newsworthy, being illegal and taking on big interests).

I wish there was an arXiv for mechanical engineering; sadly, most of the research is funded by industrial companies seeking competitive advantage, so papers are less freely shared. Most of my master's thesis research sources came from paid-by-industry research projects.

Well, if you're feeling ambitious, you may be able to get one going with some elbow grease yourself. A lot of the marketing work has already been done now, with the success of the other ones.

And if that's not your cup of tea, that's fine too; I don't mean this as a "put up or shut up". Just putting up an idea you may not have had.

I don't know what field you are talking about, but in physics only about 95% of papers I want to read are published on arXiv, including older ones. I assume most researchers occasionally read something from scihub or subscription services.

Downside to this is if there is a major revision during publication peer review that doesn't get reflected in arXiv. Economics has had SSRN for years, and while it is not as widely adopted, I would presume the issues between the two are similar.

It sounds like the "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries" is made real by less copyright, not more.

And yet, Congress is now trying to extend copyright to 144 years.


No, it's not, that Act doesn't extend copyright terms. Those works are already protected until 2067 (144 years, for some of them). The Act changes the type of protection they receive during that term, specifically meaning that streaming services will always have to pay royalties for using those songs, rather than that being decided depending on state law.

It's pure rentism. Whatever little service they provided has completely evaporated in the age of the Internet. Now they're purely feudal lords, extracting their rent from a status quo.

Especially say in Eastern Europe, resources like SciHub are critical for students and researchers.

I don't know how I would have been able to write my first paper without SciHub. My lab didn't have subscriptions, they just decided to stop paying.

From my outsider perspective I mostly agree, I've read a few papers that weren't available for free. But for the opposing perspective, consider that they index curated content and don't curate anything themselves. This is a frequently debated argument. For comparison I'm considering YouTube or torrent sites that are swamped with low quality content or outright malware. In effect, the inception of the internet, as far as I know, was motivated precisely by the need to exchange scientific material. But we are still far from a global library, on-line. Coordination does take a significant amount of work. So, if anything, it needs more investment. But whether that needs to be direct financial investment or just a lot of voluntary work from the people for whom the system is kind of working alright at the moment, is not even in the question. Because journals and universities acting as gate keepers setting barriers to entry is seen as elitist and pejorative by the vast majority* of the excluded. That involves a lot of indirect criticism of the financial market system. The whole problem involves marketing. Just look at Wikipedia, Stack Exchange or even the stock exchanges to see what kind of imbalance, for lack of a better word, huge projects incur, virtually invariably.

* the vocal majority, at any rate

I have access to most papers through my university, but every now and then a pubmed or google scholar search leads me to a paper I cannot access easily. For those, instead of waiting for interlibrary loan I can easily just grab the article through SciHub.

Is it right? I don't know. But it makes my life so much easier when looking for something specific.

Paywalls are a totally evil thing. They hinder human progress while providing benefits only to a small number of parasites.

The total collection is 54.54 TiB, with 690 torrents as of writing. For preservation purposes, I put magnet links to all the torrents here: https://pastebin.com/zTAqS7wz

So, even if the torrents go away, the magnet links should still be usable.

(Edited: fixed link)

What I find more amazing than the fact that some people have been so neighbourly as to share such a volume of information, is that this is still just a tiny fraction of all human knowledge. Scientific journals are certainly an important collection, but still minuscule in comparison to all the other books out there. I have collected ~20GB of automotive service manuals and related data, and that's a tiny amount too. To speak nothing of the many terabytes of entertainment others have...

I've seen some Plex servers hosting petabytes of content, it's crazy.

Downloading all the torrent files:

  mkdir -p torrents
  # {000..689} relies on zero-padded brace expansion (bash 4+); -m 30 caps each download at 30 seconds
  for i in {000..689}; do f="sm_${i}00000-${i}99999.torrent"; curl -sS -m 30 "http://gen.lib.rus.ec/scimag/repository_torrent/$f" -o "torrents/$f"; done
...which themselves take up about 75 MB, so here is the full collection of torrent files for your convenience:

https://lab.brainonfire.net/drop/delete-after/20180630/torre... (please use the next link if possible, though!)

And a torrent that is basically equivalent to that file, thanks to the below commenter:


Each torrent appears to index 100 zip files (about 800 MB each), each of which presumably contains 1000... journal articles? I don't know. The seed isn't blazing fast, so it will be a bit before I can pull out a random zip file from this randomly selected torrent and inspect it.

I base64'd a torrent file containing the HTML index and all 690 current torrent files and stuck it here: https://pastebin.com/p1nN6veK. Would appreciate if you could upload that in a binary format somewhere.

Magnet link for simplicity:

I am currently seeding this small file and will do so for as long as I am able.

(Edit: updated magnet link and torrent to include trackers)

Thanks! Uploaded as binary to https://lab.brainonfire.net/drop/delete-after/20180630/scihu... and currently seeding.

I can't manage to load this magnet link in transmission:

  Error adding "magnet:?xt=urn%3Abtih%3A97fc8218775e3a1e5b90607435c0581f839e0f1d&dn=repository_torrent&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.open-internet.nl%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.skyts.net%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.piratepublic.com%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2F9.rarbg.to%3A2710%2Fannounce&tr=udp%3A%2F%2Fpublic.popcorn-tracker.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker1.wasabii.com.tw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.zer0day.to%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.xku.tv%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.vanitycore.co%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.mg64.net%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.tiny-vps.com%3A6969%2Fannounce&tr=udp%3A%2F%2Fp4p.arenabg.com%3A1337%2Fannounce&tr=udp%3A%2F%2Foscar.reyesleon.xyz%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Fopen.facedatabg.net%3A6969%2Fannounce&tr=udp%3A%2F%2Fmgtracker.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fipv4.tracker.harry.lu%3A80%2Fannounce&tr=udp%3A%2F%2Finferno.demonoid.pw%3A3418%2Fannounce": invalid or corrupt torrent file

Sorry, just tried this and apparently Transmission isn't a fan of the encoded colons (despite that being very valid URL syntax).

Try this one instead:


I'll try to remember to do it when I get home, but if you beat me to it, can you please check and see if that's a known issue with Transmission? Because that seems easy to fix, and in my mind it's bad news to need "vendor specific" magnet links.

It sounds like their URL handling might just be terribly broken; see for example https://github.com/transmission/transmission/issues/608.

It doesn't look like they have an open issue for that. If you file an issue, you may wish to crosslink it with https://github.com/transmission/transmission/issues/249 which would probably be solved by the same fix for URL query handling.
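
For what it's worth, the two magnet forms differ only in percent-encoding. Here is a minimal Python sketch (the function name `normalize_magnet` is made up for illustration) that decodes the query string and re-emits it with literal colons in `xt`, while keeping the tracker URLs encoded. Whether Transmission accepts the result is an assumption based on the comment above; this code does not verify it.

```python
from urllib.parse import urlsplit, parse_qsl, quote


def normalize_magnet(magnet: str) -> str:
    """Rebuild a magnet URI so the xt value uses literal colons (hypothetical helper)."""
    parts = urlsplit(magnet)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    # parse_qsl percent-decodes values, so urn%3Abtih%3A... becomes urn:btih:...
    params = parse_qsl(parts.query, keep_blank_values=True)
    rebuilt = []
    for key, value in params:
        # Leave ':' unescaped only in xt; everything else stays fully encoded.
        safe = ":" if key == "xt" else ""
        rebuilt.append(f"{key}={quote(value, safe=safe)}")
    return "magnet:?" + "&".join(rebuilt)
```

Feeding it the link from the error message above should give back the same link with `xt=urn:btih:97fc...` at the front and the `tr=` parameters unchanged.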

[wget] wget -r -l1 -H -t1 -nd -N -np -A.torrent -erobots=off http://gen.lib.rus.ec/scimag/repository_torrent/

Is there a way to see which are in the direst need of seeders?

I am currently participating in some of them at random, as I have nowhere near 50 TiB of free space, but would love to do it more efficiently.

Possible in theory, just ping the trackers with each of the torrents and see which ones have the fewest peers.
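
A rough sketch of the bookkeeping side of that, in Python. Trackers identify a torrent by its infohash, which is the SHA-1 of the bencoded `info` dictionary inside the .torrent file. The code below computes that hash with a small hand-written bencode decoder and then sorts torrents by seeder count. The tracker query itself is left out, and both the `seeders_by_hash` input and the `neediest` helper are hypothetical names for illustration.

```python
import hashlib


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)         # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def info_hash(torrent_bytes: bytes) -> str:
    """Hex SHA-1 of the raw bencoded 'info' dictionary of a .torrent file."""
    if torrent_bytes[:1] != b"d":
        raise ValueError("a .torrent file must be a bencoded dictionary")
    pos = 1
    while torrent_bytes[pos:pos + 1] != b"e":
        key, pos = bdecode(torrent_bytes, pos)
        start = pos
        _, pos = bdecode(torrent_bytes, pos)
        if key == b"info":
            # Hash the original bytes so no re-encoding is needed.
            return hashlib.sha1(torrent_bytes[start:pos]).hexdigest()
    raise ValueError("no info dictionary found")


def neediest(seeders_by_hash: dict, count: int = 5) -> list:
    """Return the infohashes with the fewest seeders (input is hypothetical)."""
    return sorted(seeders_by_hash, key=seeders_by_hash.get)[:count]
```

With the 690 torrent files on disk, you would compute `info_hash` for each, ask a tracker for seeder counts by whatever means your client or tracker supports, and seed whatever `neediest` returns first.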

Even if an entire satellite transponder (36 MHz) was used to broadcast these files (assuming 100Mbps broadcast/downlink), it would still take two months to download the whole thing.

Well, that depends on the code rate and modulation.

DVB-S2X 32APSK 32/45 could do 150Mbps with the same bandwidth (as in, the same amount of radio space, not internet bandwidth).
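
The two-month figure is easy to sanity-check. A small Python sketch, assuming the 54.54 TiB total quoted elsewhere in this thread and treating the stated link rates as pure payload throughput (no protocol overhead, no retransmissions):

```python
TIB = 2**40  # bytes in one tebibyte


def broadcast_days(size_tib: float, rate_mbps: float) -> float:
    """Days needed to push size_tib through a link running at rate_mbps."""
    bits = size_tib * TIB * 8
    seconds = bits / (rate_mbps * 1_000_000)
    return seconds / 86_400


for rate in (100, 150):
    print(f"{rate} Mbps: {broadcast_days(54.54, rate):.1f} days")
```

That works out to roughly 55.5 days at 100 Mbps and about 37 days at 150 Mbps, so "two months" holds for the first case and the faster modulation cuts it to a bit over five weeks.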

???? This is obviously equal to 54,000 datasets of 1 gigabyte each, which I would say is a pretty large dataset to publish along with a paper, so it's hard for me to even imagine 54 thousand such 1-gigabyte datasets. A one gigabyte PDF is astronomical in size; PDFs are usually far, far shorter.

So why is this so large? Aren't they just PDFs, basically? And usually just a few pages? Excuse my ignorance. I'm very surprised at the size that you quote (54 TiB).

There’s dozens of millions of articles total, not thousands.

wow. I didn't know there were a grand total of dozens of millions of journal articles published total worldwide. My guess would have been hundreds of thousands at most. These are journal articles, not high school book reports, right? I'm astounded at the number you quote.

For example there are about 10,000 papers submitted per month to all of arXiv[1] - which is a huge number. That totals 132k per year or so. For there to be "dozens of millions" of journal articles, that would mean arXiv has just 0.83% of them. Given the very low bar to publishing on arXiv I would think if you compared the number of arXiv articles as a percentage of all published journal articles, it would be a lot more than 0.83%! So the "dozens of millions" of journal articles is extremely astounding to me.


arXiv only handles a small corner of scientific publications, so 0.83% doesn't sound out of whack. Most folks in biology or chemistry aren't on there, for example.

A Nature study from 2014 (http://blogs.nature.com/news/2014/05/global-scientific-outpu...) pegs the number of papers published between 1980 and 2012, with at least one citation, at 38 million. So, 69 million publications including uncited ones and ones from outside this time range actually sounds like an underestimate - that is, the 69 million paper archive is probably missing a fair number of articles.
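
The size question upthread also falls out of simple division. A back-of-the-envelope Python sketch, assuming every one of the 690 torrents covers a full range of 100,000 article IDs (inferred from the `sm_X00000-X99999` file names, not confirmed) and using the 54.54 TiB total quoted earlier:

```python
TIB = 2**40  # bytes in one tebibyte


def article_count(torrents: int, ids_per_torrent: int = 100_000) -> int:
    """Upper bound on articles if every ID in every torrent's range is filled."""
    return torrents * ids_per_torrent


def average_article_bytes(total_tib: float, articles: int) -> float:
    """Mean size per article in bytes."""
    return total_tib * TIB / articles


count = article_count(690)
avg = average_article_bytes(54.54, count)
print(f"{count:,} articles, about {avg / 2**20:.2f} MiB each")
```

So the archive is not made of gigabyte-sized PDFs. It is about 69 million articles averaging somewhat under 1 MiB apiece.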

There are a lot of scientists and they work hard (and get paid peanuts, which seems unfair).

Unfair and utterly stupid of humanity.

54 TiB for publications only? Do you know the total for the books too (which is more libgen than scihub)? I remember a 200PiB estimate but I'd love to know more.

Do you recommend a seedbox for this? I'm currently with Ultraseedbox which is only 3TB. I am thinking of getting a new service specifically to download then provide another source/host of these files.

This is a magnificent cultural artifact, a modern day library of Alexandria. That it had to be 'stolen' is disappointing.

Wonder how it must feel for Elsevier to have their entire business up on a torrent. Zero sympathy here.

> This is a magnificent cultural artifact, a modern day library of Alexandria. That it had to be 'stolen' is disappointing.

Many of the books in the library of Alexandria were really stolen (not just infringed copyright):

> https://en.wikipedia.org/w/index.php?title=Library_of_Alexan...

"The Library at Alexandria was in charge of collecting all the world's knowledge, and most of the staff was occupied with the task of translating works onto papyrus paper. It did so through an aggressive and well-funded royal mandate involving trips to the book fairs of Rhodes and Athens. According to Galen, any books found on ships that came into port were taken to the library, and were listed as "books of the ships". Official scribes then copied these writings; the originals were kept in the library, and the copies delivered to the owners."

To be fair, their business isn't really those articles, it's extracting rent on making use of their brand names...

I guess we don't know how Alexandria was amassed, not exactly.

This is awesome. If only we could share this huge dump of PDF files as a more structured format, perhaps using SQLite [1, 2], we could search through the torrents without having to wait for all of them to download beforehand.

Although I guess the fact of having to "download them all beforehand" forces the data to be spread across various computers, hence increases availability of the data.

One idea I had regarding this is perhaps structuring the contents of torrents as "append-only binary trees". So as new dumps are released every month, one can simply start downloading the torrent and has "search capabilities" for new data as well.

1. https://github.com/lmatteis/torrent-net

2. https://www.youtube.com/watch?v=EKttt8PYu5M&feature=youtu.be
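
To make the SQLite idea a bit more concrete, here is a minimal Python sketch of what such an index could look like: a small table that maps a DOI to the torrent and zip file holding the PDF, so a client would only need to fetch the relevant piece. The schema, the zip file name, and the sample DOI are all hypothetical; nothing here reflects how the actual dump is organized beyond the torrent naming pattern seen in this thread.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory index from (doi, title, torrent, zip_name) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "doi TEXT PRIMARY KEY, title TEXT, torrent TEXT, zip_name TEXT)"
    )
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def locate(conn, doi):
    """Return (torrent, zip_name) for a DOI, or None if it is not indexed."""
    return conn.execute(
        "SELECT torrent, zip_name FROM articles WHERE doi = ?", (doi,)
    ).fetchone()


# Made-up sample row for illustration only.
index = build_index([
    ("10.1000/example.1", "An example paper",
     "sm_00000000-00099999.torrent", "example_000.zip"),
])
print(locate(index, "10.1000/example.1"))
```

The index itself would be tiny compared with the PDFs, which is what makes shipping it as its own torrent (or an append-only series of them) plausible.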

Maybe someone could compute and host an index of this stuff?

Yes and those are effectively "torrent sites" which carry with them all the baggage of ads and being easily shuttable.

Not if the index runs on Tor, such as ZeroNet.

Google Scholar is something like that, though there is no easy API that I'm aware of that would allow you to build applications on top of the index.

So now I'm curious--

What happens if you take the intersection of Wikipedia references with Sci-Hub content? Is it substantially less than the total 54 TiB of content on Sci-Hub?

Also, has anyone made a browser extension that hyperlinks Wikipedia references with articles available over Sci-Hub?
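For what it's worth, the intersection question is easy to sketch if you assume both sides can be reduced to DOIs. The snippet below is only a rough illustration: the regex is a common heuristic for DOIs rather than a full parser, and `dois_in_wikitext`, `overlap`, and the idea of having a plain list of Sci-Hub DOIs on hand are all assumptions of mine, not anything Sci-Hub or Wikipedia provides.

```python
import re

# Heuristic DOI matcher: "10.", a 4-9 digit registrant code, "/", then a suffix
# that stops at whitespace or common wikitext delimiters. Not a full DOI parser.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s|}\]<>"]+')


def dois_in_wikitext(wikitext: str) -> set[str]:
    """Collect lower-cased DOIs that appear anywhere in a page's wikitext."""
    return {m.group(0).rstrip(".,;").lower() for m in DOI_PATTERN.finditer(wikitext)}


def overlap(wiki_dois: set[str], scihub_dois: set[str]) -> set[str]:
    """DOIs cited on Wikipedia that also appear in the (hypothetical) Sci-Hub list."""
    return wiki_dois & {d.lower() for d in scihub_dois}


if __name__ == "__main__":
    sample = "{{cite journal |doi=10.1000/ABC.123 |title=X}} {{cite journal|doi = 10.5555/xyz}}"
    print(overlap(dois_in_wikitext(sample), {"10.1000/abc.123", "10.9999/other"}))
```

Run over a full Wikipedia dump, the size of the overlap set would answer the question directly; the hard part is obtaining a DOI list for the Sci-Hub side.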

>browser extension that hyperlinks Wikipedia references with articles available over Sci-Hub?

this works:


Probably. In general, most published papers are new and not too interesting, so it'd be surprising and worrying if there were Wikipedia entries for a majority of the papers.

Tonight Aaron Swartz is finally at peace.

This torrent page has been around for a long time.

What I don't get is... why don't governments simply declare that all research at publicly funded universities must be made available to the public? It seems so trivial. You pay the researchers, you get the research results, you make them available to all citizens (or the world).

Companies do this, they keep the research results of their employees.

A major reason, unfortunately, is that they don't want to interfere with the distribution of scientific articles. For example, if there's a very well-read journal in a specific discipline whose contents are only visible with a subscription, and one country prevents its researchers from publishing in that journal, then research by that country's researchers will be less read. And the reason they don't want to do that is that they want, by all means, to avoid being in a position to suppress the reach of research whose conclusions they might not like, to prevent situations like the one in which the Catholic Church was able to do so.

(That is, if they are actually actively aware of and see it as a problem. Lots of governments/funders also don't have an active Open Access policy, although this is starting to change.)

Eh, I think you're stretching in your interpretation of the causes of our current science publishing climate. As a current researcher, and former scientific publisher, I think most authors are interested in hitting certain "target" journals; it's a discussion that comes up early on in the research process. They want the right topic audience to read their work, and they want to publish in the highest impact factor journal they can. Some university departments require that researchers publish in multiple high impact, or many more low impact, journals before they get tenure. So, I believe countries are mostly unaware of this problem, except for the UK and now others, as the UK has been demanding that its publicly funded research be open access for a few years.

To me, the problem is that scientists are highly motivated to achieve successes in publishing and prestige and impact, and they are less likely to stand in front of governments and demand open-access and freely circulated articles. This is not in their best interest, I would argue, however they are EXTREMELY busy people; researchers work 60-70 hours a week with a multitude of different duties. It's not at all surprising they don't have the time to lobby Congress vociferously on behalf of the commons.

> I believe countries are mostly unaware of this problem

Yes, that's what I was referring to in my ellipsis.

> To me, the problem is that scientists are highly motivated to achieve successes in publishing and prestige and impact, and they are less likely to stand in front of governments and demand open-access and freely circulated articles. This is not in their best interest

I agree that that is a crucial link in the vicious cycle that upholds the current system - I was merely giving a view in the motivations of why even the funders who are aware of the problem are not mandating Open Access without extra costs.

If you google "open access eu", you'll find a lot of articles reporting in 2016 on an EU initiative to have all its funded research open access by 2020. I wonder how that has come along since...

Scihub would strongly benefit from moving to ipfs, shocked that hasn't happened yet.


"This money is effectively a surcharge, or tax, on scientific research imposed not by a government but by a for-profit industry. Imagine for a moment how much research could be carried out using these resources if they were channeled back into our academic enterprise."

How are the torrents grouped or organized? What are some good ways to locally recreate the SciHub front-end search functionality?

Wow! Any idea how much data is there all together? I'm on a work network right now and can't fiddle with it.

About 50T or so from my quick calculations, but:


Has it at almost 60 T and it increases all the time.

I too would love the answer to that question.

The JSTOR archive that Aaron Swartz wanted released was 35GB. I would imagine SciHub's archive is probably much bigger, maybe a TB+?

Also, I wonder if supplementary figures/videos/program code are included in these archives. Probably not, certainly we'd be looking at many TB if such was the case.

It's 54 TiB, and that's just PDFs - no supplementary material. 69 million publications.

At least 26TB.

How much of Sci Hub is journals and other actual research papers, and how much is things like articles from general circulation magazines about science?

I know there is at least some of the latter, because one time when I came across a Sci Hub link before it stopped working, it was to an article from Scientific American.

Wow! Did you wget all the torrents and run rtorrent or something on them?

That's 9 months behind the times.

It contains all of the information about the Sci Hub torrents without me linking directly to them.

Except the answer to the question that was asked.

I appreciate you posting the most up to date corpus size.

As a former scientist who was doing research during the 90s, I find it very unfortunate that a publishing system has arisen in which a company can monopolise publication and thereby prevent access.

No scientist wants their work to be unseen and hidden behind a paywall, but that is what has happened.

Worse still, in some fields amateurs can make a reasonable contribution to the field (my experience is observation astronomy), and the current system hinders that.

So many comments along the lines of I can't find the paper I want without going to SciHub tells you just how broken the current publishing system is.

Researcher at a small company here (e.g. SBIR work). I can't even tell you how many hours I've wasted looking for access to papers. Can't exactly use something like Sci Hub at work. But once I do find articles, usually with the help of a local university, I am able to do a lot more work. Combine the normal problem of having to run through many papers with the inability to easily see beyond an abstract, and the hours add up.

Open access greatly helps the small researchers. The big guys can already buy access.

You're probably aware of this already, but to be sure and to help potential other readers, Unpaywall can help you legally find gratis versions of articles: https://unpaywall.org/

I haven't actually seen this before. Thanks! Looks like it just does what I do manually. Great idea.

Also can you explain more what you do? With Flockademic.

Yeah, it's a great project :)

Flockademic attempts to help researchers promote their work without relying on publishing it in "prestigious" (and often paywalled) journals. I'm hoping to have quality research get its authors known as sources of credible research, rather than its journal.

I'm also currently in the process of applying for a grant for some exciting work that aims to solve the same problem in collaboration with preprint servers, but whether that materialises depends on the success of those applications. If you're interested and don't mind the occasional Flockademic spam, you can sign up for updates here: https://tinyletter.com/Flockademic (Or to the RSS feed here: https://medium.com/feed/flockademic)

Is this legal?

Of course not. Copyright law is extraordinarily restrictive, and you, me, and everybody you know is guilty of violation after violation, easily enough to bankrupt hundreds of millions of people if strict enforcement were applied.

SciHub is all normal copyrighted published work; The publishers claim the same commercial status as an album or novel. The fact that there are currently 69 million works bundled means that willful copyright infringement of them has statutory damages in excess of 10 trillion dollars, perhaps multiplied by the torrent seed ratio if the judge is feeling generous.

Plainly, our present copyright system is ridiculous, and ridiculously disproportionate. It's also (separately) morally outrageous to restrict scientific inquiry to institutional subscriptions, for work that was submitted for review for free. This is commonly acknowledged in academia however, where every other person is willing to help you get access to that paper or this preprint to help out.

The law of the land and the feelings of its population have an enormous disconnect here. The only things preventing the two from colliding head-on, and something reasonable coming out of that contact, are the fact that copyright infringement is litigated less than one time in a billion, and the vaguely defined, legally vulnerable principle of fair use.
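For anyone wondering where "in excess of 10 trillion dollars" comes from, here is a rough sketch of the arithmetic. It assumes the US ceiling of $150,000 per work for willful infringement (17 U.S.C. 504(c)(2)) applied to every one of the 69 million works, which is a worst-case illustration and not a prediction of what any court would award.

```python
# Worst-case statutory damages, assuming the US per-work ceiling applies to every work.
WORKS = 69_000_000                      # publications in the collection (from the thread)
MAX_STATUTORY_PER_WORK_USD = 150_000    # 17 U.S.C. 504(c)(2) ceiling for willful infringement

total_usd = WORKS * MAX_STATUTORY_PER_WORK_USD
print(f"${total_usd:,} (about {total_usd / 1e12:.2f} trillion USD)")
```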

You wouldn't be fined. You'd be arrested and charged criminally, and likely spend the rest of your life in jail. The public does not understand and would happily hang you for being one of those evil hackers. The person who tells them you belong there would be a $1000/hour professional witness.

Infinite-term copyright builds dynasties. They do not take kindly to competition. Stay safe.

In most countries you would not go to prison at all. Even in the US, you wouldn't get more than 10 years max.

Ironic, isn’t it - taxpayers money first goes towards funding a major fraction of the research, and then towards “hanging the evil hackers” to ensure the taxpaying public (the evil hackers included) never accesses what they paid for.

Even 10 years seems to me completely unimaginable for sharing some PDFs

How many of us would take the risk of associating with the trillion dollar pirate when the feds make their case? I really hope it’s enough to be a help. Would the ACLU or EFF take the case?

It’s going to be ugly. We’re talking about billions in profits being threatened here with the law on their side. The only defense I’ve heard is with regard to research funded by the US federal government and that’s a long shot argument that applies to a minority of these articles.

> Copyright law is extraordinarily restrictive, and you, me, and everybody you know is guilty of violation after violation...

That's a pretty extreme claim. Can you give an example of how the average non-bittorrent-using person regularly violates copyright law?

I bet that almost everyone has already copied a friend's drive full of (pirated, old, crappy divx) movies. And doesn't check the copyright for music/pictures they include in material they produce and share (blog, social media, report, art). Academics (even ones not caring about copyright) usually put a lot of their papers on their website even if they handed exclusive rights to some publisher. A lot of amateurs using digital media software use pirate versions they got from some friend on a thumbdrive (photoshop, fl studio, final cut etc). A lot of copyright violations are actually done by people not particularly caring about information sharing just by visiting shady websites (mega and co) or irl friend-of-a-friend propagation. Showing a personal dvd or book scans in a classroom is also a violation. Depending on the terms of service, sharing media subscriptions (newspaper, streaming, tv network) between households might be prohibited. Copying cds/dvds from your local library or friends. Reading manga scans. Playing covers at some gig without paying the fees... There's quite a lot of them really! Imho that's showing just how stupid copyright (and patenting) is.

Playing a song for a crowd without the rights is illegal, which is why no major DJ software has a Spotify integration.

Sometimes what’s legal is incredibly immoral, and what’s moral isn’t legal. “I didn’t break any laws” can be a very thin shield to hide behind at times. By the same token, breaking some laws can be the right thing to do, although that won’t necessarily shield you from legal consequences. For concrete examples of legal yet immoral, see Jim Crow, segregation, slavery, and present day prison work. For illegal yet moral, selling pot to a cancer patient, spreading the sum of human knowledge far and wide, etc.

Legality is not always the only question. Is it a net benefit to humankind? Is it a direct harm to anyone? Is it motivated by greed, or by malice, or by fear, or by altruism?

What is interesting in the responses (thus far) is that no author of a paper has shown up saying 'you're stealing my work', nor is any of the money gathered by most of these organizations returned to research. Myself, I've always envisioned Elsevier as the Dutch dude in Austin Powers. One possible cure, which I heard from a librarian sobbing over the bill: if there were companies that could hold copyright and companies that could distribute, but no one company could do both, there might be some market tension.

Definitely not. But you could have some interesting discussions about morality and the legality of hoarding publicly funded scientific results behind paywalls.

There is a very very simple fix for this. Convince the researchers to either self-publish or use a non-paywalled publication/journal.

It’s not that easy unfortunately. Many of the stronger advocates for open access are younger researchers who can’t self-publish as they need to be able to list their peer reviewed publications. Also the choice of journal is often made by more senior academics who don’t always hold the same views. It can also often have associated costs which aren’t small.

I think pre-print servers like arXiv[1] are a good stepping stone to full open access, though even their use can be complicated in some fields.

EDIT: In the UK there are efforts to stop counting closed access publications in various rankings as a result of the Finch report. It’s a new policy change but I already see it having a positive impact.

[1] https://arxiv.org/

I agree with your comment.

your tax dollars paid for the research which was then privately hoarded

Anyone know why this is done as so many tiny torrents? Torrent users can already choose which files to download from a torrent, so why not do this as a smaller number of larger torrents?

The torrents each weigh in at 80.9 GiB on average, so they are not small. Each torrent contains 100 .zip files, which each in turn contain 1000 publications (averaging just under 1 MiB per publication). There's a total of 69 million publications here - it's a very hefty collection.
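These numbers hang together, for what it's worth. A quick sketch of the arithmetic, using only the figures quoted in this thread (69 million publications, 100 zips of 1000 publications per torrent, 80.9 GiB per torrent on average) and assuming every torrent is full:

```python
# Back-of-the-envelope check of the figures quoted in this thread.
PUBLICATIONS = 69_000_000     # total publications
ZIPS_PER_TORRENT = 100        # .zip files per torrent
PUBS_PER_ZIP = 1_000          # publications per .zip
AVG_TORRENT_GIB = 80.9        # average torrent size in GiB

pubs_per_torrent = ZIPS_PER_TORRENT * PUBS_PER_ZIP
torrent_count = PUBLICATIONS / pubs_per_torrent            # assumes every torrent is full
total_tib = torrent_count * AVG_TORRENT_GIB / 1024         # GiB -> TiB
avg_pub_mib = AVG_TORRENT_GIB * 1024 / pubs_per_torrent    # GiB -> MiB, per publication

print(f"torrents: {torrent_count:.0f}")
print(f"total: {total_tib:.1f} TiB")
print(f"average publication: {avg_pub_mib:.2f} MiB")
```

That works out to roughly 690 torrents and a bit over 54 TiB, which matches the 54 TiB figure mentioned elsewhere in the thread, with an average publication of about 0.83 MiB.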

They're not that tiny. That's many thousands of papers / torrent, the file names are the paper id ranges in the system.

Does anyone have a complete copy? The seeds are very slow. I know someone willing to pay for this data. Shoot me an email.

The _whole_ thing? It's 54 TB, how do you plan on receiving it?

What are you talking about? "I'll show up with hard drives and whatever I have to do. I'll rent the damn Amazon snowball truck if I have to."

Then just get the torrent ... Adequate payment will be seeding 2x.

The complaint is that the torrents are slow. Which I can confirm, the trackers seem dead and the DHT hasn't found anything yet.

I think tape would probably be the best choice --- a single LTO-7 tape cartridge holds 6TB and costs <$100USD. Of course the drive is quite expensive, but HDDs are less reliable and more expensive per GB.

The scale usually tips towards LTO at around ~100TB or so (give or take 20TB depending on where you are located and how good your negotiation skills are), and that's only if you can afford a tape downtime until you get a new one (in case it breaks), and to refresh the whole library come LTO-10 (because it won't even be able to read your LTO-7 tapes).

It almost always makes better sense to back up to hard drives that carry their own connectors and "tape" with them -- I've recently found two disks from 2001 in storage. I needed a $15 IDE-to-USB connector because I don't have IDE connectors anymore. And there were 16 bad 512-byte sectors over one 20GB disk, and no bad sectors over one 15GB disk. Easily readable.

I also had a couple of backup tapes from the same era inside the same box I found. The required drive brand is not even listed (it's not LTO), and I'd be surprised if (a) I can find a drive that can read them, and (b) they still work properly.

But I don't need them, because the drives they backed up are still readable .....

Tapes are only good for offline storage.

If you want online storage, you'll need one drive per tape, and you would get huge lookup times.

> Tapes are only good for offline storage.

> If you want online storage, you'll need one drive per tape,

No, tape libraries exist, and would be considered 'nearline' storage.


> and you would get huge lookup times

Yes. This is not a good system for random access.

Indeed. But the initial investment is very expensive.

It depends on your financial scale. A fancy tape library, sure, but if your company is in-sourcing hundreds of TBs of backups surely you are not destitute. Glacier is always cheaper.

On the other hand I paid $185 for a LTO-5 drive a year or two ago (no library, manually operated). 1.5 TB on a $20 tape is still pretty good in my book.

It's just five 3.5'' HDDs (12TB ones), which would amount to about $2,000 or a little more.
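A small sketch of the media count, in case anyone wants to redo it with other capacities. It assumes the 54 TiB corpus size quoted in this thread, converts it to the decimal TB that drive and tape vendors advertise, and ignores filesystem overhead and redundancy. The 12 TB and 6 TB capacities are the ones mentioned above; `units_needed` is just an illustrative helper.

```python
import math

CORPUS_TIB = 54                              # corpus size quoted in the thread
corpus_tb = CORPUS_TIB * 2**40 / 1e12        # binary TiB -> decimal TB (about 59.4)


def units_needed(capacity_tb: float) -> int:
    """Whole drives or tapes needed to hold the corpus once, with no redundancy."""
    return math.ceil(corpus_tb / capacity_tb)


hdds = units_needed(12)     # 12 TB hard drives
tapes = units_needed(6)     # 6 TB (native) LTO-7 cartridges

print(f"corpus: {corpus_tb:.1f} TB -> {hdds} HDDs or {tapes} LTO-7 tapes")
```

Note that five 12 TB drives leave almost no headroom, so anyone planning for growth or a spare copy would want more.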

Can't you just attach it to the email?

Since this is technically very illegal I'd be inclined to view your message as a bait to capture and jail one of the very first people who managed to mirror the whole thing.

No offense, just seems fishy.

Were they to help you, would that person not be opening themselves up to >avg legal harm?

For those of you who use SciHub: do you take any special precautions against malware?

PDFs are a convenient vector for viruses, trojans, etc. And the users downloading these papers tend to work for academic/research institutions that could be ripe targets for hacking & IP theft.

Because I'm mostly using this from school, I'm usually in a disposable VM behind a proxy. Those are the precautions I personally take.

You are referring to PDF viewer exploits. You can avoid most of them by either using Linux or using less-targeted PDF viewer software like Sumatra.

Has anyone with uni lib credentials done random hash checks on some of the PDFs from sci-hub?

Not sure that would work, since pdfs are usually marked or prepended with some metadata about download time and the library upon download. At least that's the case with my uni library.
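To make the objection concrete, here is a minimal sketch of what such a check would look like. It compares two files by SHA-256, so any per-download stamp or metadata a library adds will make otherwise identical papers look different. The function names are mine, and this is just one assumed way to do the comparison.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(a: Path, b: Path) -> bool:
    """True only if the two files are byte-for-byte identical."""
    return sha256_of(a) == sha256_of(b)
```

A mismatch here says nothing about tampering on its own; it only means the bytes differ, which a watermark alone would cause.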

We need more sci-hub

Is possession of these simple torrent files a copyright violation or does one have to actively download them to create the copyright violation?

My uni doesn't have access to some language journals, and sometimes I just want to check a single paper to see if it's worth it. Also many paywalls don't allow you to get one paper, or if they do the price is extortionate.

How can you search effectively for a specific article? Is there an index with all the files in all the zip files of every torrent?

Regular Sci-Hub allows you to search: https://en.wikipedia.org/wiki/Sci-Hub

Are the files compressed at all? I can see that helping in terms of sharing this

Opened one randomly, as someone else has reported, it contains one hundred files with zip extension of various sizes, 118 GB total.

Why are people breaking copyright law, instead of questioning the researchers who voluntarily approach the journals? Why not ask the researchers if they'd consider sharing their work? Or convince them of an alternative means of publication.

There is no one-handed clap.

I think people often do approach researchers. It's just nigh-on impossible to expect researchers not to try and get published in Nature, for example. The effect on credentials (and career) is too great. And of course the department they work for will be pushing for it too.

Besides being the main source of academic esteem, there is a serious structural issue where Elsevier et al receive publicly-funded knowledge, yet privately control the supply. The model seriously warps the production of research (and, arguably, the reproduction - p-hacking, fiddling with your regressions/covariates).

Using Sci-hub can be considered a form of protest. Perhaps 'break copyright law' is not a universalisable rule in the spirit of Kant, but the same is true of most forms of activism.

The simultaneous arguments (that get repeated often) of "journals provide little to no value and therefore publications should be free" and "researchers value the curation of top journals because (seemingly) only high quality papers get accepted" are quite at odds with each other I think.

>where Elsevier et al receive publicly-funded knowledge, yet privately control the supply.

But they are given the 'knowledge'. Sure, Chicken/Egg. Maybe government grants should come with a clause for open-access..

> "journals provide little to no value[...]" and "researchers value the curation[...]" are quite at odds with each other I think

They aren't, because the journal corporations don't provide the curation; they provide the brand name that attracts heavy-hitting papers. The curation is provided by academics for free. The brand name ("impact factor" of historic papers in that journal) is the one thing that prevents migration.

At the end of the day, researchers need some way to decide which papers they should read and which they shouldn't. There are too many papers coming out for anyone to read them all. The fact that prestigious journals have a vested interest in not printing crap papers is a helpful filter when deciding what to read.

I still think there should be some reform in this area but the journals do provide some value at the moment.

> The simultaneous arguments (that get repeated often) [...] are quite at odds with each other I think.

Yeah, I can see your point, but I think both can reasonably appear true: credentials are very 'sticky' concepts which seem reliable and are hard to dismiss unless everyone else is doing it too...

> Maybe government grants should come with a clause for open-access..

Hard agree.

It can take days to weeks to hear back from a busy researcher to get a copy of their paper. Half the time you skim it and determine that it's not useful. Sci-hub is instant.

Right, and that sort of delay can slow down the academic process. I agree with you that the current situation sucks. I'm not seeing how torrenting publications is a viable solution to this problem. This sends the signal to the publications that they're important and that their product is valuable enough to break the law. Naturally, no person wants to be out of a job, and the first course of action they're going to seek is to stop the torrenting in any way possible.

It's faster than logging in with your university account, in fact.

Not everyone lives in a jurisdiction where copyright law exists. Among those who do, not everyone lives in a jurisdiction where this action is a violation of that law. Of those who do, not everyone recognizes copyright law as legitimate or binding on their conduct.


By posting this link, are you trying to indicate a belief that I am exhibiting this fallacy?

Can you explain how?

GP asked, "Why are people breaking copyright law?"

I thought I gave a reasonably solid explanation - some of the people in question aren't breaking copyright law while others (certainly a good number of people here on HN) don't regard copyright law as a legitimate imposition of state power in the information age.

What did I do wrong?

Because you could trivially modify the statement to be "for the people who live in a jurisdiction where copyright law applies, why do they X" and it would be clearly the spirit of the question (and even that is imprecise).

The net effect of your means of conversation isn't a more cogent conversation. Instead, by causing everyone to over-specify their statements you force conversation with you to be only for the things that can be fully specified for low cost.

This is only your loss. Some people will not talk to you because they know you'll just be looking to say "gotcha", not help arrive at the truth. When most people bring up ideas with others they're looking for meaningful invalidation or corroboration with the net effect of refinement of the idea.

It's often popular, so you'll amass karma on HN, but I don't think you'll find it productive when improving your worldview. Forums like this have a wide variety of people with a varying degree of sophistication in a subject. You can appeal to the people with the least sophistication in the subject and be very popular. But you'll annoy the people who are skipping unnecessary steps. This is fine, but you'll end up moving the forum in the direction of less sophistication.

If a maths analogy will help: imagine that you are in a group of first year maths students talking about the reordering of terms of a sequence. To the more advanced students, this means we are talking about a convergent sequence and specifically a bijection from N to N. Since it's a conversation the details are elided and these students are already making the assumptions that are, to them, obvious. Objections that their statements are untrue for some divergent sequences will simply lead to those people being left behind when the next chat happens.

In practice, on the Internet, some people just aren't told where things are spoken about. I only tell you this out of love. Good luck.

> The net effect of your means of conversation isn't a more cogent conversation. Instead, by causing everyone to over-specify their statements you force conversation with you to be only for the things that can be fully specified for low cost.

under-specifying is also a problem, especially when the framing is a bit sloppy. it leaves a lot of room for weaseling about what was or wasn't meant, and people end up talking around topics and not arriving at some kind of understanding.

it seems to me that the problem here is the question, which seems to be poorly rendered given the context it was presented.

> Because you could trivially modify the statement to be "for the people who live in a jurisdiction where copyright law applies, why do they X" and it would be clearly the spirit of the question (and even that is imprecise).

...but I responded to that too, and that's the most important part of my comment. Most people in the USA clearly don't recognize copyright law as legitimate. It is flouted everywhere by everyone.

GP seemed to assume that refraining from breaking copyright was important to most people; it isn't.

You are applying propositional logic rigor to a comment made in a casual setting that you fail to apply to your own position. Let me show you how..

>Not everyone lives in a jurisdiction where copyright law exists.

How many? How did you gather this data?

> Among those who do, not everyone lives in a jurisdiction where this action is a violation of that law.

How did you survey the jurisdictions of people accessing this information? What legal standard did you apply to come to your conclusion of people not breaking laws?

> Of those who do, not everyone recognizes copyright law as legitimate or binding on their conduct.

Who has communicated to you that they don't recognize copyright law as legitimate or binding? Why is the number statistically relevant?

You see? Anyone can apply rigor to any comment. It took me 5 seconds just like it would take you five seconds to apply rigor to any other comment.

>I thought I gave a reasonably solid explanation

I didn't see any explanation, only a logical argument. A logical argument is not a means of establishing truth. Cats have four legs and a tail, but a dog is not a cat.

> Who has communicated to you that they don't recognize copyright law as legitimate or binding? Why is the number statistically relevant?

To me personally? Only maybe a few dozen.

But obviously these laws are ignored everywhere and by everyone in the USA. Surely you don't mean to suggest that it's a law that enjoys widespread perception of legitimacy?

What makes you think people have any interest in following this law in the first place? It seems like you haven't really given this foundation, which is crucial for your original comment.

You might want to read up about the Berne convention, various international copyright treaties and such. You can "ignore" laws, but you might find yourself in handcuffs on a plane, clutching an extradition order if you get popular enough. In any case that is an irrelevant aside as far as I am concerned. I don't wish for people to be jailed over this.

>What makes you think people have any interest in following this law in the first place? It seems like you haven't really given this foundation, which is crucial for your original comment.

Wow, really? Yes, I admit, my argument assumes that normal rational people don't go breaking laws.

> Yes, I admit, my argument assumes that normal rational people don't go breaking laws.

I mean, yeah, that's a very strange assumption. Throughout the history of common law societies, normal rational people have indeed gone around breaking bad laws.

Many foundational thinkers, dating back to the Magna Carta and further, have argued that citizens are morally bound to do this in order for this legal system to work. It's a big part of the basis of the western legal tradition. Of course Henry David Thoreau is the best known, but he's certainly not the first, nor is he an aberration.

Take this example: cannabis has been prohibited federally since 1937 - surely you don't think that the persistent, substantial part of the population who has consumed cannabis throughout that time is abnormal or irrational?

...and to add a slight but meaningful nuance: remember that in the tradition of the legal republic, the state is not the sovereign - the individual is. So, each of us has to determine for ourselves which edicts of the state are in fact legitimate laws.

It's a perfectly valid view to hold that copyright or drug prohibition (or whatever) are not legitimate laws. You then have to ascertain the reaction of the state (a reaction which you are free to hold as illegitimate / unlawful) when it finds that its edicts are being disobeyed. It's up to each of us to build community consensus around us to disobey these edicts in keeping with our view of the legitimacy and constitutionality of the law.

We are also free to use our own sensibilities about these laws when deciding to refrain from making a citizen's arrest (this refusal during the fugitive slave law period is part of what led to the creation of slave patrols, some of which have evolved into today's professional police forces) or to refrain from finding guilt as a jury.

By contrast, this discretion is not cognizable for a magistrate or peace officer, who, at least traditionally, are bound to regard all statutes of their jurisdiction as lawful.

This is a cornerstone of the common law tradition.

Wow.. it’s a shame that this article is tied to the beautiful subject of logic.

I’m pretty sure you can come up with such an article for any phrase you’d ever encounter.

For what it's worth, while I publish in conference venues, like ACM SIGCHI and ACM UIST, I also make every paper available online on my own website as explicitly allowed by ACM's copyright rules. Many authors in my field do likewise.

Unfortunately, the situation with many journals in the hard sciences tends to be more dire, especially since the longer history of their fields means that many important papers are only online as paywalled, scanned PDFs (where the original author may not even be around to make the work freely available).

People are questioning the current publishing systems and trying to promote alternative means of publication. Unfortunately, it's not so simple as being able to blame a single actor: https://medium.com/flockademic/the-vicious-cycle-of-scholarl...

Because it is inefficient and change is unlikely?

Breaking copyright law doesn't change anything either. People are still going to publish in those journals. It doesn't address the central problem.

Is there any way to download a specific journal?

Yes, sci-hub itself.

I believe they meant all articles of a given journal, rather than a specific article.

Ah, I see. No, that isn't possible right now as far as I can see; it would require some metadata. Also, for some unspecified reason I have no problem with downloading a paper but I do have a problem with downloading a journal. If I try to pin it down, it is related to the act of curation: the articles themselves imo should not be copyrighted, but the curation is original work, so the list of articles in a given journal should have some protection.

This might even point to a possible resolution of this whole conflict: journals get paid by submitters to have their article listed in the journal's index (which takes care of the pedigree part), the public has free access to the articles themselves.

Sucks you can’t index torrents?

Download them and index them locally?
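For what it's worth, local indexing could look roughly like the sketch below. It assumes the downloaded payloads are plain .zip archives sitting under one directory; the function name `index_archives` and the directory layout are hypothetical, not anything the repository prescribes.

```python
import zipfile
from pathlib import Path


def index_archives(root):
    """Map each member name found inside the .zip files under `root`
    to the path of the archive that contains it.

    Assumption: archives are ordinary zip files. If the same member
    name appears in several archives, the last one scanned wins.
    """
    index = {}
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                index[member] = str(archive)
    return index
```

A lookup is then just `index_archives("/path/to/downloads").get("some-file.pdf")`, and the dictionary can be dumped to JSON or SQLite if it needs to persist.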

I believe it's a good time for someone to publish a high-quality article detailing how to build such mirroring servers on a minimal budget.

Usually such machines don't need the best CPUs around but they definitely do need ECC memory and serious NAS-like capabilities. So maybe older-gen Xeons? I am no expert though. Hopefully somebody publishes a blog post about it.

If we all got to seeding, surely we’d have the oppressors bent over a filthy toilet in no time at all.

I suppose if you don't have 55 TB of space to spare, you'd want to pick random torrents from this list to seed, in the same spirit as a torrent program picks random chunks to request.
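A rough sketch of that "pick random torrents that fit" idea, assuming you already have a list of (name, size in bytes) pairs for the torrents; the input format and the function name `pick_torrents_to_seed` are made up for illustration.

```python
import random


def pick_torrents_to_seed(torrents, budget_bytes, seed=None):
    """Shuffle the torrent list and greedily keep every torrent that
    still fits in the remaining disk budget.

    torrents: iterable of (name, size_bytes) tuples (hypothetical format).
    Returns (chosen_names, total_bytes_used).
    """
    rng = random.Random(seed)
    shuffled = list(torrents)
    rng.shuffle(shuffled)

    chosen, used = [], 0
    for name, size in shuffled:
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen, used


if __name__ == "__main__":
    # Toy data: ten torrents of 100 GB each, 350 GB of spare disk.
    demo = [(f"torrent_{i:03d}", 100 * 10**9) for i in range(10)]
    names, used = pick_torrents_to_seed(demo, 350 * 10**9)
    print(names, used)
```

If everyone picks independently at random like this, coverage of the whole set should even out as more people seed, which is the same intuition as random piece selection within a single torrent.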

Until the cops kick your door in on copyright infringement charges to "make an example" of someone. Don't think it can't happen.

In over twenty years of piracy, I’ve never actually met someone online who got something more than a pissy email from an ISP. It’s possible to be made an example of, but if you’re not profiting from the piracy it’s vanishingly unlikely. The only people I know of who got more than a token fine were involved in some other criminal enterprise besides just downloading something.

I can tell that you’re not from Germany and don’t know anyone here who pirates. There are special law firms (Waldorf Frommer comes to mind) specializing solely in sending out cease and desist letters carrying fines of around €1,000 per infringement (mostly movies and tv shows). ISPs willingly hand over any customer data to those firms. Of course the only ones benefiting from this are the law firms, not the copyright holder, who never sees any decent payout. Torrenting in Germany can get very expensive very fast if you don’t know what you’re doing. I got big folders full of letters from Waldorf Frommer & Co from before I learned about VPNs and I still get the occasional letter when my VPN fails or I’m being stupid. They can/will freeze your bank accounts if you don’t pay, it’s no joke.

So what did you do after you received such letters? It seems you haven't paid. How?

I paid a small number of those, appealed others in court (sometimes going through multiple instances), and for the last few years have just kept ignoring the few letters that still come from time to time, until they sue me or try to freeze my accounts. I think they have me tagged in their system because they give up pretty quickly as soon as I actively communicate with them. Might have to do with a few small cases won against them. Maybe I’m not worth the risk to them anymore. Plus my VPN-fu is strong now.

I lived in Germany for a period and was amazed by this as well. It's hard to reconcile with the fact that they are one of the most privacy-centric countries there is as far as technology goes.

Still a long shot from black helicopters and cops kicking your doors in.

This is Germany I’m living in, not the USA.

I met Gary Fung (the founder of isoHunt) while waiting in line to buy my iPhone 4S in Richmond, Vancouver. Very friendly guy.

We had a nice chat on Facebook recently about copyright for song lyrics (I want to share a load of fan-made translations of Christian songs, but there's no CCLI in Taiwan so there's actually no legal way to proceed). Hearing about his experience amplified the chilling effect for me - if I had a limited-liability company, then there's more that I could do (e.g. VoiceTube), but as an individual it's not worth the risk.

I got one of those letters... I download from other sources now... it's just harder to find older stuff.

I'm a prolific BitTorrenter... and don't get me started about my wife. Our piracy knows no bounds.

Not everyone lives in the authoritarian US where the police are legitimate psychos.

Safety in numbers is a thing.

They can still make examples out of individuals. A good VPN probably does more for your safety.

This can't be said too much!

Torrenting anything potentially problematic without using a reputable VPN service is foolish. And make sure that the machine is firewalled, to block all traffic (including DNS requests) not using the VPN. Only the VPN client should have direct Internet connectivity.

For extremely sensitive material, it's prudent to use nested VPN chains. So no one provider knows both who/where you are, and what you're doing. The bandwidth hit for chaining is smaller than you might think.

You can also set up your own VPN server on a VPS. But make sure to lease that VPS anonymously, and pay with well-mixed Bitcoin. Do all that through Tor.

But do not torrent through Tor. It's usually very slow. And it screws up Tor for people who really need it.

Thanks, mirimir. I live in a country that's been having a lot of trouble recently with crimes against the freedom of the press and this may actually help me recover some peace of mind when writing anything that might upset powerful people.

That would be very prudent.

And it's arguably good practice for everyone. Good OPSEC.

Why do I say that? Well, even in "free" countries, one never knows how stuff will go down. What will change. It's better to be safe, than to be sorry.

Also, mass adoption helps protect me, and others who love their privacy. Because it makes us less unusual. That's one of the standard arguments for widespread Tor use. But even so, I prefer to obscure my Tor use with VPN services. Because VPNs are still far more common than Tor. In some areas, VPN usage is over 50%.

And finally, it's a great hobby. Better than puttering in the woodshop. Or whatever.

Do you have recommendations for actually anonymous leaseable VPS/Seedboxes? Just buying some bitcoin on Coinbase defeats the purpose. Can I mail cash somewhere?

I've come to like Host Sailor. Some consider their reputation to be iffy. But their prices are reasonable, and their reliability is good. And their customer support is fast and capable. Bithost is also OK. They're a Digital Ocean reseller. But their prices are too high for anything but testing.

Back in the day, I sent cash to Nanaimo Gold for Bitcoin. But I can't vouch for them now. In mailing cash, use printed labels, because handwriting attracts attention. It's best to use a B&W laser printer. Most inkjet printers use watermarking. And use a valid return address, because they're typically scanned. But not yours, obviously. I would often use an address for some charity, so they might get the money if something went wrong. And mail from a public box, not a post office, at some distance from your area.

If LocalBitcoins works in your area, that's a good option. But there's always risk of surveillance. Or theft.

These days, I mostly just use mixing services. Any one service could be incompetent or compromised. So it's best to mix at least twice, using different mixers.

Let's say that your Bitcoin are in some hosted wallet, and aren't anonymous. You need at least two ~anonymous wallets. One gets Bitcoin from the first mix. The other gets Bitcoin from the second mix, and spends.

I use Whonix instances in VirtualBox, each with a local Electrum wallet. The Whonix instances (VM pairs) reach Tor through a VPN service, running in the host machine. DeepDotWeb runs CoinMixer, which seems OK. I've used Bitcoin Fog for years, at http://foggeddriztrcar2.onion/ but it requires creating an account.

Edit: re Host Sailor and Bithost, I've typically gotten 800-900 Mbps (iperf3) for their gigabit uplinks, measured between VPS on each. And ~700 Mbps using such public test servers as iperf.volia.net and bouygues.testdebit.info

LocalBitcoins has sellers who accept cash or cash-equivalent payments. Alternatively, you can buy on Coinbase then exchange for some Monero, or use a mixer.

Find a local bitcoin exchange. Meet in person to exchange.

They can. It still is probably safer than driving though.

Especially driving while torrenting.

The exception being torrenting while wardriving.

That's what VPNs are for.

Working on it.

I'm seeding a small TB too. Sadly I just took some random torrents in order to "randomize" the lot.

It's a shame though that the publications are in zipped files…

Don't forget to download the metadata too!

Would you mind detailing how they have oppressed you? AFAICT they haven't forced anyone at gunpoint to do anything..

Here's a couple: a) Gatekeeping publishers keep research from being available to less-elite schools that can't afford the subscriptions, creating a rich-get-richer dynamic in science. Leveling the playing field will lead to more-better output.

b) High subscription fees increase university fees, which translate directly into student loans.

As many have pointed out before, publicly funded research should be open access. There's no reason the university should be both paying researchers (from tax dollars + student fees) and then paying a bunch of rent-seeking rent seekers for access to the published research (using tax dollars and student fees).

Are you aware that researchers voluntarily approach publishers? And if you are, what is your rationale in blaming the publisher?

>Here's a couple: a) Gatekeeping publishers keep research from being available to less-elite schools that can't afford the subscriptions, creating a rich-get-richer dynamic in science. Leveling the playing field will lead to more-better output.

>b) High subscription fees increase university fees, which translate directly into student loans.

Sorry, but I didn't find them convincing.

>As many have pointed out before, publicly funded research should be open access.

That, we agree on 100%.

I wouldn't state "voluntary" too strongly: https://medium.com/flockademic/the-ridiculous-number-that-ca...

Sure, I can offer you the choice between death or a slap in the face, but if you'd choose the slap in the face, I wouldn't say you voluntarily asked for the slap in the face...

That dichotomy would only make sense if there were no means to self-publish or to choose a non-paywalled journal.

Well, those means are there (most of the time), but if researchers want a career in academia, most of the time there's no other option than submitting their work to "high impact" journals. I don't think sacrificing their already slim career choices is a reasonable sacrifice to ask when speaking of voluntarism.

>I don't think sacrificing their already slim career choices is a reasonable sacrifice to ask when speaking of voluntarism.

Right, but I don't see how breaking copyright law is a solution to fixing the problem either. The journals are just going to jack up the rates to cover their "losses".

Is there really no other way?

I am not very sure they can afford to jack up their rates. News in academia spreads really quickly and most researchers can't really afford to pay subs from their pockets.

I know several researchers who are tragic with anything computer-related. Even they picked up SciHub mere days after it made the news for the first time.

IMO jacking up prices in such a climate can succeed but is very risky.

Well, it doesn't fix the greater problem of research being mostly behind disproportionately priced paywalls, but it does solve researchers' immediate problem of access to research. Yes, it's (mostly) illegal and no, I wouldn't endorse it, but I can't blame them too hard either, especially not by drawing on their "voluntary" paywalling of research.

Long term, hopefully there are other ways, and there are plenty of people (including myself) trying to effect that change. If there were an easy way, I'm positive that we'd have found it already, unfortunately. That's not to say that change isn't in the air, and it might be for the better.

(Btw, as for journals jacking up their rates: most of those rates are pretty non-standard, and the result of "Big Deal" negotiations with university libraries behind closed doors. However, both the pressure of Sci-Hub and publication of the results of some of those negotiations have led to the representatives of public money now having a much stronger negotiating position, so if anything, I'd only expect rates to go down in the short term.)

No, they do not voluntarily approach publishers. It is a faustian bargain in which society ultimately pays the price.

I suspect that you either do not understand how academic reputation is built or you think it is somehow beneficial for society to ensure a system that makes scientific knowledge unavailable to anyone but the elite.

>No, they do not voluntarily approach publishers.

What are you basing this on?

>I suspect that you either do not understand how academic reputation is built or you think it is somehow beneficial for society to ensure a system that makes scientific knowledge unavailable to anyone but the elite.

Or neither. It's easy to think in black and white, but thankfully the world is not so boring. I don't believe any of the rationalizations that people have stated for breaking copyright law in this case. No, I don't agree that this is like slavery or student oppression or the driver of income inequality or the other myriad arguments people trot out.

Funny how you seem exempt from basing your hunches on citations. Clearly "voluntary" has been demonstrated to be a dubious term to use, with URL references. Researchers either publish in high impact journals or lose funding. Publish or perish. Society loses out.

I use voluntary in the same way as the day to day common usage of the term. Seems to me you want it to mean "without personal sacrifice". I hope we can agree that it is not the commonly understood meaning of the term.

> Researchers either publish in high impact journals or lose funding. Publish or perish. Society loses out.

I don't see how breaking copyright law addresses this.
