Today Sci-Hub is 10 years old. I'll publish 2M new articles to celebrate (twitter.com/ringo_ring)
1154 points by DominikPeters 16 days ago | 160 comments



I really hope sci-hub survives this. Sci-hub and libgen are like an entirely different internet, one allowing you to dive as deep as you wish into any technical subject. There’s really no comparison I have found anywhere for the depth of material available. People always point to Wikipedia, but all of that is surface level. If you want to build something, research something, or just really delve into a subject, there’s no substitute for having access to all of the latest textbooks, manuals, and papers. I’ve never found any source, paid or otherwise, that comes even close.


It's the typical Eastern European (or non-US centric) Internet. The IP laws make sense only if people have the capital to buy stuff. When I was a kid in a post-Soviet country, no one ever bought anything original (like a CD with a game). Everything was bootleg or pirated from the Internet.

After [the US lobby started to push for copyright enforcement in the EU under the threat of sanctions](https://falkvinge.net/2011/09/05/cable-reveals-extent-of-lap...), things have changed. The copyright and IP laws that US lobbies push to the world are killing the idea of free Internet and only benefit large corporations that are untouchable.

Elsevier is Dutch-based, and they are fierce about suing anyone who tries to get free access to papers that were paid for by taxpayers.

The "free" Internet doesn't exist anymore, but it's good to have places like Russia where the IP law is not strictly enforced, because everything is broken so no one cares.


> that were paid for by the taxpayers

What do taxpayers have to do with it? I thought it was just a for-profit company?


The content, i.e. the research papers they publish, mostly comes from government-funded research. Taxpayers fund the work but don't get access: we have to buy the results back from these for-profit corporations.


Yes, it’s a new and better world. On the other side of the $275 per-paper Elsevier paywall, there are researchers who wish more people would read their papers. In my experience digging into robotics kinematics, authors are happy to answer questions and can point me to the right person when I want to send a check to support investigating specific research questions. The paywall deceives; science is neither an institution nor a copyright. It’s people. You can talk to them. You can learn from them and they can learn from you. When you apply their research, they often want to know about it! They might even discuss your application, in future papers. You don’t have to be Siemens or Big University Labs. The situation with for-profit journal publishers is diseased. Who the hell do they think they are? Elsevier should be dead, and Aaron Swartz should be alive.


Most authors of scientific papers will gladly send you a free PDF of their papers if you ask them (assuming they remember to check their e-mail and respond in the first place). The profitability of their publisher is of no concern to them.


This is a solution that doesn't scale for authors or readers. You'll just end up building another sci-hub.


That's the point: that the authors also support sci-hub, because it's simply doing what they would do themselves, at scale.


Sometimes it takes days for authors to reply. Doing research is a bit like going deeper and deeper into citations. Sci-hub fixes this, especially for universities in developing countries.


I have been pleased that The Journal of Field Robotics is well represented on scihub. I have an open source off road robot I am designing and the journal is literally about robots out in fields and stuff. I am a "serious hobbyist" in that I believe my open source contributions to be at least somewhat helpful to others, but it's not the kind of thing that would justify paying for paywalled papers. I just want to glance over the material and keep track of what researchers are up to. Libgen is to me a vision of a world without copyright and intellectual property restrictions and I think it's a much better world than ours.


Wow! The content in JFR is fantastic! Thank you for the pointer!


So glad it was helpful!


I really wish someone could build a better UI for this research internet.

Hyperlinks for all references would be a good start. Finding a way to generate an automatic glossary of technical terms would make scientific papers substantially more accessible too.


Have you come across scite (https://scite.ai) yet? We're also innovating in this space by extracting citation statements from full-text articles and classifying their intent.

So let's say paper A cites paper B. If you look at paper B, we show you:

- how many times it was cited

- the direct paragraphs from paper A where it was cited

- the sections from paper A where paper B was referenced

- ... and a lot more

You can also now search these citation statements directly to find evidence-based information pretty quickly.

- Short video to showcase that search: https://www.youtube.com/watch?v=JYjCn-4uMJk

- Website with a bit more details: https://citation.to/

You can also visualize citation networks similar to ConnectedPapers, set notifications for new citations on groups of papers you're interested in, and much more.

(Disclaimer -- I work here!)


No one cares. This thread is about free access to papers and not another paid service that forces you to pay monthly fees for something that could be a free service. In that sense you aren't any better than large online publishers. 8 bucks a month for a scientific paper search engine? Really?


I vouched for your comment because you have a very valid point, but you could be more polite in making it. Welcome to HN.


Thanks. I am really cranky today. I'll do better next time.


Hiya,

Well, I definitely agree with your sentiment in a normative sense that scientific papers should be free and readily accessible to all -- in part because a lot of it is funded through tax dollars!

But given the current state of affairs, we're looking at making that information accessible to people without having to pay exorbitant fees to access individual research. We also offer steep discounts for students or anyone in academia.

With that in mind I would push back a little that we're just a scientific paper search engine -- our system does a lot of work in extracting and classifying those citation statements, which makes it more powerful than traditional scientific search engines.

And besides just using our search, a huge time-saving value of our service is the report pages, which help you quickly build a qualitative understanding of how something was cited.

Even if all scientific papers were freely accessible, our report pages allow you to see the direct, relevant snippets from citing papers without having to manually read each and every single one. I think that is quite valuable!

I know I've gone on a little tangent from the original discussion about scihub, and having free and open access to papers, but I did just want to throw that in because I think it's an important distinction. And as much as we all want that free and open world to exist, I think it's also interesting to think about how we can open up that information for people in the interim.

Best,

Ashish


The site sounds interesting, but signing up just to evaluate the results of my first query is a no-go for me.


One thing I've recently discovered is this website: connectedpapers.com/

It builds a graph of referenced papers and makes it easier to narrow down which ones are important/foundational for further research.

I don't think a glossary would help. What you need to find is a "review paper". These act as a primer to the field for new researchers. They're usually well written, with less jargon, and have tons of references for you to dig into. That said, I don't have a good method for finding them... I just stumble across them haphazardly.


https://www.semanticscholar.org/ is very good too for reading paper abstracts, with links to the references, citations and related papers. The full papers are included, if copyright allows.

The references and citations are tagged and filterable, making it easy to see which are the most cited, or review papers, etc.


DOI URIs work great and sci-hub understands them. Figuring out the DOI from the citations section is a bit more annoying still. A meta glossary would be fantastic. 80% of learning a new field is figuring out the jargon.


> A meta glossary would be fantastic. 80% of learning a new field is figuring out the jargon.

True!


Yeah it's called Web of Science. It's very good but it's not free unfortunately. And it doesn't go as far as hyperlinking references in PDFs or defining terms. I agree those would be great, but unfortunately there's not much incentive for paper authors to do that.

https://en.wikipedia.org/wiki/Web_of_Science


Hyperlinking references in PDFs is trivial for authors if they use LaTeX and the journal/conference template supports it. It's just a matter of ensuring that the bibtex entry has a URL or a DOI, and most bibtex entries copied and pasted from curated sources already have them.

If you are finding many papers without hyperlinked references, it's probably just because they're published in journals whose templates don't support it. In my particular research field, most publication venues' templates started supporting those links around 3-4 years ago, so my papers from, say, 2015 have no hyperlinks in references, while those from 2019 do. This didn't require any significant extra effort on my part; in fact it generally requires less, because well-curated bibtex entries are easier to come by now than some years ago.
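
For illustration, a minimal sketch (not any particular venue's template; whether the doi field is rendered as a link depends on the bibliography style, and the reference is just a well-known example):

    % Minimal sketch: biblatex + hyperref render the doi field as a link.
    \documentclass{article}
    \usepackage[backend=biber]{biblatex}
    \usepackage{hyperref}
    \addbibresource{refs.bib}
    \begin{document}
    Deep learning has a long history \autocite{lecun2015deep}.
    \printbibliography
    \end{document}

    % refs.bib: the doi field is what makes the reference clickable
    @article{lecun2015deep,
      author  = {LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey},
      title   = {Deep learning},
      journal = {Nature},
      volume  = {521},
      pages   = {436--444},
      year    = {2015},
      doi     = {10.1038/nature14539},
    }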


It's not adding the links that is hard. It's choosing the destination URL. Where do you link to? I guess you could link to doi.org. Probably better than nothing but still not ideal because it doesn't actually take you to the PDF.

Can you show an example paper with links?


DOI is the correct thing to link to because the author or publisher has chosen this as their canonical URL for the object. The DOI could and sometimes does point to the pdf, it's just conventional to point to an html version of the paper. It would make a lot of sense to me to standardize a field in the DOI metadata containing the PDF URL. (source: I manage the DataCite membership of a large organization)


Linking directly to the PDF is usually the wrong choice. When you find a new paper, you often want to get the citation metadata, which the PDF document rarely contains in a convenient form. There are often multiple versions of the same paper, and you may want to determine which version you managed to find. Is it a preprint, the final authors' version, the published journal paper, an early version published in conference proceedings, or an unpublished extended version of the paper?
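
Incidentally, the DOI-as-canonical-URL convention also makes the metadata problem scriptable: the doi.org resolver supports content negotiation for Crossref/DataCite DOIs, so you can request BibTeX instead of the landing page. A quick Python sketch (requests assumed installed; the DOI is just a real-world example):

    # Fetch citation metadata for a DOI via doi.org content negotiation
    # instead of scraping it out of a PDF.
    import requests

    doi = "10.1038/nature14539"  # any registered DOI works here
    r = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
        timeout=30,
    )
    r.raise_for_status()
    print(r.text)  # a ready-to-paste BibTeX entry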


Some of the background data for this is being collected by the WikiCite (https://meta.wikimedia.org/wiki/WikiCite) and Scholia (https://scholia.toolforge.org/) projects.

Part of this involves turning text data about authors into linked data that can be used to navigate between texts: https://author-disambiguator.toolforge.org/


Have you heard of Alexandra Freeman's Octopus? https://www.science.org/careers/2018/11/meet-octopus-new-vis...

Edit for direct link to Octopus: https://science-octopus.org


Seems like a large part of what's needed is just being able to make the pdfs machine-readable, by making decent plain-text versions of the text content. Right now, IIRC, there's no hands-off way to get the text of a pdf. Especially if there's weirdness like multiple columns (which often happens with this stuff).


Like you said, the hard parts are the unstructured data/images/tables. There are pretty-good (80% of the way there) solutions, though. But nothing that could handle millions of papers without error.
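
If anyone wants to try one of those 80%-of-the-way-there tools, here's a hedged sketch with pdfminer.six (my choice of library, not one named above); multi-column layouts and tables are exactly where it starts to scramble:

    # pip install pdfminer.six  (one of several imperfect extractors;
    # simple single-column papers come out fine, multi-column layouts
    # and tables are the unreliable "last 20%")
    from pdfminer.high_level import extract_text

    text = extract_text("paper.pdf")  # hands-off text extraction, no OCR
    print(text[:500])                 # eyeball the first few hundred chars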


Currently working in this field, and this is actually the cutting edge(!!), but it will be 100% possible/robust within the next year or so, I believe. Really cool ML techniques being used for this.


How well does it work with old OCR'd PDFs? :)


I had no idea. Is there anything that's open source, or is it all still being kept proprietary?


It wouldn't take much to significantly improve things. We don't even have full-text search for paywalled articles. The paid search engines like Web of Science just do title, keywords, and abstract. Even considering the subset of open access articles, Google Scholar and Semantic Scholar do ok but don't offer much in the way of search refinement (e.g., "DTAF" NEAR "collagen"). They're good at finding _something_ related to your query but not good for systematic review.
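
As a sketch of the kind of refinement I mean, proximity search is already built into SQLite's FTS5; this illustrates the query style only, it is not an existing paper search service, and the documents and DOIs are made up:

    # Proximity search ("DTAF" NEAR "collagen") over local full text,
    # assuming an SQLite build with FTS5 (standard in recent Python).
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE papers USING fts5(doi, body)")
    db.executemany("INSERT INTO papers VALUES (?, ?)", [
        ("10.1000/a", "DTAF was conjugated directly to collagen fibrils"),
        ("10.1000/b", "collagen scaffolds were imaged; many unrelated words "
                      "separate the much later mention of DTAF"),
    ])

    # NEAR(x y, N) matches when the terms fall within N tokens of each
    # other, so only the first document qualifies here.
    query = "SELECT doi FROM papers WHERE papers MATCH 'NEAR(DTAF collagen, 5)'"
    for (doi,) in db.execute(query):
        print(doi)  # -> 10.1000/a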


What about a Netflix but for science information.

Pay $15 a month for access to a rolling catalogue of science info.


Then download everything possible with a small script and the FBI will hunt you down. For those who don’t know, I’m referring to Aaron Swartz.


And the DA will hound you till you drop dead to get re-elected.


It's always amusing to see how much people take for granted. $15 is a lot of money for most people. Not to mention that not everyone has access to international bank accounts.


I think the comment makes fun of the fact that content gets removed from Netflix all the time, so it is not very useful as a reference.


Why? The authors write the papers for free, the peer review is done by other scientists for free. Why should this "netflix for science" get to reap the profits by locking it behind a paywall? The reason why predatory publishers still exist is a coordination problem. The journals have prestige built up historically, and the scientists need to publish in prestigious journals for their career. It's a chicken and egg problem.


I'm not sure about how fair the whole system is, but even if we assume the papers should be free, it's still reasonable for a service that makes them available to charge a small fee.

Someone needs to rent and service the servers, update the software, bandwidth costs money, etc..


There's a lot of information, data, and software out there that is available to download for free. Yes there are hosting costs, but there are also people and organizations that are willing to pay them because they believe that it's worth it.

I'm absolutely convinced that if copyrights weren't an issue, there would be enough governments, foundations, universities, corporations, and individuals willing to pay the costs of making scientific publications available to everyone. It wouldn't have to be a paid service.


> Someone needs to rent and service the servers, update the software, bandwidth costs money, etc..

Scientific papers use approximately zero bandwidth.

As a point of reference, if I hosted such a website on my home internet connection I would be paying approximately 0.00013 cents per upload. Even users downloading millions of papers would cost me less than a coffee.

Setting up and managing a proportionate payment system would cost more than just eating the bandwidth costs would.
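
The arithmetic is easy to sanity-check; both inputs below are my own illustrative assumptions, since I haven't stated a cost basis:

    # Back-of-the-envelope check; both inputs are assumptions.
    paper_mb = 2.0        # typical born-digital article (a few 100 kB to 2 MB)
    usd_per_gb = 0.0005   # assumed marginal bandwidth cost on a cheap link

    usd_per_upload = paper_mb / 1024 * usd_per_gb
    print(f"{usd_per_upload * 100:.5f} cents per paper")         # ~0.00010
    print(f"${usd_per_upload * 1_000_000:.2f} per million papers")  # ~ a coffee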


And the papers are often funded by public money.


A related cause is to have publicly funded software be published open-source. Check out the FSFE Public Code campaign https://publiccode.eu/.


I'm glad initiatives like [Plan S](https://en.m.wikipedia.org/wiki/Plan_S) exist ^^



How much does it cost for an author to publish in a Nature or Science journal under Plan S?


I don’t know about Plan S specifically, but most open access fees are in the $3-5k range per article.


Libraries can get you a physical copy of pretty much any paper or book completely free. But somehow getting a digital copy is not on the table? It’s just insane; I would happily pay for a service that did this, even if it included DRM or something. The journals would never go for it, though, because they live off of the insane rates research institutions pay to subscribe.


You can more or less do this with scite's Citation Statement Search[1,2], or by setting email notifications when we detect new citation statements to one or more papers (grouped by a topic you're interested in like a disease or drug, an author, or more)[3].

Quick video of our citation statement search to give you a glimpse: https://www.youtube.com/watch?v=JYjCn-4uMJk

[1] https://scite.ai

[2] https://citation.to

[3] https://help.scite.ai/en-us/article/how-can-i-set-alerts-on-...

(Disclaimer -- I work at scite!)


This self promotion seems off topic and unrelated to the person you are replying to.


The research was already paid for by us. Make it free as it should be.


That's the best part about a good idea once it's out there! It's hard to kill. Really wish we had come up with an alternative to 20 streaming sites...


There is. Torrents. The UX can be amazing if you know how, but we don't want to spoil the party by sharing.


PeerTube at https://joinpeertube.org/ or Owncast at https://owncast.online/

Both support live streams, and are (being) federated and ever more integrated with other Fediverse apps.


Torrent seeding effort: https://www.reddit.com/r/DataHoarder/comments/nc27fv/rescue_...

All papers on sci-hub are available as torrents from library genesis. The full collection contains 85 million articles (before this announcement), and is about 80TB. If anything ever happens to sci-hub or library genesis, there's enough people out there with backups that a replacement can be set up fairly quickly, albeit without the proxy functionality to obtain new papers.

However, the more the merrier, so if you've got some spare hardware and bandwidth to share, I'd encourage you to contribute to the seeding effort if you're able. At current market prices of ~$30/TB, it costs ~$2400 to have a copy of the full collection sitting on your desk.


Is there any legal risk to users in North America that do this? Is this copyrighted material?


Probably; use a VPN.

Yes they're copyrighted - albeit not by the authors who actually wrote them, but by the publishers who require copyright assignment for the privilege of having your work hosted on their website.


So when academics casually share papers with their collaborators, are they technically exposing themselves to lawsuits? Or is being part of a research organization/university with subscriptions to these services enough to mitigate that risk?


No one talks about this, but for many (most?) of the papers they publish, there’s no copyright to assign. When I was in grad school, 100% of my research was federally funded (NSF, Navy, NASA), so they insisted that every product of that research be freely available with no restrictions. When we got an article ready for publication, the journal would send us the standard “you assign your copyright to us” forms. We’d sign them but also send a copy of “the green form” stating that it was federally funded and we didn’t have any copyright to assign. They just ignore that.


It depends on the circumstances and the publisher. In many cases publishers permit authors to host the accepted version of the paper (but not the final published version with the journal's formatting) on their personal/institutional website and to email copies of the paper on an individual basis to people who request them.

For example, see https://www.elsevier.com/about/policies/copyright (under "Author Rights") for what Elsevier permits you to do with your own work.

On the other hand, publishers have sometimes filed lawsuits against sites where authors share their papers, e.g. ResearchGate: https://www.nature.com/articles/d41586-018-06945-6

Elsevier have also sent takedown notices to universities where academics have made the final version available on their institutional websites:

https://www.washingtonpost.com/news/the-switch/wp/2013/12/19...

https://osc.universityofcalifornia.edu/2013/12/elsevier-take...

https://news.harvard.edu/gazette/story/newsplus/elsevier-tak...

https://blogs.library.duke.edu/scholcomm/2014/01/28/setting-...


You are almost always allowed to self archive the final version, but you have to share the PDF you generated yourself, not the nicely formatted one from the publisher. And some publishers only allow self-archiving outside repositories like researchgate. But most importantly, Google Scholar and Semantic Scholar would pick up most links from blogs and Arxiv.

Use https://v2.sherpa.ac.uk/romeo/ to check. Also, EU projects in most cases now require open publishing and publishers make exceptions even when OA is forbidden (“self archiving is allowed if mandated by the funding agency”).


That is an interesting question without a good answer. As I understand it (I am far from being an expert), there are sometimes specific exemptions to allow that sharing. However, generally speaking, I think it is a gray area. The journals would surely be shooting themselves in the foot, though, if they tried to sue their contributors. Academics would bring down hell on any journal that tried that. Moreover, it is not clear to me that they would win as it seems to be a customary practice even if it is not an explicitly allowed one. Finally, sharing a single paper might get research or academic exemptions as you aren't copying the whole journal issue. I doubt any publisher wants to go down that road. They are probably better off with the law remaining vague.


They are usually shared under the title of “preprint” or “draft”, often, but not always, before actual publication.

Always seemed like a grey area to me. We didn’t really distribute the copy of the paper with the journal/conference’s name + copyright - though perhaps with a line under the title: “To be published in…”


What happens in practice: researchers are usually free to share the draft they had before the publishing process. Which means you can read the text before every sentence was wordsmithed in huge pain to make the paper a quarter page shorter.


Sharing with collaborators may constitute fair use. I wouldn't put one of my own papers on my website, though, for example, for something that isn't published as open access.


> casually share papers with their collaborators, are they technically exposing themselves to lawsuits?

For the final published versions of non-open-access papers, they certainly do.


From watching the Aaron Swartz documentary: hopefully, once scihub makes these publishers obsolete, they will no longer have the power to press charges. But yeah, they can and will get the Feds involved and ruin your life like they did Swartz's.


1. rent a seedbox for $10/mo hosted in a different country

2. seed from your seedbox, not your personal device

3. profit


I encourage you to do this in the most audacious, and visible way possible.

Let's see how they'll sue millions upon millions of people.

Make them face a fait accompli. They already lost.


They never target millions. They target one person and ruin his life, preferably one with children, so that he ends up divorced as well.

Every dystopian regime does that.


Please look into the history of file sharing and how that went for individuals.

These companies will pick examples and ruin their lives. It's hard to say how big the risk is here but this is reckless advice.

Vote for copyright reform.


They only need to create some examples.


What we really need is an index of these torrents by DOI, and then ultimately by journal and issue. Are you aware of any work to make this happen?


The only one I'm aware of currently is https://github.com/sci-hub-p2p/sci-hub-p2p. Library genesis also hosts database dumps at https://libgen.rs/dbdumps/.

There's really a need though for more developers to get involved with building tools for more easily searching and working with the collection, ideally with a nice UI and integration with things like crossref. This is a massively valuable data set and it would be great to see what people can come up with. Lots of awesome potential for data mining too.

If it weren't for the legal issues (publishers using copyright law to restrict access to literature they got for free since they never pay authors for their work), there's no shortage of projects that could utilize this data and be enormously beneficial for the scientific community and humanity in general. Unfortunately such work can only be done in the shadows right now, which greatly limits the number of people/institutions likely to do so.


Isn't libgen already distributed via IPFS too?


Yes, LibGen is mirrored on IPFS:

https://news.ycombinator.com/item?id=25209246


I believe there is a database dump available with this information:

http://libgen.rs/dbdumps/scimag.sql.gz


This sounds very appealing. However, from a cursory search I can’t locate any NAS of that size anywhere close to that price point.


8-bay dock: ~$400 (Icy Box 10-bay)

7 × 14 TB HDD: 7 × ~$300 (Toshiba MG07ACA14TE)

Total: ~$2,500


Is it compressible or already compressed?


It's all PDF files, which have their own compression, so it's unlikely there would be substantial gain from additional compression. Each torrent has 100 zip files, and each zip file has 1000 PDFs, but the files are stored uncompressed within the zips (i.e. using the STORE method).


> It's all PDF files, which have their own compression, so it's unlikely there would be substantial gain from additional compression.

You could write a custom compressor that decompiles journal PDFs to valid TeX, then compresses that.

Or at the simpler end of what's technologically possible, you could at least extract shared assets such as fonts that appear in multiple files. Keep files from the same journal together to find more overlaps.

I suspect there's quite a large gain to be had from further compression, at least theoretically. Even more if you could accept some level of non-semantic loss.


You could losslessly translate PDFs with compression to PDFs with no compression (bitmap images excepted), tar them up and compress the lot. This would get you a fair bit of gain for little pain.

However, I guess they use .zip STORE because it's fairly robust against minor corruption.
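
A sketch of that pipeline, assuming the qpdf and zstd command-line tools are installed (qpdf's --stream-data=uncompress mode does the lossless expansion; file names are illustrative):

    # Expand each PDF's internal streams, then compress everything as
    # one archive so cross-file redundancy (fonts, boilerplate) pays off.
    import pathlib
    import subprocess

    src, dst = pathlib.Path("papers"), pathlib.Path("expanded")
    dst.mkdir(exist_ok=True)

    for pdf in src.glob("*.pdf"):
        # Lossless rewrite; embedded bitmap images keep their own encoding.
        subprocess.run(
            ["qpdf", "--stream-data=uncompress", str(pdf), str(dst / pdf.name)],
            check=True,
        )

    # zstd's long-range mode is what actually exploits redundancy
    # shared between files in the tar stream.
    subprocess.run(
        "tar cf - expanded | zstd --long=27 -19 -o papers.tar.zst",
        shell=True, check=True,
    )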


Is there some kind of searchable index included so that you can locate an article in a particular Zip? I'm assuming each article has some kind of ID numbers and the Zips are divided by ID range or something?


Yep! See https://github.com/sci-hub-p2p/artifacts/releases/tag/0

This project is in its early stages and the documentation has quite some way to go, but the index that's part of the release contains all the necessary information. This tool also contains the code necessary to produce the index files if you have a local copy of the zips.

Each torrent contains 100,000 files, comprised of 100 zip files with 1,000 PDFs each. They are named by DOI. There's a database dump at (http://libgen.rs/dbdumps/) (scimag.sql.gz) which has the id -> DOI mapping and other information. The specific torrent and zip file can be determined based on the id; torrent = id/100000 and zip = id/1000.
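
Since the mapping is plain integer division, locating an article from its scimag id is a one-liner (a sketch; the exact torrent/zip naming is inferred from the layout described above, and the id is illustrative):

    # Map a scimag article id to its torrent and zip, per the layout above:
    # 100,000 articles per torrent, 1,000 PDFs per zip.
    def locate(article_id: int) -> tuple[int, int]:
        return article_id // 100_000, article_id // 1_000

    torrent, zip_no = locate(84_123_456)  # illustrative id from the dump
    print(f"torrent #{torrent}, zip #{zip_no}")  # -> torrent #841, zip #84123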


Sci-Hub database/index is available here: http://libgen.rs/dbdumps/scimag.sql.gz

and database documentation is available here: https://gitlab.com/lucidhack/knowl/-/wikis/References/Libgen...

also see introduction to Sci-Hub for developers: https://www.reddit.com/r/scihub/comments/nh5dbu/a_brief_intr...


But each PDF is compressed individually. The textual content of the papers must have a lot of redundancy between them; maybe there is some gain to be had there?


Illustrations easily outweigh the textual content, and those aren't shared. I mean, the text/formatting/latex code for an article compresses to something like 10kB, there's not much to save there.


Virtually all the works are published as PDFs. (There are some other formats, occasionally DJVU, etc.) There's integrated compression, though this can still vary tremendously by document.

Recent publications are virtually always based on direct PDF renders, and tend to be a few hundred kB per article.

Older publications are often scanned from paper-based copies, and can be about 10-20x larger, depending on the source. These may or may not have OCRed text, and OCR itself may be of variable quality. For documents with images or diagrams, those also add to both size and difficulty in vectorising copies.

It's possible to go through larger scans and regenerate them as rendered PDFs. That's intensive and error prone. There's also a range of viewpoints on archival as to whether it's preferable to retain the full expression of the original published version (and often accumulated marginalia and other marks of a specific instance), or to optimise for both storage and automated processing through reprocessed renders. The costs are high (typically you'll require a human or multiple humans to proof each work), though the storage and line-transmission savings are considerable.

I lean toward the latter myself. The attitude of other archivists (notably the Internet Archive) is to capture as faithful a replication of originally-published formats as possible, at considerable cost in both storage and accessibility. (This applies to the Archive's work in print, online/Web, and other document formats.)

Pressed, I'd strongly recommend a "capture what you can, reprocess according to need and demand as possible" approach.


I just noticed something funny: the Library of Alexandria was accidentally burned in 48 BC by Julius Caesar. Here we are in 2021, and we now have Alexandra's library, which rivals the first in size and scope, and there are at least as many forces trying to burn it. But fortunately this one is digital, and you too can have a copy in your hallway closet for the price of some hard drives and bandwidth.

Go make a backup, if you can afford it, and let's make sure that this one sticks around.


Small footnote: despite its standing in popular culture, the burning of the library at Alexandria was not as important as it's made out to be.

> We do not lose texts because of catastrophic events that wipe out all copies of them. We lose texts because they stop being copied.

https://www.reddit.com/r/AskHistorians/comments/5t6op5/facts...


Good reminder to backup.


Copyright holders taking down these libraries should be considered a crime against humanity.


Sciencide or knowledgecide. Let's hope our descendants condemn companies and personalities behind these attacks like we condemn criminals of the past.


Condemnation is not enough. We must actively work at dismantling the foundations of their power. We must abolish intellectual property laws, especially copyright.

They'll still try to maintain control but it will no longer be a crime to resist. They will lose.


The people making these decisions are used to being condemned by common folk, so we are not in any way better off by having more people condemning these actions.


We should start with Mickey Mouse.


I personally see it as an essential element of preserving democracy going forward. As science becomes a more influential component in public policy (it has arguably become the most influential component over the past 18 months), you end up with a governance style of “the law is whatever science says it should be”. If that science is not available for public scrutiny, then you have the added dynamic of “…and the science is whatever we say it is”.


I think open and transparent science is really important if we want to adopt evidence based policies in a democratic society.

However, when it comes to law I don’t think there really is a “what science says it should be”. We can use the scientific method and evidence-based reasoning to assess the likely outcome of any law or policy change, but figuring out which outcomes we as a society are willing to accept, given all the relevant trade-offs, is not a scientific question.

Unfortunately, I’m not sure just having open and transparent science will be enough when so many seem uninterested in having a good-faith conversation about the evidence and its implications.


I don’t think there is either. It ignores the fundamentally solipsistic nature of “the truth”, and perhaps more importantly the fact that science can’t tell you what values to have, or what concerns to prioritize. It’s a methodology for refining knowledge, not designing a society.

But in the realm of governance, science is frequently used (or perhaps abused) as an unassailable authority to justify a wide variety of policy positions. I generally consider this to be a governance anti-pattern, but so long as science is being used to justify technocratic policy, it should be available for all of us to make our own judgements about.


> “the law is whatever science says it should be”

First we'd need to resolve the tension between how stable we want our laws to be and the progress of science over time.

We've seen in the past two years that going back and forth with recommendations makes the general public just give up.


I've always wondered whether, for knowledge as for ecosystems, fire is an important part. Basically, is there a point where the amount of knowledge is so vast that it becomes impossible to reach the edge and come up with something novel?


Nah, ideas become simplified and refined over time. E.g. quantum mechanics was the bleeding edge of physics at one time and now it’s taught to undergraduates. Some universities even have classes in string theory for undergrads (not that this is a good idea).


How much would it cost to store the articles on one of the blockchains?


blockchains are not the right tool for terabytes of pdfs, and don’t really provide any functionality over the current torrent approach


In https://twitter.com/ringo_ring/status/1414342378765307907 she jokingly asks where her Nobel Peace Prize is.

Given how often that prize is given out as a bully pulpit to advance a cause, and given the global debt that science owes to her, I think she really does deserve one.


She needs to step it up by killing thousands of innocent people from drone strikes to increase her chances.


Or start a war with her own citizens and use hunger as a weapon by destroying infrastructure, farming equipment and blocking foreign aid, creating a famine among millions.


The Ethiopian PM was really the most stupid choice they ever made with the peace prize.


Once they started using it as a political tool to say “we support this person's policies”, it lost its meaning. I’m happy that they continue to have a near-perfect record of terrible winners. It makes it clear that if you win the Nobel Peace Prize, you’re likely not as good as you thought.


For a woman of a similar age, I would say she is more deserving of the Nobel Peace Prize than Malala Yousafzai.

The risk that Malala takes in advocating for women's rights in Islamic countries is admirable, there is no denying that. However, her impact is minuscule compared to Alexandra's in the big picture of our progress as a human race. Malala's activism has not changed much about the course of women's rights in countries where religion governs life from the family to the government.


You do not need to uplift one impactful woman by putting down another.


It's become a prize of the political establishment though (why did Kissinger and Obama get one?) - there is no way they'd give out such a controversial one these days.


Start nominating...


Context: Sci-Hub stopped uploading new papers in December 2020, after being ordered to do so by an Indian court. There was (is?) some hope of winning the case which could make Sci-Hub legal in India.

https://www.reddit.com/r/scihub/comments/mk46x4/scihub_v_els...

https://news.ycombinator.com/item?id=26264378


Is there actually any update on the case?


Wikipedia says the restriction was only for a couple weeks:

> In December 2020, Elsevier, Wiley and the American Chemical Society filed a copyright infringement lawsuit against Sci-Hub and Library Genesis in the Delhi High Court...

> ...The high court restricted the sites from uploading, publishing or making any article available until 6 January 2021.

But it's very strange to me that that could go by unnoticed.


Courts are slow. The court date kept getting pushed back (now scheduled in October) and so Sci-hub paused new papers for several months. But apparently Elbakyan’s lawyers have determined that the temporary restriction on uploading new materials has expired.


I find it fascinating that SciHub seems to be highlighting two large issues.

- Globalisation and the rule of law. Mostly that's a good thing. But SciHub would be unlikely to have survived if Russia hadn't been able to give US courts the middle finger regularly over the last decade. I am not convinced that the benefits of a totalitarian regime outweigh the downsides, but it is a thing

- copyright law is not patent law, science is not patents

So patents do seem to provide a way for inventors to protect a revenue stream. But the model of science is not one where a single person or org does all the research and then exploits it for profit. So patents don't really seem to support science. And copyright has nothing to do with either.

https://en.m.wikipedia.org/wiki/Sci-Hub

Edit: one way of looking at it is that Science has socialism built in. Patents are a means of encouraging innovation by arranging that revenue flows back to the innovator, as long as the whole market obeys the patent law and licensing conditions.

But apart from "bad" actors, the amount of licensing is vast and probably impossible to track back (you would need point of sale, bill of materials, supply chain data etc)

Science has a simpler answer - publish the innovation openly and assume that the growth in wealth will feed back into general wealth growth. Which is kinda looking like "everyone shares".

So it suggests a singularity style step function - when / if something like UBI works, science will have a massive boost as the feedback loop is not mediated through university grants etc.


The main flaw in both patent and copyright law is that the terms of protection are entirely too long. It's now over a century for some corporate owned copyrights. That directly contradicts the goal of copyright: to advance science and the arts by encouraging the building of a commons. The terms of exclusivity were meant as incentives to that end, not personal or institutional rewards. Lately I've been thinking that those terms should have been gradually shortened as the pace of change and speed of communications (including opportunities for sales) increased. At this point IP law is actively hindering the advance of science and art. Projects like Sci-Hub are restoring sanity to a system that lawmakers have for various reasons sold out to the barbarians.

Two years, non-renewable, for any invention or work that had absolutely no public funding. Anything with direct public funding goes immediately into the public domain.


>- Globalisation and the rule of law. Mostly that's a good thing. But SciHub would be unlikely to have survived if Russia hadn't been able to give US courts the middle finger regularly over the last decade. I am not convinced that the benefits of a totalitarian regime outweigh the downsides, but it is a thing

I can't wait for 2030, when China overtakes the US as the world's largest economy. We can all then be banned from the internet for having seen the doctored photo of nothing happening on May 35th.


That's not going to be a problem. When it comes to censorship of the Internet, your primary risk is the 5 / 14 / 19 eyes groups, not China.

China isn't a critical part of the global Internet today and they'll be even less a part of it in another decade. They operate their own separate network that only poorly connects to the Internet, by design. That separation will increase considerably over this decade.

Xi is currently putting new restraints into place to pull Chinese tech companies back even further from the Internet and into their own isolated network.

When China becomes the largest economy by GDP, it'll be meaningless to the operation of the Internet, which they'll only kinda-sorta be a part of.

Further, China is now widely regarded as the top adversary to the US and the West. That context will get increasingly confrontational and war-like in the coming years. Nearly all members of Congress are on board the anti-China bandwagon now, they've all gotten the message from above (the military industrial complex, which dictates nearly all foreign policy). The cultural atmosphere will increasingly become like it was when the USSR was the primary adversary for decades. As that confrontation increases, China's influence over the Internet will be intentionally reduced by the powers that actually do control the Internet today. China sees that coming as well and is taking steps ahead of time to reduce its exposure, points of influence and risk. At this point China views a military confrontation with the West as close to inevitable (which recent Xi speeches have elaborated on).

This increasing separation effort by China is in part designed to make it possible for China to attempt to destroy/damage the Internet - if it comes to that - without posing much terminal risk to their network and economy in the process. If they take down the Internet, it'll butcher the economies of their adversaries, while their own network remains highly functional. This is something the West is almost entirely unprepared for, and China is aggressively preparing for it; an epic mistake by the West.


The enormous trade between China and the West is likely to mitigate some of this isolation. Besides, how exactly would you bring down the "Western" internet? The Chinese internet seems to be built on the same tech and protocols as "ours" - it's just the companies that are different.

I am happy to be corrected, but I'd be interested in the counterpoint to such doom and gloom.


> So patents do seem to provide a way for inventors to protect a revenue stream.

Not necessarily:

1. The vast majority of patents are owned by large companies, not the individual inventors.

2. Patents are very often used purely as "spoilers", i.e. preventing other companies, and even more so individuals, from working in a close enough field to the holding company's, so as not to risk patent litigation.


Sci-Hub and LibGen have contributed more to my PhD than my so-called advisers. So thank you Alexandra for your unselfish efforts to make science accessible to all!


And it has contributed the majority of the knowledge I've gained - far more than I could have ever afforded otherwise.

Makes me realise I never donated. They have the pudic courtesy to never even prompt for support.


>They have the pudic courtesy to never even prompt for support.

Thank you, I learned a new word today.


There is no absolute guarantee that SciHub or any other site will always manage to survive the impressive array of forces against it.

As far as I can see, the most robust fallback has to be some kind of distributed data store that can mirror humanity's vital information on the widest possible array of computer/storage systems, and which would literally take an apocalypse to wipe out. Depending on one brave person to fight what should be our common battles (and we do that everywhere, the heroes are always lonely at the top while their actions benefit all of us) is disappointing.

Data has to be duplicated massively or it is always extremely vulnerable. DNA figured this out billions of years ago.


All Sci-Hub articles are duplicated via torrents; you can help by seeding them: https://www.reddit.com/r/DataHoarder/comments/nc27fv/rescue_...

There is also some ongoing work moving to IPFS that could use help, see https://freeread.org/ipfs/ and https://github.com/sci-hub-p2p/sci-hub-p2p/ , which seems to have IPFS support: https://sci-hub-p2p.readthedocs.io/_en/ipfs.html


SciHub is the greatest achievement in academia, research or anything that is knowledge. The truest form of accessibility of knowledge.

In those 10 years, of all the students you see or saw on your campus coming from second or third world countries, most have used scihub to research and publish papers that enabled them to pursue higher studies.

Taking this right (not a privilege) away would mean a second burning of Alexandria. Anyone who really cares about education or knowledge in general should advocate for scihub and libgen to survive.


It's kind of whimsical to read the praise of sci-hub and fear about its future here. Tech people have access to pools of billions and billions of dollars that are casually being thrown on mindless minor addictions that offer nothing to humanity. Can't someone make an initiative or startup that disintermediates those very old publishers who are trolling science?


There is no profit in it. Of course someone like Gates or Bezos could support a SciHub hundreds of years into the future, but they will avoid the inevitable controversy that will come attached. Rich people will play it safe. The poor can't afford the pennies. So ultimately it falls on to a few middle-class activists and concerned people all over the world. Ethically SciHub (or the idea behind it) is on much firmer ground than say PirateBay, but investors want no controversy, unless it is religious or political.

I bet a "SciHub" devoted to political or religious articles will find backers despite being more controversial. People can be worked into passion for lots of things but not science and abstract philosophical ideas. That is largely why the FSF continues to barely cling to life.


> There is no profit in it

I don’t know that I’m 100% convinced by this. Elsevier makes like 40% profit margins, and their primary contribution to researchers is prestige lock-in, basically.

Scientific publishing is in many ways stuck in the previous century. There are plenty of interesting opportunities to build technology to make that entire ecosystem more awesome, especially as funding agencies increasingly require open access publication.

Maybe there’s still not enough to build a stable company, but I took your reaction as a bit fatalistic


> Tech people have access to pools of billions and billions of dollars that are casually being thrown on mindless minor addictions that offer nothing to humanity.

I feel the same way... It's always some new surveillance capitalism nightmare, endless advertising for garbage nobody really needs, addictive games with the win button hooked up to the player's credit card.

And whenever someone makes something truly world-changing like Sci-Hub all these people start coming after them because they're hurting their "interests". Who cares about their interests?


They have access to pools of billions of dollars because they are expected to return multiples of those billions.

How would a startup reach unicorn valuation based on publishing scientific papers? By making people pay. So we’re back to square one.

No, for some problems private enterprise is not necessarily the best solution


It would obviously not be for-profit. But it would benefit them indirectly. Investors who invest in biotech need to realize that open access is as important to research as open source is to software. Scientists have failed to overcome the collective action problem.


> For investors who invest in biotech, they need to realize that open access is as important as open source is to software.

Open Source is important to software companies only because it lets them build new products cheaply. Most companies don't care, unless they can use it for cheap publicity, or to hurt their competition by making an open-source version of a competitor's proprietary product.

OSS definitely did shift the landscape - it almost killed desktop software! There's a reason why technology businesses love the SaaS business model - it's not just the recurring revenue, it's also that it's immune from being killed by open source. You can create a free alternative to any software running on end users' machines, but you can't do that to proprietary code running on servers the service provider owns.

Open access doesn't bring such immediate, direct benefits to research companies - so investors will be less keen to sponsor it. There's just no business model here.


Remember that if Sci-Hub helps you and you can afford it, a donation to them would go a long way to helping keep them afloat.


Honest question: is there a value that Elsevier & co are providing? Proofreading or selecting the articles, for example? If so, even if their price reflects their de facto monopoly more than this value, then to replace them we also need to find a way to replace that value. Could we outcompete them?


Nominally, publishers proof read, copyedit, format and publicise my articles after they've been through peer review.

Occasionally the editors -- inevitably based in Chennai -- spot a typo that the reviewers missed. Sometimes they cock things up massively, especially equations -- I had a big argument about a TikZ-based scheme once. The formatting is done automatically, and as a latex author I really think we could do that very easily ourselves. Colleagues who use word see the transfer of their text into something better as a major value add. The publicity the journal adds is a strong function of how "good" it is. Science and Nature effectively exist because they are Science and Nature.


A journal was a thing when it was based on paper. What value does collecting papers to a specific issue of journal have today? Science and Nature may attract better peer reviewers but maybe we could find a new way to compute a weight for a paper depending on the people who peer reviewed it instead of the journal it appeared in?

What if we could look at authors, peer reviewers and papers as a graph of weighted edges to come up with a score that was independent of journals as a concept? And where there is an ontology for the semantics of edges (not only the number of citations)?


> Science and Nature effectively exist because they are Science and Nature.

Right, the perception of being associated with authoritative and knowledgeable publishers which follow certain formalities (in other words "the proof you as a scientist belong to the Club") has been an important part of academic career progression, even if in the last few years the reputation of peer-reviewed papers in general has taken a dive.


Do journals coordinate the peer review, or does that happen prior to the submission to the journal? I was under the impression it's journals that get the peer review process going.


Yes, they do. And editors -- academics, sometimes paid, usually not -- are the ones who do it. There's a lot of "old boys' network" involved there too: although each author "suggests" a referee, the editors usually also send it to someone not suggested by the authors. I didn't include that in the list because it's not really a skill performed by the journal; more, as they would say, facilitated by it.

Frankly, they profit from: academics writing articles (funded by someone else); academics editing a journal deciding who should review articles (usually, but not always, for free); academics reviewing articles (always for free); and academics citing articles in subsequent ones.

The whole system is a house of cards, and a relatively immutable one.


Depending on the field of research it is quite common to publish only in open access journals [1]. At my university, we are actually only allowed to publish in peer-reviewed OA journals. Most of them provide a paid subscription for the print version while the online version is free and open to everyone.

However, I got the (personal) impression that this is only well established in fundamental research (which typically comes with little economic interest). As soon as the research is paid for not by the state but by private companies (such as in medicine, robotics or other "applied sciences"), scientists have a hard time choosing an OA journal (i.e. either it does not exist or they are not allowed to publish there). Changing this scheme is of course quite difficult, since too many commercial parties still benefit from it (and it can likely only be changed by law)...

[1] https://en.wikipedia.org/wiki/Open_access


Working in research, I do see a clear business value that Elsevier&co are providing and why they continue to get their money.

It's acting as an impartial rating/filtering service for the non-scientist administrators.

In essence, the funding agencies want a way to evaluate scientists and institutions without asking their scientists and institutions to do so, probably because they don't trust them, and also because they have very strong incentives to avoid making any subjective judgment themselves but instead defer to some "objective" outside source - so they use paper counts published in "proper places" or the existence of papers published in "very good places" as the evaluation metric to circumvent the (genuinely very hard!) problem of evaluating the quality/quantity of the actual research done.

And so this incentive, attached to much of the money flowing within academia, trickles down to evaluation of people when hiring and promoting (the committees also often look at paper counts and publication venue rankings instead of trying to evaluate the actual papers - which is time-consuming, and if the papers are not in "your field", then very hard to make an informed judgement) and so to the individual motivations of almost all the people in the system, who have to take into account the "proper publishing rituals" or severely limit their career.

So having worked a bit with administration and evaluation of funding proposals I kind of see why something like that is valuable in general, the main problem being is that the costs are enormous and not really commensurate with the provided value - however, all the costs and barriers are suffered by "someone else" (i.e. the scientific community), while all this value is provided exactly to the funding decision makers who have the power to prevent replacing the current system of Elsevier&co, but don't really suffer from its problems. And when I say "funding decision makers" I don't mean people who have the money and personally care about if it's spent efficiently, I mean all the administrators and bureaucrats running the process of allocating someone else's money or hiring scientists in e.g. some public institution, and whatever "sticks and carrots" these administrators have in their career. This means that to outcompete the current process, any solution would have to benefit them (not the scientists) or it won't be accepted and used, and it's hard to imagine what that would be since there's a huge barrier of entry (e.g. it must work for evaluating/ranking all scientists/institutions, across all disciplines and across all the decades of previous work, or it's not useful) and a huge inertia, as the criteria are also included in very many hard to change legal documents e.g. contracts for long-term funding projects, bylaws and processes of many organizations and committees, actual laws regulating funding institutions, etc; if we had a clear winning replacement launched today, it would still take at least 5-10 years to switch to it.

IMHO the way to change is not a competing solution - it requires institutional change, with the major decision makers simply choosing to make decisions on different factors that do not include the metrics of Elsevier et al. Here's a talk by Stonebraker which gives a strong related argument - https://www.youtube.com/watch?v=DJFKl_5JTnA&t=1220s . However, this institutional change is not that likely because, as I said, for the people who can change this there's little incentive to change, and the people who would benefit from this change are not in a position to make it.



Aaron Swartz would be so proud, and so am I. You are a hero!


I love Sci-Hub, even though I never use it. I just really like what it's doing for science. However, the law has failed us. It's very clear that Sci-Hub is overwhelmingly a Good Thing, yet governments want to tear it down because it's threatening companies' revenue streams too much.

Given how important it is, and how at risk it is, I think it's very important to find a technological solution to keep it up. We have the technology to distribute the papers (torrents) as well as a search index. I really hope that either Alexandra starts using these technologies more, or the technologies mature enough to be usable.

Then Sci-Hub would be unkillable.


If authors kept their copyrights and submitted the articles to the likes of sci-hub themselves, they would enjoy higher citation counts; virtually everyone would do this, and it would all be legal. But instead, journals are holding countless articles hostage in order to parasitize publicly funded institutions and universities. Higher-education institutions in Africa and other poor parts of the world that lack resources are unable to lawfully access the papers. It's not just morally wrong; the current system is throttling the progress of science and harming research in important subjects.


She should get the Nobel peace prize some day


You can bet there is hidden effort being led by multiple departments and multiple govts to take down this shining beacon of knowledge. Something that embodies what the best of internet has to offer.


Theoretically speaking, what would it take to ensure that sci-hub survives? From what I understand, it'd be like moving mountains to make it legal in the US.


https://radoncnotes.com/scihub-its-back/

Here's a quick recap and reaction on the blog.


What Aaron Swartz did in the 2010's was what we all talked about around that time.

It was everything we read about as hackers in the 90s: the Phrack stories, for instance, about missions to steal (liberate) hardware PABXs from offices.

Swartz literally went into a library to break out info, fucking Gibson.

Alexandra Elbakyan and crew beat that. And bent history. If you think history is a line, Sci-hub (which is a little different from Libgen) changed that.


Yes, let's publicize it more so it would die faster. /s



