Is there any word on when Sci-Hub is going to start adding new articles again? It's currently only useful as an archive of old research articles; new papers from the last year are not available. I never understood the rationale for stopping new content. I believe it had some relation to a court case in India, but I don't understand why that was a reason to stop adding articles, or why it hasn't been restarted yet.
I saw that tweet, but it doesn't change the material reality: try plugging in some DOIs for articles from the last year, and they will not be there.
Sci-Hub used to be a great resource; now it's only a resource for old research. Still useful for background material, but not for current work.
I also don't understand why the Indian court case has any impact on new article availability. The owner is not Indian, and the servers and domains are not Indian. There doesn't seem to be any actual reason to stop adding new articles, other than some idiotic half-baked point that only hurts the people who need the articles, like when Project Gutenberg banned anyone from a German IP, except this is much worse, since there is no way around it for people who need new papers.
I have a hunch that the downfall of the "Plato" real-time downloader wasn't the Indian court case but rather the fact that it helped publishers trivially identify the university accounts through which the downloads were happening. Even if the appearance of papers was delayed by a random number of days, there are other pitfalls now, and most importantly, publishers started caring. In particular, Elsevier now slaps UUIDs onto all PDFs you download from them, and no, I'm not just talking about visible watermarks. Other publishers seem to be doing similar things (there was a recent Twitter thread on this, retweeted by @textfiles, which I can't find). The rational solution for Sci-Hub seems to be to buffer their uploads and release them in yearly batches, perhaps programmatically removing various kinds of watermarks and diffing against the same paper downloaded from a second IP. If this is what they are doing, I'm not surprised. Not sure how much of a winning strategy they have in the long run, though.
> I also don't understand why the Indian court case has any impact on new article availability. The owner is not Indian. The servers and domains are not Indian.
Because Sci-Hub has a good chance of winning the case. The court in question has previously backed a very broad definition of what constitutes fair dealing.
> Because Sci-Hub has a good chance of winning the case.
I understand that this is the party line that is parroted whenever this issue comes up, but it does not make any sense as a rationale for keeping new articles off the site. How does not adding any new articles (while, for example, keeping old articles accessible) assist the possible winning of the case? And more to the point, why does it matter at all whether it wins or loses? As stated, neither the owner nor the infrastructure is Indian, so of what relevance is this jurisdiction?
And further still, the case appears to have been delayed indefinitely. The last update claims that there was going to be another update a few days ago, but there was not. The proceedings are by now just a list of one postponement after another [1]. Given that new articles are being held hostage, it very obviously benefits the legal system and the prosecution to continue to delay the case indefinitely.
The owner might not be Indian, but she's actively defending the case (through lawyers) in India. Not following the injunction would mean losing the case, which is why she followed through.
She didn't have to fight the case in India, but she chose to. Why keep old papers and stop adding new papers - that probably depends on the terms of the injunction.
That is not how scihub used to function. Scihub used to have an engine, named Plato, which would fetch papers automatically if not already in their database. For the last year now, this essential service has not been operational. This is what the issue I am raising is about.
I still have air-gapped Windows XP systems with one software package on them to do one job. Left alone, software doesn't bit-rot; it is the never-ending stream of updates that causes it to cease working over time.
Think about it. Software like Plato downloads papers from around the web. If the environment around the software (i.e. the web) changes, software without updates bit-rots.
What I read was that the Indian judicial system tends to be favorable to things like Sci-Hub in its interpretation of copyright, and Sci-Hub wanted to act in good faith with regard to that court, so as to have a fairly solid basis in international law for operating, should it rule in Sci-Hub's favor. I might be off in this understanding, but that's what I understood.
Yeah, I have heard this reasoning, but it seems muddled. How is keeping the site online so old articles are available but no new articles are added acting in "good faith"? It's not like the old articles are any less copyrighted than the new articles, so this doesn't make sense to me.
The court case has also been delayed for over a year now, so if it is delayed indefinitely, like it seems to be, then we will also not get access to new articles, also indefinitely? That's ridiculous. The last update from the court proceedings claimed that there would be a new update over a month ago, which in turn got delayed yet again to a few days ago, and there's been nothing [1].
As I understand it, this was a sort of compromise the courts worked out in the interim. Seems likely that at some amount of delay, Sci-Hub would just break the injunction (I personally hope they don't, as the case seems to be wrapping up). I don't think any of this is ridiculous.
The publishers now encode their papers with individual identifiers, what the GP calls UUIDs, on all PDFs. That means they can trace a leaked copy back to the institution (and perhaps the actual prof?). They have then sent nastygrams threatening fees or loss of access for the institution - potentially serious punishment.
What is needed is a way to run the papers through an OCR program to create fresh text, then process it further to randomly vary a large number of adjacent letter spacings. (This is called 'micro kerning', and it allows a second order of unique document recognition: the journal scans a leaked document for these fractional kerning gaps in the letter spacing, which can lead back to the institution.) I suppose a program flow could be made to do OCR plus random micro-kerning changes - it would take time to set up, but once running it would be a rapid, computer-based document flow. Photos/charts could be sanitised as well, but legends etc. on them would also need OCR and micro-kerning adjustments. With 25 words of text, each with 6 letters, micro kerning can easily create 10,000 unique IDs, easily enough to cover all subscribed institutions - and that is on one chart/photo alone.
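The ID-capacity claim is easy to sanity-check: if each inter-letter gap can be nudged into one of a few distinguishable widths, the number of encodable documents grows exponentially with the number of gaps. A rough back-of-the-envelope sketch (the gap counts and nudge levels below are my own illustrative assumptions, not any publisher's actual scheme):

```python
# Rough capacity estimate for micro-kerning fingerprints.
# Assumption: each ~6-letter word has 5 inter-letter gaps, and each gap
# can be nudged into one of `levels` distinguishable widths.

def kerning_capacity(words: int, gaps_per_word: int = 5, levels: int = 2) -> int:
    """Number of distinct documents encodable by varying letter gaps."""
    return levels ** (words * gaps_per_word)

# Even 3 words (15 binary gaps) already exceed ~10,000 institutions:
print(kerning_capacity(3))   # 32768
# With 25 words the capacity is astronomical (2**125):
print(kerning_capacity(25) > 10**37)  # True
```

So the 10,000-ID figure in the comment is, if anything, a severe underestimate of what micro-kerning can encode.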
I suspect the subscribing institutions have had firm words with their profs and grad students to block this. We could easily find out if a few people at assorted institutions let us know whether they have been given firm words about this.
I am not sure how Sci-hub can get past this, unless they get a good Indian court ruling and can use Indian friends to scan printed copies of journals - if they exist in this online age?
All sci-hub would have to do in this case is download the same paper through three or more accounts (different institutions, networks, countries?) at three different times, rasterize them and keep the common denominator. If a pixel has no common denominator, they'd have to fall back to a default value. This is by no means a perfect method and it has its weaknesses and pitfalls, but the resulting PDF will be far less useful as a means to de-anonymize accounts using information from the PDF itself.
Publishers still have other sources of information to de-anonymize accounts if the multiple accounts/downloads aren't truly isolated from one another.
Makes me wonder how much they'd have to offer me to accept the task of implementing such tracking algorithms to modify other people's scientific papers for the purpose. Certainly I won't be the cheapest, but still. Either some developer is very vested in the idea of keeping science a secret or someone got a very nice bonus.
Edit: also the sysadmin that keeps this database safe without 'accidental' data loss on UUID to downloader mappings.
It seems to me that this type of tracking, under the guise of guarding rights, is exactly the kind of unlawful tracking under the GDPR that they are harassing Google and FB over?
Is not justice's sword two-edged?
(Note that this is all written from a European perspective, applies European Economic Area law and human rights as defined by the Council of Europe (which includes Russia, to everyone's surprise). I know that in the USA everyone is much more pro frontier justice, for example when it comes to pervasive and continual monitoring of employees while at work.)
I think it's very arguable that they have a legitimate interest here. Privacy has always been a weighing of interests; at least that's how I've always heard it explained by the Netherlands' face of digital law (Arnoud Engelfriet), also back in the days of the WBP (the law from ~1995 that is 97% the same thing as the GDPR), and also in light of the European Convention on Human Rights (article 8 is a right to privacy).
A common example is filming the road: illegal, but if you park your car in front of your house and there have been car fires in your neighborhood lately, then it can be justified.
Filming employees inside a warehouse: invasion of privacy (illegal) but if there have recently been thefts from a certain part of the building then it's justified to hang up a camera there, introduce a lock that registers who went there at what time, or some such. (With adequate security measures so only authorized people can use it for the intended purpose.)
Personal example: monitoring everything I do on the company network is illegal, but because I work in a business where secrecy is important (security consultancy) it was considered justified to do spot checks, tell every employee upon entering into the employment contract that spot checks are a thing, and inform the subjects of spot checks after they were part of one. Transparent but still effective.
The two things to consider (iirc) are:
- Do my rights weigh heavier than the other party's right to privacy? (e.g. a car fire is a fairly big impact on your right to the peaceful enjoyment of your possessions)
- Is there any other way in which I could achieve this goal with a lesser impact on the right to privacy?
In the case of Elsevier, from what I heard this whole scheme is a big mafia-like practice (wouldn't want to be published in a niche corner nobody reads now would you?) and so in my opinion it's entirely unethical to support (work for) them in the first place, at least in any role except one where you think you might be able to nudge things in the right direction. But I could see how a judge says: well, that's how today's law works, that you have moral objections is something you can take to your favorite religious leader and lament about, not a court of law.
If I'm being fair, there isn't even really an invasion of privacy because PDFs don't have executable code (usually) that can track you. Rather, they need to hide it somewhere so that, if it appears on the pirate bay, they can read out the ID and see who the perpetrator is. More like a criminal investigation using a fingerprint on a glass, and less like a cookie actively sent with every action you perform on a website.
TL;DR: GDPR applies, but it probably doesn't make this database illegal. It's not a loophole by which a person can say no to literally everything. (Would be cool if you could require the police to stop using your fingerprint in a legitimate investigation.)
Still, if I were that sysadmin... I probably wouldn't 'drop table elsevier', but I'd rather live off government benefits than support that scheme.
Thank you for this cogent analysis. It sounds valid: they probably notify their subscribers that they forbid document release into the wild and use technical means to monitor it. The only hope is the usual request to the authors, who have traditionally sent papers in answer to all requests; of course the profs can also pay 'blotgelt' and, as long as the fees are small enough or the prof can afford them, file it as an open paper. I am encouraged that the open-paper concept is gaining traction. In the days when Nobel was alive, his Nobel Prize was in fact created on this basis of open papers freely sent to others. Sadly, had he known, he might well have included the open-paper concept in his legacy to prevent the blatant rent-seeking empires we now see.
That said, curation and acceptance/publication of papers is a service of value, and it is needed, though at a lower cost. The journals do real work keeping out the totally crappy AI/idiot-created papers that get into the totally open journals, which are unable to deal with the blizzard of submissions they face - so the bad ones get through. There are also the fake fee-based online journals, where a distressingly small fee gets you online for citation by your coterie of anti-vaxxers, nutrition gurus, etc. This is essentially insoluble, save by intelligent readers, who often shed the burden to avoid the waste of time. Fortunately the 'contents **' publications winnow most of these out.
No, that's absurd. More likely a different factor. Some googling and lo and behold: starting with Germany, the universities of several European countries cancelled their agreements with Elsevier, pressuring Elsevier to give them better deals, including open access by default. I imagine it lessens the need for Sci-Hub.
Do a reflow of the entire document? As long as it's semantically recognizable it can be done. Figures are harder. ML can really help here: someone make a pdf2tex!
LOL, student sabotage! I think the paywalls would extract a penance that maintains their cash flow - remember, greed is their mantra, may it never ever cease.
It's interesting how sci-hub's papers on medicine dwarf those in many other fields like comp-sci, math, and physics. I wonder if that reflects the number of papers in those fields, or if sci-hub just has a non-representative sample. If the latter, why?
I guess today it is much easier to find new noteworthy, publishable facts in medicine than in physics. New diseases are discovered every year, old diseases are poorly understood (e.g. Alzheimer's disease), and the treatments for many of them are still sub-optimal, or even nonexistent. Every patient is different; individual cases are research-worthy. We only got antibiotics in the 1940s. On the other hand, most big breakthroughs in physics happened between the 17th century and the first decades of the 20th. After the general case is cracked in physics, individual cases have very little publishing value.
> much easier to find new noteworthy, publishable facts in medicine than physics
Plus I would imagine that everyone wants their illness looked into, thus that's where funding tends to go. I care much less for physicists to figure out what dark matter is than how to treat health problem X that bothers me daily. (Just an example. In my particular case I'm healthy and would actually be quite excited about dark matter findings compared to any individual illness solution... but still.)
This comment would be worth a lot more with some stats about funding going towards the different fields, though. Not sure where to find that.
It does appear to be the latter. I just searched for several famous ML papers (attention is all you need, lottery ticket hypothesis, capsules, etc) and they are not there. I think if someone counted all papers that have been ever published anywhere, the picture would be a lot different.
Could be. I've been an ML researcher for 8 years and I haven't used sci-hub until today. Ironically one of my (very obscure) papers is available there.
I suspect that a lot of ill people, and people with loved ones who are ill, spend an enormous amount of time reading every paper they can find on the illness, or trying to narrow symptoms down to a particular illness. That's also a situation where you can go through 50 papers easily for every useful one (or even intelligible one, if you're a layman) you find, so a definite sci-hub situation unless you're independently wealthy. It's the requests that draw (or once drew, right now I guess) stuff into the database.
I think it's due to the sheer number of biomedical papers published each year, coupled with comp-sci, maths and physics papers being less likely to be behind a paywall.
Interesting. Medical field dominates research in terms of publications. Chemistry produces double the papers compared to physics, and humanities are smaller than biology but larger than physics. I wonder where machine learning papers fit in - CS or Math or both?
Don't use the term "open access" like this. A paper published on arXiv is free to read, and was freely published. "Open access" is a scam by the big publishers, where they don't take money from the readers, but make the authors pay. Or, putting it another way, anyone can pay their way in those journals and publish (sometimes sub par) papers.
As I wrote on another comment, I wasn't aware that there are multiple forms of open access. Since it appears that arxiv (again, at least high energy physics) employs mostly either gratis or libre open access, and since the Wikipedia article explicitly calls it an open access archive, I see no harm in calling it that either.
"arXiv (pronounced "archive"—the X represents the Greek letter chi [χ])[1] is an open-access repository of electronic preprints and postprints[...] "
Not that I want to defend open access fees but the way you describe it is incorrect. Paying for open access fees with large publishers like Springer is an option that is separate from the review system, you can only choose it once your paper has been reviewed and accepted.
Publishers do misuse the concept, though. They try to steer clear of using the term when they do, using terms like Free Access or some of the more dubious colour variants of OA that don't provide all the freedoms the Berlin Declaration on OA defines.
Interestingly articles uploaded to arXiv with the arXiv.org perpetual non-exclusive license are not OA as the reader is not allowed to redistribute the paper.
I wasn't aware that there were different distinct forms of "open access", so I had to read it up on Wikipedia. From what I understand, publications on arxiv are either gratis or libre open access.
Either way, we don't pay anyone any fee to publish on arxiv.
Yeah, I think people are mostly confused here. Open access is a thing when we're talking about peer-reviewed journals. arXiv hosts preprints, meaning they are available before the peer-review process has proceeded. So calling arXiv papers "open access" is misleading, because that label carries the assumption of peer review. And I've never heard of "gold open access" either. Researchers paying to have their papers published is standard, open access or not. That's just how it works. If someone is using the term "gold open access" I'm just going to assume they have no idea how science publishing operates.
I mean, these terms are defined and widely understood, so, um, no.
"gold open access" is where you publish to a peer-reviewed open-access journal, which may or may not involve the author paying for the privilege.
"green open access" is where you publish to any peer-reviewed journal, and then the author self-archives the paper somewhere, like an institutional website, arXiv (as a "post-print", not a pre-print), or even Sci-Hub.
There are discussions involved about copyright and license and so on, but that's the gist of it.
> anyone can pay their way in those journals and publish (sometimes sub par) papers.
This is not true and comments like this are damaging to science.
Open Access papers are still peer-reviewed and by far not all of them make it into the journal. You can't pay your way into those journals.
Of course there are shady pseudo-journals which just cash in on the fee, do not carry out peer review and just dump the paper on the internet. But any scientist should be able to tell such scam journals from serious ones.
True, some journals, like many of the Frontiers series or PLOS One, make it very hard to be rejected in peer review. As long as your paper is reasonably well written and doesn't contain falsehoods it will almost certainly make it to publication. Still, you don't "pay your way into those journals".
Granted, many of papers in those journals report mere incremental progress. But these journals are still attractive for scientists to publish in, for obvious reasons. Publishers like PLOS use those journals as cash cows to fund their higher-tier offerings.
It is fair that the author pays for publication in those journals, since the most benefit is often for the author, not the reader. For the progress of science these offerings are not so useful, unfortunately.
Sci-hub will grab and serve anything with a DOI (or at least used to; I don't know if they have started ingesting papers again after turning it off a while ago). I have found open access papers there before. It's simpler to just paste the DOI into sci-hub than to check to see if it's one of the few open access articles in a mostly paywalled journal.
Alexandra Elbakyan is a titan and a saint. I wouldn't have been able to finish my research without access to papers my institution wasn't subscribed to.
Isn't it fantastic that we are alive and seeing the resurrection of the great library of Alexandria right before our eyes?
She has done more than any other organization or individual in the history of mankind, since the advent of the internet, to help people in second- and third-world countries pursue advanced research. Well, she and the people who pirate and distribute MS Office. Faculties around the world recommend Sci-Hub as the main and only source of research and journals.
Here's a notebook that fetches Sci-Hub mirrors from Wikidata and tests them. I also included an iOS Shortcut to add to your Share screen. When you're on a site that Sci-Hub recognizes and you use the shortcut it will try to fetch the paper.
AFAIK it's not really working again? I think there was a batch upload of a bunch of papers recently, but not the ones I was hoping for. I'm sort of worried that the past-tense language on this page suggests it isn't starting back up again.
> I accessed a paper from late last year not so long ago. I think it's working fine.
It is not. A large batch of new papers was added manually, but the old service of typing in a DOI and having the paper retrieved automatically is not working. Pick 10 random DOIs from 2022 and see how many Sci-Hub will return.
I hope more disciplines can shift the publishing culture towards the norm found in physics, where arxiv is the go to. I'm not sure why that is the case but it's pretty great.
If only the Nobel Committee would say: the Nobel Committee will only consider research published in an Open Access repository when reviewing published papers for the Nobel Prize after ~~ June 30, 2022.
This would unleash a horde of hungry cats among those fat pigeons that are the paywalled journals.
There would be a crying and wailing - ending with piles of feathers,(and purring cats), and researchers all over the world, and especially in the many 'third world' Universities whose minds are currently held hostage to budgets and local politics.
The world would gain immeasurably by this simple act!
Sci-Hub is sometimes the last resort for obtaining a resource that is otherwise unobtainable. But what has become of the old way of obtaining inaccessible papers: asking the authors for a copy?
Sites like ResearchGate make this very easy. And often a simple email does the job, too.
Advantages:
* It is legal
* The author gets feedback that someone out there reads their research
* Making direct contact to your peers is a good thing.
It's too time consuming and has an undefined likelihood of success. People will naturally flock to alternative methods, such as sci-hub, that are faster and until recently were near guaranteed to have the desired content.
Agreed, sci-hub is so much more convenient. But when the publishers finally shut it down for good, we'll have to find another solution.
A community of scientists sharing their papers would be a good thing already now.
I personally know active scientists who don't even try anymore to look up the paper, but rather go directly to sci-hub for any doi they need. I can understand why, but I also think that this doesn't lead to a sustainable publishing culture.
But honestly, how often does one have to skim that many papers in a day, to a level where the freely available abstract is not sufficient?
Perhaps every once in a while when one compiles a survey of a new field they enter. Once the project is set on the rails, one rarely has to read that much.
> how often does one have to skim that many papers in a day, to a level where the freely available abstract is not sufficient?
More often than you might think.
To take an example from my own work, I was doing assay design a while back, and needed to collect all existing primer sets in the literature. I probably went through a hundred papers over a several day period.
I have asked this question here and in many places before: why do people "rely" on an organization that sifts through hundreds of thousands of papers and then charges exorbitant prices for providing this service?
If we use the Amazon analogy, is Amazon with millions of products worse than a boutique cat-food seller that specializes in a specific cat food for a specific cat breed? Maybe. But what about the "rest" of the products?
Why are our scientists made to rely on Elsevier et al. to sift through the junk and find the perfect paper for them, instead of doing it themselves? Is science now such a cutthroat, quick competition that it requires you to give a company the privilege of working for you, so that you don't have to do your own due diligence?
In India, we have a lot of local research that is done on open databases like Shodhganga and many more. But if you have to access foreign research material, better hope your university has an agreement with Elsevier and others to pay them millions for a login. The alternative: go to Sci-Hub and find what you need.
I understand the whole quality/delivery debate, but doesn't the average user already know who the big, trusted players in a specific domain are? Or do you want discoverability at the hands of a "trusted third party" without doing the legwork yourself?
Then at the other end you have non-academics like me. I might have heard of a research paper in some article, and I cannot read it without paying an arm and a leg. Why? If we use the whole ebook/book argument, compensation is commensurate to sales, so a more popular book means more money for the author; but here the authors aren't compensated, Elsevier is, so why should I pay Elsevier? Because they filtered through 1000 papers to provide 10, and for that privilege they require unlimited royalties forever? Why?
Not only expected, but actually forced. In many places, a streak of a few years with no publications in prestigious journals can unrecoverably sink a researcher's career.
So there is "prestige" in getting into a reputed journal, rather than in the fact that the paper is peer-reviewed by good scientists. The reputation of the journal matters. Cool.
Isn't a change needed in the "social ideology" of such schools, so that this whole elitism would go away?
Because otherwise you’ll have to sift through tons of garbage “research”. It’s already a common knowledge that many articles coming from certain countries are fraudulent. There’s a lower chance of having those in journals like Nature Medicine.
> why are our scientists made to rely on elsevier et al to sift through the junk and find for them the perfect paper instead of doing it themselves?
Scientists do do that themselves. That's why it is called peer review. Journals take scientists' work for free; they just pre-select papers, but don't do the review.
Are you saying you rely on the "pre-selection" of the journals, and for that they demand exorbitant prices? Why don't you rely on the peer-review data on Sci-Hub itself?
Is the pre-selection such an important thing that these companies necessarily have to remain in business?
It's a shame that the US or EU doesn't run an open database of journals, to which every university could publish as a back-up. They could also create free-to-use tools to make research more productive.
That we have to rely on a single person's effort, someone risking their freedom to do this, is fucking insane.
I did not, that's somewhat interesting (not hard to explain if you read the about pages, though). It's also well within the domain of politics and that seems to be something HN generally avoids.
I mean, even if you limited yourself to just the Peace prize (arguably the most controversial), you'd still have to reconcile your statement with the fact that people like Malala Yousafzai have won.
Remember to donate to sci-hub to keep it going! Even a small donation helps and is way more than the extortionate prices we'd all have to pay without it :D
How come we don't have extensive software for helping doctors' decision-making, making use of e.g. Bayesian inference while feeding on the collective intelligence embodied in those 24 million papers? Expert systems long ago passed the hype curve, and it's time for them to cycle up again!
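As a toy illustration of the Bayesian inference being alluded to: combining a disease's prevalence with a test's sensitivity and specificity already yields a clinically meaningful posterior. The numbers below are made up for illustration, not drawn from any real study:

```python
def posterior_positive(prevalence: float, sensitivity: float,
                       specificity: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    p_pos = (p_pos_given_disease * prevalence
             + p_pos_given_healthy * (1.0 - prevalence))
    return p_pos_given_disease * prevalence / p_pos

# A hypothetical rare disease (1% prevalence) with a 95%-sensitive,
# 90%-specific test: a positive result still leaves the patient far
# more likely to be healthy than ill.
print(round(posterior_positive(0.01, 0.95, 0.90), 3))  # 0.088
```

This base-rate effect is exactly the kind of reasoning an expert system could surface automatically, and exactly the kind that is easy for humans to get wrong.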
Basically: medicine as a whole is already some sort of expert system.
- Data collection and cleanup: Researchers conduct experiments to produce meaningful data and extract conclusions from that data.
This part isn't more automated because we have strict rules that prevent medical data collection and analysis without a clear purpose. Otherwise we'd be able to collect a lot more information to try and extract results from it using more inference-oriented techniques (deep learning and the like).
- Modeling & training: Expert panels produce guidelines from the results of that research. These panels are the "training part" of the system.
As a sibling comment said, replacing these panels with ML-based techniques isn't trivial because the data produced in the previous step is fairly noisy (p-value hacking, difficulty of capturing all the variables, etc.). Furthermore, the techniques that yield the best results nowadays also produce them without clear explanations of why they hold, which is not something we are prepared to accept in medicine.
- Execution: Doctors diagnose and treat following said guidelines. In fact, they use decision flows that they themselves call... algorithms!
The main reason why execution is not automated is that we do not have the technology for machines to capture the contextual and communication nuances that doctors pick up on. There can be a world of difference between the exact same statement given by two different patients or even the same patient in two different situations. Likewise, the effect of a doctors' statement can be quite literally the opposite depending on who the patient is and their state of mind. One of the most important aspects of the GP's job is to handle these differences to achieve the best possible outcomes for their patients.
Because research can be controversial. There are papers in my field saying patients have an increased frequency of certain cells. There are other papers saying they don't. Go figure.
It's a kind of ironic twist of fate, and a disgrace, that something like Sci-Hub needs to be hosted/served from Russia in order to avoid getting squished and destroyed by rent-seekers' greedy hands.
If anything is a disgrace it's your russophobic comment.
Alexandra Elbakyan, who created Sci-Hub, was born in Kazakhstan (ex-USSR) and later studied in Russia, where she lives now. I'm proud that Russia is one of the few countries on the planet where American extraterritorial law doesn't apply.
I'm not sure the preceding comment was actually russophobic. To me it seems you're mostly in agreement that this content needs to be hosted somewhere safe from the US. It's just that you are proud, because it's hosted in your country, and the previous person is annoyed, because it cannot be hosted in theirs.