Hacker News
Sci-Hub statistics and database (sci-hub.ru)
547 points by bluish29 on Feb 12, 2022 | 128 comments



Is there any word about when sci hub is going to start adding new articles again? It's currently only useful as an archive of old research articles. New papers from the last year are not available. I never understood the rationale for stopping new content, though I believe it had some relation to some court case in India...but I don't understand why that was a reason to stop adding articles, and why it hasn't been restarted yet.


They had resumed adding articles after receiving legal advice that the Delhi High Court injunction only applied for a few months.

https://mobile.twitter.com/ringo_ring/status/143435621720862...


I saw that tweet, but it doesn't change the material reality: try plugging in some DOIs from recent articles from the last year, and they will not be there.

Scihub used to be a great resource, now it's only a resource for old research. Still useful for background material, but not for current work.

I also don't understand why the Indian court case has any impact on new article availability. The owner is not Indian. The servers and domains are not Indian. There doesn't seem to be any actual reason to stop adding new articles, other than some idiotic half-baked point that only hurts the people who need the articles, like when Project Gutenberg banned anyone from a German IP, except this is much worse since there is no way around it for people who need new papers.


I have a hunch that the downfall of the "Plato" real-time downloader wasn't the Indian court case but rather the fact that it helped publishers trivially identify the university accounts through which the downloads were happening. Even if the appearance of papers were delayed by a random number of days, there are other pitfalls now, and most importantly, publishers started caring. In particular, Elsevier now slaps UUIDs onto all PDFs you download from them, and no, I'm not just talking of visible watermarks. Other publishers seem to be doing similar things (there was a recent twitter thread on this, retweeted by @textfiles, which I can't find). The rational solution for Sci-Hub seems to be to buffer their uploads and release them in yearly batches, maybe programmatically removing various kinds of watermarks and diffing against the same paper downloaded from a second IP. If this is what they are doing, I'm not surprised. Not sure how much of a winning strategy they have in the long run, though.
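For what it's worth, the "diffing against the same paper downloaded from a second IP" step could be sketched roughly like this. It is only an illustration, not anything Sci-Hub is known to run: the function name `differing_spans` and the sample byte strings are made up, and since real PDFs usually compress their streams, a raw byte diff like this would only be a first pass at spotting per-download identifiers.

```python
from difflib import SequenceMatcher


def differing_spans(copy_a: bytes, copy_b: bytes):
    """Return the byte ranges where two downloads of the same file differ.

    Each result is (tag, a_start, a_end, b_start, b_end), with tag being
    'replace', 'delete' or 'insert', as reported by difflib.
    """
    # autojunk=False so that frequently repeated bytes are not ignored.
    matcher = SequenceMatcher(None, copy_a, copy_b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    # Hypothetical stand-ins for two downloads that differ only in an ID.
    first = b"%PDF body id=AAAA end"
    second = b"%PDF body id=BBBB end"
    for tag, a0, a1, b0, b1 in differing_spans(first, second):
        print(tag, first[a0:a1], second[b0:b1])
```

Any span that differs between two otherwise identical downloads is a candidate for a per-account marker. Identical bytes prove nothing, though, if both downloads went through the same account.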

Guys: post your papers on the arXiv.


This might be the twitter thread you’re referring to? https://twitter.com/json_dirs/status/1486120144141123584



Yep, thank you!


> I also don't understand why the Indian court case has any impact on new article availability. The owner is not Indian. The servers and domains are not Indian.

Because Sci-Hub has a good chance of winning the case. The court in question has previously backed a very broad definition of what constitutes fair dealing.

https://en.m.wikipedia.org/wiki/University_of_Oxford_v._Rame...


> Because Sci-Hub has a good chance of winning the case.

I understand that this is the party line that is parroted whenever this issue comes up, but it does not make any sense as a rationale for keeping new articles off the site. How is not adding any new articles (but, for example, keeping old articles accessible) assisting the possible winning of the case? And more to the point, why does it matter at all if it wins or loses the case? As stated, neither the owner nor the infrastructure is Indian, so of what relevancy is this jurisdiction?

And further still, the case appears to have been delayed indefinitely. That last update claims that there was going to be an update a few days ago, but there was not. The proceedings are just now a list of one postponement after another [1]. Given that new articles are being held hostage, it thus very obviously benefits the legal system and the prosecution to continue to delay the case indefinitely.

[1] https://delhihighcourt.nic.in/dhc_case_status_oj_list.asp?pn...


The owner might not be Indian, but she's actively defending the case (through lawyers) in India. Not following the injunction would lead to her losing the case, which is why she followed through. She didn't have to fight the case in India, but she chose to. Why keep old papers and stop adding new papers - that probably depends on the terms of the injunction.


> Why keep old papers and stop adding new papers - that probably depends on the terms of the injunction.

As per the official tweet that has already been mentioned in this thread [1]:

> how about the lawsuit in India you may ask: our lawyers say that restriction is expired already

So according to the owner's official Twitter, this is no longer a valid reason, and yet new papers are still not accessible. Why is that?

[1] https://mobile.twitter.com/ringo_ring/status/143435621720862...


Haven't got around to adding yet?


That is not how scihub used to function. Scihub used to have an engine, named Plato, which would fetch papers automatically if not already in their database. For the last year now, this essential service has not been operational. This is what the issue I am raising is about.


> this essential service

Give a man a fish and he will praise you for a day. Give a man a spinning rod and he will bitch at you because it is not a fish.


It's clear what you're talking about. Software bitrots over time. Plato might need fixes, might have a huge backlog, lots of stuff can happen.


I still have airgapped Windows XP systems with one software package on them to do one job. Left alone, software doesn't bit rot; it is the never-ending stream of updates that causes it to cease working over time.


Think about it. Software like Plato downloads papers from around the web. If the environment around software (i.e. the web) changes, software without updates bitrots.


What I read was that the Indian judicial system tends to be favorable to things like Sci Hub in its interpretation of copyright, and Sci Hub wanted to act in good faith with regard to that court, so as to have a fairly solid basis in international law for operating, should it rule in Sci Hub's favor. I might be off in this understanding, but that's what I understood.


Yeah, I have heard this reasoning, but it seems muddled. How is keeping the site online so old articles are available but no new articles are added acting in "good faith"? It's not like the old articles are any less copyrighted than the new articles, so this doesn't make sense to me.

The court case has also been delayed for over a year now, so if it is delayed indefinitely, like it seems to be, then we will also not get access to new articles, also indefinitely? That's ridiculous. The last update from the court proceedings claimed that there would be a new update over a month ago, which in turn got delayed yet again to a few days ago, and there's been nothing [1].

[1] https://delhihighcourt.nic.in/dhc_case_status_oj_list.asp?pn...


As I understand it, this was a sort of compromise the courts worked out in the interim. Seems likely that at some amount of delay, Sci-Hub would just break the injunction (I personally hope they don't, as the case seems to be wrapping up). I don't think any of this is ridiculous.

Relatedly, if any academics want to help a small bit to resolve this by signing the amicus brief (i.e. intervention application) we made for Sci-Hub, you can do so by contacting me through https://docs.google.com/forms/d/1_if6Lipu-YPBMLk6zYjBxDFRA_c... and I will connect you with the coordinating lawyer. You can read more about this at https://forum.effectivealtruism.org/posts/bEKwqNDGysnZRcmpw/...


In India, courts have famously few remedies against no-show from plaintiffs.


The publishers now encode their papers with individual identifiers, which generationP describes as UUIDs on all PDFs. That means they can trace a leaked copy back to the institution (and perhaps the actual prof?). They have then sent nastygrams threatening the institution with fees or loss of access - potentially serious punishment. What is needed is a way to run the papers through an OCR program to create fresh text, and then to randomly vary a large number of adjacent letter spacings. (This is called 'micro kerning', and it allows a second order of unique document recognition: the journal scans the document for these fractional kerning gaps in the letter spacing, which can lead back to the institution.) I suppose a program flow could be made to do OCR + random micro kerning changes - it would take time to set up, but once running it would be a rapid automated document flow. Photos/charts could be sanitised as well, but legends etc. on them would also need OCR and micro kerning adjustments. With 25 words of text, each with 6 letters, micro kerning can easily create 10,000 unique IDs, easily enough to cope with all subscribed institutions - and that is on one chart/photo. I suspect the subscribing institutions have had firm words with their profs and grad students to block this. We could easily find out if a few people at assorted institutions let us know whether they have been given firm words about this.
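To put rough numbers on the 25-words-of-6-letters claim, here is a back-of-the-envelope sketch. It rests on an assumption that is mine and not from any publisher's documentation: each gap between adjacent letters inside a word is either left alone or nudged, i.e. carries one bit. The function names are hypothetical.

```python
def intra_word_gaps(words: int, letters_per_word: int) -> int:
    """Number of letter-to-letter gaps inside the words themselves."""
    return words * (letters_per_word - 1)


def distinct_ids(gaps: int, levels: int = 2) -> int:
    """How many different documents can be told apart.

    Assumes every gap independently takes one of `levels` spacings.
    """
    return levels ** gaps


def gaps_needed(ids: int, levels: int = 2) -> int:
    """Smallest number of gaps that can encode at least `ids` documents."""
    gaps = 0
    while levels ** gaps < ids:
        gaps += 1
    return gaps


if __name__ == "__main__":
    gaps = intra_word_gaps(words=25, letters_per_word=6)
    print(gaps)                 # 125 gaps in a 25-word legend
    print(gaps_needed(10_000))  # 14 gaps already cover 10,000 IDs
    print(distinct_ids(gaps) >= 10_000)
```

Under that one-bit-per-gap assumption a single chart legend has far more capacity than 10,000 IDs, so the estimate above is, if anything, conservative.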

I am not sure how Sci-hub can get past this, unless they get a good Indian court ruling and can use Indian friends to scan printed copies of journals - if they exist in this online age?


> I am not sure how Sci-hub can get past this

Quite trivially, actually, thanks to the good old analogue hole: https://news.ycombinator.com/item?id=30084193

All sci-hub would have to do in this case is download the same paper through three or more accounts (different institutions, networks, countries?) at three different times, rasterize them and keep the common denominator. If a pixel has no common denominator, they'd have to fall back to a default value. This is by no means a perfect method and it has its weaknesses and pitfalls, but the resulting PDF will be far less useful as a means to de-anonymize accounts using information from the PDF itself.
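A minimal sketch of that pixel-voting idea, assuming each download has already been rasterized into a same-sized grid of grayscale values (the rasterizing itself is left out). The name `merge_rasters`, the strict-majority rule, and white (255) as the fallback are my own choices for illustration, not anything Sci-Hub is known to do.

```python
from collections import Counter


def merge_rasters(rasters, default=255):
    """Merge three or more rasterized copies of a page by per-pixel vote.

    `rasters` is a list of equally sized 2D lists of grayscale values.
    A pixel keeps a value only if a strict majority of copies agree on it;
    otherwise it falls back to `default`.
    """
    if len(rasters) < 3:
        raise ValueError("need at least three copies to vote")
    height, width = len(rasters[0]), len(rasters[0][0])
    for raster in rasters:
        if len(raster) != height or any(len(row) != width for row in raster):
            raise ValueError("all copies must have the same dimensions")

    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = Counter(raster[y][x] for raster in rasters)
            value, count = votes.most_common(1)[0]
            row.append(value if count * 2 > len(rasters) else default)
        merged.append(row)
    return merged


if __name__ == "__main__":
    # Three tiny hypothetical "pages"; the differing pixels stand in for
    # per-download watermarks.
    copy_a = [[0, 255], [255, 0]]
    copy_b = [[0, 255], [255, 7]]
    copy_c = [[0, 9], [255, 13]]
    print(merge_rasters([copy_a, copy_b, copy_c]))
```

A mark that shows up in only one copy gets voted out. A pixel where all copies disagree is blanked to the default, which is one of the weaknesses mentioned: you lose a bit of real content wherever the marks overlap it.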

Publishers still have other sources of information to de-anonymize accounts if the multiple accounts/downloads aren't truly isolated from one another.


Yes, that would mush their secret sauce...


Makes me wonder how much they'd have to offer me to accept the task of implementing such tracking algorithms to modify other people's scientific papers for the purpose. Certainly I won't be the cheapest, but still. Either some developer is very vested in the idea of keeping science a secret or someone got a very nice bonus.

Edit: also the sysadmin that keeps this database safe without 'accidental' data loss on UUID to downloader mappings.


It seems to me that this type of tracking, under the guise of guarding rights, is against the GDPR: the same kind of unlawful tracking that they are harassing Google and FB over? Is justice's sword not two-edged?


(Note that this is all written from a European perspective, applies European Economic Area law and human rights as defined by the Council of Europe (which includes Russia, to everyone's surprise). I know that in the USA everyone is much more pro frontier justice, for example when it comes to pervasive and continual monitoring of employees while at work.)

I think it's very arguable that they have a legitimate interest here. Privacy has always been a weighing of interests, at least that's how I've always heard it explained by the Netherlands' face of digital law (Arnoud Engelfriet) also back in the days of WBP (the law from ~1995 that is 97% the same thing as GDPR), also in light of the European Convention on Human Rights (article 8 is a right to privacy).

A common example is filming the road: illegal, but if you park your car in front of your house and there have been car fires in your neighborhood lately, then it can be justified.

Filming employees inside a warehouse: invasion of privacy (illegal) but if there have recently been thefts from a certain part of the building then it's justified to hang up a camera there, introduce a lock that registers who went there at what time, or some such. (With adequate security measures so only authorized people can use it for the intended purpose.)

Personal example: monitoring everything I do on the company network is illegal, but because I work in a business where secrecy is important (security consultancy) it was considered justified to do spot checks, tell every employee upon entering into the employment contract that spot checks are a thing, and inform the subjects of spot checks after they were part of one. Transparent but still effective.

The two things to consider (iirc) are:

- Do my rights weigh heavier than the other party's right to privacy? (e.g. car fire is a fairly big impact on your right to the peaceful enjoyment of his possessions)

- Is there any other way in which I could achieve this goal with a lesser impact on the right to privacy?

In the case of Elsevier, from what I heard this whole scheme is a big mafia-like practice (wouldn't want to be published in a niche corner nobody reads now would you?) and so in my opinion it's entirely unethical to support (work for) them in the first place, at least in any role except one where you think you might be able to nudge things in the right direction. But I could see how a judge says: well, that's how today's law works, that you have moral objections is something you can take to your favorite religious leader and lament about, not a court of law.

If I'm being fair, there isn't even really an invasion of privacy because PDFs don't have executable code (usually) that can track you. Rather, they need to hide it somewhere so that, if it appears on the pirate bay, they can read out the ID and see who the perpetrator is. More like a criminal investigation using a fingerprint on a glass, and less like a cookie actively sent with every action you perform on a website.

TL;DR: GDPR applies, but it probably doesn't make this database illegal. It's not a loophole by which a person can say no to literally everything. (Would be cool if you could require the police to stop using your fingerprint in a legitimate investigation.)

Still, if I were that sysadmin... I probably wouldn't 'drop table elsevier', but I'd rather live off government benefits than support that scheme.


Thank you for this cogent analysis. It sounds valid, as they probably notify their subscribers that they forbid releasing documents into the wild and use technical means to detect it. The only hope is the usual request to the authors, who have traditionally sent papers to all who ask. Of course the profs can also pay 'blotgelt' and, as long as the fees are small enough or the prof can afford them, file it as an open paper. I am encouraged that the open paper concept is gaining traction. In the days when Nobel was alive, his Nobel Prize was in fact created on this basis of open papers freely sent to others - sadly, had he known, he might well have written the open paper concept into his legacy to prevent the blatant rent-seeking empires we now see. That said, curation and acceptance/publication of papers is a service of value, and it is needed, at a lower cost: the journals do a job of work keeping out the totally crappy AI/idiot-created papers that get into the totally open journals, which are unable to deal with the blizzard of papers they face - the bad get through. There are also the fake fee-based online journals where a distressingly small fee gets you online for citation by your coterie of anti-vaxxers, nutrition gurus, etc. - essentially insoluble, save by intelligent readers, who often shed this burden to avoid the waste of time. Fortunately the 'contents **' publications winnow most of these out.


No “firm words” that I heard in my neck of the USA woods.

I was struck by the very low number of German downloads. Did I miss something?


No idea about Germany, but they may well be more law abiding?


No, that's absurd. More likely a different factor. Some googling and lo and behold: starting with Germany, the universities of several European countries cancelled their agreements with Elsevier, pressuring Elsevier to give them better deals, including open access by default. I imagine it lessens the need for Sci-Hub.


A good beginning


Do a reflow of the entire document? As long as it's semantically recognizable it can be done. Figures are harder. ML can really help here: someone make a pdf2tex!


Does this mean it is trivially easy for any student or employee to make their institution stop paying publishers by getting it banned?


LOL, student sabotage! I think the paywalls would extract a penance that maintains their cash flow - remember, Greed is their mantra, may it never ever cease.


It's interesting how sci-hub's papers on medicine dwarf those in many other fields like comp-sci, math, and physics. I wonder if that reflects the number of papers in those fields, or if sci-hub just has a non-representative sample. If the latter, why?


I guess today it is much easier to find new noteworthy, publishable facts in medicine than physics. New diseases are discovered every year, and old diseases are poorly understood (e.g. Alzheimer's disease), and the treatments for many of them are still sub-optimal, or even nonexistent. Every patient is different, individual cases are research-worthy. We only got antibiotics in the 1940s. On the other hand, most big breakthroughs of physics happened between the 17th century and the first decades of the 20th century. After the general case is cracked in physics, individual cases have very little publishing value.


> much easier to find new noteworthy, publishable facts in medicine than physics

Plus I would imagine that everyone wants their illness looked into, thus that's where funding tends to go. I care much less for physicists to figure out what dark matter is than how to treat health problem X that bothers me daily. (Just an example. In my particular case I'm healthy and would actually be quite excited about dark matter findings compared to any individual illness solution... but still.)

This comment would be worth a lot more with some stats about funding going towards the different fields, though. Not sure where to find that.


It does appear to be the latter. I just searched for several famous ML papers (attention is all you need, lottery ticket hypothesis, capsules, etc) and they are not there. I think if someone counted all papers that have been ever published anywhere, the picture would be a lot different.


So does that mean that vastly more people in medicine use sci-hub than do people from other fields?

Or is there some other reason for the discrepancy?


Could be. I've been an ML researcher for 8 years and I haven't used sci-hub until today. Ironically one of my (very obscure) papers is available there.


I suspect that a lot of ill people, and people with loved ones who are ill, spend an enormous amount of time reading every paper they can find on the illness, or trying to narrow symptoms down to a particular illness. That's also a situation where you can go through 50 papers easily for every useful one (or even intelligible one, if you're a layman) you find, so a definite sci-hub situation unless you're independently wealthy. It's the requests that draw (or once drew, right now I guess) stuff into the database.


I don't even know the scale of medical research vs CS in terms of number of schools.


The answer is right there on the page: the database covers >95% of publications among major publishers.


Math and comp-sci researchers use https://arxiv.org heavily and thus people usually do not need something like sci-hub.

Relevant xkcd - https://xkcd.com/2085/


I think it's due to the sheer number of biomedical papers published each year, coupled with comp-sci, maths and physics papers being less likely to be behind a paywall.


... which is due to the sheer amount of cash being dumped into medical research. Because congressmen get cancer.

There are several medical subfields that on their own get more funding than the entire NSF (i.e. all other science).


Interesting. Medical field dominates research in terms of publications. Chemistry produces double the papers compared to physics, and humanities are smaller than biology but larger than physics. I wonder where machine learning papers fit in - CS or Math or both?


It is noteworthy that most of physics (at least high energy physics) is published on arxiv.org and open access.

I don't know if sci hub bothers with publications that are available freely from an official source.


> published on arxiv.org and open access

Don't use the term "open access" like this. A paper published on arXiv is free to read, and was freely published. "Open access" is a scam by the big publishers, where they don't take money from the readers, but make the authors pay. Or, putting it another way, anyone can pay their way in those journals and publish (sometimes sub par) papers.


As I wrote on another comment, I wasn't aware that there are multiple forms of open access. Since it appears that arxiv (again, at least high energy physics) employs mostly either gratis or libre open access, and since the Wikipedia article explicitly calls it an open access archive, I see no harm in calling it that either.

"arXiv (pronounced "archive"—the X represents the Greek letter chi [χ])[1] is an open-access repository of electronic preprints and postprints[...] "


Not that I want to defend open access fees but the way you describe it is incorrect. Paying for open access fees with large publishers like Springer is an option that is separate from the review system, you can only choose it once your paper has been reviewed and accepted.


Open Access has a precise definition:

https://openaccess.mpg.de/Berlin-Declaration

Publishers do misuse the concept though. They try to stay clear of using the term when they do. They use terms like Free Access or some of the more dubious colour variants of OA that don't provide all the freedoms that the Berlin Declaration of OA defines.

Interestingly articles uploaded to arXiv with the arXiv.org perpetual non-exclusive license are not OA as the reader is not allowed to redistribute the paper.


No, "open access" means that the paper is available to readers for free. Making the authors pay is typically termed "gold open access".


I wasn't aware that there were different distinct forms of "open access", so I had to read it up on Wikipedia. From what I understand, publications on arxiv are either gratis or libre open access.

Either way, we don't pay anyone any fee to publish on arxiv.


I've never heard the term "gold open access", but I know plenty of "open access" journals that charge a fee to authors.


Yeah, I think people are mostly confused here. Open access is a thing when we're talking about peer-reviewed journals. arXiv hosts preprints, meaning they are available before the peer-review process has proceeded. So calling arXiv papers "open access" is misleading, because that label carries the assumption of peer review. And I've never heard of "gold open access" either. Researchers paying to have their papers published is standard, open access or not. That's just how it works. If someone is using the term "gold open access" I'm just going to assume they have no idea how science publishing operates.


I mean, these terms are defined and widely understood, so, um, no.

"gold open access" is where you publish to a peer-reviewed open-access journal, which may or may not involve the author paying for the privilege.

"green open access" is where you publish to any peer-reviewed journal, and then the author self-archives the paper somewhere, like an institutional website, arXiv (as a "post-print", not a pre-print), or even Sci-Hub.

There are discussions involved about copyright and license and so on, but that's the gist of it.


> anyone can pay their way in those journals and publish (sometimes sub par) papers.

This is not true and comments like this are damaging to science.

Open Access papers are still peer-reviewed and by far not all of them make it into the journal. You can't pay your way into those journals.

Of course there are shady pseudo-journals which just cash in on the fee, do not carry out peer review and just dump the paper on the internet. But any scientist should be able to tell such scam journals from serious ones.

True, some journals, like many of the Frontiers series or PLOS One, make it very hard to be rejected in peer review. As long as your paper is reasonably well written and doesn't contain falsehoods it will almost certainly make it to publication. Still, you don't "pay your way into those journals".

Granted, many of the papers in those journals report mere incremental progress. But these journals are still attractive for scientists to publish in, for obvious reasons. Publishers like PLOS use those journals as cash cows to fund their higher-tier offerings.

It is fair that the author pays for publication in those journals, since the most benefit is often for the author, not the reader. For the progress of science these offerings are not so useful, unfortunately.


Sci-hub will grab and serve anything with a DOI (or at least used to; I don't know if they have started ingesting papers again after turning it off a while ago). I have found open access papers there before. It's simpler to just paste the DOI into sci-hub than to check to see if it's one of the few open access articles in a mostly paywalled journal.


Good point. If so, this data is a lot less interesting :(


I would imagine it depends on the particular paper, the more experimental ones in CS, the more theory ones in math.

Do note though that most math and ML practitioners use arxiv over sci-hub.


Machine learning publication rate is small, at least by assuming that paperswithcode contains most of the publications.


Alexandra Elbakyan is a titan and a saint. I wouldn't have been able to finish my research without access to papers my institution wasn't subscribed to.


I snuck her into my own dissertation acknowledgements: https://imgur.com/bDgtBAE


Now I feel foolish for not acknowledging her, especially in Elsevier papers.


Pretty sure journals wouldn't allow it, but who's going to read my dissertation anyway (:


Isn't it fantastic that we are alive and seeing the resurrection of the great library of Alexandria right before our eyes?

She has done more than any other organization or individual in the history of mankind to help people in second and third world countries pursue advanced research since the advent of the internet. Well, she and the people who pirate and distribute MS Office. Faculties around the world recommend scihub as the main and only source of research and journals.


The great library of Alexandra, you mean.


Hahaha, I actually typo-ed Alexandra and trust me I googled that immediately. Apparently it is "Alexandria"

https://en.wikipedia.org/wiki/Library_of_Alexandria


Here's a notebook that fetches Sci-Hub mirrors from Wikidata and tests them. I also included an iOS Shortcut to add to your Share screen. When you're on a site that Sci-Hub recognizes and you use the shortcut it will try to fetch the paper.

https://observablehq.com/@iz/sci-hub


Did Sci-Hub start working again? Last time I checked it wasn't adding new papers because of some legal thing going on in India.


AFAIK it's not really working again? I think there was an upload of a bunch of papers in a batch recently, but not ones that I was hoping for. I'm sort of worried about the past-tense language in this page suggesting that it isn't starting back up again.



The site is working in the sense of you can download old papers, but I don't believe any new papers from the last year are accessible.


I accessed a paper from late last year not so long ago. I think it's working fine.


> I accessed a paper from late last year not so long ago. I think it's working fine.

It is not. A large batch of new papers was added manually, but the old service of typing in a DOI and having a paper be retrieved automatically is not working. Pick 10 random DOIs from 2022 and see how many Scihub will return.


100 TB is pretty small. I wonder if she will start torrenting it so people can back it up and share the load.


> 100 TB is pretty small. I wonder if she will start torrenting it so people can back it up and share the load

This has been ongoing for a while now:

Rescue Mission for Sci-Hub and Open Science: We are the library https://www.reddit.com/r/DataHoarder/comments/nc27fv/rescue_...


Excellent, thanks!


IPFS or ARWeave...?


I hope more disciplines can shift the publishing culture towards the norm found in physics, where arxiv is the go to. I'm not sure why that is the case but it's pretty great.


If only the Nobel Committee would say: "The Nobel Committee will only consider research published under an Open Source Access repository in reviewing published papers for consideration for the Nobel Prize after ~ June 30, 2022." This would unleash a horde of hungry cats among those fat pigeons that are the paywalled journals. There would be crying and wailing - ending with piles of feathers (and purring cats), and researchers all over the world, especially in the many 'third world' universities, whose minds are currently held hostage to budgets and local politics. The world would gain immeasurably by this simple act!


Sci-hub is sometimes the last resort to obtain a resource that is otherwise unobtainable. But what has become of the old way of obtaining inaccessible papers: Ask the authors for a copy?

Sites like ResearchGate make this very easy. And often a simple email does the job, too.

Advantages:

* It is legal

* The author gets feedback that someone out there reads their research

* Making direct contact to your peers is a good thing.


It's too time consuming and has an undefined likelihood of success. People will naturally flock to alternative methods, such as sci-hub, that are faster and until recently were near guaranteed to have the desired content.


Agreed, sci-hub is so much more convenient. But when the publishers finally shut it down for good, we'll have to find another solution.

A community of scientists sharing their papers would be a good thing already now.

I personally know active scientists who don't even try anymore to look up the paper, but rather go directly to sci-hub for any doi they need. I can understand why, but I also think that this doesn't lead to a sustainable publishing culture.


Except a researcher could easily need to skim 30+ papers in a day; this is not a solution.


Agreed, in that case that wouldn't work.

But honestly, how often does one have to skim that many papers in a day, to a level where the freely available abstract is not sufficient?

Perhaps every once in a while when one compiles a survey of a new field they enter. Once the project is set on the rails, one rarely has to read that much.


> how often does one have to skim that many papers in a day, to a level where the freely available abstract is not sufficient?

More often than you might think.

To take an example from my own work, I was doing assay design a while back, and needed to collect all existing primer sets in the literature. I probably went through a hundred papers over a several day period.


> It is legal

Are you sure that the usual suspects don't make authors assign copyright or at least distribution rights? I wouldn't put it past them...


I have signed quite a few of those copyright transfer forms, and there is always a clause that would allow sharing on a personal basis.


What's the most popular paper on all of scihub? By field?


i asked this question here and at many places before. why do people "rely" on an organization that sifts through hundreds of thousands of papers and then charge exorbitant prices for providing this service? if we use the amazon analogy, is amazon with millions of products worse than a boutique cat food seller that specializes in a specific cat food for a specific cat breed? maybe. but what about the "rest" of products?

why are our scientists made to rely on elsevier et al to sift through the junk and find the perfect paper for them instead of doing it themselves? is science now such a cutthroat, fast-paced competition that it requires you to give a company the privilege of working for you so that you don't have to do your own due diligence?

in india, we have a lot of local research that is available on open databases like shodh ganga and many more. but if you have to access foreign research material, you had better hope your university has an agreement with elsevier and others, paying them millions for a login. the alternative: go to scihub and find what you need.

i understand the whole quality/delivery debate, but doesn't the average user already know who the big, trusted players in their specific domain are? or do you want discoverability at the hands of a "trusted third party" without doing the legwork yourself?

then at the other end you have non-academics like me. i might have heard of a research paper in some article, and i cannot read it without paying an arm and a leg. why? with ebooks/books the argument is that compensation is commensurate with sales, so a more popular book means more money for the author. but here it is elsevier that gets compensated, not the authors, so why should i pay elsevier? because they filtered through 1000 papers to provide 10, and for that privilege they require unlimited royalties forever? why?


There's some truth to "publish or perish". Scientists are expected to publish in prestigious journals.


Not only expected, but actually forced. In many places, a streak of a few years with no publications in prestigious journals can irrecoverably sink a researcher's career.


A typical PhD dissertation these days is 3 publications in high-quality journals. It is explicitly required at most schools.


so there is "prestige" in getting into a reputed journal, and not in the fact that the paper is peer reviewed by good scientists. the reputation of the journal matters. cool.

isn't there a need for a change in the "social ideology" of such schools, so that this whole elitism goes away?


Journal prestige does not always correlate with review quality. Usually it does, but then there's always reviewer 3...


Because otherwise you’ll have to sift through tons of garbage “research”. It’s already common knowledge that many articles coming from certain countries are fraudulent. There’s a lower chance of having those in journals like Nature Medicine.


> why are our scientists made to rely on elsevier et al to sift through the junk and find for them the perfect paper instead of doing it themselves?

Scientists do do that themselves. That’s why it is called peer review. Journals take scientists’ work for free; they just pre-select papers, but don’t do the review.


are you saying you rely on the "pre-selection" of the journals, and for that they demand exorbitant prices? why don't you rely on peer-review data on scihub itself?

is the pre-selection such an important thing that these companies necessarily have to remain in business?


It's a shame that the US or EU don't run an open DB for journals that every university could publish to as a backup. They could also create free-to-use tools to make research more productive.

That we have to rely on the effort of a single person who's risking their freedom to do so is fucking insane.


It looks like the torrents cover all subjects. Is anybody aware of torrents only for math or comp sci?


Anyone notice the logo update? The key loop has been exchanged for a hammer and sickle.


I did not, that's somewhat interesting (not hard to explain if you read the about pages, though). It's also well within the domain of politics and that seems to be something HN generally avoids.


Alexandra should get the Nobel prize.


With the rent seeking companies being from Europe? Not a chance.

Nobel is a political tool that's mostly there to make a point (especially that peace prize).


He said "should", not "will". Both of you are right.


[flagged]


I mean, even if you limited yourself to just the Peace prize (arguably the most controversial), you'd still have to reconcile your statement with the fact that people like Malala Yousafzai have won.


Remember to donate to sci-hub to keep it going! Even a small donation helps and is way more than the extortionate prices we'd all have to pay without it :D


I have to be honest and say this is the one time crypto has made a lot of sense to me.


How are they able to store data without it being seized?


Getting 403 Forbidden. Did my ISP block it, or is this an error on their end?


How come we don't have extensive software to support doctors' decision making, e.g. by using Bayesian inference while feeding on the superintelligence available in those 24 million papers? Expert systems passed the hype curve long ago, and it's time for them to cycle up again!


An older comment of mine https://news.ycombinator.com/item?id=30049522 fits well here. I'll adapt it to your question ;)

Basically: medicine as a whole is already some sort of expert system.

- Data collection and cleanup: Researchers conduct experiments to produce meaningful data and extract conclusions from that data.

This part isn't more automated because we have strict rules that prevent medical data collection and analysis without a clear purpose. Otherwise we'd be able to collect a lot more information to try and extract results from it using more inference-oriented techniques (deep learning and the like).

- Modeling & training: Expert panels produce guidelines from the results of that research. These panels are the "training part" of the system.

As a sibling comment said, replacing these panels with ML-based techniques isn't trivial because the data produced in the previous step is fairly noisy (p-value hacking, difficulty of capturing all the variables, etc.). Furthermore, the techniques that yield the best results nowadays also produce them without clear explanations of why they hold, which is not something we are prepared to accept in medicine.

- Execution: Doctors diagnose and treat following said guidelines. In fact, they use decision flows that they themselves call... algorithms!

The main reason why execution is not automated is that we do not have the technology for machines to capture the contextual and communication nuances that doctors pick up on. There can be a world of difference between the exact same statement given by two different patients or even the same patient in two different situations. Likewise, the effect of a doctor's statement can be quite literally the opposite depending on who the patient is and their state of mind. One of the most important aspects of the GP's job is to handle these differences to achieve the best possible outcomes for their patients.

All that being said, there are companies trying to produce expert systems to help doctors diagnose. See https://infermedica.com/product/infermedica-api for instance.
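For what it's worth, the "Bayesian inference" part of the parent question is the easy bit. Here is a minimal sketch of updating a disease probability from a binary test result with Bayes' rule. The function name and all the numbers (1% prevalence, 90% sensitivity, 95% specificity) are made up for illustration and do not come from any real guideline or product.

```python
def posterior(prior: float, sensitivity: float, specificity: float,
              positive: bool = True) -> float:
    """Probability of disease after one binary test result (Bayes' rule).

    prior        -- probability of disease before the test (e.g. prevalence)
    sensitivity  -- P(test positive | disease)
    specificity  -- P(test negative | no disease)
    positive     -- whether the observed test result was positive
    """
    if positive:
        p_result_given_disease = sensitivity
        p_result_given_healthy = 1.0 - specificity
    else:
        p_result_given_disease = 1.0 - sensitivity
        p_result_given_healthy = specificity

    numerator = p_result_given_disease * prior
    evidence = numerator + p_result_given_healthy * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 95% specificity.
after_one = posterior(0.01, 0.90, 0.95)
# Feeding the posterior back in as the new prior assumes the second test
# is conditionally independent of the first, which is often not true.
after_two = posterior(after_one, 0.90, 0.95)
print(f"after one positive test:  {after_one:.3f}")
print(f"after two positive tests: {after_two:.3f}")
```

With these made-up inputs a single positive result still leaves the disease fairly unlikely, which is the classic base-rate effect. The hard part is everything around the formula: getting trustworthy priors and test characteristics out of noisy literature, which is exactly the problem described above.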


Because research can be controversial. There are papers in my field saying patients have an increased frequency of certain cells. There are other papers saying they don't. Go figure.


Nailed it. With publish or perish incentivizing shenanigans like "p-hacking", many of those papers are the research-equivalent of spam.


I think Watson does something like this.



Did Watson fail because they were bad at their job or because the problem is much harder than people assumed?


Electronic health care records are not high quality data. They are qualitative, often discretized, and also distorted by fiscal shenanigans.

The best “EHR” data we have, quantitative and minimally biased, are from large genetically diverse animal cohorts like the BXD mouse family.


I think the marketing got ahead of the tech. I would classify that as a business failure.


Watson failed because it's marketing pretending to be a technology.


I wonder how many years that sets back the field. Who will want to invest in something that could end up being Watson 2.0?


It's kind of an ironic twist of fate, and a disgrace, that something like SciHub needs to be hosted/served from Russia in order to avoid getting squished and destroyed by rent seekers' greedy hands.


If anything is a disgrace it's your russophobic comment.

Alexandra Elbakyan, who created SciHub, was born in Kazakhstan (ex-USSR) and later studied in Russia, where she lives now. I'm proud that Russia is one of the few countries on the planet where American extraterritorial law doesn't apply.


I'm not sure the preceding comment was actually russophobic. To me it seems you're mostly in agreement that this content needs to be hosted somewhere safe from the US. It's just that you are proud, because it's hosted in your country, and the previous person is annoyed, because it cannot be hosted in theirs.


But 'disgrace'?



