We need to talk about systematic fraud (nature.com)
325 points by nabla9 42 days ago | 132 comments

There was an interesting comment on a recent In The Pipeline post about the relationship between academia and industry in drug development [1]:

Academics do some good work in target discovery and validation, but not anywhere near as much as they think they do. Discovery in big pharma has to repeat everything, because about 80% of the time, the published result doesn’t hold up. I suspect this is the result of no job security among grad students and post docs, because we can almost always reproduce data from Japan, where asst profs generate the data. (And we can always reproduce published data from other big pharmas, always.)

[1] http://blogs.sciencemag.org/pipeline/archives/2019/02/01/rep...

I'm hesitant to take an anonymous and unsourced internet comment at face value, but assuming it's true, that is quite damning for American academia! It occurs to me that the claimed effect, if it exists, might also be due in part to simple expertise. Or rather a lack of expertise among grad students and to a lesser extent post docs.

Is anyone here able to corroborate the claim made in OP's linked comment? And does anyone know what kind of experience and oversight the researchers doing discovery in big pharma will have?

People I know in big pharma companies have told me they first try to reproduce promising academic results before they invest a lot of money in further research, and that 70%+ of the time they fail to reproduce the results. This came up in conversation years ago when the "reproducibility crisis" first became a hot topic.

Other acquaintances of mine, who are in academia, did not find these results surprising at all. In fact, one added the observation that even when the experimental results can be duplicated, the conclusions of the study are probably still not valid because alternative, more mundane explanations are often overlooked, and the peer review process these days doesn't apply sufficient scrutiny or skepticism toward this.

I don't have any anecdotes on US vs. Japanese research.

I can’t corroborate it except with another anecdote: a medical researcher told me the same thing in passing. She had worked in academia and in industry, and said that industry doesn’t remotely trust any academic work, and that this had been borne out in her experience as well. For example, at the university lab she joined, there was a measurement device which had been regularly used, but which no one knew how to calibrate properly.

I suspect, as you say, it’s partly down to a lack of expertise: being thrown in at the deep end without much guidance on how to get good measurements, both practically and in terms of avoiding statistical traps like cherry-picking, allied to the pressure to find exciting results and journals’ lack of interest in negative results.

I'll echo the same as other commenters.

There are a LOT of issues with reproducibility in Bio [0] at the moment, and in US-based science in general. Others in 'Big Pharma' that I know agree that much of biotech-based science needs to be re-run. Others in the comments section here point out that Mainland China has a history of suspect research. That may have been true 5 years ago, but they are 'up to par' with US-based researchers now. Meaning, the Chinese have gotten more rigorous and the US has severely slipped in quality.

I mean this next statement with full honesty: I don't believe half the labs out there in the US since about 2016 onward. From very private conversations with others, many labs are abusive, fraudulent, 'me-too' landmines, and generally lying to grant funders, donors, and potential/enrolled students in a strange game akin to 'Catch Me If You Can'-style con-artistry.

Major efforts are being taken to 'fix' the system. Me-Too is a perfect example. The Nature paper that is linked is another perfect example. These efforts are very much needed, and those who undertake them should be applauded and emulated. That said, Max Planck's quote describes the speed at which this change will occur, namely, one funeral at a time [2].

[0] https://www.reuters.com/article/us-science-cancer-idUSBRE82R... A bit old, but should get you started on more up-to-date research.

[1] https://www.theatlantic.com/health/archive/2018/09/what-is-f... Food Science, but it illustrates the structure that leads to fraud.

[2] https://en.wikiquote.org/wiki/Max_Planck

> From very private conversations with others, many labs are abusive, fraudulent, 'me-too' landmines, and generally lying to grant funders, donors, and potential/enrolled students

I have a friend who worked in a neurology lab a few years ago who described the PI in a similar way. I had assumed that it was not normal and told her to get the hell away from that lab asap... how sad that this kind of shortsighted idiocy is so widespread!

Why not call a spade a spade? A better title would be "We need to talk about systematic fraud in China".

It's hardly a secret. Any grad student in America already knows this, about the blatant cheating and plagiarism that goes on among Chinese students. But oh no, can't talk about it. 'Coz it's racist y'know.

This is moving the goalposts in a dangerous way. You're suggesting that fraud isn't as big a problem in the US or other countries, when in fact it definitely is. Chinese academics didn't force academics in the USA to fudge data, or play with statistics.

> In a 2016 study in Scientometrics, Byrne and Labbé reported 48 problematic papers, including the 30 papers that had incorrectly identified nucleotide fragments. These were all written by authors from China.

Right, I'm not denying that the story linked was about fraud in China. But nowhere do they state that the problem is limited to Chinese researchers.

The _degree_ of the Chinese problem is astounding. There are bad apples in all cultures, but the rampant situation in China is something else.

Is it out of line with their proportion of researchers?

It is racist! Making generalised judgements about an entire group of people based on their ethnic background is classic racism.

If you're saying "Chinese people are predisposed to cheat and lie" -- that's racist.

If you're saying "The academic culture in China is such that cheating and lying is widespread" -- that's not racist.

I'm pretty sure the post you responded to is making the latter claim.

Blaming "culture" is a huge cop-out and often just a facade for racism. If you say "the financial incentives offered by the Chinese government for research publications encourage rampant fraud" you are getting much closer to the real issue.

"The academic culture in China is such that cheating and lying is widespread"

This statement doesn't "blame the culture". In this sentence, culture means "a set of behaviors". The sentence doesn't speculate about why that (speculated) culture exists, and it certainly isn't saying "Chinese culture promotes cheating". Actually, all it's really saying is, "Cheating and lying is widespread in academia in China". That statement is totally compatible with what you wrote.

When having these discussions, it helps to interpret other people's perspectives charitably. Communication is hard. We don't need to make it harder by assuming that everyone who says something that could be interpreted as racism is actually a racist. Plus, I've found that if I interpret other perspectives charitably, I almost always end up closer to understanding what the other person actually meant.

Wrong. Cultural effects are very real, and they exist independently of material incentives.

Indeed. If this weren't true, then there would be no basis to shape the culture, since culture wouldn't have a shape.

Two groups can be compared on important values like "honesty" independent of group size. The comparison will look like overlapping normal distributions, usually with substantial overlap. Chinese students were famous for cheating at my school, especially in the pre-med biology track. I witnessed it myself. I also witnessed brilliant, hard-working non-cheating Chinese students in math and physics. Yeah, there's something cultural going on but it's a lot more complicated than "the Chinese are cheaters".

Financial incentives are one strong factor in shaping a culture.

The problem is, when you use the term "Chinese" do you include people from Taiwan and Hong Kong? And are you attributing the bad behaviours to the historical Chinese culture that has thousands of years of history or the comparatively recent (last century) rule by the communist regime?

The problem with the parent statement is that it has the typical lack of precision and accuracy of a racist statement.

China's population consists of more than one race. I'm all for political correctness but we need to distinguish between racial stereotyping and criticisms of a nation state.

Moreover, between racial stereotyping and criticisms of certain policies of the current government of a nation state.

To choose the maximally flamebaity example, I might not like the current Israeli administration's policy towards Palestinians, but that doesn't make me an antisemite, or even anti-Israeli.

Wouldn't it be nice if there were a platform, some kind of hub where all research papers could be published and visible to anyone with an interest in the particular field. The platform would allow anyone, be it academics or just field experts, to request a change or addition to the text. Anyone could see these (pull) requests, and the initial author has the power to accept or deny them. This allows peer reviews and corrections to be integrated into the paper, making the whole process more organic. Any change to a paper is logged and tracked, so you could see the origin of a change and its reason. Fraudulent papers could be marked as such by the research community.

I'm being a bit sarcastic here, but a technology such as git with a platform such as GitHub could maybe make the process of publishing and reviewing research papers easier to control. Maybe you have to tailor it a bit to make it easier to work with this use case. Like open source, peers can start adding and reviewing directly at the source, instead of writing a new paper or writing to all of the scientific journals to issue a different statement.

I'm all for research papers being made more open and available, rather than hidden away behind paid-for journals; this is in my opinion one of the worst parts of how knowledge is shared. However, I'm not sure that making the papers actually modifiable through a platform like this will make research more accessible or authentic, at least without changing virtually everything else about how science is done in academic institutions.

If the content of a paper changes over time, it becomes more difficult to reference that paper. You now have to include both the paper and something which identifies which version of the paper you read. This is possible of course, but is a fairly big inconvenience for mostly stupid reasons, such as various citation styles not allowing hyperlinks of any form. There is also the possibility of the content of a paper changing completely, though we should hope that people would use a tool like this sensibly and it wouldn't be an issue.

This proposed platform puts a burden on the authors of a paper to maintain it indefinitely. Of course, you are at least partially responsible for any paper you author, and any legitimate errors should concern you. However, having to keep up with many comments and/or critiques after a paper is published would be very tiring. The metrics by which academics are measured for their employment are heavily biased towards publishing a lot of papers, and having to spend a lot of time curating old work would be detrimental to anybody's career. (These metrics are also a reason that people would rather write a new paper to build on the old, rather than incorporating it into an existing work.) Another issue is papers with many authors: do they all need to agree on a change to have it merged? How often do you want to try to contact 10 people in 4 different timezones to debate minor changes to old research?

I also don't really understand how a system like this would address the problem of fraudulent papers in a better way than is done today, which (as far as I am aware) is academics contacting journals to get the papers retracted.

I don't think it's a bad idea, but it is so far away from how people publish today that many things (journals, metrics, work loads) are against you.

I can't even get coauthors to use LaTeX+git. Word+EndNote+Emails is far more popular, despite how terrible it makes collaboration.

I think the exact tools used are irrelevant to the discussion. If this style of publishing + maintaining older papers became popular, hopefully some better tools would spring up.

For what it's worth, I think tools like ShareLaTeX and the like are much more usable for tracking changes in documents and enabling collaboration, especially with coauthors with little programming experience.

LaTeX and git are great for people who have invested the time to learn it. Any platform that will be successful for research collaboration could include these, but has to make the entry costs absolutely nil.

You could use indexes and web hooks or some sort of pubsub mechanism to auto notify researchers when a specific piece of text they referenced is changed.
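As a rough sketch of the pub/sub half of this idea (all names and identifiers here are hypothetical, invented just for illustration), the subscription registry could be as simple as:

```python
from collections import defaultdict


class CitationNotifier:
    """Toy pub/sub registry: researchers subscribe to passages they cite
    and get called back when the cited text changes."""

    def __init__(self):
        # Maps a passage identifier to the callbacks interested in it.
        self._subscribers = defaultdict(list)

    def subscribe(self, passage_id, callback):
        """Register interest in a passage of some published paper."""
        self._subscribers[passage_id].append(callback)

    def publish_change(self, passage_id, change_summary):
        """Notify every subscriber that the passage was revised."""
        for callback in self._subscribers[passage_id]:
            callback(passage_id, change_summary)


# Usage: a citing author subscribes to the passage they reference.
notifier = CitationNotifier()
received = []
notifier.subscribe("paper-X/section-3",
                   lambda pid, msg: received.append((pid, msg)))
notifier.publish_change("paper-X/section-3",
                        "corrected sample-size calculation")
assert received == [("paper-X/section-3",
                     "corrected sample-size calculation")]
```

In a real system the callback would send an email or post a webhook rather than append to a list, but the registry structure is the same.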

I don't think that the technical challenges of notifying people are relevant here. The very fact that you would have to update your own research because something you referenced changed is kind of insane to consider. You now not only have the burden of maintaining your own research, but also staying up to date with changing prior research, and all of this work is not recognised by the institution that employs you.

If you reference something that turns out to be incorrect due to poor data or whatever, and it gets corrected, surely you would want to update your paper?

For sure, I would want to. But with how institutions currently assess and pay researchers, it would be detrimental to my career to take the time required to understand the change and update my own work, which might involve dredging up years-old data and tools, contacting collaborators around the world (some of whom might have moved completely out of academia and therefore have no interest at all in maintaining the work), getting them to agree on a change, etc. It might not even be possible to update my work: maybe a piece of apparatus would take too long to rebuild, or maybe the change requires that I change my entire experimental approach. It's a lot of work to do research and write a paper, and that's why there is such a massive focus on doing it once and doing it well.

I think it's infeasible for various reasons to have people maintain old work in the way suggested in the original comment. Who is responsible for it, especially if the authors have moved institutions (which happens very regularly, especially for young researchers), or out of research entirely? Perhaps there could just be a shorter window in which the work could be looked at and changed if required, and after that it is frozen forever. But that actually sounds a lot like (a more open version of) the current peer-review-then-publish system.


It sounds like you’re imagining errors propagating through citations. Paper X cites paper Y, which in turn cites paper Z, so if there is an error in Z, then X and Y are, or should be assumed to be, invalid.

It doesn’t work like that. A few citations may be “critical” but a lot of them provide context and credit. Jones et al. first identified this problem. Smith tested several obvious but ultimately unsuccessful approaches. Here, we use the same methods as Wu and Lee. Our results differ from Cantorovich because....

Errors in those papers might make sections of the citing paper superfluous or less interesting but they don’t invalidate it. For ideas and methods, it might not even matter what the ultimate result was....

Yes, and that’s why you’ve got to push for the institutions to change.

It's not a terribly good assumption. With programming and git (or any version control system), everything is submitted to the repository, functional features, tests, data, etc. With scientific papers, the only thing submitted is the final published paper itself. There is a countless amount of data, settings, computer code, etc, that is not included with a scientific paper.

The ability to inspect, comment, and modify a scientific publication doesn't mean much if the underlying work is unavailable.

> There is a countless amount of data, settings, computer code, etc, that is not included with a scientific paper.

And that, in itself, is a problem. How can others identify flaws in the paper if they can't see the data you used?

Open Source software includes all the code you need, not just the final binary executable. Any proposed "Github for Science" needs to do the same.

And any arguments that "the data is expensive to collect and proprietary" can be countered with "code is expensive to write and proprietary, yet open source still works".

Some data cannot be shared due to patient confidentiality.

In fact, almost all medical trials or epidemiological studies use data that can't be shared outside the study.

Surely there are methods of concealing patient identity? We do this with lots of other confidential personal information...

But then the research isn't reproducible...

Hashing identity solves this. All of the data can be anonymized while retaining the linkage of measurements and procedures performed on the same subject.

e.g. We don't know who this guy is, but we do know that every measurement of him was linked to this same enormous random string.
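A minimal sketch of what that linkage could look like. I've used a keyed hash (HMAC) rather than a bare hash, since the space of real patient identifiers is small enough to brute-force otherwise; the key and all identifiers below are made up for illustration:

```python
import hashlib
import hmac

# Hypothetical secret key, held by the study's data custodian
# and never published alongside the data.
SECRET_KEY = b"study-custodian-key"


def pseudonymize(patient_id: str) -> str:
    """Map a patient identifier to a stable pseudonym.

    The same patient always maps to the same string, so measurements
    stay linked, but the mapping can't be reversed without the key.
    """
    return hmac.new(SECRET_KEY, patient_id.encode(),
                    hashlib.sha256).hexdigest()


# Every measurement of the same patient links to the same pseudonym.
records = [
    ("patient-042", "blood_pressure", 120),
    ("patient-042", "heart_rate", 65),
    ("patient-007", "blood_pressure", 135),
]
linked = [(pseudonymize(pid), field, value)
          for pid, field, value in records]
assert linked[0][0] == linked[1][0]   # same patient, same pseudonym
assert linked[0][0] != linked[2][0]   # different patients differ
```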

Only if your data is very limited in depth.

When doing data linking I can usually identify one person in a dataset of almost 1 million from three innocent-seeming pieces of data. Pseudonymisation doesn't really work.
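A quick simulation makes the point concrete (the population size and attribute ranges are invented, but in a plausible ballpark): even with names and IDs stripped, a few quasi-identifiers pin down most individuals.

```python
import random
from collections import Counter

random.seed(0)

# Simulate a "pseudonymised" dataset of 100,000 people: direct
# identifiers removed, but three innocent-seeming attributes kept.
people = [
    (random.randrange(1000),        # postcode area
     random.randrange(1940, 2005),  # birth year
     random.randrange(365))         # birth day-of-year
    for _ in range(100_000)
]

counts = Counter(people)
uniquely_identifiable = sum(1 for c in counts.values() if c == 1)
fraction = uniquely_identifiable / len(people)
print(f"{fraction:.1%} of records are pinned down "
      f"by three attributes alone")
```

With roughly 24 million possible attribute combinations and only 100,000 people, nearly every record ends up unique, so anyone who knows those three facts about you can find your row.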

And any arguments that "the data is expensive to collect and proprietary" can be countered with "code is expensive to write and proprietary, yet open source still works".

These two are absolutely not equivalent. I'm all for open science, but the comparison with open source code is absurd. With the former you have to invest actual money: infrastructure, materials, suppliers, etc., and of course you have to deal with and coordinate those elements. Open source is... just time. There is a cost, but it is just an opportunity cost.

Most large Open Source projects are maintained by paid developers, employed at considerable cost by companies that benefit from their time. This is an "actual money" investment of considerable size.

And how many people are going to look through all the raw data you used and try to reproduce your results from it? And it's not even possible from that data to verify that the experimental setup that was producing the data in the first place was valid and well-executed.

There are famous examples of widely-used open-source software having bad bugs (for example, OpenSSL) which no-one found, because although it was so widely used, no-one was performing careful enough analysis of the codebase. And programmers are actually paid to look at code all day. No-one working in a research institution is paid or recognised for trying to verify someone else's data or check that their research makes sense - this was the job of peer review. (Even peer review is unpaid and unrecognised at many institutions.)

open source is not a sure-fire way of catching all bugs, but it's still better than closed-source. We're talking an improved situation, not a perfect situation.

That doesn't mean there isn't a git repository (or multiple) somewhere with all the code and TeX source for the paper. It's just that these aren't usually published.

Your focus on the TeX source is pretty amusing. Imagine working for a year on a project, building countless experiments that you write code for, edit, annotate in random files or on a notepad, build huge datasets for, spin up and down servers or various equipment. Then, when you finally get a result, you write a nicely formatted document in TeX or Microsoft Word or whatever. That final document represents a tiny fraction of the work and thought you put into it. The ability to mess with that final product without the underlying 99% of thinking and effort that went into it gets you nowhere.

Seriously, this whole git idea shows the level of ignorance surrounding the actual scientific process and workflow. The data files for my work (protein crystallography) are terabytes upon terabytes large. There's no reasonable way to distribute them. They go through multiple layers of data paring. Any final raw data is included in the supplemental material, but it has already been heavily curated and reduced. And also biased.

The solution to this problem is not iteration and revision.

I'm not focusing on the TeX source. I said "code (i.e. simulation and/or analysis code) and TeX source (including the original images for plots and the data those were made from)".

I once suggested the same idea for bills in Congress. They used to stuff little add-ons into the bills for billions of dollars in spending, and you could not determine who did it.

I have a slightly different idea. Any research activity of a scientist should be recorded by them in their own repository. Any activity, like reading a paper, or moments like "I have an idea", or "let's try a small-scale pilot experiment with fellow scientists of mine as subjects", or "the pilot experiment was unsuccessful, but can be modified...".

Any publication should be forked from the main repository, selecting any relevant data from the parent, and in that fork strict procedures of scientific research are recorded and processed, finally ending up as an article.

In this case you can track how an idea was formed and how it was modified to match empirical data. You can track when the methods of data processing were chosen (before gathering the data or after), and so on.

Such a repository can give enough information to decide whether the research is fraudulent or not.

It would also heavily detract from actually writing the paper. It's easy in hindsight to think about the steps it took to get where you ended up, but if you are at the start, there are simply too many intentional and unintentional steps to record everything. I'd much rather have people focus on the actual science than document their steps every few minutes.

An article is a hindsight reflection on what was done. If you are doing research you need to make notes. I personally can't remember what I've read and where I got an idea. I need to write things down, to be able to refer to them later. I log my thoughts on a regular basis, just to be sure they are verbalized and formalized. The goal is not to document every step in a formal manner; the goal is to leave a trail that a curious researcher can track to see how the article was formed.

The current situation is bad because articles do not capture all the information needed to validate the research findings. p<0.05 is not enough for this. Including more information in the article is one way to deal with that problem. But the other is to leave footprints, to create some historical data that could be analyzed. Or not analyzed, if no one needs it. It is like lazy evaluation: just generate some data to allow your followers to reason about what precisely you've done. If they need it, they will research your trail. You do not need to go through the difficult task of writing a high-quality text about every step you take.

The only thing you need to change is to make such notes public. This is coupled with some potential ethical issues, like the privacy of subjects, for example. If I run an experiment and gather some data, I can't just throw that data into public access. It is possible that there are other ethical issues, maybe something about the privacy of the researcher as a person. These issues can make logging research activities a pain.

It's a bit rich for this to be proposed in a software-centric forum when our industry is merrily rebasing its way to clean linear histories that record the story the author wants to tell rather than what actually happened.

That is the idea behind the lab notebook (what was done) + journal (what was thought) system that people are supposed to be following. We only know about Gregor Mendel because of his meticulous notebooks, for instance. My boss taught me the easiest way to write a paper is to start writing it before your first experiment, references and all; it's a great way to find weaknesses in your approach before you go too far in one direction or another. The problem is people get lazy documenting everything, especially when you are trying to leave the lab as fast as possible at the end of the day :)

I am not sure about this... are you talking about experimental research? I am a theoretician. I go through several idiotic ideas before hitting on the right one. I'd be embarrassed to admit my silly, initial attempts before my colleagues.

The advantage of theoretical papers is that the reasoning is a proof/argument developed in the paper itself. With theory, there's less chance for a fraudulent paper, even though the results may have gaps.

> I go through several idiotic ideas before hitting on the right one. I'd be embarrassed to admit my silly, initial attempts before my colleagues.

That's the point. If you published your preliminary silly idea and your colleagues didn't immediately find that it is silly, then you're not sillier than they are. :)

> the reasoning is a proof/argument developed in the paper itself

Even when I studied math I sometimes found that just having the proof is not enough to understand it. Sometimes the only way to understand is to learn how the proof was devised. What on earth was the author thinking when he came up with some trick in the proof? Some tricks look like magic: you can see they are true from a logical standpoint, but they make no sense. You cannot understand how they lead to a QED. You can verify the proof by logic, that's not a problem, but you are unable to understand it. And if you are unable to understand it, then at least you are unable to reproduce the proof at the exam; the only way is to learn it by heart. But I never trained my memory to learn by heart; it is really hard and annoying for me, and moreover I do not believe in learning by heart: to master a subject you need to understand it, not memorize it.

My math experience taught me that if I cannot follow the thoughts of another mathematician by just reading his proof, it means that he is smarter than me, so it doesn't matter how clumsily he behaved while searching for his proof; he outsmarted me already.

But yes, I was talking about experimental research.

> I go through several idiotic ideas before hitting on the right one. I'd be embarrassed to admit my silly, initial attempts before my colleagues.

I think this stigma against making mistakes is a big problem, as is the ego-investment. We're all stumbling around in the dark, and we shouldn't be afraid to admit that.

It’s common practice for scientists to keep logbooks for this exact reason. This is already done.

As I understand it, those logbooks are not available to other scientists, and if a group of highly qualified scientists is doing meta-research, combining the results of a dozen articles (and rejecting two more dozen from their study), they cannot use the logbooks behind those articles to reason about them.

It's true that more open-access logbooks might improve things. But even if you had the logbook, you still wouldn't have the custom equipment, experimental apparatus, experimental subjects, and oversight throughout the experiment which you would need to verify that the logbook itself has data worth a damn.

At some point, you have to trust other professionals working in your field to do their work well. Or rather, decide who to trust.

Maybe you are right, but I had a chat with a person who did meta-research, and as I understand it they dug into articles looking for the methodology, and if they found the methodology good enough they would incorporate that article into their meta-research.

It implies trust in the good intentions of others. But not the blind trust that anything done by a scientist was done well.

If a logbook can be used to verify that the data processing methods were picked before the first experimental subject came to the lab, then that is already a good sign. It would make it much harder to do some p-hacking like [this](https://io9.gizmodo.com/i-fooled-millions-into-thinking-choc...).

I'd rather see a reproduction platform where someone can use a paper and create a new paper, or fork some parts of it with credit. The original paper should be immutable.

Subversion would be the obvious choice, not git. The whole point of a DVCS is that it avoids having a canonical repo, whereas in science we very much want to have a canonical version of the truth (otherwise there would be no such thing as fraud, if everyone is allowed their own version of the truth), and Subversion defaults to having a canonical version.

I agree. As I said, the tools and technology should probably be tailored for the specific use case of scientific papers. It would be nice, however, for these papers to evolve online, instead of the written-in-stone papers that are published today in scientific journals. A small note instructing the reader that a certain theory in the original paper has been disproved or changed would already help.

Peer review is not about reaching consensus though; it's about verifying methodologies and reproducibility. A system like that could easily become a tyranny of the mediocre; see for example Wikipedia.

There are certainly initiatives going this way. E.g. LiveCOMS in my own field. The crux of the matter is whether these will gain traction or not.


I would drop the concept of "paper", and use a unique, gigantic graph of hypotheses and experiments that keeps growing as research goes on.

Everyone would contribute to the same graph, without duplicate data or unaccepted pieces of proof/knowledge.

It is hard to make happen in competitive fields because good data is an expensive advantage over other labs.

It is a great idea. However, we do not have a distributed free internet throughout the world.

Journals became more and more obsolete with the rise of the Internet, and only managed to retain their position in the academic world due to their citation metrics system, which is used to judge academic worth, and the enormous number of articles that they've amassed over the last centuries and that academics need access to. I think it will take at least another 30 years until we are able to stop relying on closed-access journals, possibly longer, since there are still countless high-profile articles that only get published in those journals (though some universities and funding agencies have changed their policies to demand that the results of publicly funded research be freely available).

In physics and many other fields, Arxiv.org has basically replaced journals as the primary way of getting the latest research results. Unfortunately, many high-impact-factor journals like Nature forbid you from publishing a preprint of your article there (at least that was the case in 2012) so that they can defend their claim of publishing only "cutting-edge" results (which, due to their long review processes, are often 12 months old by the time they're actually published).

The most ridiculous thing is that journals today do little more than act as an intermediary between the authors of a paper and the reviewers, who work for free and are colleagues from the same field. That, and maybe styling the papers a bit, which is (IMHO) unnecessary in most cases as the source LaTeX version usually provides a very good design already.

There are certainly problems with journals, but they do provide one benefit: they are a source of trust. If you read something in a major publication, you can trust it more than if it was just posted to a website. With the internet, reliable sources of trust are becoming more important, not less. I'm not condoning the current extremely expensive journals that prohibit access to the public, but a completely open model where anyone from anywhere can post anything isn't the answer either.

> If you read something in a major publication, you can trust it more than if it was just posted to a website.

I'm not sure that's true. I've studied a fair bit of statistics and can usually find flaws in most social-science studies I read. So I trust an article combined with a few hundred HN comments a lot more than anything published in a scientific journal without comments: if something is wrong, someone in the comment section will usually find it. Even on Reddit, if you wade through it, you can usually find nitpickers adding good discussion on articles with a few hundred comments.

There is still a much lower acceptance rate at a good journal than at a bad journal that might accept anything and everything that fits within the page limit. You might still get a faulty paper, of course, but large journals have the benefit of being able to choose.

What creates trust is (IMHO) not the journal but the peer review process, and I think we can find a way to do that without invoking a middleman. Arxiv.org already started doing this and I'm confident that a community-managed review system will easily be able to beat the journal's current systems in quality, speed and reliability. It works in other domains (see e.g. Wikipedia, Stackoverflow) so why shouldn't it work in the academic world?

The peer review process is certainly a very large component to creating trust, but it's not just the process that's in place at the moment, it's the past track record and reliability. An untrustworthy source cannot instantly turn into a trustworthy one by adding quality peer review, it needs to develop a track record, which takes time.

Wikipedia is a perfect example. It has only relatively recently come to be seen as trustworthy, after many years of tweaking and improving their edit process. When Wikipedia first started, few took it seriously. Likewise, when an open journal starts adding more stringent publication requirements, it too will not be immediately trusted.

The point is that quality itself is not sufficient. You need a reliable level of quality over a relatively longer period of time.

A good way to deal with the issue is for reputable academics in a field to start new journals, and do all the work they did for the old paid ones for free, but this time for a free end-product. This exists: I know of one that's been going a decade or more, hosted and started at USC, and there are probably a lot more.

Even better, to support this, have an open-source freeware Journal CMS and publishing system for academics to use as a plug and play solution.

Why can’t the peer review system itself just evolve to embrace the full power of the internet? The main reasons these fraudulent strategies are at all effective is because (1) the review process is private and slow and (2) studies aren’t independently reproduced as a part of review. The feedback loop is inherently too slow. To speed it up, one must fight the inertia of senior (tenured) faculty and major funding vehicles.

It’s laudable that these researchers have invested so much in fighting fraud, but this post shows that the offenders will essentially face no negative consequences for it. The researchers worked very hard for hardly any return or justice.

Perhaps it’s worth re-inventing science. Open reviews. A network to connect labs that ensures findings are reproduced. We could even re-do much of basic, academic science in the interest of improving exposition and shedding antiquated formulations of theory.

What if, in the face of this fraud, we simply reject tradition wholesale? The only tradition should be an honest, reproducible study of Nature.

My problem is that the "full power of the internet" often ends up looking like Facebook, Reddit, or Twitter, where the fastest and loudest repliers have the biggest effect.

Also, independently reproducing studies is super important, but that's going to take HUGE amounts of money, which someone will have to pay for. It will have to come from the top; most governments want to see new research, not reproductions, and scientists are guided by that direction.

"Full power of the internet" doesn't have to be like Facebook or Twitter. Maintaining some sort of decent moderation and having a wider "professional" culture (can't think of a better term here) can allow for some truly insightful interactions in the vein of HackerNews or some better known subreddits.

There was a good talk about this at 35C3 [0]. Definitely worth the watch.

[0] https://media.ccc.de/v/35c3-9744-inside_the_fake_science_fac...

Edit: You can change the language to English by clicking on the settings cog in the video.

Thanks :)

This author and her collaborator developed a tool to flag potential issues in papers. It appears to be FOSS, and based on a well-understood piece of bioinformatics code. "Nature" doesn't run material from cranks and misfits, so hopefully this will start a serious discussion.

Going forward, should journals, or the peer-reviewers they engage, test new papers with this tool? Sure, it turns up false positives and false negatives, so its results cannot be interpreted mindlessly. It is also possible to game a tool like this. But it seems obvious some authors are already gaming the system.

Undergrad teachers use software tools to check student writing for plagiarism. Shouldn't information-based science use similar tools?

We hackers can contribute to this kind of effort by encouraging and supporting open-source inspectable tools. Maybe there's even a business opportunity around maintaining and using the tools.
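As a purely illustrative sketch of what an inspectable tool in this spirit might look like (this is not the actual Seek & Blastn pipeline; the reference table, sequences, and gene names below are invented for demonstration), a minimal checker could validate a paper's claimed nucleotide reagent against a local reference of what the sequence actually targets:

```python
# Toy paper-flagging sketch (NOT Seek & Blastn itself): given a claimed
# gene target and the nucleotide sequence reported for it, check whether
# the sequence is valid DNA and whether it matches the claim.
# REFERENCE is a hypothetical lookup table; a real tool would query a
# sequence database such as those BLAST runs against.

REFERENCE = {
    # invented sequence -> gene it "actually" targets
    "ATGGCGTACGTTAGC": "GENE_A",
    "TTGACCGGATCCATG": "GENE_B",
}

def is_nucleotide(seq: str) -> bool:
    """A valid DNA sequence contains only the bases A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")

def check_claim(claimed_gene: str, seq: str) -> str:
    """Classify a paper's claim as ok / mismatch / unknown / invalid."""
    if not is_nucleotide(seq):
        return "invalid sequence"
    actual = REFERENCE.get(seq.upper())
    if actual is None:
        return "unknown sequence"
    return "ok" if actual == claimed_gene else "mismatch"

print(check_claim("GENE_A", "ATGGCGTACGTTAGC"))  # ok
print(check_claim("GENE_A", "TTGACCGGATCCATG"))  # mismatch
```

The real tool's output, like this toy's, still needs a human in the loop: an "unknown" or "mismatch" result is a flag for investigation, not proof of fraud.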

I think lots of people go into science for the wrong reasons. That's why you'll never really be able to do anything about this problem unless you have a way to screen them out.

> Finally, efforts to police the literature need to be valued as highly as the publication of original data. It is more than ironic that systematic fraud is itself understudied.

Wonder if doing something like establishing "Bounties" (aka Security Bug Bounties in software) would start things in the right direction?

Sounds like there'd be no shortage of targets and low hanging fruit though.

Also, who'd fund it?

One single, meaningless test: I am not a scientist and do not want to draw any general lessons, but I took a document about an ALS gene therapy that I wrote for private use and ran it through the tool mentioned in the article. It found a distant article about a tumor protein. The two documents are indeed unrelated (mine was not published) and have very little in common, only the word "protein".

I did it again with:

Dietary salt promotes neurovascular and cognitive dysfunction through a gut-initiated TH17 response


And it found:

Knockdown of NOB1 expression inhibits the malignant transformation of human prostate cancer cells

doi: 10.1007/s11010-014-2126-z
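A false positive like this can arise whenever a text-matching step keys on shared vocabulary rather than genuinely shared content. A minimal sketch (purely illustrative; this is not how the tool works internally, and the two "abstracts" are paraphrased stand-ins) of why a single common word produces a near-zero similarity score under a simple Jaccard measure:

```python
# Jaccard similarity over word sets: two unrelated abstracts that share
# only one word ("protein") score close to zero, so a match reported on
# that basis alone is a false positive.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

doc1 = "gene therapy trial for ALS patients protein expression"
doc2 = "tumor suppressor protein knockdown in prostate cancer cells"
print(round(jaccard(doc1, doc2), 3))  # 0.067: only "protein" is shared
```

A matcher that thresholds on a score like this would discard such pairs; one that reports any nonzero overlap surfaces them as noise.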

Apologies: I'm being obtuse.... Which tool?

This one:


Citation: Informatician Cyril Labbé at Grenoble Alps University in France and I have developed a tool, Seek & Blastn (go.nature.com/2hsk06q), to identify such papers on the basis of wrongly identified nucleotide sequences.

So far, our work has uncovered dozens of papers and resulted in 17 retractions, with several investigations pending (see Nature 551, 422–423; 2017).

The scientific process typically involves making inferences and drawing conclusions from observations, and the results often include biases and, in some cases, even ill-constructed hypotheses caused by assumptions and imperfect information.

One way to reduce fraud, and also increase the accountability of individuals and organizations, is to secure raw data at its source, so that the veracity of the data subsequently used to support downstream conclusions can be easily verified and attributed. I'd say a bottom-up approach here is more holistic, as it incentivizes people to produce high-quality research that future work can then reliably build on.

Why does the author assume that this isn't a norm among researchers? The sheer number of irreproducible studies we have should indicate that this behavior is way more common than people would like to admit.

Every time I see a headline with the wording "we need to talk about x," I just ignore it. Often it's just clickbait for something that is not really up for serious discussion and should rather be titled "I need to tell you what to think." I'm writing this to reduce the usage of it. And no, I will not write an article with that title about that subject.

However, this is an article written by a researcher in a particular field, published on the website of the best-known and most reputable journal in that field, trying to get a message out to members of that same field, about a topic that nobody wants to discuss. It seems like a pretty appropriate use of "We need to talk about ..." to me.

> Finally, efforts to police the literature need to be valued as highly as the publication of original data.

I don't think even the author would agree with this if it were about anyone other than them.

What about fraud in machine learning and artificial intelligence research? Just to get a high-paying industry job, these professors produce fraudulent research and claim to be AI experts. I think the next bubble will burst when these fraudsters sitting in industry fail to deliver anything, because companies are foolishly spending so much money on fraudulent research without any productive results. Look at Facebook AI Research: they can't even control fake news, and their researchers are trying to solve "general artificial intelligence" :)

Is it fraud or disinformation?

I think back to the recent group of academics who wanted to prove that most Grievance Studies journals don't properly review submissions. One of the papers they got accepted had chapters from Mein Kampf rewritten with terminology from modern social-cause rhetoric.

Instead of accepting that there are issues with the entire process, these professors are now facing disciplinary hearings for submitting false research. In a way I can understand this; they did, after all, submit papers they knew weren't academic (in bad faith?) in order to prove a greater point. But does that justify what they did?

Please don't take HN on classic flamewar tangents. It leads to the same few hot controversies taking over everything else, and the quieter, less predictable discussions that we actually want here are the ones that tend to get burnt out of the picture as a result.


> But does that justify what they did?

Yes, it does.

These journals are an absolute joke, make a mockery of any serious attempt to advance knowledge and understanding, and deserve to be pilloried. Their standards of rigour and review are shockingly poor and as such they do not provide a suitable vehicle for the publication of genuine research, or for measured consideration of the content they do publish.

Just this week I ran across this, where a couple of the academics concerned were interviewed by Joe Rogan late last year:


Fascinating, horrifying, and entertaining in equal measure.

For more background, there's also this on wikipedia:


Thank you for your clarity. We need more of this.

I think it's surprising that on a site such as HN the narrative of these authors is so readily accepted without any second thought.

The claim is that they just made up trash, submitted it to high-quality peer-reviewed journals, and got it accepted. The reality is much less outrageous.

Of the 20 papers, 4 were published and 3 more accepted for publication. They spent a year doing this. The authors note that they submitted the papers to "top journals in the field", leaving out the fact that all of those top journals rejected the 20 papers and they had to revise the papers and send them to shoddier journals instead.

The review quoted by the authors was from an unsure grad student doing their first review of a paper. Said student later said they knew the paper was crap, but still tried to be constructive.

The assertion of the authors is that this stuff getting published means the whole field is worthless. Which is of course ridiculous. Medicine didn't become obsolete when a journal published a paper with an abstract ending in "The fact that these last sentences appear in the published paper tell you, dear reader, exactly how seriously the editorial process has been taken". Computer science didn't implode when three MIT students made an AI to generate random papers and 120(!) were accepted by the reputable IEEE and Springer. Back then, the blame was correctly put on the publishers and the general flaws of the current scientific process instead.

So, as people have pointed out, it's ironic that their essay itself is incredibly scientifically unrigorous. No controls, unsourced claims, biased authors and conclusions that aren't supported by the data.

>The assertion of the authors is that this stuff getting published means the whole field is worthless. Which is of course ridiculous.

Perhaps, but we're undoubtedly in the midst of a crisis of confidence in the process of scientific publication. From the reproducibility crisis to the Diederik Stapel affair to the paper mills in China, almost every field is being severely undermined by weak or deliberately fraudulent research.

Frankly, I've almost given up on the scientific literature as a source of useful information, because I can't really trust most of it. The tools and processes of scientific publication just aren't fit for purpose. My Google Scholar and PubMed searches are overwhelmed with obviously junk papers from pay-to-play journals. Citation is no longer a credible metric because of citation rings. Established high-impact journals have reasonable quality standards, but they still get bamboozled with junk and a paper doesn't come with the supporting evidence I would need to evaluate its validity in any meaningful way. The signal-to-noise ratio of academic publishing is collapsing, and it's bringing down trust in the scientific method with it.

We need major reform and we need it now, because the credibility of science as a whole is under dire threat. The AllTrials campaign has made significant gains in reducing publication bias in medical research, but we need to go much further. We need a publish-by-default model, where everything associated with a research project is published and failure to do so is treated as academic misconduct. We need gold open access, we need full datasets, we need source code, we need rough drafts and lab notes and reviewer's comments. Equally importantly, we need to fix the cultural pathology of publish-or-perish that's driving the proliferation of poor-quality research.

“The assertion of the authors is that this stuff getting published means the whole field is worthless.”

I don’t get this at all from the authors. My take is that they call out serious flaws in the rigor of how papers are reviewed. I don’t think it’s reasonable to extend that to say the field is worthless. I did read through many papers in the referenced journals and it seems really surprising to me. I’d love to see some of the research reproduced. But I’m not even sure how you could scientifically prove a field worthless.

> I don’t get this at all from the authors. My take is that they call out serious flaws in the rigor of how papers are reviewed. I don’t think it’s reasonable to extend that to say the field is worthless.

I think it's rather undeniable that the authors' essay was not a good-faith attempt at criticizing the peer-review process, but an attempt to discredit certain fields that they didn't like, even by their own admission. I mean, the title of the essay alone should make this very clear. Here are some quotes nonetheless:

> We spent that time writing academic papers and publishing them in respected peer-reviewed journals associated with fields of scholarship loosely known as “cultural studies” or “identity studies”

> As a result of this work, we have come to call these fields “grievance studies”

> We undertook this project to study, understand, and expose the reality of grievance studies, which is corrupting academic research

> The biggest difference between us and the scholarship we are studying by emulation is that we know we made things up.

> these fields of study do not continue the important and noble liberal work of the civil rights movements; they corrupt it while trading upon their good names to keep pushing a kind of social snake oil onto a public that keeps getting sicker

[Note also that this is essentially an academic way of phrasing the same things you'll hear from people like the far-right Stefan Molyneux (on whose show one of the authors has been a guest several times).]

Of importance here is that, in their telling, it's not scientific practices that are "corrupting academia" but the fact that these fields exist at all. I'm sure you can find more if you really want to. Compare this to, say, the "reproducibility in Psychology" study, which unsurprisingly doesn't dedicate half of its text to explaining why the authors think psychology is bad.

Don't be mistaken: The whole thing was an attempt to cash in on the general crisis the scientific process is facing to score some easy political points and not a neutral and constructive attempt at improving the quality of research in the field. Or rather, their proposal at improving the quality would involve getting rid of the fields that they don't personally like.

>In a way I can understand this; they did after all submit papers they knew weren't academic (in bad faith?) in order to prove a greater point. But does that justify what they did?

A thousand times yes. Exposing cargo cults that attempt to co-opt the hard-earned cachet of genuine scientific inquiry for their own ideological ends is exactly what is needed to keep science functional.

Yeah but it's a thin line to walk.

If you don't do it, then you let all the pseudo-science out there masquerade as real science, which is bad.

But if you overdo it, you feed into the distrust of proper science too; see Brexit and the "we're done listening to experts" soundbite, or any of the anti-vax crowd.

In my experience, people who hold profoundly anti-scientific beliefs have always been that way; whether their purported trigger is Jenny McCarthy or an odd alignment of Jupiter, it's hard to know if the trigger is even relevant. The problem is that they don't have the skillset to make rational choices. And I just don't see a Sokal-style hoax making a whit of difference in their minds, as there are a million other things that could do the same.

In other words, their feelings of anti-science came first. Whatever reason provided is ex post facto confabulation.

Meanwhile, the sciences themselves need this kind of scrutiny to stay scientific.

I'm open to being wrong. But I think it is a mistake to put too much weight on what people who never use science anyway will think.

Of course it is justified: you can only prove that kind of abuse by making it glaring; otherwise, the "secrecy" of the peer-review process becomes the excuse.

These kinds of stunts are pretty annoying, though. I know with the original (who was it again?), it was essentially a case of the academic leveraging his reputation, the editors being a bit like "this seems kooky, but sure, you're the expert," and then him turning around and pretending he'd fooled everyone.

It was by James Lindsay, Peter Boghossian, and Helen Pluckrose. They submitted papers with fictitious names and institutions. They were not leveraging their own reputations (besides the publication's, if they were accepted, and a Richard Baldwin who lent his identity for some of them.) They spent time reviewing and citing existing work, so they could legitimately claim to be building on it, but then try to go in the most absurd directions possible, so absurd that any self-respecting academic publication ought to immediately reject it with disgust, and see what happened. 7 out of 20 such papers were accepted, 4 were published, they received 4 invitations to peer-review other papers, and 1 "gained special recognition for excellence."


If I published something in any of the innumerable scientific 'journals' that bombard my spam box, I could probably publish whatever I liked. That doesn't mean I can say science is methodologically flawed.

The humanities, no doubt, deserve criticism, but I honestly don't see the appeal of stunts designed to 'prove' that a certain field is not worth the time of day, carried out by people who don't even understand the field at the most basic level. How would they know? Why would they care? It strikes me as embarrassing for everybody involved: embarrassing to publish something without properly vetting it, embarrassing to dismiss something as nonsense without actually having the measure of it, embarrassing to be so enthusiastic about this kind of 'proof', which so obviously combines ignorance with arrogance.

There is no such thing as grievance studies. Critical theory has nothing to do with identity. The whole thing is about as close to the mark as Pat Buchanan's critique of feminism, and about as clever. In short, depressing.

According to the little bit I've read about this, some of the publishing journals are/were fairly well regarded (which would be a self-serving claim by this group.) The "success" of those stories says more about the individual publication than the entire fields they represent. At most, it is a critique of some niches of otherwise valid fields. At best, it's a nice reminder to pursue nothing but objectivity. There are still worthy results from the fields involved, maybe even these publications, and there will continue to be more.

Amongst the embarrassing amateurishness of it all, and the obvious conclusions, I found it interesting how commonly reviewer comments used the word "exciting" in response to the "Going in Through the Back Door" paper.

> I honestly don't see the appeal of stunts designed to 'prove' that a certain field is not worth the time of day

You answer your own question. While the methods may claim to be "defending science", the goals are political.

... and they were only discovered to be frauds by the twitter feeds that mock social sciences research.

Oh the humanit[ies]!

Peer review is supposed to catch cases like Sokal's. As an editor, if you are not qualified to read the paper, then you should try to contact someone who is, or suggest publication in a different journal.

But the problem goes deeper than that. Sokal intentionally included meaningless sentences, self-contradictory sentences, and scientifically false sentences. On none of these was he called out by journal editors or reviewers.

>meaningless sentences, self-contradictory sentences, and scientifically false sentences

To provide a little explanatory context here, the humanities has always had a great deal more tolerance for bad writing than other fields. Kant introduced the main point of his Critique of Judgment in a footnote. Hegel loved self-contradictory sentences. Somebody like Heraclitus writes a lot of stuff that, if perhaps not meaningless, certainly looks meaningless.

I don't know what that says about the entire history of philosophy, but anyway, it does say something about a small group's willingness to publish something that had some shitty prose.

You say "essentially a case of the academic leveraging his reputation" as if the fact that someone could use reputation rather than content to get accepted and published in a journal isn't an equally scandalous act in and of itself.

Seriously, I can't be the only one who sees that?

It is scandalous to anyone not normalized to it - unfortunately masses of people tend to devolve into those sorts of cliques and social clubs for personal advancement.

It is essentially corruption, but you are the 'crazy' one if you point it out, because 'things are done that way'.

I'm not an academic, but aren't there standard practices for doing that sort of ethically questionable research? Namely, an ethics committee that you go to for a green light before proceeding.

There's usually an institutional review board that you go through when using research subjects, but you don't need to do that to publish something like a new algorithm you developed in a CS journal.

In this case, the researchers probably should have gotten IRB approval, because there were test subjects: the journal editors, peer reviewers, and probably journal readers.

But I'm assuming they just skipped that process and said there were no human subjects.

There are a number of fields that seem to operate on motivated reasoning over objective truth.


I also believe what they did was justified. They did actual science and revealed that a number of publishers within this field demonstrate such a strong preference for virtue-signaling papers that they have lowered the bar for critical thought in both content and review.

One of the honeypot papers won an award. That shows you how obvious the publisher bias is, and how obvious it must have been to these people before they decided to do something about it.

Not only did these individuals risk their careers to help save a dying system with this project, they held themselves to a pretty high standard of execution.

We place lots of limits on the amount of heroism individuals are allowed to display, and I think that’s mostly beneficial - but sometimes we need heroes.

It doesn't justify what they did, and what they did has very little value, for four reasons:

1) There are ethics panels for a reason; many people have run unacceptable experiments on other people. Academics don't get to skip the ethics checks because they feel like it.

2) Their experiment was poorly designed. On their first attempt, their papers got rejected, so they polished and resubmitted them. What is their exact scientific theory?

3) They were clearly trying for controversy. Maybe some parts of Mein Kampf actually are valid, useful statements to make. The terribleness of the Nazis doesn't mean everything Hitler ever said or wrote should automatically be rejected from journals.

4) Maybe Grievance Studies actually reviews better than everyone else. Any good study needs a control.


> On their first attempt of their papers got ejected, so they polished and resubmitted them.

That they found success with this process is damning. That you seem to think otherwise is probably also damning.

Some of their papers involved making up experiments. Peer review can't catch problems like entirely fabricated data, and outright lying.

That scientific publishing is easily broken by lying is a serious problem which affects all of academia.

If they were lying about experiments about sensible things, that would be one thing. They were lying about experiments about insane things.

> 2) their experiment was poorly designed. On their first attempt of their papers got ejected, so they polished and resubmitted them. What is their exact scientific theory?

They've outlined this. There is a specific Grievance Studies nomenclature and orthodoxy which the first papers did not adhere to. When they revised and resubmitted, they followed the orthodoxy and got through.

That IS the scientific method. You run an experiment, observe the results. You tweak a few variables and see what happens under those conditions. It’s what distinguishes their work from Grievance Scholarship, where you know the conclusion you want before you start.

> Maybe some part of Mein Kampf are actually useful valid statements to make.


Let's pick a random bit of Mein Kampf (I just scrolled to a random page in the middle in a randomly chosen translation). This looks, to me, like something that wouldn't be out of place in any discussion of "Fake News". This is the problem with using bits of Mein Kampf, it feels designed to shock and offend. If I got this text below into an article in a major newspaper, would that make that newspaper fundamentally broken / untrustworthy, because it happens to come, word for word, from Mein Kampf?

Quote begins:

Just in journalistic circles one usually prefers to call the press a 'great power' of the State. As a matter of fact its importance is truly enormous. It cannot be overestimated ; it is indeed actually the continuation of the education of youth in advanced age.

Thereby one can divide the readers as a whole into three groups:

First, those who believe everything they read;

Secondly, those who no longer believe anything;

Thirdly, those who critically examine what they have read and judge accordingly.

Of course. "We quoted Mein Kampf and were praised for it" is a nice piece of rhetorical engineering.

Monstrous arguments are often assembled from tiny units which may be unobjectionable individually.

The implication that the journal editors failed to notice the evil taint of Nazism from short extracts is nonsense.

But the whole hoax is entirely political. The authors disapprove of what they call "Grievance Studies" and they wanted to call that out.

Which is fine as far as it goes, but as others have pointed out, it's not as if the SNR in other disciplines is perfect.

There's a huge amount of bullshit everywhere in academia - from theoretical physics to medicine to economics to computer science. [1]

Going Sokal proves nothing at this point. We already know peer review doesn't work very well, because academia is now driven by financial and economic pressures more than by a noble quest for knowledge and wisdom.

Ironically the "grievance studies" people have done a better - albeit still slanted and ineffectual - job of pointing this out than the entire peer review system has.

[1] http://news.mit.edu/2015/how-three-mit-students-fooled-scien...

Good talk about xx community’s intention.
