Hacker News

I will never forget the day a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted. The primary author is now a hotshot professor.

My whole perception of academia and peer review changed that day.

Edit to elaborate: like many of our institutions, peer review is an effective system in many ways but was designed assuming good faith. Reviewers accept the author’s results on faith and largely just check to make sure you didn’t forget any obvious angles to cover and that the import of the work is worth flagging for the whole community to read. Since there’s no actual verification of results, it’s vulnerable to attack by dishonesty.

When I was in school pre-university, this type of "crap, we can't get what we wanted to happen, so let's just fiddle around with it until it seems about right" was very common. I was convinced this was how children learned, so that as adults they wouldn't have to do things that way.

When I got into university and started alternating studying and work, I realised just how incredibly clueless even adults are. The "let's just try something and hope nothing bad happens" attitude permeates everything.

It's really a miracle the civilisation works as well as it does.

The upshot is that if something seems stupid, it probably is and can be improved.

In the lyceum where I studied, there was one lab in Physics where the book that accompanied the lab was deliberately wrong. We were told to perform an experiment that "should" support a certain conclusion, but actually neither the "correct" conclusion nor its opposite could be drawn, because the flawed setup measured something slightly different. A lot of students (in some groups, all students) fell into this trap and submitted paperwork with the "correct" conclusion according to the book.

A CS-specific analogy might be to give the students a compiler that has a bug in it, such that the students' code is deliberately mis-compiled. The standard of evidence to believe that the compiler is buggy is much higher than the standard to believe that my code is buggy.

A lab exercise like that could really just be selecting for chutzpah (feeling charitable) or arrogance (less charitable).

Well, that's more evil than my lab. A more direct equivalent in CS would be an algorithm description in the booklet with one subtly wrong and uncorrectable step (e.g. proven using some well-hidden circular reasoning). The expectation would be that a good student finds the mistake instead of submitting an implementation of the flawed algorithm, or, for an even better match with my case, proves that the supposed output cannot be obtained from the inputs at all.

I had an appendectomy just before the final first-year Modern Physics lab and had to come back in to do a make-up lab. Sure enough it was the slightly-messed-up lab where the results should in theory look exponential but come out linear. I, naturally, drew an exponential curve through the points. Lab instructor decided to grade it right there before I left and tore a strip off me.

Very valuable lesson, although it sure did suck at the time.

What does tore a strip off me mean?

And for those ALSO thinking "what does THAT mean":

He got criticized for it.

I've come to think things work as well as they do largely because a whole lot of what people do has no effect either way. I see so much stupidity where the only saving grace is that it is directed into pointless efforts that won't be allowed to do any real damage.

When you start talking about millions of people, damage gets subtle.

Robocall scams are very high on the profit:human misery scale, but they're hardly going to end civilization. Pollution, corruption, theft, etc. all make things worse, but we never see the better world without such things, so it all feels very abstract. Of course you need to lock your doors, etc.; that’s just the way things are.

A bio professor of mine said something that stuck with me: “life doesn’t work perfectly, it just works.”

It has to work well enough to… work… and reproduce. That’s it. It’s not “survival of the fittest.” It’s “survival of a randomized subset of the fit.”

There’s even a set of thermodynamic arguments to the effect that systems are unlikely to exceed such minimum requirements for a given threshold. For example, if we are visited by interstellar travelers they are likely to be the absolute dumbest and most dysfunctional possible examples of beings capable of interstellar travel since anything more is a less likely thermodynamic state.

So much for Star Trek toga wearing utopian aliens.

> if we are visited by interstellar travelers they are likely to be the absolute dumbest and most dysfunctional possible examples of beings capable of interstellar travel

Otoh, they would be aware of that, and they might have spent some time improving how genes (or whatever they have) and evolutionary selection work for them, so that, say, their species over time becomes brighter and brighter than what's actually needed. If they wanted to do that.

How do you improve your genes? Removing obvious deleterious disease mutations is easy, but as soon as you try to “go where there are no roads” you hit the same combinatorial challenge as evolution.

Also more intelligence does not equal better ideas. The world is full of crazy or amoral people with apparently very high IQs. Your average flat Earther probably has an above average IQ.

Improvement is a war against entropy and n^n^n^… combinatorics any way you slice it.

> How do you improve your genes?

Slowly across hundreds and thousands of generations.

By adding evolutionary pressure, for what they want -- it'd be up to those space traveling aliens to decide -- they can change their species, generations into the future.

> Improvement is a war against entropy ...

Reasoning in that way, humans would not have gotten brighter than chimpanzees. There's been evolutionary pressure for humans to get brighter, and it would be possible for you (I mean humans), or the space travelers, to add artificial evolutionary pressure.

Anyway never mind all this, maybe talking about space travelers and the humans and their genes isn't the best way to spend the day. Have a nice day btw

The problem is the incentives. To do well, you must publish. To publish, you must have a good story, and ‘we tried this and it didn’t work’ is not one.

So after a certain time spent, you are left with a choice of ‘massaging’ the data to get some results, or not and getting left behind those that do or were luckier in their research.

"We tried this and it didn't work, and here's why we think it didn't" should be among the best stories to publish. Looking back, I learned more from stuff that didn't work, or rather from figuring out why it didn't, than from success.

> or rather figuring out why it didn't

That can end up being just as time-consuming as doing the research to begin with. Often there is no time and no money to go back and do that. If your 'budget' is 6 months, you're going to spend 6 months trying to get your experiment to work. You're not going to 'give up' after 4 months and spend 2 months putting together a "why we failed" paper.

However, the advantage is, if it is published, it can decrease the likelihood of multiple other attempts to try the same “unique” (and wrong) approach.

Even if something did not work, you still need a story for it to be readable.

For example, I imagine that archeological work is extremely high impact if excavation efforts lead to the discovery of an ancient city.

An archeology paper would probably be less interesting if it said “we dug this area, found nothing”.

If one were to judge those two papers, obviously the discovery paper is higher impact than the negative result.

"We chose this area because we believed it should be archeologically interesting based on XYZ. However, we searched through ABC methods and found nothing there" would be valuable for the future. Maybe XYZ isn't as good as we thought, maybe ABC couldn't find it. Maybe now some other sod in the future won't try that location.

Not as valuable as a discovery, but very far off zero value. Yet the reward in academia would be near-zero.

Tell that to the people who pay for research and the metrics they use to continue feeding those who perform research.

Ultimately, this is the problem.

Distracting from the main point, "let's just try something and hope nothing bad happens" (trial and error) is precisely the reason civilization made it this far :)

And in fact evolution. The thing to remember is, in many cases where something bad did happen the evidence got buried or eaten.

> It's really a miracle the civilisation works as well as it does.

I think this all the time.

I've just finished a 3-hour reading session of an evolutionary psychology book by one of the leading scientists in the field. The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable from those that were not.

Still, reading your comment makes me despair. It plants a nagging doubt in my mind, "how many of these zillion studies cited that are actually replicable?" This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.

What are the solutions here? A big incentive-shift to reward replication more? Public shaming of misleading studies? Influential conferences giving more air-time for talks about "studies that did not replicate"? I know some of these happen at a smaller-scale[1], but I wonder about the "scaling" aspect (to use a very HN-esque term).

PS: Since I read Behave by Sapolsky — where he says "your prefrontal cortex [which plays critical role in cognition, emotional regulation, and control of impulsive behavior] doesn't come online until you are 24" — I tend to take all studies done on university campuses with students younger than 24 with a good spoon of salt. ;-)

[1] https://replicationindex.com/about/

Evo psych is questionable to me for more basic reasons. It seems full of untestable just-so stories to explain apparent biases that are themselves hard to pin down or prove are a result of nature, not nurture.

It’s probably not all bullshit but I would bet a double digit percentage of it is.

I'm conscious that this is a flame-bait topic. That said, no, dismissing the whole field as "questionable" is callous. Yes, there are many open questions, loaded landmines, and ethical concerns in evolutionary psychology research. But there's also copious evidence in its favour. (Reference: David Buss et al.)

Many people might spare themselves at least some misery by educating themselves about evolutionary psychology, including the landmines and open questions.

Psych is questionable for basic reasons. It is a humanities science. Its purpose is not to figure out the world but to change it: figure out how to end poverty, for example.

Therefore it is not well suited to figure out the world.

You should treat all of it with extreme helpings of salt.

Can't this be applied to wide swaths of hard sciences as well? Lots of scientific work overlaps heavily with engineering, which is all about changing the world.

Also, I don't think ending poverty is a major stated goal of psychology research. . .

> “how many of these zillion studies cited that are actually replicable?” This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.

I think the problem is much bigger than simply a binary of whether it's replicable or not. It’s extremely easy to find papers by “leading experts” that have valid data with replicable results where the conclusions have been generalized beyond the experiments. The media does this more or less by default when reporting on scientific results, but researchers do it themselves to a huge degree, using very specific conditions and results to jump to a wider conclusion that the results do not actually support.

A high-profile example of this is the “Dunning-Kruger” effect; the data in the paper did not show what the flowery narrative in the paper claimed to show, but there’s no reason to think they falsified the results. Some researchers have reproduced the results, as long as the conditions were very similar. Other researchers have tried to reproduce the results under different conditions that should have worked according to the paper’s narrative and conclusions, but found that they could not reproduce them, because there were specific factors in the original experiment that were not discussed in the original paper’s conclusions -- in other words, Dunning and Kruger overstated what they measured such that the conclusion was not true. They both enjoyed successful academic careers and some degree of academic fame as a result of this paper that is technically reproducible but not generally true.

To make matters worse, the public has generally misinterpreted and misunderstood even the incorrect conclusions the authors stated, and turned it into something else. Almost never in discussions where the DK effect is invoked do people talk about the context or methodology of the experiments, or the people who participated in them.

This human tendency to tell a story and lose the context and details and specificity of the original evidence, the tendency to declare that one piece of evidence means there is a general truth, that is scarier to me than whether papers are replicable or not, because it casts doubt on all the replicable papers too.

I fully agree. Thanks for the excellent articulation of the layered complexity involved here, including a chilling example.

> The book is extremely competently written, and is awash with statistics on almost every page, "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable, and those that were not.

Out of curiosity, what's the title of the book?

Evolutionary Psychology by David Buss[1].

[1] https://www.routledge.com/Evolutionary-Psychology-The-New-Sc...

One approach that can be adopted on a personal level is simply changing the way one thinks. For example, switch from a binary (true/false) method of epistemology to trinary (true/false/unknown), defaulting to unknown, and consciously insist on a high level of certainty to reclassify an idea.

There's obviously more complexity than this, but I believe that if even a relatively small percentage of the population started thinking like this (particularly, influential people) it could make a very big difference.

Unfortunately, this seems to be extremely counter to human nature and desires - people seem compelled to form conclusions, even when it is not necessary ("Do people have ideas, or do ideas have people?").

Honest question: could you go ahead and publish an article titled "Failure to replicate 'Top Institution's Best Paper Award'"?

Yes. A famous recent example was stress-induced stem cells:


Yes, but you have to convince your readers that you did a more careful and meticulous job than 'Top Institution's Best Paper Award' did. After all, a failure to replicate only means that one of you is wrong, but it doesn't give any hint as to who.

At least one of you. If disproving the result turns out not to be that simple, you might fall into the same trap.

Just don't forget that the guy who wrote the Best Paper will probably review your articles in future.

Maybe all papers should have a replication score?

> a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted.

Isn't that the moment where you try even harder to falsify the claims in that paper? You already know that you'll succeed so it wouldn't be a waste of time in your effort.

The problem with experimental results is that they are difficult to replicate. In software you can "git clone x.git && cd x && make" and replicate the correct or incorrect results. In hardware, it's more difficult.

The main problem is that even if you reproduce their experiment, they can claim that you did some step wrong: perhaps you are mixing it too fast or too slow, or the temperature is not correctly controlled, or one of your reagents has a contamination that destroys the effect, or they magically realize that some reagent of theirs is the important one.

It's very difficult to publish papers with negative results, so there is a high chance it will not count in your total number of publications. Also, expect a low number of citations, so it's not useful for other metrics like citation count or h-index.

For the same reason, you will not see publications of exact replications. A good paper X will be followed by almost-replications by other teams, like "we changed this and got X with a 10% improvement" or "we mixed the methods of X and Y and unsurprisingly^W got X+Y". This is somewhat good because it shows that the initial result is robust enough to survive small modifications.

It's harder to publish negative results.

Even good negative results? If it's a problem to publish a negative result debunking an award-winning paper, then that is a problem.

Yes, and in most cases, no one will cite negative results. The positive results continue to be cited even long after debunked.

This is an example which did get cites:


But despite the high visibility, you can see the large number of papers published based on the original myth.

And this refutation doesn't have great methodology (but other ones do). It's mostly cited due to strong language used.

Hence the reproducibility crisis.

It is not possible, in principle, for peer review to protect against fraud, and it was never intended to. And this is OK. Usually, if a result is very important and forged, other groups try to replicate it and fail; after some time the original dataset (which needs to be kept for 10 years, I think) will be requested, and things proceed from there.

Dropping the assumption of good faith in peer review would make academia more interesting; the only way would probably be for the peer reviewer to go to the lab and be shown live measurements. Then check the equipment...

I wonder if it's a better system to just hire smart professors and give them tenure immediately. The lazy ones in it just for the status won't do any work, but the good ones will. Sure, there will be dead weight that gets salaries for life, but I feel like that's a lesser problem than incentivizing bad research.

The problem isn't just the scientists, it goes all the way up. Let's say we implement your system. Who decides how many 'smart professors' the Type Theory group gets to hire? What if the Type Theory and Machine Learning departments both want to hire a new 'smart professor' but the Computer Science department only has money to hire one more person?

One reasonable approach might be to look at which group has produced the 'best' research over the past few years. But how do you judge that in a way that seems fair? Once you have a criterion to judge that, people will start to game that criterion.

Or, taking a step up: the university needs to save money. How do you judge whether the Chemistry department or the Computer Science department should have its funding cut?

No matter how you slice it at some point you're going to need a way for someone to judge which of two departments is producing the 'best' research and thus deserves more money, and that will incentivize people to game that metric.

There is no shortage of resources to provide for every person who wants to devote their life to discovering something valuable for all of humankind.

We aren't short on food, shelter, clothes, tech, etc - those are all solved problems.

The problem that isn't solved is stupid people sitting in charge of decisions they don't have the brain make-up to comprehend or manage, making pretend they know what they're doing, holding people far superior to them hostage.

Smart isn't the biggest criterion for success as a professor. The PhD degree is a good filter because it trains and tests research aptitude, work ethic, ability to collaborate, ability to focus on a single problem for a long period of time, and others.

One problem is PhD degrees are too costly to those who don't get academic or industrial success from them. But as long as talented people are willing to try to become a professor I don't see the system changing.

Who is to judge the merit of their talent? Shouldn’t their results speak for themselves? And pray tell, what are the results of academia in the age of the digital revolution, where there is no obligation to complete a university education with the knowledge of its mathematical scientific foundation?

I think many more are drawn to professorship for a sense of status, ie prestige. It shows in their overwhelming mediocrity, eg the failure of economics to progress to a biologically scientific paradigm.

Who is going to decide whether a professor is smart?

The other smart professors. Which is exactly how it worked in the now distant past.

Which is exactly how it still works in many places. You have to be co-opted and have the vote of your peers. This doesn't do anything to ensure those elected are able. It ensures they are politically desirable.

Won't those professors then just hire people that agree with their pet theories?

Those professors in the distant past flourished in a more spiritual age. They did not treat themselves as professionals nor had a sense of “career”.

So what you are saying is, peer review.

Honest question: how do we fix this? The obvious solution, prosecuting academics, has an awful precedent attached to it.

Not my turf but I'll chime in.

In the past people who did science could do so with less personally on the line. In the early days you had men of letters like Cavendish who didn't really need to care if you liked what he wrote, he'd be fine without any grants. That obviously doesn't work for everyone, but then the tenure system developed for a similar reason: you have to be able to follow an unproductive path sometimes without starving. And that can mean unproductive in that you don't find anything or in that your peers don't rate your work. There'd be a gap between being a young researcher and tenured, sure.

Nowadays there's an army of precariously employed PhDs and postdocs. Publish or perish is a trope. People get really quite old while still being juniors in some sense, and during that time everyone is thinking "I have to not jeopardise my career".

When you have a system where all the agents are under huge pressure, they adapt in certain ways: take safer bets, write more papers from each experiment, cooperate with others for mutual gain, congregate around previous winners, generally more risk reducing behaviour.

Perhaps the thing to do is make a hard barrier: everyone who wants to be a researcher needs to get tenure after undergrad, or not at all. (Or after masters or whatever, I wouldn't know.) Those people then get a grant for life. It will be hard to get one of these, but it will be clear if you have to give up. Lab assistants and other untenured staff know what they are negotiating for. Tenured young people can start a family and not have the rug pulled out when they write something interesting.

I agree with your diagnosis of the problem, but don't think your solution is a good way forward - immediately after undergrad is way too early to be evaluating research potential and would just shift the hyper competitiveness earlier.

A better solution would be to stop overproducing PhDs. We could reduce funding for PhD students and re-direct that towards more postdoctoral positions - perhaps even make research scientist a viable career choice?

Overproducing PhDs seems to be a necessary aspect of how research is conducted in the current university. Most serious lines of work are pursued by a PhD student or Postdoc and advised by a Professor. They need a critical mass of PhD students which is definitely a much larger number than 1 per professorship. This is especially true in fields where industry jobs aren't readily available.

I think that's a huge part of the problem though - we've made it so the only way we can get research done is by training a new researcher - even though there's already plenty of trained researchers who are struggling to find a decent job.

I'm suggesting that we re-direct some of the funding for training PhD students into funding for postdoctoral positions (via either fellowships or research grants). Professors would still get their research team, but rather than consisting mostly of untrained PhD students, they'd have a smaller, but more effective team of trained researchers.

Isn't that the case simply because professors are expected to be highly productive, to the extent where it is not possible to meet the bar without offloading the work to students and switching to a full-time manager?

> I agree with your diagnosis of the problem, but don't think your solution is a good way forward - immediately after undergrad is way too early to be evaluating research potential and would just shift the hyper competitiveness earlier.

Immediately after undergrad is how it used to work in the golden days of science, more or less.

If the competitiveness is the problem maybe tenure should be a lottery that you enter once at a fixed stage, preferably before you're expected to start publishing in journals.

I think we had a far smaller number of people going to university back in the "golden days of science" - not sure you can really compare.

A tenure lottery seems like an extreme option - there has to be a middle ground between what we have now and something entirely random.

The system that produces PhDs isn’t that bad. It is a good way to create a research portfolio useful for employment in the private sector. We need to pay less attention to the title though - this is not a distinguishing achievement for life.

Correct, it's not a laurel to rest on.

The act of producing a doctoral dissertation usually leaves something of a mark on one's outlook, skills, etc. I claim it is a _distinguishable_ achievement for life.

Yet the principle of pursuing knowledge is not for pecuniary interests. So your judgment demonstrates the temporal shift of the Western University towards rubber stamping people’s vocational aptitude. This leads to corruption, of course.

This is one of the many reasons I like Universal Basic Income. Having UBI would let researchers take risks and have something to fall back on if needed, and could reduce some of the pressure.

I don't think UBI works well here because in most fields the level of success that the precarious group experiences in industry is substantially higher than a guaranteed minimum. A lot of people have identity aspects tied to their university affiliation and don't want to stop working with the university in part for that reason.

No matter what level we put UBI at, it will almost certainly be less than a third of what a researcher salary would be. Also it's not just about the money. Losing your job means losing access to a lab, access to data, access to grant money and basically everything you need to actually do research.

The solution is to publish data first, not “papers”, and assign it a replication score: how many times it was verified by independent research. The paper can follow with the explanation, but citations will no longer be important; what will matter is the contribution to the replication score (this will also work as an incentive to confirm others’ results).
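To make the proposal concrete, here is a minimal Python sketch of how such a replication score might be tallied. Everything in it (the `ReplicationRegistry` class, its methods, and the rule that a different team name counts as "independent") is a hypothetical illustration, not an existing system.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReplicationRegistry:
    """Hypothetical registry of published datasets and their independent confirmations."""

    # dataset id -> team that originally published the data
    owners: dict[str, str] = field(default_factory=dict)
    # dataset id -> set of *other* teams that reproduced the result
    confirmations: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, dataset_id: str, team: str) -> None:
        if dataset_id in self.owners:
            raise ValueError(f"{dataset_id!r} is already published")
        self.owners[dataset_id] = team
        self.confirmations[dataset_id] = set()

    def confirm(self, dataset_id: str, team: str) -> bool:
        """Record a confirmation; returns False if it doesn't count toward the score."""
        if team == self.owners[dataset_id]:
            return False  # authors can't confirm their own data
        before = len(self.confirmations[dataset_id])
        self.confirmations[dataset_id].add(team)  # a set ignores repeat confirmations
        return len(self.confirmations[dataset_id]) > before

    def score(self, dataset_id: str) -> int:
        """Replication score: number of distinct outside teams that confirmed the data."""
        return len(self.confirmations[dataset_id])

    def credit(self) -> Counter[str]:
        """How many confirmations each team has contributed (the incentive to replicate)."""
        return Counter(team for teams in self.confirmations.values() for team in teams)


reg = ReplicationRegistry()
reg.publish("best-paper-data", "lab-a")
reg.confirm("best-paper-data", "lab-b")  # counts
reg.confirm("best-paper-data", "lab-a")  # ignored: same team as the authors
print(reg.score("best-paper-data"))
```

Checking only that the team name differs does nothing to establish real independence, so a sketch like this would need some stronger notion of "independent" before it could resist gaming.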

I think it would be gamed just like the current system. Instead of citation rings you just get replication rings.

If someone gets a contradicting result, the replication score of the entire ring can be nullified or, in the case of intentional data manipulation, negated.

But you have the same basic problem as now - you’d need some sort of science police to control it, which goes against the scientific process. Essentially it’s a problem of establishing trust in an untrusted system. Putting it that way actually makes it sound like a blockchain problem. Maybe there could be some incentive system to replicate work based on smart contracts, but I don’t know how you could ensure the replicating parties to be independent.

Scientific progress today heavily depends on the financial support of society, so as a whole it cannot be completely decentralized and independent. People want to know how their money is spent and want guarantees that science will not create something awful. This means that policing of science is an inevitable and important part of the system. It is not a question of whether we need science “police”, it is a question of what it should look like. Today it is decentralized: someone maintains the list of outlets where publishing counts toward the citation index, there are ethics committees and scientific boards, lawmakers regularly say what can and cannot be done, etc. How this would change under a new system of incentives we can only imagine: it could be a good or a bad thing, but as long as the system remains democratic, all problems should be easy to fix.

This seems like the right answer.

Don’t (credible) journalists have an honour system of getting at least three sources for a story?

Can’t we make researchers get at least two more confirmations from separate teams for something far more important?

A key function of scientific publication is to inform other researchers in the field about potentially interesting things as quickly as reasonable. Getting "two more confirmations from separate teams" is a very high bar, as it's not just about asking a source, it's asking someone else to do all the same work again. Not only do we not require it before publication, we don't expect it to happen for the vast majority of publications ever. Important studies get replicated, but most are never repeated. A partial explanation of the original article's observation is the (very many!) papers that don't have many citations and don't fail to replicate because nobody cared enough to put in the work to try.

If publication required two more confirmations from separate teams, that would mean (a) doing the work in triplicate, so you get a third of the results for the same effort; (b) the process would take twice as long, as I spend a year doing the experiment, then someone else can start and spend a year doing the same experiment, and only then does it get published; (c) there's a funding issue - I have somehow got funding to spend many months of multiple people on this, but who's paying the other independent teams to do that?; (d) it's not a given that there are two other teams capable of doing the exact same research, e.g. if you want to publish a study on the results of an innovative surgical procedure, it's plausible that there aren't (yet!) any other surgeons worldwide who are ready to perform that operation, which will come some time after the publication; (e) many types of science really can't get a separate confirmation - for example, we have only one Large Hadron Collider, you can't re-do archeological digs, event-specific on-site sociological data gathering can't really be repeated, etc.; so you have to take the data at face value.

What you describe is absolutely right; it is important to have this kind of communication. If publications were only a means of communication, that would serve the purpose and wouldn't be a problem. The problem is that they are considered to have a second purpose: to create scientific reputation, on the basis of which society allocates funds and prioritizes research. The original article illustrates how wrong this approach can be, substituting good storytelling for the ability to produce scientific facts.

Maybe research that cannot be replicated ought not be pursued? Aren’t there better directions for a society’s calorie outputs?

There aren't many credible journalists left. Maybe like 5.

Somewhere in there I see a blockchain pitch.

Having a scientific blockchain looks like a decent idea... but of course it will not suffice and will be gamed. The real causes of the mess are the complexity of the world compared with our minds and tools, and the lack of epistemological understanding in our society, institutions, and culture. Science can't be more than a nice and useful collection of heuristics. Otherwise it is just the religion of Scientism lurking around and pretending to read God's mind. Metarationality concepts could offer an exit from the inevitable mess.

What do you mean by 'metarationality'? It's a term I've never seen before and I'm curious about it.

Game theory might be better suited.

I personally love this comment as a quintessence of startup mentality.

Ok, I got 12, 18, 45. Does anyone want to verify my results? If so, I'll write up a paper describing what they mean...

Hopefully it is clear that that data is useless without some written text explaining what it means. Given that for hundreds of years the accepted way of presenting that explanatory text was by writing papers, I don't see any reason to abandon that. Tweaking our strategies for replication (after a description of the experiment has been published!) and reputation doesn't seem to contradict that.

I'm not sure prosecuting academics is particularly obvious: you'd need to prove malicious intent (rather than ignorance), which is always difficult.

For me a better solution would be to properly incentivise replication work and solid scientific principles. If repeating an experiment and getting a contradictory result carried the same kudos as running the original experiment, then I think we'd be in a healthier place. The same goes for the 'scientific grind work' of working out mistakes in experimental practice that can affect results and, ultimately, our understanding of the universe around us.

I think an analogy with software development works pretty well: often the incentives point towards adding new features above all else. Rarely is sitting down and grinding through the litany of small bugs prioritised, but as any dev will tell you, doing that grind work is just as important, otherwise you'll run into a wall of technical debt and the whole thing will come tumbling down.

Open source and Free Software are (despite it being a cliché for programmers to over-apply them) a good model to compare with.

You have big companies making billions with the work of relatively poorly paid nerds. But as soon as you make it possible to claim all the profits of the work, you get a whole class of people whose job is to insert themselves as middlemen and ruin it for everyone, both customers and developers.

So basically the aim is to limit the degree to which you can privately profit from science, and expand the amount of science you can easily build on. You still get enough incentives for progress, the benefits accrue to society as a whole, and competition and change are enabled without powerful gatekeepers controlling too much in their own interests.

I really don’t know.

One perspective is that, “knowledge generation wise,” the current system really does work from a long term perspective. Evolutionary pressure keeps the good work alive while bad work dies. Like that [Top Institution] paper: if nobody else could reproduce it, then the ideas within it die because nobody can extend the work.

But that comes at the heavy short term cost of good researchers getting duped into wasting time and bad researchers seeing incentives in lying. Which will make academia less attractive to the kind of people that ought to be there, dragging down the whole community.

This is a recent HN thread and Post you might find interesting.



Due to career and other reasons, there is a publish or perish crisis today.

Maybe we can do better by accepting that not everyone can publish groundbreaking results, and that's okay.

There are lots of incompetent people in academia who later rise to senior positions and decide your promotions by citation counts and how many papers you have published. I have no realistic ideas how to counter this.

> Honest question: how do we fix this?

We need to create a new social institution of Anti-Science, which would run on different incentives, tied to the number of articles refuted. No tenure, no long-term contracts. If an anti-scientist wanted an income, they would need to refute science articles.

Create a platform for holding scientific debates between scientists and anti-scientists, so that a scientist has the ability to defend his/her research.

No need to do anything special to prosecute, because science is very competitive, and the availability of refutations would inevitably be used to stop the career progression of authors of refuted articles.

This seems like a pragmatic and workable idea. We could even have the same type of thing for journalism and "facts" in general, it would be a step up from the current tribal meme/propaganda war approach we rely upon.

Data and code archives, along with better methods training.

Data manipulation generally doesn't happen by changing values in a data frame. It's done by running and rerunning similar models with slightly different specifications to get a P value under .05, or by applying various "manipulations" to variables or the models themselves for the same effect. It's much easier to identify this when you have the code that was used to recreate whatever was eventually published.
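To make the "rerun until p < .05" mechanism concrete, here is a minimal Python sketch under loudly simplifying assumptions: there is no real effect at all, each extra "specification" is modelled as a fresh independent outcome measure, and the p-value uses a normal approximation to Welch's t-test rather than an exact t distribution. The function names and parameters are hypothetical, chosen only for illustration. It shows how reporting the first analysis that clears the threshold inflates the false-positive rate.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def two_sided_p(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation to Welch's t-test (assumption: n is around 50 or more)."""
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    z = (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_specs, n_trials=2000, n=50, alpha=0.05, seed=0):
    """Fraction of simulated null experiments in which at least one of
    n_specs analyses reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        for _ in range(n_specs):
            # No true effect: both groups come from the same distribution.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            if two_sided_p(a, b) < alpha:
                hits += 1
                break  # stop at the first analysis that "works"
    return hits / n_trials


if __name__ == "__main__":
    print("1 pre-specified test:", false_positive_rate(1))
    print("up to 5 tries:       ", false_positive_rate(5))
```

With a single pre-specified test the rate should sit near 5%; allowing five tries should push it toward roughly 1 - 0.95^5 ≈ 0.23. Real specification searches reuse the same data, so the tries are correlated and the inflation is typically smaller than in this independent-outcome model, but it does not go away, which is also why registering the analysis in advance helps.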

Registering the methods/details before performing the experiments is another technique that is used.

Sure, but often there are perfectly valid reasons to change your methodology half way through a project when you know a lot more about the thing you are trying to do than you did before you started.

I don't think prosecution is the right tool, but if we were going down that road, limiting it to material misrepresentations would fit the anti-fraud standard for companies. Just drawing dumb, unpopular, or 'biased' conclusions shouldn't be a crime, but data tampering would fall within the scope. Not a great idea, though, as it would add a chilling effect, lawyer friction and expenses, and still be hard to enforce for little direct gain.

I personally favor requirements that call for bundling raw datasets with the "papers". Data storage and transmission are very cheap now, so there isn't a need to restrict ourselves to just text. We should still be able to check all of the thrown-out "outliers" from the datasets. An aim should be to make the tricks for massaging data nonviable. Even if you found your first data set was full of embarrassing screw-ups due to doing it hungover and mixing up the step order, it could be helpful to get a collection of "known errors" to analyze. Optimistically, it could also uncover phenomena that scientists thought were their own screw-ups, like, say, the cosmic background radiation being taken for mere noise that wasn't really there.
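A minimal sketch of what shipping the raw data alongside the exclusion rule could buy a reader, assuming a hypothetical rule function published with the dataset: the analysis reports the statistic both with and without the dropped points and lists exactly which points were dropped, so nobody has to take the "outlier" removal on faith. The function name and the example numbers are illustrative only, not real data.

```python
from statistics import fmean


def mean_with_exclusions(values, is_outlier):
    """Report the mean of all raw values, the mean after applying the
    exclusion rule, and the excluded values themselves."""
    kept = [v for v in values if not is_outlier(v)]
    excluded = [v for v in values if is_outlier(v)]
    return {
        "mean_all": fmean(values),
        "mean_kept": fmean(kept),  # raises StatisticsError if the rule drops everything
        "excluded": excluded,
    }


# Hypothetical raw measurements with one suspicious value.
raw = [4.1, 3.9, 4.0, 4.2, 9.7]
report = mean_with_exclusions(raw, lambda v: v > 8.0)
print(report)
```

Here the two means are 5.18 (everything) and 4.05 (after exclusion), and the single dropped value is listed explicitly. Publishing both numbers makes the effect of the exclusion visible, and if the dropped value later turns out to be a real phenomenon rather than a screw-up, it is still in the record.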

Paper reviewing is already a problem but adding some transparency should help.

Leveraging prestigious papers to win grant proposals is where they need to get them. Citations aren't what gets you a job or tenure at an R1 research school; it's the grants that the high-impact papers help you win.

You don't have to convict people of full-on fraud. If you are caught using an obvious mistake in your favor or using a weak statistical approach, the punishment can be that you are only allowed to apply for grants alongside a supervisor/co-PI/etc. whose role is to prevent you from following that "dumb" process in the future.

We could use public funding to do the work OP tried to do.

Something like a well funded ten year campaign to do peer review, retrying experiments and publishing papers on why results are wrong.

I have a co-worker who had a job that involved publishing research papers. Based on his horror stories, it seems like the most effective course of action is to attack the credibility of those who fudge results.

With added bounty for discovering bad faith.

The single biggest impediment to "fixing this" is that you haven't identified what "this" is or in what manner it is broken.

There will always be cases of fraud if someone digs deeply enough into large institutions. That doesn't actually indicate that there is a problem.

Launching into changing complex systems like the research community based on a couple of anecdotes and just-so stories is a great way of not actually achieving anything meaningful. There needs to be a very thorough, emotionally and technically correct enumeration of what the actual problem(s) are.

A couple of anecdotes is a very disingenuous way to frame the replication crisis. Heavily cited fraudulent research impacts public policy, medicine, and technology development. This means it's everyone's business.

The problem you're describing there is a public policy one, not something to do with the scientific community. Public policy should be implemented with a trial at the start and a "check for effectiveness" step at the end because there is no way to guarantee the research it is being based on is accurate. Statistically, we expect a big chunk of research to be wrong no matter what level of integrity the scientists have.

"Statistically, we expect a big chunk of research to be wrong no matter what level of integrity the scientists have" - that's the actual problem under discussion here.

Research is heavily funded because people believe it's something more than a random claim making machine. You say governments should assume research is wrong and then try to replicate any claim before acting on it. But you end up in a catch 22: if the research community is constantly producing wrong claims there's no reason to believe your replication attempt is correct, as it will presumably be done by researchers or people who are closely aligned.

Additionally inability to replicate is only one of many possible problems with a paper. Many badly designed studies that cannot tell you anything will easily replicate. A lot of papers are of the form "Wet pavements cause umbrella usage". That'll replicate every single time, but it's not telling you anything useful about the world. Merely trying to fix things with lots of replication studies thus won't really solve the problem.

Research is far better than a random claim making machine even if some of it has errors that have caused the replication crisis. It's easy to overstate the level of the problem even though it's fairly severe at this point.

"Wet pavements cause umbrella usage" is something where I'd want to see your specific examples because it's easy to get a correlational study of that nature but very hard to design a causal one. The correlational studies are usually accurate and often useful for other research.

I would argue the whole framing of the "replication crisis" is another example of the problem with "overselling" research results. Yes, there is a problem with some research in some areas of science not being replicable. However, the vast majority of research in many fields does not have this problem. Framing this as a "crisis" overstates the problem and gives the impression that the majority of research can't be replicated.

By waiting until scientists address this? Note that the 'replication crisis' is something that originated inside science itself, so, despite there being problems science has not lost its self-correcting abilities. The scientists themselves can do something by insisting on reliable and correct methods and pointing it out wherever such methods are not in use. It is also not like there are no gains in doing this. Brian Nosek became rather famous.

The replication crisis is not being addressed. It's being discussed occasionally within the academy, but a cynic might wonder if that's because writing about the prevalence of bad papers is a way to write an interesting paper (and who is checking if papers about replication themselves replicate?). It's been discussed far longer and more extensively by the general public but those discussions aren't taken seriously by the establishment, being as they are often phrased in street terms like "you can find an expert to tell you anything" or "according to scientists everything causes cancer so what do they know?". And of course the higher quality criticism gets blown off as mere "skepticism" or "conspiracy theories" and anyone who tries to research that is labelled as toxic.

So a lot of people only notice this in the rare cases when someone within the academy decides to write about it. This can make it seem like science is self correcting, but it appears in reality it's not. When measured quantitatively there is no real improvement over time. Alvaro de Menard has written extensively on this topic and presented data on the evolution of P values over the last decade:


Additionally as he observes at the end of his essay, the problems are due to bad incentives, so the only true changes can come from changes to incentives. However those incentives are set by the government. Individual scientists cannot themselves change the incentives. The granting agencies are entirely oblivious to the problems and the scale of their ambition is in no way equal to the scale of their problem:

"If you look at the NSF's 2019 Performance Highlights, you'll find items such as "Foster a culture of inclusion through change management efforts" (Status: "Achieved") and "Inform applicants whether their proposals have been declined or recommended for funding in a timely manner" (Status: "Not Achieved") .... We're talking about an organization with an 8 billion dollar budget that is responsible for a huge part of social science funding, and they can't manage to inform people that their grant was declined! These are the people we must depend on to fix everything."

Scientists with a proven track record should have life-long funding of their laboratory without any questions asked, so they can act as they want without fear of social repercussions. Of course some money will be wasted, and the question of determining whether a track record is proven is still open, but I think that's the only way for things to work (except when the scientist himself has enough money to fund his own work).

I think this would be a positive step, but to play devil's advocate, what happens when this superstar scientist retires? If I'm a researcher in his lab, does my job just disappear? If so, I'm still going to feel pressure to exaggerate the impact of my research.

I've been spending a lot of time on 'bad science' as a topic lately (check my comment history or blog for some examples). I think what you're proposing is the opposite of what's required.

Firstly, the problem here is not an epidemic of scientists who feel too financially insecure to do good work. Many of the worst papers are being written by people with decades-long careers and who lead large labs. Their funding is very secure. They are doing bad work anyway for other reasons, sometimes political or ideological, more often because doing bad work results in attention, praise and power. Or sometimes because they don't know how to explain their chosen question, but don't want to admit that scientifically they failed and don't know where to go next.

Secondly, as you already realized your proposal relies on identifying which scientists have a proven track record, but the whole problem is that science is flooded with fraudulent/garbage claims which are highly cited ("proven") and which were written by large teams of supposedly respectable scientists at supposedly respectable institutions. Any metric you can invent to decide who or what has a proven track record is going to be circular in this regard. To Rumsfeld the problem, we are surrounded by "unknown knowns". You say this is an open question but to me that's a fatal flaw.

So the problem is actually the inverse. You say at the end, well, scientists who can fund their own work are an exception. Obviously scientists don't need to do this in most cases; they can also be funded by companies. Most computer science research works this way. Better CPUs and hardware are developed almost entirely by companies. AI research has been driven by corporate scientists, and so on. In contrast, academic funding comes primarily from government agencies that distribute money according to the desires of academics. This means a tiny number of people control large sums of money, and they are accountable to nobody except themselves. There are no systems or controls on academic behavior except peer review, which is largely useless because the peers are doing the same bad things as everyone else.

Viewed from an economic perspective academia is a planned reputation economy. The state is the source of all resource allocation decisions (academics being effectively state employees in most fields). There's also a deeply embedded Marxist worldview: universities have no working mechanisms to detect fraud, because of an implicit assumption that deep down when market forces are gone everyone is automatically honest and good. The hierarchy is stagnant; the same institutions remain at the top for centuries. A good reputation lets them select the people with the reputation for being smart (e.g. by school grade), so that reputation accrues to the institutions, which lets them keep selecting intake by reputation and so on. Supposedly Oxford and Cambridge are the best UK universities, they always have been, and they always will be. In a competitive, free market economy they would face competition and other institutions would seek to figure out what their secret is and copy it, like how so many companies try to copy the Toyota Way. In science this doesn't happen because there's nothing to copy: these institutions aren't actually different.

This implies a simple solution, just privatize it all. It would be wrenching, just like it was when the USSR transitioned to a market economy, just like it was when China (sort of) did the same. But one thing the 20th century teaches us is that you can't really fix the problems of a planned economy by tinkering with small reforms at the edges. The Soviets weren't able to fix their culture with glasnost and perestroika. They eventually had to give up on the whole thing. Replacing the current reputation economy with a real economy, with all the mechanisms that economic system has evolved (markets, prices, regulators, court cases, fraud laws etc), seems like a more direct and obvious approach to making things better, even if it may sound extreme.

Oh hey, Mike Hearn! I've long been a fan of yours in Bitcoin. It's good to see you're interested in 'bad science' lately as well -- this is a topic I've also been working on for the last N years along with Bitcoin. I hope we get to interact more in the future. :)

My envisioned solution is similar to yours, here. But rather than "privatize science", which I think most people will interpret as "move to industrial research", my rallying cry is a little more like "hey scientists, stop depending on public funding, let's find creative ways to get the science done."

I also like to point out that money is often not the missing factor as much as community. This has always been true. Mendel discovered genetics by experimenting on pea plants in his monastery garden. It cost him very little to do it, and he only stopped the research when his community told him to stop wasting time on peas and get back to the important accounting work that impacted the church's politics at the time.

You might think that maybe science was cheap in the past, but that today you need lots of money, to get the lab equipment, etc. However, science always has a cutting edge of cheaply evaluable questions. We recently hosted a DIY Synthetic Biologist (currently on the homepage of https://invisible.college) who showed the actual costs of his work, and his laboratory equipment was far, far, cheaper than the "cost" of his time. We can get far more science done with "amateur scientists" (remember that "ama" means love, and an amateur scientist is one doing science for love) by creating a scientific community outside the institutions for interested parties to work together, pool their brainpower and resources, and come up with great novel work.

And if anyone else agrees with me on this, please let me know so we can join forces. I'm toomim@gmail.com, and am doing work on invisible.college.

Hello! Absolutely, drop me an email any time you like.

I absolutely agree that a lot of science can be done very cheaply. Some of the most impactful papers were written by people who weren't in an institutional framework, even in the modern era (Satoshi being an obvious example). Additionally, it seems most of the really problematic fields are ones where the budget gets dispersed over a large number of people writing very cheap, low-budget papers, hence millions of social science papers with tiny sample sizes.

I'm a big supporter of industrial research though. Many great papers come out of industrial labs. Modern computing is practically defined by such research. The big advances all seem to come from big corporate labs (Xerox PARC, Bell Labs, Google, DeepMind, IBM, Sun, Microsoft, etc). The research is powerful because it's funded by people who expect some sort of meaningful results and supervise the work to ensure it doesn't go completely off the rails. Academic institutions have developed this totally hands off attitude that makes research more or less unaccountable to any standard beyond "will it get published", which in turn can be rephrased as "are the claims interesting".

Great! Thank you for the invitation! :)

> The big advances all seem to come from big corporate labs

That's an interesting claim, and I'd encourage you to find some statistics to verify this hypothesis, because in my experience, that doesn't ring true.

From my subjective perspective, it seems that academic and industrial research labs innovate at roughly the same rate per-capita. I was a PhD student when Microsoft was dominant, hiring the best faculty from all top-4 CS schools (CMU, Berkeley, MIT, Stanford), and they certainly produced a lot of papers, and did seem to dominate conferences, but the actual innovation in computing came from Apple and startups, which did not have "research labs". Microsoft, including its giant industrial research lab, certainly was not the driver of innovation in computing!

And here are some numbers to back that up: Microsoft's R&D budget in 2011 was 10x the budget of the entire NSF -- for all sciences. Yet, Microsoft was clearly not producing more than 10x the scientific output of all NSF-funded academic science.

So it would help to have some statistics for the claim that industrial research innovates more than academic research. They certainly pay more, and often hire more people, but per-capita they don't seem any more productive or healthier than academics.

Ah, right. That gets us into the definitions of innovation and research.

Apple does very little research, in the conventional scientific sense we're discussing here, I think that's pretty uncontroversial. They produce few if any papers. They are (or were, under Jobs) very good at coming up with new ideas that strongly appeal to the buyer and which got them a reputation for innovation, but which probably wouldn't be considered clever enough to be research papers. At least not top tier papers.

For example, Exposé is a widely imitated feature and was considered very innovative at the time, but it wouldn't be seen as serious computer science. The iPhone is/was widely considered innovative but had basically no new research tech in it, given that capacitive touch screens weren't developed by Apple. It was just a really nicely implemented mobile computer. Actually, the innovations in the iPhone are nearly all packagings of tech developed by third-party firms that Apple then buys, or buys exclusivity rights to. At least, that's true in my view.

Microsoft's R&D budget I think is also a victim of definitions. Software firms normally report all product development as R&D, right? I think these days they may even report datacenter builds as R&D. We can see this on Microsoft's investor website:

"In addition to our main research and development operations, we also operate Microsoft Research. Microsoft Research is one of the world's largest computer science research organizations"

i.e. the kind of university type "scientific" research we're discussing here is only a sideshow in Microsoft's R&D budget.

You're right to call me out though; I don't have any stats to prove that industrial research does more than academic research. It's not a statistical argument to begin with, just my own perception ("all seem to"). I read a lot of CS papers and the best ones have corporate email addresses at the top; the second best, a mix of corporate and university addresses; the third best, only university addresses. If you asked the man on the street to name the biggest innovations in computing in the past 20 years they'd probably say things like, uh, smartphones, YouTube, AI, blockchain, etc etc. All things that have little connection to universities, with AI being the closest, but it was Google that revived that whole field and has been pushing it forward ever since. Neural nets weren't receiving much investment from the academic community before that.

Anyway, that's CS. CS really isn't the problem here. The pseudo-science is elsewhere.

At least in some parts of computer science the solution is easy: never publish results without the public source code of all the experiments.

I did peer review for a number of scientific papers that included code. Almost every time, I was the only reviewer who even looked at the code.

In most cases, peer reviewers will just assume that authors claiming the "code is available" means that a) it is reproducible and b) it is actually there.

As a counterexample, this recent splashy paper


claims the code is available on github, but the github version ( https://github.com/jameswweis/delphi ) contains the actual model only as a Pickle file, and contains no data or featurization.

So clearly, the peer reviewers didn't look at it.

that. The main task of the reviewers should be to re-run all the experiments on their own computer and check the results.

Re-running is definitely too much work for most scientific papers, at least in ML and the computational sciences, where experiments might take thousands of core-hours or GPU-hours, but it's usually not necessary. In addition, just running the code can spot really bad problems (it doesn't work) but easily miss subtle ones (it works, but only for very specific cases).

I think it's more important for reviewers to read the source, the same way one would read an experimental protocol and supplementary information, mainly checking for discrepancies between what the paper claims is happening and what is actually being done. In the above example, a reviewer reading the code would have spotted that the model isn't there at all, even though it runs fine.

Providing source code is a good thing, but a lot of people confuse re-running experiments with replicating them. If you take the authors' source code and re-run it, then any bugs are going to invalidate your results too. The only way to actually have confidence in the paper's results is to rewrite the software from scratch.

In fact, I'd go further and ask: what kinds of errors could possibly be caught by running the same software that the authors did? Any accidental bugs will remain, and any malicious tampering with the experiment data is exceedingly unlikely to be caught even with a careful audit of the code.

That isn't possible if you're using commercially licensed source from other people, drivers for scientific instruments, lacking copyright assignment for some of it, etc. Same reason many commercial projects can't be open sourced even if the company wanted to.

So people who used proprietary software will not be able to publish. Sounds like a win-win to me!

Your definition of free software is more restrictive than the FSF’s.

Of course I was simplifying... but it seems obvious to me that enforcing automatic reproducibility in peer-reviewed publications can only be a good thing in the long run.

My personal opinion is this problem fixes itself over time.

When I was in graduate school, papers from one lab at Harvard were known to be “best case scenario”. Other labs had a rock solid reputation - if they said you could do X with their procedure, you could bet on it.

So basically we treated every claim as potential BS unless it came from a reputable lab or we or others had replicated it.

One way to go about it is to include a replication package with the paper, including the dataset... This should be regarded as standard practice today, as sharing things has never been easier. However, adding a replication package is still done by only a minority of researchers...

You can't, short of trying to fix human nature.

Instead of adding a punishment, maybe we should remove the reward. How, that I don't know.

More transparency in some form, requiring researchers to publish code and data openly for instance.

I can understand why journals don’t publish studies which don’t find anything. But they really should publish studies that are unable to replicate previous findings. If the original finding was a big deal, its potential nullification should be equally noteworthy.

I would have agreed with that when I was younger, but I have learned there are a lot of possible reasons why PhD students (the people who do the studies) fail to replicate anything (and I am talking about fundamental, solid engineering).

this was exactly my experience and I remember the paper that I read that finally convinced me. It turns out the author had intentionally omitted a key step that made it impossible to reproduce the results, and only extremely careful reading and some clever guessing found the right step.

There are several levels of peer review. I've definitely been a reviewer on papers where the reviewers requested everything required and reproduced the experiment. That's extremely rare.

Why are you so afraid to reveal the name and institution?

Their username is publicly linked to their real-life identity. Revealing the name and institution has a reasonable chance of provoking a potentially messy dispute in real life. Maybe eob has justice on their side, but picking fights has a lot of downsides, especially if your evidence is secondhand.

From what I have read, peer review was a system that worked when academia and the scientific world were much smaller and much more like "a small town." It seems to me like growth has caused sheer numbers to make that system game-able and no longer reliable in the way it once was.

Why not just name the paper :)

May I ask what field the manipulated paper was from? Your page lists CS/NLP, so the field may also be linguistics or neurology (linguistics would be easier for me to swallow) https://scholar.google.com/citations?user=FMScFbwAAAAJ&hl=en

Some wider questions would be: Are there similar problems in Mathematics/physics versus the life sciences/other social sciences? Are there the same kind of problems across different fields of study?

Also i wonder if replication issues would be less severe if there was a requirement to publish the software and raw data that any study is based on as open source / data. It is possible that a change in this direction would make it more difficult to manipulate the results (after all it's the public who paid for the research, in most cases)

I worked at a prestigious physics lab working for the top researcher in a field. It absolutely happens there and probably everywhere.

The only way to fix replication issues is to give financial and career incentives for doing replication work. Right now there are few carrots and many sticks.

Thanks! So all this is probably happening across the board, amazing.

Frankly, sir, it is the reason you wish your anecdote to remain anonymous that such perfidy survives. If these traitors to human reason and the public’s faith in their interests serving the general welfare - after all who is the one feeding them? - became more public, perhaps there would be less fraudulence? But I suppose you have too much to lose? If so, why do you surround yourself in the company of bad men?
