How do we clean up the scientific record? (oup.com)
42 points by wjb3 7 months ago | 20 comments



As with many problems, this is a misalignment of incentives.

Quantity of papers is #1 and citation count is a distant #2. People will outright tell you that quality is rarely the goal and that unique results face tough publication odds.

Peter Higgs recently mentioned that his department had looked to fire him before his Nobel Prize, given his lack of interest in meeting the metrics.

This is just an optimization problem that has favored "exploit" for many years, even when "explore" is the direct mission statement of many of these institutions.


Case in point: my org has licensed tech from a big academic institute (which we, the founders, generated at said big institute). We have industry-recognized success and awards for said commercial solutions.

The academic research has lagged behind our commercial development and continues to publish things that are already commercially available in our systems, demonstrated with the same code base and algorithms. When we founders were at the big academic institute, we won many academic awards for this tech and published the algorithms and methods.

One could make many arguments about training people or building expertise at said academic institute, but instead these publications are presented as "novel, unpublished, etc." That is just not true, and the licensing of this technology means that they need to move forward. Five years later, they have not.

Their public press releases deliberately do not mention our industry awards because if they did, they would not have much to publish.


I'm not sure if I understand you correctly, but if some know-how is implemented in a company and its workings are not public, then reinventing those methods really is novel, unpublished, and relevant information that is valuable to publish.

Reinventing the bicycle is useless only if the workings of bicycles have already been published. No one cares what inventions are hidden in someone's desk drawer or in black-box products; we care about knowledge that is accessible to the community.


I agree that publishing methods, code, or even reinventing products in an open context is valuable... it is what I thought was one of my main contributions while at the academic institution.

From what I can tell, they are struggling to pay people given the 3x base salary overhead at the academic institute.

They haven't been pushing out free tools, haven't been demonstrating in new scenarios, etc. They continue to publish things that are already in the open literature and only work for corner cases. The company has expanded the generality of the solutions, which they are neither innovating on nor publishing.

I would also say they have more money than they can spend and have delayed projects by over two years because they don't have people to lead them (getting back to the personnel problem).

I'm just disappointed in them as an academic institution, deliberately missing their own mission statement and refusing to cite prior publications, and I feel the 3x overhead is a waste... at the very least they should pay their people better.


> then reinventing these methods really is novel, unpublished, and relevant information that is valuable to publish.

That's if you can get a journal to publish your results, isn't it? I can imagine a reviewer recommending a strong reject on the grounds that a submission is already "well-known" by virtue of having a commercial counterpart available. Reverse engineering is impressive, but alas provides no novelty.


Arxiv is great... I also forgot to mention that people should abandon paywalled journals, and potentially peer review altogether in certain cases.


How did things get like this?


It's easy enough to solve the problem theoretically and technically; whether or not there's the will to do so is another matter.

For example, by assigning a quality score to each citation, based on the sentiment of the citing passage or some other parameters.
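
A minimal sketch of what that could look like, in Python; the sentiment labels, weights, and record format here are assumptions for illustration, not an existing system:

    # Weight each citation by the sentiment of the citing passage instead of
    # counting all citations equally. Labels and weights are illustrative only.
    SENTIMENT_WEIGHTS = {
        "supportive": 1.0,   # citing work builds on or confirms the result
        "neutral": 0.3,      # passing mention or background citation
        "critical": -0.5,    # citing work disputes or fails to reproduce the result
    }

    def citation_quality_score(citations):
        """citations: iterable of dicts like {"citing_paper": ..., "sentiment": "supportive"}."""
        return sum(SENTIMENT_WEIGHTS.get(c["sentiment"], 0.0) for c in citations)

    # Example: two supportive citations, one background mention, one failed replication.
    example = [
        {"citing_paper": "A", "sentiment": "supportive"},
        {"citing_paper": "B", "sentiment": "supportive"},
        {"citing_paper": "C", "sentiment": "neutral"},
        {"citing_paper": "D", "sentiment": "critical"},
    ]
    print(citation_quality_score(example))  # 1.0 + 1.0 + 0.3 - 0.5 = 1.8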


A reproducibility index instead of an h-index would help, I think, along with an independent body that holds research institutions accountable for having a low collective reproducibility index (per field). Funding should be tied to the reproducibility index.

Edit to add: some research is so niche that only one or two groups may be able to work on it due to equipment, resources, or training/talent; in that case the independent body should be allowed to audit the results of such labs. Scientists might also try to game the index by claiming "my competitor's results cannot be reproduced", in which case the lab against which this is claimed could file a petition to the independent body to resolve the matter.
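
Concretely, such an index could be as simple as the fraction of a lab's results that independent groups have reproduced at least once. A rough sketch in Python, assuming a made-up record format and leaving papers nobody has attempted to replicate unscored (audit candidates) rather than counting them as failures:

    def reproducibility_index(papers):
        """papers: list of dicts like
        {"id": ..., "replication_attempts": 2, "successful_replications": 1}."""
        attempted = [p for p in papers if p["replication_attempts"] > 0]
        if not attempted:
            return None  # nothing to score yet; a candidate for an independent audit
        reproduced = sum(1 for p in attempted if p["successful_replications"] > 0)
        return reproduced / len(attempted)

    lab = [
        {"id": "p1", "replication_attempts": 2, "successful_replications": 2},
        {"id": "p2", "replication_attempts": 1, "successful_replications": 0},
        {"id": "p3", "replication_attempts": 0, "successful_replications": 0},  # too niche, unscored
    ]
    print(reproducibility_index(lab))  # 0.5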


A big problem we have in a lot of fields is that no researcher is incentivized to even try to reproduce work. No credit is given by tenure boards or grant agencies.


We can't really decide what grant agencies should want, especially since their goals are often national or even more decentralized, and thus 'siloed'. It seems quite plausible that a grant agency (or tenure board) really has no good reason to value replicating the inventions of others and wants to put its funding solely towards "its own" new results.

Tragedy of the commons is a predictable outcome no matter how smart the agents are, because they're each acting in their own best interests.


A proposal of Balaji Srinivasan:

able to interrogate the information supply chain ... You see it in the media, and that comes from a government study, that's based on an academic study, that's based on a data set ... have an actual academic supply chain where you can trace the etiology, the origin, of a fact or an assertion all the way through the literature. And there's a famous paper that actually did track something like this all the way through the literature and found it was just something that got repeated, some medical nostrum that didn't really have that base of evidentiary support, where when you track it all the way back, you couldn't find [the original source].

https://podclips.com/ct/zgkzdp
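
In data terms, the proposal amounts to a citation graph that you can walk back to primary sources. A toy sketch of that traversal, with an invented claim graph standing in for real citation metadata:

    # Invented example graph: each document maps to the documents it cites.
    CITES = {
        "news_article": ["government_report"],
        "government_report": ["review_paper"],
        "review_paper": ["original_study"],
        "original_study": [],  # primary source: should point at data, not more citations
    }

    def trace_to_primary_sources(claim, graph, seen=None):
        """Follow citations until reaching documents that cite nothing further."""
        seen = seen if seen is not None else set()
        if claim in seen:
            return []          # guard against citation loops
        seen.add(claim)
        sources = graph.get(claim, [])
        if not sources:
            return [claim]     # a root of the supply chain
        roots = []
        for s in sources:
            roots.extend(trace_to_primary_sources(s, graph, seen))
        return roots

    print(trace_to_primary_sources("news_article", CITES))  # ['original_study']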


You can just do this now, usually. It does take some effort, but that's gonna be true whether or not people actually cite their sources properly: most of the effort is in actually reading and understanding the material, not finding it, even when it's not super straightforward to find.

(for example, when a piece of legislation is widely reported on, I usually can manage to find and actually read said legislation, which can be quite informative as to whether said reporting is accurate)


it would be interesting if an AI with semantic understanding could read and understand all the material and you could interrogate it about the backup behind a claim.

it would be interesting if the entire lab notebook record was available to such an AI and cryptographically signed and dated. that would potentially prevent people from doctoring images and data, as has been recently alleged on multiple occasions.
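
a rough sketch of what a signed, hash-chained notebook entry could look like, assuming an invented entry format and using the pyca/cryptography package:

    # sign and hash-chain a notebook entry so later edits or deletions are detectable.
    # entry format, key handling, and field names are illustrative assumptions.
    import json, time, hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()  # in practice, the lab's long-term signing key

    def sign_entry(text, previous_hash=""):
        entry = {
            "timestamp": time.time(),
            "text": text,
            "prev": previous_hash,  # chaining entries prevents silent removal
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        return entry, key.sign(payload), hashlib.sha256(payload).hexdigest()

    entry, sig, digest = sign_entry("gel image 12: band at ~42 kDa, raw file img_0042.tif")
    # verification (by a journal, auditor, or the hypothetical AI) uses the public key;
    # verify() raises InvalidSignature if the entry was altered after signing.
    key.public_key().verify(sig, json.dumps(entry, sort_keys=True).encode())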

it would be interesting if that AI was trained to do causal inference and do-calculus, and every paper published generated an incremental update to an automatically constructed meta-analysis and systematic research review.

keep going and make a large body of real-world data (e.g. medical data) available to the AI, and every interaction could be an addition to the clinical trial data. and then make the AI able to generate new hypotheses and directions for experimentation to test those hypotheses.


It's interesting that this came out the same week that Derek Lowe was discussing the problem of some of the most easily discovered scientific frauds (faked crystal structures).

https://www.science.org/content/blog-post/faked-crystals-and...

Since many of the journals (let alone the authors) don't seem to have an incentive to correct the record, perhaps journals need to be banned if they can't police obvious frauds.


I have often thought it would be good for there to be a journal that specifically focuses on trying to reproduce existing high-impact (and even medium- or low-impact, if there's time) studies. Unfortunately, verification is not seen as glamorous, but it's important work that not enough people are doing.


We don't need to clean up the scientific record; it's self-cleaning, at least for the people who rely on it.

When I was working in R&D and a paper was published with an unexpected result, the automatic response was to look at the lab that published it. Reputation matters: some labs were rock solid, and others, shall we say, "played fast and loose with the data".

And even if it was published by a reputable lab, it was assumed to be an anomaly, or a measurement error, or some other problem until other reputable labs reproduced it.

So I guess the solution is for scientists to stand fast in their skepticism. My PI told his students, "your job is to punch holes in other people's work". That was always encouraged. Question everything, and if the scientists haven't shown they already checked it, assume the results are bullshit.

Only after rigorous examination and repeated validation should we believe any science. Not because scientists are making things up or lying (though some are), but because even the most honest scientist sometimes misses something or makes mistakes.


It's almost like these folks haven't heard of a wiki. The term doesn't appear in the primary reference on possible solutions.[9]

> classical model tested over millennia

Peer review as standard practice is less than a century old. Scientific journals are less than 400 years old.

[9] https://link.springer.com/article/10.1007/s10838-022-09607-4


Not really relevant. Whether you implement post-publication peer review and corrections with wiki software, or as corrigenda, or with voting, or ... is an implementation question once you have solved the problems pointed out.

In particular, the first issue pointed out is misaligned incentives for these corrections (think of Wikipedia spam, or kernel contributions for trivial spelling fixes or adding <3...). Or the lack of recognition for doing this work. Or author rights, when the "corrections" introduce quite radical changes (e.g. think of policy articles). "Just make it a wiki" will not magically fix these issues.


At this point, we need AI to help identify flawed research, and then there should be a notice on the flagged research that would require reproduction of the results to remove.



