A senior editor at a top-tier medical journal told me >5 years ago that they have hired staff dedicated to scrutinizing papers from two countries that have poor reputations when it comes to submissions. It's not just prestige and the career boost; in some countries there are monetary incentives associated with publishing in the top journals.
Funny story... A few years ago, my online reference hunting turned up a paper from China with three Chinese authors, and also exactly the same paper (down to the formatting and footnotes) by three Indian authors in India.
The article measures the quality of papers produced by a country by the percentage of the world's top 1% cited papers they produced. This seems like it could be gamed.
Here is an example of an incentive for tenured professors in Spain to do research: if you get (in Maths) 5 JCR papers in 6 years, of which three are in the top tercile (this is one of the criteria; there are more), then you get a bonus of €120/month from that point on (this can be obtained only every 6 years).
That has been a good incentive to date (I insist, this is maths and there are very few possibilities for cheating in my field, though of course there are instances), as I see it, because this way you incentivize professors to really do research (otherwise one can have a truly leisurely life in Spanish public universities once one gets tenure, without any repercussions).
Well, it is a permanent €120 per month (usually more or less inflation adjusted), which, considering that a tenured professor's salary starts at around €1800 per month, is not so bad (at least it is some incentive). You can end up with 5 (or 6, if lucky) of these.
You see: the problem is not the incentive, but the salary itself.
(All the numbers above are roughly net of taxes and Social Security.)
Edit: never underestimate the power of saving and compound interest.
For manual flagging, you can use Unfold Research to leave a review on the paper itself for anyone to see it, and you can add review tags and a description about what you think is wrong with it: https://twitter.com/UnfoldResearch/status/156099536649284812...
Tools like scite (https://scite.ai/) do automatic processing (NLP) of other papers and determine whether a citation relationship supports or disputes the paper.
Also, http://octopus.ac/ did NLP processing on papers to break them down and extract pieces to populate the content on their website, but they seem to have removed that content from their GitHub since then.
Instead of a flagging system ranking the credibility of papers once for everyone, can we have a system where each user decides which other users they trust to evaluate papers, and ignore any signal from untrusted users?
For example, I could L1 trust people to evaluate papers and the trustworthiness of other users, and L2 trust others only to evaluate the trustworthiness of papers. If I notice a spurious signal getting in, I can find out who it came from and prune my list of trusted users.
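Not an existing system as far as I know, but here is a minimal Python sketch of how that filtering could work, assuming a simple one-hop model where only L1 users can vouch for other reviewers (all the names, data structures, and example users below are my own invention):

    def trusted_reviewers(l1, l2, endorsements):
        # l1: users I trust to evaluate papers AND to vouch for other users
        # l2: users I trust only to evaluate papers
        # endorsements: user -> set of users they vouch for as reviewers
        trusted = set(l1) | set(l2)
        for endorser in l1:
            trusted |= set(endorsements.get(endorser, set()))
        return trusted

    def filter_signals(reviews, l1, l2, endorsements):
        # reviews: list of (reviewer, paper, verdict); drop anyone outside my graph
        ok = trusted_reviewers(l1, l2, endorsements)
        return [r for r in reviews if r[0] in ok]

    l1 = {"ada"}
    l2 = {"grace"}
    endorsements = {"ada": {"emmy"}}
    reviews = [("emmy", "paper-42", "looks solid"),
               ("spammer", "paper-42", "amazing!!!")]
    print(filter_signals(reviews, l1, l2, endorsements))
    # [('emmy', 'paper-42', 'looks solid')]

If a spurious signal shows up, you trace it back to the reviewer and prune them (or their endorser) from your lists; the signal disappears for you without any global moderation decision.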
I imagine I'm not the first to think of this. Anyone know of reading material on the topic?
The one thing I'm aware of is Brave's Goggles project: https://search.brave.com/help/goggles - it basically allows you to set up your own ranking algorithm with a specific syntax (it links to a GitHub repo and, there, to a white paper).
I haven't seen more attempts at similar things, which doesn't mean they don't exist. (There were mentions of a similar feature being added when Elon Musk was about to buy Twitter, but that was just gossip.)
An advantage of having an open peer review system is also that these kinds of attempts could be detected by tools that automatically process the scientific knowledge graphs generated by peer review activity.
This, coupled with verification of users, should be able to minimize the risk quite significantly.
Seems to me that this wouldn't be enough in the case of state actors or large organizations. They would just have to employ real people to carry out the fake review bombing.
Mentioned in TFA, the GRIM test for checking a paper’s reported mean of integers:
“””
The GRIM test is straightforward to perform. For each reported mean in a paper, the sample size (N) is found, and all fractions with denominator N are calculated. The mean is then checked against this list (being aware of the fact that values may be rounded inconsistently: depending on the context, a mean of 1.125 may be reported as 1.12 or 1.13). If the mean is not in this list, it is highlighted as mathematically impossible.[2][3]
“””
Good idea (for integer-valued measurements), but you can do it without having to bother calculating all fractions with denominator N.
Let's say the reported mean is called m and the sample size is N. Multiply m and N together. Ideally mN should be almost an exact integer. Compute ceil(mN)/N and floor(mN)/N and see whether either of them rounds to the reported m, within precision limits.
Example: m=6.24, N=13
mN = 81.12 (exact)
81/13 = 6.231
82/13 = 6.308
So there is no integer numerator (for denominator 13) that gives 6.24, to 2 decimal places, so 6.24 is a mistake or a lie.
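A minimal Python sketch of that shortcut (the function name and the half-unit tolerance are my own choices; it assumes integer-valued measurements and a mean rounded to a fixed number of decimals):

    import math

    def grim_consistent(reported_mean, n, decimals=2):
        # Could reported_mean be the mean of n integers, rounded to `decimals` places?
        target = reported_mean * n
        tol = 0.5 * 10 ** (-decimals) + 1e-9  # half a unit in the last reported place
        return any(abs(k / n - reported_mean) <= tol
                   for k in (math.floor(target), math.ceil(target)))

    print(grim_consistent(6.24, 13))  # False: no integer total works for N=13
    print(grim_consistent(6.23, 13))  # True: 81/13 = 6.2308 rounds to 6.23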
Indeed there is no such heuristic. The question is deeper: it's about how to trust that scientists from a culture that is perceived to be alien are living up to familiar scientific standards. It's easier to identify fraud when it comes in a systematic way from a specific region of the world. However, there is a lot of fraud and scientific free-riding coming from Western scientists too; it's just sporadic and not visible among the thousands of high-quality studies.
It's also wrong to call a paper 'questionable' without a qualifier. Questionable can mean anything, and some of the most groundbreaking ideas were questionable when they were put out in the wild. I suppose the author means technically flawed.
I wonder whether there are tools to visualize a particular area of interest. They could be used to identify duplicate results (not a bad thing), useless results, or contradicting results (possible fraud, error, or even something groundbreaking). When I read too deeply in almost any area I lose the forest for the trees; authors are great at rigorously describing in English what they are doing, but it overflows my stack. I wish there were a way to map it semi-permanently so I could say: aha, yes, this result looks like this on my map of the domain. Does such a general tool exist?
Peer review has issues too at the moment. From what I've read, it worked when the world was a smaller place, basically. Now, peer review is another broken cog in a machine that grew organically out of past practices and which no one really intended and no one knows how to fix.
As someone who does peer review for 2-3 conferences a year, I will say that peer review is still better than any alternative proposed so far.
In my circles (Computational Linguistics), getting your paper approved means that you convinced at least three PhD students with a published paper (or higher, all the way up to Professor) that your paper is good PLUS the area chair(s) considered it worth publishing. All of this without knowing who you are nor what your institution is. And once your paper is accepted it is freely available on the internet forever.
No human activity is free of issues, but to me double-blind peer review is a fairly effective system. And every year the system is tweaked to try and adapt it to newer developments such as the rise of ArXiv. Most criticism I read uses the term "peer review" to refer to whatever Nature, Science, and/or Elsevier are doing at the moment, but just because they suck it doesn't mean that peer-review as a whole does.
> getting your paper approved means that you convinced at least three PhD students with a published paper
This is a recursive definition of scientist "A scientist(t=n) is someone approved by scientists(t=n-1)". If you have a good initial condition maybe it can work for a while, but it seems that time may be coming to an end.
> Pournelle's Iron Law of Bureaucracy states that in any bureaucratic organization there will be two kinds of people: First, there will be those who are devoted to the goals of the organization. Examples are dedicated classroom teachers in an educational bureaucracy, many of the engineers and launch technicians and scientists at NASA, even some agricultural scientists and advisors in the former Soviet Union collective farming administration. Secondly, there will be those dedicated to the organization itself. Examples are many of the administrators in the education system, many professors of education, many teachers union officials, much of the NASA headquarters staff, etc. The Iron Law states that in every case the second group will gain and keep control of the organization. It will write the rules, and control promotions within the organization.
> This is a recursive definition of scientist "A scientist(t=n) is someone approved by scientists(t=n-1)". If you have a good initial condition maybe it can work for a while, but it seems that time may be coming to an end.
There is no formal definition for a "scientist" and likely will never be. Definitely a scientist is not "one who has a paper accepted".
Here the commenter is just claiming that in order to publish a paper in a specific journal you do need to be accepted by a handful of people who have previously been accepted in the same journal. Barring the obvious chicken-and-egg problem, which is easily solved, it does not seem like something which "is coming to an end".
I had a couple of approaches in mind, but I think I got carried away in the moment when I wrote that comment. A radical overhaul may do more harm than good.
At the moment people do these things "for a living", and in the usual bureaucratic institutions that all institutions become sooner or later, there are forces in power that work against the original idea behind the institution.
That's why progress in science occurs at best "one funeral at a time", but usually not before a whole generation bites the dust.
This, plus bureaucracy, seems to have almost killed large parts of science by now.
We have collusion rings milking the system. We have whole circles of pure esoteric nonsense "science". We have more and more utter garbage and scams. And we have holy churches that will send the inquisition after you if you dare to disagree.
I don't know how to fix that. But I guess the incentives, and especially the cash flows, would need some re-engineering. How concretely, I have no clue. Throwing away everything you have is of course almost never a good idea. But some things need to change, and change in a radical way, I guess.
What I wish is that peer review did not merge so many things into one step. Peer review is used to test for
1. The validity of the results
2. The novelty of the results
3. The importance of the results
Papers are routinely rejected in peer review not because their methods or results are suspect but because their results are not seen as impactful enough to justify space in a conference or journal. This sucks and introduces all sorts of nasty biases into the community. I'd love for these to be disconnected, so I could demonstrate that my paper had been read by other experts and found to be within reason, without having to worry that one person on the PC who just doesn't think a certain research direction is valuable will sink my career.
But historically when the world was smaller, people writing papers were more likely to "speak the same language," have experiences in common that fostered clear communication, etc.
It's a big, wide world. The sheer variety of experiences these days introduces complications never before seen.
Reproduction by n sources with n-k unaffiliated sources would be much better, but that requires a lot more resources for research that could otherwise go to yachts or the latest ponzi scheme.
Peer review is OK, but peer review is not curation. Everyone wants to get published, makes up a hypothesis that barely passes statistical significance, and gets published. Make up a hypothesis in your mind about anything and there is a very high chance you will find a paper that supports it (and a very low chance of finding a paper that refutes it). This is like postmodernism: every possible idea is deemed to have equal weight, which leads to 1=2. Sometimes it seems it's more productive for a scientist to become a monk, stop communicating, and start working alone in her lab, undistracted by the spam notifications of published science.
For non-experimental, non-hard-science papers, based on the replicability studies of the last few years and the things we have already known for a long time, I feel pretty comfortable with a prior probability of ~50% that a paper's result is a true positive. For experimental, hard-science papers I still have a high belief, maybe 90%.
After this I use the following heuristics:
- => lower belief
+ => higher belief
Effects multiply, of course, but each is only a few %. (A tiny sketch of combining them follows the list.)
++ => sufficient details present that I think I can attempt to replicate without contacting authors (this is rare)
+ => code used is available in public repository (subset of above)
+ => published in med-to-high ranking journal
--- => published in obscure journal
-- => socially desirable outcome
+++ => socially undesirable outcome
-- => non-US, non-EU university or nat. lab
+ => national lab
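To make "effects multiply" concrete, here is a toy sketch; treating each +/- as a multiplicative factor on the prior odds is my own framing, and the factor sizes are made up:

    def adjusted_belief(prior_prob, factors):
        # Combine a prior probability with multiplicative evidence factors,
        # e.g. 1.05 per '+' and 0.95 per '-'. Working in odds keeps the
        # result inside [0, 1].
        odds = prior_prob / (1 - prior_prob)
        for f in factors:
            odds *= f
        return odds / (1 + odds)

    # Experimental hard-science paper (prior ~0.9), code available (+),
    # but published in an obscure journal (---):
    print(adjusted_belief(0.90, [1.05, 0.95, 0.95, 0.95]))  # ~0.89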
I mostly read papers in my field (exp. physics), of course. But I began reading a bit more in medicine and psych a few years ago. My views are influenced by books like "Medical Nihilism" and "Science Fictions". Reading COVID papers also caused me to revise my prior probabilities downward.
Is the correct interpretation of the minus signs here that you distrust journals that print ISSN on every page more than you distrust journals that print ISSN prominently?
Also could you please expand on the thought process behind this heuristic?
> Is the correct interpretation of the minus signs here that you distrust journals that print ISSN on every page more than you distrust journals that print ISSN prominently?
Yes. At least, that was how I interpreted the grandparent post's scale and how I tried to match it.
> Also could you please expand on the thought process behind this heuristic?
Have... have you ever encountered these journals in the wild? They're trash. But they come up in search results because search engines are dumb, and I don't know the "good" journals in every field, so I can't filter them out myself. Until I click. Then I can!
Usually what's in them is papers from researchers based in countries that have a publication count score or bonus. They'll take some overcomplicated homework problem and solve it or set up some overcomplicated scenario and show that it has property X or have a few diagrams or photos of their current overcomplicated-senior-project research and show that it will do Y when they finish it. All worthless. There is no insight whatsoever.
In a world where belief can be increasingly influenced in undetected ways, because sophisticated information targeting allows any individual or demographic to be explicitly targeted and optimized against in a relentless attempt to modify behavior, how does one form beliefs that are defensible?
- Only a tiny minority of your beliefs have any impact on the outcome of your life. Perhaps your concept of [insert social or political policy] or [scientific topic] or [cultural idea] might have been gamed by someone at some point, but ultimately it will rarely matter whether you even hear of these concepts to begin with or not.
- Of the beliefs that remain, these often become pretty defensible through practice and results.
Put another way, there are plenty of barely literate people who have still led fascinating and worthwhile lives, even in the modern era.
Could you expand on what you define as socially desirable or undesirable? I understand the point you are making but I am not sure exactly through what lens you are quantifying these characteristics.
Imagine two papers of equal scientific merit: equally strong or weak evidence, equally rigorous or sloppy reasoning, etc. One of them finds that certain races are genetically inferior. One of them finds that democratic countries have stronger economies.
Which is going to have an easier time getting to publication?
If publication is in the Nazi Journal of Nazi Naziness, the first one will be easier to get published. If for whatever reason you're reading journals there, the second of those papers is the one you should expect to be unusually strong.
If publication is in Nature, they probably don't have a lot of Nazis on their editorial team. If that paper got published there, then probably something about it is unusually good somehow despite the unpleasant conclusion.
Caveats:
1. With a conclusion as horrifying as the one in your original question, though, I think it's fair to reason as follows: Even if the paper gave tremendously strong evidence for genetic differences between races, there's no way any sane editor would publish a paper that says the "... and should be culled" bit. Therefore, this paper was almost certainly railroaded through to publication by a crazy editor. Therefore, the fact that it made it is not evidence that anything about it is any good.
2. It's OK for your expectations of a paper's merits to be affected by how plausible you find its conclusions before you read it. Many "socially undesirable" ideas are that way at least partly because most people find them highly implausible, and if most people do then there's a good chance you do too, and if you do then it's fine for that to lower your expectations for the paper.
The correct answer is given by Andrew Gelman right at the top: "Unfortunately, no, I don’t know of any heuristic, beyond using tools such as GRIM to check for consistency of reported numerical results."
The question is essentially whether it is possible to have a structural procedure to evaluate the semantic validity of a statement.
Why are these questionable papers? Academics are forced to publish as many papers as possible. The system pushes them to go for the smallest publishable unit, people will find a way to optimize for that, and who are we to blame them? They're underpaid, overworked researchers trying to guess what the opaque and brutal funding system wants from them.
Is there outright fraud, or are they just dull/pointless papers? If the former, yeah, that's questionable, but a lot of these example papers look more like they're dull rather than fraudulent.
We should rather tear down the awful system instead of playing whack-a-mole with academia's newest outcropping.
> We should rather tear down the awful system instead of playing whack-a-mole with academia's newest outcropping.
There isn’t one “system” that can be torn down and replaced. A lot of these papers are coming from groups and institutions in foreign countries that are basically trying to role-play as researchers so they can blend in with actual researchers and try to reap the reputation and career rewards.
It’s not really centralized. This is akin to suggesting that we need to tear down the entire “e-mail system” because spammers are abusing it to send spam appearing as legitimate email.
Why is it a problem? Because these contribute nothing but noise and confusion to the field and make it more difficult for actual researchers to sort through real research. There is no reason to forgive these faux-researchers for trying to extract money and prestige from a system without making actual contributions.
The idea to “tear down the system” doesn’t even make sense unless you propose something different. Are you suggesting we stop rewarding anyone in any way for research? That would remove the incentives, but now we have a problem that research isn’t rewarded at all and real researchers won’t do it because they need to eat.
Not every paper has to be some grand work. Sometimes you get a result and literally want to move on with your life. Maybe you want to focus more on another project. Maybe you want to finally defend your thesis or be done with this postdoc and uproot yourself. Maybe the grant only funded a year of work. Feature creep is a real thing with research papers. A paper shouldn't take years to publish; if it does, you should publish the earlier findings and just cite them in your later paper so those ideas can get out there as soon as possible, instead of sitting in your drafts for potentially years.
A good candidate could be Benford's law [1], which makes predictions about the distribution of digits. If the digits of the numbers found in the results section of the paper deviate too much from this law, it could be a red flag.
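A rough screening sketch of such a check (the function names are mine, and it only makes sense for data that should follow Benford's law in the first place; you would pair it with a proper significance test before reading anything into it):

    import math
    from collections import Counter

    def leading_digit(x):
        # Most significant digit of a nonzero number.
        x = abs(x)
        while x < 1:
            x *= 10
        while x >= 10:
            x /= 10
        return int(x)

    def benford_chi2(values):
        # Chi-square-style statistic of observed leading digits vs. Benford's
        # law, which expects P(d) = log10(1 + 1/d) for d = 1..9. Larger means
        # more deviation; note that small samples deviate by chance.
        digits = Counter(leading_digit(v) for v in values if v != 0)
        n = sum(digits.values())
        chi2 = 0.0
        for d in range(1, 10):
            expected = n * math.log10(1 + 1 / d)
            chi2 += (digits.get(d, 0) - expected) ** 2 / expected
        return chi2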
It looks more like the papers in question are taking a large public dataset and performing some statistical analysis on it, often along the lines of showing a correlation between metrics which represents pollution (like CO2 emissions) and metrics which represent economic growth. The source data is real, and the analysis is probably real too; neither is likely to trip Benford's law. The problem is that there's nothing novel being uncovered by these articles -- it will surprise no one, for example, to hear that China's CO2 emissions have risen in conjunction with their energy usage and economic growth (https://doi.org/10.1007/s11356-020-12217-6). Substituting China for Pakistan, or energy usage for foreign remittances, doesn't make for novel research; it's simply plugging in another set of numbers and turning the crank.
True. But the "once it was known...make fake data to comply" issue applies to ~every fake-detecting method which could be applied.
Big picture, the goal is not to catch every fake. It's to (1) catch enough so that smarter fakers learn to submit their stuff elsewhere (vs. to the journal, conference, or institution in question, which really cares about fake detection). And (2) dimmer fakers are mostly caught ~promptly.
The most obvious heuristic is to look at the reputation of the journal that publishes it.
That's the reason we still have these overpriced journals. The "publishing" part is easy now, it is just a matter of putting a pdf on a server, journals are selling reputation.
Do you think there is a way to get papers reviewed to the same quality without a journal?
If we are paying for reputation, and the journal's reputation is a function of the reviewers' skill, maybe there is a way to cut out the middleman and pay for skillful reviewers instead.
Not a good heuristic. Word processors are analogous to GIMP in the photo-editing world: they're what some people just know, even if Photoshop or Markdown/Pandoc/LaTeX math are just around the corner.
In math and computer science, papers written in Word are a definite mark of crackpottery. I agree it's arbitrary and nonsensical, but it's clearly a good heuristic.
This would be a terrible heuristic, as it would reward Western bad science (it exists, it just looks different) and punish great scientists who just happen to be born in the "wrong" place.
Paradoxically enough, it would be the opposite of science.
Instead of just accusing that journal of being a paper mill and summarily blocking it, we look for a face-saving way of filtering papers.
Because in science it's totally forbidden to accuse someone of deliberate fraud. You must always find a face-saving way for the authors when dismissing their paper, like "bad statistics are being used".
It seems that the medium of writing will inevitably reach a point of too much information to process, malicious or not; while simultaneously increasing the risk of silos.
What does a post-writing civilization look like? Is it reasonable to imagine scientific progress without the burden of having to battle the computational capability of sufficiently motivated actors who can easily manipulate authored information spaces?
Writing is essentially a massive attack vector into the collective human consciousness, prone to Sybil attacks at an alarming saturation point in a post-GPT-3 society.
You still have the social graph of the authors. Professors are the gatekeepers of the next generation. Like venture capital, you will have to hustle your way to the top for an introduction if you are an outsider.
The social graph of the authors is precisely what computational superiority is able to manipulate due to the computational resources to create false reputation graphs that are indistinguishable from someone’s bias for base truth.
How is that possible? You start with a group of people whom you trust. Then you check papers where those people are co-authors. You add everybody to your group who is on an acceptable paper. On the other hand, you degrade or even remove those who are on a bad paper.
Computational resources are useless because you cannot fake papers with trusted authors. Trusted authors won't publish a bad paper because they want to keep their reputation.
All an adversary can have is something like a network of fake articles. Unlike web pages, scientific papers are not opinions. The scientific core group can check if papers are relevant and true. The scientists can limit those checks to authors whom they have met in person to minimize their time investment.
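A toy sketch of the rule described above (the author names and the binary trusted/untrusted distinction are illustrative only; a real system would weight trust rather than make it all-or-nothing):

    def update_trust(trusted, papers):
        # papers: list of (authors, acceptable) pairs you have checked.
        # Rule from the comment above: add co-authors of acceptable papers,
        # drop authors of bad ones; papers with no trusted author are ignored.
        trusted = set(trusted)
        for authors, acceptable in papers:
            if not trusted & set(authors):
                continue
            if acceptable:
                trusted |= set(authors)
            else:
                trusted -= set(authors)
        return trusted

    seed = {"alice", "bob"}
    checked = [
        ({"alice", "carol"}, True),   # good paper: carol gains trust
        ({"carol", "dave"}, True),    # carol is now trusted, so dave joins
        ({"bob", "eve"}, False),      # bad paper: bob and eve are dropped
    ]
    print(update_trust(seed, checked))  # {'alice', 'carol', 'dave'} (set order varies)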
How do you know if your network is the truthful one and isn't participating in deception? And given that networks are increasingly non-physical, how do you guarantee it is the person in question?
The only way reputation networks maintain durability is if they stay physically constrained and limited by physical contact. Any system built externally and allowed to represent facts and truth will be compromised due to computational superiority. It must not be a network hosted by a computer. It must run on culture and physical reality itself.
Additionally agents of deception are regularly embedded in networks to encourage malicious behavior. Spies are pretty common.
I think the key bias we're contesting is that I'm arguing computational superiority allows creating networks that are indistinguishable from truthful ones because they have the resources to have the "longest version of history" to act as the authority. If the history is not there, it can be fabricated and embed enough "covert actors" to maintain a narrative.
Then you also generate useful idiots who don't know they are being used to propagate misdirection or what have you. They become unable to detect any "obviously fake content" because it doesn't require fake content to conduct deception.
Very often, it is real information indeed that is most able to misdirect attention.
Largely agree with your adds. However one point you mention I've had great contemplation over:
> Yes. That's the difference between science and writing in general. Science is meant to be checked with reality.
When we live in a world that increasingly uses simulations for verification instead of reality itself, it becomes an ever-increasing attack surface for controlling and manipulating what reality and truth actually are. The scary part: "science academics" are largely publishing statements based on mathematical models that are probabilistic or theoretical, with nothing physically extant behind them.
When the burden of verification is solely on the individual, yet the individual is increasingly pushed in the direction of having to believe things as facts rather than experiencing them, then it sort of compromises that integrity, no?
It seems the only way to have certainty is to be a witness and have experience over a set of events, that is the only thing that can be understood to be true (and only to you), and that is what contributes to certainty. All else is belief and unknown risk to deception.
tl;dr: science is heading towards imaginary, non-extant things that happen to predict physical things, but they can only be verified non-physically, and since certainty requires physical experience, the security of the system is at risk of being corrupted.
tl;dr;dr what to do about simulations and models posing for reality?
Also interesting:
> That's when the game becomes interesting. You can also have truths that are not verifiable etc. It all depends on what the participants want to know.
Can you expand on this? You're saying something incredibly poignant and I'd like to hear more.
>what to do about simulations and models posing for reality?
It is or will become indistinguishable from thinking. We can ask the same question about our thoughts.
The science trick is that it requires models that can be falsified in reality. If you make predictions about simulations, then you don't make science but math.
The problem about simulations is that they make verification equally or more expensive than the simulation itself. It's the end of 'scientific cooperation'. Why verify a simulation when you can make your own?
>>It all depends on what the participants want to know.
>Can you expand on this?
I wouldn't call it poignant, but here you are: As you wrote "Very often, it is real information indeed that is most able to misdirect attention." - there is a process that makes choices. Truth is a tool to reach the goals of that process. We don't care about the scientific method in itself.
Most people don't need truths to reach their goals because there is kind of a market of facts. Some facts are just known, without scientific backing, and people can just choose to use them.
When there is deception there is also cooperation. The misdirecting information is only there because somebody else wants to achieve something. Maybe it's easier to cooperate with them about their goals than handling the misinformation?
To your latter point, yes, and one may not be aware of the fact that they are the useful idiot. There may even be degrees of useful idiots.
The fundamental issue is that there is a cost to deception that I reason is greater than the benefit to the ones gaining from the cooperation and execution of the competing trajectories of narrative control. It essentially breeds psychic cancer. It incentivizes all the behavior that leads to distrust and then, inevitably, chaos.
So the only way to avoid that is no competing factions. How do we transition to a non game theoretic civilization? Can we achieve it without assimilating to silicon?
And related to your points on truth which I agree with, is the classic Greek comparison between Techne and episteme. It translates also to understanding and applying facts demonstrating “actually getting it” vs just regurgitating the fact market. The Tao te Ching has some anecdotes on the danger of knowledge as it leads to people thinking they know something but all they really have is the ability to echo or parrot. I guess you need your messages delivered in a network one way or another.
Thanks for the thoughts. Would enjoy hearing more from you if you’re on a discord or similar.
- Precisely re: silicon. One can never be sure about external systems outside the self, it seems, and even the self is potentially at the mercy of whatever can manipulate the senses. However, there is an increased risk of deception/corruption in introducing a universal thing that only 0.0001% or so of people know how it actually works or could build end to end.
It doesn't necessarily mean we should abandon silicon, but it does mean that it can't ever be trusted. So we should design systems around that and probably minimize dependence. I can see this happening with "cold computation" that is very specific in purpose and low in cost, something like the GreenArrays chips. But that does have the potential to swarm. All I know is that if silicon replicates, it's going to be in conflict with DNA.
It seems prudent for a DNA-based life form to stick to DNA for truth. Everything else risks assimilation or some form of deception at large that cannot be detected.
Really the idea of publish or perish needs to be deemphasized. Someone in the Chinese government got the idea that publishing scientific papers is paramount, so Chinese scientists pump out papers, many having little or no use. I fear western researchers are in the same boat: Publish. Doesn't matter how good it is, how useful it is, just publish.
If there is a heuristic it's quantity. If a laboratory, person, or country is producing an inordinate amount of papers then perhaps it's a good idea to sample and test some of those.
Scientists are evaluated by how much they publish and how many times their papers are cited. There's no incentive to call out other scientists' bullshit.
The ranking should assign the score of a paper that was debunked to the paper that debunked it; that would provide an incentive to replicate famous sketchy-looking papers.
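As a toy illustration of that scoring rule (ignoring chains of debunkings, partial debunkings, and multiple debunkers, all of which a real system would need to handle):

    def transfer_scores(scores, debunked_by):
        # scores: paper_id -> citation-based score
        # debunked_by: debunked paper_id -> the paper that debunked it
        new_scores = dict(scores)
        for debunked, debunker in debunked_by.items():
            new_scores[debunker] = new_scores.get(debunker, 0) + new_scores.get(debunked, 0)
            new_scores[debunked] = 0
        return new_scores

    print(transfer_scores({"famous": 500, "replication": 3}, {"famous": "replication"}))
    # {'famous': 0, 'replication': 503}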
> statmodeling.stat.columbia.edu... Checking if the site connection is secure. Enable JavaScript and cookies to continue. statmodeling.stat.columbia.edu needs to review the security of your connection before proceeding.
That's questionable.
But I am not allowed to value my privacy and security and ask questions of Columbia.
Very true, but if you consider things like "The ATLAS Collaboration" to be one (very composite) author, I think GP's reasoning has some merit. There is a lot of homogeneity in a collaboration author like that, so it's a reasonable thing to do. (Also, speaking from experience, most people in the collaboration paid no attention to the paper, unless it was a Big One.)