Hacker News
Hundreds of extreme self-citing scientists revealed in new database (nature.com)
541 points by dmckeon on Aug 20, 2019 | 208 comments

As someone who now works in astronomy, I'm not at all surprised at the high self-citation rate for the field. It is true that a lot of papers are published by large consortiums. For example, at LSST (where I work), if you have been working on the project for 2 years, you are considered a "builder" and added as an author to all major project wide papers.

Those papers, which tend to be long and full of great stuff, are cited a lot, and have hundreds of authors.

I wonder how many of these are cases where the first author has cited other papers on which they are also the first author (or really, at least among the first few authors). For the data shown, it seems a citation counts as a self-citation if anyone in the author list appears anywhere in the author list of the cited paper?

Also for some research niches, you may be one of the few people writing papers on a subject. There's no one else to cite.

I do think there are some very valid points about bringing the reader up to speed on the previous research that led to the current paper. But I don't think those citations should really count toward the metrics for how successful a scientist is.

To be honest, I find all the metric gaming about number of papers and citations to be ridiculous. I don't hear many people saying they want to write the best paper in their field, or something new. It all seems to be a numbers game these days. Academic career growth hacking, if you will.

This probably varies by field, but the "large project" thing can be gamed too.

So, for example, in biomedicine you often have lots of people on a paper who might only read a draft, make some trivial suggestions, and then be added as an author.

As a result, there's this pressure for large groups to form, where everyone is added and everyone can cite each other.

This doesn't mean the projects are bad, but it does lead to individuals with large citation counts primarily because they find ways to add themselves to everything, regardless of their level of effort. People should get credit where it's due, and large projects involve lots of people. But what defines a "project" has become very vague.

I've become extraordinarily disillusioned with academics. Science gets done but the rewards seem to filter preferentially to those who are able to game the system, and the system exists out of a need to make one's self look as productive as possible, in areas where contributions are generally necessarily tiny or nonexistent, even among very competent people, because the problems are hard and because so many people see the same things at the same time.

The problem is with the binary attribution. Either you're an author of everything in the paper or you're not an author at all.

The software world solved this issue with version control systems like git. And if scientists write papers in LaTeX or other text-based formats, it's trivial to use a version control system for that too.

Then when you quote a fragment you do "git blame" on it, and you see who created and edited this fragment, so you can quote only the relevant people instead of authors of the whole book.
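As a rough sketch of the "git blame" idea above, the snippet below counts how many lines in a cited range were last touched by each author. It assumes the paper lives in a git repository as a text file; the function names `count_blame_authors` and `blame_authors` are made up for illustration. Note that blame only reports who last edited each line, not who did the underlying research.

```python
import subprocess
from collections import Counter


def count_blame_authors(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output.

    In that format every blamed line is preceded by header lines, one of
    which is "author <name>". The content line itself starts with a tab,
    so it can never be mistaken for a header.
    """
    counts = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_authors(path: str, start: int, end: int) -> Counter:
    """Hypothetical helper: blame lines start..end of `path` in the current repo."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "-L", f"{start},{end}", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_blame_authors(result.stdout)
```

Calling `blame_authors("paper.tex", 120, 140)` (file name and line range are placeholders) would return something like a per-author line count for the quoted fragment, which could then be used to decide whom to credit.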

This would make it much harder to abuse quotation rankings.

Additional benefits - when a paper is found to contain manipulated data or other errors - it's trivial to check who did it, so only that person's career is done.

Except that's not how science is done. Some people are bad writers and just don't touch the paper at all. Sometimes a grad student does all the work for a paper and someone else writes it, or most of it (common in the first year or two, or with undergrads). Just because they have few or no commits, should they not be the main author? They did the work, after all.

Another example, my advisor doesn't like git. While writing papers I and my collaborators use git but send an email copy to my advisor. Clearly he's going to be on the paper because he's my advisor but you'll see zero commits from him.

I think it's just too easy to think that technology solves this in a trivial way. It's complicated. You have people from different eras working on things. And this is in a CS program, mind you. In other fields it gets much worse very quickly.

Side note: go look at papers from top-tier universities. You'll notice that they frequently cite colleagues at their own university. Is this because they are gaming the system? Is it because their colleagues are doing the most closely related research (it is VERY common for groups at a single university to work closely together)? Or is it a combination? In all likelihood it's a combination, because citations matter. The h-index is used in your performance evaluation because it is meant to show how impactful your papers are, but the system can definitely be manipulated (and that likely isn't happening for malicious or even necessarily unethical reasons).

Not to mention the politics of adding prominent names from your University/Institute to the list of authors to improve the chances of paper acceptance to competitive journals and conferences

A lot of publications now want submissions with the authors anonymized because of this. Though it's always pretty easy to tell what university something came out of, so the prestige of a university still plays a role.

I don't disagree with what you're saying, but I think LSST's builder concept is actually quite amazing and the opposite side of that coin.

For people building the telescope (think hardware, software, logistics, everything before the science can be done), many of whom are not academics, and don't typically get authorship or write papers, it's great to get credit for working on the project in a formal, public way. You don't even have to edit or provide some kind of task directly related to the paper either, which I agree can get somewhat clique-ish.

Wouldn't it make more sense to adopt the way Hollywood does crediting: credit everyone, but with a note on how they contributed?

The builder concept is actually really appealing. Academia can tend towards the same problem as consulting, where tasks get sharply split between "credit-producing" and "not worth doing".

Answering questions about your past papers; looking over someone else's proposed methodology; or cleaning up an internal tool into one you can share are all great tasks for advancing the field, but none of them bolster a CV, earn grants, or help you get tenure. If you want credit for them, you usually have to commit lots more time to the task, like running a formal discussion, becoming an author, or polishing the tool into an OSS contribution. All too often, the result is siloed projects and work abandoned as soon as it's published. (How many papers offering some novel twist on priming or ego depletion could have been turned into replication-and-extension if past authors had been involved?)

Especially in astronomy, with large projects and lots of non-PhD team members, this makes so much sense. (I believe something similar may happen at LIGO - if not formally then at least in practice?) If work is going to be judged by authorship, it's only fair to recognize that at a certain point the groundwork and floating aid people give is comparably valuable to the act of writing up some chunk of the text.

This definitely happens on LIGO. You have hundreds of authors. My optics professor in undergrad was never a first author but he sure is an author on a lot of papers.

> Science gets done but the rewards seem to filter preferentially to those who are able to game the system, and the system exists out of a need to make one's self look as productive as possible

This is largely due to the current model of science funding, and not just in the US. Here in Germany, many people in public science (i.e. at a university, not in-house R&D at a company) only get chained one-year contracts with no real security and low pay, as many grants and funds are also only available on the same time frame, which means it's a constant hustle for funding, especially from third parties.

IMO there is only one way to solve this problem: politicians have to irrevocably allocate fixed chunks of money for public scientific investment over long terms (think 10 or 20 years) to give scientists and universities actual security in their planning and staffing. This would also solve the problem that e.g. NASA has with each President reversing course. No wonder that the last Moon visit was decades ago when the priorities get completely turned over every 4-8 years.

I agree that the root cause of a lot of it (although not all of it) is funding. I basically share your perspective about long-term funding. I personally would like to see indirect costs in the US eliminated, or much more heavily reduced, audited, and justified. I also think there needs to be dramatic shifts in funding along the lines of what you mention. Proposals have already been floated, by former federal funding heads no less, along these lines. Lotteries would be good, as would awards based on merit rather than application (Hungary's model of funding people based on bibliometric factors is a good idea, even if it runs into the problem of gaming citations as pointed out here). Longer-term funds also seem like a good idea, which is the whole idea of tenure in theory.

There are other factors at play too, that are harder for me to pin down. Funding models are a big problem, but there's something related to attention-seeking or metrification at play too. Some of this has probably always been around, but in talking to older colleagues I get the sense that things are much more splashy and fad-driven than they used to be, with much greater pressure to produce in volume. A colleague explained that when it's that much easier to write and publish a paper, there's more of an expectation that you do more of them, even though the idea development time isn't any shorter.

> This would also solve the problem that e.g. NASA has with each President reversing course. No wonder that the last Moon visit was decades ago when the priorities get completely turned over every 4-8 years.

This is also one of the arguments against promoters of so-called "term limits" for congressmen and others; imagine this kind of churn in priorities occurring with major and minor public works projects!

We don't have to wonder too much - we can already see it at the state level with governorships changing; one recent large change of this kind is with California's high-speed rail system. For all of its "boondoggle-ry" and problems, I don't think the way it's been "axed" lately will be of help to completing it. In fact, it might just be a self-fulfilling prophecy for its opponents.

That's only one example; I'm sure others in other states could be easily found as well if one were to look. Ultimately, that kind of thing would only get worse with term limits on representatives to Congress, because federal funding for such large scale projects is needed - and that would end up likely in flux, and ultimately scuttle projects that depend on steady funding to be completed.

One could argue that individual state projects should only be funded by the state itself, but that notion of state self-sufficiency went out with the end of the Civil War. I also tend to wonder if - under a term-limited system - such a thing as the interstate highway system could have ever been built. It doesn't seem likely.

I'd prefer a 12-year term limit (for a 4-year period). That's 2 years to get up to speed, 8 years of working, and 2 years to pass knowledge on to the next generation. For long term projects the funds could be allocated at project start (e.g. via special-purpose bonds, just as in the early days of railway construction), so the project will be independent from political issues.

Academics have basically reinvented the webring.

> I wonder how many of these papers are where the first author has cited other papers where they are the first author.

Also, it's worth mentioning that countries which support a PhD by publication essentially require you to conduct self-citing research. This is to show there has been a common thread through your research, so that the PhD can be defended on the grounds that all the papers address the same subject.

To be honest it seems perfectly reasonable for a researcher to bring up their own related work. Maybe there should just be a separate metric which separates self-citations so it's transparent.

The most extreme cases of self-citation are definitely bad news.

There seem to be some authors and author groups who rely almost entirely on self-citation for impact factor, allowing them to get by with irrelevant or unchecked work. It might be possible to detect that with a metric like self-citation or high author-placement self-citation as a fraction of overall citations.
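A minimal sketch of such a metric, under stated assumptions: a citation counts as a self-citation when the citing and cited papers share an author, and the optional `top_n` restricts the comparison to the first few authors of each paper (the "high author-placement" variant). The `Paper` class, the function names, and the example data are all invented for illustration; matching authors by name string is a simplification that real bibliometric databases handle with author identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paper:
    title: str
    authors: tuple  # author names in byline order


def is_self_citation(citing: Paper, cited: Paper, top_n: Optional[int] = None) -> bool:
    """True if the two papers share an author.

    With top_n set, only the first top_n authors of each paper are compared;
    with top_n=None the full author lists are compared.
    """
    return bool(set(citing.authors[:top_n]) & set(cited.authors[:top_n]))


def self_citation_fraction(citations, top_n: Optional[int] = None) -> float:
    """Fraction of (citing, cited) pairs that are self-citations."""
    if not citations:
        return 0.0
    hits = sum(is_self_citation(citing, cited, top_n) for citing, cited in citations)
    return hits / len(citations)


# Made-up example data.
p1 = Paper("A", ("alice", "bob", "carol"))
p2 = Paper("B", ("dave", "erin", "alice"))
p3 = Paper("C", ("alice", "frank"))
p4 = Paper("D", ("gina",))
citations = [(p3, p1), (p3, p2), (p3, p4), (p2, p1)]

print(self_citation_fraction(citations))           # 0.75 (any shared author)
print(self_citation_fraction(citations, top_n=1))  # 0.25 (first authors only)
```

The gap between the two numbers is the point: with large author lists, the any-overlap definition can flag most citations as self-citations even when the lead authors differ.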

But overall, it seems like this metric should be limited to exploratory use. There are wholly legitimate cases of frequent self-citation, like mathematicians pioneering a new technique, or astronomy research groups which cite a large support team and produce many sequential findings. Discerning an apparent citation-mill like Vel Tech R&D from a legitimate research group like the LSST requires thought, not just statistics.

Meanwhile, the most egregious self-citers are usually doing something else wrong too. Robert Sternberg wasn't just self-citing, he was reusing large amounts of text without acknowledgement, and abusing his journal editorship to publish his own works without peer review. The Vel Tech author in the article seems to be citing his own past works which are irrelevant beyond vaguely falling in the same field, and the enormous range in his work (from food chain models to neurobiology to machine learning to fusion reactors) makes me suspect it's either inaccurate or insignificant.

Ioannidis is damn good at what he does, and was far too sensible to broadly condemn high self-citation researchers. But it would be a real shame to see self-citation rate blindly added to university standards the way citations and impact factor were. The lesson here is that reducing academic impact to statistical measures of papers doesn't work, not that we need some more statistical measures.

>> To be honest, I find all the metric gaming about number of papers and citations to be ridiculous.

That's the main issue, isn't it? Citations are a bit like tokens that can be exchanged for funding, so they become a commodity that people are incentivised to hoard and trade. That is just the worst kind of environment to promote good quality research. The only thing it can promote is ...lots of citations.

"In 2017, a study showed that scientists in Italy began citing themselves more heavily after a controversial 2010 policy was introduced that required academics to meet productivity thresholds to be eligible for promotion"

Cobra Effect


More like Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."

"What gets measured gets treasured."

and the corollary, "the uncountable is of no account"

I'd never seen that, and it reminds me of Heisenberg's Uncertainty Principle somehow.

Heisenberg's principle says that when you measure one thing, you lose track of others. I guess it's kind of complementary to this (except that it applies to very different things, obviously).

I don't see that it's the Cobra effect. Did it actively make them less productive?

Publication standards are different between disciplines. Traditionally, a materials chemist publishes many times more papers than a structural biologist. With a metric like that, you run whole sub-disciplines out of town.

Reminds me of a theater professor I know who was having trouble meeting the new rigorous rules for tenure.

Publish? What and where exactly?

If it means that some academics are choosing to cite their own papers rather than keeping up to date on the literature of their field, there's likely to be some potential insights missed.

There's no reason to assume that adding a self cite affects how much they read others.

Not self-citing in incremental work would be like a software team erasing commit history at the beginning of each sprint, claiming that it's all new code. The post-metric amount of self citing might be less misleading than what they did before.

People who are rewarded are the people most cited, even if they are citing themselves.

Think that’s producing the best scientific results?

> Think that’s producing the best scientific results?

That's a strawman. The answer you were looking for is, "I don't know". I don't know either.

I’d say it makes Publish or perish even worse:


Is being cited a productivity metric?

In Italy it is one of the three metrics used. The others are the number of published papers and the h-index.

This is the table of thresholds for associate professorship ("II Fascia") and full professorship ("I Fascia"): http://abilitazione.miur.it/public/documenti/2018/Tabelle_Va... "Numero articoli" is "number of papers", "Numero citazioni" is "number of citations", and "Indice H" is "h-index". The thresholds are different for each research area (e.g., "INFORMATICA" is "computer science").
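For readers unfamiliar with the third metric: the h-index is the largest h such that the researcher has h papers with at least h citations each. A small sketch of that standard definition follows; the function name and the sample numbers are illustrative only and have nothing to do with the actual Italian thresholds.

```python
def h_index(citation_counts) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later paper can qualify
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Because only per-paper counts go in, the same function can be run on counts with self-citations removed to see how much of an h-index depends on them.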

The topic is actually a little bit more complex, since looking at the metrics is only one step of the process.

Self-citation is appropriate for a new paper that builds on the results of a previous paper. But in evaluating how influential a researcher is, it makes sense to exclude self-citation, while being careful to avoid any implication that self-citation is wrong.

When self-citation is OK and when it is not OK should be part of one's academic/PhD training.

PLOS gives reasonable citation guidelines, and in this context their Rule 5 is particularly relevant: https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

I find this approach strangely condescending. For example the author says:

> Understanding the value attributed to X, Y, and Z in that particular text requires assessment of the rhetorical strategies of the author(s).

They could've just said, if you want to know why the author thinks XYZ are important, you need to look at what they are saying about it.

I'm a hardcore postmodern leftist, but I don't see how writing in such a contorted way helps practicing scientists. In fact I would argue that this kind of listing obscures a politics of its own; it is so busy prescribing citation practices that it won't examine its own politics.

That said, it's the first time I've seen this guide so maybe I need to read up on the issues; a list of do's / don'ts isn't the best way to introduce and help people understand the issues.

What is a hardcore postmodern leftist? You don’t believe in objective truth but make claims as if it exists?


Please stay courteous and on topic. Nobody cares about you being a "hardcore postmodern leftist" ...

Great link. Thank you. I fully agree with rule 5, which puts this article in a different perspective.

The core problem here is that universities think that citation statistics are a useful metric to evaluate the quality of the work of a scientist. There's plenty of evidence that this is not the case or that even the reverse may be the case [1], but this idea refuses to die.


It sucks as a metric but it does have some rough correlation in most cases, and I'm not aware of any better easily measurable metric - if you have one in mind, it'd be great to hear. The alternative of having a bureaucrat "simply judge quality" IMHO is even worse, even less objective, and even more prone to being gamed.

The main problem is that there is an objective need (or desire?) by various stakeholders to have some kind of metric that they can use to roughly evaluate the quality or quantity of a scientist's work, with the caveat that people outside your field need to be able to use it. I.e. let's assume that we have a university or government official that for some valid reason (there are many of them) needs to be able to compare two mathematicians without spending excessive time on it. Let's assume that the official is honest, competent and in fact is a scientist him/herself and so can do the evaluation "in the way that scientists want" - but that official happens to be, say, a biologist or a linguist. What process should be used? How should that person distinguish insightful, groundbreaking, novel and important research from pseudoscience or a salami-sliced paper that's not bringing anything new to the field? I can evaluate papers and people in my research subfield, but not far outside of it. Peer review for papers exists because we consider that people outside of the field are not qualified to directly tell whether that paper is good or bad.

The other problem, of course, is how do you compare between fields - what data allows you to see that (for example) your history department is doing top-notch research but your economics department is not respected in their field?

I'm not sure that a good measurement can exist, and despite all their deep flaws it seems that we actually can't do much better than the currently used bibliographic metrics and judgement by proxy of journal ratings.

Saying "metric X is bad" doesn't mean "metric X shouldn't get used" unless a better solution is available.

I think a problem here is Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." [1] And it seems like there's an element of the streetlight effect [2], too; sometimes a bad metric really is worse than no metric.

Also, I really question your notion that people outside a field should be able to evaluate the quality of someone's work, especially in academia, where the whole point is to be well ahead of what most people can understand. That theory seems like part of managerialism [3], which I'll grant is the dominant paradigm in the western corporate world.

I understand why a managerialist class would like to set themselves up as the well-paid judges of everybody else. But I'm not seeing why anybody would willingly submit themselves to that. It's a commonplace here on HN that we avoid letting managers make technical decisions, however fancy their MBA, because they're fundamentally not competent to do it. That seems much more important for people doing cutting-edge research.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

[2] https://en.wikipedia.org/wiki/Streetlight_effect

[3] https://en.wikipedia.org/wiki/Managerialism

> the whole point is to be well ahead of what most people can understand

That’s not the case at all. Being at the leading edge of research should mean that you are creating new knowledge. That doesn’t imply that people cannot understand it. This expectation that laypeople cannot possibly understand science is one of the reasons so many papers are written so densely and obtusely. “They” can’t understand it anyway, right?

Feynman said if he couldn’t explain it to freshmen he didn’t understand it himself.

I think a lot of cutting-edge work is also done at the edge of understanding, and that's fine. It can be hard enough to explain groundbreaking work to experts with deep context; it's reasonable to me that it takes more time and work to find the explanations that make sense to the average person.

I do agree that researchers should be able to give decent "here's what I do" explanations to the general public. But that's very different than a member of the general public understanding the context well enough that they can judge the value of the work to the field.

Okay, I'll try to clarify what exactly I mean by "people outside a field should be able to evaluate the quality of someone's work" - especially because, as I said regarding peer review, we generally consider that it's impossible to do so directly.

It's about the question of resource allocation. Pretty much every subfield of academia is a net consumer of resources, i.e. someone outside of that subfield is funneling resources to it. That someone - no matter if it's a university, or some foundation, or a gov't agency, or a philanthropist - needs to make a decision on how to allocate resources. And, in general, they honestly want to make a good, informed decision on which projects and researchers to support; but nonetheless they have to make a decision according to some criteria. So there's no choice of "no metric", there will always be a metric and we can only argue that it should be better. And the answer to "why anybody would willingly submit themselves to that" is that duh, you don't get a choice - you can suggest a better method to fulfil their goals of allocating resources in a way that is (also in their opinion) fair and objective; but you can't get around the fact that scientists are generally funded by nonscientists. And they need (or want) to make decisions.

They could delegate that, but that doesn't solve the question about the criteria - if they delegate that to universities, they still have to decide on how to allocate between departments; if they delegate that to scientist councils uniting all the departments in the country working on some subfield, they have to decide on how to allocate between the different organizations. So no matter what, you have to compare not only quality of similar scientists, but also of dissimilar scientists working in different (sub)fields. And delegation doesn't absolve you from responsibility, so if the money is (or looks!) wasted, then that's a failure - so when you delegate, you want to require them to use objective criteria. Which is hard - I could tell you which researchers in my subfield are doing excellent work and which are useless; but if I had to justify these decisions, to demonstrate why they're not just my bias because of politics/liking certain methods/gender/ethnicity/etc then it would actually be tricky; and I think that I'd actually reach out for these metrics. And I'm quite certain that the metrics (for the people that I have in mind) would agree with my subjective opinion; on average, the great research gets cited much more and is in higher-ranking venues; while the lousy stuff gets no citations apart from the author's only grad student.

Also, there's a lack of trust (IMHO not totally unwarranted). You could get a bunch of experts who are qualified to evaluate who gets what amount of resources, but you can't rely on them actually doing so - if we take spiders as a totally random example, in general you're qualified to distinguish which spider research is good and which is useless only if you actually work on spider research, most likely in one of these teams - and the expected result is nepotism, allocating resources based on purely (intra-field) political reasons. And who'd decide on how to split resources between spider research and bird research? Do you expect the spider guys and bird guys to reach a consensus? Or would it go to whatever field the dean is in? This is a big problem even currently, and a big part of why the metrics are being gamed - but at least metrics are something that requires effort to game and can't be gamed totally; if we'd do away with them, then we'd be left with absolutely arbitrary political allocation, which would be even worse.

So at the end of the day "they" need some way to transform the only reasonable source of truth - actual peer-review - to something that "non-peers" can use to judge what the aggregate of that peer review says. That need is IMHO not negotiable, I really believe that they do actually need it - they don't want to do resource allocation totally arbitrarily, they want to do it well, they need (because of external pressures) objectivity and accountability, and currently this (journal rankings, bibliometrics, etc) is the best we have come up with to summarize the results of that peer review.

If I had to write a law draft for a better process of allocating resources, what should be written in it?

Just to be clear, I understood what you were saying before. I just disagree. I think the right approach is to select trusted experts and let them make the decisions about their fields.

Again, since this is a tech community, let me use that for an analogy. It's a classic problem for non-technical founders to evaluate their technical hires. They aren't qualified.

The right solution is not to find some gameable metric of tech-ness, like LoC/day or Github stars. Instead one uses either direct experience-based trust or some sort of indirect trust, like where you have a technical expert you trust and have that person interview your first tech hires.

Yes, having expert humans make the decisions is imperfect. But it's not like a managerialist approach is either. And the advantage of using expert humans, rather than a gameable metric and managerial control, is that we have centuries of experience in how people go wrong and many good approaches for countering it.

This raises a simple issue: how would the non-experts in an area choose the most appropriate set of experts? In this case it would correspond to funding agencies or governments needing to decide on a "fair" way to establish the right experts to ask. It is very difficult to suggest a way to do this that would not correlate strongly with "highly successful under the current system". That group of experts would, of course, have a strong bias towards the current system.

I think we solve this problem not through finding a universal approach, but through heterogeneity.

We fund academic work because we see value in it. But there are many different kinds of value. So I think it's appropriate that we have many different universities which have many different departments. Many different funding agencies and many different foundations. Each group has their own heuristics for picking the seed experts.

There are still systemic biases, of course, but that's true of any approach. And distributed power is much more robust to that than centralized power or a single homogeneous system.

It seems like "Selecting trusted experts" alone would defer more to human subjectivity and biases than would be necessary if objective measures were utilized as much as possible.

Existing community/expertise-based moderation and reputation systems might not be directly transferable or adequate. But they show there are ways to think about more decentralized measures of reputation that are new to this century and haven't been tried. Ways that may be preferable to a small group of kingmakers.

I think the biggest problem is leadership and cooperation of community to try something different. It's not just that there is no person who can mandate these things, it's that multiple constituencies have widely diverging interests, i.e. authors, universities, corporations, journals.

I understand why it seems that nominally objective measures would be better. But I don't think cross-field, non-gameable objective measures of research quality are practically possible.

I also don't think it's a problem that different groups have different interests, etc. As I say elsewhere, I think that diversity is the solution.

You could be right, but I don't see how it can be known with any confidence until a few approaches are given extended good faith trials. There are anecdotal examples supporting both scenarios and the problem simply seems too unknowable and important not to test drive whatever the top 2 or 3 approaches end up being.

>I also don't think it's a problem that different groups have different interests, etc.

I don't see how you can deny that getting the community to cooperate on trying something different is a major hurdle.

How many years has it been since important issues in the academic process were widely known? How much success in adoption has there been to date, regarding any fundamental changes?

It seems on its face to be crucial.

I doubt there's a single solution, so I think trying to get people in many, many fields to coordinate will just slow down improvement. If anything, I think the drive to centralize and homogenize, which is part of managerialism, is a big part of some of the prominent problems in academia.

Require all academic scientists to be self funded...no universities, no gov't agencies, no foundations, no philanthropists...problem solved.

I wonder about the idea where we measure a metric, and cut off the tails of the distribution. This presumes we have a regular means of sampling the metric to lend a time profile of it.

Why not simply use replication as a measure? Have your studies been replicated? How many other studies have you replicated?

Would both help solve the replication crisis, and resolve this problem.

Of course then you might have 10 000 studies replicating the same easy to do study... which is why the "score" should be reduced based on how many other times that study has been replicated.

This solution seems to assume that all studies are equal, and they're not.

An insightful study that's replicable (but has not yet been) is valuable. A lousy study that's been replicated five times (not because it's interesting, but because it was easy to do, and the replicators knew that they'd be rewarded for replicating anything) is not valuable.

A metric that says "number of studies" is IMHO even more arbitrary, more gameable, and more detached from actual value than citation count - which does have some notion that your study actually matters to other people; that it was worth writing that paper because someone read it.

This might work for hard sciences, but not for mathematics.

Or, I dunno, paleontology or sociology or other stuff.

Indeed. My research (in statistics) is primarily methodological: I invent and describe methods that might be useful, and on a good day prove some theoretical results demonstrating that they might be useful. There's nothing to replicate there.

Citations can be a useful metric here, particularly if you can identify citations of people actually using the method (as opposed to people just mentioning it in passing, or other methodological researchers comparing their own methods to it).

Wouldn't replication here just be peer review?

If you judge contributions by just getting papers through peer review then that's even worse than using citations.

Well, I'm talking about a math paper. So to me, it's the same as a code review. Someone has to go over the logic and proofs, and double check no mistakes are made.

The number of people who did and gave their approval would be a good indicator I can trust the paper.

What does a citation do that's better than this?

For experiments, or non math papers, you might need something more robust. I think mostly because reviewing the paper isn't really reviewing the full study, but only what the researcher put in the paper. So it is very hard to review methodology and details to be sure they followed proper protocols, etc. You'd need someone to have been reviewing the study as it is happening, and not just the output paper from it.

A citation indicates that other people actually care about the content of the paper.

Consider Researcher A, who has one paper with a hundred citations, and Researcher B, who has ten papers with two citations each. Probably Researcher A has made a larger contribution.

Whether you're in math or any other field, the fact that a paper is correct or reasonable enough to make it past peer review doesn't mean anybody gives a shit about it.

The issue is how many researchers want to spend their time replicating the research of other people rather than doing their own original work. Getting funding is already incredibly hard, plus no-one is going to give you tenure or promote you for replicating the work of others.

In academic situations yeah, but in industry? You can make money replicating other people’s work.

Extreme self replication!

Yes, bizarrely, reputation / trust is still the primary foundation of academia from a pragmatic perspective, even though it is the antithesis of science. At least some disciplines can have replication studies cross-culturally. It’s a hard problem to solve; knowledge is inherently a social/shared experience.

Trust is a curious one, it's not the antithesis of science, indeed it's required for science to actually work rather than simply be an idea.

You trust in people, in consistency of physical laws, in coherency of your own mind, in constancy of temporal flow, in so many things because - as the Pyrrhonist refrains - nothing is certain, not even this.

Blaming the authors for demonstrating the deep flaws in this metric is certainly wrong, however. This article comes close to accusing some rigorous scientists, who choose to publish often and diligently reference their prior work to encourage fact checking and peer review, of being frauds.

I disagree with the assertion that bad metrics should be used if there are no alternatives. Bad metrics give wrong answers, and only the illusion of meaningful information. The most common use of bad metrics is to lie to people, and it isn't the scientists using the metrics but the organizations that employ them.

>I'm not aware of any better easily measurable metric

Why should an easily measurable metric which has meaningful value exist? It doesn't seem obvious to me that it should at all. Determining the capability of a researcher is inherently a very complex intellectual task. The desire is to reduce that task to something which removes the need for the person doing the evaluation to read and understand the produced research, or to even understand the field of study in many cases. Perhaps, instead, those who are put in charge of things like awarding grant funding, granting tenure at universities, and deciding who to hire to teach ought to be expected and required to evaluate the research on its merits. This would greatly increase the intellectual sophistication and capability needed for people in those positions, but the alternative will always be fairly easily exploitable because it is easier to goose a metric than to do solid research.

We see the shortcomings of trying to reduce complex intellectual challenges to checklists or metrics all the time. And we simply ignore the alternative of relying upon intellectually capable people meeting the challenge. Personally, I don't understand why.

I think that we could add some random (i.e. pure luck) factor for evaluation. Although this may sound unfair, almost all optimization problems do this, be it natural or man-made. Evolution does this by adding random gene mutation, and machine learning does this by randomizing certain parameters to avoid being stuck at local minima. In theory, the right mixture of rigid metrics and randomization can make a better result.

>The alternative of having a bureaucrat "simply judge quality"

The bureaucrat in this case would be another university professor working in the same field.

How about PageRank instead of only using the number of citations?

Before everyone goes bananas citing Goodhart's law: many universities and academic medical centers in the US don't care at all about impact factor - they care about grant $$, period full stop. (They appreciate the occasional high-impact paper that they can use in marketing materials, but it's really all about the $$.)

And for what it's worth, I've almost never heard impact factors discussed at NIH study sections, where investigator quality is explicitly on the agenda. Reviewers talk about relevant prior publications in the field, esp in marquee journals. [this latter feature is the reason we don't just put everything on biorxiv or equivalent and move on.]

It seems as if universities live in the dystopia that the software industry avoided when it stopped counting lines of code.

Did you forget to include the source? (I'm using an app so maybe it's not displaying it?)


not everyone is in the same mindset, no need to be a jerk about it.

Lol that's definitely what happened. Looked back after this and had a laugh at myself. It's been a long day. I appreciate the defense but I don't find the "whoosh" comment offensive. I'll own up to my lapse.

Is "Woosh" really being that much of a jerk? It's a reference to https://www.xkcd.com/1627/


The flip side of that is in the soft sciences. There are so many PhDs walking around in gender studies and sociology who have published papers that were never cited once!

We do need a metric imo, but I agree we don't have a perfect one yet.

You forgot to edit your post to link to your own comment.

Did he forget, or is Hacker News smart enough to remove recursive links automatically?

[Link Test] [/Link Test]

Haha. Well done.

So when querying a count of citations, did anyone consider adding a GROUP BY contributors which will at least give you distinct groups of contributors (assuming they always get listed alphabetically)

Even better split it into individual contributors to give a count of researchers who have cited the paper?
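A minimal sketch of that grouping idea, in Python rather than SQL. The records and names here are made up for illustration. Treating each author list as a set means the alphabetical-ordering assumption isn't needed:

```python
from collections import Counter

# Hypothetical records: each citing paper is (paper_id, list of authors).
citing_papers = [
    ("p1", ["Adams", "Baker"]),
    ("p2", ["Baker", "Adams"]),  # same group, listed in a different order
    ("p3", ["Adams", "Baker"]),
    ("p4", ["Chen"]),
]

def citations_by_group(papers):
    """Count citing papers per distinct author group (order-insensitive)."""
    return Counter(frozenset(authors) for _, authors in papers)

def distinct_citing_researchers(papers):
    """Return the set of individual researchers across all citing papers."""
    return {author for _, authors in papers for author in authors}

groups = citations_by_group(citing_papers)
researchers = distinct_citing_researchers(citing_papers)

# Raw citation count, distinct author groups, distinct researchers.
print(len(citing_papers), len(groups), len(researchers))
```

With this toy data the raw count of 4 citations collapses to 2 distinct groups and 3 distinct researchers.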

PageRank might be a better way to evaluate quality. It too can be gamed. Maybe not as easily, though.

This is such a pointed reference that I can't tell if you're just being ironic.

In a very loose sense, PR is the same algorithm universities use: evaluate the quality of some content based on the number of references to that content.

It is definitely gamed in similar ways. I'm surprised we haven't seen professors hire SEO firms to help increase citation counts of their research.

Err, isn't PageRank almost the same as "citations"?

No, because it counts citations from influential papers with a higher weight.
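To make the difference concrete, here is a rough sketch of power-iteration PageRank on a toy citation graph. The graph, paper names, and damping value are assumptions for illustration, not anything Google or a bibliometrics service actually runs:

```python
def pagerank(cites, damping=0.85, iterations=100):
    """Power-iteration PageRank; `cites` maps a paper to the papers it cites."""
    papers = set(cites) | {p for refs in cites.values() for p in refs}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            refs = cites.get(p, [])
            if refs:
                # A paper passes its rank on to the papers it cites.
                share = damping * rank[p] / len(refs)
                for r in refs:
                    new_rank[r] += share
            else:
                # A paper that cites nothing spreads its rank evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical graph: X and Y each have exactly one citation, but X is
# cited by the heavily cited "hub" and Y by a paper nobody cites.
cites = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["X"],
    "obscure": ["Y"],
}
rank = pagerank(cites)
print(rank["X"] > rank["Y"])
```

A plain citation count scores X and Y the same (one citation each), while the weighted score ranks X higher because its one citation comes from an influential paper.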

That's the same with academic scores. Citations in the "Self-published journal of amateur chiropractors" don't buy you much academic credit...

In fact PageRank was inspired by academic rankings in that aspect:

"PageRank was influenced by citation analysis, early developed by Eugene Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by Massimo Marchiori at the University of Padua. In the same year PageRank was introduced (1998), Jon Kleinberg published his work on HITS. Google's founders cite Garfield, Marchiori, and Kleinberg in their original papers."

Isn't that how it is done in academic circles? Maybe not quantitatively, but qualitatively, surely tenure boards or hiring boards or student applicants notice such things.

I think this is a neat idea. Basically, you'd get more "credit" if you're cited by a good paper than if you're cited by a bad paper.

Aka eigenfactor

My understanding is that eigenfactor rates journals, not individual papers, so if somehow you get low-quality (whatever you want that to mean) papers into Nature, it has no independent way to realize that your specific paper is low quality. Also, eigenfactor is biased towards favoring larger journals, which is not obviously a good thing. It would honestly be really cool if someone did PageRank for individual papers. It seems like a much saner metric than anything that is currently used.

Oh good grief you’re right. This is doubly sad because using an ensemble metric for per-author eigenfactor seems like it would be tractable.

Carl Bergstrom is a smart guy so I suppose the practical implementation of the above must have some wrinkles, but with enough brute force it seems tractable. What I despise more than anything is the gaming that takes place for “impact factor”.

I do OK by standard metrics but would very much like to know where I stand by less easily gamed metrics of influence.

> PageRank might be better way to evaluate quality

And suddenly Google is the authoritative source on literally everything in the world. I hope you like their political views, because they would become "the one".

Pagerank is referring to the graph algorithm known as pagerank, not anything provided specifically by Google.

"PageRank" is a Google trademark, and also the subject of a patent belonging to Stanford University and licensed exclusively to Google.

That patent expired.

I am pretty sure self-citation is required to be listed separately. I mean, even the EB-1 visa application required that.

What about the metric: count citations from all papers on which you were not a co-author.
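Something like this, as a minimal sketch. The author sets are invented, and a real version would have to deal with name disambiguation, which is ignored here:

```python
def external_citation_count(author, citing_papers):
    """Count citing papers that do not list `author` as a co-author."""
    return sum(1 for authors in citing_papers if author not in authors)

# Hypothetical author sets of the papers citing Smith's work.
citing = [
    {"Smith", "Jones"},  # self-citation: Smith is a co-author
    {"Lee"},
    {"Garcia", "Lee"},
    {"Smith"},           # self-citation
]

total = len(citing)
external = external_citation_count("Smith", citing)
print(total, external)
```

Here 4 raw citations drop to 2 once papers co-authored by Smith are excluded.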

Not just individual scientists but entire labs.


I'm not Italian... and am not meeting any productivity threshold.

But my work is incremental, and I obviously don't want to repeat what I said in a different paper, so I cite earlier work in later work. TBH, I don't think it's possible to avoid self-citation unless:

1. Your research is so popular that by the time you need to cite it, it's been surveyed, or improved upon, or otherwise adapted.
2. You switch research subjects relatively often.
3. You publish "blocks" of work, each based on fundamentals in your field established by others - and they're not incremental.

It doesn't say that self-citation is wrong per se, but that some people use it to game their citation count to the extreme.

If you narrow yourself to a specific niche well enough, you'll see the same names in citations. To be fair, the areas I dig into don't feel nearly as competitive as say, physics, which I couldn't make heads or tails of.

The whole reason the internet and wikis took off is we were very liberal in how we linked. If we disallowed inbound citations, wouldn't it be a lot harder to backtrack and grasp contextual underpinnings?

Anecdote: In the field of adult attachment theory <-> love there are a few prominent scholars that cite each other: Shaver, Hazan, Mikulincer. They do papers citing their own work and each other [1]. There's also a book by Mikulincer that highlights Shaver's upbringing with his parents, his past as a hippy, etc. They're delivering very nice content, and they cite others outside their ("circle"?)

Are there potentially scholars in the field with valuable contributions that go unnoticed? Possibly. It doesn't make self-citations in their papers any less helpful. Also I worry that regulating citations through some system may affect the quality of content and fix something that's not broken.

Which brings me to another issue, aren't we supposed to be helping each other?

[1] Example: http://adultattachmentlab.human.cornell.edu/HazanShaver1990....

When you say ‘“circle”?’ I think “clique” is appropriate as ‘a network where every node is connected to every other node’.
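For what it's worth, that definition is easy to check mechanically. A rough sketch with placeholder authors A, B, C, D, treating "connected" as "each has cited the other" (that reading is my assumption):

```python
from itertools import combinations

def is_clique(authors, cited):
    """True if every pair of authors has cited each other in both directions."""
    return all(
        b in cited.get(a, set()) and a in cited.get(b, set())
        for a, b in combinations(authors, 2)
    )

# Hypothetical map: author -> set of authors they have cited.
cited = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(is_clique(["A", "B", "C"], cited))       # A, B, C all cite each other
print(is_clique(["A", "B", "C", "D"], cited))  # D is not cited by A or B
```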

Perhaps it would be useful for reviewers to point out which citations do not contribute to the paper? It really is a tough problem. If someone is toiling along in some niche they have carved out, they and their colleagues may be the only one working in that space. That leads to a lot of cross citation and self citation.

That said, if you publish paper A, and then cite it in paper B which builds on that work, then in paper C you really only need to cite paper B if you're building on the work, not B and A. It might make for an interesting data set to plot out those sorts of relationships.

As a reader I personally prefer if they do a more complete set of citations, instead of making me follow up a multi-step chain to dig them up, as if I'm a compiler resolving transitive dependencies. I like little history-map sentences like: "This technique was introduced by Foo (1988) and recast in the modern computational formalism by Bar (2009); the present work uses an optimized variant (Bar 2012)."

You could just cite the last paper here, which is the only one used directly, and which presumably itself cites the earlier papers. But it's more useful to me if you include the version of the sentence that cites all three and briefly explains their relationship.
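The "compiler" work I'd rather not do by hand looks roughly like this. The sketch uses the made-up Foo/Bar papers from the sentence above as a toy citation graph:

```python
def transitive_references(paper, cites):
    """Return every paper reachable from `paper` by following citations."""
    seen = set()
    stack = list(cites.get(paper, []))
    while stack:
        ref = stack.pop()
        if ref not in seen:
            seen.add(ref)
            stack.extend(cites.get(ref, []))
    return seen

# Hypothetical chain: the present work cites only the last paper directly.
cites = {
    "present work": ["Bar 2012"],
    "Bar 2012": ["Bar 2009"],
    "Bar 2009": ["Foo 1988"],
}

direct = set(cites["present work"])
reachable = transitive_references("present work", cites)

# Papers the reader must dig up by following the chain themselves.
print(sorted(reachable - direct))
```

With only the last paper cited, the reader has to recover "Bar 2009" and "Foo 1988" by walking the chain, which is exactly the work the three-citation sentence saves.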

That kind of sentence is gold.

Often half or (much) more of the value of a paper is in the references, and that's not a bad thing. Sometimes it is the first thing I read.

There's no ink shortage, no link limit on the Internet, and every paper has an abstract for quick filtering. As a curious person I want everything that serves to establish the argument cited so I can be guided to papers of interest and get a better idea of where an idea fits in the broader field.

> if you publish paper A, and then cite it in paper B which builds on that work, then in paper C you really only need to cite paper B if you're building on the work, not B and A.

Logically I agree with you, but a lot of academics seem to believe differently when it comes to citing other people's work, and if we are to go by that logic (which a lot of people are inevitably forced to do), I don't see why one should treat their own work any differently.

That's a fair point. Having reviewing policies that would call that out however could hopefully push back on the behavior.

You need a source of trust in these systems. Journals used to have that role. They had high standards that were upheld by editors selecting only worthy publications. Today it seems that many journals aren't as trustworthy as they seemed to be in the past. It's also easier to spam the journals with your publication and to bullshit your way into publication. The incentives to publish a lot are also way higher now that your grant money is highly dependent on your citation count. Journals can publish more and easier and lower the standards for submission to earn more money. The system is basically eating itself and we haven't found a cure yet.

Filtering for self-citations is useful to identify the bubbles. But it is not sufficient to determine if those bubbles only contain hot air or if these scientists are actually working on something with substance in a narrow field where few others publish.

"It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It" -Upton Sinclair

Citations should primarily serve to mention relevant work, which often includes authors' earlier works.

The problem really is the abuse of citation metrics and journal brand names (and especially journal-based metrics) as a means of evaluating researchers. What we really need is a different method of evaluating researchers that does not rely on where they publish or what they cite.

(But I would say that, given that I work on one such system.)

The opposite of extreme self-citing is self-plagiarism (either out of ignorance, to avoid extreme self-citing on ground-breaking research, or with malicious intent: passing the same paper to multiple journals as a new result).

> The rate of duplication in the rest of the biomedical literature has been estimated to be between 10% to 20% (Jefferson, 1998), though one review of the literature suggests the more conservative figure of approximately 10% (Steneck, 2000). https://ori.hhs.gov/plagiarism-13

If work by another author was enough to inspire you and add a reference, then your own previous work should certainly qualify, if it added inspiration to the current paper. Self-citing provides a "paper trail" for the reader when they want to investigate a claim or proof further.

(Like PageRank, it is very possible to discount internal PR/links under external links, and when you also take into account the authority of the referencer, you avoid scientists accumulating references from non-peer reviewed Arxiv publications).

I found this situation regularly when going down the rabbit hole of the anti-vaxx or anti-5g people. One "scientist" makes a highly dubious claim, thousands of nutjobs cite this one scientist, "scientist" then goes on to cite articles that cite their work. I'm basically waiting to find Alex Jones cited in a serious article at this point.

That's pretty good :) It sounds better than incition or whatever word I might make up on the fly.

If you work in a very narrow field of science you're basically on your own and have to cite yourself because there's nobody else to cite.

I think you have a good point but chose to place it in response to the wrong comment.

In which case you have to ask yourself, are you so brilliant that you’ve found an important topic that no one has considered yet, or have all the brilliant people already figured out that topic isn’t worthy of study?

It’s the same with the startup world. If you’re the only one doing a thing, are you brilliant or foolish?

Most scientists work on topics that are quite niche. Most of those topics lead to nothing. A lot of good research took years to ripen enough to be of actual value. A lot of popular topics started out in a niche. Most of mathematics took dozens of years to fully come to fruition. Can you decide beforehand which one will be the next big thing?

Today, most scientists go for the popular topics and whatever is on the government research plan to get funding.* Whenever the wind changes direction they change their topics because they need that funding.

*: This might seem to contradict the statement that most scientists work on niche topics, but only on the surface. In order to get published you have to do something novel. So, you choose a popular topic and then research a rather unpopular side aspect of it, like how a specific chemical behaves when applied to the popular topic. If you're successful you publish and continue. Citations come later or they don't, but the next round of funding comes with publishing. After a few years without many citations you move on to the next thing.

On government plans often you need to publish and then it's done. The citations only matter long-term if at all. Most scientists don't achieve anything of greater value. They are happy if they can publish at all. If the institute has a few scientists with a high citation count it carries all the rest of them.

So to answer the question, you are so brilliant no one else has stumbled on this question.

No, the answer is that there is a limited number of scientists and a limitless number of research directions. This doesn't have to be correlated with brilliance. In fact, it can be easier to research some of the less popular paths because there is less competition and more low-hanging fruits.

Fine. The more accurate word would be "advanced" instead of brilliant.

You're arguing semantics like a humanities major. Your exact definition of "advanced" or "brilliant" has barely any relevance to the topic at hand.

There are so many gaps in our knowledge, it's ridiculous. Go look for papers studying how to kill the eggs of canine roundworms (e.g. in veterinary settings) or whether surgery is an effective treatment for exotropia. The literature is SPARSE.

Two economists are walking down the street. One sees a $20 bill and starts to bend over to pick it up. The other economist says "Don't bother - if it were really worth $20, someone else would have picked it up."

There are countless scientists who made great strides working on topics others deemed ridiculous. Heck, many of the Nobel prize winners were ridiculed by their colleagues as borderline wack-jobs at the time they were working on their research. Even after winning the prize, some still were with their later work (Crick's search for consciousness comes to mind, and why it would be so worthless a search does not).

If anything, the hubris of the scientific community would be as deafening as the pseudo-science BS and hold back progress just as much if not more except for one key thing: the scientific method.

Luckily, we have a process by which crackpots get differentiated from geniuses. So let's not leave $20 on the ground assuming others would have picked it up, especially when that $20 represents collective progress for the entire species.

> Heck, many of the Nobel prize winners were ridiculed by their colleagues as borderline wack-jobs at the time they were working on their research. Even after winning the prize, some still were with their later work (Crick's search for consciousness comes to mind, and why it would be so worthless a search does not).

Do you have any good examples of being considered wack-jobs before their winning?

Semmelweis. He never received the Nobel prize, but I think he counts towards the point.

> Dr. Ignaz Semmelweis discovered in 1847 that hand-washing with a solution of chlorinated lime reduced the incidence of fatal childbed fever tenfold in maternity institutions. However, the reaction of his contemporaries was not positive; his subsequent mental disintegration led to him being confined to an insane asylum, where he died in 1865.


Bill Beaty's list of Ridiculed science mavericks vindicated has dozens, most with the story in a nutshell + links.


Not a Nobel prize winner (somewhat shockingly, given the contribution), and he wasn't (to my understanding) so much ridiculed, but when The Great Debate occurred, which was a debate about whether or not galaxies besides our own existed,

> if Andromeda were not part of the Milky Way, then its distance must have been on the order of 10^8 light years—a span most contemporary astronomers would not accept.

the size and distance of these objects (galaxies) seemed far too absurdly large to one side of the debate to be accurate; it would mean the size of the universe would be absolutely enormous. Of course,

> it is now known that the Milky Way is only one of as many as an estimated 200 billion (2×10^11) to 2 trillion (2×10^12) or more galaxies, proving Curtis the more accurate party in the debate.

https://en.wikipedia.org/wiki/Great_Debate_(astronomy) — I think it's an interesting read, and a good example of how sometimes the right answer can seem absolutely wrong.

It isn't too hard to find examples of scientists who were ridiculed for their ideas and eventually won the Nobel Prize:

>...Stanley B. Prusiner, a maverick American scientist who endured derision from his peers for two decades as he tried to prove that bizarre infectious proteins could cause brain diseases like “mad cow disease” in people and animals, has been awarded the ultimate in scientific vindication: the Nobel Prize in medicine or physiology.

>...Prusiner said the only time he was hurt by the decades of skepticism “was when it became personal.” After publication of an especially ridiculing article in Discover magazine 10 years ago, for example - which Prusiner Monday called the “crown jewel” of all the derogatory articles ever written about him - he stopped talking to the press. The self-imposed media exile became increasingly frustrating to science journalists over the past decade as his theories gained scientific credibility.


>....The recent 2011 Nobel Prize in Chemistry, Daniel Schechtman, experienced a situation even more vexing. When in 1982, thirty years ago, he made his discovery of quasicrystals, the research institution that hosted him fired him because he « threw discredit on the University with his false science ».

>...He was the subject of fierce resistance from one of the greatest scientists of the 20th century, Linius Pauling, Nobel Laureate in Chemistry and Peace Nobel Laureate. In 1985, he wrote: Daniel Schechtman tells non-sence. There are no quasi-crystals, there are only quasi-scientists!


An example that is pretty well known is Barry Marshall

>...In 1984, 33-year-old Barry Marshall, frustrated by responses to his work, ingested Helicobacter pylori, and soon developed stomach pain, nausea, and vomiting -- all signs of the gastritis he had intended to induce.

>...Marshall wrote in his Nobel Prize autobiography, "I was met with constant criticism that my conclusions were premature and not well supported. When the work was presented, my results were disputed and disbelieved, not on the basis of science but because they simply could not be true."


It was Max Planck who said "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." - so this isn't a new issue and things are probably better now than they were in the past.

Kary Mullis?

You're seriously underestimating the number of things there are to study and overestimating how many people there are to do the study.

This is nothing like tech startups, where you have tons of people sharing a relatively small problem space (creating tech tools companies want).

Consider for a moment: there are well over 350,000 different species of beetles. There's just too much to study and too few people doing the work to expect there to always be a plethora of external research to draw upon.

This is actually a very interesting question that might take one down the rabbit hole.

Acquiring knowledge (I should say 'beliefs about valid knowledge') and brainstorming (and certainly collaboration and getting an advisor to adopt you) appear to be social activities, as much as purely logical and analytic activities.

Social activities like this, for social or herd creatures, are subject to flock or swarming patterns.

Maybe all the brilliant people are swarming around a locus of interest? It's certainly a good way to have the population explore the ins and outs of a, well, locus of interest. It's also a good way to have a loner get shunned by wandering off and poking at an uninteresting pile of dung.

I guess my point is: why not both? (Mathematically, statistically, egotistically, I know the idea that I am the foolish one is almost certainly more likely to be the case)

Or you just happen to be there for the good bit?

It seems like PhD candidates work on peripheral elements of their sponsor/tutor/professor's work; if that professor at some point is going to make a significant step, then one of those PhDs will be along for the ride, not necessarily the genius one.


At the time, chytrids were about as obscure as a topic in science can be. Though fungi compose an entire organismal kingdom, on a level with plants or animals, mycology was and largely still is an esoteric field. Plant biologists are practically primetime television stars compared to mycologists. Only a handful of people had even heard of chytrids, and fewer still studied them. There was no inkling back then of the great significance they would later hold.

Longcore happened to know about chytrids because her mentor at the University of Michigan, the great mycologist Fred Sparrow, had studied them. Much yet remained to be learned—just in the course of her doctoral studies, Longcore identified three new species and a new genus—and to someone with a voracious interest in nature, chytrids were appealing. Their evolutionary origins date back 600 million years; though predominantly aquatic, they can be found in just about every moisture-rich environment; their spores propel themselves through water with flagella closely resembling the tails of sperm. Never mind that studying chytrids was, to use Joyce’s own word, “useless,” at least by the usual standards of utility. Chytrids were interesting.

The university gave Joyce an office and a microscope. She went to work: collecting chytrids from ponds and bogs and soils, teaching herself to grow them in cultures, describing them in painstaking detail, mapping their evolutionary trees. She published regularly in mycological journals, adding crumbs to the vast storehouse of human knowledge.

And so it might have continued but for a strange happening at the National Zoo in Washington, D.C., where poison blue dart frogs started dying for no evident reason. The zoo’s pathologists, Don Nichol and Allan Pessier, were baffled. They also happened to notice something odd growing on the dead frogs. A fungus, they suspected, probably aquatic in origin, though not one they recognized. An internet search turned up Longcore as someone who might have some ideas. They sent her a sample which she promptly cultured and characterized as a new genus and species of chytrid: Batrachochytrium dendrobatidis, she named it, or Bd for short.

That particular chytrid would prove to cause a disease more devastating than, as best as scientists can tell, any other in the story of life on Earth. After Longcore’s initial characterization, she and Nichol and Pessier proceeded to show that frogs exposed to Bd died. Other scientists soon linked Bd and its disease, dubbed chytridiomycosis, to massive, inexplicable die-offs of amphibians in Costa Rica, Australia, and the western United States. No disease had ever been known to cause a species to go extinct; as of this writing, chytridiomycosis has driven dozens to extinction, threatens hundreds more, and has been found in more than 500 species.

Almost overnight Longcore went from obscurity to the scientific center of an amphibian apocalypse. “Had I not been studying the ‘useless’ chytrids,” she says, “we wouldn’t have known how to deal with them.” Her research has been crucial—not only the initial characterization, but also her understanding of the systematics and classification of chytrids, which helped provide a conceptual scaffold for questions about Bd: Where did it come from? What made it so strange and so terrible? Why does it affect some species differently than others?

>In which case you have to ask yourself, are you so brilliant that you’ve found an important topic that no one has considered yet, or have all the brilliant people already figured out that topic isn’t worthy of study?

Imagine how many inventions we would have missed if all inventors had shared your mindset.

What? These are perfectly valid questions to be asking and do not inherently stop you from researching what you're working on.

I know some scientists that define their research direction by asking these questions first before pursuing an idea. Many great inventions like optogenetics or expansion microscopy came from this investigative strategy. It can help keep your resources and energy in check.

Both strategies work to some extent.

Some topics need a large investment to show anything at all.

Some topics show immediate results (good or bad).

Should we, for example, stop all activities in fusion research because so far nobody has shown that it will work and we already invested billions?

I think you’ve misinterpreted what I said. I’m not suggesting everyone is a fool, quite the opposite.

It’s just an important question to ask yourself.

The purpose of a PhD is to move human knowledge forward. You have to do an analysis of something that, in all likelihood, nobody has done before (or not enough to be considered settled).

But then your analysis has to be challenged as well. And the challenges should be published. Success or failure.

If you live in your own bubble the needle doesn't move forward.

He is not talking of a mindset. He is talking about making a reflection.

If you are the only one doing the work, why is it worth doing?

John Money and his "work" is a good example of this

Went into the data and took the top 1000 individuals with self-cite percentages over 40%, then sorted by institution. Nearly every major institution had individuals in this group: Johns Hopkins (4), Caltech (4), Georgia Tech (2), MIT (5), each of the Max Planck Institute campuses (3-7), Moscow State (7), Penn State (6), Stanford (1), Utrecht (2), University of Zurich (4), ETH Zurich (1), DLR (3), Imperial College London (3), University of Tokyo (2), Princeton (5), Kyoto University (4)...
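A rough sketch of that kind of tally, for anyone who wants to repeat it. The records, field order, and the score used to pick the "top" individuals are all placeholders I made up; the real database has its own columns and ranking.

```python
from collections import Counter

# Hypothetical records: (author, institution, self_cite_percent, score).
# Names, numbers, and the "score" column are illustrative assumptions.
records = [
    ("A", "MIT", 45.0, 3.9),
    ("B", "MIT", 12.0, 4.2),
    ("C", "Princeton", 61.5, 3.1),
    ("D", "Stanford", 40.0, 4.0),
    ("E", "Princeton", 48.2, 2.8),
]

def tally_by_institution(rows, threshold=40.0, top_n=1000):
    """Count authors per institution among the top_n rows above the threshold."""
    # Keep only authors whose self-citation share is strictly over the threshold.
    flagged = [r for r in rows if r[2] > threshold]
    # Rank by the assumed score (highest first) and keep the top_n.
    flagged.sort(key=lambda r: r[3], reverse=True)
    return Counter(r[1] for r in flagged[:top_n])

counts = tally_by_institution(records)
print(counts.most_common())
```

With the toy data, author D sits exactly at 40% and is not counted, since the filter is "over 40%".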

I feel like if this problem were very concerning, we'd see the distribution concentrated at certain institutions, but I'm not sure there's one with over 10 researchers. We hear a lot about questionable Chinese journals, but the highest institution in this list is the Chinese Academy of Sciences with 3 individuals.

I think the more likely case is there are a few bad apples, some bad practices we can't ever fully get rid of, and that some research lends itself more to self-citation.

Given that PageRank had its origins in citations, it should not be surprising to find link farms and other spam in scientific publications.

This is not PageRank. But they really should use PageRank in this Nature article because (very likely) it will blow away this self-citation problem. People should be able to self-cite as much as they want. Not using PageRank is the problem.
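To make that concrete, here is a minimal sketch of PageRank by power iteration over an author-level citation graph. Treating self-citations as self-loops and dropping them is my own assumption about how this would be applied, not something the article or the database does, and the author names are invented.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (citing_author, cited_author) pairs."""
    # Collect every author first so pure self-citers still appear in the result.
    nodes = sorted({n for edge in edges for n in edge})
    # Assumption: self-citations are self-loops and carry no weight.
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        if src != dst:
            out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Authors with no outgoing citations spread their rank evenly.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# "alice" cites herself 20 times; "carol" is cited by two other people.
edges = [("alice", "alice")] * 20 + [
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "bob"),
]
ranks = pagerank(edges)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph the 20 self-citations do nothing for alice's rank, while carol ranks highest. A ring of colluding authors would still inflate each other, so this only addresses the plain self-citation case.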

The SEO hackers of their day.

For those who wonder why there are climate-change deniers who won't listen to "the science", this article is for you. The sad and honest fact is, science has become a spin factory in the way mainstream media has.

Yes, in theory, the scientific method / process is a wonderful standard. Unfortunately, once it's exposed to egos and profits it becomes something else, far less worthy of praise and honor.

I'm not doing a takedown of science; science has already done that to itself. The sooner the rest of us come to terms with that, the better.

Says ChiefAlchemist? Twas Newton's chief failing, alchemy.

What surprises me is how naive the scientific community publicly pretends to be on these matters.

We 'marketed' science as if, due to our socioeconomic dogmas vaguely based on a completely misunderstood, caricatured Darwinian theory, there can be no alternative. We turn science into a quantitative metrics game and, by golly, act all surprised that scientists do game the system?

How useful do you think Google Search would be if they just stopped after Pagerank v0.1 and called it a day, then let all the websites 'vote with their links'?

People who are working out their ideas far outside the mainstream have no one to cite but themselves. Some are quacks, to be sure, but sometimes a field is just not ready for their work because the utility of the idea is not easily apparent, or because it's perceived to be too risky. A field needs a healthy mix of the curmudgeonly stubborn thinkers going their way no matter the cost, and those making steady progress on solvable problems.

Sure, but there is no 'pruning of the tree' in case a dead end is reached, so the citations stay, allowing the quack to pretend they have more credibility than they do. In fact, the whole idea of these citations is to build credibility where there is none.

Right. The purpose of citation metrics is to measure the intellectual influence of a person's ideas. If an academic's works are never being cited by anyone else, the correct conclusion would be that their ideas have little influence -- regardless of how many self-citing papers they've published.
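As a small illustration of counting only citations that come from other people: the data layout (a pair of author sets per citation) and the names are invented for the example, and real bibliographic data would need author disambiguation first.

```python
def external_citation_count(author, citations):
    """Count citations to `author` from papers they did not co-author."""
    return sum(
        1
        for citing_authors, cited_authors in citations
        if author in cited_authors and author not in citing_authors
    )

# Each entry: (authors of the citing paper, authors of the cited paper).
citations = [
    ({"smith"}, {"smith", "jones"}),         # self-citation for smith
    ({"lee"}, {"smith"}),                    # external for smith
    ({"jones", "lee"}, {"smith", "jones"}),  # self for jones, external for smith
]

print(external_citation_count("smith", citations))
print(external_citation_count("jones", citations))
```

Note that this counts a citation as "self" if the author appears anywhere on the citing paper, which is only one of several possible definitions.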

Influence is not the same thing as credibility.

The need to step outside of mainstream views has been important for intellectual progress over the years, indeed. The problem we have now is we're stuck with a series of bad faith, polarizing and disingenuous views that "suck the oxygen out of" actual wide-ranging thinking - i.e., climate denialists can still get a lot of money, anti-vaxxers can pull in a lot of money from scams, etc.

Do you have any good examples, historical or otherwise?

Possibly Barry Marshall and Robin Warren, who discovered that H. pylori causes peptic ulcers and subsequently won a Nobel Prize. They were ridiculed by doctors and scientists for their theory.

I can see them citing themselves once or twice but not more than say 10-15% or so of their citations.

I was going to suggest Peter Turchin as an outsider to history / sociology but he seems to have been able to get published in highly reputable journals http://peterturchin.com/academic-publications/#Articles_in_N...

Between the two authors here (McKinsey and Tarski), they cite each other 7 times, with 14 citations, giving 50% outgoing self-citation, however this paper also received incoming citations contemporaneously...
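The arithmetic behind that 50% figure, as a tiny sketch (the function name is just for illustration):

```python
def self_citation_share(self_citations, total_citations):
    """Fraction of a paper's outgoing citations that point at its own authors."""
    if total_citations == 0:
        return 0.0  # a paper with no references has no self-citation share
    return self_citations / total_citations

# 7 of the 14 references point back to the paper's own authors.
print(f"{self_citation_share(7, 14):.0%}")  # prints 50%
```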


Dijkstra rarely cited others and wrote entire books without citations.

Possibly Sutton and Barto's work on reinforcement learning throughout the 80s.

On a semi-related note, I occasionally look at the news items Google News suggests for me, and these include a significant portion of climate change denialist propaganda, including one shocked, shocked by Nature "suppressing academic freedom with this list" (and my searches are never for climate denialism).

Which is to say, these may be a few scientists, but it seems like they have significant resources behind them, somehow.

Strange, I use Google News on a semi-regular basis, and have the opposite experience; most are fairly bland coverage of climate science research announcements with the occasional hyperbolic doomsday stuff.

My undergrad college physics professor (Fay Ajzenberg-Selove) introduced this metric back in the 50’s. She faced major sexism and bullshit claims against her productivity, and had to use this metric to prove her detractors wrong, that she was as good as or better than most of her male colleagues in terms of performing useful and interesting research, to earn herself a faculty position.


Did I describe this wrong? Not sure why it got moderated down.

Basically, my professor was one of the first female physicists trying to make it in a clearly male-dominated field 60+ years ago.

She was either not offered jobs, or offered jobs at reduced salary compared with male colleagues, being told her research was not as productive.

She literally invented the concept of considering number of citations as a proof of impact, relevancy, and productivity, and showed that her work was better than most of her male colleagues.

Only after she jumped through this hoop herself was she able to get a faculty position. Damn impressive of her.

Counting citations is a rubbish metric in general. It's supposed to be a proxy for research quality but it's so easy to "game" (in the sense of optimising for citation count, rather than research quality) that a high number of citations doesn't mean anything.

Neither does a low number of citations. For example, my field is small and kind of esoteric, so we don't get lots of citations either from the outside or the inside (one of the most influential papers in the field has... 286 citations on Semantic Scholar; since 1995).

With a field as small as a couple hundred researchers it's also very easy to give the appearance of a citation mill. Given that papers will focus on a very specific subject in the purview of the field, it is inevitable that each researcher who studies that specific subject will cite the same handful of researchers' papers over and over again, and be herself cited by them, since she's now publishing on the subject that interests them.

As to self-citations, as Ioannidis himself says, there are legitimate reasons: for instance, a PhD student publishing with her thesis advisor as a co-author. The student will most probably be working on subjects that the advisor has already published on and in fact will most likely be extending the advisor's prior work. So the advisor's prior work will be cited in the student's papers.

So I'm really not sure what we're learning in the general case by counting citations, other than that a certain paper has a certain number of citations.

To be fair to some researchers in certain specializations, there may only be a handful of scientists publishing on the topic. Self-citation proceeds naturally from such circumstances.

Having conducted a reasonable amount of academic and scientific research, I think this metric is more likely to be mischaracterizing research than revealing any issues. This doesn't even establish a causal link between self-citation and poor research quality, it just assumes it.

Most researchers continue to do new research on the same concept after a publication, and they will of course cite their earlier work when continuing. Additionally, post-graduate researchers often have their names placed on the research of grad students they are in charge of, even though they often have minimal involvement in the research or conclusions drawn.

You might be able to tell something from the ratio of other authors from all citations to the number of self-citations, but only if you could eliminate self-citations that were not either inclusion by proxy or cases where they are merely continuing research on the same topic with new methodologies.

There are already methods for identifying bad research, none of which can be achieved through the use of non-human-assisted data analysis of the authors list of research. The only way to be sure is critical review and 3rd party verification of results with repeated experiments.

I worked as an RA in my university days in a large, globally reputable university in Australia. I can safely say there was a culture of: 1. Dismissing research from other staff who had "less than X" citations as nonsense. And 2. Self-referencing from new academics trying to break into the "exclusive, trusted" club. It was genuine madness that corrupted a lot of good people.

Interestingly enough, the published work [1] has two self-citations. So maybe there's something to that phrase, “the next work cannot be carried on without referring to previous work.”

[1] https://journals.plos.org/plosbiology/article?id=10.1371/jou...

Regrettable that the article begins by outing a prof from an unheard-of university in India, who probably publishes in low-repute journals and conferences. Ideally the citing malfeasance score should be weighted based on journal and conference reputation.

And how do you measure journal and conference reputation? Conventional measures there also lean on citations, and journals have been known to do quite a bit of manipulation to get those scores up.

Start with Google Scholar's list of top publications: https://scholar.google.com/citations?view_op=top_venues, and detect self-citing and citing rings within the universe of these publications. Almost certainly the guy with the record for self-citing doesn't have a paper in any of the top publications.
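A very rough sketch of the "citing rings" part, limited to the smallest possible ring: two authors who cite each other repeatedly. The threshold, the data layout, and the author labels are all assumptions for illustration; larger rings would need something like cycle or community detection on the full graph.

```python
from collections import Counter

def reciprocal_pairs(citations, min_each_way=2):
    """Return author pairs that cite each other at least min_each_way times."""
    # Count directed citations, ignoring plain self-citations.
    counts = Counter((src, dst) for src, dst in citations if src != dst)
    pairs = set()
    for (src, dst), n in counts.items():
        if n >= min_each_way and counts.get((dst, src), 0) >= min_each_way:
            pairs.add(tuple(sorted((src, dst))))
    return sorted(pairs)

# (citing_author, cited_author) pairs drawn from some chosen set of venues.
citations = [
    ("a", "b"), ("a", "b"),
    ("b", "a"), ("b", "a"), ("b", "a"),
    ("c", "a"),
    ("a", "a"),
]
print(reciprocal_pairs(citations))
```

A flagged pair is only a prompt for a closer look: in a small field, two people citing each other a lot can be entirely legitimate.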

A particular interest of mine is database systems (and by extension, distributed systems) and I’ve noticed this pattern a fair amount when reading database-related CS papers. It tends to feel like a small circle of researchers citing each other. Don’t get me wrong, that doesn’t say much about the actual quality of said papers, but it’s a pattern that I’ve definitely noticed. As a result of this, I tend to also make sure that I pay attention to seemingly interesting papers with low citations; sometimes, it immediately makes sense why they have low citations, while, other times, I find myself fascinated by the research despite the low citations.

Some fields are very small, and some authors only focus on one small part of a field.

So, high citation count isn't always a direct indicator of quality.

Now the question we all want to have answered: what's the database engine?

A few months ago a couple of physics postdocs set up a website (vanityindex dot com) and proposed a few "vanity metrics". Can be fun to check out your own vanity index and that of your colleagues.

This is news? I thought the Publish or Perish idiom made it obvious.

2 weeks of effort go into doing something, and then you spend 2 months writing a paper on the slightest of results. Many of these papers introduce an infinitesimal increment to knowledge, at best, and you can tell how long it would have taken to get it working. And this is Numerical Mathematics. There are clans of mathematicians who just go around citing each other. And a quality paper comes around every 5 years or so.

Not defending people who self-cite but here is an alternative explanation. There are many areas of science that have very few researchers working on the same or related problems. In addition papers tend to build on prior works of the same or related researchers. Over time we may see clusters of what look like "self-cited" papers. This is not abnormal.

Yup, I self-cite quite a bit. But not to demonstrate productivity or some other arbitrary metric(s). I self-cite because the images I use in my research are really difficult to make and I'm in a niche enough field that using the images really helps people who are not familiar with the area understand what is going on. These images were arduous to make, so I reuse them. (I should note, that I've gotten in "trouble" for reusing them without a citation).

Self-citation or clique-citation is only problematic if it doesn't fulfil its main purpose of providing relevant references. This should not be a surprising statement. But we just read an article where that property of citations is a sideshow. We got to the point where citation scores are bandied about without even asking the main question: "Well are the papers any good?" And now I worry that papers could get worse simply because of a citation-inflation.

I'm currently reading "The Systems Model of Creativity; The Collected Works of Mihaly Csikszentmihalyi" and was really surprised at the rate of self-citation in the included papers. Now it does seem to me that the cited studies are relevant. And I can't judge whether there would have been papers from other authors even better suited for citing.

And why am I even reading that book? Well because of the persistence with which Csikszentmihalyi gets cited in other writings I read. How do I know these writers weren't shills for Csikszentmihalyi? I don't care all that much when the material is good.

So in the end should I care about backwater publications that cite themselves excessively? Because I don't have to read them. As a consumer I don't seem to get hurt by the practice.

PageRank was invented precisely to solve this issue. I have never understood why Google Scholar itself took the stance not to even compute it and to stick to h-index. Google Scholar is a de facto standard for looking up researchers, and whatever metric they adopt would be adopted by the rest of the world.

It's frustrating to see science emasculating itself and relying on commercial, even monopolistic services when there is no lack of research in document search and comprehension. Ten or fifteen years ago there was citeseer for tracking citations. It was a messy Perl program/site to scan through mostly TeX files for metadata and references, yet it worked relatively well, and was kept reasonably up to date. Then they rewrote it into CiteseerX, such that it became useless. It never recovered.

Doesn't PageRank have the exact same problem of manipulation, where people link to other sites in order to prop those sites up, just like people cite articles to prop them (or their authors) up?

The fundamental problem with science today is that it needs funding. Funding means vested interests.

If a scientist is trying to game the system, they are doing it to secure future funding. Either they are gaming the system to make the funder look good, or to appeal to a future funder.

As an indication that this is a hard problem to solve, remember that these papers have usually gone through peer review anonymised. They cite "Smith & Jones 1982", but it might not be obvious from the writing that they are either Smith or Jones.

I completely agree with this. Sometimes you have to look closer than people often do to spot self-citation.

Like many other good things, you've got to figure out what to do when the @ssholes show up.

Number of citations in academia == lines of code as a metric for a software developer?

More like Github stars

I think lines of code is a better metaphor, because the number of lines of code is a necessary artefact of writing code, but easy to game without actually improving quality once they become a measure of evaluation.

Maybe there exists an academic equivalent to blackhat PBNs, groups of academics self-referencing in loops to boost their citations. Number of citations seems like an awful measure for academic contribution.

Hmmm. There is also the problem that you cannot distinguish between a citation and a mere reference. I used to cite earlier work a lot for context, not for mining citations.

A lot of researchers build upon their previous work, and the work of their peers. I don't see how researchers can avoid citing their old work in this case.

I know it's a consequence of conference and journal guidelines but nothing bothers me more than researchers who self-cite in the third person.

If you try to reduce a complex intellectual task to a metric or a checklist, you are begging for exploitation.

Like a Las Vegas show rated #1 by the casino owned magazine. They were voted #1, however.

"...so again, like I said earlier,.."

Did they assess this list for GDPR and harassment law in the UK? IANAL, but if I was naming and shaming in this way via a derived data set, I'd be scared of getting into quite a lot of trouble with the law.
