Hacker News
Goodhart’s Law: Are Academic Metrics Being Gamed? (thegradient.pub)
71 points by ubac 20 days ago | 27 comments

Glad to see this important piece here (disclosure: I am one of the editors of The Gradient).

https://ieeexplore.ieee.org/document/5089308, from RCIS 2009 (Beel and Gipp) noted that "Google Scholar seems to be more suitable for searching standard literature than for gems or articles by authors advancing a view different from the mainstream."

Unrelated, but interesting: scraping Google Scholar is remarkably annoying if you want to actually use the data. The easiest way (in my experience) seems to be regex hacking on the BibTeX files, but this seems truly broken.
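For anyone curious what the "regex hacking" looks like, here is a minimal sketch in Python, assuming a flat BibTeX export with brace-delimited fields. The function name, the patterns, and the sample entry are all made up for illustration; this is not a real parser and not anything Scholar provides.

```python
import re

# Matches the entry header, e.g. "@article{doe2009example,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches simple "name={value}" fields whose value has no nested braces
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex_entry(text):
    """Pull the type, key, and flat fields out of one BibTeX entry (hypothetical helper)."""
    header = ENTRY_RE.search(text)
    if header is None:
        return None
    fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(text)}
    return {"type": header.group(1).lower(), "key": header.group(2), "fields": fields}


# Placeholder entry, not a real citation
SAMPLE = """@article{doe2009example,
  title={An Example Title},
  author={Doe, Jane and Roe, Richard},
  year={2009}
}"""

entry = parse_bibtex_entry(SAMPLE)
authors = entry["fields"]["author"].split(" and ")
print(entry["key"], entry["fields"]["year"], authors)
```

The fragility is easy to see: a value with nested braces, such as title={A {B}c}, does not match the field pattern and is silently dropped, so anything serious probably wants a proper BibTeX parser.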

Blocking scraping is the norm for Google. For instance, the public YouTube API allows you to view a grand total of 3 or so videos per key per day before it starts blocking you.

Google has basically gotten as bad as Twitter in terms of giving a big middle finger to third-party devs, but they have been smart enough to maintain a completely useless public/free tier for most things.

That makes sense. I'd hope that Scholar would be different, though.

A piece on how a researcher spent a summer filling out CAPTCHAs / scraping: https://www.nature.com/articles/d41586-018-04190-5

Scholar should be different, considering that they are the only ones in the world who are given access to everything

Scholar locks you out of the bibtexes after you download ~20 or so in my experience, but you can get around this if you instead save the paper to your favorites and then access the bibtex from the paper link in your favorites page.

Something seems a bit wrong with the graph "Publication rate by career length" -- should the y-axis be "Average number of published papers per year"? (I can't imagine that someone whose first paper was in the 1950s only published 1 additional paper in the next 30 years)

They got their PhD and entered industry. Publication was no longer a priority and maybe not even an option.

If the axis labels are correct (which, after referring to the full paper, they seem to be) then I think this is the only reasonable explanation. I.e. that the dataset tracks all authors of scientific publications, rather than those who are active researchers (or had research careers and subsequently passed). [Note: I did not see any attempt to take this into account upon skimming the full paper]

Given the high attrition rate in many academic fields (and the small number of publications typical in early years) this would then rationalize the low numbers seen here. Though, I would say the meaning of these numbers would be quite different than if this were presented for those who had research careers (the more relevant number I think).

No, that's reasonable. Look at it this way: the modal category of authors is the group that has published one paper. The next-largest group published two. And so on.

It's a very very small overall percentage of authors that publish more than one paper per year!
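To make the arithmetic concrete, here is a rough sketch with invented numbers (the author counts and the 30-year span are assumptions for illustration, not figures from the article or the paper). It shows how a population dominated by one-paper authors can average well under one paper per year even when a few authors are prolific.

```python
# Hypothetical distribution: career-total papers -> number of authors.
# These counts are invented purely to illustrate the shape of the argument.
author_counts = {1: 600, 2: 200, 3: 100, 10: 70, 50: 30}
career_years = 30  # assumed span since first publication


def mean_papers_per_author(counts):
    """Average career-total papers across all authors in the distribution."""
    total_authors = sum(counts.values())
    total_papers = sum(papers * n for papers, n in counts.items())
    return total_papers / total_authors


mean_total = mean_papers_per_author(author_counts)
mean_per_year = mean_total / career_years
modal_total = max(author_counts, key=author_counts.get)

print(f"modal career total: {modal_total}")
print(f"mean career total: {mean_total:.2f}")
print(f"mean papers per year: {mean_per_year:.3f}")
```

With these made-up counts the typical author has one paper, and the per-year average across everyone is far below one, even though the top group publishes more than one paper a year.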

I disagree. One paper per year for someone who started publishing in the 50s wouldn't be insane to me (though, low to my eye) -- but publishing one paper every few decades would be insane by any standard (and you couldn't get to millions of papers published per year at that rate).

My grandfather (born well before the 50s, but still) was a professor his entire career and I can't find a single (unambiguous) publication of his online. I would not be surprised to learn there were no papers, although I believe a book chapter occurred at some point.

My father, born in the 50s and like another commenter discussed primarily involved in industry, appears to have three or four.

Your norms and intuitions are based on the present craziness. Here's Higgs (of Higgs boson fame) who's published 10 papers since 1964:


Fantastic overview of current trends in academia. There is truly a huge bias in research publications. I think that in some cases blogging is a better way to put forth your ideas and research than getting a publication.

Why not both? Berkeley AI Blog, Stanford AI Blog, CMU ML blog... all show that you can do both. Reviews (part of paper submission process) are legitimately useful, if done well. As a researcher, having Arxiv as a standard medium and conferences to help filter interesting papers (if imperfectly) is also useful.

Yes, agreed. Arxiv is also an awesome platform.

Excellent article. A relevant link would be the San Francisco Declaration on Research Assessment (DORA) https://sfdora.org/

Back to the article, lots of gems, like:

>Today's researchers can publish not only in an ever-growing number of traditional venues, such as conferences and journals, but also in electronic preprint repositories and in mega-journals that offer rapid publication times.

Did I just read a very normal point of view of a researcher putting on equal footing electronic preprint repositories and mega-journals?

The BOAI Open Access preachers surely can't believe their eyes :) Heresy! (No researcher was involved in the BOAI flawed definition of green OA as archiving and gold OA as publishing.)

Do people benefit from gaming the system? Then it surely is being gamed. And they do. Funding and tenure depend on these metrics.

> Overwhelmed by the volume of submissions, editors at these journals may choose safety over risk and select papers written by only well-known, experienced researchers.

There is a bit of a "circle jerk" in this process: if you know the right people, you can get better reviews. In return, you review their papers or requests favorably. That also leads to repeating authors.

Spot on. I like to think that the system-gaming doesn't necessarily disadvantage the committed and passionate researcher (who researches to satisfy personal curiosity).

'Circle jerk' is hardly the right term; this is more like mafia, insiderism, or maybe just normal human behavior.

Sometimes people idealize academia though. It's important to realize that everything has its flaws :).

The thing to realize about academia is that the vast majority of people in it have never experienced the real world. They went straight from grade school to college to working in academia, and stay there until they retire or die.

Academia is literally all these people know, and they've been sheltered their entire lives. No one should be surprised that this produces strange results.

Obviously there are exceptions, but there's no denying it produces some interesting world views.

Yes. But a good department will know you for who you are. Plenty of great science takes a long time to do, and it is known that it's hard to get funding for long-term monitoring experiments. Most grant money is for new and innovative ideas, and no one is pushing out one of those every year. And if you are, then you're not doing 90% of the work on any of them.

They should have limited the analysis to the most popular journals. There are tons more journals nowadays because it's so easy to run one, but it's more important to know what's happening at the well-known ones. The lesser-known ones are largely ignored.

Tough call: Goodhart's Law says yes, metrics will be gamed, but Betteridge's law says the answer is no.

One of those is a fundamental condition of a game-theoretic scenario, independent of environment. The other is a marketing trend.

No. We're all being gamed. Academics were just nearer the front of the queue.

I wish there was a Google Ngrams for Google Scholar...
