Troubling trends in machine learning scholarship (2018) (arxiv.org)
197 points by scottlocklin on Dec 1, 2019 | 62 comments

This is not isolated to machine learning; it affects most technical fields (at least one other, to my knowledge).

I used to work in microscopy image analysis and the papers often would obfuscate the fact that they were not exactly doing anything new by using what looks like fancy math and some trendy names.

One of the most outrageous examples is this "high profile" paper that says it does compressive sensing with superresolution microscopy - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3477591/ except I don't think they do; the math when you remove the bullshit sounds more like deconvolution than anything else (and the results are only as good). Yet, it got reviewed and accepted by Nature Methods, and is cited by 360 papers already. Why? Apparently no one in this field knows what compressive sensing really means. At least one professor in the field when I confronted him, just said he doesn't have time to go through compressive sensing literature first before evaluating this paper.

What's the root cause? Frankly I'd argue the majority of professors nowadays aren't smart in innovation but smart in hustling. Because hustlers are who become professors in today's academic climate. They are able to publish good papers still if the field isn't mature, but if the field is saturated, you have to be really smart to make meaningful progress, and these hustlers are not. So they just try to find some way to wrap meaningless progress in fancy math and shove it in papers. The papers also go to reviewers who are similar hustlers (not every paper can be reviewed by Hinton) so they either don't notice the problem or they do but let it slide because it's just their colleague (yay for journals asking the authors themselves for "suggested reviewers").

On my first internship I worked at a lab and at one point a researcher asked all the staff to be nice enough to read his draft to spot mistakes. I obliged as well, not understanding he meant typos. I asked him about an equation I did not understand in his paper, because I could not link some terms to the rest of the text. He answered that he did just copy it from another paper, that it only made sense in the original publication and that he was not really understanding it either but just enough to know it had to be there.

On another internship I worked for a researcher who bragged about having published almost 10 papers on one of his algorithmic discoveries without ever fully revealing it (which allowed him to start a for-profit company with it).

You know, the whole publication + review + reproduction thing really helped science become a more solid process, but we need something more elaborate now. Probably some kind of reputation system that would not just be the number of citations by friends and colleagues.

> You know, the whole publication + review + reproduction thing really helped science become a more solid process, but we need something more elaborate now.

No, we don’t. Increasing demands for rigour in pre-publication peer review are why publication times from submission in sociology and economics reach and exceed two years and why papers which start off thirty pages long end up with eighty, after adding robustness checks and citing every tangentially relevant paper in the literature. We know that post publication peer review works perfectly well because it was the norm until after WWII with the rise of state funding of big science and the accompanying ass covering and form filling proceduralism that made it popular.

As always only replication counts, whether that’s checking that an experiment has the results claimed or that an argument follows from its premises.

We certainly don’t need to lean on reputation more. Science isn’t law; arguments from authority aren’t valid.

I think the accessibility of publication has led to a broader sampling of the normal curve, and as a result the overall quality of scientific literature is in decline. I imagine that just 50 years ago University was for a select elite, whether by nature or nurture, and the cost of running and printing journals pre internet ensured prioritization of a scarce resource. Nowadays publishing is relatively cheap and that coupled with what I imagine is a modern use of publication numbers as a KPI means lots of noise.

I felt it personally in grad school. If you objectively observe the work of yourself and your peers in such an environment, you may notice that there's a reason that none of you got into the Ivy Leagues.

Actually I totally botched the intent of my first message. My point is that the publication process is used as a reputation metric, which is something the research world (not science) needs. It needs it for valid purposes and using number of citations and impact factor for it is a hack that is now becoming very noisy due to the various exploits that can be made to this metric.

The publication+review+reproduction process is fine to discover scientific fact, I totally agree.

You can always use PageRank instead of raw citation counts.
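A minimal sketch of what ranking papers by PageRank instead of raw citation counts could look like. The function name, the damping value of 0.85, and the toy citation graph are all illustrative assumptions, not taken from any real bibliometric system.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Rank papers by PageRank over a citation graph.

    `citations` maps each paper to the list of papers it cites.
    The damping factor and iteration count are assumed defaults.
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            # Each citation passes on an equal share of the citing paper's rank.
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both cite C, D cites A, C cites nothing.
toy_graph = {"A": ["C"], "B": ["C"], "C": [], "D": ["A"]}
scores = pagerank(toy_graph)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

The difference from a raw count is that a citation from a highly ranked paper is worth more than one from an obscure paper, so the score is still built entirely from citations.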

That still requires citations. Except they are now weighted. I wish there was no incentive on forcing/accepting an irrelevant citation in a paper.

NIH already does that with its iCite initiative.

> We know that post publication peer review works perfectly well because it was the norm until after WWII with the rise of state funding of big science and the accompanying ass covering and form filling proceduralism that made it popular.

You know if there's anything written about the history of the modern scientific process, specifically on the rise of state funding of science? I'm particularly interested in when academics started to be essentially required to bring in external funding. I've only read offhand remarks like this and don't feel I have the full story.

This is an active area of interest in History of Science scholarship these days, which is steadily dismantling a lot of myths about how long peer review has existed and where it came from. It is in fact linked to the need to bring in funds from big grants agencies during the Cold War. You might try for example Melinda Baldwin, Scientific Autonomy, Public Accountability, and the Rise of “Peer Review” in the Cold War United States, which has a lot of references to recent scholarship: https://www.journals.uchicago.edu/doi/pdfplus/10.1086/700070

Thanks, your reply is among the best I've ever received at HN! This is why I post here.

But there is zero reward for showing replication results. Not novel enough, you won't get published. And if you're unable to replicate it then maybe you just did it wrong, or there was a small trick they were using in the code which they left out of the paper, etc.

The answer is evident: replication has to become (once again) relevant.

Of course, agreed!


Having replication studies/papers be on par with "innovation" studies/papers on academic conferences and journals. Or maybe not on par, but considering them as something worthy of publication.

Great point/s.

> As always only replication counts

It's too bad replication studies have such a hard time getting funded.

One of the best proposals for research/review system I have seen is from Yann LeCun: http://yann.lecun.com/ex/pamphlets/publishing-models.html.

The idea is that you post your paper on some common repository like arxiv. The "reviewer entity" or RE, which is a self-organized group with interest in a sub-field, can be invited to review it. The RE accepts or rejects it. The citation graph propagates to the RE as a reputation score.

The problem with his proposal:

1. Authors are not anonymous which can bias prestigious REs to avoid papers from unknown authors.

2. Everything still depends on citations, most of which are usually worthless as people mostly cite to fill obligatory related sections, not because they are actually using your work.

3. No obligation for REs to review anything. Doing reviews is tiresome and busy people may just avoid it unless there is a clear obligation/honor system involved.

4. Prestigious REs will be invited to review by everyone, causing highly uneven distribution for the workload.

I believe the openreview system was started based on LeCun's thoughts, but needless to say we haven't found a good system that can resolve the above issues. More importantly, any change in the system needs to come from community leaders who are part of organizing committees for conferences, or from the developers of websites like arxiv and Google Scholar. Unfortunately, the last two have been in virtual stasis for a long time.

It's astonishing how little investment exists for the main engines of scientific progress. Compare the number of developers working on arxiv or Google Scholar vs Twitter!

> You know, the whole publication + review + reproduction thing really helped science become a more solid process,

I think the model is still good. The examples you stated really do fail at the reproduction part. I think one of the difficulties here is that reproduction is difficult and costly, and there is no one willing to pay for it. A common metric for a paper is "can a smart PhD student replicate the experiment from the contents of the paper?" How many advisors ask their students to replicate a paper? How many students can?

I had this experience with multi-objective evolutionary algorithms. I worked with a researcher who was hung up on the idea of "rotationally invariant search operators." This idea made little sense in the first place. To the extent that it meant anything at all, the properties he ascribed to the search operators were more or less the opposite of their actual behavior.


Interesting, to say the least, though I can't say I have time to read all 150 pages at this very moment.

Chapter 6 is the only place you'll find me talking about rotational invariance, and unfortunately I had to recite the B.S. party line there. If you read any of it, read chapter 5 and chapter 2.

I'm somewhat familiar with compressed sensing, but I only glanced through the linked paper for like a minute, but it looks like you're right. They seem to be taking the overlapped returns and doing some sort of weird sparse basis deconvolution.

Specifically "Using compressed sensing, we analyzed a simulated image with 100 molecules randomly distributed in a 4 µm × 4 µm region (Fig. 1a). Although the images of individual molecules completely overlapped, we could identify almost all of the molecules. "

Unless that is a close enough approximation to a random projection? I'm not that familiar with STORM.

My understanding is that you need to apply masks at the sampling stage, and there are legitimate researchers trying to play with that concept using single pixel cameras. Never heard of anyone else try compressed sensing with regular CCD or CMOS sensor array cameras.
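For reference, a sketch of the standard textbook formulations being contrasted in this subthread. The symbols are generic assumptions for illustration and are not taken from the linked paper.

```latex
% Compressed sensing: recover a k-sparse signal x from m < n linear measurements
y = \Phi x, \qquad \Phi \in \mathbb{R}^{m \times n}, \quad m < n, \quad \|x\|_0 \le k

% Typical recovery step: l1 minimization consistent with the measurements
\hat{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \Phi x = y

% Deconvolution: undo blurring by a known point spread function h, with noise \varepsilon
y = h * x + \varepsilon
```

Recovery guarantees in the first setting usually depend on properties of the measurement matrix that random matrices satisfy with high probability, which is why the question above about whether the acquisition approximates a random projection matters.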

Yeah exactly. I'm not familiar enough with STORM to say if it already provides something close enough to a random projection simply by the acquisition process.

I'm not so sure that it is hustling, though I'm sure that exists. I'm a firm believer of Hanlon's Razor [0], and we shouldn't rush to attribute malice.

I do think there is a problem that our breadth of knowledge, as humans, is far larger than what one person can understand. There's famous examples of revolutions in science being claimed as mundane results. Topologists said Nash was just applying topology to economics and it was nothing new (to them). Mathematicians saw Einstein's results as unsurprising because of the tensor analysis. (Some of these are over exaggerated and there's definitely a post hoc superiority complex in play). But if we just take this at face value, is any of this bad? I would argue no, because it still takes someone to connect the dots between fields and push studies to think in those ways.

But one thing is for sure, as we gain more knowledge it is more likely that someone else independently discovers something that was already discovered. It is also more likely that some of these are rediscoveries of ideas that were not useful at the time.

I think there is a way to solve this though, but I'm not sure we can (yet). We need some good way to check research in a cross-disciplinary manner. Not only that, but in a highly technical way.

[0] https://en.wikipedia.org/wiki/Hanlon%27s_razor

Regarding Hanlon's razor, I'm not making any accusations of malice. It could be malice, or it could be stupidity. Either way the results are the same. Frankly my opinion is that the line between the two is often quite blurred.

Regarding "undermining" novel innovation, you're right for sure. In this example, however, it's not undermining, I'm alleging they're not even doing compressive sensing, and that a competent unbiased reviewer would have caught it.

> I'm not making any accusations of malice.

Sorry then, I read "hustling" as implying intent. It definitely has a negative connotation, since it is often seen as an act of fraud. Strictly by definition it could be overzealousness and not malice, but it definitely has that connotation in vernacular use.

> I'm not so sure that it is hustling, though I'm sure that exists. I'm a firm believer of Hanlon's Razor [0], and we shouldn't rush to attribute malice.

From what I saw in academia, it definitely was partial hustling. The thing is that it became so prevalent in some disciplines that the new generation now views this as "research" and not hustling.

Academics are no less susceptible to social proof. For them, academics is what academics do. And if most of them do this, then this is academics. The notion that it is problematic is waved away.

> There's famous examples of revolutions in science being claimed as mundane results. Topologists said Nash was just applying topology to economics and it was nothing new (to them). Mathematicians saw Einstein's results as unsurprising because of the tensor analysis. (Some of these are over exaggerated and there's definitely a post hoc superiority complex in play).

Do you have any references for these examples?

I could not find any through a quick search. Maybe it is convoluted search terms. I got a degree in physics and was pretty close to the math department. This is just stuff that I've heard from both departments and seen in a few documentaries. I had seen a direct source on Nash (IIRC someone upset about the Nobel), but that was years ago and I can't find it easily.

I did try to give a bunch of side notes to downplay and justify it in ways that may be conceited, but also reasonably human. I don't know if these stories I heard were a single person (I'd be surprised if there weren't at least one person!) or a larger group. I wouldn't be surprised if a ridiculous voice got amplified (can't think of any times that happened in modern times.....)

Of course, hustler types don't look at a career in science and think "how can I game this system", probably not even in a high dollar field like ML (FWIW I am OP).

But the reality is the present incentive system in the "sciences" is such that bullshit, hustling, trend following, sucking up to the associate dean, marketing chops and chasing fashion (to say nothing of politics and witch hunts) are rewarded, and honest diligent research isn't. This is one of the reasons actual breakthroughs (such as deep learning) are incredibly rare in the modern day, compared to, say, the way stuff worked in the 50s.

> This is one of the reasons actual breakthroughs (such as deep learning) are incredibly rare in the modern day, compared to, say, the way stuff worked in the 50s.

I'd argue that the more we learn the harder it is to find major breakthroughs. Looking at Math and Physics as a model, I think this is clear, that major breakthroughs become more sparse as time passes. Luckily we have more eyes looking at things now, which helps a lot.

There are so many huge things to know about in physics ... so many gaping lacunae, I can't accept this. Sure nobody's thinking about them, instead preferring to fool around with non falsifiable piffle like noodle theory and "quantum information theory:" that's part of the problem.

Similarly in technology, there is much to do, but tinkering with hardware, craftsmanship; the things that worked in industrial labs in the old days: they're not done any more. Even making some simple piece of junk aluminum part, people waste time fooling around with solid designer and FEA instead of just handing a piece of graph paper to a former Navy machinist. There are entire books written in the post WW-2 era about fast development when people actually used to develop things quickly. Nobody does it. Well, the Russians and Chinese do, but they also develop big technology a lot faster than we do in the west.

I have no idea what's wrong with machine learning research; probably gratuitous abuse of grad students, people jumping after fashion, and lately brain drain.

I'm usually a fan of Hanlon's razor, but I don't think it cuts cleanly when there are well-known and clearly visible incentive structures that strongly promote hustling. Attributing malice here seems closer to saying "water flows downhill" instead of "water flows randomly, and it so happened it randomly picked this direction".

EDIT: I also don't buy the dominance of stupidity for another reason: if you're being stupid, you'll make mistakes. But you can't obfuscate and bullshit without intent to do so.

I see Hanlon's Razor get thrown around a lot, especially on HN. But does it really have any basis in reality?

I think so. If you really try to understand how people become evil you will often find that they do it with good intentions. There's the extremely old adage that I'm sure has earlier origin than the following well known one (anyone know an earlier example?). "The path to Hell is paved with good intentions."

I think we'd agree that there are a large amount of people that do not do great things, momentarily or as a way of being. But how many people see themselves as bad? Very few. You can find tons of psych studies on this. Where people doing evil things justify it for many different reasons. "Just this one time", "for the greater good", "I have no other choice", "because _they_ are cheating", "the system is rigged against me", "fight fire with fire", etc. We all know these things. We've done them ourselves (to some extent or another).

I think a good example is politics. I think a lot of westerners like democracy (vague term). But how many imagine that if we were dictator for a day how we could just fix everything? Besides being naive and overestimating our intellectual prowess, it goes against the fundamental idea of a democracy. I'd argue that a lot of authoritarians see themselves in this way (there's good evidence to support this). That it is for the greater good. I'm sure you can think of at least two examples today that think that they have to control their countries because the people they rule over are not smart/civilized enough to know what is best for themselves.

Is this malice? I think that depends on the perspective. And that's really what Hanlon's Razor is about, perspective. Understanding the mind of the actor.

Hanlon's Razor is not a physical law, it's a heuristic, and a very useful one for stopping you thinking everyone is out to get you

Sounds like mathematicians need to collaborate more then.

I for one would welcome this. I'm not sure if they push themselves into a corner or it is this idea that math is scary that pushes them into a corner. Probably some combination. But they definitely have some things with a lot of practicality that is yet to be tapped.

> There's famous examples of revolutions in science being claimed as mundane results. Topologists said Nash was just applying topology to economics and it was nothing new (to them). Mathematicians saw Einstein's results as unsurprising because of the tensor analysis. (Some of these are over exaggerated and there's definitely a post hoc superiority complex in play).

Relevant XKCD "Fields of Purity" - https://www.xkcd.com/435/

I'm really disappointed that PubPeer is used so little outside of bio/medical/social science fields. I'm not sure why this is. There seems to be essentially negligible interest in spotting mistakes in the literature in CS/physics/math/engineering (outside of materials science, just glancing at the site).

(For the record: I have posted a comment on PubPeer myself so I'm not a hypocrite.)

In my view many if not most professors are "careerists" in the sense that they put advancement of their career over accuracy. I wouldn't quite say that you need to be really smart to make progress in most fields, though being smart helps. If you can do a lot of work often you can get a great result without being a supergenius. Ultimately it's much easier to fool reviewers than to do good research.

I think most researchers are somewhat innocent in the sense that they produce bad research without knowing it's bad. Many of these folks seem apathetic to the quality of research, as you found in the professor who said they don't have time to go through the literature. As far as I'm concerned, if you don't have time to go through the literature (at a reasonable level, i.e., more than is typical now), you don't have time to do research at all.

"... yay for journal asking for "suggested reviewers" to the authors [themselves])"

That is common practice at a journal like Nature when there are a limited number of investigators in the field with the reputation and experience "necessary" to evaluate the work.

Of course the chosen panel of reviewers is supposed to remain anonymous.

How would you improve the system?

>Of course the chosen panel of reviewers is supposed to remain anonymous.

If only! My professors were quite adept at figuring out who the reviewers were, especially if they're someone from their "cabal", because every investigator has their own grammar and language style and whether professors are good at science or not, they are most definitely really good at English (gotta write those grants perfect!).

Typical practice nowadays, is that these journals would choose two reviewers from the suggested list, and get one other by themselves just to try to be unbiased. Hence, the fate of the paper would reside on that single reviewer (my experience has been that the "suggested" reviewers will never reject the paper unless it's absolute shit).

Perhaps the journals should check the citation history between the suggested reviewer and the author to make sure they are not mutually citing each other and stroking their backs.
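A rough sketch of the check suggested above, assuming a journal had citation data as (citing author, cited author) pairs. The function names, the data layout, and the threshold of 2 are hypothetical choices for illustration only.

```python
from collections import Counter


def mutual_citations(author, reviewer, citation_pairs):
    """Count citations in each direction between an author and a reviewer.

    `citation_pairs` is an iterable of (citing_author, cited_author) tuples.
    Returns (author_cites_reviewer, reviewer_cites_author).
    """
    counts = Counter()
    for citing, cited in citation_pairs:
        if citing == author and cited == reviewer:
            counts["author_cites_reviewer"] += 1
        elif citing == reviewer and cited == author:
            counts["reviewer_cites_author"] += 1
    return counts["author_cites_reviewer"], counts["reviewer_cites_author"]


def flag_reviewer(author, reviewer, citation_pairs, threshold=2):
    """Flag a suggested reviewer when citations run both ways at or above
    an (assumed) threshold."""
    a_to_r, r_to_a = mutual_citations(author, reviewer, citation_pairs)
    return a_to_r >= threshold and r_to_a >= threshold


# Hypothetical citation records between made-up authors.
pairs = [
    ("smith", "jones"), ("smith", "jones"),
    ("jones", "smith"), ("jones", "smith"), ("jones", "smith"),
    ("smith", "lee"),
]
print(flag_reviewer("smith", "jones", pairs))
print(flag_reviewer("smith", "lee", pairs))
```

A real check would need to handle co-authorship and small fields where everyone cites everyone, so a flag like this is at most a prompt for an editor to look closer.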

A fair amount of techies suffer from 'math sickness' which is most often spread by hot papers at conferences that apply some difficult to grasp mathematical approach to a new problem domain.

No one in academia has time except for a few at the top. We're constantly swamped with tasks that aren't research and we're not paid to research. What do people expect?

> we're not paid to research

Isn't it in your job description, and what your performance is assessed against?

It depends, it is more complicated than having it as a metric in your contract and being paid to meet some demand. Generally what happens is you are expected to research outside all the other responsibilities that are not research which consume your contracted hours. So undergrad teaching, administration, grant applications etc take the majority of your day alongside managing the welfare of students you are responsible for. Don't forget meetings, committees, staff training days, postgraduate supervision and development of up to date course materials. I wouldn't even bother trying to fit in a personal life while trying to stay on top of the field.

When you're graded against whatever metric your institution uses they will decide whether or not to pile on more undergraduate teaching and administrative responsibilities (basically seen as the shit jobs) which in turn reduces your time available to research. That is problematic in and of itself as developing successful undergraduate systems requires people who know the field well and can teach relevant materials born from practice. What you end up seeing is young researchers who go into Lecturer positions end up deciding the next 30 years of their life on the first evaluation rather than being invested in as researchers who could contribute to their own projects or to teams. To compound the issue university administration has exploded in the last 10-20 years creating more middle management, more cost and more downward pressure on researchers to adhere to the demands of people less qualified than themselves. The same administrators are also the ones calling for mass casualisation of teaching roles to save money without seeing the long term downsides this has for building institutional knowledge and departments which can generate high quality research. Luckily I am in an institute that mandates at least 1 day per week is a research day and you are allowed to work from home and be uncontactable on that day if you wish but other people have it so much worse.

You really hit the nail on the head there about hustlers VS innovators. This is also epidemic in my fields - volcanology, planetary science, and robotics.

"In this paper, we focus on the following four patterns that appear to us to be trending in ML scholarship: (i) failure to distinguish between explanation and speculation; (ii) failure to identify the sources of empirical gains, e.g., emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning; (iii) mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g., by confusing technical and non-technical concepts; and (iv) misuse of language, e.g., by choosing terms of art with colloquial connotations or by overloading established technical terms. "

To their credit, the authors actually own-up to doing this themselves in various papers. It seems like a way to describe the situation is that neural nets have become such computational monsters that talking about them exactly becomes very difficult with the language opaque and ambiguous.

I'd say a lack of a proper fundamental understanding of trained neural networks is the main cause. People throw NNs at any problem they can think of, get good results and when they want to publish, they come up with an explanation that is more esoteric than founded in solid theory because the monster they generated is so inscrutable.

The stuff is what the stuff is, brother. https://youtu.be/ajGX7odA87k?t=931

Thanks! This is a great talk.


We'll have to see where this all leads, over the next few years/decades. Maybe someone will manage to combine "a proper fundamental understanding of trained neural networks," and good results. That'll lead to (perhaps) good theories, to explain the good results.

If "good results" continue to outpace our understanding of wtf the useful NN is up to... It'll have to be studied experimentally, like the way we study biology.

Ie, we might see CS theory adapt from "mathematical" to "scientific" in its methods and theories.

The current trajectory seems to be heading here. There is tremendous interest and resources in NNs. As they become more commercially important, interest and resources dedicated to developing them increases. They only need the NNs to work, not to be scrutable.

Scientists are not just going to give up though. They'll study NNs experimentally as black boxes if that's all they have.

What you're saying is a bit tautological in a way you may not intend.

What the paper describes is research papers that aim at exactly that: giving a fundamental understanding of a trained neural network. That these papers are satisfied with "it works" stands in the way of anyone having this fundamental understanding.

Regarding (i), isn't that just because it's an immature field? People don't really know where the border between knowledge and speculation lies.

It’s questionable how “immature” ML really is. Most methods that get used were initially designed 50+ years ago, with various improvements over time. E.g., neural networks were invented in the 1950s, backprop was introduced in the 80s, architectures like LSTM and CNN in the 90s, etc.

The only thing that’s really new is the amount of computational power at our hands. That has allowed us to shift from relatively simpler methods to more powerful but opaque methods like NNs. They just don’t lend themselves to easy analysis because it’s a lot harder to explain why inputs to these ML systems map to their respective outputs. Hence, attempts at drawing the connection between inputs and outputs become more speculative.

Sure people had these ideas, but there was no consensus that they were the right ones. As such they weren't studied as much as they are now.

The people who make a paper have to know where the border is for their particular paper. That is, which things in the paper are claims with evidence to back them up, and which things are speculations about what might be an explanation. That some things are speculative is not such a big deal, as long as they are clearly marked as such. Then someone else can investigate it properly in another paper. Or people can use it in another work, by treating it as an assumption that they can verify, and then make use of.

> "...overloading established technical terms. "

Subtle. ;)
