
> paper reviewers are usually not supposed to look at the actual source code of the papers

Wait what? I haven't reviewed for ACL but most conferences don't say "don't look at the source code." They will say that reviewers are not required to look at it (as well as the appendix). But generally it just isn't uploaded. I always look at the main method when it is provided, but from talking to my peers and advisor, this is very uncommon[0]. My experience is that most reviewers do not spend more than an hour on a work and form an opinion within 15 minutes.

> Not sure what the best solution is, other than having the most "hyped" papers double verified by researchers on Twitter.

I'd say (as a start):

1) Get rid of the conference system. A zero-shot (maybe 1-shot if a "rebuttal" is allowed) zero-sum system is just disastrous, especially at scale. There are strong incentives to reject the works you review. A conference has a binary outcome, and its purpose is to reject ~80% of papers based on a rather noisy notion of "top tier." A journal is a back and forth where reviewers are trying to improve the paper. The reviewers' purpose there is to determine whether the idea is indeed good and whether the paper meets the requirements, and they must explicitly state what needs to change for acceptance.

1.5) An actual rebuttal system could help alleviate some of these issues. Using OpenReview for a conversation between authors and reviewers is critical. A single one-page response (the norm) is not adequate for responding to four different reviewers whose critiques often have little in common. Meanwhile reviewers are effectively allowed (though it breaks guidelines) to respond with a single sentence.

2) ACs need to do a better job at vetting reviewers. The number of inane and absolutely unacceptable reviews I have gotten is astounding (>25%). I've also seen reviewers break guidelines with nothing happening. Examples are comments claiming lack of novelty with no explanation, or asking authors to compare to concurrent works (I've had this happen for a work that was put out _after_ the submission deadline. Not mine, but here's an example[1] of this being done publicly). If the reviewer is pushed to update their comment, the authors have no ability to respond to that update without the conversation aspect. If there is high variance in responses -- not just scores, but what the critiques are about -- then the ACs need to look closer, as something is going wrong. We're in a crisis in the quantity of reviewers, but we also have an undisclosed crisis in the quality of reviewers. Benchmarkism is on the rise, yet benchmarks are extremely limiting for evaluation; there's a certain irony given our frequent discussion of Goodhart's Law and reward hacking. I'll even make the claim that the quality crisis feeds the quantity crisis, as I have seen many peers stop reviewing because it isn't worth their time and they aren't getting a fair shot in return. On a personal note, there is a journal I will no longer review for because of unactionable and unreasonable responses, and I won't submit to it either.

3) Either get rid of double-blind, or actually enforce it. Everything is published on arxiv these days, which in general is great for the community because it allows things to move fast, but it also makes it incredibly easy to de-anonymize authors. Big labs, for that matter, actively de-anonymize themselves[2]. In a very noisy process even a very slight edge becomes a significant edge[3]. These biases can even creep in unconsciously: we're all reading arxiv papers constantly, and it isn't unlikely that we come across some of the works we end up reviewing (yet to knowingly happen to me, fwiw). And certain labs do have identifiable keywords and phrasings.
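
To make [3]'s point concrete, here's a toy Monte Carlo sketch (all numbers are made up for illustration and are not taken from the linked experiments) of how a small reviewer bias toward a recognized lab turns into a noticeably higher acceptance rate once a selective threshold is applied to noisy scores:

    import random

    def mean_score(bias, n_reviewers=3):
        quality = random.gauss(0, 1)                   # "true" paper quality
        scores = [quality + bias + random.gauss(0, 1)  # each reviewer adds noise
                  for _ in range(n_reviewers)]
        return sum(scores) / n_reviewers

    def acceptance_rate(bias, threshold, n_papers=100_000):
        return sum(mean_score(bias) > threshold for _ in range(n_papers)) / n_papers

    # Set the bar so that ~20% of unbiased papers clear it
    unbiased = sorted(mean_score(0.0) for _ in range(100_000))
    threshold = unbiased[int(0.80 * len(unbiased))]

    print(acceptance_rate(0.0, threshold))  # roughly 0.20
    print(acceptance_rate(0.2, threshold))  # noticeably higher, with identical "quality"

The bias of 0.2 is small relative to the per-reviewer noise, yet it buys a clear advantage right at the acceptance margin, which is exactly the "slight edge in a noisy process" problem.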

I think one of the major problems comes down to this: in a small community we have a certain level of accountability, since we all end up knowing one another through minimal connections. In a large community there is little to no accountability, and what used to depend on good faith can no longer be trusted. This encourages bad actors, especially when the system is highly competitive (see 1)), and creates bad science and evaluation creep (e.g. it is now standard to tune hyperparameters on test-set results -- which is information leakage -- and if you don't, you likely can't compete).
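
As a sketch of that evaluation creep, compare a leaky protocol (hyperparameters selected directly on test accuracy) with a clean one (selected on a held-out validation split). The dataset and model here are generic placeholders, not taken from any particular paper:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    grid = [0.01, 0.1, 1, 10, 100]

    # Leaky: pick C by peeking at test accuracy, then report test accuracy
    leaky_C = max(grid, key=lambda C: SVC(C=C).fit(X_train, y_train).score(X_test, y_test))

    # Clean: pick C on the validation split, touch the test set exactly once
    clean_C = max(grid, key=lambda C: SVC(C=C).fit(X_tr, y_tr).score(X_val, y_val))

    print("leaky:", SVC(C=leaky_C).fit(X_train, y_train).score(X_test, y_test))
    print("clean:", SVC(C=clean_C).fit(X_tr, y_tr).score(X_test, y_test))

The leaky number is, by construction, an optimistic estimate of generalization, and once one prominent result is reported that way everyone else is pressured to follow.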

======

[0] Here's a prominent researcher explicitly saying they don't read the appendix, calling it trash, and a poll showing most people don't look at it https://twitter.com/david_picard/status/1660293648796340226

[1] Here's a prominent researcher criticizing a paper for "not citing his work". I linked the top response, which points out that the submission date was 2 months prior to his arxiv release. This is someone who published >250 papers vs someone with <50. For added reference, paper 2 (the prominent researcher's) was _published_ June 26th in TMLR, but they did cite the other work (gotta give credit for that) https://twitter.com/RinonGal/status/1667943354670170118

[2] We have two scenarios here: either 1) reviewers do not know Chinchilla == DeepMind, in which case I'd argue they are unfit to review given the prominence of that model, or 2) they do know, and thus know this is a DeepMind work, and we have an ethics problem. Neither sounds great. https://openreview.net/forum?id=OpzV3lp3IMC&noteId=HXmrWV3ln...

[3] The conclusion of this analysis of the consistency experiment is that even a small amount of inconsistency leads to a lot of noise given a highly selective standard, which means that paper acceptance itself is highly stochastic. (2014 experiment) https://inverseprobability.com/talks/notes/the-neurips-exper...

[3.1] A shorter version: https://blog.mrtz.org/2014/12/15/the-nips-experiment.html

[3.2] A follow-up on the 2014 experiment. tl;dr: reviewers are good at identifying bad papers, but not good at identifying good papers (i.e. a bias to reject): https://arxiv.org/abs/2109.09774

[3.3] A follow-up 2021 experiment (consistent with 2014 experiment): https://blog.neurips.cc/2021/12/08/the-neurips-2021-consiste...

[3.4] Video form https://www.youtube.com/watch?v=19Q-vMd9bYg



I try not to submit to conferences if I can avoid it. It's like you say: reviewers are looking for a reason to reject. I don't understand what makes the difference, since it's usually the same people reviewing for both conferences and journals, but somehow journal reviewers do a much better job. Some journals even have a fast turnaround, and the quality of reviewing is still considerably better.

My second journal paper got rejected with encouragement to resubmit. Half the reason was that the reviewer had, I think, genuinely misunderstood the description of an experiment, so I rewrote it in painstaking detail. I had a long section where I hammered out a proof of complexity spanning three pages, with four lemmas and a theorem, and the reviewer waded through all of that like a hero, caught errors, and made recommendations for improvement. They made a new round of recommendations when I resubmitted. That paper took three rounds of revisions to publish (reject, resubmit, accept with minor revisions), but it got 80% better every time I had to revise it. I wish there had been another couple of rounds! It was exhausting, and I bet much more so for the reviewer, but it was 100% worth it.

And yeah, I absolutely do my best to review like that myself. Even in conferences, which probably seems really weird to authors. But, hey, be the change you want to see.


Yeah, honestly the only reason I submit to conferences now is because my advisor asks me to. If it were up to me I would submit exclusively to journals, or just to arxiv/OpenReview directly. I think I'll do this when I graduate (soon).

As for the reason why it happens in conferences, I think it may actually be a different set of reviewers. While journal reviewers generally also review for conferences, I don't think the reverse is true. I think conferences tend to just have a larger number of shitty reviewers (as well as more shitty submissions). And as you note, it is quite easy to misunderstand a work, doubly so when you're reading under a time constraint. It just makes for a noisy process, especially when reviewers view their job as rejecting (not improving). I just think it is a bad system with a bad premise that can't really be fixed. For conference reviewing, I always try to write what would change my mind and whether I think the authors should resubmit to another venue. But even as a reviewer I don't feel authors get a fair shot at responding: they can't address all of my comments while also addressing everyone else's in a single page.

Edit: I saw your bio. I actually have a SOTA work that has been rejected (twice). Good performance jump with a large parameter drop, but we just couldn't tune or run enough datasets because we were compute limited. Conferences are fun.


>> paper reviewers are usually not supposed to look at the actual source code of the papers

> Wait what? I haven't reviewed for ACL but most conferences don't say "don't look at the source code." They will say that reviewers are not required to look at it (as well as the appendix). But generally it just isn't uploaded.

Sorry, I formulated that badly. I meant what you say: reviewers are usually not presented with the source code, and aren't expected to go hunting for it online. If they do anyway, they are going above and beyond.


If they do go hunting for it, that would be an ethics violation, since it breaks double blind. But it's not like double blind really exists, at least for big labs.


In terms of “getting rid of the conference system”, I would suggest what I see in maths, which is split into two types of talks:

* talks about papers that have already been peer reviewed.

* talks about active research. Here you only submit an extended abstract, and you don’t get to “claim credit” for giving the talk. People tend to only talk about things where they genuinely want to hear feedback, and maybe even find new collaborators.



