
Half the papers at NIPS would be rejected if the review process were rerun - urish
http://mrtz.org/blog/the-nips-experiment/
======
nl
For those who didn't read the story: NIPS decided to run this experiment
themselves to identify problems.

That's a really brave thing to do, and it deserves serious credit.

~~~
ehurrell
Absolutely, it's hugely important. The core idea that subjective beings will
disagree holds obvious weight, but it takes serious commitment to improving
the process to offer up this kind of experiment to prove it.

------
andrea_s
Since NIPS is a very prestigious conference, I'd expect a lot of submitted
papers (perhaps a vast majority of them) would fall in a grey area between
"clearly unsuitable" and "clearly suitable". I personally think there are too
many factors at play in evaluating a series of papers - no objective sorting
can really exist.

A note for people outside the academic Machine Learning field: NIPS is widely
believed to be on a different level from the rest - during my PhD, my advisor
used to say that a NIPS paper would be on par with a paper published in a good
journal, resume-wise. The difference is especially striking if you've had the
chance to attend other conferences (including, alas, IEEE-sponsored events) -
that are, with very few exceptions, fairly terrible from a scientific point of
view.

~~~
drpgq
CVPR is pretty good.

------
Canada
Selecting talks is really subjective. It's more like choosing which songs to
play at a party than an objective sorting process.

You have limited speaking slots. You have to guess what the conference
attendees will find interesting this year. You are biased by your own
particular interests.

Also, some of the submitters are your friends/colleagues, and even if they
didn't tell you what they were submitting already, which is unlikely since
your relationship is based on talking about this stuff, you can tell it's
theirs in less than 250 words...

However a conference tries to sell the fairness and objectivity of its
process, you can't anonymize or double blind these things away.

------
sytelus
For two independent committees, 6% of papers were accepted without
disagreement and 25% were rejected without disagreement, while the rest were
effectively coin flips. This means that when your paper gets accepted or
rejected, luck plays a huge part. It's not that the judges are literally
flipping coins; rather, the vast majority of papers don't seem strikingly good
or bad, so a repeated trial may not produce the same outcome. The asymmetry
here is also striking: definitely bad papers outnumber definitely good papers
by about 4x.
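A quick back-of-the-envelope simulation (a sketch using the approximate figures quoted above and a made-up `committee_decision` model, not the actual NIPS review data) shows how much disagreement this distribution alone would produce between two independent committees:

```python
import random

random.seed(42)

# Illustrative approximation from the figures above: per 100 papers,
# 6 are clear accepts, 25 are clear rejects, and the remaining 69 are
# effectively coin flips for each committee.
def committee_decision(paper):
    if paper == "clear_accept":
        return True
    if paper == "clear_reject":
        return False
    return random.random() < 0.5  # coin-flip paper: each committee guesses

papers = (["clear_accept"] * 6 + ["clear_reject"] * 25 + ["coin_flip"] * 69) * 1000

# Two independent committees judge every paper; count how often they disagree.
disagreements = sum(committee_decision(p) != committee_decision(p) for p in papers)
print(f"disagreement rate: {disagreements / len(papers):.1%}")
```

Under this toy model the committees disagree on roughly a third of all submissions (0.69 x 0.5), purely because of the coin-flip middle.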

These are really great observations with deep implications. The same pattern
might apply in other aspects of life, such as interviewing candidates,
selecting a mate, or buying a shirt. In all these cases, a similar
distribution might be at work.

I have often wondered why it is so hard to have less mediocrity in the world.
Why isn't every book, t-shirt, or smartphone just great? One obvious reason is
that a lot of the time people create something out of obligation, such as a
demand from a job, rather than out of an urge to create. So the follow-up
question is: if no one had any obligation to create, could the distribution
above turn on its head? For example, in that scenario would we have, say, 70%
great papers, 5% mediocre, and the rest coin tosses?

~~~
dalke
"Why isn't every book, t-shirt, or smartphone just great?"

Different people have different ideas of what "great" means. Not everyone
thinks the Harry Potter series are great books, while many do. We see the same
thing in film, where a movie does poorly at the box office while the critics
praise it, or vice versa.

The definition of greatness changes over time, so "It's a Wonderful Life", now
considered one of the most critically acclaimed films ever made, had only
mediocre revenue when it came out.

Greatness is sometimes situational, so "Dan Brown ... is the undisputed king
of airplane books — the not-too-heavy, not-too-long potboilers perfect for a
long layover." If you don't fly, then perhaps there's no time when Brown's
works might appeal.

Travel has its own category of "good enough." Visiting Germany once I bought a
book from the limited English selection not because it was great, but because
it was something to read on the long train ride.

A lot of people watch sports, but surely it can't be that all sports games are
great, so greatness can't be the only reason for keeping someone's interest.

Since it's hard to predict greatness, people will test out ideas to see if
there's a response. Sometimes this can lead to feedback and improvements.
Sometimes this testing is through writing clubs. Sometimes (as with smartphone
apps) this is with the market itself.

~~~
Houshalter
It's simpler than that. We focus on the differences between things, not the
similarities. If all movies were equally good, we would then grow to focus on
the tiny differences between them and start to judge them based on those.

------
danieltillett
This is the way "peer review" works. It is basically random. I have always
found it comforting whenever I had a paper rejected, as I knew it had nothing
to do with the quality of my work. I would fix any typos found by the
reviewers (you always get a spelling nazi as one of the reviewers) and send it
out again otherwise unchanged. I have only had one paper rejected twice, and
it was accepted unchanged on the third attempt.

~~~
IndianAstronaut
Isn't this why Mendeley and EndNote exist? Just change up the citation
formatting and resubmit to a different journal until one accepts your work.

~~~
danieltillett
Yes :)

My favourite peer review story is from when I submitted one of my articles to
the top journal in my field at the time (Applied and Environmental
Microbiology). It came back with the usual trivial peer review changes ("cite
this irrelevant paper of mine", etc.), which I made (this is nearly always
easier than arguing with the reviewers). The editor made a mistake and,
instead of sending the updated manuscript out to the original reviewers, sent
it out to a new lot of reviewers. What was funny about the whole exercise was
that the second set of reviewers called the first set idiots and told me to
change everything back.

~~~
Natsu
They always have to find some way to leave their mark.

~~~
danieltillett
This is true, but there are always exceptions. The second paper I published I
sent off to the journal and after a couple of months I had not heard anything
(this was in the physical paper days where you had to mail everything). My
supervisor decided to call the editor to ask what was happening. The editor
said "oh we published it last month". The whole paper had gone straight
through without a single change. This of course was the last time I ever had a
paper accepted like this :)

------
at-fates-hands
Is this the Neural Information Processing Systems convention you're talking
about?

I'm sure more than a few people won't have any idea what "NIPS" stands for.

~~~
davmre
To be fair, calling it "Neural Information Processing Systems" isn't
significantly more informative. The name is just a quirk of history; NIPS in
its modern form includes research in all areas of machine learning, not just
neural nets.

~~~
_delirium
In fact for some years neural networks were very out of fashion there, and it
was almost purely a statistical machine learning conference. I tend to just
think of it as a machine-learning conference named "NIPS", which stands for
something historical (like Perl and Lisp do).

------
army
I think people are reading more into this than there is. Reviewing papers is a
highly subjective, high variance process, and very few papers get universally
positive reviews.

From the point of view of an author, if you get a paper rejected that you know
is worthwhile, you just have to make whatever improvements you can and then
submit it again.

------
graycat
FYI: Apparently NIPS abbreviates Neural Information Processing Systems.

------
djulius
That's a very neat experiment.

SIGMOD made an interesting move this year by accepting all papers that reached
its standards. However, not every paper will be given a presentation slot
during the conference.

------
dhm
I am surprised the committees were "tasked with a 22.5% acceptance rate".
Couldn't more than 77.5% of the submissions have been of poor quality?

~~~
liquidise
I am more curious about the inverse: what if more than 22.5% of the papers
were of acceptable quality? Wouldn't that force each committee to pick and
choose, thus artificially inflating their disagreement?

~~~
wsxcde
Yes, and I think that is essentially why we're seeing these disagreements.

I've heard from lots of professors that a good conference gets a lot of "very-
good-but-not-great" submissions and the job of the program committee is to
pick the best among these. I wouldn't be surprised at all if minor personal
preferences (which from the outside look rather random) ended up having a big
say in the fate of a particular paper. Maybe some reviewers are more forgiving
of poorly-written but technically strong papers; maybe some reviewers consider
certain fields "dead" and so are biased against them; reviewers hold wildly
different standards on how extensive an experimental analysis needs to be to
be acceptable; and so on.

------
hyperbovine
What is going on with these graphics?

~~~
rtkwe
It's an XKCD-styled graph by the looks of it. There are a couple of generators
out there that take data and produce these hand-drawn-looking graphs, e.g.
[http://xkcdgraphs.com/](http://xkcdgraphs.com/)

~~~
jff
And they look fucking awful.

~~~
flopto
They're supposed to look hastily-made to reflect the imprecision in the
estimates.

[https://www.chrisstucchio.com/blog/2014/why_xkcd_style_graph...](https://www.chrisstucchio.com/blog/2014/why_xkcd_style_graphs_are_important.html)

~~~
desdiv
I love the general idea of comic-style graphs, but this particular
implementation does indeed look "fucking awful" in my opinion. The graph
itself is fine, but the axes are made of these faux wavy lines that eerily
repeat themselves.

~~~
userbinator
_The graph itself is fine, but the axes are made of these faux wavy lines that
eerily repeat themselves._

I see the same thing in fonts that are supposed to look "hand-drawn" and in CG
renders of realistic scenes: they look "imperfect", but the way the
"imperfection" is itself perfectly regular is what stands out. A little
randomness goes a long way toward avoiding that.

------
techaddict009
Something similar happened with IEEE[1]: it had accepted approximately 120
papers generated by SCIgen[2].

[1] [http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763](http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763)

[2] [http://pdos.csail.mit.edu/scigen/](http://pdos.csail.mit.edu/scigen/)

~~~
sqrt17
IEEE may be more prone to precision errors (letting bad papers in), while NIPS
may be more prone to recall errors (throwing good papers out). With the way
reviewing is done (no one can take a week off to read and fully comprehend the
four papers they are given), you cannot achieve perfect separation, even if it
were possible in principle.

~~~
onan_barbarian
Calling the process of "accepting a SciGen-generated paper into an allegedly
peer-reviewed journal" a "precision error" is a bit on the optimistic side. It
implies that someone was making a decision after reading the content of the
paper, as opposed to, well, just accepting everything in sight.

It doesn't take a "week off" to notice that a paper is gibberish, at the very
least.

~~~
userbinator
Unless the reviewer doesn't actually know anything at all about what he/she
claims to.

It wouldn't surprise me at all if most of the general public would be unable
to distinguish a SciGen-generated paper from a real one.

~~~
GFK_of_xmaspast
What is such a person doing reviewing for the IEEE?

