
The Holy Grail of Crackpot Filtering: How the arXiv decides what’s science - yetanotheracc
http://backreaction.blogspot.com/2016/05/the-holy-grail-of-crackpot-filtering.html
======
vintermann
This is dismaying. I can sort of understand that the physicists want lots of
filtering, but in the area I follow on arXiv (machine learning), it seems to
me that there's (a) little filtering and (b) this works out great.

I have seen crank papers, but they are just ignored and quickly crowded out by
papers people are actually interested in reading.

A while back they ran surveys on whether they should filter more. I and
everyone I talked to said: don't worry about filtering, don't worry about
prestige (yours or the authors'), just provide a place to publish and good
tools for searching/browsing (a la external sites like arxiv-sanity), and
it's perfect.

But it didn't occur to me that attitudes might be very different in other
fields.

~~~
brudgers
[I am conjecturing]

1\. In a recent field like machine learning, there has not been enough time
for rank amateurs to assimilate the relevant language without understanding
the core material, but in physics, people have had centuries for the Newtonian
model and a hundred years for quantum mechanics.

2\. Because machine learning is so recent, there is less difference between a
rank amateur's quackery and the set of ideas "unexplored which might work".
That is to say, there isn't a strong established model and supporting theories
about how and why techniques work. The state of the art in machine learning is
still so empirical that it hasn't even developed its own "phlogiston".

3\. As a new field, it is "easier to make progress" in machine learning. In
part because the low hanging fruit has yet to be plucked. In part because
people are less able to point to what has worked in the past. And in part
because the pace of advance creates more tolerance toward mistakes...it's
currently in a continuous delivery mode.

But I could easily be wrong.

~~~
vintermann
Mostly reasonable, I think. But there's also something more: experiments are
readily reproducible, and it's becoming a norm to include code for
reproduction.

About assimilating the language, I'll add that old ML papers use a lingo which
is almost incomprehensible to me, whereas newer ones obviously place a high
value on being as clear and readable as possible, to get the most attention
from practitioners as well as researchers.

