
On End-To-End Program Generation from User Intention by Deep Neural Networks - apsec112
http://arxiv.org/abs/1510.07211
======
teraflop
RNNs are pretty cool, and I don't want to be a downer, but we need to keep
the hype under control. Ever since Karpathy's excellent article[1] and sample
code made the rounds, it seems to have become a popular pastime to grab some
arbitrary dataset, throw a deep network at it, and marvel at whatever output
it synthesizes. Those experiments are fun, but we need to avoid the temptation
to make assumptions about how well they generalize to "hard AI" problems.

Let's look at the actual experiment described in this paper. Given a corpus of
a couple thousand short programs, they discovered that a neural network can:

* Mix and match fragments of near-identical programs to produce something that is "almost" compilable and "almost" equivalent to one of the originals.

* Identify patterns that occur in specific frequent contexts (e.g. the array name that appears before the string "[100]" in the examples given) and remember them for short periods of time, albeit not reliably.

* Do this for four different problems with a single network. We are not told what the problems are, much less how they were chosen or what the sample data looks like.
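The "mix and match fragments" behavior doesn't require a neural network at all. As a rough illustration (my own toy sketch, not the paper's setup), a plain character-level n-gram model trained on a few near-identical programs will happily emit "almost" compilable recombinations of them:

```python
import random

# Toy corpus of near-identical "programs" (stand-ins for the paper's dataset).
corpus = [
    "int a[100]; for (int i = 0; i < 100; i++) a[i] = i;",
    "int b[100]; for (int j = 0; j < 100; j++) b[j] = j;",
    "int c[100]; for (int k = 0; k < 100; k++) c[k] = 0;",
]

ORDER = 6  # context length in characters

# Character-level n-gram table: context -> characters seen after it in training.
table = {}
for text in corpus:
    padded = " " * ORDER + text
    for i in range(ORDER, len(padded)):
        ctx = padded[i - ORDER:i]
        table.setdefault(ctx, []).append(padded[i])

def sample(length=60, seed=0):
    """Generate text by repeatedly sampling a next character for the
    current context. Every emitted window was memorized from training."""
    rng = random.Random(seed)
    out = " " * ORDER
    for _ in range(length):
        choices = table.get(out[-ORDER:])
        if not choices:
            break
        out += rng.choice(choices)
    return out.strip()

print(sample())
```

The output looks impressive precisely because the training programs are nearly identical: the model can only recombine memorized fragments, which is also all the paper's demo shows.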

What's notably lacking is any discussion of generalizing beyond the training
data. This isn't much different from writing a paper showing that a classifier
evaluated on its own training data achieves high accuracy, and presenting that
as a "case study" arguing the algorithm could be useful on different test
data. Even if the claim turns out to be true, the experiment provides no
evidence for it.

(Note also that there is zero _technical_ contribution from the authors; they
appear to have literally just downloaded Andrej Karpathy's code and cat'ed a
bunch of files into it. The paper cites Karpathy's article as "other work" but
makes no mention of the fact that they used his code.)

It's not entirely uninteresting as a quick demo, and there are a few
paragraphs of interesting speculation about the difficult aspects of automatic
program generation. However, I don't think either those speculations or the
near-trivial demonstration justify the paper's claim to "demonstrate the
feasibility" of end-to-end code generation.

[1]: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

