Hacker News
On End-To-End Program Generation from User Intention by Deep Neural Networks (arxiv.org)
20 points by apsec112 on Oct 29, 2015 | 1 comment



RNNs are pretty cool, and I don't want to be a downer, but we need to keep the hype under control. Ever since Karpathy's excellent article[1] and sample code made the rounds, it seems to have become a popular pastime to grab some arbitrary dataset, throw a deep network at it, and marvel at whatever output it synthesizes. Those experiments are fun, but we need to avoid the temptation to make assumptions about how well they generalize to "hard AI" problems.

Let's look at the actual experiment described in this paper. Given a corpus of a couple thousand short programs, they discovered that a neural network can:

* Mix and match fragments of near-identical programs to produce something that is "almost" compilable and "almost" equivalent to one of the originals.

* Identify patterns that occur in specific frequent contexts (e.g. the array name that appears before the string "[100]" in the examples given) and remember them for short periods of time, albeit not reliably.

* Do this for four different problems with a single network. We are not told what the problems are, much less how they were chosen or what the sample data looks like.
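To make concrete how little the behaviors above require, here is a toy stand-in (plain Python, not an RNN, and nothing from the paper — the "programs" below are made up): even a character-level n-gram model trained on a few near-identical strings will emit near-verbatim mixes of them, including keeping the right identifier in front of "[100]"-style contexts for short stretches.

```python
# Toy sketch: a character-level n-gram model that memorizes a tiny corpus
# of near-identical made-up "programs" and samples mixes of them.
import random
from collections import defaultdict

def train(corpus, order=6):
    # Map each length-`order` context to the list of observed next chars.
    model = defaultdict(list)
    for text in corpus:
        padded = "^" * order + text + "$"  # start/end sentinels
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def sample(model, order=6, max_len=200, seed=0):
    # Walk the model from the start sentinel, emitting observed transitions.
    rng = random.Random(seed)
    context = "^" * order
    out = []
    for _ in range(max_len):
        ch = rng.choice(model[context])
        if ch == "$":
            break
        out.append(ch)
        context = context[1:] + ch
    return "".join(out)

corpus = [
    "int a[100]; for (int i = 0; i < 100; i++) a[i] = i;",
    "int b[100]; for (int i = 0; i < 100; i++) b[i] = 2 * i;",
    "int c[100]; for (int i = 0; i < 100; i++) c[i] = i * i;",
]
model = train(corpus)
print(sample(model))  # a near-verbatim mix of the training strings
```

The point is that reproducing and splicing training strings is cheap; nothing about it resembles mapping user intention to new programs.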

What's notably lacking is any discussion of generalizing beyond the training data. This isn't much different from writing a paper showing that a classifier given the same data at training and test time achieves high accuracy, and then using that as a "case study" to argue the algorithm has the potential to be useful on different test data. Even if the claim turns out to be true, the experiment provides no evidence for it.
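By analogy (toy data of my own invention, nothing from the paper): a 1-nearest-neighbor memorizer scores perfectly when "tested" on its own training set, and that number tells you nothing about how it handles held-out inputs.

```python
# A classifier that just memorizes its training points.
def nearest_neighbor_predict(train_x, train_y, x):
    # Return the label of the closest memorized point.
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = ["even", "odd", "even", "odd"]

# Evaluating on the training data itself: trivially perfect.
train_acc = sum(
    nearest_neighbor_predict(train_x, train_y, x) == y
    for x, y in zip(train_x, train_y)
) / len(train_x)
print(train_acc)  # 1.0 by construction

# Evaluating on held-out points: the memorizer falls apart.
test_x = [4.0, 5.0]
test_y = ["even", "odd"]
test_acc = sum(
    nearest_neighbor_predict(train_x, train_y, x) == y
    for x, y in zip(test_x, test_y)
) / len(test_x)
print(test_acc)  # 0.5
```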

(Note also that there is zero technical contribution from the authors; they appear to have literally just downloaded Andrej Karpathy's code and cat'ed a bunch of files into it. The paper cites Karpathy's article as "other work" but makes no mention of the fact that they used his code.)

It's not entirely uninteresting as a quick demo, and there are a few paragraphs of interesting speculation about the difficult aspects of automatic program generation. However, I don't think either those speculations or the near-trivial demonstration justifies the paper's claim to "demonstrate the feasibility" of end-to-end code generation.

[1]: http://karpathy.github.io/2015/05/21/rnn-effectiveness/



