
DeepFix: Fixing Common C Language Errors by Deep Learning - homarp
https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14603
======
Safety1stClyde
When I saw this title, I was expecting to find something about using a
computer to detect things like "use after free" errors. However, skimming the
paper, it appears to refer to altering the source code to remedy compile-time
errors rather than to finding flaws in program logic. The notion of the
authors is to repair broken inputs so that they compile correctly despite
syntax errors by the programmer.

I don't think this is a particularly useful thing to do. Essentially it is
somewhat like making a compiler which accepts broken inputs using heuristics
to guess at what the programmer intended to write. That will not result in
better programs being written, rather it will result in the programmer not
only writing worse code but not even being aware of the flaws in their code.

~~~
marvy
Obviously you don't want silent autocorrect. You use this to get suggested
corrections when things don't compile.

~~~
Safety1stClyde
> You use this to get suggested corrections when things don't compile.

It repeatedly says that the aim is to fix errors. Assuming, however, that it
is intended to give better error messages, here is the output of Clang on the
input C program given as an example:

    
    
      $ cc gupta.c
      gupta.c:3:5: warning: incompatible redeclaration of library function 'pow'
          [-Wincompatible-library-redeclaration]
      int pow(int a, int b);
        ^
      gupta.c:3:5: note: 'pow' is a builtin with type 'double (double, double)'
      gupta.c:14:23: error: function definition is not allowed here
      int pow(int a, int b){
                          ^
      gupta.c:18:14: error: expected ';' at end of declaration
      return res;}
                  ^
                  ;
      gupta.c:18:14: error: expected '}'
      gupta.c:4:11: note: to match this '{'
      int main(){
                ^
      1 warning and 3 errors generated.
    

It identifies the problematic line of the program code better than gcc. The
function definition nested inside main is not legal in standard C. I tried this
with the gcc compiler and did not get the error above, presumably because GCC
accepts nested functions as an extension. Running "indent" on the code would
have revealed the problems:

      $ indent < gupta.c
      /**INDENT** Error@18: Stuff missing from end of file */

A simple count of braces, { and }, would also have revealed the problem.

~~~
marvy
So you're saying this whole paper could have been avoided if they just used
clang. And it seems you're right.

------
shriphani
There was similar work by W. Zaremba (OpenAI I think). LSTM trained to
interpret Python source code. Took a ton of hackery to get it to not fall
apart on simple operations and it continued making indexing errors. I wouldn't
read too much into these results.

[https://arxiv.org/abs/1410.4615](https://arxiv.org/abs/1410.4615)

------
nullc
I keep wondering why people don't mine open source commit histories for bug
fixes to train a classifier to recognize buggy vs non-buggy code in order to
help draw more review attention to code that the classifier thinks smells
buggy.

~~~
denzil_correa
They do. Example -
[https://users.soe.ucsc.edu/~ejw/papers/cc.pdf](https://users.soe.ucsc.edu/~ejw/papers/cc.pdf)

> Change classification uses a machine learning classifier to determine
> whether a new software change is more similar to prior buggy changes, or
> clean changes. In this manner, change classification predicts the existence
> of bugs in software changes. The classifier is trained using features (in
> the machine learning sense) extracted from the revision history of a
> software project, as stored in its software configuration management
> repository.

------
averagewall
This is depressing and typical of academic research that apparently has
immediate useful application. They apparently just did it to publish a paper
and will abandon it now. Nobody else is going to have the motivation to
recreate their work into a usable piece of software. It's just one more
unfinished project on the massive heap of unrealized and forgotten academic
work.

~~~
TezlaKoil
Alas, spending time on writing production-ready software is career suicide in
academia. Conferences and journals value novelty, and are not interested in
incremental improvements.

If you're really lucky (EvoSuite [1] / Google), BigCompany will take interest
in your project and give you some money. The expectation is that you'll hire
students to improve your software, and then these students will go on to work
for BigCompany.

[1] [http://www.evosuite.org/evosuite/](http://www.evosuite.org/evosuite/)

------
toolslive
It would be great if you could improve the design of the language to prevent
common mistakes from happening in the first place.

------
rjeli
They missed the chance to call it DeepC...

~~~
jjgreen
Peter van der Linden beat them to it with his 1994 classic "Expert C
Programming: Deep C Secrets" (outstanding book, BTW).

