
Automatic bug-repair system fixes 10 times as many errors as its predecessors - jonbaer
http://news.mit.edu/2016/faster-automatic-bug-repair-code-errors-0129
======
vmarsy
Link to the paper mentioned in the article:
[http://people.csail.mit.edu/rinard/paper/popl16.pdf](http://people.csail.mit.edu/rinard/paper/popl16.pdf)

Quick summary: they trained on apr, curl, httpd, libtiff, php, python, svn, and
wireshark.

It basically trains the machine learning algorithm on the abstract syntax
tree (AST) of the code before and after successful patches. It then applies
the machine learning predictions to real-world defects and sees what comes out
of it.
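A rough sketch of that before/after comparison in Python, using the standard `ast` module (illustrative only; the paper's system targets C and uses a much richer analysis than this naive node diff):

```python
import ast

def node_signatures(source):
    """List (node type, line) pairs for every AST node in a source string."""
    return [(type(n).__name__, getattr(n, "lineno", None))
            for n in ast.walk(ast.parse(source))]

def ast_diff(before, after):
    """Naive diff: node signatures present in one version but not the other."""
    b, a = node_signatures(before), node_signatures(after)
    return ([s for s in b if s not in a],   # removed by the patch
            [s for s in a if s not in b])   # added by the patch

buggy   = "def f(x):\n    return x - 1\n"
patched = "def f(x):\n    return x + 1\n"
removed, added = ast_diff(buggy, patched)
# The fix shows up as a Sub operator node replaced by an Add node
```

A real system would record much more context around the changed node, but this is the basic shape of "learn from the tree before and after the patch".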

------
caseysoftware
Oh man. I'm already behind on reviewing the Pull Requests I get from human
contributors. :(

------
ryporter
This is undoubtedly cool research, but how useful is the automatic generation
of a fix? In practice, finding the bug is by far the most important step. Once
you've identified the problematic code, it's normally trivial to fix it.

~~~
anon987
That makes me think of the amount of satisfaction one will receive pushing
MIT's autofix-it button as opposed to those seconds or minutes when you've
caught the fucker and go in for the final fix.

Those moments of updating the files: That little euphoria of finding the fix
but needing to focus to get it done correctly and finally hitting enter for
the last time...ahhh.

I'll take that instead.

~~~
themartorana
Yeah, but sometimes that is preceded by unfamiliar or unwanted behavior in a
production system and years being taken off one's life.

------
neals
I wonder sometimes what will cause all us thousands of software-engineers to
become obsolete.

Not that long ago, we'd be a massive workforce of engineers, armed with
slide rules, calculating away.

[http://s7.computerhistory.org/is/image/CHM/500004055-05-01?$re-medium$](http://s7.computerhistory.org/is/image/CHM/500004055-05-01?$re-medium$)

In 100 years, we'll be the people in that poster, but what will replace us?
Smart software? Quantum computing?

~~~
nothrabannosir
Aren't we those people in the poster? Didn't they become us?

~~~
skatenerd
Did you operate a slide rule in a previous job?

------
ryporter
The full paper can be found on Martin Rinard's website [1].

[1]
[http://people.csail.mit.edu/rinard/paper/](http://people.csail.mit.edu/rinard/paper/)

------
kryptiskt
After reading about the quality of the patches produced by these systems in "Is
the Cure Worse Than the Disease? Overfitting in Automated Program Repair"
([https://www.cs.umd.edu/~tedks/papers/2015-fse-cure-or-disease.pdf](https://www.cs.umd.edu/~tedks/papers/2015-fse-cure-or-disease.pdf)),
I'm not so interested. One would need a very thorough test suite to avoid that
risk of overfitting.

------
lopatin
One thing that I don't get from the article is whether it can really catch my
logical bugs. I'm not trying to bring this research down, because I don't
doubt the results, but I am curious, and a bit skeptical, whether this is a
true step forward in catching "real" bugs or just surface ones, even for a
sophisticated algorithm such as this one. It takes into account features such
as variable scope and the operations performed on variables, and it is trained
on real examples of bug patches in open source projects. But can it really
understand my project? My logical bugs are specific to my project. It kind of
reminds me of a recommendation IntelliJ once gave me for the following code:

return {left: am.left, top: am.right};

"WARNING: 'right' should probably not be assigned to 'top'"

This sounds like the kind of conclusion that such an algorithm might come to
after seeing enough examples of similar bugs. It turns out that my code in
this situation is actually correct, but in order to know this, you would have
to have a thorough understanding of the rest of the program.

~~~
xyzzy123
Of course it can't catch your logical bugs. Please see my comment below.

~~~
lopatin
I read it, and it's a great overview of the different kinds of bugs that happen
in the real world; certainly more examples than the article provided. But your
comment seems to be focused more on how a good language with a proper type
system can prevent bugs from happening in the first place, which is definitely
a good point but not what this article was getting at, I think.

------
Animats
I wonder if this can be re-purposed to find zero-day exploits.

~~~
cfallin
Not sure about the posted paper, but another recent work that comes to mind is
from David Brumley's group at CMU:

[http://security.ece.cmu.edu/aeg/](http://security.ece.cmu.edu/aeg/)

"Automatic Exploit Generation" which does basically this. (I haven't read
enough of either paper to understand how similar the analyses are, but both
seem to be based on symbolic execution.)

------
coderdude
Are there any languages that automatically (as in, deliberately) lend
themselves to automated error detection? I feel like it's either not possible
or so possible that Haskell is the answer.

~~~
colanderman
Certain languages lend themselves better to static analysis than others. Some
qualities (there are others) that simplify static analysis include:

* static (not necessarily explicit) typing

* no dynamic metaprogramming (static OK)

* no "eval"

* no first-class functions (second-class OK)

* no prototype-based OOP (class-based OK)

* no mutable variables (lexical shadowing OK)

* no mutable values, or very strict aliasing rules

* no unbounded recursion

* no dynamically defined structures

Of course you can perform useful static analysis with only some of these
guidelines met. e.g. Erlang's Dialyzer does pretty well with dynamic typing,
first-class functions, and unbounded recursion, because these features
generally aren't abused in Erlang. (Though this took a hit recently due to the
recent introduction of the `maps` feature, which was designed in such a way as
to encourage violating that last guideline, despite proposals for a similar
feature which would not have violated it.)

Surprisingly, C also meets all but two of these guidelines, and although it
is an egregious violator of both, it is somewhat amenable to static
analysis (see Coverity).

JavaScript, on the other hand, is notoriously difficult to statically analyze,
since it not only _permits_ code to violate all the above guidelines, but it's
_common_ for code to violate all of them.
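To make that concrete, here is a toy checker in Python (the pattern list and function names are my own illustration), which flags a few of the dynamic constructs from the list above; any one of them forces a static analyzer to give up on precise reasoning about values and structure:

```python
import ast

# A few constructs from the guidelines above that resist static analysis
HARD_TO_ANALYZE = {"eval", "exec", "getattr", "setattr"}

def analysis_hazards(source):
    """Flag calls that make a program's behavior opaque before runtime."""
    hazards = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in HARD_TO_ANALYZE):
            hazards.append((node.func.id, node.lineno))
    return hazards

code = """
x = eval(input())        # value unknowable before runtime
setattr(obj, name, x)    # structure defined dynamically
"""
print(analysis_hazards(code))  # [('eval', 2), ('setattr', 3)]
```

Real analyzers do far more than pattern-match call names, but the underlying problem is the same: once `eval` or dynamic attribute assignment appears, the tool can no longer enumerate what the code might do.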

~~~
Splines
Pardon if this is a naive question, but are there languages designed with
static analysis in mind?

~~~
colanderman
There are, but they're not too widely used. Cyclone [1] and SPARK [2] are two
examples. Of course many others may have been designed in such a way that they
_are_ easily analyzable, without that explicitly being a goal.

[1]
[https://en.wikipedia.org/wiki/Cyclone_(programming_language)](https://en.wikipedia.org/wiki/Cyclone_\(programming_language\))

[2]
[https://en.wikipedia.org/wiki/SPARK_(programming_language)](https://en.wikipedia.org/wiki/SPARK_\(programming_language\))

------
sowhatquestion

        there are indeed universal properties of 
        correct code that you can learn from one 
        set of applications and apply to another 
        set of applications
    

Pardon the speculative thought, but wouldn't a program that detects "universal
properties of correct code" be equivalent to a program that detects whether
another program will halt? Hence, impossible?

~~~
SatvikBeri
It's a machine learning based system, so it's probabilistic. You can certainly
write programs that can detect when _some_ other programs halt, just not
universal ones. For example, it would be relatively trivial to look for a
"while true:" loop that has no end conditions.
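A minimal sketch of that special case in Python, using the standard `ast` module (deliberately conservative; it ignores the many other ways a loop can terminate or fail to):

```python
import ast

def obvious_infinite_loops(source):
    """Flag `while True:` loops whose body contains no break, return, or
    raise -- one decidable special case of non-halting code."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.While)
                and isinstance(node.test, ast.Constant)
                and node.test.value is True):
            exits = [n for child in node.body for n in ast.walk(child)
                     if isinstance(n, (ast.Break, ast.Return, ast.Raise))]
            if not exits:
                flagged.append(node.lineno)
    return flagged

spins   = "while True:\n    pass\n"
escapes = "while True:\n    break\n"
print(obvious_infinite_loops(spins))    # [1]
print(obvious_infinite_loops(escapes))  # []
```

The halting problem only rules out a checker that is right about *every* program; it says nothing against checkers that answer "definitely loops forever" for some programs and "don't know" for the rest.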

------
Qantourisc
There is a small downside: the potential to learn less from your own
mistakes, especially when one barely does any code review.

------
nharada
Anyone who has read this paper: how is their learning algorithm any different
than a linear SVM?

~~~
abelbeepboop
It's either an SVM or multinomial logistic regression, but the bulk of the
work is not the choice/design of the model; it's structuring the data in a
meaningful way.

The main issues in many interesting ML projects are 'how do you feed the data
to the model' and 'how do you determine its success/failure'.

For instance: you could try to train a Pacman AI by feeding the game state
into some model and asking for an up/down left/right output. But a lower bound
for all the game states would be 2^(number of dots possible on level), making
it impractical/impossible to store such a model in memory much less train it.

The strategy would be to encode the game state into a manageable number of
features. This is a lossy process and the hope is that your set of features is
small enough to be trained on, yet meaningful enough that the model can learn
from them.

In the paper, they parse the data (code) into a syntax tree and compare it
with the patched version. This identifies the 'point' in the tree where the
patch modification takes place. The 'point', the 'type of modification', and a
'collection of neighboring points' are the features that are fed into the
model.

tl;dr, yep, probably linear svm or multinomial logistic
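A toy version of that encoding step (the feature names here are hypothetical, not the paper's actual feature set), showing how a variable-size patch description becomes the fixed-size vector a linear model consumes:

```python
from collections import Counter

def patch_features(mod_type, node_kind, neighbors):
    """Encode one patch as sparse bag-of-features counts."""
    feats = Counter()
    feats[f"mod={mod_type}"] += 1
    feats[f"node={node_kind}"] += 1
    for n in neighbors:          # AST node kinds surrounding the change
        feats[f"near={n}"] += 1
    return feats

def to_vector(feats, vocabulary):
    """Project the sparse counts onto a fixed vocabulary index."""
    return [feats.get(term, 0) for term in vocabulary]

vocab = ["mod=replace-operator", "node=BinOp", "near=Return", "near=If"]
f = patch_features("replace-operator", "BinOp", ["Return"])
print(to_vector(f, vocab))  # [1, 1, 1, 0]
```

Once every patch is a vector like this, which linear model sits on top is almost an afterthought, which is the parent comment's point.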

------
mrfusion
I think low-hanging fruit would be a system that backs out commits and system
updates until tests pass again. Has that been done?

That way you could at least keep a system running until you can fix it.

What do you guys think?

~~~
bwda
Essentially, this is git bisect.

~~~
mdaniel
To add to this 100% correct comment, if you haven't tried using git bisect, I
highly recommend it. The only thing you need is an automated way of
determining if a specific sha is "good" or "bad" based on the question you are
trying to answer. _Often_ that's "do the tests pass?" but (AFAIK) it can be
any property of the state of the code.

A++, will use again. (err, hopefully not, but you know what I mean)

------
jhallenworld
Does it find incorrect locking code bugs in multi-threaded languages?

