
Computer program fixes old code faster than expert engineers - Libertatea
http://newsoffice.mit.edu/2015/computer-program-fixes-old-code-faster-than-expert-engineers-0609
======
mafribe
I think what the paper does is a variant of _profile-guided optimisation_
(PGO) [1], an old approach in optimising compilers, where the binary is
annotated and run to produce information about those runs that is then fed
back to the compiler so as to allow better optimisation the next time the
compiler is run. The best-known offshoot of PGO is the tracing JIT compiler.
The first (well-known) tracing JIT compiler was Dynamo [2], and indeed, the
system under discussion here uses DynamoRIO [3], a descendant of Dynamo, to
instrument a compiled binary. The instrumented binary is then executed
multiple times to generate execution traces that allow the optimiser to find
hot code, which is then analysed to adapt the code to the new architecture,
e.g. by changing buffer sizes.

[1] [https://en.wikipedia.org/wiki/Profile-guided_optimization](https://en.wikipedia.org/wiki/Profile-guided_optimization)

[2] [http://www.cs.virginia.edu/kim/courses/cs771/papers/bala00dy...](http://www.cs.virginia.edu/kim/courses/cs771/papers/bala00dynamo.pdf)

[3] [http://www.dynamorio.org/](http://www.dynamorio.org/)

~~~
aaronkrolik
Interesting. So what would you say is novel/noteworthy about this research?

~~~
vmarsy
In typical PGO you have access to the source code, the compiler instruments
it for you automatically, and you run the program on input you consider
representative. During that run the instrumentation records a lot of
interesting data. Then the compiler recompiles your program with that extra
knowledge and makes much better decisions [1]. For instance, the compiler
might realize: "Oh, this function is called much more often than I thought,
so I will now inline it."

Here (link to the paper: [2]) things are different: the authors do NOT have
access to the source code; they only have stripped binaries. From these
unreadable binary instructions they are able to identify interesting
patterns: they extract the algorithm from the assembly. For this to work,
they seem to need runtime information, which means that, as in conventional
PGO, they need a wide variety of representative inputs. It apparently cannot
be done statically: "_Current state-of-the-art techniques are not capable of
extracting the simple algorithms from these highly optimized program._"

Once the algorithm is figured out, the Helium framework generates domain-
specific code in "Halide". The Halide compiler knows how to optimize these
stencil computations better than the old hand-written code, which gives them
these impressive improvements.
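For context, the kernels Helium lifts are stencil computations: each output
pixel is a fixed function of a small neighbourhood of input pixels. A toy
3x3 box blur in plain Python (illustrative only; the paper's kernels are
Photoshop filters, expressed in Halide rather than Python):

```python
def box_blur_3x3(img):
    """3x3 box blur: each interior output pixel is the mean of its
    3x3 neighbourhood; border pixels are copied through unchanged.
    `img` is a list of equal-length rows of numbers."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # start from a copy (keeps the border)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1)) / 9.0
    return out

# A flat image blurs to itself.
flat = [[9.0] * 4 for _ in range(4)]
assert box_blur_3x3(flat)[1][1] == 9.0
```

Halide's contribution is separating such an algorithm from its schedule
(tiling, vectorisation, parallelism), so the compiler can re-tune the loop
nest for whatever hardware it targets.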

[1] [https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx](https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx)

[2] [http://groups.csail.mit.edu/commit/papers/2015/mendis-pldi15...](http://groups.csail.mit.edu/commit/papers/2015/mendis-pldi15-helium.pdf)

~~~
marktangotango
Also seems like 'highly optimized' is another way of saying 'obfuscated' in
this case.

------
aaronkrolik
Question for clarification: the title uses the word "fix", implying that this
new program repairs code that was broken. But in reality, it optimizes
repetitive algorithms. Is that the correct takeaway?

~~~
kenbellows
I think it "fixes" code in that it makes it work on newer hardware, but the
title is still misleading; it suggests that the software is performing
bugfixes and correcting code that has failing unit tests, which is totally
unrelated.

------
CmonDev
Title correction: "Computer program optimizes the speed of image filters".

~~~
IceyEC
And general programs like MS Word...

~~~
chucksmash
Where do you see that? I saw mention of using it on one other program - an
image viewer, Microsoft Windows IrfanView.

~~~
jaawn
The confusion arises from the apostrophe that implied that IrfanView somehow
belongs to Microsoft Windows (it doesn't). I almost made the same mistake
while reading quickly.

------
aruggirello
Title is misleading; can a computer program step in and fix it, please?

------
cscurmudgeon
The MIT PR machine has perfected a press cycle that eclipses the good
research underneath.

[http://www.phdcomics.com/comics.php?n=1174](http://www.phdcomics.com/comics.php?n=1174)

~~~
thrownaway2424
The amount of meaningless verbiage in this press release is indeed disturbing.
What does it mean for the program to become "less effective"? A program is
either effective or it isn't. And a "billion-dollar problem"? Adobe doesn't
even spend a billion dollars on R&D in total.

~~~
birdman3131
"Less effective" means that the program does its job more slowly. The way I
read it, the original programmers used optimizations that worked well on
older processors but not on newer ones. For instance, you might optimize one
way for a Pentium 4 but differently for an i7. It's not that code tuned for
the P4 won't run on the i7; it's just optimized for the older processor. This
is mitigated by the fact that the i7 is a big improvement over the P4, but
you can get greater gains by actually modifying the program source code, as
the software here does.

As for the billion-dollar problem, I'd guess bit rot costs that much or more
across the entire software industry.

------
omarforgotpwd
The headline made me think: Imagine the irony if, after years of various human
jobs being replaced by software, someone one day completes the cycle by
inventing an AI that can develop applications for humans. Far off, but funny
to think about.

~~~
sz4kerto
Software development is mostly not about syntax and programming languages, but
domain knowledge and understanding humans. If a non-programmer could really
describe what he wants, then you could write a compiler for that text.

But it's not possible yet, as the AI would need to have a human-like
intelligence, and that's very far away.

~~~
codeshaman
I think what we're looking at in about a decade is interactive natural-
language development, in which one 'architect' will replace an entire team of
programmers, designers, testers, etc.

"When I tap on this thing" (puts a finger on the control) "go ahead and get
the latest articles from HackerNews."

The System contacts Hacker News and queries its "latest articles" API. The
systems negotiate an API key, an account, etc., and it's good to go. The
result is a list of articles with titles and comment counts.

"Now use a nice table." The System formats the output. "Show me other
styles." It lists table styles. "This one. Now, if the number of comments is
more than 100, show them in red." And so on.

In the end, the System can generate source code in a multitude of languages
(why?), some kind of pseudocode, and of course an interactive, editable video
of the "programming" session, which others can watch and fix if necessary.

So basically, one person can design the whole "application" in a couple of
hours.

We're not that far from that. And as we get closer, it will become better and
better and it will be a lot easier to extend and modify this "System".

Imagine developing the System using the System.

Programming as we know it will become a thing of the past, as happens with
all things that evolve over time.

~~~
waitForCompiler
That is highly optimistic, and I don't buy it. It would require an
understanding of human language, and there we are FAR off, considering that
it is currently hard even for engineers to grasp a user's requirements.

~~~
seanmcdirmid
RNNs can already write what looks like viable code without any human
intervention. Of course, the code doesn't do anything useful and is just a
reflection of its training data, but all we have to do is figure out how to
guide that.

We are doing OK at human language recognition, as well as understanding
within simple dialogue frames, and the technology is moving awfully fast at
the moment. You are thinking in terms of human-level intelligence, but it
really doesn't have to be that good. It only has to provide enough random-
but-feedback-guided choices until the user finds what they are really looking
for.

Put it this way: if the user could get what they wanted from the computer
directly just by "searching", the process would be more efficient, since the
most inefficient part of programming is human-to-human communication and
coordination.

~~~
LnxPrgr3
RNNs are good at learning the structure of their training input—even
character-level networks can output vaguely plausible English. Of course, what
you end up with often looks like a computational model of a thought disorder.
They have seemingly lucid moments, but so do Markov chain generators.
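For a sense of how cheap those "lucid moments" are, a character-level Markov
chain generator fits in a few lines (a toy sketch; the corpus and parameters
are made up for illustration):

```python
import random
from collections import defaultdict

ORDER = 2  # context length in characters

def train(text, order=ORDER):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, order=ORDER, length=60, rng=None):
    """Extend `seed` by repeatedly sampling a successor of the last context."""
    rng = rng or random.Random(0)  # fixed seed for repeatability
    out = seed
    for _ in range(length):
        successors = model.get(out[-order:])
        if not successors:
            break
        out += rng.choice(successors)
    return out

corpus = "the cat sat on the mat and the rat sat on the cat "
model = train(corpus)
print(generate(model, "th"))
```

Locally everything it emits is plausible English; there is just no global
intent behind it, which is the point being made about RNN output too.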

As far as code goes, we've already solved the problem of modeling the
structure of any programming language with an implementation. That's not the
hard part.

------
r4um
Relevant paper
[http://dl.acm.org/citation.cfm?id=2737974](http://dl.acm.org/citation.cfm?id=2737974)

~~~
aaronkrolik
Thanks! and the project source:
[http://projects.csail.mit.edu/helium/](http://projects.csail.mit.edu/helium/)

~~~
akkartik
Oddly enough, there's another group at MIT working on rewriting binaries that
was recently discussed on HN:
[https://news.ycombinator.com/item?id=9804036](https://news.ycombinator.com/item?id=9804036)

------
mzs
Here's the paper in PDF:
[http://groups.csail.mit.edu/commit/papers/2015/mendis-pldi15...](http://groups.csail.mit.edu/commit/papers/2015/mendis-pldi15-helium.pdf)

------
paulsutter
I suspect that a valuable application of deep learning will be learning how
legacy UIs operate by watching real users, then implementing an easier-to-
learn UI / mobile interface on top of the legacy app, probably with a human
UI designer stitching it all together. It might even learn a voice interface
by observing call-center operators.

Could be easier, less risky, and more urgent than learning to recode the whole
system.

------
Fede_V
Yeah, the paper itself is interesting; I'm not sure why they had to give it
that completely misleading title.

------
Florin_Andrei
Computer program generates machine code faster than expert engineers. It's
called a compiler.

------
plaes
Did they lose original source code?

~~~
cognivore
No, I don't think they did. I think they're just taking their old awful
unmaintainable source, adding more to the hot heap of slag, and then using
some clever compiler optimizations to make it run better. No love for the
actual programmers. You get to keep working on the nightmare code (3 months to
make changes...).

Now, color me impressed if the thing output the same language as the original
source, all spruced up.

As it is, they'll just have a code base that will eventually transform into
the anti-christ, because management will always be like, "Hey, don't ever fix
the code, just run that see-saw whatever thing on it afterwards."

------
jakozaur
Black swan event in 2020: AI replaces 80% of software engineers.

</fiction></joke>

~~~
prof_hobart
I know you're joking, but the reality is that software has been replacing huge
chunks of developers' work for years, whether through optimising compilers,
common libraries, CI/CD tools, web APIs, IDEs or whatever.

I've been involved in development for over a quarter of a century. Back in
the late 80s, we had a dev team dedicated for years to building some mapping
software. Their entire product could probably be put together today in a
couple of hours with Google Maps and a bit of JavaScript.

All this automation has happened, yet I see no slowdown in the amount of
development that still happens. Developers can just concentrate on more
bespoke, more value-adding work these days.

~~~
pjc50
_more bespoke, more value-adding work these days_

A software version of the Jevons paradox: making software cheaper increases
the demand for software.

------
foofoo55
For the love of the human race and all of us still using Windows, I really
hope a faster version of Irfanview gets released as a result of this.

------
jnaglick
When people on HN don't bother to read the article, clickbait titles flood the
frontpage.

~~~
brudgers
The article is interesting because it is about the potential disruption of an
industry _and_ is based on work done at an institution with a reputation for
research that disrupts technical industries, e.g. Chomsky's formal grammars.

------
lostmsu
Clickbait title

