
Avast open-sources its machine-code decompiler - matt_d
https://blog.avast.com/avast-open-sources-its-machine-code-decompiler
======
ggambetta
Is there any way to use this on a DOS EXE? This would be a lovely tool to
port/remaster old games (i.e. decompile, replace rendering methods with modern
equivalents, compile again -- an "offline" version of the "online" idea I was
going for with
[http://www.gabrielgambetta.com/remakes.html](http://www.gabrielgambetta.com/remakes.html))

~~~
partycoder
Note that it was not unusual for old DOS games to include bits of hand-written
assembly.

This was usually done to overcome a performance bottleneck, but with today's
hardware you might not need that at all.

~~~
khedoros1
Yesteryear's C compilers weren't great, optimization-wise. But anything
remotely modern would just brute force its way through a DOS-era program, well
optimized or not.

~~~
partycoder
Also, yesterday's instruction sets were a bit more limited.

------
israrkhan
Cool. Finally, a free alternative to IDA's decompiler plugin. IDA is still
better due to its interactive nature, i.e. you can explore the code and rename
variables/functions as you keep exploring. I hope this evolves into
something like that.

~~~
milcron
You might be interested in the free decompiler Radare2
[https://en.wikipedia.org/wiki/Radare2](https://en.wikipedia.org/wiki/Radare2)

~~~
chainsaw10
radare2 is a disassembler, not a decompiler.

Disassembly is a much easier task than decompilation, since it's a mostly
mechanical process. Decompilation requires you to undo the
optimizations/transformations the compiler did as it generated the binary,
which is much harder.
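As a small illustration of what "undoing" an optimization means: optimizing compilers commonly lower signed `x / 2` on 32-bit ints into a shift sequence (roughly `shr`/`add`/`sar` on x86), so a decompiler has to recognize that idiom and prove it equals a single division before printing `x / 2`. The sketch below, assuming 32-bit two's-complement integers, emulates the compiled form in Python and checks it against C-style truncating division; the function names are hypothetical.

```python
def div2_source(x: int) -> int:
    """What the programmer wrote: C's x / 2, which truncates toward zero."""
    q = abs(x) // 2
    return q if x >= 0 else -q


def div2_compiled(x: int) -> int:
    """Shift-based form an optimizer typically emits for signed x / 2.

    Assumes x fits in a signed 32-bit int.
    """
    sign_bit = (x & 0xFFFFFFFF) >> 31  # logical shift: 1 if x is negative
    return (x + sign_bit) >> 1         # Python's >> on ints is an arithmetic shift


for x in (7, -7, 0, -1, 2**31 - 1, -2**31):
    print(x, div2_source(x), div2_compiled(x))
```

Without the `+ sign_bit` correction, a plain arithmetic shift would round negative odd numbers toward negative infinity (e.g. -7 would become -4 instead of -3), which is exactly the kind of detail a decompiler must account for.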

That said, radare2 is still cool, and a GUI (Cutter) is in the works.

------
pc2g4d
Backward R in the logo: this drives me nuts. To anyone who reads the Cyrillic
alphabet, it says "Ya". So this reads "Yaetargetable Decompiler".

I can't be the only one!

~~~
Dylan16807
It's not written in a Cyrillic language. It's written in English.

Nobody comes along and points out that in Spanish the "g" works differently,
and it bugs them to see words with "g" in them.

It's one thing to do faux-Cyrillic and get the letters wrong. It's quite
another to do something silly to a Latin letter, and get complaints that it
_resembles_ a non-Latin letter.

~~~
pc2g4d
It doesn't _resemble_ the Cyrillic letter; it _is_ it. "R" is an English
letter but not a Cyrillic one, and "Я" is a Cyrillic letter but not an English
one, and by flipping them horizontally you transform them into each other. I
imagine for many of the 7,574,303 people in Russia who speak English and
probably also a fair chunk of the 854,955 Americans who speak Russian (and
presumably have mastered both alphabets), it's annoying. Not a huge deal, just
annoying.

[https://en.wikipedia.org/wiki/List_of_countries_by_English-s...](https://en.wikipedia.org/wiki/List_of_countries_by_English-speaking_population)

[https://en.wikipedia.org/wiki/Russian_language_in_the_United...](https://en.wikipedia.org/wiki/Russian_language_in_the_United_States)

~~~
Dylan16807
There are so many symbols from different languages that resemble each other.
If I use a smiley face, that doesn't mean I used a "ü" or a "ツ" from another
alphabet just because it looks similar. A backwards R is visually the same as
a Cyrillic character, but that doesn't mean I'm writing in Cyrillic, just like
a "P" is visually the same as a Cyrillic character but doesn't mean I'm
writing in Cyrillic.

------
skate22
Does anyone know why Intel discontinued their Tamper Protection Toolkit? They
had an obfuscating compiler that would turn compiled C code into
self-encrypting/decrypting code. The idea was that if you disassembled the code
at any point, you would get mostly garbage instructions. I always wondered how
a decompiler could get around that.

[https://software.intel.com/en-us/articles/intel-tamper-prote...](https://software.intel.com/en-us/articles/intel-tamper-protection-toolkit-beta-closed)
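To make the general idea concrete, here is a toy sketch (not Intel's actual scheme, whose internals aren't described here): the "code" is stored XOR-encrypted in the binary, so a static dump shows noise, and each chunk is decrypted only right before it would run. The fixed key, the chunk layout and the helper names are all hypothetical; the plaintext bytes are the real 32-bit x86 encodings of `push ebp`, `mov ebp, esp`, `pop ebp`, `ret`.

```python
KEY = bytes([0x5A, 0xC3, 0x17, 0x99])  # hypothetical fixed per-build key


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so one routine both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Plaintext x86 chunks: push ebp / mov ebp, esp / pop ebp / ret
PLAIN_CHUNKS = [b"\x55", b"\x89\xe5", b"\x5d", b"\xc3"]

# Build time: every chunk is encrypted before it is written into the binary.
STORED_CHUNKS = [xor_bytes(c, KEY) for c in PLAIN_CHUNKS]


def static_view() -> bytes:
    """What a disassembler reading the file on disk would see."""
    return b"".join(STORED_CHUNKS)


def run() -> list:
    """Run time: decrypt each chunk just before it would execute."""
    executed = []
    for enc in STORED_CHUNKS:
        chunk = xor_bytes(enc, KEY)  # plaintext exists only around this step
        executed.append(chunk)       # stand-in for actually executing it
    return executed


print(static_view().hex())
print([c.hex() for c in run()])
```

The static view decodes as unrelated instructions, while anything that observes the program at run time (a debugger or memory dumper) sees the plaintext, which is why simple schemes like this one fall to dumping.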

~~~
_wmd
Google for anything Rolf Rolles has published on the topic; believe it or not,
there are general approaches to solving this. Someone already mentioned
dumping the text segment, but that only works for silly 90s-era obfuscators.

Contemporary obfuscators _rewrite_ the protected code as a series of
instructions executed on a virtual machine whose bytecode (and bytecode
semantics!) are randomly generated at build time. The solution (AIUI) is
symbolic execution of the instructions to determine their underlying
architectural effect, synthesize some compiler IR that is equivalent to those
effects, run an optimization pass (like a regular compiler) over that IR, and
finally generate x86 from the result.
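A heavily simplified sketch of that idea, assuming a tiny stack VM with four hypothetical handlers: the protector draws fresh random opcode numbers for every build, and the analyst never decodes that numbering. Instead, the shipped dispatch table is run on symbolic inputs and the resulting expression is read off. In a real binary each handler is machine code whose semantics must first be recovered by symbolic execution; here the handlers are plain Python functions that happen to accept symbolic values, which is the main simplification.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Symbolic expression node: an operator name plus its operands."""
    op: str
    args: tuple

    def __add__(self, other):
        return Sym("+", (self, other))

    def __radd__(self, other):
        return Sym("+", (other, self))

    def __xor__(self, other):
        return Sym("^", (self, other))

    def __rxor__(self, other):
        return Sym("^", (other, self))

    def __str__(self):
        if self.op == "var":
            return self.args[0]
        left, right = self.args
        return f"({left} {self.op} {right})"


def var(name):
    return Sym("var", (name,))


class VM:
    def __init__(self, args):
        self.stack, self.args = [], args


# Hypothetical handlers; they use plain + and ^, so they accept ints or Sym values.
def h_push_arg(vm, operand):
    vm.stack.append(vm.args[operand])


def h_push_const(vm, operand):
    vm.stack.append(operand)


def h_add(vm, _):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)


def h_xor(vm, _):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a ^ b)


HANDLERS = [h_push_arg, h_push_const, h_add, h_xor]


def build(seed, program):
    """Protector side: draw random opcode numbers, then encode the program."""
    rng = random.Random(seed)
    codes = rng.sample(range(256), len(HANDLERS))
    encode = {h.__name__: c for h, c in zip(HANDLERS, codes)}
    table = {c: h for h, c in zip(HANDLERS, codes)}  # dispatch table shipped in the binary
    return table, [(encode[name], operand) for name, operand in program]


def execute(table, bytecode, args):
    """Fetch-decode-dispatch loop; args may be concrete ints or symbolic vars."""
    vm = VM(args)
    for opcode, operand in bytecode:
        table[opcode](vm, operand)
    return vm.stack.pop()


# Protected function: f(x, y) = (x + 5) ^ y
PROGRAM = [("h_push_arg", 0), ("h_push_const", 5), ("h_add", None),
           ("h_push_arg", 1), ("h_xor", None)]

for seed in (1, 2):
    table, bytecode = build(seed, PROGRAM)
    lifted = execute(table, bytecode, [var("x"), var("y")])
    print(seed, [op for op, _ in bytecode], lifted, execute(table, bytecode, [10, 3]))
```

The opcode bytes change from seed to seed, but the lifted expression does not, which is the property the devirtualization approach relies on.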

The optimization passes are necessary to remove computations that do not
affect the program's observable state ("noise"), which modern obfuscators like
Themida insert into the instruction stream by the ton.
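The noise-removal step is essentially the dead-code elimination an ordinary compiler already performs. A minimal sketch over a made-up three-address IR (the instruction format and register names are hypothetical), assuming no instruction has any effect beyond writing its destination:

```python
def eliminate_dead_code(instrs, live_out):
    """Backward liveness pass: keep only instructions whose result is later used.

    Assumes every instruction's only effect is writing its destination
    (no memory stores, flags, or calls), which real IR does not guarantee.
    """
    live = set(live_out)
    kept = []
    for dest, op, *srcs in reversed(instrs):
        if dest not in live:
            continue  # value is never read afterwards: drop it as noise
        live.discard(dest)  # this write satisfies all later readers of dest
        live.update(s for s in srcs if isinstance(s, str))  # operands become live
        kept.append((dest, op, *srcs))
    kept.reverse()
    return kept


# Hypothetical IR for result = (x + 5) ^ y with junk interleaved by an obfuscator.
OBFUSCATED = [
    ("t1", "add", "x", 5),
    ("junk1", "xor", "t1", 0x1337),  # never read by the real computation
    ("t2", "mov", "t1"),
    ("junk2", "mul", "junk1", 3),    # only consumes other junk
    ("t2", "xor", "t2", "y"),
    ("result", "mov", "t2"),
]

for instr in eliminate_dead_code(OBFUSCATED, {"result"}):
    print(instr)
```

Real deobfuscators combine this with constant folding and other simplifications, since junk such as `x ^ k ^ k` is live but reducible.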

In other words, rather than attempt to dump some particular part of the
program, the binary as a whole is statically analysed to determine, regardless
of the indirections inserted by any obfuscation pass, what machine
instructions are ultimately executed for a given program input. The abstract
representation is then compiled to an equivalent new program which is much
easier to read, because all of the indirections and noise have been optimized
away.

When I was reading about Rolles' work initially, I couldn't help but imagine
this is the kind of approach Geordi La Forge would have come up with if
cracking an encrypted binary were ever the plot for an episode of Star Trek :)

~~~
j_s
_instructions executed on a virtual machine whose bytecode (and bytecode
semantics!) are randomly generated at build time_

Like the one built into Windows: [https://github.com/airbus-seclab/warbirdvm](https://github.com/airbus-seclab/warbirdvm)

------
bogomipz
>"As we announced in our Botconf 2017 presentation at the beginning of
December (slides), RetDec, our machine-code decompiler, is now open, which
means anyone can freely use it, study its source code, modify it, and
redistribute it."

Judging by the slides linked above, this looks like it was a really fascinating
talk.

Does anybody know when or if this presentation was recorded or if it will be
made available? I would love to watch this.

~~~
j_s
There was a live stream, but nothing appears to be available yet.

[https://twitter.com/Botconf](https://twitter.com/Botconf)

~~~
bogomipz
Yeah, I also checked YouTube with various search terms and came up empty.
Hopefully someone from Avast or Botconf will read this and post them :)

It looks like some conference presentations from years past have made it to
YouTube.

------
Direct
Really cool stuff. I don't like being negative about fantastic moves like
this, but I'm still really disappointed that it doesn't support 64-bit
executables.

~~~
SkyPuncher
A Reddit comment had a pretty good explanation.

Most malicious code is still written in 32-bit since 64-bit Windows supports
running 32-bit code.

Write something in 32-bit: target 100% of devices. Write something in 64-bit:
target ~50% of devices.

~~~
Zuider
But if security software only treats 32-bit executables as suspicious,
wouldn't it make sense for malware creators to switch to 64-bit?

~~~
firethief
Is any security software that easily bypassed?

------
taspeotis
I've used RetDec before, and its output is quite nice. I even had some problems
with it (doing dumb stuff like uploading executables that were beyond the
limits they imposed on their website), and whoever was supporting it was
quite friendly in helping me anyway.

------
Tobba_
Looks like it's also relying on LLVM for disassembly? Ouch; that's an
incredibly bad idea if you're trying to analyze malicious or unusual code
(it's not designed for that), but I guess it's the easiest for a proof of
concept like this.

That said, there's no way an AV company doesn't have its own disassembler, but
those are almost always treated as trade secrets (especially the stuff that
isn't in the spec, or where the spec is wrong). They'll probably hook it up to
that before doing any real work with it themselves.

~~~
UncleEntity
> Looks like it's also relying on LLVM for disassembly?

[wild speculation here] I suspect they're using LLVM to go from an AST to
C(++) code, since LLVM has tooling for stuff like that.

Now I have to find me one of those binary-blob kernel modules that
manufacturers like to put out and see what the C code it spits out looks like
-- another wasted day methinks...

~~~
Tobba_
You could just ..uh.. _acquire_ a copy of IDA that has the AMD64 decompiler.
It's more mature and spits out C code of wildly varying readability, though
only for one function at a time.

------
tonetheman
Super cool move. Always interesting to look at techniques for this type of
stuff. It always feels like black magic.

------
maxton
This is awesome! I had been using retdec.com since before Avast bought AVG
(where RetDec was originally developed). I'm very excited to do away with the
limits the website imposed.

------
gaspoda
Why would they do this? It's their competitive advantage.

~~~
heavenlyblue
Modern AV software uses a virtual machine and a decompiler in tandem. This is
not their only competitive advantage.

------
drej
Now the usual for large projects like this one (this is just for the main
repo):

      $ loc .
      --------------------------------------------------------------------------------
       Language             Files        Lines        Blank      Comment         Code
      --------------------------------------------------------------------------------
       C++                    587       202592        23441        43727       135424
       C/C++ Header           450        34934         6371        11733        16830
       Bourne Shell            10         2363          247          518         1598
       Plain Text              16          827           46            0          781
       Autoconf                 1         2507          551         1635          321
       Python                   1          195           32           22          141
       Markdown                 2          162           45            0          117
       ASP.NET                  2            2            0            0            2
      --------------------------------------------------------------------------------
       Total                 1069       243582        30733        57635       155214
      --------------------------------------------------------------------------------

~~~
slowmotiony
So there are two lines of ASP.NET code? Do I read this correctly?

~~~
dspillett
Two single-line files whose extensions make them look like ASP.NET code is how
I read that.

They could be placeholder scripts that redirect to the correct location, for
instance a default.asp/index.asp file that does nothing but redirect to
index.html or some other default document that IIS doesn't recognise out of
the box. This would catch cases where someone has just dumped the web assets
on an IIS share (IIS doesn't, or at least didn't use to, consider index.html
as a potential default document). In classic ASP this would be something like
the line:

        <% response.redirect "index.whatever-extension" %>

It could also be false positives in whatever test is being performed. For a
project like this, I suspect that is more likely to be the case.
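Either way, the tally comes from classifying files by name rather than by content. A minimal sketch of such a counter, using a hypothetical extension table (not the one `loc` actually ships), shows how two one-line redirect stubs end up reported as two lines of ASP.NET:

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension table; real line counters ship far larger mappings.
LANGUAGE_BY_EXTENSION = {
    ".cpp": "C++",
    ".h": "C/C++ Header",
    ".py": "Python",
    ".asp": "ASP.NET",  # classic ASP lumped in purely because of the suffix
}


def count_code_lines(files):
    """Tally non-blank lines per language, judging each file by extension only."""
    totals = Counter()
    for path, text in files.items():
        language = LANGUAGE_BY_EXTENSION.get(PurePosixPath(path).suffix.lower())
        if language is None:
            continue  # unknown extension: not counted at all
        totals[language] += sum(1 for line in text.splitlines() if line.strip())
    return totals


repo = {
    "src/main.cpp": "int main() {\n    return 0;\n}\n",
    "web/default.asp": '<% response.redirect "index.html" %>\n',
    "web/index.asp": '<% response.redirect "index.html" %>\n',
}
print(count_code_lines(repo))
```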

