
Snowman: native code to C/C++ decompiler - ingve
https://derevenets.com/
======
xvilka
radare2 [1] project is also working on a decompiler, which uses the ESIL [2]
intermediate language as a source and lifts it to RadecoIL, which is then
simplified and transformed to C. The missing parts now are mostly Memory SSA,
C AST generation (partially done) and type inference. The decompiler itself is
written in Rust and uses radare2 as a source of ESIL and other
metainformation. Using ESIL as the source will make it possible to support
different architectures, not only the common ones. Currently
we're running RSoC (Radare Summer of Code) [3], and hope that our 2 students
will make significant progress on both Rune (symbolic execution on top of
ESIL) and Radeco. We are always happy to welcome new potential
contributors to all the underlying projects, including radare2 itself. If you want
to help us, please join the #radare IRC channel or the #radare Telegram channel [4].
The sources of Radeco are located at [https://github.com/radare/radeco-lib](https://github.com/radare/radeco-lib)

[1] [http://rada.re](http://rada.re)

[2] [https://radare.gitbooks.io/radare2book/content/disassembling...](https://radare.gitbooks.io/radare2book/content/disassembling/esil.html)

[3] [http://radare.today/posts/RSOC-2017/](http://radare.today/posts/RSOC-2017/)

[4] [https://telegram.me/joinchat/ACR-FkEK2owJSzMUYjt_NQ](https://telegram.me/joinchat/ACR-FkEK2owJSzMUYjt_NQ)

~~~
ecma
I'll preface this by saying that I love radare2. It's my go-to tool when I
don't need to share work with IDA/Binja users and don't need to decompile
something.

The radeco project is a train wreck. The current state of radeco-lib (unless
it's been remediated in the last month) is disappointing, and the only reason
it compiles is that the last SoC student appears to have commented out the
bindings that radeco is meant to use to get radeco-lib to do anything. I
actually spent an evening attempting to undo that absurd series of commits, but
after getting a lot of the commented-out code back in place, I (not being a Rust
programmer) hit roadblocks I did not understand regarding types and traits.

Unsolicited advice incoming. Please keep a close eye on your RSoC students
this year. Their goal of producing something they can present does not
necessarily align with the ongoing health of your project. I'd also love it if
you would drop Rust and work in a more accessible language, at least while
you work toward an initial version that spits out something resembling C
code. Ultimately it's your project, so do whatever you want, but IMHO making
everyone understand an inherently complex project in a language that is not
straightforward is not the best option. Or at least add some documentation and
make your lib and program build together...

------
guest_may_2017
I'm glad to see a new decompiler, but it looks like it isn't an optimizing
decompiler like Hex-Rays yet.

I tested the IDA plugin and it happily gave me very long lines like this:

    
    
        esp74 = reinterpret_cast<void*>(reinterpret_cast<int32_t>(__zero_stack_offset()) - 0x104 - 4 - 4 - 4 - 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 + 4 - 4 - 4 + 4 - 4 - 4 + 4 - 4 - 4 + 4 - 4 - 4 + 4 + 20);
    

Edit:

Furthermore, there seem to be some correctness issues, or at least misleading
output.

If a string is modified at runtime (for example, for obfuscation purposes) and
then passed as an argument, Snowman will show the original string directly, as in
foo("incorrect", 23), instead of just using an opaque variable, as in
foo(some_var, 23).
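
To illustrate why that output is misleading, here is a minimal C sketch that assumes a simple XOR obfuscation; secret, decode, run, and foo are hypothetical names, not taken from any real binary. The bytes stored in the binary are not the bytes foo receives, so a faithful decompilation would show foo(secret, 23) rather than a string literal built from secret's stored initializer.

```c
#include <stddef.h>

/* Hypothetical obfuscated literal: each byte of "ok" XORed with 0x5A.
   A decompiler that prints this buffer as a string constant shows the
   stored bytes, not the value foo() actually receives at runtime. */
static char secret[] = { 'o' ^ 0x5A, 'k' ^ 0x5A, '\0' };

/* Decodes the buffer in place at runtime (XOR is its own inverse).
   Stops at the terminator, which is left unencoded. */
static void decode(char *s, size_t n, unsigned char key) {
    for (size_t i = 0; i < n && s[i] != '\0'; i++) {
        s[i] = (char)(s[i] ^ key);
    }
}

/* Hypothetical callee standing in for foo(some_var, 23):
   returns n only if it received the decoded string "ok". */
static int foo(const char *s, int n) {
    return (s[0] == 'o' && s[1] == 'k' && s[2] == '\0') ? n : -1;
}

static int run(void) {
    decode(secret, sizeof secret, 0x5A);
    /* Faithful decompiler output here is foo(secret, 23). */
    return foo(secret, 23);
}
```

Note that run() decodes in place, so calling it a second time would re-encode the buffer; a real obfuscator would typically decode into a separate buffer.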

~~~
moyix
Looks like it needs a constant folding pass, yep.
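
As a rough sketch of what such a pass does (not Snowman's actual implementation), the following C code folds constants in a tiny expression tree of additions, with subtraction stored as adding a negative constant. The expr type and the mk and fold helpers are hypothetical. Applied to a chain like sym - 0x104 - 4 - 4 + 4 + 20, it collapses all the constant terms into sym + (-244).

```c
#include <stdlib.h>

/* Minimal expression tree for this sketch: a symbol, a constant, or the
   sum of two subexpressions (x - c is stored as x + (-c)). */
typedef enum { E_SYM, E_CONST, E_ADD } ekind;

typedef struct expr {
    ekind kind;
    long value;              /* used when kind == E_CONST */
    struct expr *lhs, *rhs;  /* used when kind == E_ADD */
} expr;

static expr *mk(ekind k, long v, expr *l, expr *r) {
    expr *e = malloc(sizeof *e);
    if (!e) abort();
    e->kind = k; e->value = v; e->lhs = l; e->rhs = r;
    return e;
}

/* Folds constants bottom-up using three rewrites:
     c1 + c2         ->  (c1 + c2)
     (x + c1) + c2   ->  x + (c1 + c2)
     x + 0           ->  x
   Nodes that become unreachable are leaked to keep the sketch short. */
static expr *fold(expr *e) {
    if (e->kind != E_ADD) return e;
    expr *l = fold(e->lhs);
    expr *r = fold(e->rhs);
    if (l->kind == E_CONST && r->kind == E_CONST)
        return mk(E_CONST, l->value + r->value, NULL, NULL);
    if (r->kind == E_CONST && l->kind == E_ADD && l->rhs->kind == E_CONST)
        return fold(mk(E_ADD, 0, l->lhs,
                       mk(E_CONST, l->rhs->value + r->value, NULL, NULL)));
    if (r->kind == E_CONST && r->value == 0)
        return l;
    return mk(E_ADD, 0, l, r);
}
```

A real pass would also handle commutativity (c + x), other operators, and memory management; those are omitted here.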

------
moyix
Another open source decompiler is fcd:

[https://zneak.github.io/fcd/](https://zneak.github.io/fcd/)

I quite like the author's blog about the development of the decompiler, as it
gives a lot of insight into how it works and what academic literature it draws
on.

You can also find a video of a talk the author (Felix Cloutier) gave at the
Security Open Source workshop:

[https://www.youtube.com/watch?v=h1NP-DV4GVQ](https://www.youtube.com/watch?v=h1NP-DV4GVQ)

------
smartmic
Sorry if this is a silly question, but why does one need a decompiler? Isn't
it easier to look at disassembly from tools like objdump? The decompiled
Hello World example does not look significantly more readable to me than
a disassembly (with some basic knowledge of assembler).

~~~
moyix
A good decompiler can have a massive impact on the readability of the code.
For example, here's a study where the authors found that their decompiler
allowed students without reverse engineering expertise to approach the
performance of RE experts on some tasks.

[https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan...](https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/dream_oakland2016.pdf)

Sadly, DREAM++ has never been released open source :(

~~~
ant6n
If there's a binary of it, you could use it to decompile itself. A kind of
reverse bootstrapping.

~~~
moyix
Sadly there is no binary, only papers. Now, if someone could come up with a
technique for automatically creating source code from a PDF description... :D

------
baby
Coincidentally, a colleague of mine tried it yesterday and ended up with a
better decompilation than Hopper. Hopper failed to recognize a loop and
displayed it weirdly, while Snowman just worked. I've been wondering if Binary
Ninja would have gotten good results, but there is no demo for 64-bit
binaries.

Unfortunately for me, I'm stuck with Hopper, as Snowman is Windows-only.

~~~
hellofunk
What do you use a decompiler for? Just for fun, or is it part of your work?

~~~
baby
For fun :) there is this challenge here that is ending today:
[https://github.com/kudelskisecurity/cryptochallenge17/blob/m...](https://github.com/kudelskisecurity/cryptochallenge17/blob/master/README.md)

------
heeen
Has anyone tried training an RNN on high-level language <-> assembly?

It would be cool if it could even guess variable names from patterns it has seen
before, like x, y, z for vector structs.

~~~
devrandomguy
I dunno... If you took a few random textbook physics problems, and replaced
all of the nouns and units with arbitrary consistent strings, do you think
that it would be possible to tell an electromagnetic problem apart from a
plumbing problem? What if the subject is an electric pump?

------
userbinator
IMHO the "holy grail" of decompilation is to decompile the decompiler, compile
the decompiled decompiler, and get back a functioning decompiler that can also
decompile itself ad infinitum. After several iterations, it may reach a fixed
point. This is essentially the exact opposite of what's customarily done
with compilers: compile the compiler with itself, and repeat with the
self-compiled compiler until a fixed point is reached.

Naturally, I tried this one on itself, but that didn't work so well: it spent
several minutes analysing, then crashed.

Then I picked something slightly easier, which it did manage to decompile
successfully, but the output is... not exactly what I expected. There are copious
void pointers of various levels of indirection (plenty of "three-star-programmer"
code...) and reinterpret_cast sprinkled everywhere. I have the original
code and it was written in C, so, amusingly enough, it decided to automatically
convert it to C++. That, along with its inability to recognise accesses to local
variables (leading to long sequences of -4-4-4-4+4-4-4+...), means that for me
it's not really all that much better than reading the Asm directly.
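
Those -4-4+4 chains most likely come from not tracking the stack pointer: each push or pop moves esp, so the same local variable is reached through different esp-relative displacements. Below is a minimal sketch of stack-pointer tracking over straight-line code, assuming a hypothetical and heavily simplified 32-bit x86 instruction model (op_kind, insn, and track_stack are illustrative names, not any tool's API). Once every esp-relative access is resolved to an offset from the entry esp, accesses to the same local map to one frame slot.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified x86-32 instruction model. */
typedef enum { OP_PUSH, OP_POP, OP_SUB_ESP, OP_ADD_ESP, OP_LOAD_ESP } op_kind;

typedef struct {
    op_kind kind;
    int imm; /* amount for SUB/ADD esp; displacement for a load from [esp+imm] */
} insn;

/* Tracks esp relative to its value at function entry across a
   straight-line sequence. For each [esp+imm] load, stores the resolved
   frame offset (relative to entry esp) in out[i]; out[i] is 0 for every
   other instruction. Returns the final esp delta. */
static int track_stack(const insn *code, size_t n, int *out) {
    int delta = 0; /* current esp minus entry esp */
    for (size_t i = 0; i < n; i++) {
        out[i] = 0;
        switch (code[i].kind) {
        case OP_PUSH:     delta -= 4; break;
        case OP_POP:      delta += 4; break;
        case OP_SUB_ESP:  delta -= code[i].imm; break;
        case OP_ADD_ESP:  delta += code[i].imm; break;
        case OP_LOAD_ESP: out[i] = delta + code[i].imm; break;
        }
    }
    return delta;
}
```

For example, after sub esp, 0x104 and one push, [esp+4] resolves to offset -260; after the matching pop, [esp] also resolves to -260, so both are the same local. A real analysis would additionally propagate the delta across branches and account for calls that clean up their own arguments.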

The latter test was with a binary compiled with a very old compiler, so I
suspect a binary from one of the newest optimising compilers would produce even
more confounding output.

That said, it's great to see plenty of decompilers being written and released
publicly; I remember around 2 decades ago when any mention of decompilation
would be met with disdain and chants of "that's impossible!" Hex-Rays and IDA
may have spurred a lot of this development; but speaking from experience,
cracking groups have always written their own private decompiler-ish tools,
mainly featuring dataflow analysis.

------
rhabarba
And it has been integrated into the awesome x64dbg for quite a while. :)

~~~
StavrosK
Huh, that looks a lot like OllyDbg. Do you know how it compares to
Olly/IDA/Binja?

~~~
rhabarba
I guess it wouldn't even exist if a (non-alpha) x64 version of OllyDbg were a thing.
x64dbg provides a number of plugins to fill in features that are missing compared
to IDA/Olly:
[https://github.com/x64dbg/x64dbg/wiki/Plugins](https://github.com/x64dbg/x64dbg/wiki/Plugins)

~~~
e12e
Looks like a very interesting project, but there may be some
misunderstanding about the license; from the readme:

> x64dbg is licensed under GPLv3, which means you can freely distribute and/or
> modify the source of x64dbg, as long as you share your changes with us.

It should probably read: "... as long as you make a genuine offer of providing the
source code and changes to those you distribute your version of x64dbg to."

In practice it of course makes sense to upstream changes, but there's nothing
in the GPL that requires it.

~~~
mrexodia
This is in fact on purpose. Basically I stated my intent of using GPL.

~~~
e12e
That's fine, and it is of course how many projects use the GPL in practice,
but as the readme reads, it sounds like the GPL doesn't [allow] someone to fork
the project, port it to, say, OS X or ARM, and sell the changed fork to a
customer without giving the changes back upstream. The porter would have to
offer sources to the customer, and the customer would be free to upstream
them, but there's no legal compulsion in the GPL to do so.

Anyway, I would have reworded it somewhat, to make it more obvious
that the source is under the GPL, but that the project welcomes and encourages
upstreaming changes, as opposed to the code being under a _modified_ GPL.

------
haberman
I would love to see support for this on godbolt.org. It would be really fun to
see an optimizer's output expressed as C. For example, you could easily see
the results of strength reduction, where something like "x / 2" (for unsigned
x) is compiled into "x >> 1".
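
One caveat, sketched in C below: the shift is an exact replacement only for unsigned x. For signed x, C's division truncates toward zero while an arithmetic shift rounds toward negative infinity (-3 >> 1 is -2, but -3 / 2 is -1), so compilers add a correction for negative values first; on x86-64, GCC and Clang typically emit a shr/add/sar sequence. The function names are illustrative, and the signed version assumes >> on a negative int32_t is an arithmetic shift, which is how GCC and Clang implement it (ISO C leaves it implementation-defined).

```c
#include <stdint.h>

/* Unsigned division by 2 is exactly a logical right shift. */
static uint32_t udiv2(uint32_t x) {
    return x >> 1;
}

/* Signed division by 2 as an optimizer would lower it: add 1 to
   negative inputs so the shift truncates toward zero like C's "/".
   The bias is the sign bit, extracted with an unsigned shift.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int32_t sdiv2(int32_t x) {
    int32_t bias = (int32_t)((uint32_t)x >> 31); /* 1 if x < 0, else 0 */
    return (x + bias) >> 1;
}
```

A decompiler that recognizes the sign-bit bias pattern can turn it back into a plain x / 2 for signed x.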

------
AdmiralAsshat
[https://derevenets.com/examples.html](https://derevenets.com/examples.html)

So, a decompiler is cool and all, but...a five-line "Hello World" program
turned into a 144-line decompiled program. Is that an accomplishment? I'm
pretty sure the "reconstructed" C from that is longer than the assembly.

EDIT: Just to confirm, this is what I got when I put the Hello World code into
"hello.c" and ran GCC against it:

    
    
      gcc -O2 -S -c hello.c
    

hello.s:

    
    
              .file   "hello.c"
              .section        .rodata.str1.1,"aMS",@progbits,1
      .LC0:
              .string "Hello, World!"
              .text
              .p2align 4,,15
      .globl main
              .type   main, @function
      main:
      .LFB11:
              .cfi_startproc
              subq    $8, %rsp
              .cfi_def_cfa_offset 16
              movl    $.LC0, %edi
              call    puts
              xorl    %eax, %eax
              addq    $8, %rsp
              .cfi_def_cfa_offset 8
              ret
              .cfi_endproc
      .LFE11:
              .size   main, .-main
              .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-18)"
              .section        .note.GNU-stack,"",@progbits

~~~
moyix
You haven't linked the program in your assembly example. All the extra code
you see there is a result of the libc startup code. Decompilers work starting
from the entry point (_not_ your program's main), which is why there's so
much extra code. If you look at just the code starting from main, you get
something much simpler:

    
    
        int64_t puts = 0x4003e6;
    
        void func_4003e0(int64_t rdi) {
            goto puts;
        }
    
        int64_t main() {
            func_4003e0("Hello, World!");
            return 0;
        }

~~~
AdmiralAsshat
Alright, that makes much more sense. Thanks!

------
AlexDenisov
For those who are curious and in Berlin:

There is going to be an event this Thursday (July 27, 2017) where the
author of this tool will be talking about decompilation.

[https://www.meetup.com/LLVM-Social-Berlin/events/241197713/](https://www.meetup.com/LLVM-Social-Berlin/events/241197713/)

~~~
moyix
Do you know if this talk will be recorded? I would love to watch but it is a
bit far from NYC :)

~~~
AlexDenisov
Yes, we are going to record it, but publishing it is up to the speaker. I
will post the link here if it happens.

------
0xcde4c3db
From the examples:

    int64_t puts = 0x4003e6;

    void func_4003e0(int64_t rdi) { goto puts; }

What is this? Is there some compiler that will actually accept this use of
goto? Is it just a convention meant for human consumption to translate jump
instructions with no translated target? Is it a bug in the decompiler?

~~~
c_shu
Strange. Could it be related to labels as values?
[https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html)
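
For reference, here is what GCC's labels-as-values extension (also supported by Clang) looks like; it is GNU C only, and classify is a hypothetical example. Since &&label only takes the address of a label in the current function, it can't explain a jump into puts. A more likely reading (an inference, not something documented by Snowman) is that func_4003e0 is a PLT stub: an indirect jmp through the GOT slot for puts, whose initial value before lazy binding points at the next PLT instruction, 0x4003e6. The decompiler seems to have modelled that GOT slot as a variable named puts and the indirect jump as goto puts, i.e. a tail call to puts(rdi).

```c
/* GNU C extension: &&label yields the address of a label and
   goto *ptr jumps to it. Targets must be labels in the same function. */
static int classify(int x) {
    static void *const targets[] = { &&is_zero, &&is_nonzero };
    goto *targets[x != 0];
is_zero:
    return 0;
is_nonzero:
    return 1;
}
```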

------
api
Could you compile a Go program and then decompile to C? I can see actual uses
for that like porting to old OSes.

~~~
ZenoArrow
I can see the benefits, but I doubt it would be that simple. The Go code you
decompiled would depend on a Go runtime. That specific Go runtime would then
have dependencies on OS libraries. So for example, when you open a file in Go,
I'd imagine that this functionality is built on top of the file handling
functionality of Windows/OSX/Linux. You could work around these dependencies,
but it's probably less hassle to port the Go runtime to the new OS.

~~~
api
Doesn't the Go runtime get linked in, so it would just get decompiled too? It
would generate a huge blob of C, but the idea is just to port.

~~~
ZenoArrow
Not everything that the program needs to run is included in the binary.

Let's use a more concrete example. Let's say we write a Go program that copies
a file. On Windows, this might use an API call like CopyFile:

[https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-us/library/windows/desktop/aa363851\(v=vs.85\).aspx)

If you decompile the compiled Go program into C, it'd still have the
references to API calls like this. These APIs would have to be implemented on
the new OS for the decompiled program to work without modification.
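
As a minimal sketch of what re-implementing such an API might look like, here is a hypothetical portable stand-in for CopyFile written against standard C stdio. copy_file_shim is an illustrative name; it loosely mirrors CopyFile's (source, destination, fail-if-exists) parameters and its nonzero-on-success return, but ignores file attributes, security descriptors, and everything else the real call handles. It relies on the C11 "x" fopen mode to fail when the destination already exists.

```c
#include <stdio.h>

/* Hypothetical portable stand-in for a CopyFile-style call.
   Returns 1 on success and 0 on failure. When fail_if_exists is set,
   the C11 "x" mode makes fopen fail if dst already exists. */
static int copy_file_shim(const char *src, const char *dst, int fail_if_exists) {
    FILE *in = fopen(src, "rb");
    if (!in) return 0;
    FILE *out = fopen(dst, fail_if_exists ? "wbx" : "wb");
    if (!out) { fclose(in); return 0; }

    char buf[4096];
    size_t n;
    int ok = 1;
    /* Copy in fixed-size chunks until EOF or a short write. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) { ok = 0; break; }
    }
    if (ferror(in)) ok = 0;
    fclose(in);
    if (fclose(out) != 0) ok = 0; /* flush errors surface here */
    return ok;
}
```

The decompiled program's calls to CopyFile would still have to be redirected to a shim like this (or to the new OS's native equivalent) by hand, which is the porting work described above.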

