
Reverse – reverse engineering tool with pseudo-C output - jaybosamiya
https://github.com/joelpx/reverse
======
jacquesm
I built something like this long ago. The easy part - I thought - would be to
separate data and code by evaluating all the branches twice (once for the
branches taken and once for those not taken) and build up a backlog of the
branches not yet taken. Every step along the way you'd mark the instruction
pointer as the opcode, the other bytes for that instruction as the operands
and the remainder after a complete run would be the static data present in the
binary.

If only.

The easiest way to throw such a scheme off is to have an instruction jump into
the middle of an operand and have the code be dual meaning, one reading gives
you one disassembly, another a completely different one.

If you then try to convert the result to a pseudo-C construct (a function with
a stackframe and the corresponding unwinding of that stackframe at the end of
the fuction) then it no longer works...

The short version of the above is there is a 1:1 correspondence between a C
language source code and the un-optimized output of a compiler. That coupling
is a lot less hard when you start optimizing and when you use hand-crafted
assembly as the input to a reverse-compiler the output may simply no longer be
functionally equivalent because the mapping might be non-existent and this can
be a hard situation to detect.

~~~
saurik
Eleven or twelve years ago, I was presenting a paper at the European Joint
Conferences on Theory and Practice of Software; I went to a talk while I was
there by Gogul Balakrishnan, and spent some time talking to him about
decompilation: I remember him telling me that the problem you describe is
simply not possible to solve for x86.

FWIW, jumping into the middle of an instuction is not actually a problem (even
though I do realize it trips up a number of disassemblers: I actually
publishef a proof-of-concept obfuscating assembler that was designed to
specifically break all the best disassemblers at the time, and leaned heavily
on that), as you just need to build your own graph of the execution of the
program: don't try to mark regions of memory, as that will screw you.

What really gets you quickly are computed jumps: even super popular
disassemblers often hiccup on simple switch tables, as there is no way to
really know for sure all the cases that were proven to be impossible by the
compiler and left out of the checks. And one could imagine a case where a jump
is computer based on the output of a potentially undesirable function... you
essentially need to be able to prove things about programs to even ask the
question, and those proofs almost always fail due to incompleteness :(.

~~~
jacquesm
Ah yes, switches, I totally forgot about that bit but you're absolutely right,
those were impossible to get right, especially with the optimizer on. The end
result that best matched the code would be a jumble of ifs and gotos rather
than a switch and because you don't know the bounds you're going to have to
make a bunch of assumptions about the contents of the addresses of the jump
table. Bad luck to you if one of the jump tables (you can get more of them for
disjoint sets of case statements that cluster together) happens to be shorter
than you think and is followed by other address pointers.

Decompilation for C is a very hard problem. Conceptually super easy but
extremly hard if not impossible to implement. The java folks have it so much
easier in this respect.

~~~
drakenot
So has the Hex-Rays made significant breakthroughs in this area or does their
decompiler have the same issues?

~~~
jacquesm
It works for relatively easy switch statements:

[https://www.hex-
rays.com/products/decompiler/compare_vs_disa...](https://www.hex-
rays.com/products/decompiler/compare_vs_disassembly.shtml)

It requires compiler generated code as input (which I don't find a huge
restriction, but any assembly code will screw up the output).

------
user9982
I'd definitely just prefer using the dissassembled output as it's infinitely
more readable once you've grokked it. Nice project though, in a field as
monopolised as reversing, you can never have too many tools!

------
andrewchambers
I find the comments redundant and annoying tbh. This is more like an
alternative assembly syntax.

------
eccstartup
rm -rf /usr/lib/python3. _/ site-packages/capstone_

???

~~~
lorenzhs
context:
[https://github.com/joelpx/reverse/blob/19e45ff4892ae7aaf628e...](https://github.com/joelpx/reverse/blob/19e45ff4892ae7aaf628e6d698d1c2c46da5846e/requirements.sh#L19)

~~~
eeZi
It installs capstone using "sudo make install". Unconvinced.

------
steipete
So, just like IDA from Hex-Rays, but way worse?

~~~
skissane
Well, this is open source, IDA is not. And while the features are far less, if
the author keeps working on it, the gap will shrink with time. (Given the vast
feature set of IDA, it's unlikely the gap will ever be closed completely, but
it might become good enough for some people's purposes.)

