
A Sampling of Anti-Decompilation Techniques - swalsh
https://blog.ret2.io/2017/11/16/dangers-of-the-decompiler/
======
DogestFogey
I'm surprised the Movfuscator hasn't been mentioned yet. It compiles C code
into unconditional MOVs, and if you watch the author's Derbycon 2015 video
there are ways you can scramble the MOV instructions, truly making it a
decompilation nightmare.

1\. Movfuscator page
[https://github.com/xoreaxeaxeax/movfuscator](https://github.com/xoreaxeaxeax/movfuscator)

2\. Derbycon 2015 video
[https://www.youtube.com/watch?v=R7EEoWg6Ekk](https://www.youtube.com/watch?v=R7EEoWg6Ekk)

~~~
akanet
This is because "movfuscation" isn't a practical option for people actually
trying to ship binaries that still perform well for customers but resist
reverse-engineering. One of the battlegrounds for this sort of thing is the
tug of war between game developers and cheat developers, and games still need
to perform very well. Things the author mentioned, like address-rewriting at
runtime, don't incur a performance penalty.

~~~
doctorwho
Just rebuild the sensitive portions of your code using movfuscator and leave
the performance critical stuff alone. As long as everything is statically
linked and you don't do anything stupid like "if (check) unlock()" that can be
easily patched, it would make life pretty miserable for the RE crowd.
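The principle behind mov-only compilation can be sketched in a few lines: control flow becomes data flow through table lookups, so there is no conditional jump left to patch. A toy illustration (Python standing in for the mov-level transform the real movfuscator performs on x86; the function names are invented for the example):

```python
# Toy illustration of the movfuscator's core trick: replace a branch
# with an unconditional table lookup. Instead of
#     if check: result = unlock() else: result = deny()
# both outcomes are staged and the condition merely selects an index.

def branchless_select(check, on_true, on_false):
    table = [on_false, on_true]      # index 0 = false path, 1 = true path
    return table[int(bool(check))]   # the "mov" -- a pure data move

print(branchless_select(True, "unlocked", "denied"))   # unlocked
print(branchless_select(False, "unlocked", "denied"))  # denied
```

In the real tool the selection itself is also expressed with MOVs, but the shape is the same: the secret-dependent decision never appears as a branch.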

~~~
ttoinou
What's wrong with "if (check) unlock()" ? x)

~~~
emiliobumachar
It can be modified to "if (true) unlock()" very easily, even in the binary.
No disassembling needed.
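Concretely, the compiled check usually ends in a conditional jump, and overwriting that one opcode neutralizes it. A sketch, assuming a typical compilation of the check (the opcode bytes are real x86 encodings, but the buffer is hand-made for illustration, not taken from an actual binary):

```python
# Hand-made byte buffer mimicking compiled "if (check) unlock()":
#   85 c0      test eax, eax        ; is `check` zero?
#   74 05      je  +5               ; if so, skip the call to unlock()
#   e8 ..      call unlock
code = bytearray(b"\x85\xc0\x74\x05\xe8\x00\x00\x00\x00")

# The crack: overwrite the 2-byte `je` with two NOPs (0x90) so the
# call to unlock() executes unconditionally. No disassembler needed;
# a hex editor and a known offset are enough.
JE, NOP = 0x74, 0x90
off = code.index(JE)
code[off:off + 2] = bytes([NOP, NOP])

print(code.hex())  # 85c09090e800000000
```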

------
krylon
As someone who has only ever written about 20 instructions worth of assembly,
I am kind of torn.

The kind of cleverness needed to thwart the decompiler's efforts is _very_
impressive. A part of me wishes I had been around at a time when assembly was
an acceptable "language" instead of the last resort it is today for most
scenarios. Getting this close to the metal (or silicon) and being able to pull
off such a stunt must be exhilarating.

At the same time, another part of me is very happy I do not have to deal with
such low-level details. Debugging code written in assembly (with assembly
being the main language, not just inline assembly in a C/C++ program or
something like that) must have been exquisite torture.

~~~
Someone
One can also claim that today’s programming is torture.

Back in the microcomputer days, all you needed to write a program was a desk
with a single computer, one or two tools that you knew through and through and
a few reference books.

Nowadays, we sometimes spend days or even weeks choosing our tools and
libraries of sometimes questionable quality and getting them to work together
before we can even start thinking of the problem we want to solve, and even
then, we still spend half our days googling, and are lucky if we get
semi-decent answers.

~~~
thaumaturgy
It's six o' one, half-dozen t'other for me.

My most formative years as a programmer mostly involved 68040 assembly on the
pre-PowerPC Macs. Back then, I used CodeWarrior a lot, and if there was a
head-scratching bug in my software (as there often was), I could launch it
from a fully-featured debugger, set breakpoints, skip ahead to a block of
code, step through it line by line, see every single value that existed in raw
hex on the heap, see exactly what was on the stack. If I wanted to flip
bits in someone else's software, I had MacsBug, which was a stop-the-world
OS-level debugger that did all the same, but even better. [1] Back then, there
was a very definite sense that I _owned_ the computer, that it was my tool,
that it would do precisely and exactly what I wanted it to. In a way, whether
something was open source or not didn't matter; I could modify it anyway, and
often for less effort than it takes to crawl through a modern over-architected
codebase.

However, now I can glue some really impressive libraries together, written by
other people and available on massive centralized code repositories, and
deliver an application to as many people as I want, all running different
hardware or operating systems (within reason) but still all getting the same
software experience (...mostly...), and even charge some money for it without
having to worry about somebody cracking the license check on my shareware.

The effort required to make complicated new software has gone way down. The
effort required to troubleshoot and fix complicated software has gone way up.
Nowadays, when I'm doing the former, I'm happy, and when I'm doing the latter,
I'm not.

I think that might be part of why there's so much churn in the software
industry today.

[1]: MacsBug did this in real-time. Start up an application, drop into
MacsBug, set a breakpoint on a function call, switch back to the OS and
application, trigger the function call, then step through the code in MacsBug.
If you were careful, you could even "rewind" code under certain conditions, so
you could, say, change a branching opcode, back up, go back to the OS and see
if the change did what you want.

~~~
krylon
That is a _very_ insightful reply. Thank you so much!

I wish I could upvote this more than once. ;-)

------
glandium
I don't know if decompilers are able to do something about it, but there's a
"neat" technique where your machine code can be interpreted as two different
sequences of instructions depending where you start instruction decoding. For
a simple but artificial example of what I mean:

The following sequence of bytes:

    
    
      b8 50 83 ec 10
    

Decodes as:

    
    
      mov $0x10ec8350, %eax
    

if you start at the first byte, and, if you start at the second byte, as:

    
    
      push %eax
      sub $0x10, %esp
    

Here I essentially hid an instruction in the mov'ed data, because that's the
easiest way to create something like that, but I've seen mind blowing examples
of this technique. I unfortunately don't remember where.
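The effect can be reproduced with a toy decoder that knows only the three opcodes in the example above (the lengths and mnemonics below match the actual x86 encodings; everything else is simplified):

```python
# Decode the same bytes starting at two different offsets and get two
# entirely different instruction streams -- the overlap that trips up
# linear-sweep disassemblers.
import struct

def decode(buf, pos):
    """Decode one instruction at `pos`; return (text, next_pos)."""
    op = buf[pos]
    if op == 0xB8:                           # mov eax, imm32 (5 bytes)
        imm = struct.unpack_from("<I", buf, pos + 1)[0]
        return f"mov ${imm:#x}, %eax", pos + 5
    if op == 0x50:                           # push eax (1 byte)
        return "push %eax", pos + 1
    if op == 0x83 and buf[pos + 1] == 0xEC:  # sub imm8, esp (3 bytes)
        return f"sub ${buf[pos + 2]:#x}, %esp", pos + 3
    raise ValueError(f"unknown opcode {op:#x}")

def linear(buf, start):
    out, pos = [], start
    while pos < len(buf):
        text, pos = decode(buf, pos)
        out.append(text)
    return out

code = bytes.fromhex("b85083ec10")
print(linear(code, 0))  # ['mov $0x10ec8350, %eax']
print(linear(code, 1))  # ['push %eax', 'sub $0x10, %esp']
```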

~~~
psykotic
That's an issue for static disassembly, which is step one of decompilation.
Your snippet is a classic example that can throw off naive linear-scan
disassemblers. Recursive disassemblers can handle it easily if (and it's a big
if) they can identify the basic block entry points. If you have top-level
entry points (which can also be a problem), the main difficulty for a recursive
disassembler is indirect jumps that don't fit standard patterns like switch
jump tables.

All of this gets easier if you can augment your static analysis with control
flow traces from program executions with coverage of the relevant branches, so
you don't miss basic block entry points.
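The recursive approach can be sketched on a toy one-byte ISA (the opcodes here are invented for the illustration): follow control flow from a known entry point, and bytes that are never reached as code, such as junk inserted between blocks, are simply never decoded.

```python
# Toy ISA: nop (1 byte), jmp <abs target> (2 bytes), ret (1 byte).
NOP, JMP, RET = 0x01, 0x02, 0x03

def recursive_disasm(buf, entry):
    seen, work = {}, [entry]
    while work:
        pos = work.pop()
        while pos < len(buf) and pos not in seen:
            op = buf[pos]
            if op == JMP:
                target = buf[pos + 1]
                seen[pos] = f"jmp {target}"
                work.append(target)   # follow the control-flow edge
                break                 # unconditional: fall-through is not code
            if op == RET:
                seen[pos] = "ret"
                break
            seen[pos] = "nop"
            pos += 1
    return [seen[p] for p in sorted(seen)]

# nop, jmp 4, <junk 0xFF that would derail a linear scan>, nop, ret
code = bytes([NOP, JMP, 4, 0xFF, NOP, RET])
print(recursive_disasm(code, 0))  # ['nop', 'jmp 4', 'nop', 'ret']
```

The junk byte at offset 3 never appears in the listing because no control-flow path reaches it; a linear sweep would have tried to decode it and desynchronized.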

~~~
munin
> All of this gets easier if you can augment your static analysis with control
> flow traces from program executions with coverage of the relevant branches,
> so you don't miss basic block entry points.

That's only true if the traces you can generate do a reasonable job at
covering the states the application can find itself in, which is a big
assumption.

~~~
psykotic
Yes, that's what I meant by coverage of the relevant branches. But the good
news is that all these partial, heuristic sources of information can be
combined. E.g. for detecting function entry points, you can combine
information from ELF/PE export tables (if present), function prologues
detected by a linear scan, vtables, static CALL targets, dynamic CALL targets
from run-time traces, etc.
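One of those heuristics, sketched: scan a code section for the classic 32-bit x86 function prologue `push ebp; mov ebp, esp` (bytes `55 89 e5`). The opcode bytes are the real encodings; the buffer is a hand-made stand-in for a code section.

```python
# Prologue scanning as a source of candidate function entry points,
# to be cross-checked against export tables, call targets, traces, etc.
PROLOGUE = b"\x55\x89\xe5"   # push ebp; mov ebp, esp

def prologue_candidates(buf):
    hits, pos = [], buf.find(PROLOGUE)
    while pos != -1:
        hits.append(pos)
        pos = buf.find(PROLOGUE, pos + 1)
    return hits

section = b"\x90\x90" + PROLOGUE + b"\xc3" + PROLOGUE + b"\xc3"
print(prologue_candidates(section))  # [2, 6]
```

Like every heuristic in this space, it produces false positives (the pattern can occur in data) and false negatives (frame-pointer-omitting code has no such prologue), which is exactly why the sources get combined.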

------
tptacek
This is a really cool post. You can also target IDA itself directly, rather
than the decompiler, making it difficult to even view the disassembly. It's
been a while since I did any of this kind of work (it's relevant to software
security tokens, games, and content protection), but within the last few years
people have published memory-corruption RCEs in IDA, so I imagine it hasn't gotten
too much harder to hopelessly confuse IDA.

~~~
rjzzleep
Well, same here, it's been a very long time since I've seen that.

But weren't scrambled imports, polymorphic code, and nasty anti-debugging
tricks the domain of the shareware packers? ASProtect and whatnot.

------
munin
There's an interesting vein of research work here in making software reverse
engineering more difficult, and measuring how much more difficult.

A precursor to decompilation is control flow analysis, the production of the
control flow graph you see in the "before" stages in all of the examples in
this post. You can go one step further, on a good day, and make it very
difficult (perhaps very very difficult, perhaps impossible) to recover a
_precise_ control flow graph for a function.

There are a few different ways to do this, and I like these approaches more
than targeting specific heuristics in IDA/Hexrays because, on a good day for
the obfuscator, you can make a theoretical statement about the work effort
required to undo the obfuscation. If you can make that work effort large,
then you start to have a security guarantee that is a shade of the security
guarantee you get in cryptography. The methods outlined in the parent blog
post are great because you can start using them today, but if they annoy Ilfak
enough, he'll fix them and they'll stop working.

------
amenghra
Reminds me of the tricks people used to crash debuggers. E.g.
[https://reverse.put.as/2012/01/31/anti-debug-trick-1-abusing-mach-o-to-crash-gdb/](https://reverse.put.as/2012/01/31/anti-debug-trick-1-abusing-mach-o-to-crash-gdb/),
[http://blog.ioactive.com/2012/12/striking-back-gdb-and-ida-debuggers.html](http://blog.ioactive.com/2012/12/striking-back-gdb-and-ida-debuggers.html) and
[https://xorl.wordpress.com/2009/01/05/more-gdb-anti-debugging/](https://xorl.wordpress.com/2009/01/05/more-gdb-anti-debugging/)

------
kccqzy
There are also undocumented instructions and prefixes. For example, at this
year’s DEFCON someone presented a particular kind of e9 jump prefixed with 66.
That instruction was incorrectly disassembled by all disassemblers known at
the time (IDA, gdb, objdump, VS, etc.). And since the prefix changes the
length of the instruction, you can effectively make the disassembler produce
garbage.

------
krylon
There is an interesting PowerPoint presentation you can find on the internet
on how Skype used to evade debuggers.

[http://www.secdev.org/conf/skype_BHEU06.handout.pdf](http://www.secdev.org/conf/skype_BHEU06.handout.pdf)

The techniques described in that document are different, but in a way they are
alike - both use dirty tricks to shield themselves from people trying to
reverse engineer them. The only difference is that one takes place "at compile
time", in a way, while the other happens at run time, as the program executes.

------
ttoinou
What about using UPX? [https://upx.github.io](https://upx.github.io)

Can it make it harder for software crackers to achieve their goal?

~~~
TACIXAT

        upx -d packed.exe
    

UPX is fairly easy to defeat, but many malicious samples will have their own
packers / loaders.

