
Writing a Simple Decompiler for .NET, Part 1 - zerratar
http://www.codeandux.com/writing-a-simple-decompiler-for-net-part-1/
======
jbevain
If that's a subject you find interesting, you can also read the source for two
OSS .NET decompilers:

ILSpy:
[https://github.com/icsharpcode/ilspy](https://github.com/icsharpcode/ilspy)

JustDecompile:
[https://github.com/telerik/JustDecompileEngine/](https://github.com/telerik/JustDecompileEngine/)

Both are based on the same library that is used in the post: Mono.Cecil
([https://github.com/jbevain/cecil](https://github.com/jbevain/cecil)).

~~~
zerratar
Hi jbevain :-) Nice to see you here! I'm in the middle of a small switch to a new
blog engine, but the next part should be up soon enough. Have a nice day, and
once more: you rock!

~~~
jbevain
Thanks! Looking forward to reading the next posts in the series!

------
userbinator
Stack-based, high-level VMs like CLR, JVM, and Flash's ActionScript are
certainly quite easy to decompile, although I think this article unfortunately
misses the point - it's full of (rather verbose) code, but little explanation.
From what I can see it's very fragile too: it attempts to match exact
instruction sequences, so it won't work for anything even _slightly_ different
from what's presented. This is equivalent to the test() method given, but
won't get decompiled correctly:

    ldc.i4.4
    ldarg.0
    call System.Int32 Test1.AwesomeClass::c()
    starg.s b
    starg.s a

The right way to decompile a stack-based language requires keeping track of
what's on the stack, building expressions instead of evaluating values.
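A minimal sketch of that idea, in Python for brevity: each instruction pushes or pops expression strings rather than values. The tuple instruction encoding is hypothetical (it is not Mono.Cecil's API), and the example assumes `c()` is an instance method, so that `ldarg.0` supplies `this`. Run on the sequence above, it recovers both assignments:

```python
# Minimal sketch: decompile a stack-based instruction list by tracking
# expressions on a symbolic stack instead of evaluating values.
# The (opcode, operand) tuple encoding is hypothetical, not Mono.Cecil's API.

def decompile(instructions):
    stack = []       # symbolic stack: holds expression strings, not values
    statements = []  # recovered source-level statements, in order
    for op, operand in instructions:
        if op == "ldc.i4":               # push a constant
            stack.append(str(operand))
        elif op == "ldarg":              # push an argument by name ("this" for arg 0)
            stack.append(operand)
        elif op == "add":                # pop two operands, push a combined expression
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} + {right})")
        elif op == "call":               # operand: (method name, arg count, has `this`)
            name, argc, has_this = operand
            args = [stack.pop() for _ in range(argc)][::-1]
            receiver = stack.pop() + "." if has_this else ""
            stack.append(f"{receiver}{name}({', '.join(args)})")
        elif op == "starg":              # pop an expression, emit an assignment
            statements.append(f"{operand} = {stack.pop()};")
        else:
            raise NotImplementedError(op)
    return statements


# The sequence from the comment above, assuming c() is an instance method.
program = [
    ("ldc.i4", 4),
    ("ldarg", "this"),
    ("call", ("c", 0, True)),
    ("starg", "b"),
    ("starg", "a"),
]
for line in decompile(program):
    print(line)
# b = this.c();
# a = 4;
```

This only handles value-producing calls; a real decompiler also has to flush void calls and other side effects as statements at the point they occur, so that reordering expressions on the stack never changes evaluation order.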

That InstructionHelper class also looks like it could be rewritten more
clearly...

~~~
jcranmer
The standard way to resolve distinct variables is SSA-based decompilation.
I've only worked with decompiling Java bytecode, so I don't know how the CLR
works, but in Java, the compiler definitely reuses the local variable slots.
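For straight-line code, the renaming step of SSA construction can be sketched like this: every store to a slot creates a fresh name, and every load refers to the latest one, so a reused slot splits into distinct variables that can each get their own type. The instruction tuples are a hypothetical abstraction over Java's typed `istore`/`astore`/`iload`/`aload`; code with branches additionally needs phi nodes at join points (for example placed via dominance frontiers), which this sketch omits.

```python
# Minimal sketch of SSA renaming for straight-line code (no branches, so no
# phi nodes). Instructions are hypothetical (op, slot, type) tuples standing in
# for Java's typed istore/astore/iload/aload; loads carry no type.

def rename_locals(instructions):
    version = {}  # slot -> latest version number
    types = {}    # SSA name -> type of the value stored under that name
    renamed = []
    for op, slot, value_type in instructions:
        if op == "store":
            # Every store starts a new variable, even if the slot is reused.
            version[slot] = version.get(slot, -1) + 1
            name = f"v{slot}_{version[slot]}"
            types[name] = value_type
            renamed.append(("store", name))
        elif op == "load":
            # A load reads whatever was most recently stored in the slot.
            renamed.append(("load", f"v{slot}_{version[slot]}"))
        else:
            raise NotImplementedError(op)
    return renamed, types


code = [
    ("store", 1, "int"),     # istore_1
    ("load", 1, None),       # iload_1
    ("store", 1, "String"),  # astore_1: same slot, different variable
    ("load", 1, None),       # aload_1
]
renamed, types = rename_locals(code)
print(renamed)  # [('store', 'v1_0'), ('load', 'v1_0'), ('store', 'v1_1'), ('load', 'v1_1')]
print(types)    # {'v1_0': 'int', 'v1_1': 'String'}
```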

There's also no discussion of type inference for variables, or of properly
parenthesizing expression DAGs. I suspect properly decompiling control flow would be
in part 2, but I'd be surprised if that were anywhere near robust, based on
the quality demonstrated so far. Which is sad because this sort of
decompilation has been practically demonstrated and solved for, oh, 10-20
years.
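One common approach to parenthesization is to compare operator precedences while printing: a child expression needs parentheses only if it binds more loosely than its parent, or if it is an equal-precedence right operand of a left-associative operator. A sketch over a hypothetical tuple-based tree with four C#-style binary operators (it handles trees only; a DAG with shared subexpressions would first need those shared nodes assigned to temporaries):

```python
# Precedence table for a small, hypothetical subset of C#-style operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_source(expr, parent_prec=0, is_right=False):
    """Render a binary-operator tree with only the parentheses it needs.
    Leaves are strings; inner nodes are (op, left, right) tuples."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    prec = PRECEDENCE[op]
    text = f"{to_source(left, prec, False)} {op} {to_source(right, prec, True)}"
    # Lower-precedence children need parentheses; so does an equal-precedence
    # right child, because all four operators are left-associative.
    if prec < parent_prec or (prec == parent_prec and is_right):
        return f"({text})"
    return text


print(to_source(("*", ("+", "a", "b"), "c")))  # (a + b) * c
print(to_source(("-", "a", ("-", "b", "c"))))  # a - (b - c)
print(to_source(("+", "a", ("*", "b", "c"))))  # a + b * c
```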

~~~
tptacek
Are you and 'userbinator saying the same thing? I can't tell. I know how the
simple symbolic stack->expression evaluation works, and it happens that in my
code I generate something pretty close to SSA expressions, but does SSA do
something else profound for decompilation?

~~~
sklogic
SSA abstracts the stack away and makes it much easier to reason about types.

~~~
tptacek
I'm not sure I'm following. To get from stack operations to expressions, I
just symbolically evaluate the stack, creating temporary variables as I go. It
happens that the resulting IR is pretty much SSA form. But I'm not taking much
else from SSA. I'm wondering if I'm missing opportunities.
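A sketch of the approach described above, assuming a toy instruction set (hypothetical `push`, `+`, `*`, `store`): every value pushed on the symbolic stack gets a fresh temporary. Since each temporary is assigned exactly once, the output is already SSA-like for stack values; what it does not do is rename named locals that are assigned more than once, which is where full SSA construction comes in.

```python
import itertools

# Sketch: symbolic stack evaluation that names every pushed value with a fresh
# temporary. The toy opcodes ("push", "+", "*", "store") are hypothetical.

def to_three_address(instructions):
    temps = (f"t{i}" for i in itertools.count())  # t0, t1, t2, ...
    stack, code = [], []
    for op, operand in instructions:
        if op == "push":                 # constant or variable load
            t = next(temps)
            code.append(f"{t} = {operand}")
            stack.append(t)
        elif op in ("+", "*"):           # binary operator on the top two temps
            right, left = stack.pop(), stack.pop()
            t = next(temps)
            code.append(f"{t} = {left} {op} {right}")
            stack.append(t)
        elif op == "store":              # assignment to a named local
            code.append(f"{operand} = {stack.pop()}")
        else:
            raise NotImplementedError(op)
    return code


# y = x * 2 + 1
for line in to_three_address([("push", "x"), ("push", 2), ("*", None),
                              ("push", 1), ("+", None), ("store", "y")]):
    print(line)
# t0 = x
# t1 = 2
# t2 = t0 * t1
# t3 = 1
# t4 = t2 + t3
# y = t4
```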

~~~
sklogic
It's easier to transform your expressions into a useful form from guaranteed,
proper SSA form than from a simple tree representation. For example, induction
variable extraction is totally trivial in SSA, and you really need to do it if
you want to reconstruct nice-looking `for` loops.
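To illustrate why this is easy in SSA: because every name has exactly one definition, a basic induction variable is just a phi node whose loop-carried input is the phi itself plus a constant. The sketch below assumes a hypothetical encoding of SSA definitions as tuples, with each phi's first operand coming from before the loop and its second from the loop's back edge; identifying which phis sit in loop headers is left out.

```python
def find_induction_variables(defs):
    """Map each basic induction variable (a phi) to (initial value, step).

    `defs` maps every SSA name to its single definition (hypothetical encoding):
      ("const", value)
      ("+", operand, step)
      ("phi", value_from_before_loop, value_from_back_edge)
    """
    found = {}
    for name, d in defs.items():
        if d[0] != "phi":
            continue
        _, init, back = d
        update = defs.get(back)
        # Pattern: name = phi(init, back) and back = name + <integer constant>
        if (update is not None and update[0] == "+"
                and update[1] == name and isinstance(update[2], int)):
            found[name] = (init, update[2])
    return found


# i = 0; s = 0; loop { s = s + i; i = i + 1; }   in SSA form:
defs = {
    "i0": ("const", 0),
    "s0": ("const", 0),
    "i1": ("phi", "i0", "i2"),
    "s1": ("phi", "s0", "s2"),
    "s2": ("+", "s1", "i1"),
    "i2": ("+", "i1", 1),
}
print(find_induction_variables(defs))  # {'i1': ('i0', 1)}
```

Here `i1` is recognized with initial value `i0` (the constant 0) and step 1, which is enough to print something like `for (int i = 0; ...; i++)`, while the accumulator `s1` is rejected because its step is not a constant.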

It also pays off to have distinct basic blocks: loop analysis is much easier
then.
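Splitting code into basic blocks is usually done with the classic "leader" rule: the first instruction, every branch target, and every instruction that follows a branch or return starts a new block. A sketch using a hypothetical tuple encoding in which branch operands are instruction indices rather than real IL offsets:

```python
BRANCHES = {"br", "brtrue", "brfalse"}   # CIL-style branch opcodes (subset)
TERMINATORS = BRANCHES | {"ret"}


def split_basic_blocks(instructions):
    """Split a flat instruction list into basic blocks (lists of indices).
    Instructions are (opcode, operand) tuples; branch operands are target
    indices into the list, a simplification of real IL offsets."""
    leaders = {0}
    for i, (op, operand) in enumerate(instructions):
        if op in BRANCHES:
            leaders.add(operand)          # a branch target starts a block
        if op in TERMINATORS and i + 1 < len(instructions):
            leaders.add(i + 1)            # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instructions)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# int x = 0; do { x = x + 1; } while (x < 10); return;
loop = [
    ("ldc.i4", 0), ("stloc", 0),                                   # 0-1
    ("ldloc", 0), ("ldc.i4", 1), ("add", None), ("stloc", 0),      # 2-5
    ("ldloc", 0), ("ldc.i4", 10), ("clt", None), ("brtrue", 2),    # 6-9
    ("ret", None),                                                 # 10
]
print(split_basic_blocks(loop))  # [[0, 1], [2, 3, 4, 5, 6, 7, 8, 9], [10]]
```

The three blocks are the initialization, the loop body (whose `brtrue` back edge targets its own first instruction), and the return; loop analysis then works on the edges between these blocks rather than on individual instructions.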

~~~
tptacek
This is helpful. But I read it and think, for instance, "distinct basic blocks
aren't SSA"; compilers worked in terms of CFGs before SSA existed. :)

Again this is more about my lack of confidence about fully grokking the
implications of SSA; I'm not nerd-sniping.

~~~
sklogic
Of course, you can have basic blocks without SSA. It's just another thing
missing from the article that was worth mentioning.

Another thing you get for free from SSA: nicely reconstructed ternary
expressions (even if the original code used ifs).
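A sketch of that idea: in SSA, an if/else diamond that only computes a value shows up as two single assignments merged by a phi at the join point, which can be folded back into a ternary. The block and phi representations here are hypothetical simplifications (each arm is just a list of SSA assignments):

```python
def phi_to_ternary(cond, then_block, else_block, phi):
    """Fold an if/else diamond into a ternary assignment, or return None.

    then_block / else_block: lists of (ssa_name, expression) assignments.
    phi: (result_name, name_from_then_arm, name_from_else_arm).
    """
    result, from_then, from_else = phi
    # Only fold when each arm does nothing but compute the phi's input.
    if len(then_block) != 1 or len(else_block) != 1:
        return None
    then_name, then_expr = then_block[0]
    else_name, else_expr = else_block[0]
    if (then_name, else_name) != (from_then, from_else):
        return None
    return f"{result} = {cond} ? {then_expr} : {else_expr};"


# if (a > b) m1 = a; else m2 = b;  m3 = phi(m1, m2)
print(phi_to_ternary("a > b", [("m1", "a")], [("m2", "b")], ("m3", "m1", "m2")))
# m3 = a > b ? a : b;
```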

------
ghuntley
If this is of interest to you then you will most likely find the recent
.NET Core Design API review on ILDASM interesting as well:
[https://www.youtube.com/watch?v=HuRc6CpiOVg](https://www.youtube.com/watch?v=HuRc6CpiOVg)

------
EliRivers
For something with "UX" in the name, it's a surprisingly bad layout. Massive
waste of screen width, and code boxes forcing me to scroll sideways even as
acres of empty space sit there unused.

~~~
EliRivers
Ah, I see it's been partially fixed. The source code sections no longer
require scrolling sideways to see it all, at least.

~~~
lichinobu
I really apologize for that; we were totally taken by surprise by all the
attention, and around 2 AM I saw your (very valid) reply and was like, oh snap!
It's not a perfect fix, but I'll try to improve it as soon as I can. And
thanks, Eli.

