
McSema: A native code to LLVM IR translation framework - wyc
http://blog.trailofbits.com/2014/08/07/mcsema-is-officially-open-source/
======
btown
Coupled with Emscripten, which compiles LLVM IR to asm.js, this could be huge.
Questions of IP and legality aside for the moment, imagine being able to port
everything from old games to VST audio plugins to the browser... without
needing access to the source code!

Others have already had similar ideas: see
[https://lobste.rs/s/m39toj/a_preview_of_mcsema_a_framework_f...](https://lobste.rs/s/m39toj/a_preview_of_mcsema_a_framework_for_converting_x86_binaries_to_llvm_bitcode)

Of course, nothing beyond trivial examples will work at this early stage...
but we can dream!

------
0x09
Other LLVM decompilation projects:

Dagger, [http://dagger.repzret.org](http://dagger.repzret.org) (x86)

Fracture,
[https://github.com/draperlaboratory/fracture](https://github.com/draperlaboratory/fracture)
(x86, ARM, PPC)

libbeauty,
[https://github.com/jcdutton/libbeauty](https://github.com/jcdutton/libbeauty)
(x86)

~~~
jevinskie
I've actually used Dagger to transform a very simple library into IR. I tried
to do the same with the other two projects but was unable to.
Unfortunately, when I contacted the Dagger authors for tips on fixing some
deficiencies I found with a more complex binary, I received no
response. =(

Granted, they are students and thus always busy.

I can't wait to try out McSema. =)

------
toleavetheman
I have been waiting for someone more capable than myself to solve this. My
dream is to combine it with ZeroVM for cloud execution and LLVM Polly for
automatic parallelism (and other static analysis), and to have a runtime that,
seemingly by magic, takes a simple scientific app and runs it on a huge
virtual machine in the cloud.

------
moyix
The talk at REcon that introduced this tool is also excellent:
[http://recon.cx/2014/video/recon2014-10-artem-dinaburg-andre...](http://recon.cx/2014/video/recon2014-10-artem-dinaburg-andrew-ruef-Static-Translation-of-X86-Instruction-Semantics-to-LLVM-With-McSema.mp4)

------
beernutz
Pardon my ignorance (I don't mean to be dense here), but what the heck is this
doing? It sounds like it recompiles machine code into intermediate code. Is
that a fair (if short and oversimplified) description?

~~~
SloopJon
The announcement links to an earlier blog post (pardon my formatting):

[http://blog.trailofbits.com/2014/06/23/a-preview-of-mcsema/](http://blog.trailofbits.com/2014/06/23/a-preview-of-mcsema/)

"McSema translates x86 machine code into LLVM bitcode.

"Why would we do such a crazy thing?

"Because we wanted to analyze existing binary applications, and reasoning
about LLVM bitcode is much easier than reasoning about x86 instructions.

"Not only is it easier to reason about LLVM bitcode, but it is easier to
manipulate and re-target bitcode to a different architecture."

~~~
beernutz
Thank you. I missed that post.

------
jacob019
Would this allow one to recompile x86 binaries to another architecture?

------
greeq
What's the performance overhead, though? And the size overhead? I'd imagine
both are quite large.

