
A Survey of Symbolic Execution Techniques (2018) - kachnuv_ocasek
https://arxiv.org/abs/1610.00502
======
dtornabene
This is a solid paper if you want to learn something about SymbEx. If it
_really_ interests you and you want something more tutorial-oriented, pick up
a copy of Practical Binary Analysis, out recently from No Starch Press. The
last chapters are a tutorial intro to it using Triton. If you're a functional
programmer and already know some Haskell or OCaml, or are able to pick it up
quickly, check out BAP, the Binary Analysis Platform. It's got a bunch of
tools/analyses built in already for this stuff, and there's a fair amount of
fascinating research done using or on the platform itself.

[https://practicalbinaryanalysis.com/](https://practicalbinaryanalysis.com/)

[https://github.com/BinaryAnalysisPlatform/bap](https://github.com/BinaryAnalysisPlatform/bap)

Cool fact about BAP: there's a really weird but very cool embedded Lisp that
allows you to drive execution of single instructions via metaprogramming. It
works via a term rewriting system. Seriously one of the coolest Lisps I've
found out in the wild.

[http://binaryanalysisplatform.github.io/bap/api/master/Bap_p...](http://binaryanalysisplatform.github.io/bap/api/master/Bap_primus.Std.Primus.Lisp.html)

Also, the docs for the project are borderline decadent in their completeness
and the team responsible are quite active via gitter. For real, if you're
interested in binary analysis, check it out.

~~~
tempguy9999
No offence intended, but I don't really see the value of this book WRT this
paper. The paper seems to be about HLLs, whereas the book, great as it surely
is, is 90% about, well, binaries.

Given that I won't have time to learn binary analysis (there's no payback for
me), is it really worth getting the book just for the final chapter, which I
presume is not going to cover HLLs anyway?

~~~
dtornabene
I mean, my comment was pretty straightforward as to the relation? The book
has full, detailed code listings and provides a VM with runnable code. You end
the book building an automatic exploitation engine for a C++ program, so it
does cover HLLs? The binaries in the book are all C/C++ programs. I guess I
don't really get the comment (no offense taken), given that techniques like
symbex are used on binaries written in HLLs. The paper itself is a survey of
the tradeoffs inherent in different implementations of symbex. So, if you want
to get a sense of what symbex is, read the paper and follow up with some of
the endnotes; if you want to build an application that actually uses symbex,
buy the book and work through it. Hope this helps!

------
touisteur
I'm curious, HN. Apart from Fujitsu, do any of you use Symbolic Execution in
an industrial setting or in your day-to-day job? I'm starting to see fuzzing
'largely' adopted, and an uptick of interest in proof (see the recent
partnership between Nvidia and AdaCore on SPARK 2014 for firmware), but not
much on Symbolic Execution.

Apart from Trail of Bits, university research teams, and automatic patching
competitions?

Anyone using CBMC, KLEE, angr, S2E?

~~~
jor-el
I have come across many pentesting labs using these tools to deal with
obfuscated binaries. You can check Quarkslab's blog; they have articles on
this [0]. Another interesting project is cracking the Tigress VM using
symbolic execution [1]. Usage has increased dramatically in the past few years,
as many new tools are available and hardware is more performant now. I am also
using these tools in my day-to-day job to deal with obfuscated binaries and
reverse engineering. I use Miasm [2].

PS: My company is hiring for such roles
[https://www.reddit.com/r/netsec/comments/b90hep/rnetsecs_q2_...](https://www.reddit.com/r/netsec/comments/b90hep/rnetsecs_q2_2019_information_security_hiring/ek374i0)

[0] [https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm...](https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html)

[1] [https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm...](https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html)

[2] [https://github.com/cea-sec/miasm](https://github.com/cea-sec/miasm)

~~~
dtornabene
Yeah, this is it. Security firms specializing in static/dynamic analysis are
going to be using this stuff. Vulnerability hunting, things like that.

~~~
souprock
Doing exactly that, and hiring:
[https://news.ycombinator.com/item?id=19797601](https://news.ycombinator.com/item?id=19797601)

~~~
departure
Funny that none of your posts mention that it's (I think) Raytheon. Do you
think you'd get fewer responses if you mentioned it's a major defense
contractor?

~~~
souprock
Multiple reasons:

Big companies are frequently composed of components with distinct culture,
benefits, legal entity, and so on. People get to know one part, then assume
that all parts are that way. I'm in a component with compensation that is
better than average for the big company. The association isn't beneficial.

Using the big company name means adhering to corporate branding requirements.
People could somehow imagine me to be issuing official corporate
communications, which is far from the truth.

I might get more responses, but would I want them? We're fighting to keep out
the toxic people who want to focus on politics. We have important work to do.

There is an absurd Glassdoor review out there. If that is to be believed, we
pay highly experienced cleared specialists about as much as they'd make
cooking food at In-N-Out or Chick-Fil-A. Our office would be empty if that
were the pay being offered.

~~~
dtornabene
lol, fighting to keep out "toxic people" who "want to focus on politics" means
you hire people cool with making weapons of war. I for one am glad the person
who asked you that did so, because I _do not_ work for defense contractors,
and I don't need to waste my time looking into a place that would filter out
anyone with conscience.

~~~
souprock
That is a different sort of "toxic" and a different sort of "politics", but
yeah we don't want that either. I was referring to the sort of people you'd
find in Google or even HP, backstabbing and undermining to jockey for position
in the corporate hierarchy.

People with a conscience can feel that it is wrong to devote their efforts to
tracking people all across the internet to sell ads and other junk. (Facebook,
Google, etc.) They can feel that it is important to support their country, and
that it would be wrong to pass up a reasonable opportunity to do so. With a
better conscience, you would feel guilty about how you benefit from the nation
without contributing much, and of course you would stop supporting violent
hate groups that pretend to be otherwise.

------
derefr
Is anyone here using Symbolic Execution for something _besides_ source code
verification?

I’m attempting to use the technique right now in the context of writing an
execution tracer for a bytecode VM runtime, where I want to “recover” ABI
types for arbitrary bytecode (that I don’t have the source to), in order to
emit traces that describe what’s going on in high-level terms (e.g.
“foo[x][y].z = 3” rather than “$329f = ($329f & (~(1 << 32) << 8)) | (%r25 &
0xff)”.)

So I’m doing a static-analysis pass of the bytecode (essentially a
decompilation pass) which symbolically executes math ops to build up symbolic
formulae representing the math being done, and to associate program-counter
positions in the bytecode with value bindings in the formulae; and then I’m
feeding those associations to the runtime tracer, so that it can notice when
it’s on an “interesting” PC position and feed its current register/stack value
into an instance of the associated formula, for me to emit once I get all the
bindings of the formula filled in.
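
A minimal sketch of that two-phase idea, assuming a made-up three-address bytecode rather than any real VM: the static pass builds a formula per program-counter position without running anything, and a tracer can later plug live values into it. The opcode names, register names, and the `symbolic_pass`/`evaluate` helpers are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Hypothetical toy bytecode: (opcode, dest, operand_a, operand_b).
# Operands are register names (str) or integer literals.

@dataclass(frozen=True)
class Sym:
    name: str
    def __str__(self): return self.name

@dataclass(frozen=True)
class Const:
    value: int
    def __str__(self): return hex(self.value)

@dataclass(frozen=True)
class BinOp:
    op: str
    lhs: object
    rhs: object
    def __str__(self): return f"({self.lhs} {self.op} {self.rhs})"

OPS = {"AND": "&", "OR": "|", "SHL": "<<"}
EVAL = {"&": operator.and_, "|": operator.or_, "<<": operator.lshift}

def symbolic_pass(code):
    """Static pass: return {pc: formula} for every instruction."""
    regs, formulas = {}, {}

    def read(operand):
        if isinstance(operand, int):
            return Const(operand)
        # The first read of a register introduces a symbol for its runtime value.
        return regs.setdefault(operand, Sym(operand))

    for pc, (opcode, dest, a, b) in enumerate(code):
        expr = BinOp(OPS[opcode], read(a), read(b))
        regs[dest] = expr
        formulas[pc] = expr
    return formulas

def evaluate(expr, bindings):
    """Tracer side: fill a formula with concrete values seen at runtime."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Sym):
        return bindings[expr.name]
    return EVAL[expr.op](evaluate(expr.lhs, bindings), evaluate(expr.rhs, bindings))

code = [
    ("AND", "t0", "r25", 0xFF),
    ("SHL", "t1", "t0", 8),
    ("OR", "slot", "slot", "t1"),
]
formulas = symbolic_pass(code)
print(formulas[2])  # (slot | ((r25 & 0xff) << 0x8))
print(hex(evaluate(formulas[2], {"slot": 0, "r25": 0x1234})))  # 0x3400
```

Recognising that a formula like this is "write byte 1 of a packed field" would be a separate pattern-matching step on the expression tree, which this sketch leaves out.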

So far I haven’t found any texts or papers talking about using symbolic
execution in this context (to recover information lost due to previous
compilation, in a pass “within” a tracer, JIT, transpiler, or anything else
that starts with bytecode or IR), which is kind of annoying. Anyone have any
resources? Projects that use this technique? (I’m thinking that this is
something you’d see done in sufficiently-advanced game-console emulators, as a
“deoptimization” pass to recover structure to help dynamic recompilation
optimize better.)

Also, related question: if I’ve got a stack machine, and _all_ the jumps in
the ISA are indirect jumps (but with literal values pushed on the stack), is
there any way to recover the edges between basic blocks _without_ doing a
symbolic exec over the stack ops (i.e. to discover, among all the DUPs and
SWAPs and such, which constant stack-slot the jump op is actually receiving)?
I’m pretty sure there is a “weaker” method, but I’m stymied here by my lack of
compiler-theory education; everything I google about dominance hierarchies and
such assumes you’re starting with a model of a register machine with direct
jumps (where the edges are context-free extractable from the jump ops).
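
One lighter-weight option, sketched under assumptions: constant propagation over an abstract stack. It still walks the stack ops, but each slot is only "known constant" or "unknown", with no formulae. The ISA below (`PUSH`, `DUP n`, `SWAP n`, `POP`, `ADD`, `JUMP`, `JUMPI`) is hypothetical and only loosely modelled on stack VMs, and the sketch handles a single basic block; merging stacks across blocks is left out.

```python
UNKNOWN = None  # abstract value for "not a known constant"

def jump_targets(block):
    """Walk one basic block and return (pc, target) for each jump.

    target is UNKNOWN when the jump operand is not a constant that was
    pushed inside this block.
    """
    stack = []
    found = []

    def pop():
        # Anything below the block's own pushes came from a predecessor.
        return stack.pop() if stack else UNKNOWN

    for pc, (op, arg) in enumerate(block):
        if op == "PUSH":
            stack.append(arg)
        elif op == "DUP":      # DUP n copies the n-th slot from the top
            stack.append(stack[-arg] if len(stack) >= arg else UNKNOWN)
        elif op == "SWAP":     # SWAP n exchanges the top with the slot n below it
            while len(stack) < arg + 1:
                stack.insert(0, UNKNOWN)
            stack[-1], stack[-1 - arg] = stack[-1 - arg], stack[-1]
        elif op == "POP":
            pop()
        elif op == "ADD":
            a, b = pop(), pop()
            stack.append(UNKNOWN if a is UNKNOWN or b is UNKNOWN else a + b)
        elif op == "JUMP":
            found.append((pc, pop()))
        elif op == "JUMPI":    # assumed layout: target on top, condition below
            target = pop()
            pop()
            found.append((pc, target))
        else:
            raise ValueError(f"unknown op {op!r}")
    return found

block = [
    ("PUSH", 0x2A),   # the literal jump destination
    ("PUSH", 7),
    ("DUP", 2),
    ("SWAP", 1),
    ("POP", None),
    ("JUMP", None),
]
print(jump_targets(block))  # [(5, 42)]
```

Jumps that come back as `UNKNOWN` are the ones whose target was pushed in a predecessor block; those would need the stack state propagated along already-discovered edges, or a fall back to full symbolic execution.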

~~~
kolinko
That's roughly how I do it for Eveem.org (Ethereum VM decompiler) - what you
see there is essentially an output from the symbolic execution trace.

------
kolinko
Oh oh, room for a shameless plug! :)

I managed to use symbolic execution to build a smart contract decompiler -
Eveem.org. What you see there is essentially a trace from a symbolic execution
of an Ethereum contract, plus some postprocessing.

------
some_random
“Sometimes you can’t see how important something is in its moment, even if it
seems kindof important. This is probably one of those times.”

> Cyber Grand Challenge highlights from DEF CON 24, August 6, 2016

------
srfilipek
> Soundness prevents false negatives, i.e., all possible unsafe inputs are
> guaranteed to be found, while completeness prevents false positives, i.e.,
> input values deemed unsafe are actually unsafe.

I really dislike this definition. It is confusing and backwards from normal
math/logic parlance.

~~~
smallnamespace
Isn't this the normal definition for, say, type systems? A sound type system
will never accept an invalid program (no false negatives) but may reject
correct programs.

~~~
srfilipek
The problem is really this:

> all possible unsafe inputs are guaranteed to be found

A sound system can reject all programs and still be sound. Therefore, it does
not follow that all possible unsafe inputs are guaranteed to be found.

In my view, focusing on program acceptance or rejection is not really helpful.
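
To make the quoted definitions concrete, here is a toy sketch with two degenerate analyzers, using the paper's wording (false negatives are unsafe inputs that were missed, false positives are safe inputs that were flagged). The input domain and the "ground truth" set are made up for illustration.

```python
def false_negatives(flagged, truly_unsafe):
    """Unsafe inputs the analyzer failed to flag."""
    return truly_unsafe - flagged

def false_positives(flagged, truly_unsafe):
    """Inputs the analyzer flagged that are actually safe."""
    return flagged - truly_unsafe

inputs = set(range(10))
truly_unsafe = {x for x in inputs if x % 3 == 0}  # hypothetical ground truth

flag_everything = set(inputs)  # trivially misses nothing
flag_nothing = set()           # trivially never raises a false alarm

print(len(false_negatives(flag_everything, truly_unsafe)))  # 0
print(len(false_positives(flag_everything, truly_unsafe)))  # 6
print(len(false_negatives(flag_nothing, truly_unsafe)))     # 4
print(len(false_positives(flag_nothing, truly_unsafe)))     # 0
```

Under those definitions either property alone is cheap to get; the hard part is having both at once.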

------
tempguy9999
Oh heck yes, this is interesting to me.

In partial answer to touisteur: I don't use it, and can't see myself ever
using it in industry, where it's just 'get stuff done' and if it's not too
buggy it goes into production. Which is a real downer.

