
Symbolic Execution: Intuition and Implementation - ingve
http://www.usrsb.in/symbolic-execution-intuition-and-implementation.html
======
jes
Many years ago, I led a small team of very talented software developers. We
built "execution trace disassemblers" for in-circuit emulators. Basically,
they took an array of bus cycles recorded from the bus of a microprocessor
based system, and re-created the instruction stream (expressed as assembly
language instructions) that must have been executed to result in the given set
of bus cycles.

It was an interesting problem to solve. Generally speaking, it was a search
problem, constantly looking for evidence to confirm or contradict an
assumption being made about some bus cycle (e.g., "Was the instruction data in
this i-fetch data cycle executed by the processor?") and then, if the
assumption seems solid, reasoning about how the state of the processor (e.g.,
the general registers) must have changed.

Once it got started disassembling a trace, it learned the contents of the
general registers, and we were able to annotate the reconstructed assembly
language listing, to show the actual data that was operated on.

It was a very cool piece of software for its time, which was the late 1980s /
early 1990s. Eventually, though, microprocessors got internal caches, which
made the problem much harder. Some microprocessor vendors, though, provided
undocumented features that allowed us to snoop what was happening on-chip.

~~~
halflings
Did you check "differentiable neural computers" by DeepMind [0]? I wonder if
that type of model could ultimately automatically infer the things you're
describing (since it can do memory reads/writes, execute instructions).

~~~
jes
Thank you for that reference. I have not yet chased it down. If I do and find
it interesting, I'll follow up here.

------
dguido
Check out DeepState, a Google Test-like library for easily using symbolic
execution in C/C++ apps. It's still very early on but it's already usable!

[https://github.com/trailofbits/deepstate](https://github.com/trailofbits/deepstate)

------
Patient0
To me there are a lot of similarities between this and type inference.

"First, as we step through this program, we don’t think about specific values
of x like 13. Instead, we think about the set of possible values x might take
on. At different points in the program, x might be an element of all integers,
negative integers, or positive integers plus 0. In other words, we think of x
as a symbolic value (i.e., a set of possible values) rather than a concrete
value (i.e., a particular element of that set of possible values)."

vs:

"We expound a view of type checking as evaluation with ‘abstract values’."

[http://okmij.org/ftp/Computation/FLOLAC/lecture.pdf](http://okmij.org/ftp/Computation/FLOLAC/lecture.pdf)
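To make the "set of possible values" idea from the first quotation concrete, here is a minimal sketch, assuming x is an integer and its possible values are tracked as a closed interval. The helper name `split_on_less_than` is made up for illustration and does not come from the article or the lecture notes.

```python
import math

def split_on_less_than(interval, k):
    """Narrow an integer interval (lo, hi) on the test `x < k`.

    Returns (true_branch, false_branch); a branch is None when no value
    in the interval can take it. This helper is hypothetical.
    """
    lo, hi = interval
    # On the true branch x is at most k - 1; on the false branch at least k.
    true_branch = (lo, min(hi, k - 1)) if lo <= k - 1 else None
    false_branch = (max(lo, k), hi) if hi >= k else None
    return true_branch, false_branch

# Before any test, x may be any integer.
all_ints = (-math.inf, math.inf)

# After `if x < 0:` the two branches see different sets of values.
negative, non_negative = split_on_less_than(all_ints, 0)
```

Here `negative` stands for the negative integers and `non_negative` for the positive integers plus 0, which is the narrowing the quotation describes.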

~~~
int3
Type inference is a special case of abstract interpretation:
[https://www.irif.fr/~mellies/mpri/mpri-
ens/articles/cousot-t...](https://www.irif.fr/~mellies/mpri/mpri-
ens/articles/cousot-types-as-abstract-interpretations.pdf)

(Not that I've read that paper in detail, mind you...)

------
fenollp
Funny Klee [1] wasn't mentioned.

[1]: [https://github.com/klee/klee](https://github.com/klee/klee)

~~~
touisteur
Or cbmc (some interesting papers:
[http://www.cprover.org/cbmc/applications/](http://www.cprover.org/cbmc/applications/));
they're on GitHub:
[https://github.com/diffblue/cbmc](https://github.com/diffblue/cbmc). It seems
to also handle Java, not sure...

Or angr ([http://angr.io](http://angr.io)), which works on binaries, is written
in Python, and is loads of fun to use... On small codebases? Still, some people
have used it in combination with AFL to help the fuzzer when it's stuck
('driller':
[https://github.com/shellphish/driller](https://github.com/shellphish/driller))

Or manticore (you'll need IDA Pro there?):
[https://blog.trailofbits.com/2017/04/27/manticore-
symbolic-e...](https://blog.trailofbits.com/2017/04/27/manticore-symbolic-
execution-for-humans/)

There's a good survey here:
[https://arxiv.org/abs/1610.00502](https://arxiv.org/abs/1610.00502) (no cbmc
though).

But nickpsecurity is right, Klee seems to be the place where most of the
interesting stuff is happening right now.

The recent improvements in SMT/SAT solvers bring back lots of techniques that
were thought impractical (formal proof, symbolic execution...) a while back.

------
Chris2048
I actually started building something similar in Python: it uses Mockito-like
"mock"/"spy" objects to record calls to their methods, and the idea was to
eventually branch at every test condition to automatically derive possible
execution paths. Might go back to that :-)

The trick would be that conditions are tested via magic methods that return
True/False (e.g. for > or ==), so you just need to branch at every test. You
don't even need specific test values, just substitute objects: e.g., if the
test is X > Y, you don't need actual numbers, just an X that returns True/False
for __gt__, and _any_ Y, since the actual evaluation is X.__gt__(Y). If more
nuance is needed (consistency, so that X > Y always has the same answer...
unless you _want_ to test for exotic things like inconstant 'number' objects),
then you can have X return True/False depending on whether the passed argument
'is' the Y object. Maybe embed Y in X, or in a shared data structure, so that
it can make this identification.
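A rough sketch of that idea, not the actual project: the class `Sym` and the functions `explore` and `classify` are made-up names. Each comparison on the stand-in object consumes one pre-chosen True/False answer, and the function under test is re-run once per combination of answers.

```python
class Sym:
    """Hypothetical stand-in value whose comparisons return scripted answers."""

    def __init__(self, name, script, trace):
        self.name = name
        self._script = script  # iterator of pre-chosen True/False answers
        self._trace = trace    # records (condition, answer) for this run

    def _decide(self, op, other):
        answer = next(self._script)  # raises StopIteration when exhausted
        self._trace.append((f"{self.name} {op} {other!r}", answer))
        return answer

    def __gt__(self, other):
        return self._decide(">", other)

    def __lt__(self, other):
        return self._decide("<", other)


def explore(func, max_branches=8):
    """Re-run func with every combination of branch answers; return the paths."""
    paths = []
    pending = [[]]  # answer prefixes still to try
    while pending:
        prefix = pending.pop()
        trace = []
        try:
            result = func(Sym("x", iter(prefix), trace))
        except StopIteration:
            # A new condition was reached: try it both ways on later runs.
            if len(prefix) < max_branches:
                pending.append(prefix + [True])
                pending.append(prefix + [False])
            continue
        paths.append((trace, result))
    return paths


def classify(x):
    if x > 10:
        return "big"
    if x < 0:
        return "negative"
    return "small"


for trace, result in explore(classify):
    print(result, trace)
```

Note that this sketch never checks whether a combination of answers is satisfiable (for example, `x > 10` and `x < 0` both being True); a real symbolic executor would hand the recorded conditions to a constraint solver to prune such paths.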

------
fulafel
For those wondering about applicability of SE to other languages, see here for
a compendium of tools: [https://github.com/ksluckow/awesome-symbolic-
execution](https://github.com/ksluckow/awesome-symbolic-execution)

------
leipavoi
I think symbolic execution is quite an "academic subject".

Things could change as I think symbolic execution would very well fit when
analyzing and verifying the correctness of smart contracts in blockchain.

------
thechao
So... abstract interpretation?

~~~
arnarbi
Yes. Symbolic execution is an example of abstract interpretation.

Not sure why the snark, that's fairly accepted terminology.

------
sklivvz1971
> The rationale, of course, is to feed in values that test each code path

That's absolutely the wrong way to test a function. You want to test a
function so _as the implementation changes, the same output follows the same
inputs_.

Counterexample: I might refactor that function without those code paths (e.g.
write `x*sign(x)` instead). Do I need only one test now?

Moral: DO NOT TEST IMPLEMENTATIONS
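A small sketch of that counterexample, assuming the function under discussion computes an absolute value (as `x*sign(x)` suggests). The names `abs_branching`, `abs_branchless`, `sign`, and `check` are made up for illustration; `sign` is not a Python builtin.

```python
def abs_branching(x):
    # Implementation with two code paths.
    if x < 0:
        return -x
    return x


def sign(x):
    # Hypothetical helper: -1, 0, or 1 depending on the sign of x.
    return (x > 0) - (x < 0)


def abs_branchless(x):
    # Refactored implementation with no branches.
    return x * sign(x)


# Input/output pairs chosen from the expected behavior, not from the code.
CASES = [(-5, 5), (0, 0), (7, 7)]


def check(impl):
    return all(impl(arg) == expected for arg, expected in CASES)
```

Both `check(abs_branching)` and `check(abs_branchless)` pass on the same cases, even though only the first version has branches to cover.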

~~~
ghettoimp
If none of your test cases ever hit some lines of your functions, it seems
pretty clear that your tests are missing functionality.

Of course, hitting each line with a single test doesn’t guarantee your tests
are “sufficient,” but pragmatically it seems like a reasonably good and easy
to understand starting point.

------
alpb
This was posted before:
[https://news.ycombinator.com/item?id=16532053](https://news.ycombinator.com/item?id=16532053)
Please refrain from posting your own site, especially multiple times. :)

~~~
jwilk
No, it's OK to post your own site.

HN FAQ says:

 _If a story has had significant attention in the last year or so, we kill
reposts as duplicates. If not, a small number of reposts is ok._

The linked post had no comments, so it clearly doesn't qualify as having
"significant attention".

