Cool fact about BAP: there's a really weird but very cool embedded Lisp that lets you drive the execution of single instructions via metaprogramming. It works via a term-rewriting system. Seriously one of the coolest Lisps I've found out in the wild.
Also, the docs for the project are borderline decadent in their completeness, and the team responsible is quite active on Gitter. For real, if you're interested in binary analysis, check it out.
Given that I won't have time to learn binary analysis (there's no payoff for me), is it really worth getting the book just for the final chapter, which I presume isn't going to cover HLLs anyway?
Apart from Trail of Bits, university research teams, and automatic patching competitions?
Anyone using CBMC, KLEE, angr, or S2E?
PS: My company is hiring for such roles https://www.reddit.com/r/netsec/comments/b90hep/rnetsecs_q2_...
Big companies are frequently composed of components with distinct culture, benefits, legal entity, and so on. People get to know one part, then assume that all parts are that way. I'm in a component with compensation that is better than average for the big company. The association isn't beneficial.
Using the big company name means adhering to corporate branding requirements. People could somehow imagine me to be issuing official corporate communications, which is far from the truth.
I might get more responses, but would I want them? We're fighting to keep out the toxic people who want to focus on politics. We have important work to do.
There is an absurd Glassdoor review out there. If that is to be believed, we pay highly experienced cleared specialists about as much as they'd make cooking food at In-N-Out or Chick-Fil-A. Our office would be empty if that were the pay being offered.
People with a conscience can feel that it is wrong to devote their efforts to tracking people all across the internet to sell ads and other junk. (Facebook, Google, etc.) They can feel that it is important to support their country, and that it would be wrong to pass up a reasonable opportunity to do so. With a better conscience, you would feel guilty about how you benefit from the nation without contributing much, and of course you would stop supporting violent hate groups that pretend to be otherwise.
This is particularly the case when the new technology confers a distinct competitive advantage.
One firm that I worked for developed a highly competitive product using a deep neural net (back when the Restricted Boltzmann Machine was a thing!), just as the "deep learning" hype was starting. They sent a marketing person to my office to interview me about what we could say to make the product sound impressive, and I told him all about it. But his superiors didn't let him use the material: they didn't want competitors to know how it worked.
But you never know, maybe I looked in the wrong parts of the Internet.
I'm genuinely curious though: since most of those projects are open source, you'd expect to see some traces of activity or problems on GitHub or on mailing lists, and... well... there's not much to be seen.
angr I tried very hard to use on Ada programs, but it seems the VEX lifter and CFG builder couldn't handle even simple Ada programs (I wasn't patient enough to file this on GitHub...). For C programs, though, it's very fun and fully scriptable! There's also a nice 'pluggable test environment' (DeepState from Trail of Bits, https://github.com/trailofbits/deepstate) that lets you switch between hand-written unit tests, fuzzers, and symbolic execution (letting you decide which variables are symbolic and which are concrete...) with angr. There's also some cool stuff mixing AFL and angr (Driller), and another project mixing Intel PT and angr.
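The "one harness, many input sources" idea behind that kind of pluggable test environment can be sketched in pure Python. This is a toy, not DeepState's actual C/C++ API; `ReplaySource` and `RandomSource` are made-up names:

```python
import random

class ReplaySource:
    """Feeds a fixed, hand-written input vector (a unit test)."""
    def __init__(self, values):
        self.values = iter(values)
    def get_int(self, lo, hi):
        return next(self.values)

class RandomSource:
    """Feeds random values (a crude stand-in for a fuzzer)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
    def get_int(self, lo, hi):
        return self.rng.randint(lo, hi)

def harness(src):
    """One test harness, reused unchanged with any input source."""
    x = src.get_int(0, 100)
    y = src.get_int(0, 100)
    assert x + y <= 200, "overflowed the budget"
    return x + y

# The same harness driven two ways:
print(harness(ReplaySource([3, 4])))   # unit test -> 7
print(harness(RandomSource(seed=42)))  # fuzzing-style run
```

A symbolic backend would slot in the same way, by returning symbolic values from `get_int` instead of concrete ones.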
KLEE seems to have the most papers (in quantity and quality), and it just landed C++ support and upgraded to a recent LLVM...
(off topic: your blog articles and tech are very inspiring. The DeepState paper, articles, and slides made me rethink the whole test pipeline of my applications: from manual oracle-based or model-based tests, to PBT, to fuzzing, to symbolic/concolic execution...).
By the way, I thought Remill and McSema required IDA?
Back on topic: one of the reasons I asked about industrial use is that almost everyone in the security community (who works on SE) seems focused on reverse engineering executables. While I understand the interest and importance of that part (being able to work on binary code without the source), I think it's an adversarial view of symbolic execution. It's really hard to work precisely on binary code, even with optimizations disabled...
Well, what if I have the source code and want to use SE? Couldn't I be more precise/efficient? I'm guessing I'm stuck with KLEE if I'm working in C/C++, and would have to write an LLVM frontend for Ada stuff...
Glad that DeepState had an impact on you :-D We continue to evolve DeepState, both in the direction of better fuzzing, and better test case reduction.
Remill works at instruction granularity, so all it requires is raw bytes. McSema uses Remill in conjunction with a disassembly frontend (IDA Pro, Binary Ninja, or Dyninst).
If you have source code you can likely be more precise/efficient. Sometimes you may have access to source but not the ability to change/influence the build.
I think there's a lot of room for improvement with KLEE. If I were to write an LLVM symbolic executor from scratch then I think I would do some things differently.
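For readers unfamiliar with what such an executor does: the goal is to discover the distinct paths through a program and a witness input for each. A real engine forks state at branches and asks an SMT solver for satisfying inputs; the toy sketch below (all names made up) instead enumerates a small concrete domain and buckets inputs by the path they take, which conveys the idea without a solver:

```python
import itertools

def explore_paths(program, nvars, domain=range(-4, 5)):
    """Toy path exploration: `program` takes concrete ints and
    returns a path signature (tuple of branch outcomes). We bucket
    inputs by path and keep one witness input per path; an
    AssertionError counts as hitting a bug."""
    paths = {}
    for args in itertools.product(domain, repeat=nvars):
        try:
            sig = program(*args)
        except AssertionError:
            sig = ("BUG",)
        paths.setdefault(sig, args)  # first witness wins
    return paths

def target(x, y):
    trace = []
    if x > y:
        trace.append("x>y")
        assert x != y + 2   # "bug" reachable only when x == y + 2
    else:
        trace.append("x<=y")
    return tuple(trace)

for sig, witness in explore_paths(target, 2).items():
    print(sig, witness)
```

A from-scratch LLVM executor would walk the IR with symbolic operands and path conditions instead of brute-forcing, but the output has the same shape: per-path constraints plus a concrete witness.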
Do you have something written somewhere on how you would do things differently from KLEE?
What I was thinking of was some improvements to gnattest (https://docs.adacore.com/gnat_ugn-docs/html/gnat_ugn/gnat_ug...) but also a way to add some quickcheck-like generator features (Ada already has a property-description language in its contracts) with Libadalang... One can dream!
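The quickcheck-like idea (random generators plus a contract-style property, returning a counterexample on failure) fits in a few lines of Python; all names here are made up, and a real tool would also shrink the counterexample:

```python
import random

def gen_int_list(rng, max_len=10, lo=-100, hi=100):
    """Random generator for one test input: a list of ints."""
    return [rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]

def prop_sort_idempotent(xs):
    """Contract-style property: sorting twice equals sorting once."""
    once = sorted(xs)
    return sorted(once) == once

def quickcheck(prop, gen, runs=200, seed=0):
    """Run `prop` on `runs` generated inputs; return the first
    counterexample found, or None if the property held throughout."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = gen(rng)
        if not prop(xs):
            return xs   # a real tool would shrink this
    return None

print(quickcheck(prop_sort_idempotent, gen_int_list))  # None: property holds
```

With Ada, the contract aspects (`Pre`/`Post`) would play the role of `prop`, and Libadalang could derive generators from the subprogram's parameter types.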
But the whole idea of "let the developer write one test harness and use it almost as-is with different testing/validation techniques" (I'm not explaining it well) was some kind of revelation.
When you say "adding a fuzzing test harness is only 2-3 days of work", you still get complaints: that's too much (even though I then found dozens of bugs), it's /another/ test harness to maintain, and we'll have to rebuild an input corpus on every interface break (true...). Anything that could alleviate the pain would be great...
I didn't bother proving it, but I think some of our techniques could be interpreted as a kind of SE; in the general case, though, they're AI.
I know installing it from source is a pain, but a scriptable pain.
I'm also wondering (out of curiosity) why you'd have to package and deliver it (to customers?). I always saw it as a developer tool. Seems you're working on very interesting stuff :-)
In terms of packaging, it's nothing exciting: we just had to produce a .deb if we wanted to install something on a build or test server.
I’m attempting to use the technique right now in the context of writing an execution tracer for a bytecode VM runtime, where I want to “recover” ABI types for arbitrary bytecode (that I don’t have the source to), in order to emit traces that describe what’s going on in high-level terms (e.g. “foo[x][y].z = 3” rather than “$329f = ($329f & (~(1 << 32) << 8)) | (%r25 & 0xff)”.)
So I’m doing a static-analysis pass of the bytecode (essentially a decompilation pass) which symbolically executes math ops to build up symbolic formulae representing the math being done, and to associate program-counter positions in the bytecode with value bindings in the formulae; and then I’m feeding those associations to the runtime tracer, so that it can notice when it’s on an “interesting” PC position and feed its current register/stack value into an instance of the associated formula, for me to emit once I get all the bindings of the formula filled in.
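A toy version of that pass, for a made-up stack bytecode: instead of numbers, the stack holds expression strings, so symbolically executing the math ops builds up a readable formula, while recording which PC position bound which runtime value:

```python
def symbolic_eval(code):
    """Symbolically execute a toy stack bytecode. Ops are tuples:
    ('push', const), ('load', name), ('add',), ('mul',). The stack
    holds expression strings rather than values, so what pops out is
    a formula; `bindings` maps each PC that loaded a runtime value
    to the name that stands for it in the formula."""
    stack, bindings = [], {}
    for pc, (op, *args) in enumerate(code):
        if op == "push":
            stack.append(str(args[0]))
        elif op == "load":          # value only known at run time
            name = args[0]
            bindings[pc] = name
            stack.append(name)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sym = {"add": "+", "mul": "*"}[op]
            stack.append(f"({a} {sym} {b})")
    return stack[-1], bindings

# foo = x * 8 + 3, with x loaded from the runtime at pc 0
formula, binds = symbolic_eval([
    ("load", "x"), ("push", 8), ("mul",), ("push", 3), ("add",),
])
print(formula)   # ((x * 8) + 3)
print(binds)     # {0: 'x'}
```

The tracer side then watches for PC 0, substitutes the concrete register value for `x`, and emits the filled-in formula.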
So far I haven’t found any texts or papers talking about using symbolic execution in this context (to recover information lost due to previous compilation, in a pass “within” a tracer, JIT, transpiler, or anything else that starts with bytecode or IR), which is kind of annoying. Anyone have any resources? Projects that use this technique? (I’m thinking that this is something you’d see done in sufficiently-advanced game-console emulators, as a “deoptimization” pass to recover structure to help dynamic recompilation optimize better.)
Also, related question: if I’ve got a stack machine, and all the jumps in the ISA are indirect jumps (but with literal values pushed on the stack), is there any way to recover the edges between basic blocks without doing a symbolic exec over the stack ops (i.e. to discover, among all the DUPs and SWAPs and such, which constant stack-slot the jump op is actually receiving)? I’m pretty sure there is a “weaker” method, but I’m stymied here by my lack of compiler-theory education; everything I google about dominance hierarchies and such assumes you’re starting with a model of a register machine with direct jumps (where the edges are context-free extractable from the jump ops).
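The "weaker" method being asked about is essentially constant propagation through the stack: track which slots hold known literals as PUSH/DUP/SWAP shuffle them around, and when you hit an indirect jump, read the target off the tracked stack; any slot touched by an op you can't model gets poisoned. A toy sketch over made-up opcodes (not any real ISA), handling one straight-line run:

```python
UNKNOWN = object()   # marker for a slot we can't track statically

def resolve_jumps(code):
    """Walk a straight-line run of toy stack ops, tracking literal
    values through the stack. Returns {pc_of_jmp: target} for every
    indirect jump whose target slot held a known literal."""
    stack, edges = [], {}
    for pc, (op, *args) in enumerate(code):
        if op == "push":
            stack.append(args[0])
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "jmp":
            target = stack.pop()
            if target is not UNKNOWN:
                edges[pc] = target   # recovered CFG edge
        else:                        # any op we don't model poisons a slot
            stack.append(UNKNOWN)
    return edges

# push 0x40; some shuffling; the jump still receives the literal
code = [("push", 0x40), ("dup",), ("swap",), ("jmp",)]
print(resolve_jumps(code))   # {3: 64}
```

For full CFG recovery you'd run this per candidate block and join the abstract stacks at merge points (giving UNKNOWN where predecessors disagree), but for compiler-generated bytecode where targets are pushed as literals just before the jump, even this local pass resolves most edges.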
I managed to use symbolic execution to build a smart contract decompiler - Eveem.org. What you see there is essentially a trace from a symbolic execution of an Ethereum contract, plus some postprocessing.
> Cyber Grand Challenge highlights from DEF CON 24, August 6, 2016
I really dislike this definition. It is confusing and backwards from normal math/logic parlance.
> all possible unsafe inputs are guaranteed to be found
A sound system can reject all programs and still be sound, so it does not follow that all possible unsafe inputs are guaranteed to be found. (A checker that rejects every program outright is vacuously sound, yet it identifies no unsafe inputs at all.)
In my view, focusing on program acceptance or rejection is not really helpful.
In partial answer to touisteur: I don't use it, and I can't see myself ever using it in industry, where it's just "get stuff done" and, if it's not too buggy, it goes into production. Which is a real downer.