Wouldn't this project be more usefully done at the LLVM IR level so we could plu...

nickpsecurity · 2024-05-04T17:02:15

Another problem with LLVM I’ve heard about is that it’s intermediate language or API or something is a moving, informally-specified target. People who know LLVM internals might weigh in on that claim. If true, it’s actually easier to target C or a subset of Rust just because it’s static and well-understood.

Two projects sought to mitigate these issues by going in different directions. One was a compiler backend that aimed to be easy to learn with well-specified IL. The other aimed to formalize LLVM’s IL.

http://c9x.me/compile/

https://github.com/AliveToolkit/alive2

There have also been typed, assembly languages to support verification from groups like FLINT. One can also compile language-specific analysis with a certified to LLVM IL compiler. Integrating pieces from different languages can have risks. That (IIRC) is being mitigated by people doing secure, abstract compilation.

sweetjuly · 2024-05-05T03:45:35

Yes, the textual IR format is not an API. It's an internal implementation detail which changes somewhat rapidly. It's well specified but there's no guarantees that if you write some IR which works in LLVM 16 that it'll work as expected in LLVM 17, etc. Inside of LLVM itself IR and its related manipulation and analysis functions are "API" in so far as there's an expectation that if you change how the IR works that you won't break existing in-tree passes/that you'll work with maintainers to fix them.

nickpsecurity · 2024-05-05T03:52:18

That makes more sense. Thanks!

flohofwoe · 2024-05-04T15:55:36

My guess is that LLVM IR preserves too little semantic information to implement such a feature.

tkz1312 · 2024-05-04T18:11:12

klee is a similar tool that operates on llvm bytecode: http://klee-se.org/

xvilka · 2024-05-05T03:38:09

It doesn't work on real software though. Anything non-trivial.

rjmunro · 2024-05-04T16:30:55

I think the LLVM level doesn't work as it throws away too much semantic information, but much of the CLANG front end can be used via it's API.

There is a fork of CBMC called ESBMC that does this and many other changes: https://ssvlab.github.io/esbmc/documentation.html

quotemstr · 2024-05-04T16:37:08

I'm curious about what relevant details get thrown away in translation to LLVM IR and that we can't add back with annotations. It's a shame that it's not possible to use these analysis tools on modern C++ codebases.

rocqua · 2024-05-04T17:58:41

Undefined behavior around void pointers springs to mind.

SnowflakeOnIce · 2024-05-04T16:13:31

It predates LLVM. That's one reason, anyway.

kldx · 2024-05-05T16:42:14

VCLLVM was presented at ETAPS 2024. The goal is to compile both languages and specifications to LLVM IR.