Wouldn't this project be more usefully done at the LLVM IR level so we could plug in any existing LLVM frontend and gain full support for language features like C++ templates?
Another problem with LLVM I’ve heard about is that it’s intermediate language or API or something is a moving, informally-specified target. People who know LLVM internals might weigh in on that claim. If true, it’s actually easier to target C or a subset of Rust just because it’s static and well-understood.
Two projects sought to mitigate these issues by going in different directions. One was a compiler backend that aimed to be easy to learn with well-specified IL. The other aimed to formalize LLVM’s IL.
There have also been typed, assembly languages to support verification from groups like FLINT. One can also compile language-specific analysis with a certified to LLVM IL compiler. Integrating pieces from different languages can have risks. That (IIRC) is being mitigated by people doing secure, abstract compilation.
Yes, the textual IR format is not an API. It's an internal implementation detail which changes somewhat rapidly. It's well specified but there's no guarantees that if you write some IR which works in LLVM 16 that it'll work as expected in LLVM 17, etc. Inside of LLVM itself IR and its related manipulation and analysis functions are "API" in so far as there's an expectation that if you change how the IR works that you won't break existing in-tree passes/that you'll work with maintainers to fix them.
I'm curious about what relevant details get thrown away in translation to LLVM IR and that we can't add back with annotations. It's a shame that it's not possible to use these analysis tools on modern C++ codebases.