
Show HN: A pure Go library for interacting with LLVM IR - mewmew
https://github.com/llir/llvm
======
krasin
LLVM IR is changing at a high rate. While there are some weak
backward-compatibility promises (older, but not too old, bitcode can be read
by newer official IR readers), new features are introduced overnight. I expect
keeping this project up to date with LLVM trunk to be a high-maintenance job.

~~~
mewmew
@krasin The LLVM IR assembly language does change at a rather high rate. Not
so often that keeping up is infeasible, but an external project definitely
needs to make sure it can keep up with maintenance to remain useful. Since the
llir/llvm project started back in 2014, the LLVM IR assembly language has had
one major change, and that is the new metadata syntax introduced in LLVM 3.6.

To facilitate maintenance efforts over time, a BNF grammar for LLVM IR
assembly has been written, from which the lexer and parser are generated
[1,2,3]. The intention is for this BNF grammar to eventually become the basis
or starting ground for an official BNF grammar of the LLVM IR assembly
language (but that's a different project altogether, and a huge effort in
itself).

[1]: https://github.com/llir/llvm/blob/master/asm/internal/ll.bnf

[2]: https://github.com/llir/grammar

[3]: https://github.com/goccmack/gocc

~~~
krasin
@mewmew, have you already added support for .bc files which have multiple
bitcode modules (aka merged modules)? These are used in the work-in-progress
ThinLTO.

Merged modules were a major change to LLVM. And this part is still evolving.
Like, the last breaking change was 16 days ago:
https://github.com/llvm-mirror/llvm/commit/e74c64e05ab257c37a743e1f5bcb27445aa8fc7e

~~~
mewmew
@krasin Thanks for the input. From my understanding, ThinLTO is intended to
bring the compilation speed of LTO builds closer to that of non-LTO builds.
Prior to the change you are referring to with merged modules, it seems this
was achieved by optimizing multiple .bc modules in parallel during link time
(using summary information from thin-link) [1].

Is there any high-level information on the design behind merged modules? Are
they simply a concatenation of .bc files with a table of file offsets for each
module?

I am new to ThinLTO and the work related to merged modules, so any information
providing insight would be appreciated.

As for the llir/llvm project, it includes a .ll parser, but relies on the LLVM
toolchain for converting .bc files into .ll; i.e.

    llvm-dis -o foo.ll foo.bc

This decision has been taken so that we can focus time on maintaining good
support for one of the isomorphic LLVM IR forms.

Any application which requires good performance should definitely make use of
the official LLVM C++ library for interacting with LLVM IR.

The llir/llvm project is intended for those who wish to write tools in Go
which consume, produce, process or manipulate LLVM IR.

Future releases of llir/llvm will try to get closer in performance to the
official LLVM C++ library, but at this point of the project the aim is to iron
out a good API for interacting with LLVM IR, and to have fun coding :)

For those interested, the llir/llvm project was born to support the
requirements of a decompiler project [2] which decompiles LLVM IR to Go source
code. The llir/llvm project has since become a general purpose library, and is
now looking for anyone curious to try it out at this early stage to provide
feedback on its API and design.

[1]: http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

[2]: https://github.com/decomp/decomp

~~~
krasin
@mewmew generally, .bc files with multiple bitcode modules have existed for a
while (almost a year now). The change I referred to was just an incremental
(breaking) improvement of that scheme.

I don't remember a good doc on how it's implemented. The last time I tried to
understand it was early February, and things had been changing at crazy speed.
Your best bet is to ask on the llvm-dev mailing list or the #llvm IRC channel.
Either tejohnson@ or pcc@ will know for sure.

~~~
mewmew
@krasin Thanks! I know I'll continue to play around with LLVM on many levels
in life. Just recently an LLVM Socials meetup has started in Sweden, and it
feels great to get a chance to meet other people excited about exploring these
topics :)

------
jpsim
Maybe this would lend itself well to parsing bitcode slices (serialized LLVM
IR). I've often wanted to diff bitcode produced by different compiler
versions.

~~~
boulos
What's wrong with llvm-diff [1]? Is it that you want to compare across bitcode
versions that aren't backward compatible (so the current llvm-diff won't
parse very old bitcode)?

[1] http://llvm.org/docs/CommandGuide/llvm-diff.html

~~~
jpsim
llvm-diff is indeed pretty great, but having the IR structure as an AST that
you could programmatically access and manipulate would allow more semantic
diffing, or integration into workflows that would benefit from structured
representations of a diff.

