
IR is better than assembly (2013) - oherrala
https://idea.popcount.org/2013-07-24-ir-is-better-than-assembly/
======
amelius
> If you really can write better assembly than LLVM, please: Don't write any
> more assembler by hand - write IR and create new LLVM optimisers instead.
> It'll benefit everybody, not only you. Think about it - you won't need to
> write the same optimisations all over again on your next project!

Very noble goal, but I can imagine that it can take a lot more time to do that
than just writing a bunch of assembler instructions.

Perhaps there could be some intermediate approach, where LLVM can learn from an
IR/assembly pair and improve itself (?)

~~~
brigade
Sure, you can file a bug with a snippet and expected assembly and make the
devs do the work :P

They generally seem pretty willing if it's simple, and if it isn't then it'd
probably be a pretty involved (but interesting!) side project for you anyway.

------
ori_b
If I wanted to do this sort of thing, I'd probably use either intrinsics or C
directly -- the compiler is already good at dealing with both, and will
probably do a better job than LLVM IR.

The biggest reason to drop to assembly is that there are high-level
constructs that the compiler is very unlikely to recognize and optimize
effectively. Things like AES-NI, hardware RNGs, and similar.

------
jimmyswimmy
This is neat, I had no idea there was an intermediate language like this which
is cross-platform. It would seem that I could decompile binaries using LLVM
tooling and then recompile for other platforms.

[http://stackoverflow.com/questions/6981810/translation-of-machinecode-into-llvm-ir-disassembly-reassembly-of-x86-64-x86](http://stackoverflow.com/questions/6981810/translation-of-machinecode-into-llvm-ir-disassembly-reassembly-of-x86-64-x86)

Obviously not cross-OS, but it might be good for bare-metal stuff. I've gotten
libraries in the past compiled with weird ABIs. This sounds really neat.

~~~
chrisseaton
I'm afraid not, because LLVM IR does not abstract away things like
endianness, word size, header file contents, or many other things that are
platform dependent.

~~~
exDM69
LLVM IR is pretty close to portable - the issues you mention are an issue with
Assembler code as well. Even the examples in the article show the same IR
being compiled to different CPU architectures.

I'm not sure I'd go as far as hand-writing IR code, though. It's pretty easy to
just write C code with vector extensions, etc., to produce the IR I'd be after.
When I do need to write assembler code these days, it's typically to get access
to some privileged instructions in kernel space. Most other instructions are
available in C code via __builtin_foo_bar functions.

~~~
the_duke
You have to be very, very careful when writing platform agnostic IR.

~~~
exDM69
Sure, but you have to care about word size, endianness, etc. when writing C
code too.

You would not go writing entire apps with it anyway, just a few inner loops or
so.

I'd still use assembly-looking C with extensions and intrinsics for that,
though.

------
andreiw
The other thought I had here, is that AFAICT IR is not a standard. There is no
requirement that it remains compatible in 50 years or 5 months. There is no
standard IR, and shouldn't be, as that would become an impediment to compiler
evolution and fit/optimization to newer architectures.

Doesn't AS/400 use an IR approach as well, which has let IBM seamlessly migrate
the underlying CPU a few(?) times now?

~~~
gbrown_
> The other thought I had here, is that AFAICT IR is not a standard. There is
> no requirement that it remains compatible in 50 years or 5 months.

Correct. Libfirm[0] is the only compiler I am aware of that attempts to use a
"firm" IR.

[0] [http://pp.ipd.kit.edu/firm/Index](http://pp.ipd.kit.edu/firm/Index)

~~~
david-given
The ACK's IR (which it calls EM code) has been stable for 34 years.

[http://tack.sourceforge.net/olddocs/em.pdf](http://tack.sourceforge.net/olddocs/em.pdf)

~~~
gbrown_
Neato! Thanks for the link.

Sorry if the tone of "only compiler I am aware of" came off as snotty; it was
meant as an expression of my naivety on the subject.

~~~
david-given
Nah, don't worry; _nobody's_ heard of the ACK! (Mostly because it's so old
that it doesn't really believe that registers exist, which means it doesn't
get on well with modern architectures.)

------
rurban
Duplicate of
[https://news.ycombinator.com/item?id=6096743](https://news.ycombinator.com/item?id=6096743)
(220 comments)

------
greglindahl
The Open64 / PathScale compiler suite has had intrinsics written in the IR
(WHIRL) for a long time. WHIRL is stable enough that it's not a maintenance
problem. Being written in IR means that the full power of inlining, function
specialization, etc etc will be used, even if whole-program optimization isn't
being used.

~~~
nickpsecurity
As described in this paper:

[https://www.mcs.anl.gov/OpenAD/open64A.pdf](https://www.mcs.anl.gov/OpenAD/open64A.pdf)

------
lacampbell
Most of the guides I've seen for LLVM recommend you use the LLVM libs to
generate the IR. Why? I feel like it would be much easier to generate the IR
directly like the author has done.

It also wouldn't tie me to any particular library - I think the only actively
maintained one is the C++ one.

~~~
legulere
The IR is highly unstable, especially the text representation, which is
basically just a debugging tool.

The bitcode format is rather complicated, but at least there is backwards
compatibility.

LLVM is the library, so you are bound to it anyway.

Further, by having your own code for emitting IR you are basically just
duplicating functionality that already exists.

~~~
exDM69
> The IR is highly unstable, especially the text representation, which is
> basically just a debugging tool.

It's not guaranteed to be stable, but it's not "highly unstable" either. Not
too many breaking changes have been introduced in the past few years and it's
unlikely that you'd hit those parts when hand-writing it.

Not that I'd recommend hand-writing LLVM IR.

It's a shame that we don't have an actual portable IR, which would not be tied
to a toolchain version or contain target specifics. LLVM IR can't be used as a
portable IR, which is why we've seen efforts like SPIR-V (Vulkan
GPU IR) and WebAssembly (IR for browsers), both of which are very similar to
LLVM IR (and a lot of work was duplicated).

~~~
fmap
WebAssembly is very different from LLVM IR, because it is essentially just the
lowest common denominator of what existing JavaScript VMs handle in the
backend. For instance, WebAssembly allows only reducible control flow, has
very limited support for function pointers, and a particularly nasty memory
model (linear memory, which precludes any kind of non-trivial alias analysis
and optimization).

All of these things may change in the future (apart from the reducible control
flow restriction), but, at the moment, the goal of WebAssembly is just to have
a viable compilation target for C++ in the browser. Google was championing
LLVM IR for this purpose in the form of PNaCl, but that was not enough of a
compromise to work on the web platform. :(

------
the_duke
I object to the article just associating the term "IR" with LLVM IR.

IR is short for "Internal Representation". Most complex compilers have at
least one level of IR, usually multiple ones that are progressively lower
level.

The point is that an IR carries more information than machine code, and so
potentially allows more specific optimizations.

This should at least be mentioned in the article.

Rust and Swift, for example, both use LLVM, but have their own intermediate
levels of IR (MIR for Rust).

LLVM IR is already stripped of a lot of information that might be important
for certain higher-level optimizations. For example, integer types carry no
signedness at all; instead there are different operations for signed and
unsigned arithmetic (e.g. sdiv and udiv).

~~~
exDM69
In this context, IR stands for Intermediate Representation.

Many if not most compilers have one or more intermediate representations, but
most of them are not as rigorously specified as LLVM's.

------
andreiw
In this way, IR would fulfil the same role Macro-32/64 did for porting VMS to
Alpha and beyond. However, my understanding is (sorry, I was still crawling
when VAXes were on the way out) that the benefit was retaining "VAX" syntax to
avoid massive rewrites.

If you're starting from a clean slate, what's the benefit of writing IR? Why
not use C? After all, IR won't really give you complete control over generated
code, and it's still an abstract VM (albeit that obviously allows writing IR
that will only sensibly compile on a specific arch - e.g. system register
accesses and so on).

~~~
lacampbell
Why the hell are people down-voting this? It's a purely technical comment.
Seriously people, someone making an argument against a technology you like
shouldn't make you reach for the down-vote button. I've noticed I get down-
voted too if I make something critical of certain technologies, and it's
really sad. Make a response or an argument.

~~~
david-given
(Replying here, so my upvote on the parent doesn't get lost...)

Working in C means you get restricted to only doing things which C can do, and
you're out of luck if you want to do things that C _can't_ do: unaligned
accesses, tail calls, saturating arithmetic, overflow detection, exceptions,
stuff like that. IR allows you to do all this, at the expense of being
considerably more complex and painful to use.

Of course, if your compiler _doesn't_ want to do any of this, then C is a
perfectly valid choice, as it doesn't tie you to the LLVM toolchain --- see
Nim, for example. But as soon as you venture outside C's comfort zone, working
in IR starts paying off.

~~~
lacampbell
LLVM IR can detect overflow, has exceptions, tail call optimisations etc? I
thought it was lower level than C itself. I have kind of a strict hierarchy of
abstraction in my head, with LLVM IR sitting a bit below C, but maybe it's not
that simple.

~~~
munin
Yes.

[http://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics](http://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics)

[http://llvm.org/docs/LangRef.html#exception-handling-intrinsics](http://llvm.org/docs/LangRef.html#exception-handling-intrinsics)

[http://llvm.org/docs/LangRef.html#call-instruction](http://llvm.org/docs/LangRef.html#call-instruction)

------
nickpsecurity
It's interesting since I proposed using LLVM in place of inline assembly a
while back. I got this counterpoint when I asked on ESR's blog:

[http://lists.llvm.org/pipermail/llvm-dev/2011-October/043724.html](http://lists.llvm.org/pipermail/llvm-dev/2011-October/043724.html)

Any LLVM experts have thoughts on this or my original goal within the context
of LLVM's _current_ situation?

~~~
hyperpape
I'm the opposite of an LLVM expert, but I'm having a hard time seeing the
relevance of that email. It's about how LLVM IR isn't cross platform, but
inline assembly isn't either, right?

------
mahdix
This may be a little off-topic but does anyone know a good and up-to-date
tutorial for using LLVM in C language?

~~~
oherrala
LLVM's C compiler is clang. You can find it at

[http://clang.llvm.org/](http://clang.llvm.org/)

In many cases it's just a drop-in replacement for gcc.

~~~
ygra
I rather understood it that they wanted to know whether there is a C API to
use LLVM, so perhaps something like this:
[https://pauladamsmith.com/blog/2015/01/how-to-get-started-with-llvm-c-api.html](https://pauladamsmith.com/blog/2015/01/how-to-get-started-with-llvm-c-api.html)

~~~
mahdix
That's exactly what I am looking for but it's almost two years old and I think
I tried it a few months back and got lots of errors. I was hoping to find a
more up-to-date tutorial. Unfortunately LLVM people don't pay much attention
to documentation and tutorials for their product.

~~~
the_duke
I've been there.

Pretty much the only way is to just read the C api header and figure things
out from there.

It does contain comments, which are mostly helpful, but sometimes incomplete
or out of date.

------
sebastianconcpt
I don't have projects that this fits, but it sounds like a no-brainer.
Abstracting the specifics and keeping the timeless is a beautiful move!

------
imtringued
LLVM IR is not really suitable because of compatibility. Someone should create
a portable assembler on top of LLVM instead.

------
zump
What's the difference between LLVM IR and GCC's various IRs?

------
faragon
TL;DR: vendor lock-in.

~~~
oherrala
Vendor lock-in comparable to writing in Rust or Go.

~~~
faragon
Sure. My point was that a standard IR, instead of a vendor-specific one, would
be interesting :-)

------
qwertyuiop924
Yes. Let's all write IR instead of assembly. Let's encourage others to do the
same, making Assembly even more of an ivory tower (or, more accurately, a
grimy sub-basement) than it already is, further discouraging newcomers from
learning it, thus keeping them from properly understanding the machines they
program on. Eventually, nobody will even understand any of these machines.

Use whatever you want in production, I don't care. But don't discourage people
from learning assembly. It's a worthwhile task.

------
SFJulie
Has anyone told the LLVM team that the Tower of Babel is a myth and that it
ends badly?

Some CPUs have specific idioms that are not only hard to translate but need
to be used fluently. Like natural language.

Btw, I never use any software named after a myth that was a pure failure,
such as Babel or the Death Star. It makes me feel like people intend to
fail.

~~~
andreiw
Not really. IR is exactly a good fit for compiled languages, because there are
no "hard to translate" idioms in languages targeted by LLVM that need really
orthogonal translations on different architectures, which by the way aren't
really all that different at a high level.

However I am still looking for a use case to write IR directly, or in place of
bits of inline assembly.

