
C as an intermediate language (2012) - ColinWright
http://www.yosefk.com/blog/c-as-an-intermediate-language.html
======
tikhonj
If you want to use C as intermediate language, you may be interested in the
CIL project[1]. It stands for "C intermediate language" -- might be relevant
:P.

Fundamentally, CIL is a nice subset of C wrapped up into a nicely usable API.
The idea is to shave off as many of C's inconsistent sharp edges as possible.
You won't have to worry about quirks in the syntax or odd behavior because you
have a nice high-level, curated API for generating C code.

It's designed to have a _clean semantics_, and semantics are very important.
(Or so I maintain.)

[1]:
[http://www.cs.berkeley.edu/~necula/cil/](http://www.cs.berkeley.edu/~necula/cil/)

~~~
mkehrt
I've used CIL (if only for school projects), and I can vouch for it being a
joy to use, at least if you like OCaml. Also, their page on wacky C edge cases
("Who says C is simple?", link to frame contents:
[http://www.cs.berkeley.edu/~necula/cil/cil016.html](http://www.cs.berkeley.edu/~necula/cil/cil016.html))
is great.

------
pjmlp
For anyone wanting to read about C as intermediate language,

Compiler Design in C (1990)

[http://www.amazon.com/Compiler-Design-C-Prentice-Hall-
softwa...](http://www.amazon.com/Compiler-Design-C-Prentice-Hall-
software/dp/0131550454)

EDIT: Adding some extra remarks I think might also be interesting to share.

Another approach, that I really like, is to output bytecodes that are mapped
directly to macros in typical macro assemblers like NASM/MASM/TASM. Those
macro assemblers provide very powerful macro systems.

Then map those macros to the corresponding assembly code.

Sure, it's a bit more work, but I find it more fun.

~~~
argv_empty
From a look over the Amazon page, the book seems to be about writing a
compiler in C, not about writing a compiler targeting C. Does it actually
describe potential issues with compiling to C?

~~~
pjmlp
It is an old book about implementing a C compiler in C.

The intermediate code is similar to the article, a mix of macros and basic C
expressions as high level assembler.
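
To illustrate the general idea of "macros plus basic C expressions as a
high-level assembler", here is a minimal sketch. The macro names (PUSH, POP,
ADD, MUL) and the stack layout are made up for this example; they are not the
book's actual macro set.

```c
#include <assert.h>

typedef long word;

/* Hypothetical "instructions" of an imagined intermediate code.
   Each macro expects locals named `stack` and `sp` to be in scope. */
#define PUSH(v) (stack[sp++] = (v))
#define POP()   (stack[--sp])
#define ADD()   do { word b = POP(); word a = POP(); PUSH(a + b); } while (0)
#define MUL()   do { word b = POP(); word a = POP(); PUSH(a * b); } while (0)

/* What a compiler might emit for the source expression (2 + 3) * 4. */
word eval_example(void)
{
    word stack[16];
    int sp = 0;
    word result;

    PUSH(2);
    PUSH(3);
    ADD();
    PUSH(4);
    MUL();

    result = POP();
    assert(sp == 0);   /* the emitted code should leave the stack balanced */
    return result;
}
```

The emitted code stays valid C, so the C compiler does register allocation and
instruction selection, while the code generator only has to print one macro
call per intermediate instruction.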

I cannot remember all the details, the last time I used the book was around
1996.

You can still get the source code,
[http://www.holub.com/software/compiler.design.in.c.html](http://www.holub.com/software/compiler.design.in.c.html)

------
pcwalton
Some of these aren't as simple as they seem at first glance:

> Dynamic binding: easy enough.

That technique won't be fast enough for dynamic languages. You really need
polymorphic inline caches to make dynamic binding fast, which requires careful
cooperation between the code generator, the front end, and the IR. A C
compiler won't give you that level of control.
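
For readers unfamiliar with the idea, here is a rough sketch of what a
(monomorphic) inline cache does, written as plain C data structures. All names
here (struct call_site, send_msg, slow_lookup) are invented for the example.
A real implementation patches the cache into the generated machine code at the
call site, which is exactly the part a C compiler gives you no handle on; this
version keeps the cache in memory instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(void);

struct method { const char *name; method_fn fn; };
struct klass  { const struct method *methods; size_t nmethods; };
struct object { const struct klass *cls; };

static int lookups = 0;   /* counts slow-path lookups, for demonstration only */

/* Slow path: search the class's method table by name. */
static method_fn slow_lookup(const struct klass *c, const char *name)
{
    lookups++;
    for (size_t i = 0; i < c->nmethods; i++)
        if (strcmp(c->methods[i].name, name) == 0)
            return c->methods[i].fn;
    return NULL;
}

/* One cache per call site: remembers the last class seen and its method. */
struct call_site { const struct klass *cached_cls; method_fn cached_fn; };

static int send_msg(struct call_site *site, const struct object *o,
                    const char *name)
{
    if (site->cached_cls != o->cls) {          /* cache miss */
        site->cached_fn = slow_lookup(o->cls, name);
        site->cached_cls = o->cls;
    }
    assert(site->cached_fn != NULL);           /* sketch: no "method missing" */
    return site->cached_fn();
}

static int dog_speak(void) { return 1; }
static const struct method dog_methods[] = { { "speak", dog_speak } };
static const struct klass dog_class = { dog_methods, 1 };

/* Calls the same site 1000 times and reports how many slow lookups ran. */
int demo_lookups(void)
{
    struct call_site site = { NULL, NULL };
    struct object d = { &dog_class };

    lookups = 0;
    for (int i = 0; i < 1000; i++)
        send_msg(&site, &d, "speak");
    return lookups;
}
```

Here only the first call takes the slow path. Even so, every hit still pays
for a memory load, a compare, and an indirect call.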

> Garbage collection

It doesn't work that well. The problem is that C compilers don't provide a way
for the runtime to find all roots on the stack (tell apart integers from
pointers). You pretty much either have to be conservative on the stack or
spill all roots to the stack across function calls. Neither of them is very
good: the former costs accuracy and prevents you from using a bump allocator
in the nursery, and the latter costs performance.
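
A minimal sketch of the "spill all roots" option, often called a shadow stack,
assuming the generated code registers the address of every pointer local in a
linked list of frames. The names (struct gc_frame, gc_top,
gc_count_live_roots) are hypothetical, and the "collector" here only counts
roots instead of tracing them.

```c
#include <assert.h>
#include <stddef.h>

/* One shadow-stack frame: addresses of this function's pointer locals,
   linked to the caller's frame. */
struct gc_frame {
    struct gc_frame *prev;
    size_t nroots;
    void **roots[4];
};

static struct gc_frame *gc_top = NULL;

/* Stand-in for the collector's root scan: walks every frame precisely. */
size_t gc_count_live_roots(void)
{
    size_t n = 0;
    for (struct gc_frame *f = gc_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++)
            if (*f->roots[i] != NULL)
                n++;
    return n;
}

/* Shape of the code a compiler might emit for a function with two
   pointer locals. */
size_t generated_function(void *arg)
{
    void *a = arg;    /* pointer local, live if arg is non-NULL */
    void *b = NULL;   /* pointer local, not yet assigned */
    struct gc_frame frame = { gc_top, 2, { &a, &b } };
    size_t seen;

    gc_top = &frame;                 /* push this frame */
    seen = gc_count_live_roots();    /* stand-in for a call that may collect */
    assert(gc_top == &frame);        /* callees must pop their own frames */
    gc_top = frame.prev;             /* pop this frame */
    return seen;
}
```

Taking the address of `a` and `b` forces them into memory rather than
registers, which is where the performance cost comes from.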

There are other issues to consider as well, for example tail call optimization
and undefined behavior.

~~~
_yosefk
I agree that compiling to C will not give you a great language implementation
if you need these kinds of features; I think the implementation will be good
enough for many cases though. To take an extreme example, CPython isn't a
bleeding edge Python implementation perhaps, but it's still the most popular
and practically relevant one; this is in fact true for many popular dynamic
languages, one big exception in recent years being JavaScript.

Languages where a really great implementation might involve C code generation
are probably indeed quite static, however. An example is Synopsys's VCS which
AFAIK compiles Verilog to C++.

My own hands-on experience with this type of thing is with an in-house HDL and
an in-house C dialect with (sizable, static) extensions for accelerator
programming.

------
zrail
I worked at a company that did a very large amount of data processing on a
relatively small number of machines using a well-optimized C++ library. The
library did everything, including the calculations and the data storage. This
made it hard to write one-off queries that the executive team would request
from time to time, since we would write a custom C++ program every time.

One day we came up with an idea: what if we could query the data store with
SQL? The first iteration actually attempted to embed sqlite3 into the data
access layer, which was functional but extremely slow because of all the type
marshaling going on. A coworker and I came up with the second iteration which
worked like this: a custom SQL-like language would be parsed by a Perl program
using Parse::RecDescent, a recursive-descent parser generator. The parse tree would
then be translated into C++ that used the data access layers and processing
layers directly. The compiled program was distributed to the cluster in the
same way as the daily processes.
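
The translation step can be sketched roughly as follows. This is not the
system described above (which was Perl emitting C++); it is a small C
illustration of the same pattern, with an invented struct filter standing in
for the parse tree and an invented `struct row` in the emitted code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, already-parsed query:
   SELECT COUNT(*) WHERE <column> <op> <value> */
struct filter { const char *column; const char *op; long value; };

/* Only emit comparison operators we know are valid C. */
static int is_known_op(const char *op)
{
    static const char *const ops[] = { "<", "<=", "==", "!=", ">=", ">" };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strcmp(op, ops[i]) == 0)
            return 1;
    return 0;
}

/* Writes the source of a query function into `out`.
   Returns the number of characters written, or -1 on error. */
int emit_count_query(char *out, size_t cap, const struct filter *f)
{
    int n;

    assert(out != NULL && f != NULL);
    if (!is_known_op(f->op))
        return -1;

    n = snprintf(out, cap,
        "long run_query(const struct row *rows, long n)\n"
        "{\n"
        "    long count = 0;\n"
        "    for (long i = 0; i < n; i++)\n"
        "        if (rows[i].%s %s %ld)\n"
        "            count++;\n"
        "    return count;\n"
        "}\n",
        f->column, f->op, f->value);

    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The emitted text would then be compiled and linked against the data access
library like any hand-written query program.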

As far as I know this monstrosity still gets daily use, four years later.

------
octo_t
You get a lot of extras from targeting C, such as being able to write the
runtime in C very easily. For example, it's a lot easier to write your entire
OO system in C in a few hundred lines, and keeping that easily debuggable is a
massive time saver.

~~~
lmm
Wouldn't that be even easier if you used LLVM IR, as you'd then be able to
write your runtime in any language that LLVM supports?

~~~
limmeau
Advantage of writing it in C: You can prototype an example client of your
runtime in C before your code generator is working.

~~~
lmm
Again, why couldn't you do that with LLVM IR?

~~~
luikore
LLVM IR is designed for machines, not humans: it's too verbose and requires
SSA form. A normal person can't easily reason about the logic in such a
verbose language. Plus, people are more familiar with C.

~~~
lmm
Sure, but you can write your runtime test code in C - or any other language
with an LLVM compiler.

------
Roboprog
I worked at a company ("Morada Corp") in the early 90s that did just that for
... RPG II!

We took the code from IBM minicomputers and compiled it into C on many, many
platforms (back when there were a few more unices, as well as OS/2 and VAX/VMS
kicking around).

I only did a little maintenance on the compiler front end checker, though. I
mostly worked on some supplemental tokenizers/runtimes for data file browser
language and a DB/form DDL.

Anyway, GCC made a nice target to hit a large number of systems. (alas,
Borland C on DOS at the time tended to choke on larger generated subroutines,
being 16 bit w/out "huge" pointer support and all)

------
chalst
An intermediate language (or representation) is the internal representation of
code used for program transformation and reasoning about the code. This is not
what the story is talking about - all the benefits are about the object
representation (the compiler emits C code), and the presented compiler does
not distinguish between the intermediate representation and the object
representation; it does not need to, as it does no optimisations. There's
nothing innovative about emitting C code from a compiler - it used to be more
common than it is now.

There has been a little bit of research done on writing optimising compilers
that use source-code-like intermediate representations: the Janus project from
the 1990s springs to mind.

------
gjndrtjh
Ć programming language
([http://cito.sourceforge.net/](http://cito.sourceforge.net/)) compiles C#
subset to C.

The C code generator is very simple:
[http://sourceforge.net/p/cito/code/ci/master/tree/GenC.cs](http://sourceforge.net/p/cito/code/ci/master/tree/GenC.cs)

------
bane
Didn't Lex and Yacc pretty much use C as an intermediate language?

------
joshuaellinger
MemSQL is doing the same thing with SQL.

Harder than it looks.

