
C-- - msvan
http://www.cminusminus.org/
======
carterschonwald
Hey everyone, I'd like to point out that the C-- site is no longer at
cminusminus.org. The historical site can be found on Norman Ramsey's homepage
here: [http://www.cs.tufts.edu/~nr/c--/](http://www.cs.tufts.edu/~nr/c--/). It
also has actual reading material / papers!

The cminusminus domain is no longer valid (though it has more modern CSS),
and it lacks links to all the informative papers!

C-- is broadly similar to LLVM IR. There are crucial differences, but overall
you could think of them as equivalent representations you can map between
trivially (albeit that's glossing over some crucial details).

In fact, a few people have been mulling the idea of writing an LLVM IR frontend
that would basically be a C-- variant. LLVM IR has a human-readable format,
but it's not quite a programmer-writable format!

C-- is also the final representation in the GHC compiler before code gen (i.e.
the "native" backend, the LLVM backend, and the unregisterized GCC C backend).

There are probably a few other things I could say, but that covers the basics.
I'm also involved in GHC dev and have actually done a teeny bit of work on the
C-- related bits of the compiler.

Relatedly: I have a few toy C-- snippets you can compile and benchmark using
GHC, in a talk I gave a few months ago:
[https://bitbucket.org/carter/who-ya-gonna-call-talk-may-2013...](https://bitbucket.org/carter/who-ya-gonna-call-talk-may-2013-ny-haskell)
[https://vimeo.com/69025829](https://vimeo.com/69025829)

I should also add that C-- in GHC <= 7.6 doesn't have function arguments, but
in GHC HEAD / 7.7 and soon 7.8, you can have nice function args in the C--
functions. See
[https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm](https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm)
for GHC HEAD examples, vs
[https://github.com/ghc/ghc/blob/ghc-7.6/rts/PrimOps.cmm](https://github.com/ghc/ghc/blob/ghc-7.6/rts/PrimOps.cmm)
for the old style.

~~~
X4
Code examples:
[http://www.cs.tufts.edu/~nr/c--/download/c--exn.pdf](http://www.cs.tufts.edu/~nr/c--/download/c--exn.pdf)

Slides:
[http://www.cs.tufts.edu/~nr/c--/download/c--exnslides.ps.gz](http://www.cs.tufts.edu/~nr/c--/download/c--exnslides.ps.gz)

Audio:
[http://wino.eecs.harvard.edu:8080/ramgen/nr-pldi00.rm](http://wino.eecs.harvard.edu:8080/ramgen/nr-pldi00.rm)

The Manual:
[http://www.cs.tufts.edu/~nr/c--/extern/man2.pdf](http://www.cs.tufts.edu/~nr/c--/extern/man2.pdf)

The manual contains the specifications and a few code examples. It looks like
it's easy to learn, but it's a little different than other languages.

------
m_mueller
Could someone enlighten me as to what the advantage of this is over LLVM IR?

Edit: OK, I've found the following SO thread:
[http://stackoverflow.com/questions/3891513/how-does-c-compar...](http://stackoverflow.com/questions/3891513/how-does-c-compare-to-llvm)

~~~
rayiner
C-- is intended for higher-level languages than LLVM. E.g. the latter still
doesn't have modern infrastructure for interfacing with garbage collectors.
Also, C-- is a lot easier to read in textual form than LLVM IR. Finally, the
use of C++ and templates makes the code size of LLVM absolutely enormous (20MB
binary).

That said, so much work goes into LLVM supporting new platforms,
optimizations, etc, that it's probably easier to hack around LLVM's
limitations than use C--, etc.

~~~
jlebar
> Finally, [C-- is better than LLVM because] its use of C++ and templates
> makes the code size of LLVM absolutely enormous (20MB binary).

In a world where my phone has 2GB of RAM, I don't understand why 20MB for an
optimizing compiler is in any way onerous or unreasonable.

~~~
haberman
Small is beautiful. How much cache does your phone have? How long does it take
to compile a 20MB binary? How long does it take it to load all those symbols
into a debugger? How long does it take to process all those relocations if you
are linking dynamically? How big will your overall binary be if every
component is 20MB?

------
peapicker
When I first entered college in 1988 there was a small DOS compiler floating
around called C-- back then, which I got from some BBS (yes, a BBS, how
antiquated!), probably in 1989. It was a mix of a subset of C and
proto-assembly. I have looked for it a few times over the years, and this one isn't
it, although it has some similar ideas. It makes me wonder how many other
little-known C-- projects there are.

~~~
dragonbonheur
It was Sphinx C--
[http://www.cs.utexas.edu/users/tbone/c--/](http://www.cs.utexas.edu/users/tbone/c--/)

~~~
peapicker
Thank you, thank you, thank you!

~~~
henderson101
Wow - I used this one too (vividly remember the "fire" effect demo) and have
looked for it a few times to no avail. Nice find :-)

------
sambeau
According to this, C-- is still a large part of the Glasgow Haskell Compiler.
It looks like (Fig 5.2) code goes into C-- before being translated to LLVM.

[http://www.aosabook.org/en/ghc.html](http://www.aosabook.org/en/ghc.html)

~~~
gngeal
Uh, I believe that the GHC C-- (Cmm) is one of the dozen different C--es that
have nothing in common with this project save for the name.

~~~
ibdknox
Given that Simon Peyton Jones is listed as a co-author on several of the
papers, I find that a bit too coincidental to be true.

~~~
gngeal
This thing is written in ML, not in Haskell, and the GHC docs themselves claim
that Cmm (the GHC thingy) _"is rather like C--. The syntax is almost C--
(a few constructs are missing), and it is augmented with some macros that are
expanded by GHC's code generator"_. So "GHC's Cmm/C--" sounds rather like an
independent reimplementation of a superset of a subset of "this C--".

~~~
asdasf
It literally can't be an "independent reimplementation" if one person played a
major role in both of them.

~~~
vanderZwan
Domain specific dialect?

------
gdonelli
"As of May 2004, only the Pentium back end is as expressive as a C compiler.
The Alpha and Mips back ends can run hello, world. We are working on back ends
for PowerPC (Mac OS X), ARM, and IA-64. Let us know what other platforms you
are interested in."

------
jliptzin
I own the cminusminus.com domain. Was planning on using it for a blog, mainly
to post horrible c/c++ code snippets I come across. If anyone wants it, let me
know.

~~~
carterschonwald
Hello! Very very cool.

I'm involved in GHC dev (and thus incidentally C-- dev as it exists in GHC).
And I may be spending a lot of time helping improve GHC's code gen over the
coming year (GHC being essentially the most widely used C-- compiler on the
planet).

your remark intrigues me!

~~~
jliptzin
Email me joshliptzin at gmail dot com

------
secoif
According to this infographic, a C-- was the influence for JavaScript.

[http://www.georgehernandez.com/h/xComputers/Programming/Medi...](http://www.georgehernandez.com/h/xComputers/Programming/Media/tongues-cleaner.png)

If this has any truth (perhaps a different c--?) I'd like to know which one is
being referred to.

~~~
masklinn
> perhaps a different c--?

Yes, the graph talks about
[http://sourceforge.net/projects/cmmscript/](http://sourceforge.net/projects/cmmscript/)
not the C-- used in/extracted from GHC.

There's at least one other C-- used in the OCaml compiler.

------
cylinder714
Another take on portable assembly languages is Dan Bernstein's qhasm:
[http://cr.yp.to/qhasm.html](http://cr.yp.to/qhasm.html)

An overview here:
[http://cr.yp.to/qhasm/20050129-portable.txt](http://cr.yp.to/qhasm/20050129-portable.txt)

------
Radim
The originality (and practicality!) of choosing the name "C - -" leaves me
speechless.

~~~
blueblob
I felt the same way about "C++": it's a postincrement, so it returns C. This
seems more like it should be a predecrement (--C), because it does not return
C but rather assembly.

EDIT: changed, I said decrement and meant increment.

~~~
_delirium
Predecrement is fairly unidiomatic in C, much less commonly used than in C++.
Typically it's only really used if you're doing something tricky that actually
depends on the semantics of predecrementing, usually in an array reference of
the x[--i] variety. It's idiomatic in C++ I'd guess because of operator
overloading: postdecrement/postincrement may produce large unused object
copies that not all compilers will optimize away, so using the pre- versions
by default has become idiomatic in that community.
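
To make the "semantics of predecrementing" point concrete, here is a minimal sketch in C of the x[--i] pattern next to its postfix counterpart. The function names `pop_pre` and `read_post` are made up for illustration and do not come from any of the sources discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pop using predecrement. The index is decremented
   first and the NEW value is used, so *top always points one past the
   last stored element. */
int pop_pre(const int *stack, size_t *top)
{
    assert(*top > 0);            /* popping an empty stack is a caller bug */
    return stack[--*top];
}

/* Hypothetical helper: the same expression with postdecrement. The OLD
   index value is used for the access, then the index is decremented. */
int read_post(const int *xs, size_t *i)
{
    assert(*i > 0);              /* avoid wrapping the unsigned index */
    return xs[(*i)--];
}
```

Starting from the same index, the two forms read different elements but leave the index at the same final value, which is the case where the choice of prefix or postfix actually matters.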

~~~
robert_tweed
It's actually pretty idiomatic in optimised C, for the same sort of reason
(when all extra instructions count), e.g., this sort of loop:

for( i=LEN-1; i>=0; --i ) {...}

I've carried this into a lot of other languages, as it's equally readable to
the alternatives, when ordering isn't important. In (mainly older) optimised
code where ordering is important, you'll often still see this, with a second
incrementing variable so that the exit condition retains the compare to zero
(avoiding a variable comparison); although that part is separate from the use
of the pre-decrement.
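
A minimal sketch of the two loop shapes described above, assuming plain `int` arrays. The function names `sum_down` and `copy_forward` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Ordering does not matter for a sum, so the loop simply counts down
   and the exit test is a compare against zero. */
long sum_down(const int *xs, int len)
{
    long total = 0;
    int i;
    assert(len >= 0);
    for (i = len - 1; i >= 0; --i)
        total += xs[i];
    return total;
}

/* Ordering matters here (elements are copied front to back), so a second
   variable j counts up while i still counts down to zero and drives the
   exit condition. */
void copy_forward(int *dst, const int *src, int len)
{
    int i, j;
    assert(len >= 0);
    for (i = len - 1, j = 0; i >= 0; --i, ++j)
        dst[j] = src[j];
}
```

Whether either shape is faster than the plain counting-up loop depends on the compiler and target, so treat this as an illustration of the idiom rather than a performance recommendation.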

I'm not sure if either of these makes much difference with modern CPUs and
compilers. Certainly not worth worrying about for the most part.

It's untrue that this idiom started with C++. This style is preferred by K&R,
which predates optimising compilers. Since then its use among C programmers
has probably been force of habit, but it is definitely idiomatic. Its use in
K&R makes it about as idiomatic as it is possible to be.

~~~
_delirium
That isn't the style used in K&R, though. When K&R introduces a _for_ loop for
the first time, in section 3.5, it does it like so, and sticks to this style
throughout:

    
    
        for (i = 0; i < n; i++)
            ...
    

This is the kind of code I run across most commonly (almost exclusively) in
pre-1990s C. It's also the style used in the old Unix sources. For example,
take a look at the source code to 'nohup' or 'mount' from 5th Edition Unix,
1974:
[http://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s...](http://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s2/nohup.c)
[http://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s...](http://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s2/mount.c)

They do also use predecrement/preincrement, but only in assignments or
comparisons, where it actually semantically matters that the
increment/decrement is "pre":

    
    
        while(*--np == '/')
                *np = '\0';
    

In contexts where it doesn't matter, like the 3rd clause of a _for_ loop, or
just incrementing a variable as a standalone operation, they always default to
_x++_.
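
For anyone who wants to try the `*--np` idiom quoted above, here is a self-contained sketch that strips trailing slashes from a string. The function name `strip_trailing_slashes` and the `np > path` bounds guard are my additions for safety; the fragment quoted above has no such guard.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove trailing '/' characters from path in place.
   np starts at the terminating '\0'; each iteration steps back one
   character BEFORE testing it, which is why the decrement must be "pre". */
void strip_trailing_slashes(char *path)
{
    char *np;
    assert(path != NULL);
    np = path + strlen(path);
    while (np > path && *--np == '/')   /* guard added; not in the quoted code */
        *np = '\0';
}
```

Note that with this sketch a path consisting only of slashes becomes the empty string, so a real tool would likely special-case the root directory.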

~~~
gsg
I just flicked through my copy of K&R (second edition), and at the beginning
it clearly prefers preincrement. The introduction of for loops comes before
that of ++, and the first for loop in the book is actually

    
    
        for (fahr = 0; fahr <= 300; fahr = fahr + 20)
            ...
    

Each for loop after the introduction of ++ uses preincrement (where
applicable) for a while, and then the style shifts to postincrement.

EDIT: specifically, increments are deliberately prefix ("For the moment we
will stick with the prefix form") until the full introduction of increment and
decrement operators in section 2.8, after which they are postfix.

------
protomyth
I must be missing something since I see the line "The specification is
available as DVI, PostScript, or PDF.", but cannot find any download link.

~~~
burntsushi
I think you'll have better luck with this page:
[http://www.cs.tufts.edu/~nr/c--/](http://www.cs.tufts.edu/~nr/c--/)

------
vezzy-fnord
This is pretty old. I don't know if anyone besides the GHC team uses it?

~~~
elehack
OCaml also internally uses an intermediate code representation called C-- (or
Cmm). I do not know if the two have any relationship.

~~~
_delirium
It looks like OCaml's C-- predates this C--, but has had an influence on it.

From Xavier Leroy, one of the lead OCaml developers [1]:

    
    
        I think I'm the one who coined the name "C--" to refer to a low-level,
        weakly-typed intermediate code with operations corresponding roughly
        to machine instructions, and minimal support for exact garbage
        collection and exceptions.  See my POPL 1992 paper describing the
        experimental Gallium compiler.  Such an intermediate code is still in
        use in the ocamlopt compiler.
    
        I had many interesting discussions with Simon PJ and Norman Ramsey
        when they started to design their intermediate language.  Simon liked
        the name "C--" and kindly asked permission to re-use the name.
    
        However, C-- is more general than the intermediate code used by
        ocamlopt, since it is designed to accommodate the needs of many source
        languages, and present a clean, abstract interface to the GC and
        run-time system.  The ocamlopt intermediate code is somewhat
        specialized for the needs of Caml and for the particular GC we use.
    

[1]
[http://article.gmane.org/gmane.comp.lang.caml.inria/9436/](http://article.gmane.org/gmane.comp.lang.caml.inria/9436/)

------
ohwp
Wow, I was surprised by the content of the website. I think this is how a
website should be.

First they are talking about the problem and then they present the solution.

I also like the words that are marked bold.

This is how interaction design should be done (imho).

------
sanxiyn
For another take on this area, I recommend reading about Pillar from Intel.
[http://dl.acm.org/citation.cfm?id=1433063](http://dl.acm.org/citation.cfm?id=1433063)

------
mrcactu5
This seems really awesome, but I have no idea what it does.

I know that Python compiles to C and that Clojure compiles to JVM (or even to
JavaScript).

My cartoon:

    
    
      scripting lang --> programming lang --> native code
    

Honestly, I have never experimented with Assembly language much except for
COOL
([http://en.wikipedia.org/wiki/Cool_(programming_language)](http://en.wikipedia.org/wiki/Cool_\(programming_language\)))
and TOY
([http://introcs.cs.princeton.edu/java/52toy/](http://introcs.cs.princeton.edu/java/52toy/)).

~~~
elehack
Python does not compile to C (well, there might be such a compiler, but it is
not the standard mode of operation).

In CPython, Python compiles to bytecode, which is then interpreted by the
Python interpreter (which itself is written in C).

~~~
gsnedders
RPython, the Python subset that PyPy is written in, is compiled to C. (That
said, it is a fairly restrictive subset, and the compiler is very much
designed for interpreters and nothing else.)

------
jdc0589
Cool! The first parser/compiler I wrote was for C-- (the version that is a
small subset of C, not this one). I had not even heard a mention of the
different C-- languages for a few years now.

------
paulhodge
Would C-- be a good choice for JIT machine code generation, or is it mostly
for static compilation?

------
EGreg
How does C- compare to CIL of .NET?

~~~
Locke1689
As far as I can tell, some similar features, but overall completely different.

For one, I think C-- still looks a lot like C, i.e. still an imperative
language. CIL is a stack language (also with a built-in object system).
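
To illustrate the general difference between a stack language and an imperative one, here is a toy sketch in C. The opcodes `OP_PUSH`, `OP_ADD`, and `OP_MUL` and both function names are invented for this example; they are not CIL instructions and not C-- syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL } OpCode;
typedef struct { OpCode op; int arg; } Instr;

/* Stack style: operands are implicit. Each arithmetic instruction pops
   two values from the stack and pushes the result. */
int run_stack(const Instr *code, size_t n)
{
    int stack[16];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc;
    for (pc = 0; pc < n; ++pc) {
        switch (code[pc].op) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = code[pc].arg;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        }
    }
    assert(sp == 1);   /* a well-formed program leaves exactly one result */
    return stack[0];
}

/* Imperative style: the same computation with a named temporary. */
int run_imperative(int a, int b, int c)
{
    int t = a + b;
    return t * c;
}
```

Running the program push 2, push 3, add, push 4, mul through `run_stack` computes the same value as `run_imperative(2, 3, 4)`; only the way operands are named differs.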

------
ErsatzVerkehr
Where can I find a code example?

------
EGreg
If the language is called C-, how come the website is named Cminusminus?

------
davebees
Is the 'minus minus' being converted into a dash throughout?

~~~
tel
I saw it in a few places, but often it was typeset C - -

------
Dewie
Say you're writing a compiler for a language in Haskell, and you want to
generate machine code rather than having it be interpreted. Is C-- a natural
choice on this platform? Or might LLVM be a better choice?

~~~
carterschonwald
Use llvm-general:
[http://hackage.haskell.org/package/llvm-general](http://hackage.haskell.org/package/llvm-general)

Idris uses llvm-general to have a simple LLVM backend. Also, llvm-general is
probably the nicest and most thorough LLVM binding you'll find.

