Hey everyone,
I'd like to point out that the C-- domain is no longer cminusminus.org; the historical site can be found on Norman Ramsey's homepage here: http://www.cs.tufts.edu/~nr/c--/ ! It also has actual reading material / papers!
The cminusminus domain is no longer the real one (though it does have more modern CSS), and it lacks links to all the informative papers!
C-- is very similar overall to LLVM IR. There are crucial differences, but broadly you could think of them as equivalent representations that you can map between fairly directly (albeit that's glossing over some important details).
In fact, a few people have been mulling the idea of writing an LLVM IR frontend that would basically be a C-- variant. LLVM IR has a human-readable format, but it's not quite a programmer-writable format!
C-- is also the final representation in the GHC compiler before code generation (i.e., it feeds the "native" backend, the LLVM backend, and the unregisterised GCC C backend).
There's probably a few other things I could say, but that covers the basics. I'm also involved in GHC dev and have actually done a teeny bit of work on the C-- related bits of the compiler.
C-- is intended for higher-level languages than LLVM is. E.g., the latter still doesn't have a modern infrastructure for interfacing with garbage collectors. Also, C-- is a lot easier to read in textual form than LLVM IR. Finally, the use of C++ and templates makes the code size of LLVM absolutely enormous (a 20MB binary).
That said, so much work goes into LLVM supporting new platforms, optimizations, etc, that it's probably easier to hack around LLVM's limitations than use C--, etc.
You probably don't need LLVM to work with LLVM IR. In fact, I imagine that there could be a market for a "low-level" IR with a more compact implementation, sort of similar to Oberon "slim binaries" or something like that. The fact remains, though, that the "pragmatics" stuff around LLVM IR (GC and other interfaces) still seems a little bit problematic.
I'm a Luddite, though; I like self-hosting native compilers, and I'm not terribly fond of the idea of having to pack a 20MB blob of C++ code with my code either. (There are a lot of apps that could profit from dynamic translation at run-time, but using LLVM for that feels unnecessarily heavyweight. I wish the algorithms and passes from LLVM were available in the form of some high-level DSL that you'd be able to translate into whatever language you use and automatically adapt to whatever data structures you use in your code. It doesn't sound exactly impossible.)
Small is beautiful. How much cache does your phone have? How long does it take to compile a 20MB binary? How long does it take it to load all those symbols into a debugger? How long does it take to process all those relocations if you are linking dynamically? How big will your overall binary be if every component is 20MB?
When I first entered college in 1988 there was a small DOS compiler floating around called C-- back then, which I got from some BBS (yes, a BBS, how antiquated!), probably in 1989. It was a mix of a subset of C and proto-assembly. I have looked for it a few times over the years, and this one isn't it, although it has some similar ideas. It makes me wonder how many other little-known C-- projects there are.
YES! That's the one I was thinking about too. I remember making some simple games in this as a kid and I learned a lot of assembly using this at the time also because, if I recall, you actually referenced registers and such right in with your other code - or something strange like that.
According to this, C-- is still a large part of the Glasgow Haskell Compiler. It looks like (Fig 5.2) code goes into C-- before being translated to LLVM.
This thing is written in ML, not in Haskell, and the GHC docs themselves claim that Cmm (the GHC thingy) "is rather like C--. The syntax is almost C-- (a few constructs are missing), and it is augmented with some macros that are expanded by GHC's code generator". So "GHC's Cmm/C--" sounds rather like an independent reimplementation of a superset of a subset of "this C--".
"As of May 2004, only the Pentium back end is as expressive as a C compiler. The Alpha and Mips back ends can run hello, world. We are working on back ends for PowerPC (Mac OS X), ARM, and IA-64. Let us know what other platforms you are interested in."
I own the cminusminus.com domain. Was planning on using it for a blog, mainly to post horrible c/c++ code snippets I come across. If anyone wants it, let me know.
In case anyone [else] is wondering, RFC 1035 disallows domain names starting or ending with "-", so there is no registering C--.com, alas:
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A
through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
I'm involved in GHC dev (and thus, incidentally, C-- dev as it exists in GHC), and I may be spending a lot of time helping improve GHC's code gen over the coming year (GHC being essentially the most widely used C-- compiler on the planet).
I felt the same way about "C++": postincrement returns C. This seems more like it should be a predecrement (--C), because it does not return C, but rather assembly.
EDIT: changed, I said decrement and meant increment.
Predecrement is fairly unidiomatic in C, much less commonly used than in C++. Typically it's only really used if you're doing something tricky that actually depends on the semantics of predecrementing, usually in an array reference of the x[--i] variety. It's idiomatic in C++ I'd guess because of operator overloading: postdecrement/postincrement may produce large unused object copies that not all compilers will optimize away, so using the pre- versions by default has become idiomatic in that community.
It's actually pretty idiomatic in optimised C, for the same sort of reason (when all extra instructions count), e.g., this sort of loop:
for (i = LEN - 1; i >= 0; --i) { ... }
I've carried this into a lot of other languages, as it's equally readable to the alternatives, when ordering isn't important. In (mainly older) optimised code where ordering is important, you'll often still see this, with a second incrementing variable so that the exit condition retains the compare to zero (avoiding a variable comparison); although that part is separate from the use of the pre-decrement.
I'm not sure if either of these make much difference with modern CPUs and compilers. Certainly not worth worrying about for the most part.
It's untrue that this idiom started with C++. This style is preferred in K&R, which predates optimising compilers. Since then its use among C programmers has probably been force of habit, but it is definitely idiomatic. Its use in K&R makes it about as idiomatic as it is possible to be.
That isn't the style used in K&R, though. When K&R introduces a for loop for the first time, in section 3.5, it uses postincrement in the third clause, and sticks to this style throughout.
They do also use predecrement/preincrement, but only in assignments or comparisons, where it actually semantically matters that the increment/decrement is "pre":
while (*--np == '/')
    *np = '\0';
In contexts where it doesn't matter, like the 3rd clause of a for loop, or just incrementing a variable as a standalone operation, they always default to x++.
I just flicked through my copy of K&R (second edition), and at the beginning it clearly prefers preincrement. The introduction of for loops comes before that of ++, and the first for loop in the book is actually
for (fahr = 0; fahr <= 300; fahr = fahr + 20)
...
Each for loop after the introduction of ++ uses preincrement (where applicable) for a while, and then the style shifts to postincrement.
EDIT: specifically, increments are deliberately prefix ("For the moment we will stick with the prefix form") until the full introduction of increment and decrement operators in section 2.8, after which they are postfix.
The EDG C/C++ frontend (used by a lot of commercial C++ compilers) still supports this, and Comeau's compiler still relies on this for code generation. EDG also implements a different template instantiation model than other C++ compilers, partially so that it can function with a C++-agnostic linker.
I think it's part of the quaint appeal of the article that C-- only supports i386, but the way the world moved did not require C--: the FSF-supported GCC supports something like 50 architectures, plus many more that aren't officially FSF-supported. And GCC is hardly the only C compiler out there. The world just didn't move in a direction that directly required C--.
Now, as for current ideas and projects similar in concept to C--, that is interesting to think about.
Maybe the startup lesson is that sometimes, if you try to interpose yourself as a middleman, even if you do a good job of it and appear to be a good idea, it just doesn't work. It would be interesting to dissect the C-- experience and figure out why.
GCC did already exist when the C-- project was started, so that part didn't really change. The C-- team's hypothesis was that some kinds of languages (especially functional languages) would benefit from a cross-platform layer that is a better compilation target than C, partly through being a bit more low-level. The two previously popular routes were native-code compilers and compile-via-C compilers. Native-code compilers have the disadvantage that porting and maintaining them on N architectures is significant work, and compile-via-C compilers have the disadvantage that C isn't a very convenient compilation target for many language features, especially for efficiently compiling some common FP language features. The hope was that C-- would be a nice middle ground: portable like C, but nicer than C as a target.
Imo, that hypothesis did have some legs, but people are now instead usually using LLVM in various ways to accomplish it. LLVM's intermediate representation isn't really properly cross-platform, but it can be sort of hacked to be used for that purpose.
I'm not really a language implementor, more of a spectator of language implementations, so someone with more experience would give a better answer.
My impression is that the GHC people, at least, still think that C-- is a nicer IR than LLVM-IR for their purposes. But they are slowly moving more things to LLVM anyway, because the LLVM project as a project has, in the meantime, built up a lot more momentum and infrastructure. In the early 2000s this wasn't obvious, but in 2013 it's clear that LLVM has an ecosystem, institutional support, resources to maintain ports, etc., while C-- didn't manage to get the same traction.
From my personal experience, LLVM is tied a bit too closely to the C ABI, which makes it difficult to implement some FP features cleanly. One example is forgoing C's stack discipline to implement double-barreled continuations.
It looks like OCaml's C-- predates this C--, but has had an influence on it.
From Xavier Leroy, one of the lead OCaml developers [1]:
I think I'm the one who coined the name "C--" to refer to a low-level,
weakly-typed intermediate code with operations corresponding roughly
to machine instructions, and minimal support for exact garbage
collection and exceptions. See my POPL 1992 paper describing the
experimental Gallium compiler. Such an intermediate code is still in
use in the ocamlopt compiler.
I had many interesting discussions with Simon PJ and Norman Ramsey
when they started to design their intermediate language. Simon liked
the name "C--" and kindly asked permission to re-use the name.
However, C-- is more general than the intermediate code used by
ocamlopt, since it is designed to accommodate the needs of many source
languages, and present a clean, abstract interface to the GC and
run-time system. The ocamlopt intermediate code is somewhat
specialized for the needs of Caml and for the particular GC we use.
RPython, the Python subset that PyPy is written in, is compiled to C. (That said, it is a fairly restrictive subset, and the compiler is very much designed for interpreters and nothing else.)
When compiling a language you have to compile it "to" somewhere. You could pick assembly, but assembly is so low level that it changes bit by bit depending on the processor. You want a stable target that's "low level" enough to let you optimize things for the machine you'll be working with, but not so low level as to be a moving target.
You could pick C, but C is actually still fairly complex and has lots of undefined behavior. You could pick JVM, but then you'll pick up the entirety of the JVM architecture which is quite large and may include many things you're not interested in.
C-- is another choice. It's decidedly lower-level than C (and thus far lower than the JVM), higher-level than assembly, and was crafted, as far as I know, under the deep influence of the problem of compiling the pure functional language Haskell (or, more specifically, its System FC-style and STG intermediate languages).
In the meantime, LLVM took a similar place in this hierarchy and has probably taken off much more than C-- has. GHC, a Haskell compiler, is in fact moving its compilation pathway that way.
Cool! The first parser/compiler I wrote was for C-- (the version that is a small subset of C, not this one). I hadn't even heard a mention of the different C-- languages for a few years now.
Say you're writing a compiler for a language in Haskell, and you want to generate machine code rather than having it be interpreted. Is C-- a natural choice on this platform? Or might LLVM be a better choice?
Relatedly: I have a few toy C-- snippets you can compile and benchmark using GHC, in a talk I gave a few months ago: https://bitbucket.org/carter/who-ya-gonna-call-talk-may-2013... https://vimeo.com/69025829
I should also add that C-- in GHC <= 7.6 doesn't have function arguments, but in GHC HEAD / 7.7 and soon 7.8, you can have nice function args in the C-- functions. See https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm for GHC HEAD examples, vs https://github.com/ghc/ghc/blob/ghc-7.6/rts/PrimOps.cmm for the old style.