
Amplifying C (2010) - jhack
http://voodoo-slide.blogspot.com/2010/01/amplifying-c.html
======
pcwalton
> The system introduces an "amplification" phase where s-expressions are
> transformed to C code before a traditional build system runs.

In other words, this is a compiler.

At this point I have to say what I always say: Don't compile to C. You will
spend forever eliminating undefined behavior, the debugging experience will be
bad because you don't have direct control over the DWARF DIEs, your
compilation will be slower for no reason, you will not be able to add custom
metadata to be consumed by optimization passes (for example, aliasing passes),
you won't have access to special instruction flags like LLVM's add nuw/nsw, etc.

Virtually all languages I know of that started out compiling to C stopped
doing it at some point in their development, for one or more of the above
reasons. Skip the major technical overhaul you're going to inevitably have to
do and just build LLVM IR from the beginning. The LLVM API is excellent, so it
ends up being less work to begin with.

~~~
generic_user
> In other words, this is a compiler.

It's not a compiler, it's a pre-processor.

People look at C in isolation but don't realise it was designed to be part of
the full Unix system that included Sed, Awk, M4, Lex, Yacc, Sh and all the
other tools.

Writing custom C abstractions with Awk is fairly trivial, and you end up with
efficient C code that can be further processed or tuned. That's what tools like
Awk are there for.

The compiler itself is built on the same principles doing successive
transformations on source, IR, assembly etc. There is really no rigid
boundary.

If you want C with custom abstractions a custom dialect that compiles to C
makes perfect sense. That use case was taken into account in its design.

~~~
coliveira
Can you give an example of how to write custom C abstractions with Awk?

~~~
benwaffle
[https://www.cs.rit.edu/~ats/books/ooc.pdf](https://www.cs.rit.edu/~ats/books/ooc.pdf)

------
Animats
This road is well traveled. Objective-C, C#, and Java are all on that road -
start with C, and add only the stuff you need. A few years down the road, and
the new language has feature bloat.

The three big problems in C are "how big is it", "who released it", and "who
locks it". The language provides no help with any of these issues. C++ allows
papering over the problem, but the abstractions always leak and the mold
always comes through the wallpaper. There are times when you need a raw
pointer (for system calls, for example) and the availability of raw pointers
breaks the size and ownership protection.
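Those three questions are what C code ends up answering only by convention. A hypothetical sketch (the `buf_t` struct and its field names are invented for illustration, not taken from any real library) of carrying size and ownership alongside the raw pointer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "How big is it" and "who releases it" carried next to the raw pointer. */
typedef struct {
    char  *data;
    size_t len;   /* how big is it */
    int    owns;  /* nonzero: we must free() it */
} buf_t;

/* Owning copy: the buf_t is responsible for freeing. */
static buf_t buf_copy(const char *src, size_t len) {
    buf_t b = { malloc(len), len, 1 };
    memcpy(b.data, src, len);
    return b;
}

/* Borrowed view: the caller keeps ownership of src. */
static buf_t buf_borrow(char *src, size_t len) {
    buf_t b = { src, len, 0 };
    return b;
}

static void buf_release(buf_t *b) {
    if (b->owns) free(b->data);
    b->data = NULL;
    b->len  = 0;
}
```

Nothing enforces the convention — a caller can still ignore `len` or free a borrowed buffer through the raw pointer — which is exactly the "no help from the language" point.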

Rust deals effectively with all three of those issues. That was a major
breakthrough, one of the few fundamental advances in language design in years.
Rust, unfortunately, seems to have taken a turn for the worse in the last
year. Rust has been infected with the Boost disease - overly clever templates.
Rust is a procedural language, but template enthusiasts have been making it
pseudo-functional with constructs like ".and_then()" and ".or_else()". Then
there's "try!()", with its invisible return statement. The result is painful
to read and hard to maintain.

The author of the parent article has a point about "with-" type constructs.
Python has a general "with" statement, which can be used on any object that
implements "__enter__" and "__exit__". That's a very useful and safe language
feature. Both Go and Rust lack it, and their workarounds ("defer" and
destructors) are worse.

Python's "with" plays well with exceptions. Exceptions in Python work well.
Exceptions in C++, not so much, mainly because ownership and exceptions mix
badly. Go could fix that with GC, and Rust could fix that with the borrow
checker. But neither has exceptions. As a result, error handling in Go is
wordy, and in Rust, both complicated and wordy.

I had high hopes for Rust, but they may have jumped the shark by getting too
clever. We don't need a new C, but we may need a new Rust. At this point, any
new low level language that doesn't have a borrow checker is flawed from the
start.

~~~
ygra
> Objective-C, C#, and Java are all on that road - start with C, and add only
> the stuff you need.

If »starting with C« means »use a mostly similar syntax for basic things«,
then yes. Otherwise only ObjC has been on that road, given that C supports
things that C# and Java don't. And then it's a »start with C, remove stuff you
don't like, and add stuff you need«, which is probably more akin to »start
wherever and pick things you want in the language«. E.g. syntax aside, C#
probably has much more in common with Delphi in its inception and design than
with C.

------
mapcars
There is a project by Fernando Borretti for Lisp-style macros in C, with macro
expander written in Common Lisp:
[https://github.com/eudoxia0/cmacro](https://github.com/eudoxia0/cmacro), and
a collection of basic macros:
[https://github.com/eudoxia0/magma](https://github.com/eudoxia0/magma)

This way he has already added lambdas, lazy evaluation, types, anaphoric
macros, with- macros.

~~~
eudox
The difference is that cmacro builds on C syntax (and allows for syntax
extensions, by having a very flexible parser), while this is more like the
"defmacro for C" system described here[0] and here[1], because it uses a Lisp
syntax.

I prefer the former approach because it's less likely to alienate C
programmers. At the same time, being able to define new syntax can have the
same effect, since every new syntax you add carves out a smaller and smaller
set of the community that is willing to use it. The same is true for Common
Lisp reader macros, which is why few of the many syntax extensions that are
available become commonly used.

[0]: [http://www.european-lisp-symposium.org/editions/2014/ELS2014...](http://www.european-lisp-symposium.org/editions/2014/ELS2014.pdf)

[1]: [http://www.european-lisp-symposium.org/editions/2014/selgrad...](http://www.european-lisp-symposium.org/editions/2014/selgrad.pdf)

------
eigenbom
Always fun to see projects like this, but I don't get some of the motivations.
Regarding the code examples:

    
    
      // c++
      AutoFile file("c:/temp/foo.txt", "w");
      fprintf(file, "Hello, world");
    
      // lisp
      (with-open-file (f "c:/temp/foo.txt" "w")
        (fprintf f "Hello, world"))
    

...the author thinks the former is cluttered and the latter uncluttered (simply
because the fprintf is within a sub-scope of with-open-file)? That's a fragile
rationale, but you can certainly write C++ that way if you like.

~~~
nightcracker
If you actually use RAII, and write braces in a bit of an unorthodox style,
the distinction disappears even further:

    
    
        { AutoFile file("c:/temp/foo.txt", "w");
             fprintf(file, "Hello, world"); }
    
        (with-open-file (f "c:/temp/foo.txt" "w")
            (fprintf f "Hello, world"))

~~~
noobermin
I honestly think explicit scopes, even though unorthodox, should become more
of a thing. I personally do it in my C play projects to emphasize particular
parts in a function as being independent "subordinate clauses", and with
variables that only have a limited scope, but not being independent enough to
be factored into a function--for example when the clause needs to be within
reach of a label.

In C++ with RAII, the "emphasis" makes even more sense, to the extent that it
doesn't make things ugly.
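A minimal sketch of the pattern (the function and names are made up for illustration): a bare block keeps `total` and `i` scoped to one "clause" while that clause stays within reach of a label, so it can't simply be factored out into its own function.

```c
#include <assert.h>

/* Sum the entries; bail out to the cleanup label on a negative value. */
static int sum_positive(const int *xs, int n, int *out) {
    int ok = 0;

    /* Explicit scope: `total` and `i` live only inside this "subordinate
       clause", but the clause can still jump to `done` below. */
    {
        int total = 0;
        for (int i = 0; i < n; i++) {
            if (xs[i] < 0)
                goto done;   /* needs the label, so can't be its own function */
            total += xs[i];
        }
        *out = total;
        ok = 1;
    }

done:
    return ok;
}
```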

------
overgard
Wouldn't it make more sense to use a Scheme that compiles down to C (like,
Chicken scheme or something like that)?

I guess the argument might be readability of the output, but I'm guessing as
his Lisp grows more complex the output is going to diverge from the input
quite a bit anyway. (I'm trying to imagine what a closure would look like...
I'm guessing it wouldn't be pretty.)

~~~
junke
Amplify is mainly used to generate C code that does not need a Lisp runtime.
It is used effectively as a smarter preprocessor.

Embeddable Common Lisp (ECL) [1] is an efficient implementation of Lisp that
compiles to C. It can be easily used to interface C and Lisp.

Regarding C++, nowadays there is Clasp [2] that is under active development
and looks very promising.

[1] [https://common-lisp.net/project/ecl/](https://common-lisp.net/project/ecl/)

[2] [https://github.com/drmeister/clasp](https://github.com/drmeister/clasp)

------
eatonphil
I don't know a lot about C or about Lisp, but I do know that this isn't the
first compile-to-C Lisp. So why not use one of the ones that already exist
instead of starting from scratch? And the blog post seemed to treat this as
the first attempt to compile a high-level language to C (MLton, JHC,
various Schemes, etc).

~~~
dang
It's because the author doesn't want a whole new language, but rather a better
way to build abstractions over C.

What he wants is C's conceptual model with sexpr syntax and (therefore) the
ability to build custom abstractions on top of it. That's different from
working in an entirely other programming language with its own conceptual
model that happens to compile to C.

You would see this difference vividly in the C that the two systems generate:
the first would read like a more verbose version of the same application in
which the skeleton of the program is recognizable; the second would look like
machine gobbledygook. The former would be debuggable in a way that let you
step through the logic of the surface program, while the other probably
wouldn't. And so on.

Instead of thinking of this as a different language, think of it as 'magic C
with DSLs', or 'C, with a macro system that lets you do what you want'. That
distinction isn't absolute, because if you push your abstractions far enough
you will end up with a different language, but the style of programming the
author is talking about gives you incentives not to do that.

~~~
pcwalton
Even if all you want is C semantics with a different syntax, though, I think
it's better to emit LLVM IR. LLVM IR is close to C semantics (though aliasing
information has to be supplied explicitly), but you have direct control over
debug info, which is important to avoid a huge regression in the debugging
experience. As an added bonus, you eliminate the necessity of serializing and
deserializing your IR during your compilation pipeline for no reason (which is
effectively what compiling and reparsing C is doing).

~~~
dang
You may be right, but learning curve is an issue. If you're already familiar
with C, writing a sexpr C generator is super easy. I can see why someone would
balk at learning a new conceptual model and toolset, and just take the path of
least resistance, especially if they already know the kind of C program they'd
like to write. This would have been even more true in 2010 of course.

So what's the best way to tackle the learning curve of what you're suggesting?
If I know zero about LLVM and I want to make something like the OP, what
should I do?

~~~
mpweiher
Another good option is to use Joe Armstrong's approach: look at the LLVM IR
generated by clang for example programs, and then emit IR snippets like those.

My first attempt at using LLVM was using the C++ API. It was...a struggle.
Using this approach (IR snippets), I made more progress in a day than I had in
months using the API.

~~~
sklogic
Also worth mentioning an invaluable learning tool: the 'cpp' backend in LLVM. It
emits idiomatic C++ code that generates any given IR module using the LLVM
API.

~~~
mpweiher
Yup, that's what I used in my first attempt. Didn't work for me. (Actually,
that was a later part of my first attempt; I think I initially tried the
straight API. Good luck with that.)

~~~
sklogic
What exactly did not work? Have you filed a bug?

------
jamii
IMO Terra ([http://terralang.org/](http://terralang.org/)) is a more
interesting approach to this problem, since it not only enables macros but
also staged compilation, polymorphism-as-a-library, runtime specialization
etc.

> You can use Terra and Lua as…

> A scripting-language with high-performance extensions. While the performance
> of Lua and other dynamic languages is always getting better, a low-level of
> abstraction gives you predictable control of performance when you need it.
> Terra programs use the same LLVM backend that Apple uses for its C
> compilers. This means that Terra code performs similarly to equivalent C
> code. For instance, our translations of the nbody and fannhakunen programs
> from the programming language shootout perform within 5% of the speed of
> their C equivalents when compiled with Clang, LLVM’s C frontend. Terra also
> includes built-in support for SIMD operations, and other low-level features
> like non-temporal writes and prefetches. You can use Lua to organize and
> configure your application, and then call into Terra code when you need
> controllable performance.

> An embedded JIT-compiler for building languages. We use techniques from
> multi-stage programming to make it possible to meta-program Terra using Lua.
> Terra expressions, types, and functions are all first-class Lua values,
> making it possible to generate arbitrary programs at runtime. This allows
> you to compile domain-specific languages (DSLs) written in Lua into high-
> performance Terra code. Furthermore, since Terra is built on the Lua
> ecosystem, it is easy to embed Terra-Lua programs in other software as a
> library. This design allows you to add a JIT-compiler into your existing
> software. You can use it to add a JIT-compiled DSL to your application, or
> to auto-tune high-performance code dynamically.

> A stand-alone low-level language. Terra was designed so that it can run
> independently from Lua. In fact, if your final program doesn’t need Lua, you
> can save Terra code into a .o file or executable. In addition to ensuring a
> clean separation between high- and low-level code, this design lets you use
> Terra as a stand-alone low-level language. In this use-case, Lua serves as a
> powerful meta-programming language. You can think of it as a replacement for
> C++ template metaprogramming or C preprocessor X-Macros with better syntax
> and nicer properties such as hygiene. Since Terra exists only as code
> embedded in a Lua meta-program, features that are normally built into low-
> level languages can be implemented as Lua libraries. This design keeps the
> core of Terra simple, while enabling powerful behavior such as conditional
> compilation, namespaces, templating, and even class systems implemented as
> libraries.

~~~
brandonbloom
+1. I can't say enough good things about Terra.

I self identify as a Lisper (Clojure), but have done a fair amount of
professional work in C/C++/Go too. I was productive in Lua/Terra practically
overnight. I ported a bunch of Rust (that I had originally ported from C and
was considering porting back to C + custom string-templating) to Lua in a few
days and the code got shorter, simpler, faster, and with build times of a tiny
fraction of those in Rust. Terra just matches how I think about programming
low-level code: Types are about shifting work to compile time. Yes, some
error-detection work, sure, but that's totally secondary for me as compared to
shifting work to compile time for performance, memory representation, etc.

------
junke
Code:
[https://github.com/deplinenoise/c-amplify](https://github.com/deplinenoise/c-amplify)

~~~
noobermin
It seems like it hasn't been touched since the year of publication of this
article :)

------
ktRolster
This quote from the article really echoes my experience:

"Every C++ shop has its own 'accepted' subset."

And it's frustrating. This quote from the article too:

 _"these are clear signs that C++ isn't providing any real cost benefit for
us, and that we should be writing code in other ways."_

~~~
pjmlp
Except this happens to all languages when they grow big.

If the language is a rich one, then every shop uses part of it.

If the language has a rich library ecosystem, then each shop has its own
little garden of libraries.

~~~
pekk
This problem is much more serious when you are talking about opaque DSLs. Even
having to retreat to dramatically different corners of a huge language is not
too hot. By comparison, a garden of libraries that are all readable by anyone
in command of a smaller core language is a smaller problem.

~~~
pjmlp
Depends how big the libraries are and if they are commercial or open.

But I agree with the DSLs, that is the curse of many Common Lisp shops, yet
Lisp is a very simple language.

------
fsaintjacques
Is debugging C++ worse than generated C code?

~~~
pcwalton
No, it's not. Debugging generated C code is always worse. One of the major
reasons why C++ compilers stopped compiling to C is that it is basically
impossible to emit proper debug information that way.

------
atilaneves
"compiles super fast": compared to C++, sure. But not really, no.

~~~
iainmerrick
Compiling stb_image, an incredibly useful image loader:

    
    
      $ wc -l stb_image.h
          6433 stb_image.h
    
      $ more stb_image.c 
      #define STBI_ONLY_JPEG
      #define STBI_ONLY_PNG
      #define STBI_FAILURE_USERMSG
      #define STB_IMAGE_IMPLEMENTATION
      #include "stb_image.h"
    
      $ time clang -c stb_image.c -o stb_image.o
    
      real	0m0.091s
      user	0m0.075s
      sys	0m0.014s
    

Seems pretty fast to me!

~~~
atilaneves
Have you ever written Pascal, Go, or D? That's what fast compilation looks
like. The D standard library compiles in 3.5s from scratch.

~~~
iainmerrick
Yes, I've used Go. The compiler is fast and that's really nice; fast
compilation was one of their explicit goals.

Pascal and C are fast because they're old and relatively simple, and their
implementations are mature.

C++ becomes super slow when you start doing anything interesting; when I tried
Rust it seemed very slow; even Java is slow, although it seems like it
shouldn't have to be. Those languages are all more widely used than D, I
think.

[edit: typo]

------
signa11
> from the article: "In my prototype system, c-amplify, I’m doing exactly
> this. The system introduces an "amplification" phase where s-expressions are
> transformed to C code before a traditional build system runs."

Haven't Gambit/Chicken already tried something similar for Scheme? I don't
know what the latest state of affairs there is, though.

------
rumcajz
We've experimented extensively with code generation resulting in C, and there's
a single most important reason why it does not work: nobody except the authors
can read and understand, let alone modify, the source code.

------
ctz
> Answer: none, they are equivalent as far as semantics go.

The C function does not have any defined semantics for certain values of a and
b. The Common Lisp function merely adds two integers (or runs out of memory
trying).
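The undefined case here is signed integer overflow. A minimal sketch of making the check explicit, using the GCC/Clang-specific `__builtin_add_overflow` builtin (not portable ISO C):

```c
#include <limits.h>
#include <stdbool.h>

/* add(a, b) in C has no defined result when a + b overflows int.
   This checked variant reports the overflow instead of invoking UB;
   returns true on success and stores the sum in *out. */
static bool checked_add(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);  /* GCC/Clang builtin */
}
```

With plain `a + b`, the compiler is free to assume the overflowing case never happens; the builtin instead returns a flag the caller can inspect.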

~~~
junke
But this is not a Common Lisp function, it is the AST of the previous C
fragment. In fact the code makes no sense interpreted as Common Lisp.

------
nickpsecurity
I built something like this that compiled BASIC to C or C++ to avoid dealing
with the latter. Later I re-implemented it in LISP. Doing a one-to-one mapping
from LISP to 3GL code makes it easier than some here think. Using a correct-
by-construction coding method eliminates a _lot_ of the problems. Your code
generator can also avoid C-related errors and undefined behavior wherever
possible, with tools available to check for most of both. Mine was a simplistic
tool with very few targets in terms of compilers. _Great_ safety and
productivity, though.

Here are the main benefits a LISP-like C can give you:

1. Automate boilerplate
2. Automate safety
3. 4GL-style doing very much with very little
4. Context-specific optimizations
5. My favorite: interactive testing & incremental compilation for sub-1s code-to-execute and maximizing mental flow. Safe C alternatives will probably never be able to do this.
6. Update while the system is running if still in prototyping mode. Fun stuff.
7. Save the exact execution state of a mock C application for debugging, if you design that feature.

I don't recall most of the details of the implementation due to a broken
memory. I worked in industrial BASIC more than C because C has too many
issues. However, as "dang" and "jhack" are into this, I'll add that I did find
it useful as a non-compiler-expert to set up at least two versions or states
of the development environment: prototyping and production.

The prototyping system was designed for interactive development. It had all
the libraries I might use loaded up, pre-wrapped, and with interface checks.
The 3GL commands, or emulation of them, ran instantly as LISP expressions. You
develop in this mode with no connection to the 3GL except logical and [mostly]
semantic equivalence. Your code is straight up code at the moment.

When done with that, you switch to production where your code is data that's
interpreted and turned into the 3GL. The generator should take care to make
portable code or at least cover your targets. Throw in out-of-bounds checks
and such by default. Can have a declaration to decide that. Anyway, you get
your BASIC, C, C++, whatever code here. I can even imagine one for Rust that
works faster than the actual compiler.

So, two versions of the system: one with interactive interpretation,
incremental compilation, and live updating of the LISP version for quick
development with optional checks inside (eg typing); one that generates code
for external compilers. Both operate on the same input but ignore what they
don't need. You can avoid most debugging issues with a robust coding style and
plenty of interface checks. All I did with that.

Hope the lessons from that old project help someone doing the next one. I
expect, if I recreate it, that the Amplifying C stuff will be helpful in
figuring out how to do that. So might tools like Racket, and me learning LISP
_for real_. ;)

------
fleitz
I seem to recall something about C programs becoming bug-ridden
implementations of half of Common Lisp.

~~~
wolfgke
This is Greenspun's tenth rule
([https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule](https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)):

"Any sufficiently complicated C or Fortran program contains an ad hoc,
informally-specified, bug-ridden, slow implementation of half of Common Lisp."

~~~
ktRolster
The Morris corollary:

"Including Common Lisp."
[http://paulgraham.com/quotes.html](http://paulgraham.com/quotes.html)

~~~
wolfgke
Can also be read on the wikipedia link that I posted. :-)

------
sklogic
I tried "amplifying" C with Lisp-like syntax macros. Looks like there is a lot
of potential on this route:

[https://github.com/combinatorylogic/clike](https://github.com/combinatorylogic/clike)

