
Generating C code that people want to use - bibyte
http://jonathan.protzenko.fr/2019/01/04/behind-the-scenes.html
======
stcredzero
_We sure expected that generating idiomatic C code was important, and this
informed a lot of early design choices in our toolchain. We were surprised,
however, by how closely Mozilla reviewed and manually inspected the generated
C code._

Yes, yes, yes! Just because one is automatically generating/translating code,
that doesn't mean _it can't be pretty!_ When automatically translating code,
the matching engine needs to work with the full syntactic expressiveness of
the source language, and what is matched and translated needs to be idioms in
each language! (As opposed to fine-grained syntactic elements. When the
translation is done below the level of idioms, what results is non-idiomatic.
It sounds pretty obvious when put like that.)

~~~
cryptonector
When compiling one language to another, or to assembly, or just straight to
object code, the most important things are that a) you produce _interfaces_
(APIs / ABIs) that are easy to use, and b) you generate good code.

No one is going to demand that GHC generate readable assembly. Why should they
demand that GHC generate readable C if it were generating C instead of
assembly?

~~~
DougBTX
It probably depends how likely the consumer is to look at the code. In the
article, they’re expecting code review of copy-pasted C code produced by their
system, so the bar is higher than generated assembly that almost no one will
ever look at.

TypeScript is an interesting example: the generated JS is pretty close to the
original TS, largely just with types removed, so for someone with lots of JS
experience it is easy to gain confidence in the compiler, as the output is
pretty close to what they would have written in the first place.

~~~
mirekrusin
Not exactly the same thing: TS doesn't really generate anything. CoffeeScript,
or better, ReasonML compiled to JS would be a better comparison. The TS/Flow
design goal was specifically that if you replace type annotations with
whitespace, what remains is plain JS. There are no static or dynamic/runtime
transforms.

------
pfdietz
There's a social/business issue here as well.

Suppose you're selling a code generator. The customer using that code
generator is going to ask "what happens if you disappear? How do we maintain
our code?" This drives them to demand that the code you generate can continue
to be maintained on its own, even if the code generator rots and dies. And
that means it must be readable.

I've seen a case where a company has been stuck with generated code, and not
only did they not have the code generator, they didn't even have the
documentation for the code generator (nor anyone who ever used it). The
company that sold the code generator had died many years earlier.

This same consideration doesn't apply to compilers, because you can buy
compilers from many vendors, as well as get well-supported free ones.

~~~
blattimwind
> This same consideration doesn't apply to compilers, because you can buy
> compilers from many vendors, as well as get well-supported free ones.

This consideration absolutely applies to compilers for various applications
(embedded, exotic platforms etc.) and also if you are relying on
implementation-defined behaviour and extensions.

------
AceJohnny2
> _I wrote the pretty-printer for our compiler, KReMLin, looking at the
> reference table of operator precedence in C, resulting in a minimal amount
> of parentheses being inserted; I happily thought I did the optimal thing,
> until it turned out that no one can remember the relative precedence of +
> and <<, or | and - – I have myself since then forgotten, and I’ve heard that
> it even differs across languages..._

When I was a young(er) and naive(r) C programmer, I had printed out this
precedence table for occasional reference while coding. I was rather proud
that I knew of such subtle corner cases!

I've since learned and don't need to use that table anymore... because I just
use parentheses to avoid ambiguity. Don't rely on subtle behavior in your
code, folks. As the Zen of Python says: explicit is better than implicit.
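
To make the precedence point concrete, here is a minimal sketch in C. The function names are made up for illustration and nothing here comes from KReMLin's output. In C, `+` and `-` bind tighter than `<<` and `|`, so the unparenthesized forms parse as the `*_as_parsed` variants, which is often not what the reader guesses.

```c
#include <assert.h>

/* `x << y + z` parses like this, because + binds tighter than <<. */
unsigned shift_as_parsed(unsigned x, unsigned y, unsigned z) {
    return x << (y + z);
}

/* What a reader skimming `x << y + z` may have expected instead. */
unsigned shift_as_often_meant(unsigned x, unsigned y, unsigned z) {
    return (x << y) + z;
}

/* `a | b - c` parses like this, because - binds tighter than |. */
unsigned or_as_parsed(unsigned a, unsigned b, unsigned c) {
    return a | (b - c);
}

/* The other plausible reading of `a | b - c`. */
unsigned or_as_often_meant(unsigned a, unsigned b, unsigned c) {
    return (a | b) - c;
}

/* Small self-check showing that the two readings give different results. */
void check_precedence_examples(void) {
    assert(shift_as_parsed(1u, 2u, 3u) == 32u);      /* 1 << 5 */
    assert(shift_as_often_meant(1u, 2u, 3u) == 7u);  /* 4 + 3  */
    assert(or_as_parsed(6u, 4u, 1u) == 7u);          /* 6 | 3  */
    assert(or_as_often_meant(6u, 4u, 1u) == 5u);     /* 6 - 1  */
}
```

Writing the parentheses out, as in every function above, costs a few characters and removes the need for the table.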

~~~
pjmlp
Same here. I made my C code easier to follow and safer by following a couple
of rules. One of them was being explicit and not playing operator tricks to
see how many side effects one can cram into one line.

------
fizixer
Readable C code generation is the future. Read my previous comments and you'll
find I've been advocating this for many years. I mainly advocated Python/C
two-language programming, but using F* instead of Python (or Scheme) is a
secondary issue (though an important one in terms of static vs dynamic
typing).

~~~
pjmlp
A future that Eiffel was one of the first languages to take up.

------
bitwize
One of the reasons why Pre-Scheme was so lovely to use is because when it
transpiled to C, the C was highly readable. You could follow the output of the
Pre-Scheme compiler and know exactly what it was doing, provided you knew
idiomatic C.

From a readability standpoint, the C output of Gambit and Chicken is a hot
mess in comparison.

~~~
rurban
I came to the same conclusions with my own compiler from Perl to C. A lot of
work went into generating perfectly readable and formatted C code, with all
the bells and whistles and ifdefs for debugging, tracing, Config and
architecture optimizations. The good thing is that it's much easier to do in
the compiler than in the resulting C code. We generate a lot of C code now,
automatically, for perfect hash tables, optimized Unicode tables, generating
APIs and exhaustive test cases and much more. It must be pretty and usable.
Gambit-C is a good example of throwaway code nobody wants to debug or read
through.

~~~
bitwize
Indeed. If you want to support full Scheme semantics, with continuations and
all the crazy control structures, you really need to contort the C language
into a form that's pretty hard to grok. That's the sacrifice Gambit and
Chicken made, and it's probably a worthy one given how excellent and complete
those compilers are.

Pre-Scheme is intended for a different use case -- when you want the level of
fine-grained control that C gives you but don't want to leave Lisp behind. Its
semantics are accordingly quite different from Scheme's and it doesn't support
the full set of Scheme control structures, nor implicit garbage collection.

------
StreamBright
> Going to C is what allows people to use our code without having to buy
> into exotic, strange languages with lambdas.

At this stage, is there any software engineer who considers a language with
lambdas strange, or is this a joke from the author?

~~~
grive
Given their field, it is safe to assume this is a joke.

------
pornel
It's funny/depressing that the only bits of C which were written by hand, not
converted from another language, had both memory corruption and undefined
behavior.

------
phkahler
> Alas, this extra precision was not appreciated by reviewers, who very
> explicitly requested that variables named uu__123456 be eliminated whenever
> possible.

It's been a while, but I used to use Simulink to generate C code and they had
the same naming problems. It was hideous. They also solved the C99 types issue
independently and generated their own target-dependent header files that
defined INT16_T in a target-dependent way. I asked a couple of guys from that
company to please just implement a C99 target, which is more portable. Best if
it actually produced code using the C99 types instead of renaming them.
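
As a rough sketch of what "using the C99 types" could look like in generated code: the helper below is hypothetical (it is not Simulink output, and `saturating_add_i16` is a name invented here). It takes its fixed-width types straight from the standard `<stdint.h>` header instead of from a generated, target-specific one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generated helper: add two 16-bit samples with saturation.
 * int16_t, int32_t, INT16_MAX and INT16_MIN all come from <stdint.h>. */
int16_t saturating_add_i16(int16_t a, int16_t b) {
    /* Widen first: the sum of two 16-bit values always fits in 32 bits. */
    int32_t wide = (int32_t)a + (int32_t)b;
    if (wide > INT16_MAX) {
        return INT16_MAX;
    }
    if (wide < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)wide;
}

/* Small self-check of the clamping behavior. */
void check_saturating_add(void) {
    assert(saturating_add_i16(1, 2) == 3);
    assert(saturating_add_i16(30000, 10000) == INT16_MAX);
    assert(saturating_add_i16(-30000, -10000) == INT16_MIN);
}
```

The same source then compiles on any target whose toolchain provides `int16_t`, with no renamed types to look up.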

So many issues with auto-generated code, but sometimes it's really useful.

------
nullc
I found it a little surprising that they initially thought using recursion
would be acceptable for this sort of application. Especially since in the
Mozilla codebase many threads run with quite small stacks.
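
A toy illustration of the concern, not taken from the article (both function names are invented here): the recursive version needs one stack frame per input byte unless the compiler happens to optimize the recursion away, which C does not guarantee, while the loop uses a constant amount of stack regardless of input length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recursive fold: stack depth grows with len. The recursive call is not
 * in tail position, so a frame stays live for every byte. */
uint8_t xor_fold_recursive(const uint8_t *buf, size_t len) {
    if (len == 0) {
        return 0;
    }
    return (uint8_t)(buf[0] ^ xor_fold_recursive(buf + 1, len - 1));
}

/* Iterative fold: same result, constant stack usage. */
uint8_t xor_fold_iterative(const uint8_t *buf, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= buf[i];
    }
    return acc;
}

/* Small self-check that both versions agree on a short input. */
void check_xor_fold(void) {
    const uint8_t data[3] = { 0x01, 0x02, 0x04 };
    assert(xor_fold_recursive(data, 3) == 0x07);
    assert(xor_fold_iterative(data, 3) == 0x07);
    assert(xor_fold_recursive(data, 0) == 0);
    assert(xor_fold_iterative(data, 0) == 0);
}
```

On a thread with a small stack, only the second form has a footprint that can be bounded without knowing the input size.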

------
bleair
Another benefit of generating good output is that someone can observe the
input and the resulting constructed idioms and learn from the example.

------
marktangotango
_Mozilla adopted this and runs a Docker build command in their own CI, which
errors out if someone tries to modify the generated code instead of fixing the
F* source file_

Ok, so all the talk about generating idiomatic C went for nothing in the end?

~~~
minitech
It still has to be readable.

------
Yuval_Halevi
Sounds like mission impossible

------
carapace
(I try to avoid low-content comments here, but I'm LMAO at the fullscreen
close-up of sir's forehead. Awesome article and project though!)

~~~
tobyhinloopen
At least it's responsive :D

Putting the important content on top: TODO.

~~~
carapace
Derp: I revisited the page today and the graphic is a tidy part of the header
now. It's still 2,838px × 2,789px though... I dunno if it was my browser or
what; I was seeing the picture scaled to the width of the window which
resulted in full-screen forehead. Scrolling up was like a Terry Gilliam
cartoon.

