
Porting the Go compiler from C to Go - sqs
http://gophercon.sourcegraph.com/post/83820197495/russ-cox-porting-the-go-compiler-from-c-to-go
======
beliu
Author of the post here. Happy to answer any questions I can, and FYI, we
(Sourcegraph) are liveblogging all of GopherCon at
[http://gophercon.sourcegraph.com](http://gophercon.sourcegraph.com). Let us
know if you have any questions or find it useful!

~~~
nfoz
Are you at all concerned about the possibility of the Trusting Trust[1]
problem manifesting in any of your Go compilers?

[1] [http://cm.bell-labs.com/who/ken/trust.html](http://cm.bell-labs.com/who/ken/trust.html)

~~~
0xdeadbeefbabe
Since this is an exercise divorced from reality, the usual vehicle was
FORTRAN.

Dang it, I cheated and looked up Russ's quine. For anyone not wanting to cheat, a
good hint would be "tail recursion".

------
Shamanmuni
It's great that they are aiming for an automated conversion from C to Go. It's
clear they aspire to convert their code which was written in a certain way.
But I think it would be a huge boost in Go usage if they could eventually aim
to transpile any C code into Go code.

A little dream of mine would be if in the future when Rust is stable Mozilla
developed a transpiler from C++ to Rust. That would be brilliant.

By the way, all the other talks at GopherCon seem pretty interesting, I hope
someone uploads videos of them soon.

~~~
jerf
It is unclear how one would compile arbitrary C code into _useful_ Go. The
stereotyped conventions of well-written compiler code allow for a more
idiomatic translation than a general translator could ever aspire to.

C++ to Rust would be even crazier.

(Transpile is a silly word. It's "compile". Compiling already trans-es.)

~~~
Shamanmuni
I know it's crazy difficult to do it, that's why I was talking about dreams.

But maybe a set of guidelines about translatable C code plus a translator
which is good enough for a variety of cases would make refactoring the
resulting code such a manageable task that many projects could consider
switching to Go directly.

You are technically correct in that compile already implies a translation,
but we usually use that term referring to a translation into a lower level
language and not another language at the same level. You can say transpiler or
source-to-source compiler for those cases and I think it's clearer and more
accurate for the reader.

~~~
yohanatan
> ... but we usually use that term refering to a translation into a lower
> level language and not another language at the same level.

Rust is not lower-level than C++ and neither is Go lower-level than C.

~~~
yohanatan
This should not have been down-voted. We are talking about translation from C
-> Go and C++ -> Rust (which are both in the _opposite_ direction of the
definition given for 'transpiling').

~~~
vanderZwan
I agree that your comment added value to the discussion, and gave a counter-
vote.

However, arguably the term could make _more_ sense there, _if_ we ignore the
earlier definition and instead assume "trans" is short for "transcendent", as
in "climbing to a _higher_ level".

If we then simplify that to "compile from one language to another of roughly
equal or higher level", it becomes a useful word to indicate a specific subset
of ways one can do compilation.

Of course, I'm completely pulling this out of my ass and I might piss off
actual CompSci people who use very specific and strict definitions in their
papers (kind of like how some get annoyed with the procedure/function mixup),
so take this with a grain of salt.

------
stcredzero
Automated rewrite FTW! This can help you avoid freezing a project while it is
being ported. Also, if you have a code base with its own idioms, then those
idioms can be matched and translated, which can produce cleaner target code.

~~~
adient
Go 1.0 spec is already frozen and will not change any time soon.

~~~
szabba
I believe he referred mostly to freezing the efforts on improving the
implementation.

~~~
stcredzero
I was also talking about projects and porting in general. Incidentally, I've
been paid to do exactly what we're discussing. (Porting a production program
using automated translation.) It works.

------
SixSigma
Russ' paper on the subject from Dec 2013

[https://docs.google.com/document/d/1P3BLR31VA8cvLJLfMibSuTdw...](https://docs.google.com/document/d/1P3BLR31VA8cvLJLfMibSuTdwTuF7WWLux71CYD0eeD8/preview?sle=true&pli=1#)

------
micro_cam
I'm reminded of the fortran to c compiler f2c that was used to produce pure c
implementations of lots of libraries like LAPACK.

I'm curious how they will handle things like pointer arithmetic and memory
safety in C vs Go. If they manage to do so in a performant way I could see
translating lots of numerical or computationally intensive code to Go so that
it could be run in a shared cloud environment without worries about memory
safety and without having to resort to vms for separation.

~~~
mjcohen
If you took arbitrary fortran, especially with i/o, the results were an
unreadable mess - but they compiled and ran.

For a project I had (in the early 90s iirc), I had to extensively modify the
fortran and make multiple versions before the generated C was readable. Still,
much easier than a rewrite.

------
thinkpad20
> There are currently 1032 goto statements in the Go compiler jumping to 241
> labels.

Wow, that's really striking. I know that goto statements have their uses but
for something written in the last couple of years to have over a thousand of
them is very surprising (it might not be surprising at all for those who write
C code all the time). I guess they're mostly just for error handlers?

~~~
jgrahamc
For example:
[http://golang.org/src/cmd/6g/gsubr.c](http://golang.org/src/cmd/6g/gsubr.c)

Some are for single error return, but others are just to make a nice structure
for the code.
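
A minimal sketch of the single-error-return pattern, not taken from the Go compiler source. The function `copy_prefix` and its parameters are hypothetical; the point is only that every failure path jumps to one label that does the cleanup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the first n bytes of src into a newly
 * allocated NUL-terminated string stored in *out.
 * Returns 0 on success and -1 on any failure. */
int copy_prefix(const char *src, size_t n, char **out)
{
    char *buf = NULL;
    int rc = -1;

    if (src == NULL || out == NULL)
        goto done;
    if (strlen(src) < n)
        goto done;

    buf = malloc(n + 1);
    if (buf == NULL)
        goto done;

    memcpy(buf, src, n);
    buf[n] = '\0';
    *out = buf;
    buf = NULL;   /* ownership has moved to the caller */
    rc = 0;

done:
    free(buf);    /* free(NULL) is a no-op, so this is safe on every path */
    return rc;
}
```

With a single exit label, adding another resource means adding one line of cleanup instead of touching every early return.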

~~~
raverbashing
Ah the old C style of function types in a different line.

This smells of very old code (as the header testifies)

~~~
rootbear
I use that style all the time. I like having the function name start in column
1, it can make searching easier.

I do wish that in C1x (for x > 1), they could find a way to let us declare
multiple formal arguments of the same type without repeating the type:

    
    
        float
        my_graphics_hack(float x, y, z, r, g, b, u, v) { ...
    

instead of

    
    
        float my_graphics_hack(float x, float y, float z, float r, float g, float b, float u, float v) { ...
    

which gets really tedious and obfuscates the fact that they are all the same
type, especially when the typename is more complex than just "float". When the
new syntax was added to ANSI C, it wasn't possible to do multiple variables,
for reasons I've forgotten but which I think had to do with forward type
references. It would be awfully nice to find a fix for this.

~~~
simias
That's a poor justification, there are plenty of tools to index and search
C(++) code like ctags or cscope, and your example could use a vector or a
struct, I have a hard time imagining when a function with such a prototype
would be useful in real life.
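
A sketch of the struct alternative, assuming a hypothetical `struct vertex` and a stand-in function `vertex_gray` (neither comes from the thread). Struct members of the same type can already share one type specifier, which is the brevity the parent wanted for parameters.

```c
/* Hypothetical vertex type: members of one type share a type specifier. */
struct vertex {
    float x, y, z;   /* position */
    float r, g, b;   /* color */
    float u, v;      /* texture coordinates */
};

/* Hypothetical stand-in for the graphics routine: averages the color
 * channels of the vertex passed by value. */
float vertex_gray(struct vertex p)
{
    return (p.r + p.g + p.b) / 3.0f;
}
```

In C99 and later a caller can build the argument in place with a compound literal, for example `vertex_gray((struct vertex){ .r = 1.0f, .g = 2.0f, .b = 3.0f })`; members that are not named are zero-initialized.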

~~~
clarry
Compound literals are not too pretty and not too well known. Plus, if you care
about c89... So, do you wrap all your values in a struct because the prototype
gets too long otherwise? What a chore.

Preferring that function names start at the first column to make searching
easier is a perfectly good justification.

You, on the other hand, seemingly would impose your tools (which have their
shortcomings) and workflow on people with no justification at all.

Just because a tool exists, doesn't mean it's good (or better than what people
are accustomed to) or that everyone must use it.

Do you think everyone should use Vim on Linux too?

~~~
simias
No, I think everyone should use Emacs on FreeBSD, but that's not my point.

Adding syntactic sugar to a language makes the language bigger and harder to
completely understand, it makes it easier to misuse a feature (and C already
has a lot of trickery with types, like when you declare a parameter as an
array but it behaves like a pointer). I'm a strong believer that explicit is
better than implicit and that while there are many ways to do the same thing
some are better than others in practice. Of course, "in practice" can change
from one project to the other; what matters is consistency. I applaud the
choice of Go to standardize on a single coding style for instance.

For the particular example of the parent, while I write a lot of C code for
work and for fun, I have yet to encounter a situation where I saw a function
taking 10 floats as arguments and thought "yup, that's completely the right
way to do that". If you have an example of such code, I'd be more than
willing to reconsider my position, otherwise we're just talking about the best
way to tame a unicorn.

------
hrjet
This might become a nice benchmark for the Go language; the same code base
implemented in C and Go! It may not be fine-tuned for optimisation, neither in
C nor in Go, but may still give a good ball-park estimate.

~~~
lazyjones
Indeed, but it would be much more useful and impressive if they rewrote the
compiler in Go by hand. I'm disappointed that they are aiming for an automatic
translation instead - some people are going to ask themselves whether Go
actually isn't that much fun to program in. An independent reimplementation is
better for correctness too, they can compare outputs and find bugs in both
implementations instead of porting over old bugs and adding new bugs where the
translation goes wrong.

------
AYBABTME
I'm really thankful for the liveblogging, as I couldn't manage to get my body
to the conference.

I understand the desire to promote the Sourcegraph app by doing the blogging,
and I think it's effective. However, the blog is real annoying to browse, as
every (prominent?) link points to Sourcegraph the app instead of the blog.

~~~
AYBABTME
Just to clarify, because I think my original comment was unbalanced.

I'm REALLY thankful for sourcegraph's liveblogging. The above comment was a
suggestion as I thought they might want to know that (at least for me), the
navigation of the blog was confusing.

~~~
sqs
Thanks! We are having a fun time liveblogging and are glad you're finding it
useful. We got complaints when the blog image did NOT link to Sourcegraph,
too. :) It's almost the end of this conference, but next time we liveblog,
we'll have 2 separate header images, or try something else to make it less
confusing.

------
kristianp
"They’re deciding to automatically convert the Go compiler written in C to Go,
because writing from scratch would be too much hassle."

When transcribing a talk, there isn't any need to write "They're". Just use
the same pronoun the presenter used, otherwise it stands out like a sore
thumb.

------
pohl
_3) Go has turned out to be a nice general purpose language and the compiler
won’t be an outsize influence on the language design._

In what sort of ways does self-hosting early influence a language design? Were
they hoping to avoid something in particular by delaying self-hosting?

~~~
gizmo686
The general way that self hosting influences language design is that the
compiler is often one of the first major projects to be built using a
language. This does not give it more influence than other major projects, but
if your goal is to have a language designed for use case X, it is generally
best to have your early projects with it be for X. Additionally, self hosting
may encourage a language design that makes bootstrapping easier (such as a
stricter divide between the stage-1 language and the general language).

~~~
pohl
I sort of had the general sense of that already. I guess I was hoping for
something more specific about what language features are so useful when
writing a compiler but get in the way for general problems.

The part I quoted above almost sounds like wiping sweat from one's brow after
having dodged a bullet: "phew, the language is safe from influence by those
gull-durn compiler-writers..."

------
rdc12
"Note: There’s a book written about converting goto code to code without goto
in general, but this is a sledgehammer and not necessary here."

Anyone have any idea what the title of that book is?

------
ANTSANTS

      >A Union is like a struct, but you’re only supposed to use one value
      >(they all occupy the same space in memory). It’s up to the programmer to know which variable to use.
      >  There’s a joke in some of the original C code:
      >      #define struct union /* Great space saver */
      >  This inspired a solution:
      >      #define union struct /* keeps code correct, just wastes some space */
    

Somewhere in Scotland, a sum type sheds a single tear.

~~~
piokuc

      >      #define union struct /* keeps code correct, just wastes some space */
    

Not always, though...

~~~
rsc
Yes, always. And if you don't believe me, it's not my trick. I learned it from
Dennis Ritchie (he was thinking about a C to Limbo converter).

~~~
andybalholm
It works when unions are used correctly. If they are used to deliberately
subvert the type system (e.g., put in an int, get an array of char out) it
doesn't work. But C has so many other ways to subvert the type system that
there's no need to do that.

Probably about the only place you'll see them used like that is in the code
produced by web2c in compiling TeX. Knuth used variant records a lot to get
around Pascal's type safety, and they get translated to unions.
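
A small sketch of the "used correctly" case, with hypothetical type and function names. When code reads only the member it last wrote, the union form and the form produced by `#define union struct` behave the same; the struct form just takes more space.

```c
enum kind { KIND_INT, KIND_FLOAT };

/* Original form: the members share storage and the tag says which is live. */
struct val_union {
    enum kind kind;
    union { int i; float f; } as;
};

/* After "#define union struct": each member gets its own storage. */
struct val_struct {
    enum kind kind;
    struct { int i; float f; } as;
};

/* Writes the int member and reads the same member back (union layout). */
int union_roundtrip(int x)
{
    struct val_union v;
    v.kind = KIND_INT;
    v.as.i = x;
    return v.kind == KIND_INT ? v.as.i : 0;
}

/* Same logic on the struct layout; the result is identical. */
int struct_roundtrip(int x)
{
    struct val_struct v;
    v.kind = KIND_INT;
    v.as.i = x;
    return v.kind == KIND_INT ? v.as.i : 0;
}
```

Code that writes `as.i` and then reads `as.f` is the case where the two layouts diverge, since in the struct form `as.f` is separate storage that was never written.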

~~~
rsc
That code is not valid according to the C standard, so there is no guarantee
it will work anywhere. In particular, many modern compilers have optimizations
that would break that code. I would be a little surprised if modern web2c
still uses unions this way and gets away with it.

The only standard compliant way to, say, convert a float to an int is to use
memmove:

    
    
      uint32 i;
      float32 f;
      i = 0x80000000;
      memmove(&f, &i, 4);
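
A self-contained sketch of the same idea using standard `<stdint.h>` types instead of `uint32` and `float32`. The helper names are hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value, which the static assertion only partly checks.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterprets the bits of a 32-bit unsigned integer as a float. */
float float_from_bits(uint32_t bits)
{
    float f;
    memmove(&f, &bits, sizeof f);
    return f;
}

/* Reinterprets the bits of a float as a 32-bit unsigned integer. */
uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memmove(&bits, &f, sizeof bits);
    return bits;
}
```

Copying bytes sidesteps the question of which member or pointer type is used for the access, and compilers commonly turn such a fixed-size copy into a plain register move.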

~~~
bzbarsky
Which part of the C standard would forbid type punning a uint32 to a float32?
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm)
suggests that C89 explicitly
allowed this, C99 at first glance did not, and was errata'ed to make it clear
that this is still allowed. C11 seems to have the same verbiage.

~~~
Someone
Devil's advocate example:

    
    
      union hack { int i; float f; };
    
      int foo( int *i, float *f)
      {
         *i = 3;
         *f = 42.0;
         return *i;
      }
    

Aliasing rules say foo need not read from that int pointer, may switch the
order of the write to f and the write to i, and can assume that foo returns 3,
so

    
    
      union hack h;
    
      foo( &h.i, &h.f);
    

might return 3 or something else. I think that last call introduces undefined
behavior, but only because of the definition of foo that the writer of that
call might not even have the source for.

But of course, that is a "you shouldn't do that" edge case. One could also
claim that the corrigendum doesn't apply because foo doesn't "use a member to
access the contents of a union".

~~~
bzbarsky
Sure, but that's an aliasing issue, not a type punning issue. I agree that the
code you cite there is a violation of the aliasing rules, and will not work
"as expected" on modern compilers unless one does the equivalent of gcc's
-fno-strict-aliasing.

And I agree that the corrigendum doesn't apply in this case. Once you hand
different-typed pointers to the same memory to people, whether via union or
just casting pointers, the aliasing rules will up and bite you.

