

How Go uses Go to build itself - spahl
http://dave.cheney.net/2013/06/04/how-go-uses-go-to-build-itself

======
acqq
The last time I looked, there was a lot of C code that needed to be compiled
with a C compiler to build Go, and it was definitely not just for
bootstrapping, as parts of the standard library were written in C. Is that
still the case?

~~~
andrewreds
Most of the standard library is written in Go.

A large part of the runtime package (within the standard library) is written
in C and asm. This C code is compiled with one of the 5c, 6c or 8c compilers.

gcc is only used for bootstrapping (I believe... though I am still trying to
get my head around what happens).

------
DavidWanjiru
I'm the naive one here, but is "Go using Go to build itself" much like what
Paul Graham talks about re LISP?

~~~
pjmlp
Most languages can be used to build themselves, in a process known as compiler
bootstrap.

The main reason why some compiler developers don't do it is mostly a question
of using existing tooling instead of redoing all required stuff in the new
language.

Additionally porting to new platforms by bootstrapping usually requires cross-
compilation, which for some developers might not be worth it, depending on the
target audience of the language.

~~~
ralph
The Go developers have said they deliberately didn't try to write Go in Go
because they've done that with other languages, possibly Alef, I forget. A
bug can arise where the fix would be obvious, except that the fix itself
triggers the bug. Instead, a more awkward fix has to be figured out that
doesn't trigger the bug. Of course, once the bug is fixed the original
straightforward fix can be substituted, but it's all unwanted hassle.

~~~
gizmo686
How often does this come up? Under normal circumstances, wouldn't you be able
to revert to a version of the compiler from before the bug was introduced? If
you're still in initial development, then you have the old, foreign, compiler to
fall back on.

~~~
ralph
Seemingly often enough that it deterred them with Go. No, you may not be able
to revert to before the bug was introduced as it may be the bug was there ever
since that feature was added. As soon as it is self-hosted, the old compiler
quickly becomes irrelevant, i.e. the code rapidly diverges from what it can
compile.

~~~
pjmlp
That is why bootstrapping done properly is always done in stages.

You have a compiler that can only compile a specific subset and use that
subset to write the real compiler. There are endless book examples of how to
do it.

Given who the Go designers are, I think they don't have any issue keeping the C
code around.

~~~
ralph
That's not how the books say to do it, and you're right, given who created Go
you'd think they'd know this stuff. :-)

Many generations of the compiler are created. Let's say the compiler-in-C is
worked on until it compiles subset Gosub1 which is just enough to write
compiler-in-Gosub1 that duplicates compiler-in-C's behaviour. From now,
compiler-in-C atrophies. G-2 features are implemented in G-1's compiler,
though nothing uses them yet. The compiler's source then uses these, making it
G-2 source, only compilable by a G-2-grokking compiler.

Weeks later we have a G-40 where a bug is discovered, introduced in G-20. It
wasn't in the compiler-in-C so that's not useful. Choices include fixing it at
`head', which can sometimes be awkward as described earlier, or fixing the
initial G-20 implementation and then rolling forward all changes from there
assuming the fix doesn't break code that was depending on the errant
behaviour.
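
The generations above can be sketched as a toy model. This is only an
illustration: the `Compiler` type, its fields and the helper functions are
hypothetical names, and it assumes each G-k compiler accepts exactly the
language level its own source is written in.

```go
package main

import "fmt"

// Compiler is a hypothetical model of one compiler binary in the chain.
type Compiler struct {
	Name     string
	Accepts  int // highest language level G-n the binary can compile
	Requires int // language level its own source is written in (0 = C)
}

// Generations builds the chain: compiler-in-C first, then G-1..G-n,
// where the source of each G-k already uses G-k features.
func Generations(n int) []Compiler {
	chain := []Compiler{{Name: "compiler-in-C", Accepts: 1, Requires: 0}}
	for k := 1; k <= n; k++ {
		chain = append(chain, Compiler{
			Name:     fmt.Sprintf("G-%d", k),
			Accepts:  k,
			Requires: k,
		})
	}
	return chain
}

// CanBuild reports whether binary b can compile the source of target.
// Level 0 means C source, which no compiler in this chain handles.
func CanBuild(b, target Compiler) bool {
	return target.Requires >= 1 && b.Accepts >= target.Requires
}

// Builders lists the names of the binaries in chain able to compile head.
func Builders(chain []Compiler, head Compiler) []string {
	var names []string
	for _, b := range chain {
		if CanBuild(b, head) {
			names = append(names, b.Name)
		}
	}
	return names
}

func main() {
	chain := Generations(40)
	head := chain[len(chain)-1]
	fmt.Println(Builders(chain, head))        // prints [G-40]
	fmt.Println(CanBuild(chain[0], chain[1])) // prints true
}
```

In this model only G-40 can rebuild G-40, so the compiler-in-C is of no help
once the source has moved past G-1.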

~~~
pjmlp
> That's not how the books say to do it, and you're right, given who created
> Go you'd think they'd know this stuff. :-)

Given that compiler design was one of my three main focuses in my CS degree, I
read a few books along the way. :)

> Many generations of the compiler are created. Let's say the compiler-in-C is
> worked on until it compiles subset Gosub1 which is just enough to write
> compiler-in-Gosub1 that duplicates compiler-in-C's behaviour. From now,
> compiler-in-C atrophies. G-2 features are implemented in G-1's compiler,
> though nothing uses them yet. The compiler's source then uses these, making
> it G-2 source, only compilable by a G-2-grokking compiler.

It is not required to do this in such a fine-grained way.

The first version of the primitive language can already be good enough to
offer the minimal set of features to compile itself.

Afterwards the full language compiler gets implemented in this minimal version
and used for everything else.

There aren't a thousand versions of the compiler; you just need to be
restrictive about what is used in the base compiler.

> Weeks later we have a G-40 where a bug is discovered, introduced in G-20. It
> wasn't in the compiler-in-C so that's not useful. Choices include fixing it
> at `head', which can sometimes be awkward as described earlier, or fixing
> the initial G-20 implementation and then rolling forward all changes from
> there assuming the fix doesn't break code that was depending on the errant
> behaviour.

As I explained, this is not required because you only have G-2 as a starting
point, which is able to compile whatever is the current version of the
language.

Additionally, you get the benefit of eating your own dog food and, as a
compiler designer, checking whether you are making the right design decisions
about how the language works.

~~~
ralph
Sorry, I don't understand. You seem to be saying there are only two versions of
the compiler, one written in a foreign language, e.g. C, the other in a
subset, e.g. Gosub, called G-2. But then there's "whatever is the current
version of the language", which suggests to me incremental improvements, e.g.
the language develops as experience is gained rather than being fully planned
on day one. So doesn't G-2 undergo changes to implement these? You may keep
calling it G-2 but there are many (I never said thousands) of versions of it.

~~~
pjmlp
Let's use Go as an example.

Now that the Go 1.0 release exists and is stable, one could write a Go
compiler using Go 1.0.

Eventually the compiler will reach a state that it can fully compile Go 1.0.

Now replace the C implementation of Go 1.0 by this new compiler and use it to
write Go X.Y using only Go 1.0 features.

When the need to target a new OS or CPU arises, add a new backend that
generates code for the desired target system in the Go 1.0 compiler.

Use the cross-compiler to compile itself with the new backend. Copy the binary
to the new system, now use the Go 1.0 compiler to compile the Go X.Y version,
whatever X and Y are.

You don't need to use multiple versions of the language, and keeping the
feature set of the base compiler small makes it easier to write
cross-compilers.
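
The porting steps above can be sketched as a toy model. This is only an
illustration under stated assumptions: the `Binary` type, `Compile`, `Port`
and the platform string "newos/newcpu" are hypothetical and are not part of
any real Go tooling.

```go
package main

import "fmt"

// Binary is a hypothetical model of a compiler executable.
type Binary struct {
	RunsOn  string // platform the executable itself runs on
	Targets string // platform it generates code for
}

// Compile models running compiler c on the compiler's own source with the
// given backend selected. The result runs on whatever c generates code for
// and emits code for the selected backend.
func Compile(c Binary, backend string) Binary {
	return Binary{RunsOn: c.Targets, Targets: backend}
}

// Port follows the two steps: build a cross-compiler on the host, then use
// it to compile the compiler itself for the new target.
func Port(native Binary, target string) (cross, ported Binary) {
	cross = Compile(native, target)
	ported = Compile(cross, target)
	return cross, ported
}

func main() {
	native := Binary{RunsOn: "linux/amd64", Targets: "linux/amd64"}
	cross, ported := Port(native, "newos/newcpu")
	fmt.Printf("cross:  %+v\n", cross)
	fmt.Printf("ported: %+v\n", ported)
}
```

The `ported` binary is the one that gets copied to the new system, where it
can then compile the Go X.Y sources natively.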

~~~
ralph
This is flawed AFAICS. It assumes that because 1.0 is fixed in specification,
there are no bugs in the implementation. To return to my original point,
these smart guys are on the record stating that's why they didn't do a self-
hosting compiler; good enough for me. :-)

~~~
pjmlp
How is this different from having bugs in the C compiler used for the language
implementation?

~~~
ralph
It is different, because of the stability of the C compiler, its bigger test
audience, etc., compared to a new language under rapid development.

~~~
pjmlp
Except that is a false assumption.

Have you ever done multiplatform C development using OS-vendor-specific C
compilers?

There are lots of nice bugs to be found, just check the available bug
databases of any C compiler.

So this does not make it any better.

~~~
ralph
Yes, C across AIX, Suns, Silicon Graphics, whatever those HP ones were, and
others. Platform differences were common, bugs rare because many had been
there before me, and they could always be worked around; I didn't have to fix
a C compiler. When writing a compiler the aim _is_ to fix the compiler.

This isn't getting us anywhere. We disagree. I value the opinion of that lot
given their many decades of experience. I used to have your opinion, based on
textbooks. They've made a good point, one I can see has considered thought
behind it.

~~~
pjmlp
> I used to have your opinion, based on textbooks.

I do have compiler development experience, but alas, as you say, this is not
getting us anywhere.

