
Compilers in OpenBSD - hebz0rl
http://marc.info/?l=openbsd-misc&m=137530560232232&w=2
======
Ologn
A decent picture of the history is given in the OP:

* "gcc 2.5 (at the time) had a few bugs, but not many"

* "schism between gcc 2.8, conservative...and the ``Pentium gcc'' group...[Pentium gcc group was] stretching the optimizer code beyond its limits"

* "These projects eventually merged as gcc 2.95...gcc [now] had bugs"

But what does this historical lesson tell us?

Stallman was conservative, slow-moving and cathedral-like with 2.8. This
approach helped keep bugs out of the code.

The "pentium gcc" (Cygnus/egcs) group was quickly responding to marketplace
needs. It was more bazaar-like. It committed code more freely than GNU would,
and the code, while allowing for new functions, was not always that well
architected.

So what was the deal with this schism and the subsequent merger? What
happened was egcs (the Cygnus-oriented one, the "pentium gcc"
one) began eclipsing gcc. Toward the end it _really_ began eclipsing gcc. It
was not as solid as gcc, but it had all the new functionality people wanted.
All over, people were seriously about to abandon gcc and go with egcs. At this
point Stallman threw up his hands and accepted that the egcs approach had won.
They merged, and gcc became more liberal about what code it would commit, at
the expense of being a solid code base. Just like the OP says.

So now what is different this time around? Why is a compiler which prioritizes
stability and correctness over new functionality and optimizations going to
win? The latter approach won last time around, why should the first approach
win this time? Especially since in battles between cathedral/waterfall
projects and bazaar/agile projects, the bazaar/agile approach seems to come
out on top again and again. OpenBSD can afford to go this route if it wants
because OpenBSD fills a marginal niche. It might even be interesting to watch
OpenBSD go down this road. But for more mainstream OS's like Linux, this
approach might not be possible.

And if anyone mentions Apple - Apple is not marginal, but it is a niche. GCC
and Linux are in a multitude of environments. A company like Apple with its
own ecosystem and only a handful of targets can afford to pick and choose its
compiler.

~~~
fusiongyro
Both you and Chuck see this situation as naturally arising out of some aspect
of open source software. I see this as being simply a side-effect of
particular compiler developers enjoying optimization more than portability.
Couldn't it just be priorities on these teams that led them to this situation,
without it being some kind of indirect result of some over-arching
political/economic phenomenon?

~~~
asabjorn
I agree, and as an academic I can say that optimizations are a sure sell for an
academic paper, and I think most new optimizations are first developed in
academic work in industry or at universities. Portability, on the other hand,
seems like a hard sell, and if it sounds too much like engineering it is
academic suicide.

------
ChuckMcM
Ok, that is a pretty compelling argument in favor of non-open source software.

I read it to say that GCC is so open source that it cannot converge on a
stable release. Further there isn't a non-commercial (aka free) incentive for
making it stable, so it doesn't converge. Rather it trundles along from one new
optimization strategy to the next, constantly in a state of minor bugginess.
The economics of 'sold' products use the loss of revenue as the incentive to
maintain quality; without that incentive it's hard.

Google has (had?) a pretty good sized team that did nothing but maintain GCC.
I'm sure it cost them easily $1M/year to keep that team going. There is no
incentive for them to fund a team like that in a third party such that
everyone else benefits from their work. Sure they offer the changes back into
the base product, and somewhere else there is another team working for company
Y that is taking those, porting them into their effort. In this article from
Marc he mentions himself and 5 other developers who are the "compiler people".
5 developers, $120K each, that is $0.6M/year before you add insurance and office
space.

And those 5 have their lives made more difficult by the dozen or so folks who
are committing in changes that destabilize parts of the code or require side
ports.

It makes me wonder how many people there are like me who would be willing to
pay $100/year for a bespoke C compiler that was supported by a single source
and stable.

~~~
foobarbazqux
> It makes me wonder how many people there are like me who would be willing to
> pay $100/year for a bespoke C compiler that was supported by a single source
> and stable.

Did you look at ICC / XLC / MSVC? They typically outperform GCC by about 20%,
although I haven't checked in a while.

~~~
neckbeard
You should probably check again, but in any case MSVC shouldn't be compared to
gcc/icc/clang - it can't even compile code written for a 14 year old language
specification (C99), so no sane person should use it for C development these
days.

~~~
TwoBit
FWIW, Microsoft hasn't made an effort to support much of C99 because almost
none of their users (Windows and XBox developers) use C. I don't know any
Windows or XBox programmers who use C.

Your point is still valid: if you want to compile C99 code, MSVC is not even
an option.

~~~
foobarbazqux
It depends on the features you use. If all you want is variadic macros, long
long, __FUNCTION__, and stdint.h, it has those things.
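
A minimal sketch of what code limited to those four features looks like. The names `LOG` and `square_ll` are made up for illustration, and nothing here has been checked against a particular MSVC release. Note that `__FUNCTION__` is a compiler extension; the C99 spelling is `__func__`.

```c
#include <stdint.h>   /* C99 fixed-width integer types */
#include <stdio.h>

/* C99 variadic macro. __FUNCTION__ is passed as a "%s" argument because
 * not every compiler treats it as a string literal. */
#define LOG(fmt, ...) printf("%s: " fmt "\n", __FUNCTION__, __VA_ARGS__)

/* long long is at least 64 bits wide, so squaring any int32_t value
 * cannot overflow it. */
long long square_ll(int32_t x)
{
    long long wide = x;

    LOG("squaring %lld", wide);
    return wide * wide;
}
```

Anything beyond this subset (designated initializers, compound literals, variable-length arrays and so on) is where portability to a compiler without full C99 support would need to be checked.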

If you don't care about sticking to C, you can usually get what you want with
a C++ feature anyway.

[http://stackoverflow.com/questions/3879636/what-can-be-done-...](http://stackoverflow.com/questions/3879636/what-can-be-done-in-c-but-not-c/3880281#3880281)

------
justincormack
For an example of a recent bug try this
[http://gcc.gnu.org/bugzilla//show_bug.cgi?id=56888](http://gcc.gnu.org/bugzilla//show_bug.cgi?id=56888)
Basically, the compiler tries to recognise code that manually does memcpy or
memcmp and replaces it with the built-in version, even if you are trying to
compile libc itself, where you get an infinite loop when this happens.
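
A minimal sketch of the kind of code involved, assuming a libc-style byte-copy loop. `my_memcpy` is a hypothetical name, chosen so the example can coexist with the real libc. If the function were actually named `memcpy`, replacing its loop with a call to `memcpy` would make it call itself without end.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libc-internal memcpy. A plain byte-by-byte
 * loop like this is the shape an optimizer can recognise as "a memcpy"
 * and rewrite into a call to the library routine. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

GCC exposes this transformation as `-ftree-loop-distribute-patterns`, so building such a file with `-fno-tree-loop-distribute-patterns` is one way to switch it off.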

~~~
jff
My colleagues and I often lament how gcc is just so much smarter than all of
us. Usually after spending a day figuring out why our code wasn't working,
only to discover that gcc was doing something "clever".

~~~
gngeal
This one sounds almost like peephole optimization of sorts. How exactly is
that "clever"?

~~~
duaneb
I assumed it was the ironic sense—it was intended to be clever but just ended
up being a bug.

------
bch
No mention of pcc[1] except note that it was orphaned in the early 90s. I see
here[2] it's been removed from OpenBSD's base system. What happened to pcc? I
had high hopes for it. Is there no chance it'd become a viable compiler?
(fwiw, it's still included in NetBSD base system).

    
    
      [1] http://pcc.ludd.ltu.se/
    
      [2] http://comments.gmane.org/gmane.os.openbsd.misc/196817

~~~
justincormack
But it can't even compile NetBSD
[http://blog.netbsd.org/tnf/entry/portable_c_compiler](http://blog.netbsd.org/tnf/entry/portable_c_compiler)
which is disappointing (it's much easier than Linux).

------
oofabz
>The last de-facto LTS compiler we have had was gcc 2.7.2.1

The new de-facto LTS compiler is gcc 4.2.1, the last version released under
GPLv2. After gcc switched to GPLv3, Apple and FreeBSD stayed on 4.2.1.

~~~
zdw
I'm not sure all of OpenBSD's platforms have code generation support in 4.2.1
- for example, the m88k issue given in the article.

But that is definitely a good place to stop/start (depending on how you look
at it)

------
mjs
"...but there is something I wish would happen first.

An LTS release of an open source compiler."

Surprising that this doesn't already exist--Apple and RedHat and Ubuntu, etc.
must all maintain what is in effect a LTS version of the compilers they ship,
in the same way that OpenBSD does.

~~~
yassim
Forgive my ignorance, but what does 'LTS' stand for?

~~~
DomreiRoam
LTS: Long Term Support. It is a version that the vendor/community will support
for a long time. Here is the policy for Ubuntu:
[https://wiki.ubuntu.com/LTS](https://wiki.ubuntu.com/LTS) .

------
ternaryoperator
I'm curious why compiler testing appears to be so hard. It seems to me that:
Given this C input, this AST should be built, and on this platform, this code
should be generated. This should be testable through automated scripts and
numerous test cases could be created for new features resulting in huge
regression suites. Am I missing something, or is it the effort of putting
together such tests that's the problem?

~~~
noselasd
gcc does have a _huge_ test suite.

The problem is if you combine all the various flags that affect the compiler,
across all the architectures, across all the platforms, in all its variants
(cross compiler, native, the many libc and barebones variants),
you're looking at too many tests to run no matter how huge an infrastructure
you have to run them.

Another problem is that optimization depends a lot on context. Given the
amount (basically infinite) of C code that could surround any other piece of C
code and affect the result, it's quite a hard task.

One interesting approach is csmith,
[http://embed.cs.utah.edu/csmith/](http://embed.cs.utah.edu/csmith/), which
generates random C programs and looks for bugs.
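
A hand-written sketch in the spirit of such generated programs (this is not actual csmith output, and the names `g_a`, `g_b`, `mutate` and `checksum` are made up). The program takes no input, avoids undefined behaviour, and boils its final state down to one checksum. Build it with two compilers or two optimization levels, and if the checksums differ, one of the builds is wrong.

```c
#include <stdint.h>

/* Global state that the "random" code mutates. */
static uint32_t g_a;
static uint32_t g_b;

/* Only shifts, xors and additions on small non-negative values are used,
 * so every conforming compiler must compute the same result. */
static void mutate(void)
{
    unsigned i;

    for (i = 0; i < 10; i++) {
        g_a = (g_a << 3) ^ (g_a >> 5) ^ g_b;
        g_b = g_b + (g_a | 1u) + i;
    }
}

/* Reset the state, run the mutations, and fold the final state into a
 * single number that is easy to diff across builds. */
uint32_t checksum(void)
{
    g_a = 0x12345678u;
    g_b = 7u;
    mutate();
    return g_a ^ (g_b << 1);
}
```

In use, a `main` would print `checksum()` and a script would compare the output of, say, an unoptimized and an optimized build of the same file.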

~~~
scott_s
Their PLDI paper, "Finding and Understanding Bugs in C Compilers", is an
amazing read:
[http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf](http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf)

------
artagnon
The only software that isn't buggy is dead software with no users. All
software has bugs, and that is a fact of life.

Clearly OpenBSD developers haven't been involved with the compiler engineering
communities, and their wishes have been neglected over many years; this is not
news. Why? Because there are no _users_. Bugs don't get fixed by bitching
about them: they get fixed when you get involved with upstream and write
patches.

GCC can be as "conservative" or "cathedral-like" as it wants: if it does not
produce sufficiently optimized code for the big users, it will be thrown out
the window. Today, GCC is in active development and has more users than anyone
else. The leadership is strong: RMS and many of the GNU people are heavily
involved. Those are the facts.

LLVM is the other elephant in the room: from my experience posting patches on
their list, they don't give a shit about getting llvm/clang to build
linux.git, or many of the projects that currently use GCC. Although it might
be technically superior (the code is much more readable and maintainable), the
community is much too narrow. Moreover, the leadership is gone: most of the
top contributors (Chris Lattner, Evan Cheng, Reid Spencer) seem to have lost
interest in the project.

The proprietary compilers like ICC are mostly useful just in research. Sure,
they produce highly optimized code, but they're black boxes that cannot be
studied or tinkered with. I tried compiling git.git with ICC a few years ago
out of curiosity [1]: pages and pages of totally pointless warnings; gcc and
clang both clean-compiled git.git at that point.

What the community needs is a compiler project with a strong leadership that
cares deeply about its users, not a dead "LTS" project that nobody else gives
a shit about: nobody wants to work on a project that's in maintenance-mode.
Hardware, programming languages, and compilers evolve constantly, and
programmers must learn to cope with these changes.

Fwiw, I'd really like to see what "bugs" this guy is talking about. If they
really don't care about hardware, programming language, and compiler
technology advancement, why don't they just maintain a port of an older
version of GCC? Why bother with new versions at all?

[1]:
[https://gist.github.com/anonymous/1367335](https://gist.github.com/anonymous/1367335)

~~~
mansr
> LLVM [...] don't give a shit about getting llvm/clang to build linux.git.

The LLVMLinux project is making good progress towards this goal.
[http://llvm.linuxfoundation.org/](http://llvm.linuxfoundation.org/)

------
sarnowski
Refreshing to know that even settled projects have those same problems at
their core. Depending on upstream developers is always a big risk you have to
calculate well. Was always a pain and only gets bigger in our agile *aaS
world.

~~~
tenfingers
Snarky comment (be prepared), but I read *aaS as Shit as a Service the first
time around.

------
Nursie
The g++ 4.7 and/or libstdc++ that goes with it from their pkg-add repo is
b0rked at the moment. Exceptions don't work.

------
teddyh
If people switch from GCC to Clang/LLVM in enough numbers that Apple think
they can get away with it, Apple will, in a heartbeat, close the development
of Clang/LLVM and make all new versions proprietary.

~~~
axman6
You understand Apple aren't the only developers of Clang, right? It's their
project, but there's nothing stopping anyone else (say Google, which is also a
huge contributor) from forking their own version if Apple decide to do
something that would surely hurt them more than help them.

------
pdw
It seems like they might like CompCert. An optimizing C compiler that comes
with a formal proof of correctness. Downsides: GPL, doesn't quite implement
the full C language (but it's getting closer), and no support for ancient crap
such as m88k.

EDIT: I misunderstood the license file, the majority of the code is non-
commercial-use only, only a small part is dual-licensed. Still a cool project,
even if it's not open source...

~~~
pascal_cuoq
The OP's problem is with the support for the exotic targets and the more
recent additions to the C standard (C99, soon C11). CompCert targets only x86
and PowerPC, and does not support all the features of C99, not to mention the
GCC extensions that everyone has come to rely on. In fact, in terms of these
criteria it is uniformly worse than GCC 2.7.2.1 (although it is better at
having verified, bug-free semantics).

~~~
pdw
That was only his second complaint.

    
    
      "First, compilers are fragile. While one would like to expect a minimum
      level of correctness and trustworthiness from a modern compiler, we
      can't, regardless of the compiler we use."
    

CompCert (had it been truly open source) would have provided that
trustworthiness. And it can be ported to new architectures with the confidence
that these ports won't silently break by random other changes.

~~~
mansr
There are two fundamental problems with CompCert: 1) there is no formal,
machine-readable specification of the C language, and 2) there is no formal,
machine-readable specification of the target architectures. CompCert may be
formally verified, but it's not necessarily a C compiler (even ignoring that
it implements only a subset of C), nor does it necessarily compile for the
advertised architecture.

~~~
qznc
Addressing number 1)
[https://code.google.com/p/c-semantics/](https://code.google.com/p/c-semantics/)

Not perfect, but pretty good.

------
brainflake
I love seeing posts like this from OpenBSD team members. I cut my chops
developing on that platform and I have to say I've never encountered a more
clean and well thought-out code base.

------
sc68cal
Hopefully the OpenBSD members will reach out to the FreeBSD members, who can
share a lot of knowledge about their transition from GCC in the base system,
over to Clang/LLVM.

~~~
zeckalpha
FreeBSD maintains fewer platforms than OpenBSD.

------
dschiptsov
Why not just have clang installed in /usr/local?) And gcc 4.2.1 is good
enough.

------
adultSwim
People seeing the value of correctness. Proofs >> Tests.

~~~
fusiongyro
I don't think that's the message here at all. The only way to prove code
correct is by reference to some kind of specification. If that spec is a
standard full of implementation-defined behavior (and worse, contradicted in
practice by all the other vendors) a correctness proof is not really going to
convey what it sounds like it would.

What Miod is really asking for is an open source compiler that puts stability
and portability over compiler optimization. If it weren't for optimizations
and the endless fiddling that goes with them, gcc would have remained stable.
It is already the only option that meets their portability requirements.

~~~
sigstoat
i agree that miod isn't talking about proofs. but i don't think adultSwim was
making that claim, either.

> The only way to prove code correct is by reference to some kind of
> specification.

there are other interesting properties to prove besides "correctness of the
entire compiler".

you could prove the entire compiler version n+1 emits the same code as version
n, modulo $bugfix.

you could prove that individual optimizations are, on their own, correct with
regards to the AST or IR that gcc operates on.

neither of those requires a formalized spec of C. the first would probably make
miod pretty happy. sadly one isn't going to get gcc manhandled into a proof
assistant, ever.
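
to make the second idea concrete, here is a toy sketch, assuming a made-up expression IR with one variable (nothing like gcc's real IR; all names are hypothetical). it only checks a constant-folding pass on sampled inputs, which is a test rather than a proof, but the property being checked ("the pass does not change what the expression means") is the one a proof would establish for all inputs.

```c
#include <stddef.h>

/* Toy expression IR: a constant, the single variable x, or a binary op. */
enum kind { CONST, VAR, ADD, MUL };

struct expr {
    enum kind kind;
    unsigned value;             /* used when kind == CONST */
    struct expr *lhs, *rhs;     /* used when kind == ADD or MUL */
};

/* Reference semantics: evaluate the tree for a given x. Unsigned
 * arithmetic is used so that wraparound is well defined. */
unsigned eval(const struct expr *e, unsigned x)
{
    switch (e->kind) {
    case CONST: return e->value;
    case VAR:   return x;
    case ADD:   return eval(e->lhs, x) + eval(e->rhs, x);
    default:    return eval(e->lhs, x) * eval(e->rhs, x);   /* MUL */
    }
}

/* The "optimization": fold ADD/MUL nodes with two constant operands. */
void fold_constants(struct expr *e)
{
    if (e->kind != ADD && e->kind != MUL)
        return;
    fold_constants(e->lhs);
    fold_constants(e->rhs);
    if (e->lhs->kind == CONST && e->rhs->kind == CONST) {
        e->value = eval(e, 0);  /* x is irrelevant: no VAR below here */
        e->kind = CONST;
    }
}

/* Build (2 + 3) * x, fold it, and compare meanings before and after.
 * Returns 1 if they agree on every sampled input and the fold happened. */
int folding_preserves_meaning(void)
{
    struct expr two = { CONST, 2, NULL, NULL };
    struct expr three = { CONST, 3, NULL, NULL };
    struct expr x = { VAR, 0, NULL, NULL };
    struct expr sum = { ADD, 0, &two, &three };
    struct expr prod = { MUL, 0, &sum, &x };
    unsigned inputs[4] = { 0u, 1u, 7u, ~0u };
    unsigned before[4];
    int i;

    for (i = 0; i < 4; i++)
        before[i] = eval(&prod, inputs[i]);
    fold_constants(&prod);
    for (i = 0; i < 4; i++)
        if (eval(&prod, inputs[i]) != before[i])
            return 0;
    return sum.kind == CONST && sum.value == 5;
}
```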

~~~
fusiongyro
Those are interesting and extremely good ideas, thanks for pointing them out.

I do wonder if OpenBSD would accept a C compiler built on a small functional
language amenable to these kinds of proofs. I suppose fulfilling the
portability requirements is priority one, then stability. I just wonder if
they would accept something not written in C at all.

