
How Much Does a Compiler Cost? - ingve
https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/
======
filereaper
As a former compiler developer, I think it's a pretty bad idea to look at the
final line count to determine a compiler's cost.

Typically in compiler development, there's massive R&D time, many hours spent
analyzing memory & CPU patterns to figure out which code to emit. You also
take a look at the disassembly of generated code and nowadays perform data
mining to figure out which patterns to optimize for (typically on non-kernel
or hotspot-heavy type workloads).
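
To make the data-mining step concrete, here is a minimal sketch in Python. It assumes a simplified listing with one instruction per line and the opcode first (real disassembler output needs more parsing than this). The name `opcode_pairs` and the sample listing are made up for illustration. It only counts adjacent opcode pairs, which is roughly the kind of frequency data you might look at before deciding which patterns deserve an optimization.

```python
from collections import Counter


def opcode_pairs(disassembly: str) -> Counter:
    """Count adjacent opcode pairs in a simplified disassembly listing.

    Assumes one instruction per line with the opcode as the first word
    (a hypothetical format, not the output of any particular tool).
    """
    opcodes = [line.split()[0] for line in disassembly.splitlines() if line.strip()]
    # Pair each opcode with its successor and tally the pairs.
    return Counter(zip(opcodes, opcodes[1:]))


# Hand-written sample, not taken from a real binary.
SAMPLE = """
mov rax, [rdi]
add rax, 1
mov [rdi], rax
mov rax, [rsi]
add rax, 1
mov [rsi], rax
"""

pairs = opcode_pairs(SAMPLE)
for pair, count in pairs.most_common():
    print(pair, count)
```

On a real code base you would run something like this over far more code and weight the counts by profile data, so that hot paths matter more than cold ones.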

The end result of a lot of investigation can be a few lines of code which have
to be absolutely robust and production ready at the time of release under
standard configurations (e.g. -O1, -O2, etc.).

It's a costly piece of engineering, which is why some companies still charge
money for it.

I'd say the article's estimates are accurate for a simple compiler that emits
results at say -O0 or -O1 level of efficiency, but anything higher will start
to require more and more resources with diminishing gains over time. But this
can be said about most things in engineering.

------
catpolice
> LLVM is smaller at 1.6 million lines, but is newer, supports only C and C++
> by default

Unless I'm misunderstanding "by default", this isn't true. LLVM is language
agnostic and supports many languages - even the primary C-like frontend Clang
supports more than just C/C++.

~~~
wolfgke
> LLVM is language agnostic

There exists no such thing as "language agnostic". Every intermediate
representation contains properties (very often implicitly) that are specific
at least to a class of programming languages.

~~~
whatshisface
If the intermediate representation is no more language-specific than x86, then
it isn't really the _compiler_ that's language-specific: it's modern computer
architecture.

~~~
FeepingCreature
LLVM IR is fairly C specific. For instance, you better not be trying to write
a language where null pointer access is defined...

~~~
nicwilson
We compile D with LLVM. Null pointer access is defined to crash, unlike C/C++
where it is undefined.

~~~
pcwalton
Are you adding explicit checks (possibly using fault maps to optimize them)?
Because if not, you're depending on undefined behavior.
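
For anyone unsure what "explicit checks" means here, a rough sketch in Python of a frontend pass that inserts a check before every memory access, instead of relying on the hardware to fault. The tuple-based IR and the `trap_if_null` opcode are hypothetical, made up for illustration. This is not LLVM's API, and not a claim about how any D compiler actually does it.

```python
def insert_null_checks(instrs):
    """Return a new instruction list with an explicit check before each access.

    Toy IR (hypothetical): ("load", dst, ptr), ("store", value, ptr),
    and anything else is passed through untouched.
    """
    out = []
    for ins in instrs:
        if ins[0] in ("load", "store"):
            # The pointer operand is the last field in both forms.
            out.append(("trap_if_null", ins[2]))
        out.append(ins)
    return out


program = [
    ("load", "%x", "%p"),
    ("add", "%y", "%x", "1"),
    ("store", "%y", "%q"),
]
checked = insert_null_checks(program)
for ins in checked:
    print(ins)
```

With the check spelled out in the IR, the crash is something the program asks for, so it does not depend on what the optimizer assumes about null dereferences.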

~~~
nicwilson
That is only necessary for single objects that are bigger than one page, which
we don't currently do (but should). Arrays are no problem since we check the
bounds; only raw pointers cause problems.

~~~
pcwalton
No, it is necessary in all cases. Look at the InstCombine code I linked in a
sibling comment…

------
gumby
In a decade Cygnus spent well over $250 million on GCC and its related tools
-- well our customers did. Not sure what that is in today's dollars.

~~~
solarkraft
Can you break down where the money likely went and how it was spent?

~~~
gumby
This was between 1989 (when we started in my living room) and 1999 when I
left. By 1999 we had about 160-170 developers working on gcc, gdb, binutils,
glibc, Cygwin etc. The money was a mixture of 1> ports (typically paid for by
the manufacturer), 2> maintenance (mostly paid for by the corporate end user
-- for $100K/year we'd fix your bug whether we spent $100 to make a trivial
fix or $300K on a major problem) 3> maintenance on very old versions that
people had to freeze on for one reason or another and 4> the public tree (we
wouldn't work on non-free software, apart from a few abortive experiments
after I had left).

Making sure everything we wrote was compatible with the public tree was
expensive. In the end, one of the last things I did saved a lot of money and
improved code quality for everyone: I forked gcc from the FSF (and made the
egcs tree the _de facto_, and eventually _de jure_, version).

Nowadays forking is considered routine, and even beneficial, and projects have
steering committees that are responsible for master releases. At that time,
forks were considered a _tragedy_, and it took me over 6 months of discussion
with various people to build a reasonable consensus (you can see the people's
names at the bottom of the letter I wrote announcing it:
[https://gcc.gnu.org/news/announcement.html](https://gcc.gnu.org/news/announcement.html)).
It's been a good model. Note that RMS's name is _not_ on the letter -- he
was furious and was sure this would destroy the FSF. Instead it has
strengthened it.

By the way, to compare these numbers: Red Hat went public with, AFAIK, fewer
than 10 engineers, and when Cygnus and Red Hat merged, though both companies
had about 200 people, Cygnus had over 160 engineers while Red Hat had a couple
of dozen. Of course we had a lot more revenue, and in the years since RHAT has
invested heavily in free software development. But their company value at that
time was something like 3X ours -- a very valuable lesson.

~~~
oblio
How did Cygwin come about?

~~~
gonzo
[http://www.toad.com/gnu/cygnus/](http://www.toad.com/gnu/cygnus/)

~~~
oblio
Maybe I'm misreading that, but it doesn't say anything about Cygwin, per se,
just about Cygnus, the company.

------
phkahler
I find it odd that they did an LLVM port and even another blog post about
dealing with 16-bit char types, but none of the target-specific changes are
going upstream. I can understand a company not wanting to help their
competitors, but it sounds like they're still going to try to get the 16-bit
char changes pushed upstream, which is likely one of the more useful pieces
for a competing DSP vendor with a different instruction set.

In 5 years we'll all be using RISC-V with a nice DSP extension anyway... And
after typing that I felt the need to google:
[https://riscv.org/wp-content/uploads/2016/07/Wed1000_dsp_isa...](https://riscv.org/wp-content/uploads/2016/07/Wed1000_dsp_isa_extensions_for_an_OpenSource_RISCV_implementation_Schiavone.pdf)

------
ggambetta
Heh. Remember when you had to actually _buy_ a compiler? :)

~~~
bradstewart
People still buy compilers. IAR is [1] still the gold standard in the embedded
space.

[1] At least it was 18 months ago when I was still working in the space; I
can't imagine that's changed since.

~~~
peatmoss
In the scientific / numeric space it’s not unheard of to buy Intel Fortran/C
compilers for performance reasons.

Also, it’s been a few years since I touched IBM’s POWER hardware with AIX, but
I’m pretty sure XLC and whatever the Fortran compilers were didn’t come free.

~~~
wycy
My organization uses the Intel Fortran compiler. I did a few rough benchmarks
at one point and found it produced code nearly 3x faster than the same code in
gfortran.

~~~
gnufx
I hear this a lot, but if it's with equivalent optimizations it's quite
pathological and should be reported as a bug; it's certainly not generally the
case. They are typically similar at the 10-20% level in my experience (not
always in ifort's favour). There are several significant HPC codes which
recommend GCC where it matters. You also need to consider reliability; GCC is
surprisingly better in my experience -- I've quite often solved users'
problems with "just try GCC".

There used to be a published set of results from Polyhedron -- presumably to
help to sell proprietary compilers -- which had a geometric mean favouring
ifort by ~20%, if I remember right. However, the last I saw didn't use the
latest GCC, required the same flags for each case (which didn't include
profile-directed, for instance), and the cases with the biggest differences
actually bottlenecked on libraries, not generated code. gfortran is infinitely
faster on most architectures, of course.
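
Since the geometric mean does a lot of work in summaries like that, here is a small Python sketch of how it behaves. The ratios below are invented for illustration and are not Polyhedron data. The point is just that a single outlier case (say, one that bottlenecks on a library) pulls an arithmetic mean much harder than a geometric one.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark runtime ratios (compiler A time / compiler B time).
# These are made-up numbers: three near-ties and one outlier.
ratios = [1.0, 1.1, 0.9, 4.0]

gm = geometric_mean(ratios)
am = sum(ratios) / len(ratios)
print(f"geometric mean: {gm:.3f}")
print(f"arithmetic mean: {am:.3f}")
```

The geometric mean is the usual choice for ratios because a 2x win and a 2x loss cancel out to 1.0, which is not true of the arithmetic mean.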

Most benchmark results I see aren't useful because they don't supply the
parameters needed to be reproducible and they don't have profile information
to allow you to understand, and maybe improve, them.

~~~
gnufx
I found that results from the Polyhedron benchmarks are online again:
[https://www.fortran.uk/fortran-compiler-comparisons/polyhedr...](https://www.fortran.uk/fortran-compiler-comparisons/polyhedron-benchmarks-linux64-on-intel/).
Note that they compare
gfortran from 2015 and ifort from 2017. I don't know how representative they
are of the part of the workload on typical HPC systems for which the compiler
might be important.

------
msla
> For a commercial system, the compiler has to be completely reliable—whatever
> the source code, it should produce correct, high performance binaries.

Well, that's certainly the ideal case. I can only wonder what historic defect
rates have been for commercial compilers.

~~~
eesmith
Not that it answers the question, but here are some VC++ bugs found in
Chromium development:
[https://randomascii.wordpress.com/2016/03/24/compiler-bugs-f...](https://randomascii.wordpress.com/2016/03/24/compiler-bugs-found-when-porting-chromium-to-vc-2015/).
HN discussion at
[https://news.ycombinator.com/item?id=11361151](https://news.ycombinator.com/item?id=11361151).

------
neokantian
C/C++ compilers are different from other compilers. They are forced to support
all kinds of complicated, too-many-cooks-spoil-the-broth, design-by-committee
standards. If you just wanted to do another portable assembler, your new
compiler would take just a fraction of the effort.

------
z29LiTp5qUC30n
Posts like this make me think of this:
[https://github.com/oriansj/M2-Planet](https://github.com/oriansj/M2-Planet)
A C compiler with support for inline assembly, gotos, and structs that weighs
in under 2000 lines and is self-hosting.

------
sesteel
I was hoping this was an analysis of how much time and money is spent while
waiting for n lines of code to compile with various compilers. I find waiting
for things to build, deploy, install dependencies, etc. to be the most
mind-numbing aspect of the field. I can hardly stand it.

------
jokoon
I wonder, are there languages that are cheaper yet have similar features?

For example, how much is Rust? Python?

It's weird how a compiler can cost so much, yet it can still produce unsafe
code because the language is not safe by design. From that perspective, Rust
might save a lot of money.

~~~
gambiting
You cannot produce a Rust or a Python compiler without a working C/C++
compiler first anyway, so the point is slightly moot. If there exists a C/C++
compiler for the platform you want to build for, just build Python for it and
you're good to go.

~~~
adrianN
You could produce a Rust or Python compiler without a working C compiler if
you wanted to.

~~~
gambiting
Yes, but then you are running into all the same issues that writing a C
compiler would have, you're just skipping the middle step.

------
bernardino
Does anyone know of any resources for learning about compilers and building a
small, simple compiler for beginners?

~~~
ndh2
I watched a couple videos from this and was good to go.
[https://lagunita.stanford.edu/courses/Engineering/Compilers/...](https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about)

My strategy was to start with a very small grammar and grow it.
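
As a rough idea of what "very small" can mean, here is a sketch in Python (not taken from the course, and all names are made up). The whole grammar is `NUMBER (('+' | '-') NUMBER)*`, it compiles to instructions for a toy stack machine, and a few lines of interpreter check the result.

```python
import re


def tokenize(src):
    # Numbers, the two operators, or any other single non-space character
    # (which the parser will then reject).
    return re.findall(r"\d+|[+\-]|\S", src)


def compile_expr(src):
    """Compile NUMBER (('+'|'-') NUMBER)* into toy stack-machine code."""
    tokens = tokenize(src)
    code = []
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdecimal():
            raise SyntaxError(f"expected a number at token {pos}")
        code.append(("push", int(tokens[pos])))
        pos += 1

    number()
    while pos < len(tokens):
        op = tokens[pos]
        if op not in ("+", "-"):
            raise SyntaxError(f"expected '+' or '-' at token {pos}")
        pos += 1
        number()
        # Operands are already on the stack, so emit the operator last.
        code.append(("add",) if op == "+" else ("sub",))
    return code


def run(code):
    """Tiny interpreter for the instructions emitted above."""
    stack = []
    for ins in code:
        if ins[0] == "push":
            stack.append(ins[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if ins[0] == "add" else a - b)
    return stack.pop()


print(compile_expr("1 + 2 - 4"))
print(run(compile_expr("10 - 3 + 5")))
```

Growing it then means adding one rule at a time, for example a `term` level for `*` and `/`, then parentheses, then variables.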

