
Ask HN: Compiler Engineers, what would you advise new grads/students to learn? - generichuman
What skills, technologies, etc.?

Inspired by https://www.reddit.com/r/cscareerquestions/comments/d1fqwd/people_of_this_sub_who_have_jobs_as_data/
======
jlebar
Maybe you're different than me, but I have a tough time really learning
something without a concrete task at hand. I also find that I have a tough
time learning if I don't have someone to ask questions of. And I find that
online developer communities are much more welcoming of newb questions after
you've contributed something concrete.

If it were me (and it really _was_ me just a few years ago), I'd git clone
llvm, join #llvm on freenode, and ask if there's any simple refactoring or
cleanup people are doing. That gives me a chance to absorb the code and also
build some goodwill in the community, which I might then leverage into getting
help on more complex parts of the compiler that I want to hack on.

If you're looking for a project, the nvidia GPU ("NVPTX") backend in LLVM has
a bunch of horrible global variables protected by locks, and it all really
should go away. And it's not hard to find other simpler refactorings to do in
there, it's pretty yucky. No compilers experience needed, just C++ skills.

------
pizlonator
JSC compiler architect here.

First of all make sure you are very comfortable with these concepts:

- SSA

- graph coloring, linear scan, and priority coloring approaches to register
allocation. You really have to know all of them, otherwise you’ll have
incorrect ideas about which is best and when.

- sea of nodes. Not because you will necessarily implement it (it’s not that
great IMO) but because you will definitely use some ideas from it. It’s a very
inspiring concept.

- abstract interpretation

- types, points-to sets, abstract heaps, and the ways that these things are
the same

- instruction selection. This one is tricky because the literature doesn’t
say smart things about it. Gotta read code or talk to people. I learned how to
do it by word of mouth.
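Abstract interpretation, in particular, clicks fastest with a toy example. Below is a minimal sign analysis in Python, a sketch for illustration only (the tuple AST shape and all names are made up, not from JSC or any real compiler): the program is evaluated over abstract values (signs) instead of concrete integers.

```python
# A minimal abstract interpreter: sign analysis over a tiny expression
# language. Toy code for illustration; not from any real compiler.

NEG, ZERO, POS, TOP = "-", "0", "+", "T"  # abstract values (T = unknown)

def alpha(n):
    """Abstraction: map a concrete int to its sign."""
    return ZERO if n == 0 else (POS if n > 0 else NEG)

def add(a, b):
    """Abstract addition over signs."""
    if ZERO in (a, b):
        return b if a == ZERO else a
    return a if a == b else TOP  # "+" plus "-" could be anything

def mul(a, b):
    """Abstract multiplication over signs."""
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG

def eval_abs(expr, env):
    """expr is a nested-tuple AST: ('lit', n) | ('var', x) | (op, l, r)."""
    tag = expr[0]
    if tag == "lit":
        return alpha(expr[1])
    if tag == "var":
        return env[expr[1]]
    op, l, r = expr
    f = {"add": add, "mul": mul}[op]
    return f(eval_abs(l, env), eval_abs(r, env))

# 2 + 3*x is positive whenever x is positive, without running anything:
print(eval_abs(("add", ("lit", 2), ("mul", ("lit", 3), ("var", "x"))),
               {"x": POS}))  # "+"
```

The essential idea carries over to real analyses: pick a lattice of abstract values, define sound abstract versions of each operation, and evaluate (or iterate to a fixpoint, for loops).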

You need to strike a balance between these two activities to become good:

- Write your own compiler and get it to beat other compilers on some
benchmark. It can be a simple benchmark or a simple language. You need to be
comfortable with overall compiler architecture and there is no substitute for
seeing the whole thing come together. Then, after you do this, do it again,
because if you’re like me then your first attempt will be shit.

- Learn a major, mature compiler architecture like JSC, llvm, V8, GCC, or
whatever. Write some measurable improvement to such a compiler. Every major
compiler has brilliant nuggets of awesomeness that you will only come to
understand if you jump in there and try to make it better.

Hope this helps and good luck! You picked a fun profession.

------
woodruffw
I'm not exactly a compiler engineer (I do program analysis research,
frequently with LLVM as my base), but here are some things that come to mind:

* The LLVM ecosystem is an absolute godsend for most instrumentation, analysis, and optimization tasks. The project provides an excellent tutorial on writing LLVM passes[1] that I refer to daily.

* The compiler blogosphere is full of excellent resources, including Eli Bendersky[2], Trail of Bits[3] (fd: my employer), and John Regehr[4].

[1]:
[https://llvm.org/docs/WritingAnLLVMPass.html](https://llvm.org/docs/WritingAnLLVMPass.html)

[2]: [https://eli.thegreenplace.net/tag/llvm-clang](https://eli.thegreenplace.net/tag/llvm-clang)

[3]:
[https://blog.trailofbits.com/category/compilers/](https://blog.trailofbits.com/category/compilers/)

[4]: [https://blog.regehr.org/](https://blog.regehr.org/)

------
saurabh20n
I have a PhD in programming languages, and I hire people with these
backgrounds (compilers, PL, verification, synthesis). As others have said, if
you've looked at LLVM, I'd find that interesting, but for practical use, there
are more useful + interesting + fun things out there.

E.g. SMT solvers (CVC, Z3, or the like) -- infinitely more fun, and they
require experience to truly understand what works and what doesn't. Or if
you've done something really novel with meta-programming, or designed a custom
DSL for a domain.

~~~
314
Your PhD is insanely impressive :) Ras' work on Program Sketching is stellar.
Your new company looks very exciting. Hope your hiring goes well, it will be
fun to see what kinds of verification you can do on smart contracts.

~~~
saurabh20n
Thanks for the very kind words. Email me directly / watch our blog. We'll be
releasing some new results this week.

~~~
314
I've just followed @_syntheticminds on twitter. Looking forward to reading
what you've been up to.

------
enos_feedler
I would start looking at MLIR:

[https://github.com/tensorflow/mlir](https://github.com/tensorflow/mlir)

This is a new compiler framework that attempts to bake machine learning
computation into the compiler stack. It is spearheaded by Chris Lattner, who
started LLVM.

I think this project is both early in its development phase and has a good
chance of turning into important compiler infrastructure.

(I worked in compilers at NVIDIA for a few years)

~~~
enos_feedler
Timely announcement from Google today on MLIR:

[https://www.blog.google/technology/ai/mlir-accelerating-ai-o...](https://www.blog.google/technology/ai/mlir-accelerating-ai-open-source-infrastructure/)

------
rvz
For a new grad, I reckon you check out the LLVM introduction by Adrian
Sampson [0] for a primer and then look at the LLVM tutorials [1] about
building a toy language. This will give you a feel for how 'modern' compiler
development ties in with a programming language using the LLVM infrastructure.

If you feel like contributing to a real-world language, there are a few
such as Go (very advanced), Rust, Swift, etc. For a beginner, my
recommendation would be to check out the Zig Programming Language [2] as a
start and then look at the others.

If you can't choose, look at Awesome Compilers: [3]

This is the sort of question that I am very pleased to see on HN and we need
more of.

[0] -
[https://www.cs.cornell.edu/~asampson/blog/llvm.html](https://www.cs.cornell.edu/~asampson/blog/llvm.html)

[1] - [https://llvm.org/docs/tutorial/](https://llvm.org/docs/tutorial/)

[2] - [https://ziglang.org](https://ziglang.org)

[3] - [https://github.com/aalhour/awesome-compilers](https://github.com/aalhour/awesome-compilers)

------
mynegation
Something else, in addition to compiler engineering:

Pure compiler jobs are few and far between. You might get a good job, but
your career movement will be limited.

I believe the future of compiler design will go hand in hand with the demands
of the industry. The traditional “by the dragon book” compiler is a solved
problem. The growth is in the application of compiler technologies to machine
learning, distributed systems, and specialized hardware.

~~~
seanmcdirmid
Agreed. There is a bit of a market for compiler/ML crossovers ATM (at MS,
Facebook, and Google at least). But you need at least a PhD or 10+ years of
experience.

PL isn’t really a hot area ATM, including implementation. Even formal
verification jobs are a bit hard to find these days. Perhaps it will be more
popular when the next AI winter comes around :).

~~~
jlebar
> There is a bit of a market for compiler/ML cross overs ATM (at MS, Facebook,
> and Google at least). But you need at least a PhD or 10+ years of
> experience.

Until a few months ago I led part of the XLA team at Google, working on ML
compilers for CPU/GPU.

Most of us didn't have 10+ years of experience in compilers when we started on
the team. I know one of us had a PhD in compilers; possibly others did too,
I'm not sure, which goes to show how unimportant a PhD in compilers (or a PhD
at all) was. I had no experience in compilers before I joined (and I didn't
join as the lead).

We're unusual, but I want to demonstrate that you really can get a good job
working on compilers without a PhD or 10+ years of experience.

~~~
seanmcdirmid
Note that I didn’t say a PhD or 10 years experience in compilers, it could be
ML or some other related area. There really isn’t such a thing as a PhD in
compilers, and even in the SIGPLAN area compiler specialized PhDs are rare
(formal verification is much more common).

A PhD can get your foot in the door (showing mastery via the research you’ve
done vs work experience). And then either/or is very useful in even knowing
about and being recommended for these jobs (HR is generally not very useful in
hiring).

~~~
jlebar
Iirc none of us had any ML experience. :)

~~~
seanmcdirmid
Noted. It is a bit ironic, since a common complaint/frustration among PL
people (mostly with PhDs) working at Google is that they have to work on ads...

------
wyldfire
Just adjacent to "compiler engineers" are lots of engineers who work on what I
would call "plumbing," e.g. the tools in your toolchain which aren't compiler
optimization passes but instead implement a particular backend or
language/library/OS feature (preprocessor, linker, assembler, C library, etc).
For those of us plumbers it's great to get experience with the specific tools
in the toolchain on various backends to see how they work. The most critical
things are (1) dumping output at various stages, (2) your specific toolchain's
debug functions, and (3) running analysis-only tools like readelf/objdump.

Learning how linkers and loaders [1][2] work really helps put the pieces
together.

Exercises like isolating reproducible failures to a particular tool or
compiler pass, C-Reduce [3], etc. -- these are valuable.

Of course, like everyone else here says: LLVM is a great place to kick the
tires on some of this stuff.

[1]
[https://en.wikipedia.org/wiki/Special:BookSources?isbn=978-1...](https://en.wikipedia.org/wiki/Special:BookSources?isbn=978-1558604964)

[2] [https://eli.thegreenplace.net/tag/linkers-and-loaders](https://eli.thegreenplace.net/tag/linkers-and-loaders)

[3] [https://embed.cs.utah.edu/creduce/](https://embed.cs.utah.edu/creduce/)
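The "isolate the failure to a particular pass" exercise boils down to a bisection, which is the idea behind tools like LLVM's `-opt-bisect-limit`. A hypothetical Python sketch (the pass names are invented, and `is_broken` stands in for actually rebuilding and running your test case with only a prefix of the pass pipeline enabled):

```python
# Sketch of the "which pass broke it?" bisection workflow.

def bisect_passes(passes, is_broken):
    """Return the first pass whose addition makes the build misbehave.

    is_broken(prefix) -> True if compiling with exactly `prefix` enabled
    reproduces the failure. Assumes the empty prefix works and the full
    pipeline fails (the usual bisection precondition).
    """
    lo, hi = 0, len(passes)  # invariant: passes[:lo] ok, passes[:hi] broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(passes[:mid]):
            hi = mid
        else:
            lo = mid
    return passes[hi - 1]

# Toy reproduction: pretend "licm" miscompiles our test case.
passes = ["mem2reg", "gvn", "licm", "sroa", "instcombine"]
culprit = bisect_passes(passes, lambda prefix: "licm" in prefix)
print(culprit)  # licm
```

In practice `is_broken` is a shell-out to the compiler plus your reproducer, and the same shape of loop works for bisecting commits, translation units, or C-Reduce-style input chunks.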

------
vkaku
I wouldn't ask them to read any existing stuff, because I'd want them to learn
how to write some of that themselves.

Things like: Lexer, Parser, Expression Creator, Optimizer, Evaluator,
Expression -> Machine Code Template Matcher, and Machine Code Generator.
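As a toy illustration of the first few stages in that list (lexer, parser, evaluator) for integer arithmetic -- a sketch of the pipeline shape, not a recommended architecture:

```python
# Lexer -> parser -> evaluator for integer expressions with + and *.
# Toy code for illustration.

import re

def lex(src):
    """Lexer: split source into a list of token strings."""
    return re.findall(r"\d+|[+*()]", src)

def parse(tokens):
    """Recursive-descent parser producing a nested-tuple AST.
    Grammar: expr := term ('+' term)* ; term := atom ('*' atom)*"""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        return tok
    def atom():
        if peek() == "(":
            eat(); node = expr(); eat()  # consume '(' expr ')'
            return node
        return ("lit", int(eat()))
    def term():
        node = atom()
        while peek() == "*":
            eat(); node = ("*", node, atom())
        return node
    def expr():
        node = term()
        while peek() == "+":
            eat(); node = ("+", node, term())
        return node
    return expr()

def evaluate(node):
    """Tree-walking evaluator over the AST."""
    if node[0] == "lit":
        return node[1]
    op, l, r = node
    return evaluate(l) + evaluate(r) if op == "+" else evaluate(l) * evaluate(r)

print(evaluate(parse(lex("2+3*(4+1)"))))  # 17
```

An optimizer and code generator would slot in as extra functions between `parse` and `evaluate`, each taking an AST and returning a new one (or machine code), which is exactly the modularity point made below.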

Where commercial compilers do better than most grad-student projects is in
modularity, the number of optimizer passes and options, and the run-time
tooling.

Unless students implement these themselves, they'll spend more time
understanding the discrete implementations, with their flaws/features, than
the concepts themselves or the big picture.

Hence, in order to create better engineers overall, I'd recommend they do it
all themselves initially.

~~~
tester346
I had to write a mid-size custom markdown parser/lexer with many various
business requirements, e.g. attaching additional information that'd allow the
frontend to display a completion popup menu.

I wrote it as "just" a step-by-step algorithm that transforms, e.g., 500 LoC
into a flat tree of parsed objects.

I thought about learning formal grammar theory, but I couldn't see how it'd
help me, because in the end everything worked fine. It just needed a lot of
tests.

------
x0x0
I bet there are a few good businesses around building type hinting for large
Rails codebases, for profiling, bug detection, and as an editor-agnostic
backend. Ditto for large React codebases. I don't think these are large
businesses, but look at e.g. Sidekiq. There's probably a very nice living
for a handful of people who get to do interesting work.

NB: people on HN are oddly cheap. From the perspective of someone who makes
payroll for 10 engineers, I can spend $1k on something like that basically
because a senior engineer asked nicely or thought it might be useful. $5-$10k
is definitely in scope for useful tooling. (Prices per year because I
understand that if the authors can't make a working business, they stop
offering me the X that I'm buying). Also, please make it work with vim. Pretty
pretty please.

And there are probably good industry jobs eg at Dropbox for python or rust,
etc. Basically, find a large company with a big investment in a slightly-off-
the-beaten-path programming language, and there will be very interesting work.

------
ibains
I worked on compilers at Microsoft and NVIDIA, and there are few jobs for
traditional hardware compilers. Some new hardware is being designed, though.
The most fun was new optimizations specific to GPUs' different caches, such as
the loop optimizer and rematerialization in the register allocator.

Then I’ve worked on query optimizers for databases, and they’ve got completely
different technology and papers. Here one can extend Apache Spark, which gives
lots of interesting extension points.

Also, these days my startup is building parsers in Scala for DSLs, and if
performance is not critical, I’m loving Packrat parsing in Scala (parser
combinators); it is way easier and more fun. Interesting tooling can be built
in Scala: you can also use Scala macros and get access to the Scala compiler
AST. This kind of work around data might have applications for more engineers.
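The parser-combinator idea can be sketched in a few lines (here in Python rather than Scala, and without the packrat memoization): a parser is just a function from input to a (value, remaining-input) pair or a failure, and combinators build bigger parsers from smaller ones.

```python
# Minimal parser combinators. A parser: str -> (value, rest) or None.
# Toy code for illustration.

def lit(s):
    """Match a literal string."""
    def p(inp):
        return (s, inp[len(s):]) if inp.startswith(s) else None
    return p

def seq(p1, p2):
    """Run p1 then p2; succeed with the pair of their results."""
    def p(inp):
        r1 = p1(inp)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return p

def alt(p1, p2):
    """Try p1; on failure, try p2 (ordered choice)."""
    def p(inp):
        return p1(inp) or p2(inp)
    return p

# A tiny grammar: greeting := ("hello" | "hi") " world"
greeting = seq(alt(lit("hello"), lit("hi")), lit(" world"))
print(greeting("hello world"))  # (('hello', ' world'), '')
```

Packrat parsing adds memoization of each parser's result at each input position, which makes the ordered-choice backtracking run in linear time.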

------
gesman
Data Analytics.

So much data is being generated, and worldwide enterprises are lagging so much
in discovering and leveraging the insights in that data, that there is lots of
fun work here.

Add ML and AI to it.

Applying them to the above gives you even more powers.

And then you can apply these to any other discipline.

------
rishav_sharan
For the front end, Haskell or similar functional languages. For the backend,
Cranelift. It is currently in the works, so contributing to the project will
be a good learning experience.

------
mamcx
I'm self-taught on compilers. I focus more on the "how" of doing stuff.

1- Focus on AST transformations. There is a lot written about parsing, but
that is the "easiest" part (using a parser generator, Pratt parsing, or
combinators).

The AST is where the "action" is. I have even made toy langs without parsing
at all (I built a small internal DSL).
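A concrete first AST transformation to try is constant folding over a nested-tuple AST. This is a toy sketch; the node shapes are made up for illustration:

```python
# Constant folding: a bottom-up AST -> AST transformation.
# Nodes: ('lit', n) | ('var', name) | (op, left, right) with op in {'+', '*'}.

def fold(node):
    """Fold any '+'/'*' node whose two children are literals into one literal."""
    if node[0] in ("lit", "var"):
        return node
    op, l, r = node
    l, r = fold(l), fold(r)  # transform children first (bottom-up)
    if l[0] == "lit" and r[0] == "lit":
        return ("lit", l[1] + r[1] if op == "+" else l[1] * r[1])
    return (op, l, r)

# (1 + 2) * x  ->  fold the (1 + 2) subtree, leave the variable alone
print(fold(("*", ("+", ("lit", 1), ("lit", 2)), ("var", "x"))))
# ('*', ('lit', 3), ('var', 'x'))
```

Most classic optimizations (strength reduction, dead-code elimination, inlining) have the same shape: recurse over the tree, rewrite the patterns you recognize, and rebuild everything else unchanged.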

2- Don't expect much information about the _real neat_ stuff.

How do you make REPLs for compilers? How do you enable debugging? How do you
represent ADTs? How do you test them? How do you do FFI? Which data structures
do you base the rest on? How do you profile them? How do you do type
inference? Which GC do you use? How do you implement a GC? How do you
implement macros and generics (i.e. without Lisp)? How do you implement
generators? etc.

A LOT you will find in papers. But real examples? Never.

So I think if you want to get serious, learn how to read papers. I don't get
the weird math they use, and my ignorant impression is that very few papers
have real information even when understood. Having the abstract math is small
potatoes when it comes time to implement.

So many times I get answers like "it's easy, dude" and, when pressing how,
"just read how LLVM is made!".

3- That is why I'm very glad these exist:

[http://journal.stuffwithstuff.com/category/language/](http://journal.stuffwithstuff.com/category/language/)
[http://craftinginterpreters.com](http://craftinginterpreters.com)

_Real gems_ here.

4- You need to read Lisp, OCaml & Haskell if you want to get some good ideas.
I'm using Rust, and the little that is there (i.e. toy langs) is good!

5- I don't know what to do with LLVM and other large codebases. Too many
complications, and in codebases done in Java, .NET (except F#), C or, worse,
C++, the noise is big. The samples in OCaml, and sometimes Haskell or Lisp,
are much clearer.

Or in other words, small/medium compilers are better for picking this stuff up.

6- Semantics & features. This is the meat. The toy math calculator is too
easy. The moment you want to do OO, laziness, ADTs, streaming, structural type
systems, etc. is when you will see how sparse the actual info is. So narrow
down the kind of semantics/features you are looking for.

Just adding this or that could lead to MASSIVE changes in how you do the
language.

For example, I'm doing a relational language
([http://tablam.org](http://tablam.org)).

It is not that conventional, and a lot of the info is from the RDBMS guys, and
that means a lot of detours into STORAGE/ACID and not actual languages!

7- Finally, pick your host language with care. For compilers that transpile it
probably doesn't matter much, but your host will define the boundaries of how
and what you can do "easily".

~~~
jlebar
> 5- I don't know what to do with LLVM and other larger codebases.

Learning how to deal with large, gnarly codebases is one of the most important
software engineering skills I know of. To the parent question, I would say,
learn this skill, and _so much else_ will follow.

~~~
hollerith
I agree, and another important engineering skill is _avoiding_ the need to
deal with the large, gnarly codebase when the costs outweigh the benefits.

~~~
mamcx
Good point. What I mean is that, for learning, diving into a large codebase
could be counterproductive, especially when it has a long history and who
knows why it is the way it is.

(Plus, LLVM is kind of a complex beast)

------
stmw
I second the recommendation to spend time on LLVM, as there are decent books
and tutorials, and it is in active use. It can also be helpful to study JIT
compilers -- see the .NET runtime code, or various JDK source code -- since
dynamic code generation has some different tradeoffs.

~~~
saagarjha
Not a compiler engineer, but I'd suggest taking a look at JavaScript JITs as
well, which often have interesting solutions for making a poorly-typed
language run fast on modern hardware.

------
lalo2302
I know nothing about this topic, but I saw this book once and it made me
excited about compilers. Maybe one day I'll go for it:
[https://compilerbook.com](https://compilerbook.com)

------
cheez
It's not hard, but it's tricky.

The reason it's tricky is that there are so many features in various file
formats that it's very possible to implement an entire project, then come to
generating that one bit in that one field that you forgot about, and then need
to go all the way back up to the parser to attach it to the right place in the
AST so you can pass it all the way down.

I'd be looking at compiling ML code, but otherwise, compilers are a solved
problem.

~~~
closeparen
Programming language design doesn't feel like a solved problem, and it seems
reasonable that new ways of expressing programs will require new techniques in
compilers.

~~~
cheez
You're right, there are always new ways of expressing yourself. I don't mean
to say that you don't need new ways to express yourself, but that the
mechanism for converting expressions to machine code is well known. That is
what is a solved problem.

