
Anders Hejlsberg on Modern Compiler Construction [video] - ingve
https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction
======
musesum
Anders Hejlsberg wrote the first mass-market Compiler+IDE with Turbo Pascal.
What was unique is that TP would automatically bring up the editor and
position the cursor on the offending error. Seems trivial now, but it was a
game changer for writing code on a PC. TP sold for around $49. The competitor,
Pascal MT+, sold for $400.

I doubt if Anders would remember me, but I was lucky enough to be contracted
by Borland to write their first test suite for TP 4.5. It was their first
object oriented language. The spec was one of the most beautiful and concise
pieces of technical documentation I've ever read.

~~~
agumonkey
Last month I recovered an old backup tape with TP.EXE. Couldn't resist playing
with it in DOSBox.

[http://imgur.com/oT3u4fR](http://imgur.com/oT3u4fR)

It really was a brilliant piece of software. 600KB.

ps: the text GUI shadows.

~~~
musesum
Ah yes, text mode graphics! I spent a few thousand hours writing a text mode
version of Augment, with TP. Amazing what you could do with 80x25x16 colors.

~~~
agumonkey
In terms of ergonomics, that amount of text was very nice. As a long-time
Emacs user, I felt straitjacketed by their keyboard shortcuts.

ps: really, these pseudo alpha transparent text shadows ...

------
wwwigham
As Anders said near the end of the video, if you want to know more, look at
the source code[1]. Speaking as someone who's worked on it (so I'm biased), I
feel it's pretty easy to jump in and edit (though I'd advise new people to
avoid the typechecker unless you feel particularly adventurous; it's a
multi-thousand-line monster file). There are piles of easy issues[2] that are
looking for community members to work on them.

By the way - one of the most meaningful comments he made in this talk (to me)
was when he said that your parser had to "parse more than the allowed grammar"
so you can provide better error messages. Compilers are, ultimately, tools for
developers - so developer experience is paramount. This, I've found, is so
very very true in any smaller projects I've worked on, and is easily one of
the first things neglected by some of my more algebraically inclined peers
(who are very satisfied with a perfect ABNF and a parser which strictly
adheres to it).

[1]
[https://github.com/Microsoft/TypeScript](https://github.com/Microsoft/TypeScript)
[2]
[https://github.com/Microsoft/TypeScript/labels/Effort%3A%20E...](https://github.com/Microsoft/TypeScript/labels/Effort%3A%20Easy)
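
To make the "parse more than the allowed grammar" point concrete, here is a minimal sketch in Python. It is not taken from the TypeScript source; the function name `parse_arg_list` and the toy grammar are assumptions made up for illustration. The strict grammar is `'(' [ name { ',' name } ] ')'`, but the parser also accepts trailing commas, doubled commas, missing commas, and a missing `)`, and records a diagnostic for each instead of giving up.

```python
import re

# Hypothetical toy tokenizer: identifiers, parentheses, and commas only.
TOKEN = re.compile(r"[A-Za-z_]\w*|[(),]")


def parse_arg_list(src):
    """Parse "(a, b, c)" and return (names, diagnostics).

    Malformed input still yields the names that could be recovered,
    plus one diagnostic message per problem found.
    """
    tokens = TOKEN.findall(src)
    names, diagnostics = [], []
    i = 0
    if i < len(tokens) and tokens[i] == "(":
        i += 1
    else:
        diagnostics.append("expected '(' at start of argument list")

    expect_name = True  # True right after '(' or ','
    while i < len(tokens) and tokens[i] != ")":
        tok = tokens[i]
        if tok == ",":
            if expect_name:
                diagnostics.append(f"missing argument before ',' (token {i})")
            expect_name = True
        elif tok == "(":
            diagnostics.append(f"unexpected '(' (token {i})")
        else:
            if not expect_name:
                diagnostics.append(f"missing ',' before '{tok}' (token {i})")
            names.append(tok)
            expect_name = False
        i += 1

    if expect_name and names:
        diagnostics.append("trailing ',' before end of argument list")
    if i >= len(tokens):
        diagnostics.append("expected ')' at end of argument list")
    return names, diagnostics


if __name__ == "__main__":
    print(parse_arg_list("(a, b)"))
    print(parse_arg_list("(a, b,"))
```

The useful property is that a half-typed call such as `(a, b,` still produces the two argument names, so later phases (completion, type checking) have something to work with while the error is reported.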

~~~
seanmcdirmid
I've developed a couple of cool tricks to get very error tolerant parsers (I
design/build live programming languages). We can go pure shunting yard
(precedence parser), which will basically parse anything since it doesn't rely
on grammar rules. Even if going with grammar based parsing (they are
convenient), braces can be matched on tokens in a pre-pass before parsing
occurs, eliminating an entire class of difficult to deal with errors, and
allowing for brace stickiness in an incremental edit context; no need to
rebalance because someone typed an opening paren without brace completion!
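
The brace pre-pass described above might look roughly like the following Python sketch. This is only one guess at the idea, not the author's implementation; `match_braces` and its recovery policy (a closer pairs with the nearest opener of the same kind, and any openers skipped over are reported as unmatched) are assumptions.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def match_braces(tokens):
    """Pair up brace tokens before any grammar-based parsing runs.

    Returns (matches, unmatched):
      matches   maps the index of each opener to the index of its closer
      unmatched lists the indices of brace tokens with no partner
    """
    matches, unmatched, stack = {}, [], []
    for i, tok in enumerate(tokens):
        if tok in PAIRS:
            stack.append(i)
        elif tok in CLOSERS:
            # Search down the stack for the nearest opener of the right kind.
            j = len(stack) - 1
            while j >= 0 and tokens[stack[j]] != CLOSERS[tok]:
                j -= 1
            if j < 0:
                unmatched.append(i)  # stray closer, nothing to pair with
            else:
                unmatched.extend(stack[j + 1:])  # openers skipped over
                matches[stack[j]] = i
                del stack[j:]
    unmatched.extend(stack)  # openers that were never closed
    return matches, sorted(unmatched)


if __name__ == "__main__":
    # A stray "(" inside a block does not break the enclosing "{ ... }".
    print(match_braces(["{", "f", "(", "x", "}"]))
```

With the pairs fixed up front, the parser proper can treat each matched pair as a unit and report the unmatched tokens as errors, rather than letting one stray paren swallow the rest of the file.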

Though I can't help but think that someone will eventually develop an NN-based
PL parser that will be much more error tolerant than straight grammar-based
implementations could ever be.

~~~
infinite8s
Do you have those tricks written up anywhere?

~~~
seanmcdirmid
Not really. I haven't taken the time to do any analysis given that they are
always details in the languages I'm building (which are written up). I gave a
talk about this at Parsing SLE (2013?) but I guess it didn't need a PDF.

They are really simple ideas (really, pre match your braces, I'm sure I'm not
the first one to do that!), I'm not sure they are publishable.

------
lemming
This is a fantastic overview. If you've ever wondered why JetBrains build
specific editor support for each language they support (including,
effectively, a compiler), this is why.

As the developer of an IDE for Clojure, I'm also very happy that one of the
secret sauces is persistent data structures.

------
sethjuarez
As a quick aside, Anders had committed code the same day to the TypeScript
compiler. He also is in a team room with like 20 devs (not in his own plush
window office). He told me he loves that kind of environment. The dude is a
really cool dev.

------
constexpr
This is a very good talk but I wonder if this alternate compiler design has
actually made the TypeScript compiler slower in normal compilation mode. If
you really do "helicopter in" to every point in the syntax tree and run IDE
queries to implement type checking then that could potentially be much slower
if there's any overhead at all to doing that.

I've been experimenting with programming languages and compilers myself
([https://github.com/evanw/skew](https://github.com/evanw/skew)) and my
compiler appears to be ~2x faster than tsc when run with node even though it's
also doing constant folding, tree shaking, inlining, and minification while
tsc is just removing type annotations (my compiler appears to be ~15x faster
when run natively). The slow compilation speed of tsc is my main complaint
about the TypeScript ecosystem.

~~~
nv-vn
The compiler doesn't just remove the type annotation, it has to go through and
check that the annotations are valid and that the type safety is kept
throughout the program. Type checking is often not a quick process, since it
requires every single value in the program to have its type verified. If I
declare that a variable is an integer and set it to the result of the
function, the compiler has to make sure that that function returns an integer
and not some other type, or that the type that function returns can be
implicitly converted to an integer. And for that to be safe, it has to first
prove that the function being called can be given that type, etc.
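
As a rough illustration of the check described here, the following Python sketch verifies a declaration of the form "declared type = function(arguments)". Everything in it is hypothetical: the type names, the `IMPLICIT` conversion table, and the helper names are made up for the example and do not reflect how tsc is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncType:
    params: tuple  # parameter type names
    result: str    # return type name


# Hypothetical implicit conversions, as (from, to) pairs.
IMPLICIT = {("int", "float")}


def assignable(source, target):
    """True if a value of type `source` may be stored as `target`."""
    return source == target or (source, target) in IMPLICIT


def check_call(env, func_name, arg_types):
    """Return the result type of the call, or raise TypeError."""
    func = env.get(func_name)
    if not isinstance(func, FuncType):
        raise TypeError(f"'{func_name}' is not a function")
    if len(arg_types) != len(func.params):
        raise TypeError(f"'{func_name}' expects {len(func.params)} argument(s)")
    for got, want in zip(arg_types, func.params):
        if not assignable(got, want):
            raise TypeError(f"argument of type {got} is not assignable to {want}")
    return func.result


def check_declaration(env, declared, func_name, arg_types):
    """Check `declared x = func_name(args)`; return the call's result type."""
    result = check_call(env, func_name, arg_types)
    if not assignable(result, declared):
        raise TypeError(f"{result} is not assignable to {declared}")
    return result


if __name__ == "__main__":
    env = {"length": FuncType(("str",), "int")}
    print(check_declaration(env, "int", "length", ["str"]))
```

Even in this toy form, checking one declaration means first checking the call, which in turn means checking each argument, which is the chain of dependencies the comment describes.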

~~~
constexpr
I know what type checking is. :) Both Skew and TypeScript are type-checked
languages. Just because a compiler does type checking doesn't mean it has to
be a lot slower. I was just pointing out that it would be interesting if this
alternate compiler design was the reason why the TypeScript compiler is so
slow relative to another compiler for a similar language (object-oriented,
typed, garbage collected, etc.) that also uses the node runtime, especially
since that other compiler is doing even more work than tsc.

------
rtpg
Are there books or more developed material on these strategies? I understand
the concepts mentioned here but reading some implementation strategies would
be helpful.

~~~
mattchamb
The C# compiler he is talking about is open source, so you could have a play
around with it to see how they implemented it.
[https://github.com/dotnet/roslyn](https://github.com/dotnet/roslyn)

------
Scarbutt
Is it common to see people at Anders's age so enthusiastic about CS as he is?

~~~
stevetrewick
He's 56. What's your point? Do you imagine that people who devote themselves
to a field have some kind of expiry date built in?

~~~
jonsen
Our expiry date is not hard coded. It is a nondeterministic finite state
machine.

------
visarga
It looks like they are building syntax trees in a similar way to how React is
building the DOM tree - using functional programming and caching/diffing.
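
The functional, share-what-did-not-change idea can be sketched with an immutable tree in Python. This is an assumption-laden illustration, not the actual TypeScript or Roslyn design; `Node` and `replace_at` are made-up names. An edit copies only the nodes on the path from the root to the change and reuses every other subtree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str
    children: tuple = ()
    text: str = ""


def replace_at(node, path, new_child):
    """Return a new tree with the node at `path` replaced by `new_child`.

    `path` is a tuple of child indices from the root. Only the nodes along
    the path are rebuilt; all other subtrees are shared with the old tree.
    """
    if not path:
        return new_child
    index, rest = path[0], path[1:]
    children = list(node.children)
    children[index] = replace_at(children[index], rest, new_child)
    return Node(node.kind, tuple(children), node.text)


if __name__ == "__main__":
    old = Node("file", (
        Node("func", (Node("ident", text="a"),)),
        Node("func", (Node("ident", text="b"),)),
    ))
    new = replace_at(old, (1, 0), Node("ident", text="c"))
    # The untouched first function is the very same object in both trees.
    print(new.children[0] is old.children[0])
```

Because the old tree is never mutated, anything cached against an unchanged subtree stays valid after the edit, and finding what changed can rely on identity comparisons.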

------
amasad
So it's not that the classic compiler architecture had to change -- it's
the addition of language services that broke the existing model.

~~~
chubot
The classic compiler architecture did change, unless you want to maintain two
compilers. That's the main point of the talk.

~~~
amasad
Yes, I'm just saying that the requirement was external to the original
function of compilers.

~~~
chubot
Sure... they mentioned the fact that the TypeScript compiler barely generates
code, if that's what you mean by the original function of compilers.

The architecture is completely dominated by the front end for usability and
incrementality now. Generating JavaScript after all that is trivial.

You can imagine that during a typical program's development lifetime, 99% of
the time is spent with the program in a non-working state, and 1% is when it's
working. The compiler has to help with the 99% case now, not just the 1% case.

~~~
amasad
Typically language services are decoupled from compilers. The 1% case still
dominates how compilers function. Furthermore, compilers are typically not
daemons, and build systems take care of caching etc. Do you have other
examples of consolidated tooling like that? It seems like a good idea but I
don't think it's the standard thing yet or that everyone thinks it's a good
idea.

~~~
pjmlp
Yes, the model of the programmer workstation as envisioned by Xerox PARC and
followed up by ETHZ.

Smalltalk, Interlisp-D, Mesa/Cedar at Xerox PARC, followed by Niklaus Wirth's
work at ETHZ with the Oberon, Oberon-2, Active Oberon and Component Pascal
environments.

Also the initial Ada Workstation developed as the first product from Rational
Software and Eiffel development environments.

The Energize C++ environment created by Lucid after pivoting from Lisp
Machines.

