* How is error handling done in Oberon-07?
* How does one do generic programming in Oberon-07?
* What is Oberon-07's async story?
I will be thankful for any pointers (heh).
> How is error handling done in Oberon-07?
There is an Oberon programming language with a compiler written in Oberon, and there is an operating system called Oberon as well; you can have a look at the source code, and there are also a couple of free books about it. See http://www.projectoberon.com/
> How does one do generic programming in Oberon-07?
No support for generics yet; there was a proposal in the nineties, but Wirth didn't integrate it. Nor does the Go language, a brainchild of Oberon-2, have generics (yet).
> What is Oberon-07's async story?
Oberon is single-threaded; there is a coroutine library that is part of the Oakwood guidelines, but nothing comparable to, e.g., Go.
Generic programming and async are supported in Active Oberon, a branch of the Oberon family, which also supports exception handling.
The most recent version of Active Oberon
The A2 OS written in Active Oberon, with its documentation
An A2 overview
While Modula-2 had coroutines and exceptions, Wirth chose a minimalist path with Oberon.
Oberon-07's design goal was to remove as many features as possible from the original Oberon while keeping a GC-enabled systems programming language.
Then you have the other path, taken by others at ETHZ, which extended Oberon into a more expressive language, namely Oberon-2, Component Pascal and Active Oberon.
Active Oberon is what is actually still used at ETHZ for teaching some OS classes and for OS research.
When I read this book I had a hard time running and compiling the included source code.
If you want to read the book and try out the example compiler (oberon0), you can follow the instructions here: https://github.com/lboasso/oberon0. You just need a JVM >= 8 and the oberonc compiler: https://github.com/lboasso/oberonc
(This comment is posted every time but it's true.)
https://en.wikipedia.org/wiki/GNU_Bison#Use: Ruby, PHP, GCC until 2004, Go, Bash, PostgreSQL, MySQL, Perl.
Can we stop with this "duh, real programmers write their own handwritten parsers"? Parser generators are awesome! They force you to write a declarative grammar which a human can understand. If you're designing a language this is very useful. With a handwritten parser you're very likely to encode some unintended behaviour in edge cases; using a parser generator early in this process will very quickly reveal the ambiguities.
That doesn't mean that parser generators are always the best. There are many use cases where a handwritten parser is better. But the picture is way more nuanced than "no serious programming language uses bison or flex".
The benefit of having a BNF grammar for your language is rightfully enormous, and all modern languages definitely lay out their lexing grammar and BNF explicitly in their specification. But there are downsides. Debugging LALR grammars is painful; error conditions are almost uniformly painful. Many languages make tokenization dependent on the current context of a parse for ergonomic reasons, and you might not be able to specify the action in the right place in your parser generator. Hell, the language might not even conform to the LL(k) or LALR(1) class your parser generator supports.
 I'd say this is context-sensitive lexing, but the resulting grammar isn't context-sensitive. Just somewhat more complicated than a naive translation of the grammar would indicate.
> They force you to write a declarative grammar which a human can understand
It's a separate language that also has to be learnt, an extra step in the compilation process, and the difficulty of debugging parser-generator-generated code cannot be overstated. A recursive-descent parser written in a programming language is easily understood, debuggable, and does not incur any extra complexity.
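To make that concrete, here is a minimal sketch of a hand-written recursive-descent parser for arithmetic expressions (Python, purely illustrative; no real language's parser is this small). Each grammar rule becomes one ordinary function:

```python
# Grammar (in comments, one function per rule):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := digit | '(' expr ')'
def parse(src):
    tokens = list(src.replace(" ", ""))  # single-char tokens for brevity
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, got {peek()!r}")
        pos += 1

    def expr():
        value = term()
        while peek() in ('+', '-'):
            op = peek(); eat(op)
            value = value + term() if op == '+' else value - term()
        return value

    def term():
        value = factor()
        while peek() in ('*', '/'):
            op = peek(); eat(op)
            value = value * factor() if op == '*' else value / factor()
        return value

    def factor():
        if peek() == '(':
            eat('(')
            value = expr()
            eat(')')
            return value
        ch = peek()
        if ch is not None and ch.isdigit():
            eat(ch)
            return int(ch)
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input at position {pos}")
    return result
```

Because the call stack mirrors the parse tree, stepping through this in an ordinary debugger is straightforward, which is exactly the point being made above.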
Perhaps that is where the "no serious programming language" argument comes from --- the migration away from parser generators is a sign that the language has received enough attention such that the limitations of a generated parser become prominent.
FLEX, on the other hand, is perfect for the job. I can't imagine why anyone would hand-write a lexer.
Flex introduces a new dependency to your project, which involves generating source files at build time. This is a lot of complexity in your build system if this is the first time you need it. It also defaults to a style that isn't really modern: it's more annoying to use in a multithreaded program, or if you want multiple lexers in one program. (Or if you're not even using C/C++ in the first place!)
Furthermore, lexers are actually really easy to write. Most tokens are recognizable within a character or two, and usually amount to a regex on the order of [a-z][a-z0-9_]+ or "([^"\\]|\\.)*" -- the sorts of regexes that have very trivial handcoded implementations. The lexer I wrote for a JSON-ish file format is 128 lines of code. Sure, it'd be more like 30 lines with an automatic generator, but running into any issue integrating the generator into my workflow would toss out the benefit of saving 100 lines of code.
 Excluding keywords, but keywords are usually implemented by lexing them as identifiers and looking the identifier up in a hashtable of known keywords.
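A sketch of that style of lexer (Python; the token set and keyword list are made up for illustration): try a handful of regexes in order, and demote identifiers to keywords with a set lookup, exactly as described in the footnote above.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}

# Ordered token definitions for a toy C-like language.
TOKEN_RE = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT",  re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("STRING", re.compile(r'"([^"\\]|\\.)*"')),
    ("OP",     re.compile(r"[+\-*/=<>!]=?|[(){};,]")),
    ("WS",     re.compile(r"\s+")),
]

def lex(src):
    tokens, pos = [], 0
    while pos < len(src):
        for kind, regex in TOKEN_RE:
            m = regex.match(src, pos)
            if m:
                text = m.group()
                if kind == "IDENT" and text in KEYWORDS:
                    kind = "KEYWORD"   # the hashtable (set) lookup
                if kind != "WS":       # whitespace is skipped, not emitted
                    tokens.append((kind, text))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
    return tokens
```

Swapping regexes for hand-rolled character loops changes the line count but not the structure.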
Apart from Go (and maybe PHP, which I don't know well enough to judge), all the languages you list have pretty horrible syntax that probably few (Ruby?) to basically none (Bash, Perl, C++, PostgreSQL) of their users really understand.
It also changes syntax in patch level versions: https://github.com/whitequark/parser.
And that is a straw man argument because, of course, no one is saying that.
The problem is many languages' grammars are not declarative. They're full of completely uncontrolled side effects. So why are we trying to use a declarative tool for them? It's the wrong tool for the job.
Yes! We should have both handwritten parsers and parser generators in our toolkit and then depending on the language we want to parse we should choose the best tool for the job.
My point about a declarative grammar was more that if the language permits it (e.g. is simple enough for Bison), it's so valuable to have a declarative grammar which serves as both specification and parser. It will be way less bug-prone than any handwritten parser. (From my experience handwritten parsers tends to have both weird crashes and unexpected handling of edge cases. But maybe I just suck at programming?)
I've seen people use Bison on one complicated language, give up, and then bring the negative bias to other languages they want to parse — even when the other languages are very simple.
I started this one!
Regarding error messages: if one implements the grammar exactly as specified by the language spec, then yes, confusing and useless error messages will result. That's because the spec grammar bakes in restrictions that are really semantic. To get good error messages, one has to relax the spec grammar and then enforce those restrictions with semantic checks. Parsing is an art, not a science.
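A toy sketch of that technique (Python; not from any real compiler): the spec-style grammar for an assignment would allow only a variable on the left, but parsing any operand there and rejecting bad targets in a semantic check yields a message that names the actual problem instead of a generic syntax error.

```python
import re

def parse_assignment(src):
    # Relaxed grammar: accept a variable OR a number literal as the target.
    m = re.fullmatch(r"\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]+)\s*=\s*([0-9]+)\s*;?\s*", src)
    if m is None:
        raise SyntaxError("expected '<target> = <number>;'")  # parse-level error
    target, value = m.group(1), m.group(2)
    # Semantic check enforcing the restriction the spec grammar baked in:
    if target.isdigit():
        raise ValueError(f"cannot assign to the literal {target!r}; "
                         "the left-hand side must be a variable")
    return target, int(value)
```

With the strict grammar, "3 = 4;" would die with something like "unexpected '='"; with the relaxed one, the error can actually explain what's wrong.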
But... if I was asked to implement a language tomorrow I wouldn't even consider using Bison and Flex. Why? Because they're context-free tools, and almost every industrial language is context-sensitive. Square pegs and round holes.
These tools are a great example of over-complicating and over-theorising programming languages. We invented a theory that doesn't even apply to the problem we have, and then created crazy tools to solve the wrong problem, and gloss over the fact that they don't apply with terrible hacks and side-effects. We made a mistake with that phase in programming language history...
Are there other parts of Ruby's grammars that are also context-sensitive?
I'd say that Bison's biggest issue is that it's limited to LALR; if it actually supported full context-free grammars it would be far more useful.
That's the one that comes to my mind yes. I'd bet that there are others, but couldn't list them to you now.
> do they have to be resolved during parsing?
I guess not in practice?
But why are we doing this to ourselves, right? These parsing tools work great! (As long as you only do half the job you were trying to do and then sort of hack it back together afterward.)
I agree with you in practice: The current parser generators are not as good as I'd like them to be. They are too restrictive and too locked into a "let's parse this subset of grammars mathematically optimally" instead of "let's make it easy for programmers to declare grammars".
(Disclaimer: I started working on a completely new parser compiler https://www.sanity.io/blog/why-we-wrote-yet-another-parser-c... which is capable of parsing full context-free grammars. I personally believe this is part of the puzzle of making them more usable.)
However, since we're talking about Ruby's syntax: I don't think Ruby's parser would be any nicer if it was handwritten. The problem here is that they discovered the power of "lexer tweaks parser state" and let it drive the design of the language. This has caused the language itself to be complicated (both for humans and for computers!). Unfortunately I don't have time for it, but I'd love to dive into the changes required in Ruby's syntax to make it context-free (e.g. pushing the local variable lookup into the bytecode generator).
It’s true almost no usable language is really CF, but a CF grammar with a few semantic patches still seems to be the better way to describe them.
It’s been a while since I needed to parse a grammar, but if I had to I’d probably use re2c+lemon, Ometa/ohm, or a hand-rolled Pratt parser. (Or an Earley parser if the grammar was ambiguous and it made sense)
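For reference, a hand-rolled Pratt (top-down operator-precedence) parser is only a page of code. A toy sketch in Python for binary + - * / and integer literals; the binding-power values are arbitrary illustrative choices:

```python
import re

# Higher binding power binds tighter; the exact values don't matter,
# only their relative order.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse(src):
    tokens = re.findall(r"[0-9]+|[+\-*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def next_token():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_expr(min_bp):
        tok = next_token()
        if tok == "(":
            left = parse_expr(0)
            next_token()  # consume ')'
        else:
            left = int(tok)
        # Keep consuming operators while they bind tighter than our context;
        # strict '>' makes the operators left-associative.
        while peek() in BINDING_POWER and BINDING_POWER[peek()] > min_bp:
            op = next_token()
            right = parse_expr(BINDING_POWER[op])
            if op == "+":   left += right
            elif op == "-": left -= right
            elif op == "*": left *= right
            else:           left /= right
        return left

    return parse_expr(0)
```

The whole precedence table lives in one dict, which is why Pratt parsers scale so pleasantly to languages with many operators.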
I'm really not sure where you got the impression I was blaming Yacc or Lex for the complexity of the languages - I think you've imagined that.
The situation we're in in practice is that our language grammars are complex and are context-sensitive. So why are we trying to use context-free tools? It's madness - we're trying to hammer a nail using a screwdriver.
> These tools are a great example of over-complicating and over-theorising programming languages.
If programming languages are over-complicated, it's not the fault of these tools.
> So why are we trying to use context-free tools?
I no longer keep up with compilers (or languages), but when I did, there were no other tools. Either go through the tedium and effort of hand-rolling a lexer/parser, or munge the language grammar to fit.
> there were no other tools
I don't think you need tools. Just write your parser as a normal program directly in a general-purpose programming language. I'm not sure that parsing is so special as to require an external DSL.
We usually semantically patch that at the right stage through the lexer or the AST so things still work in a CF setting, but almost no modern language is CF.
Type checking is where all the complexity really is. Parsing and lexing are just easy things you do before the hard stuff.
Use before declaration can only be resolved in the typechecker if you assume every name is a variable, but the introduction of typedefs makes that assumption wrong, so you can't wait for the type checker: every C parser I know patches that information back into the lexer so that it parses correctly. Indeed, the "typename" keyword was added to C++ because you can't leave that to the typechecker if you want a unique parse before instantiation.
var x = (Foo) - foo;
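A toy illustration of that feedback loop (Python; `Foo` and `foo` are placeholder names, and this is a sketch of the "lexer hack", not any real C front end): the lexer consults a typedef table maintained by the parser, so the same characters lex differently depending on whether `Foo` has been declared as a type — which is exactly what decides whether `(Foo) - foo` is a cast or a subtraction.

```python
import re

def lex(src, typedef_names):
    tokens = []
    for text in re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[()\-;=]|\S", src):
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", text):
            # The context-sensitive step: type name vs. plain identifier,
            # decided by the table the parser has built up so far.
            kind = "TYPE_NAME" if text in typedef_names else "IDENT"
        else:
            kind = text
        tokens.append((kind, text))
    return tokens

def classify_paren_minus(src, typedef_names):
    """Is '(X) - y' a cast applied to a negation, or a subtraction?"""
    kinds = [k for k, _ in lex(src, typedef_names)]
    if kinds[:4] == ["(", "TYPE_NAME", ")", "-"]:
        return "cast of unary minus"
    if kinds[:4] == ["(", "IDENT", ")", "-"]:
        return "binary subtraction"
    return "unknown"
```

Run on the same snippet with and without `Foo` in the typedef table, the parse flips, which is precisely why this information can't wait for a later phase.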
I think the biggest difference is the rich intermediate representations we use these days. These classic texts don't really address intermediate representations and are usually syntax-directed instead.
These days a program will spend almost all of its time deconstructed into an IR.
The Xerox PARC workstations were microcoded, and one needed to load the microcode on boot for the specific environment (XDE, Smalltalk, Interlisp-D, ...).
Likewise the P-Code as a means of portability for Pascal, M-Code on Lilith, and the Ceres and later FPGA RISC for Oberon.
I emailed an author of a book that specifically calls parsing a "distraction" (Shriram Krishnamurthi, in PLAI) and he explained that he and other textbook authors of recent texts are reacting to a sort of over-emphasis on parsing in older books and tools (for example, YACC standing for Yet Another Compiler-Compiler despite being simply a parser generator).
So it's interesting what you're saying about IRs -- I've definitely noticed Shriram's book and a few other recent ones seem to focus really heavily on them. I wonder if there's also a relationship between a more IR-focused approach and the reduced emphasis on parsing?
I've been working professionally on one compiler continuously since 2013. I've spent like two or three days working on the parser. That's it. I've probably spent longer tweaking the Markdown of the README.
For me the issue is that parsing is basically a solved problem. I don't see how you can make it much better for compilers. If you give me a language I can write you a parser for it. It won't be interesting, fun, or be better or worse than any other parser. It's a purely mechanical job. (Of course it could be a nicer parser, or a faster parser, or give better error messages, or work better for IDEs or tooling, but stick with me.)
But the transformations that happen after that point? Well there a whole world opens up to you! You can do so much, in so many different ways, and achieve incredibly different results for different requirements.
They have a couple of typos, and one can tell that the ML one was the first variant, but they are relatively good and cover a lot, especially if one does the exercises.
There is a 2nd edition of the Java one, where the Tiger language gets replaced by a Java subset.
I've always wanted to learn about compilers, but something tells me the dragon book is no longer the go-to source.
* Foundations: the frontend, codegen [to IR] (including vtables, lambdas, and coroutines, probably), the big scalar optimizations, register allocation, maybe instruction scheduling?
* Loop optimizations: dependence analysis, all the loop opts, autovectorization, autoparallelization. Maybe cover some OpenMP or heterogeneous computing (e.g., SYCL, CUDA) here
* "Fancy" optimizations: profiling (including what & how to profile), link-time (including thin-lto techniques), SLP vectorization, possibly superoptimization and dataflow analysis
* Dynamic language VMs: JITs, garbage collection, shape optimization, and several of the various tricks (e.g., NaN-boxing, deoptimization guards)
* You might be able to make an entire book out of fuzzing and symbolic/concolic execution, but I think it'd be thinner than the other ones here.
And there are still things I don't know where to put. Alias analysis, exception handling, debugging…
A related thread from 2018: https://news.ycombinator.com/item?id=16609360