
Do you have a problem? Write a compiler - lelf
http://oleg.fi/gists/posts/2019-09-26-write-a-compiler.html
======
TeMPOraL
To people saying that writing a compiler is almost always overkill - that's
because you and the author mean different things by the word "compiler". Or, in
other words, that's because your intuition is based on the inflexible
languages you're working with.

It's not an accident that the talk/article is about Clojure[0]. A Lisp. In
Lisp, "writing a compiler" is a fancier way of saying "writing a possibly
code-walking macro", which is still just a fancy way of saying "writing a
function that transforms a tree into another tree, and having it run at
compile time on parsed program source as the input". 99% of the usual compiler work -
tokenizing, building an AST, generating machine/bytecode, optimizing - is
handled for you by the host Lisp compiler. All the mechanics are there, and
all you have to do is to add the brains you need.
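
To make "a function that transforms a tree into another tree" concrete, here is a minimal Python sketch (not Clojure, and not how any real Lisp implements macroexpansion). It assumes code is represented as nested lists with symbols as plain strings; the `when` rewrite rule and the names `MACROS` and `macroexpand_all` are hypothetical illustrations.

```python
from typing import Any

# Code as data: nested lists, symbols as strings (a simplifying assumption).
Expr = Any

def expand_when(form: list) -> list:
    """Hypothetical rule: (when cond body...) -> (if cond (do body...) nil)."""
    _, cond, *body = form
    return ["if", cond, ["do", *body], "nil"]

# Table of tree-rewriting functions, keyed by the head symbol they handle.
MACROS = {"when": expand_when}

def macroexpand_all(expr: Expr) -> Expr:
    """Walk the tree, rewriting every node whose head is a known macro."""
    if not isinstance(expr, list) or not expr:
        return expr  # atoms and empty lists are left unchanged
    head = expr[0]
    if isinstance(head, str) and head in MACROS:
        # Rewrite this node, then keep walking the result
        return macroexpand_all(MACROS[head](expr))
    return [macroexpand_all(sub) for sub in expr]
```

For example, `macroexpand_all(["when", [">", "x", 0], ["println", "x"]])` yields `["if", [">", "x", 0], ["do", ["println", "x"]], "nil"]`. Parsing, evaluation and code generation are all left to the host, which is the point being made above.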

\--

[0] - Clojure itself actually makes this slightly more difficult than it
looks, because of design decisions that make its syntax slightly less
obviously homoiconic.

~~~
kerkeslager
Writing a compiler is _still_ almost always overkill, because implementing the
compiler is one of the smallest parts of creating an effective programming
language.

Let's say you're starting a project, and you're choosing a language to write
it in. You have two options:

Language A:

1\. Has multiple mature compiler/interpreter implementations.

2\. Has lots of tooling built around it.

3\. Has extensive documentation.

4\. Has a diverse community to hire programmers from.

5\. Fits the problem domain okay.

Language B:

1\. Has one buggy implementation that returns mangled stack traces on errors.
See Kernighan's Lever[1].

2\. Has no tooling.

3\. The documentation is the compiler. So... no documentation.

4\. The only programmer that knows it is the one who wrote the compiler.

5\. Fits the problem domain as well as the programmer understood the problem
domain when he started writing the compiler.

I know which language Dan McKinley[2] would choose. :)

You might say, "But my language uses the same homoiconic syntax as Lisp, so
the tooling of Lisp carries over, and it doesn't require more documentation
than just normal functions, and Lisp programmers can pick it up easily." To
which I would respond, "Sounds like you took the long way to implementing a
library." I'd level this criticism against a lot of Racket languages, which
are basically just shorthand for a bunch of require statements. I'd rather
copy/paste those same require statements.

The fact that Lisps make creating a compiler so easy is actually a _downside_,
because it leads people to write compilers without thinking through the full
secondary effects of doing so. This is one of many heads of the Hydra that is
The Lisp Curse[3].

EDIT: Decided to say more.

EDIT 2: Decided to say less.

EDIT 3: Always be editing.

[1] [https://blogs.msdn.microsoft.com/alfredth/2009/01/08/are-you...](https://blogs.msdn.microsoft.com/alfredth/2009/01/08/are-you-smart-enough-to-debug-your-own-code/)

[2] [https://mcfunley.com/choose-boring-technology](https://mcfunley.com/choose-boring-technology)

[3] [http://www.winestockwebdesign.com/Essays/Lisp_Curse.html](http://www.winestockwebdesign.com/Essays/Lisp_Curse.html)

~~~
TeMPOraL
You're still thinking of languages as these huge things that can be offered as
products. It's not the way a Lisper thinks about it.

> _To which I would respond, "Sounds like you took the long way to
> implementing a library."_

And I'd respond, that library _is_ the language. There's a spectrum of
complexity of what you can call "a language". General-purpose programming
languages like you seem to be considering are one end of that spectrum. The
other end of that spectrum are the abstraction layers and APIs you're coding
to. Take the OpenGL API, for example: you can view it as a set of functions, but once you
consider the rules about calling them, you may as well call that a language.

So when you're designing a "DSL" in a Lisp, it's literally no different than
designing any other module. You have to name things, you have to figure out
how they should be used. Lisp just doesn't force you to shoehorn your API
into built-in language syntax. It doesn't force you to write "open-file" and
"close-file" because your language only has functions and no built-in "with"
construct; it lets you add the "with" constructs with four lines of code,
making the API cleaner.
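
Python already has `with` as built-in syntax, so the following is not a macro, but `contextlib.contextmanager` offers a comparable few-lines way to add a new with-construct around an open/close API (roughly what Common Lisp's `with-open-file` macro does for files). The `Resource` class and the `with_resource` helper are hypothetical stand-ins for an "open-file"/"close-file" style API.

```python
from contextlib import contextmanager
from typing import Iterator

class Resource:
    """Hypothetical resource with explicit open/close calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.is_open = False

    def open(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False

@contextmanager
def with_resource(name: str) -> Iterator[Resource]:
    """A new 'with' construct: open on entry, always close on exit."""
    res = Resource(name)
    res.open()
    try:
        yield res
    finally:
        res.close()  # runs even if the body raises
```

Callers then write `with with_resource("log") as r: ...` and can no longer forget the matching close call.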

Most DSLs sit somewhere between "API for a stack" and "API for OpenGL".
They're just code - code that in some cases happens to run at compile time and
operate on other code. But on the user level, on the API level, it's _no
different at all_ from using any other library. A documented DSL is no harder
to use than a documented function library; a function library lacking
documentation is no easier to use than a similarly undocumented macro-based DSL.

Some people seem to have an irrational fear of "less mainstream" things, even
though they're not fundamentally more difficult than the mainstream things.
I've been working with Lisp professionally for a while now, and I've seen
plenty of really complex and hairy DSLs and had to debug some when they broke.
Fixing a broken macro or its use is not fundamentally different from debugging
Java code of similar complexity, but when the DSL works, it can vastly improve
readability of the code that uses it.

~~~
dfabulich
No, @kerkeslager has it exactly right.

1\. Fixing broken generated code is fundamentally harder than debugging
non-generated code, because the code generator doesn't appear in the call stack,
and because a line of generated code doesn't necessarily even have any
specific line number in the input program.

2\. "Tooling" includes type systems that can detect and prevent bugs at
compile time, and IDEs that can highlight the bug and often automatically fix
the bug with a shortcut action.

You can write your own type system in Lisp, but testing a type system is
uniquely harder than testing ordinary functions. Type systems prove facts
about a program, and type-system bugs have to do with cases where the language
can't prove the safety of something that's safe, or where the system wrongly
proves that code is incorrect when it's actually correct, or vice versa.

Finding important truths that a proof system can't prove (or finding
falsehoods that a proof system can generate) is much harder than ensuring that
a well-scoped function generates valid outputs for its inputs.

3\. Documenting languages (especially type systems, especially type system
_errors_) is also much harder than documenting function parameters.

In a non-Lisp language, developer-users don't have to understand how the
language (and especially the type system) is implemented, because the language
devs keep the language small and well tested. (Yes, the Java compiler does
have bugs sometimes, but odds are, any given bug is in your code, not the
compiler.)

Keeping the language small allows a community to form around documenting. If
you write your own DSL, there's no language community to support you.

And, sure, having a big community of people supporting each other and
answering questions is always a huge help, even just for libraries, so there is
some pressure to stick with mainstream languages. But languages have come and
gone from the mainstream while Lisp has remained sidelined.

Lisp makes it easy to generate code, and that's the "Lisp curse," because
writing your own mini language is usually a bad idea.

~~~
lispm
CLOS started as a portable library for Common Lisp. It was implemented as an
embedded language extension and provides on the user level a set of macros:
DEFCLASS, DEFGENERIC, DEFMETHOD, etc...

There is no reason such an embedded DSL can't have documentation, find
widespread use etc. In the Lisp world, there are a bunch of examples like
these - where DSLs are written for larger groups of users and not just the
original implementors.

One has to understand though, that not all attempts have the same quality and
adoption. But that's fine in the Lisp world: one can do language experiments -
if tools are designed for larger groups, one can just put more effort into
them to improve the quality for non-implementors.

~~~
kerkeslager
> CLOS started as a portable library for Common Lisp. It was implemented as an
> embedded language extension and provides on the user level a set of macros:
> DEFCLASS, DEFGENERIC, DEFMETHOD, etc...

Right, that shows that if a highly skilled group of people with community
support put in a lot of effort, they can do something that is extremely
difficult to do.

The Scheme community has not had such success. All three of the Scheme object
systems I've experimented with suffer from all four of the problems I
mentioned. And at least one of them was recent enough that they had the
advantage of having CLOS to guide the way on how to do it right.

I'm not saying that DSLs are impossible to create successfully. I'm saying
that many Lispers _drastically_ underestimate the difficulty of doing so in
proportion to the value.

Keep in mind that you have people in this thread claiming that "DSLs are
trivial to implement in Lisp." And I'm saying, DSLs are trivial to implement
_counterproductively_ in Lisp, but implementing a high quality DSL is still
very hard.

~~~
lispm
> Right, that shows that if a highly skilled group of people with community
> support put in a lot of effort, they can do something that is extremely
> difficult to do.

I don't think it's 'extremely' difficult to do. For some time, dozens of
object systems of varying quality were experimented with in Lisp.

CLOS OTOH is on the 'extremely difficult' side, since it has a lot of features
and may even provide its own meta-object protocol. But even then it is
possible to leverage a lot of Lisp features (like a well-documented code
generation/transformation system) and thus reduce some implementation
complexity.

When CLOS was designed, several well-documented OOP extensions already existed
(and were used): Flavors, New Flavors, LOOPS, Common LOOPS, Object Lisp,
CommonObjects, a bunch of frame languages like FRL, KEE, ...

> DSLs are trivial to implement in Lisp

Some are, some are not. There is a wide range of approaches. It is
relatively simple for some 'embedded DSLs' and gets more difficult for
non-embedded DSLs.

There is a decades-long practice of developing languages in Lisp (since the
1960s), especially embedded DSLs, and thus there should be a wide range of
them, including many well-documented ones.

Many Lispers know that this CAN be a lot of work, given that many complex DSLs
have been developed, many of which took more than one person-year (or even
dozens) to develop and some of which have been maintained for a decade or more.
In many projects one has also seen the limits of Lisp (speed, debugging,
delivery, etc.) for this.

------
chrisaycock
Years ago I was at a trading firm where I needed to write a test suite for our
_order entry_ system (the software that sends orders to a stock exchange).
Tests could become quite large because I had to track the sequence of
acknowledgments and executions from the test exchange.

The tests would have been a real pain to write in C++, so I created my own
language! It was inspired by both Expect and Cucumber.

[https://en.wikipedia.org/wiki/Expect](https://en.wikipedia.org/wiki/Expect)

[https://cucumber.io/docs](https://cucumber.io/docs)

That allowed me to write a simple script of expected behavior, like:

    
    
        test "Cross own order"
        -> new buy IBM 100 $141.01 1
          <- ack 1
        -> new sell IBM 200 $141.00 2
          <- ack 2
          <- trade 1 100 $141.01
          <- trade 2 100 $141.01
    

My interpreter would run the above script against the exchange's UAT
environment.
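
The original interpreter was written in C++ with Boost.Spirit and ran against a real UAT environment; the Python sketch below only illustrates the general shape of such an interpreter. It assumes a simplified, synchronous `send(message)` callable that returns the replies in order (a real exchange session is asynchronous), and `Step`, `parse_script` and `run` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    outgoing: str                                          # text after "->"
    expected: list[str] = field(default_factory=list)      # texts after "<-"

def parse_script(text: str) -> tuple[str, list[Step]]:
    """Turn a test script into (test name, list of request/reply steps)."""
    name, steps = "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("test "):
            name = line[len("test "):].strip().strip('"')
        elif line.startswith("->"):
            steps.append(Step(line[2:].strip()))
        elif line.startswith("<-"):
            if not steps:
                raise SyntaxError(f"reply before any request: {line!r}")
            steps[-1].expected.append(line[2:].strip())
        else:
            raise SyntaxError(f"unrecognized line: {line!r}")
    return name, steps

def run(text: str, send: Callable[[str], list[str]]) -> list[str]:
    """Send each request, compare the replies, and return failure messages."""
    name, steps = parse_script(text)
    failures = []
    for step in steps:
        got = send(step.outgoing)
        if got != step.expected:
            failures.append(
                f"{name}: after {step.outgoing!r} expected {step.expected}, got {got}"
            )
    return failures
```

An empty list from `run(script, send)` means every reply matched; swapping in a canned `send` makes the scripts testable offline as well.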

The tests could get pretty sophisticated. I could investigate swanky
exchange-specific order types by verifying queue position, liquidity flags, etc. By
writing these test scripts, I could verify (1) that my order entry code was
correct and (2) that I actually understood what the exchanges were doing with
my orders.

These test scripts had another added benefit in that traders now had
executable documentation. So whenever a hypothetical scenario came up, I could
write a script to verify that our assumptions were correct and then email the
script's code as a precise set of steps that the exchange was taking.

I ended up writing the whole language in Boost.Spirit, which admittedly was a
beast. But I still believe that creating a testing language was the right
thing to do.

~~~
random314
I am curious. Was there any particular feature that your language had, which
was lacking in C++, that helped in simplifying the test script?

~~~
ratww
Also not OP but: Having a limited language that looks more like data than code
provides great opportunities to analyze it in ways that you can't analyze
Turing-complete languages (halting problem and all).

An example: A friend works for a game translation studio and one of their
games has a tool that reads the game scripts and is able to detect
missing/unused external resources, overlapping dialogue, unmatched mouth
animation/voiceovers, etc... without even running the game.
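
A minimal sketch of that kind of static check, assuming a hypothetical script format with one `play_voice` or `show_image` command per line. Because the format is plain data, finding missing and unused resources is a line scan plus two set differences; nothing is executed.

```python
import re

# Hypothetical command format: "<command> <resource-file>", one per line.
RESOURCE_CMD = re.compile(r"^(play_voice|show_image)\s+(\S+)$")

def check_resources(script: str, available: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unused) resource names without running the script."""
    referenced = set()
    for line in script.splitlines():
        match = RESOURCE_CMD.match(line.strip())
        if match:
            referenced.add(match.group(2))
    missing = referenced - available   # referenced by the script but not shipped
    unused = available - referenced    # shipped but never referenced
    return missing, unused
```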

------
kccqzy
> Implementing small (domain specific) languages is fun.

This, from the conclusion, is something I totally agree with. However, fun isn't all
there is to production systems. Writing so many little languages just makes the
program overall more complicated, with an increased cognitive burden. Writing new
languages must generally be carefully weighed against the alternative of
picking one single language with good abstraction power for almost everything.

~~~
TeMPOraL
C++ or JavaScript is a "language", but so is the API to that library you're
currently using in your project. Different ends of the language spectrum. The
article is about Clojure, a Lisp, and Lisps are good at giving you access to
the _entire_ spectrum. When you hear a Lisper talking about "writing a
language", half of the time it's just a couple dozen lines of code defining a
macro that makes some code much more readable and expressive. Then there are
times when "a language" is an extensible pattern-matching engine or otherwise
half of Prolog, but those cases are usually handled by someone who has a real
need for it and later publishes it as a library - and you're only happy you
can now use half of Prolog without _actually_ adding a Prolog to your
codebase, with all the devops headaches this would create.

~~~
kccqzy
That's what I meant by "language with good abstraction power." Lisps with
macros, or Haskell with its incredibly flexible do-notation are examples of
languages that allow you to reuse the host language parser and compiler. It
takes a lot of the fun out of actually developing a language, but it's
nevertheless more practical.

------
WalterBright
I had a problem with Telecom C back in 1982, so I decided to write a C
compiler. Still working on it.

------
uka
I had a problem, so I wrote a compiler. Now I have 2 problems, at lines 33 and 67.

------
mehrdadn
I'm reminded of: [https://steve-yegge.blogspot.com/2007/06/rich-programmer-foo...](https://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html)

~~~
jimbokun
Made me exhale out my nose:

"Whenever I gave even a moment's thought to whether I needed to learn
compilers, I'd think: I would need to know how compilers work in one of two
scenarios. The first scenario is that I go work at Microsoft and somehow wind
up in the Visual C++ group. Then I'd need to know how compilers work. The
second scenario is that the urge suddenly comes upon me to grow a long beard
and stop showering and make a pilgrimage to MIT where I beg Richard Stallman
to let me live in a cot in some hallway and work on GCC with him like some
sort of Jesuit vagabond."

"Both scenarios seemed pretty unlikely to me at the time, although if push
came to shove, a cot and beard didn't seem all that bad compared to working at
Microsoft."

------
endymi0n
"Imagine you are writing a cool new rogue-like game. So cool many have no idea
what's going on."

...and then you nerd snipe yourself by writing a compiler in order to ensure
the fair randomness none of your end users perceived to be off in the first
place.

The game still isn't fun.

~~~
klyrs
> The game still isn't fun.

Joke is on you; writing it was the fun part. Roguelikes are meant to be
_hard_. Fun really isn't the point.

------
rob74
Want to hammer a nail into a wall? Build a robot to do it! If you have lots of
spare time, of course...

~~~
taneq
I've actually come to the conclusion over the years that automation is a form
of optimisation, and that all of the rules of optimisation apply in full
force.

Do you need to process a list of 10,000 things, once a day, forever? Then you
do need to automate.

Do you need to process a list of 100 things, once? Just roll up your sleeves
and start typing (or get someone else to do it). The chances that you could
come up with something remotely reusable in less than the two minutes it'd
take to do manually are slim to none, yet I have often seen people spend a
couple of hours coding up a script to do a 15 minute non-recurring job.
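
The trade-off can be put as a small break-even calculation. This sketch (a hypothetical helper that ignores maintenance and debugging time) returns how many runs it takes before the time spent scripting pays off.

```python
import math

def break_even_runs(manual_minutes: float, script_minutes: float,
                    automated_minutes: float = 0.0) -> float:
    """How many runs before writing the script beats doing the job by hand."""
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        return math.inf  # the script never saves time
    return script_minutes / saved_per_run
```

With the numbers from the comment, a two-hour script for a 15-minute job only pays off if the job recurs eight times: `break_even_runs(15, 120)` gives `8.0`.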

~~~
ahmedalsudani
Macros, regex, awk are the tools to quickly automate text processing. It
literally takes less than two minutes in most cases once you’re proficient.

~~~
fastball
Honestly the amount of time I've saved just by throwing stuff into a text
editor with multi-cursor support is kinda unbelievable.

~~~
sinstein
Step 1 - Let me try using command line magic to transform this data

Step 2 - ??

Step 3 - Create a scratch file in IDEA and some cursor + regex magic and I
am done!

~~~
rmetzler
And those regexes are the first part of an awk script or something similar
which will save time when you want to do this repeatedly.

And when your awk script grows into something that doesn’t produce the final
output, but has to be run in another interpreter first in order to produce the
final result, then you created your custom language.
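
As an illustration of that last step, here is a regex-based Python script (standing in for awk) whose output is not the final result but SQL for a database to run. The `name: age` input format and the `people` table are hypothetical; restricting names to `\w+` means the emitted strings need no quote escaping.

```python
import re

# Hypothetical input records of the form "name: age".
RECORD = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*$")

def to_sql(text: str) -> str:
    """Emit one INSERT statement per matching line; skip everything else."""
    statements = []
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            name, age = match.groups()
            statements.append(
                f"INSERT INTO people (name, age) VALUES ('{name}', {age});"
            )
    return "\n".join(statements)
```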

------
galfarragem
While reading this blog post I feel that my programming knowledge is an order
of magnitude below. It's a bit depressing.

------
codr7
Writing compilers for languages that look like Lisp in Lisp is so effortless
that most don't bother to go further. I feel that's often a missed
opportunity. Hook in a parser [0] and the sky is the limit.

[0] [https://github.com/codr7/lila](https://github.com/codr7/lila)

------
WesternStar
I feel like this and the motivation for golang are vaguely similar. Hear me
out. The intention behind golang's design is to make complex things hard to
do and to make any piece of code easy to understand, because
there isn't much syntax. But sometimes you want to code more expressively, and
some nice syntax would help. It's not a big deal if it's local and
limited to the exact problem at hand. That's tough to get from a general
purpose language.

~~~
jimbokun
I thought the intention of golang was to avoid C++'s super slow compile times.

------
asdf333
Do you have a problem? write a compiler....and now you have two problems!

------
Dude2029
Fun to implement, sucky to maintain! Like event-sourcing.

~~~
pyrale
Why would event sourcing be sucky to maintain?

My last job was to build a system to replace something that used to require
lots of maintenance. Event sourcing made most of this maintenance trivial, and
the remaining cases were made easier to explain and handle.

Event sourcing things doesn't mean you need to use a complex frp framework; a
simple pg table and a state machine in your language of choice is usually
enough.
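
A minimal sketch of that setup, with an in-memory list standing in for the Postgres table and a hypothetical order-lifecycle state machine. Current state is never stored; it is rebuilt by replaying the append-only log, and events the state machine does not allow are rejected before they are appended.

```python
from dataclasses import dataclass

# Hypothetical lifecycle: (current state, event kind) -> next state.
TRANSITIONS = {
    ("new", "accepted"): "open",
    ("open", "filled"): "done",
    ("open", "cancelled"): "done",
}

@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str

class EventStore:
    """Append-only event log; a list stands in for a Postgres table."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def state(self, entity_id: str) -> str:
        """Rebuild the current state by replaying this entity's events."""
        current = "new"
        for event in self.events:
            if event.entity_id == entity_id:
                current = TRANSITIONS[(current, event.kind)]
        return current

    def append(self, event: Event) -> None:
        """Validate against the state machine, then append (never update)."""
        current = self.state(event.entity_id)
        if (current, event.kind) not in TRANSITIONS:
            raise ValueError(f"{event.kind!r} not allowed in state {current!r}")
        self.events.append(event)
```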

~~~
Dude2029
In this case the maintainer is the builder, which is always comfortable.

~~~
pyrale
I have recently moved to another team, so we shall see.

However, we also administered two crud dbs, one we inherited and another
recent one. The three dbs were doing pretty much the same thing. Understanding
and maintaining what happened in the crud db we built was significantly harder
than understanding and maintaining the event-sourced one.

------
janczer
Here's the link to the video:
[https://youtu.be/kOXfdZRD0wM](https://youtu.be/kOXfdZRD0wM)

------
rosybox
> JavaScript numbers are a mess. Next a bit of arithmetics.

They aren't a mess. Every programming platform has limits on number sizes. In
JavaScript numbers are floating point values, but you can get a guarantee of
integer precision within the scope of Number.MAX_SAFE_INTEGER and
Number.isSafeInteger. Is that a mess? I'm not one to defend JavaScript, it has
lots of problems, but in this case the behavior works as defined in the
specification and is predictable.
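
On mainstream platforms Python's `float` is an IEEE 754 double, the same representation JavaScript uses for every `Number`, so the limit is easy to demonstrate. `is_safe_integer` below is a hypothetical re-implementation of the `Number.isSafeInteger` rule for numeric inputs, not a port of the specification. Unlike JavaScript, Python also keeps a separate arbitrary-precision `int` type, which is relevant to the comparison with Python made in this thread.

```python
# Python float == IEEE 754 double on mainstream platforms (as in JavaScript).
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

def is_safe_integer(x: float) -> bool:
    """Sketch of Number.isSafeInteger's rule for numeric inputs."""
    # Check the range first so huge Python ints never overflow float().
    return abs(x) <= MAX_SAFE_INTEGER and float(x).is_integer()

# Past 2**53 a double cannot represent every integer.
big = 2.0**53
print(big + 1 == big)        # True: 2**53 + 1 rounds back to 2**53
print(2**53 + 1 == 2**53)    # False: Python ints are exact
```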

~~~
axaxs
Do you not find the fact that JavaScript has no concept of integers and treats
everything as floating point a valid argument that its numbers are a
mess?

Sure, you can make it work. But I think it's a fair criticism when compared to
just about any other language.

~~~
jgtrosh
Conversely, do you find that this criticism applies to Lua?

~~~
axaxs
Absolutely. I don't particularly like either language much, as a self-admitted
language snob, but I have to acknowledge their successes and try to remain rather
objective in my statements. Subjectively, I think that python does numbers
best for a scripting language.

------
mbrodersen
The article really should have been called: "Do you have a problem? Write a
DSL". I use DSLs (Domain Specific Languages) or "Little Languages" all the time
to be much more productive in C++/Java/C#/...

------
imihai1988
"When inlining is a valid rewrite?" "When constant folding is a valid
rewrite?"

Are these valid questions? Structurally speaking... they seem somehow off, a
lot off.

I'm asking because I'm way more bothered by them than I should be.
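
One common reading of the questions quoted above is that a rewrite is valid when it cannot change the program's observable behavior. The sketch below uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) to fold integer arithmetic on literals, but deliberately leaves `1 // 0` alone, since folding it would turn a run-time error that might never be reached into a failure at compile time. The `ConstantFolder` class and the set of foldable operators are illustrative assumptions, not the article's method.

```python
import ast
import operator

# Integer operators this sketch treats as safe to evaluate ahead of time.
FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

class ConstantFolder(ast.NodeTransformer):
    """Replace `int_literal op int_literal` with its value when that is safe."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if op is None or not all(
            isinstance(n, ast.Constant) and type(n.value) is int for n in (left, right)
        ):
            return node
        # x // 0 raises at run time; folding it would change behavior.
        if isinstance(node.op, ast.FloorDiv) and right.value == 0:
            return node
        folded = ast.Constant(value=op(left.value, right.value))
        return ast.copy_location(folded, node)

def fold(source: str) -> str:
    """Parse, fold constants, and print the rewritten source."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, `fold("x = 2 * 3 + y")` gives `"x = 6 + y"`, while `fold("z = 1 // 0")` is returned unchanged.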

------
otabdeveloper4
> Write a compiler

Then you have two problems?

~~~
gpvos
Yes, that quote is also the first thing I thought of.

------
smacktoward
Now you have two problems!

:-D

~~~
quickthrower2
Only if you use regex in the lexer.

~~~
guramarx11
I guess now you have 3 problems

------
Beltiras
Now you have 2 problems.

------
dehrmann
Alternatively, use blockchain.

~~~
nullc
> Alternatively, use blockchain.

How's that work?

Step 1. Raise 4 billion dollars in an unlawful ICO.

Step 2. Pay 24 million (0.6%) in a settlement with the SEC that admits your
actions were unlawful but absolves you of any further issues.

Step 3. Use the remaining 3.976 billion to pay someone to write a compiler for
you instead of writing it yourself?

------
VBprogrammer
I was half expecting the punch line to be "now you have 2 problems!".

