
So You Want to Write Your Own Language (2014) - rspivak
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488
======
tikhonj
I've always found it unfortunate how he, and others, implicitly define the
_success_ of a programming language as its _popularity_. A programming
language can be incredibly interesting and useful to you and others without
ever becoming fashionable—and that doesn't mean it has failed!

I think there should be other metrics for success: a programming language can
be useful _to a dedicated community of experts_ , it can be useful _to a
single company_ , it can be great _for a single use-case_ or even just great
_for you_ , it can push _novel ideas_ or showcase a _novel design philosophy_
, all without massive popularity or market success. (And it's not like most
languages directly make lots of money anyhow.)

Popularity is a poor proxy for quality, and quality matters all on its own.

Keeping this in mind changes a lot of his suggestions, I think.

Completely unrelated note: what the article calls "lowering" I've more
commonly seen called "desugaring" and is, indeed, a powerful technique both in
terms of _implementation_ and in terms of _understanding_ a language—if a
piece of syntax is pure sugar, you don't have to understand anything new, just
how the syntax it desugars to works.
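A familiar instance of desugaring: Python's `for` loop can be understood as sugar over the iterator protocol. A rough hand-desugaring (an illustrative sketch, not CPython's exact expansion):

```python
# Rough hand-desugaring of a Python "for" loop into the iterator
# protocol it is sugar for (illustrative, not CPython's exact expansion).
def sugared(items):
    total = 0
    for x in items:
        total += x
    return total

def desugared(items):
    total = 0
    it = iter(items)          # "for" first calls iter() on the iterable...
    while True:
        try:
            x = next(it)      # ...then next() until StopIteration
        except StopIteration:
            break
        total += x
    return total
```

Once you know the desugaring, the `for` form requires nothing new to understand.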

~~~
saurik
FWIW, I have more often seen the term "lowering" used by people who do
research in this field and "desugaring" mostly among industry developers.

~~~
tikhonj
Most of the time I see desugaring _is_ in a research context, at least within
areas of PL adjacent to functional programming.

------
qwertyuiop924
I actually think that easy parsing is something you should try for. The easier
your language is to parse, the fewer edge cases there are, and the simpler it
is for programmers to understand. In short, I'd rather have Scheme than Ruby. But
I seem to be in the minority...

~~~
brianberns
Easy (for a computer) to parse does not necessarily mean simpler for
programmers to understand. I think Lisp's deeply nested parens are a good
example of this problem.

~~~
pvitz
J is also quite easy to parse, but could be difficult to understand. (See e.g.
[http://sblom.github.io/openj-core/iojSent.htm#Parsing](http://sblom.github.io/openj-core/iojSent.htm#Parsing))

~~~
stepvhen
J is one of my favorites. The syntax is cryptic at first, but the lack of
ambiguity makes programs easier to understand and easier to design in the long
run.

~~~
Avshalom
I dunno, I think proper anonymous functions would have been a lot better than
hooks and forks.

Though I'm not sure if that falls under syntax or semantics for purposes of
this discussion.

------
nickpsecurity
It's a decent list but not enough. The real trick is being general purpose
enough to handle low-level up to programming in the large while being
comprehensible and easy for a compiler to work with. In my field, highly-assured
INFOSEC, you also want correctness and traceability for security purposes.

I wrote an essay on a layered, bottom-up approach to constructing such a
language here if anyone is interested:

[https://www.schneier.com/blog/archives/2014/05/friday_squid_...](https://www.schneier.com/blog/archives/2014/05/friday_squid_bl_423.html#c5828584)

I'm also interested in feedback on my approach. It tries to solve probably a
dozen problems at once with some basic principles and three, compatible
languages.

~~~
restalis
I'm not sure I understand the value of having tools to "downgrade" something
to a lower level. I compare this "downgrading" with "compiling to C" - a
feature that enables some languages to become useful on more than one platform
just by riding on the back of the C language. Was it about this idea of having
different layers so that one may care to really port only the lowest level
when it comes to it? But then the scope you get on the lower levels is simply
inferior compared to the one you get on higher level/layer in terms of
optimization possibilities, partly because you are now forced to get from
point A (source code) to point B (binary code) not directly but through a 3rd
point - C (a lower level source code), which may hurt performance and other
things. Did I misunderstand something?

~~~
nickpsecurity
It's an exploration so I was throwing "downgrading" out there. You almost
guessed it all with portability and bringing on new programmers. Both
arguments for that technique. Another, esp if compiler is written in an ML, is
that code used to analyze, optimize, or translate lower layers of language can
be re-used in compiler for high-level.

Matter of fact, I think my default was to go to ASM with plugins for other
stuff. That should prevent missed opportunities in optimization.

------
toolslive
Minimizing key strokes is seen as 'false', but it has actually been proven
that one can write (debug, maintain,...) about 10 lines of code per hour.
Those 10 lines can be 10 lines of assembly, 10 lines of c++ or 10 lines of
haskell. Of course, you can do much more in 10 lines of haskell than you can
do in 10 lines of assembly. It basically means you want to work at the
highest level of abstraction that you can afford. It also means that economy
of expression is important. source:
[https://vimeo.com/9270320](https://vimeo.com/9270320)

~~~
josteink
> Those 10 lines can be 10 lines of assembly, 10 lines of c++ or 10 lines of
> haskell.

I seriously doubt the validity of this. With 10 lines of assembly, it's
usually reasonably clear-cut what they're about.

10 lines of Haskell can take hours of mindfuck trying to peer through the
functor and monad operators, trying to work out which operation does what and
which operator takes precedence where. And then you need to start worrying
about which data-type are you _actually_ working on. And how does that type
implement lift and bind. Etc etc.

Basically 10 seems like a BS number taken out of thin air, because it looks
good in a base 0x0a number-system.

~~~
codygman
> 10 lines of Haskell can take hours of mindfuck trying to peer through the
> functor and monad operators,

Please stop. Can you even come up with one example that supports this?
Alright, now how about 10 lines from a "real world"* Haskell project.

*real world meaning has at least 1 user besides its developer

~~~
saurik
I think the more important counter-argument is "that ten lines of assembly
just barely did one thing; the ten lines of Haskell was most of the program:
clearly it will be easier for you to understand a single wire or switch than
to understand an entire computer, but that isn't a valuable insight".

~~~
codygman
Hyperbole (if that is what the commenter was using and didn't seriously think
it) should be used carefully when it reinforces stereotypes that are so
damaging.

------
yongjik
For my project, I found sticking to LALR(1) quite beneficial. Not that there's
anything special about LALR(1): you could use LL(k) instead if you prefer.
However, by sticking to a small, deterministic syntax (and making the parser
generator complain loudly in case of ambiguity), I can easily find edge cases
I've never thought about before they come around to bite me. And, in most
cases, it's trivial to slightly modify the syntax so that the undesirable
ambiguity does not arise.

It's the equivalent of static typing for syntax: it takes effort to make bison
run successfully, but when it does, I know that my grammar is free of
ambiguity.
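A classic example of the kind of conflict bison will surface is the dangling else. Given a grammar fragment like (illustrative, not from the article):

```yacc
stmt : IF expr THEN stmt
     | IF expr THEN stmt ELSE stmt
     | other
     ;
```

bison reports a shift/reduce conflict, because in `IF e1 THEN IF e2 THEN s1 ELSE s2` the `ELSE` can attach to either `IF`. The usual fix is exactly the kind of small syntax adjustment described above, e.g. requiring an explicit block or end keyword.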

------
musesum
> Regex is just the wrong tool for lexing and parsing

Funny; I recently wrote an Island parser that extends Regex to create a token
stream: [https://github.com/musesum/Par](https://github.com/musesum/Par)

Quoting the link to Rob Pike's post:

> Consider finding alphanumeric identifiers. It's not too hard to write the
> regexp (something like "[a-zA-Z_][a-zA-Z_0-9]*"), but really not much harder
> to write as a simple loop

Maybe I'm missing something, but does this suggest replacing a declaration
with a procedure? Oh the horror!
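For reference, the "simple loop" Pike describes would look something like this (a sketch in Python rather than Pike's own code; the function names are mine):

```python
# Hand-written scanner for alphanumeric identifiers, equivalent to the
# regexp [a-zA-Z_][a-zA-Z_0-9]* -- a sketch of the "simple loop" Pike
# describes, not his actual code.
def is_ident_start(c):
    return c == '_' or 'a' <= c <= 'z' or 'A' <= c <= 'Z'

def is_ident_char(c):
    return is_ident_start(c) or '0' <= c <= '9'

def scan_identifier(src, pos):
    """Return (identifier, new_pos), or (None, pos) if no match at pos."""
    if pos < len(src) and is_ident_start(src[pos]):
        start = pos
        pos += 1
        while pos < len(src) and is_ident_char(src[pos]):
            pos += 1
        return src[start:pos], pos
    return None, pos
```

The loop is longer than the regex, but it is also a plain procedure with obvious edge behavior, which is the trade-off being debated here.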

------
krig
I disagree with several of these. But I guess that's why 1) I am working on my
own language, and 2) Walter Bright is a lot more successful at that than me.
So yeah. He's probably right, and I'm probably wrong. However: I'm doing this
entirely for my own pleasure, and I don't want success as he defines it
(popularity means having to cater to the whims of others).

Disagreements:

I think minimizing keystrokes _is_ important, and I think the notion that code
is primarily read, not written, is idealistic but completely wrong. The
purpose of programming languages is to be writable by humans and readable by
machines. There is value in making them readable by humans and writable by
machines, but only secondary value.

I think there is value in easy parsing, if nothing else because every single
editor out there can highlight your language even if it isn't popular. I think
the focus on popularity as the measure of success is wrong.

Tried and true is overrated. There are plenty of examples of programming
languages that break the tried and true rules and are successful either
because of this or despite this. Either way, it doesn't seem to matter as much
as people seem to think, but staring too hard at the success of Java might
make you think it does.

Anyway, feel free to disagree with me, go ahead and design your own language.
I sincerely hope you are successful, regardless of what your definition of
success is.

~~~
kazagistar
In practice, writing code consists mostly of reading; either to (re)understand
how some other part of a large system works, or to figure out why the code is
incorrect.

~~~
PeCaN
I would argue that it's easier to understand short, even excessively terse,
code than it is to understand long code. I can look at a few lines of Haskell
and even though deciphering it may take me some time, the cognitive overhead
of anything other than working out what the code is doing is very low (the
most I'd have to do is look up some operators). I can see a significant chunk
of logic all at once and piece together the system as a whole. Meanwhile I
could look at 3 files full of Java and have no idea what the system is
actually doing.

------
StephenFalken

      LinuxWorld.com: What is your advice to designers of new programming languages?
      
      Dennis Ritchie: At least for the people who send me mail about a new language 
      that they're designing, the general advice is: do it to learn about how to 
      write a compiler. Don't have any expectations that anyone will use it, unless 
      you hook up with some sort of organization in a position to push it hard. 
      It's a lottery, and some can buy a lot of the tickets. There are plenty of 
      beautiful languages (more beautiful than C) that didn't catch on. But someone 
      does win the lottery, and doing a language at least teaches you something.
      
      Oh, by the way, if your new language does begin to grow in usage, it can 
      become really hard to fix early mistakes.[0]
    

[0] [http://www.itworld.com/article/2826125/development/the-futur...](http://www.itworld.com/article/2826125/development/the-future-according-to-dennis-ritchie--a-2000-interview-.html?page=2)

------
talles
> My career has been all about designing programming languages and writing
> compilers for them.

Anyone knows which languages/compilers the author has worked on?

(I'm not trying to validate his arguments, it's just out of curiosity)

~~~
anewhnaccount
Digital Mars C, C++ and D compilers

~~~
stevoo
To add more context to that
[https://en.wikipedia.org/wiki/Walter_Bright](https://en.wikipedia.org/wiki/Walter_Bright)

Although I have never thought of actually writing a language, I can agree with
those points. Especially minimizing keystrokes: I want the language to be
verbose and easy to understand. When dealing with thousands of lines and
trying to figure out what is going on, it is essential to be able to pick it
up easily!

~~~
dozzie
> I want the language to be verbose and easy to understand.

So, you basically want to write in assembler, which printed is very verbose,
and it's very easy to understand each separate instruction.

What I want is the language to minimize language entities used to do a task
(from parser's perspective those would be AST nodes), and I want those tasks
to be general and high-level. _This_ is what makes source code comprehensible,
not verbosity.

~~~
Jtsummers
Each instruction may be easy to understand, but the programs are not easy to
understand at scale. So that's probably not what stevoo wants.

~~~
dozzie
Of course it's not (or rather, I would be surprised if it was), but he phrased
it in a very wrong way.

------
pjc50
My own opinion about PL research is that programming is _concept
serialisation_ : the programmer has a model of the system (or part thereof) in
their head, and the role of the language is to turn that into a byte sequence
that is amenable to both being turned into machine instructions and being de-
serialised into the heads of other programmers.

But different programmers have different abstractions in their heads, and find
different things intuitive. That's why things like ColorForth and APL exist;
to a tiny fraction of people and problem sets they're intuitive, to the rest
obtuse. I suspect this is why people keep thinking Lisp is going to take over
the world despite 30+ years of this not happening.

------
david927
_One thing abundantly clear is that syntax matters._

I feel this is like reading 100 years ago about how to make your own new type
of horse & buggy, with tips like "make sure the reins are well-fitted," when
what you want to invent is a car.

The future doesn't belong to syntax because the future isn't text. The same
people that would blanch at sending a string argument of "ONE" or "TWO" are
indifferent about the fact that they send in "IF" to start a condition.

The reason you wouldn't send in "ONE" or "TWO" is because it makes it stiffer,
more brittle and harder to manage. What can we say about the software we
write? That it's stiff, brittle and hard to manage.

It's not that the future has no textual syntax at all, of course. An
expression of 3+4*5 will always be easiest to express just that way. But we
need to pull back from our gregarious, wide-mouthed babbling into something
more meta, or I should say, more data, than text.

~~~
commentzorro
This is the ??? (I don't know what to describe it as) post here. You took the
one sentence I just loved from the article and pooped on it. Then tried to tie
it in with details of a non syntax related item and process. Next, you said a
two character syntax for describing something as complex as an if statement
was somehow too much burden?!

Not sure what the future of programming languages is. But if humans are
involved I suspect there will be a visual representation. And that means
syntax. And that syntax will matter then just as it does now.

~~~
david927
I didn't mean to hurt your feelings. This is my opinion; I didn't expect it to
receive a warm reception.

If you went to a horse and buggy convention 150 years ago and told them that
it would all be gone soon, they would have a virulent reaction, right? They
would say that metal contraptions have been tried before and failed, that
humans have been using horses for travel for thousands of years, and so if
humans are involved, it will mean horses. This isn't to prove my point but to
draw caution to yours.

Note that I understand your position quite well but that you clearly don't
remotely understand anything that I've written. That should raise a flag.
Instead of getting mad, you should be getting curious.

~~~
knz42
I am a PL researcher and I too believe you may not fully understand what's
going on.

To keep your analogy, the idea that a PL needs syntax to be used is comparable
to the idea that a vehicle needs acceleration to get moving. Different flavors
of syntax correspond to different means of propulsion.

Perhaps we will find a good way to get rid of context-free grammars (the horses
in your example) and replace them with automatic pruning of parse forests (the
current Tesla car of compilers) or even something more fancy like semantic
editors (a fusion engine in your example) but unless you change the laws of
physics, you can't get a vehicle without an accelerating engine nor a PL
without an input syntax.

------
andrewchambers
The level of dedication required is really high. I stumbled upon
[https://github.com/oridb/mc](https://github.com/oridb/mc) and found my new
favourite language written entirely by one person over the course of many
years.

------
auggierose
He got the thing about using context-free grammars right. The final comment
about not using generators is currently right, but will hopefully be wrong by
the end of this year. :-)

------
scriptproof
I don't know why this old page has been revived, but a better title would be
"So you want to write your own C- or C++-like language?".

Among other points is the trailing semicolon. If this is so good, then the
authors of Go and Swift must be wrong.

~~~
Randgalt
I don't think you can say one is objectively right or wrong. However, as
someone who's programmed for nearly 40 years, the entire issue seems like a
fetish. Would anyone choose a language solely based on whether separators were
required or not? Is your productivity dramatically increased by the choice of
required separators? Of course not.

In functional languages there are some reasonable needs that make not
requiring a separator useful. But, it does make the code harder to parse and
harder to read. Go, for instance, is not a functional language. Making
separators optional (sometimes) seems capricious. But, then, Go is rife with
capricious choices.

~~~
jerf
Go uses semi-colon insertion like Javascript does, albeit with the benefit of
Javascript's experience to fix up the edges. Calling "adopting some of the
syntax of arguably the most popular language" capricious is probably
stretching a bit.
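Go's rule is purely lexical: at each newline, a semicolon is inserted if the line's final token is one that can end a statement. A rough sketch (simplified from the Go spec; the toy token classifier here is illustrative):

```python
# Rough sketch of Go-style automatic semicolon insertion (simplified
# from the Go spec). At each newline, ";" is inserted if the line's
# final token is one that can end a statement.
STMT_ENDERS = {"identifier", "literal", "return", "break",
               "continue", "fallthrough", "++", "--", ")", "]", "}"}

def classify(tok):
    # Toy classifier: keywords/operators map to themselves; everything
    # else is treated as a literal or identifier.
    if tok in STMT_ENDERS:
        return tok
    if tok[:1].isdigit():
        return "literal"
    if tok.isidentifier():
        return "identifier"
    return tok

def insert_semicolons(lines):
    """lines: a list of source lines, each given as a list of tokens."""
    out = []
    for tokens in lines:
        out.extend(tokens)
        if tokens and classify(tokens[-1]) in STMT_ENDERS:
            out.append(";")
    return out
```

Notice that a line ending in `{` gets no semicolon, which is why Go's mandatory brace placement falls out of this rule rather than being a separate style decree.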

~~~
imtringued
I don't see why anyone would even think that a band-aid solution like
"automatic semi-colon insertion" is a good idea. If you want optional semi-
colons you have to change your language's syntax.

