
Syntax is the last thing you should design - pcmonk
https://boxbase.org/entries/2017/feb/6/syntax-advice/
======
aaron-lebo
The author is right that syntax isn't the be-all and end-all, but syntax
often drives semantics and vice versa. More than one language has crashed on
the rocks of unforgivable syntax, and even successful languages have small
syntax annoyances that will never go away.

My growing belief is that if you can't express your language using a Pratt
parser you need to rethink what you are doing. Once you grok them and get your
first one up they're more extensible than anything else - far simpler than a
parser generator and very easy to write in any language.
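
A minimal sketch of the technique in Ruby (the names and structure here are
mine, not taken from any of the linked articles): a table of binding powers
plus one recursive loop is essentially the whole parser.

```ruby
# Minimal Pratt (top-down operator precedence) parser sketch.
# Tokens are pre-split strings; output is an s-expression as nested arrays.
class Pratt
  BINDING_POWER = { "+" => 10, "*" => 20 } # higher binds tighter

  def initialize(tokens)
    @tokens = tokens
  end

  def parse(min_bp = 0)
    left = @tokens.shift # a number/identifier in prefix ("nud") position
    while (op = @tokens.first) && BINDING_POWER.fetch(op, 0) > min_bp
      @tokens.shift      # consume the operator (infix/"led" position)
      left = [op, left, parse(BINDING_POWER[op])]
    end
    left
  end
end

p Pratt.new(%w[5 * 2 + 1]).parse # => ["+", ["*", "5", "2"], "1"]
```

Extending it is mostly a matter of adding rows to the table, which is where
the extensibility claim comes from.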

[http://javascript.crockford.com/tdop/tdop.html](http://javascript.crockford.com/tdop/tdop.html)

[http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/](http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/)

[http://www.oilshell.org/blog/2016/11/01.html](http://www.oilshell.org/blog/2016/11/01.html)

[http://effbot.org/zone/tdop-index.htm](http://effbot.org/zone/tdop-index.htm)

Several of them link to each other. Thorsten Ball's book, where you write an
interpreter in Go, uses it too, to parse his own language, Monkey.

~~~
zengid
What might be an example of a syntax that can't be expressed (parsed) with a
Pratt parser?

~~~
vidarh
The right question is probably what syntax can't be expressed _cleanly_ with a
Pratt parser. Operator precedence parsing in general can be extended to parse
anything relatively easily by introducing different kinds of operators with
special treatment, but if your grammar can't be cleanly expressed in a way
that naturally results in a sensible parse-tree by applying operator
precedence (e.g. asterisk should bind tighter than "+" so that "5 * 2 + 1" is
parsed (+ (* 5 2) 1) rather than (* 5 (+ 2 1))) you can end up with ugly
special casing.

An example that makes it more complicated (not impossible):

In Ruby "x" depends on context. It can be a method call or a local variable
reference. If "x" has been assigned to prior to the reference, it's a local
variable. Except if it has a parameter list. "x 1" is a method call. Except if
it's within a literal hash. "{ foo: x 1 }" is a syntax error - the parser
wants a "," after "x".
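
The local-variable-vs-method-call part can be demonstrated in a few lines of
Ruby (a sketch; the hash-literal syntax error can't be shown in a runnable
file, since the file then wouldn't parse at all):

```ruby
# Whether a bare `x` is a method call or a local variable is fixed at
# parse time, based on whether an assignment to `x` has been seen.
def x
  "method call"
end

result_before = x     # no assignment parsed yet: this calls the method

x = "local" if false  # never executes, but the parser has now seen an
                      # assignment, so every later bare `x` is a local
result_after = x      # the never-assigned local variable: nil

p result_before # => "method call"
p result_after  # => nil
```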

Even with Ruby I handle a substantial part of the syntax in my compiler with
an operator precedence parser, where the operator precedence parser class
itself is about 170 lines of Ruby (excluding the table of operator
priorities, which adds another 145 lines). But on top of that I have so far
ended up with about the same again to "massage" the resulting parse tree
into something that's nicer to work with.

The challenge with these types of parsers, though, is generally more often
that they are harder for people to reason about. If I express the rules I
started with (for "+" vs asterisk) in a BNF-type syntax, I could do it like
this:

    
    
        expr     = plusexpr
        plusexpr = mulexpr ("+" plusexpr)*
        mulexpr  = simple ("*" mulexpr)*
        simple   = number|identifier
    

It's quite clear if you've seen some variant of BNF before that "1 + 2 + 3"
will go expr -> plusexpr -> mulexpr -> simple, return 1, then fail to find
an asterisk, find "+", parse a second "plusexpr" finding 2 + 3, then exit
with (+ 1 (+ 2 3)). And that "1 * 2 + 3" meanwhile will go expr -> plusexpr
-> mulexpr -> simple, find "1", then find an asterisk, parse a second
"mulexpr" which will find "2", return up to plusexpr, find "+" and
eventually "3", and result in (+ (* 1 2) 3).
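
For comparison, that BNF transcribes almost mechanically into a
recursive-descent parser (a sketch in Ruby; the class and method names are
mine):

```ruby
# Recursive-descent parser following the grammar above, rule for rule.
# Returns the parse tree as nested arrays, e.g. (+ (* 1 2) 3).
class RecursiveDescent
  def initialize(tokens)
    @tokens = tokens
  end

  # expr = plusexpr
  def expr
    plusexpr
  end

  # plusexpr = mulexpr ("+" plusexpr)*
  def plusexpr
    left = mulexpr
    return left unless @tokens.first == "+"
    @tokens.shift
    ["+", left, plusexpr]
  end

  # mulexpr = simple ("*" mulexpr)*
  def mulexpr
    left = simple
    return left unless @tokens.first == "*"
    @tokens.shift
    ["*", left, mulexpr]
  end

  # simple = number|identifier
  def simple
    @tokens.shift
  end
end

p RecursiveDescent.new(%w[1 + 2 + 3]).expr # => ["+", "1", ["+", "2", "3"]]
p RecursiveDescent.new(%w[1 * 2 + 3]).expr # => ["+", ["*", "1", "2"], "3"]
```

Note that, exactly as in the walkthrough, this grammar makes "+" group to
the right.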

But the grammar above, in terms of an operator precedence parser might be
expressed by a table like this:

    
    
        +, INFIX, 10
        *, INFIX, 20
    

The rest is "obscured" in the parser function. As long as it's this simple
you can probably guess what's going on (the last number is the priority, and
the values are arbitrary - only their relative size, and whether high or low
values are treated as binding tighter or looser by the parser, matters), but
when you e.g. add things like different types of brackets and parentheses,
function calls, operators with different priorities if used as a prefix vs.
infix etc., it can easily become hard to understand intuitively from the
table of operators how the priorities interact.

EDIT: Modified due to HN handling of asterisk...

~~~
zengid
I agree with you that operator precedence parsers are somewhat opaque in
terms of how they function (at least I can say Pratt parsers are, which is
the kind I am most familiar with). One has to juggle a lot of mechanics in
one's head, and has to be able to imagine the recursive nature of the
expression parsing without a nice BNF-style definition that resides in one
location, as you mentioned.

------
odiroot
Funnily enough syntax is one of the most important factors for me. Call me
superficial but that was my main reason for choosing Python.

~~~
paulddraper
And unfortunately, it's also the biggest reason I hear for _not_ liking
Python.

I agree with the article. I work with ideas, incidentally written left-to-
right top-to-bottom as text.

------
tomatsu
That's only true if you don't simply go with C-like syntax.

For example, Dart prioritized familiarity. They wanted to make it easier for
people who already know JavaScript, C#, Java, or ActionScript.

JavaScript was also made to look somewhat like Java and they even put "Java"
right in the name to make it look more appealing to the masses.

Creating completely new syntax is a very risky move which will always hinder
adoption.

~~~
darfs
Sorry, I thought Java was in the category of "C-like languages"?

Maybe my teachers were wrong here. Possible.

~~~
tomatsu
Java also uses C-like syntax.

------
danielhooper
Going to join the "I strongly disagree" camp with Swift as my example. The
expressive syntax in combination with the strict type system quite often
challenges me to design succinct app architecture.

I personally prefer to omit type information from naming, so for example I
would declare a username text field of a view controller like so: `let
username = UITextField()`. Other devs might declare the text field as
`usernameTextField`, and somewhere else declare a variable called `username`
to represent the string from the text field, but now you have a view
controller concerned with both the text field and the data from the text
field. By naming the text field simply 'username', I force myself not to
have a 'username' value anywhere else in this particular view controller,
which forces me to entirely separate these concerns. I can elaborate on this
if someone is interested in trying to get it working in practice.

------
yxhuvud
Thanks for the link to cheery/chartparser. It is the first complete and
readable implementation of the parser type with right recursion optimization
and parse forest generation that I've been able to find, and I have been
looking as I've been trying to implement this (and failed!).

------
kazinator
Irony: g++ has an (evidently) hand-written parser that's well over a megabyte
of code in one file.

~~~
Ace17
Parser generators really shine when you're experimenting with a new
language, whose syntax isn't stabilized. At this stage, you want to be able
to change your grammar nearly for free. You don't want to be trading
enhancements of your final grammar for parser development time; and you
don't care a lot if your parser is slow or your syntax error messages are
imprecise.

I agree that the syntax of C++ is still evolving, but nobody would call g++
a test bed for experiments on C++ syntax. The benefits of using a parser
generator here are lower - but I certainly agree that having one megabyte of
code inside one file is undesirable!

See: [http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488](http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488)

~~~
lifthrasiir
In addition to the inherent complexity, parsers do not split well in my
experience. You can probably somehow split expressions and statements,
provided that your language distinguishes them and they do not interact with
each other (but it is very frequent that they do interact, especially when
you want robust error recovery), but I cannot think of other easy split
points.

------
vidarh
I disagree with this intensely.

For starters, syntax drives how I interact with a language as much as - maybe
more than - semantics. How expressions are laid out is intensely important to
me, as it affects how I remember and visualise the code. I can visualise the
layout of code I have not worked with in years when the syntax is clear, and
the code is well formatted.

I can work around painful semantics and find ways to pretend they don't
exist by avoiding features or picking patterns that work better; but painful
syntax stares me in the face every moment I work with a language.

I have more than once rejected or picked languages based on syntax. E.g. I
can't look at a Python program without getting annoyed by the syntax, and I
avoid using the language whenever possible because of it, while I work with
Ruby whenever I can for the same reason (though the language geek in me
wants to cry whenever I think about the Ruby grammar).

I also reject the idea of avoiding hand-written parsers to start with,
though I sympathise a bit with it: I can see the appeal of quickly testing
changes with a parser generator. And certainly, if you hand-write a parser,
you need to avoid the temptation of adding all kinds of awful exceptions.

E.g. I love Ruby as a user of the language, but the MRI parser is beyond
awful, and I think the syntax could have had most of the nice aspects and
avoided most of the awful syntactical warts with a bit more discipline. My
"favorite" wart at the moment: '% x ' parses to the literal string "x". When
"%" is not preceded by an operand that makes it the infix operator "%", it
starts a quote-sequence where the following character indicates what the
quote character should be - with the exception of a few special characters,
most characters will set the quote character to their identity. So in
'% x ', the quote character is space.
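
The generic %-quote mechanism being described can be seen with less
surprising delimiters (the space-delimiter case is the same rule taken to
its extreme; it's omitted here precisely because it's the kind of thing you
shouldn't write):

```ruby
# Ruby's generic %-literal: "%" followed by a non-alphanumeric character
# starts a string, ended by that character (or its matching close).
s1 = %{x}  # braces as the quote characters
s2 = %!x!  # almost any punctuation works

p s1 # => "x"
p s2 # => "x"
```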

MRI uses a Bison parser, but contains thousands of lines of handwritten
exceptions, demonstrating both the bad parts of hand-writing irregular
exceptions into parsers, and how easily you can mess things up even when
using a parser generator, if you have one that isn't strict enough.

But to me, if your hand written parser becomes big and/or problematic to
maintain, you're designing a language that will be problematic to parse
cleanly, and it's probably worth revising your grammar (I wish this rule had
been adhered to for Ruby).

Nice, regular, clean grammars tend to lend themselves very well to small,
compact hand-written parsers. In practice I've never run into a situation
where a grammar change required major rewrites of a parser in any project I've
worked on for this reason, unless the rule deviated majorly from what I'd
consider good practice in language design in ways that would cause problems
for most parser generators too.

Modularising a hand-written parser along the lines of the grammar rules is
easy, and few changes cut so deeply across grammar rules as to make this
difficult.

But what a hand-written parser tends to get you over a parser generator is a
better ability to do clean error reporting, and more ease of introspecting
how parser changes actually change the processing in ways that are
meaningful to mortals. To me at least, this is a lot more difficult with
every parser generator I've tried (and I keep hoping to be proven wrong;
I've tried writing my own too, to try to prove myself wrong, and so far I've
failed to come up with something I consider a usable replacement for
handwritten parsers - you certainly _can_ come up with something expressive
enough, but it tends to end up being verbose enough to lose most of the
benefit over clean code in the target language, which saves you from having
to deal with the idiosyncrasies of the generator).

To me the "solutions" offered demonstrate exactly _why_ syntax matters to me:

I deeply admire Forth and Lisp and their descendants on a technical level,
but the syntax has always been a massive barrier to me for both language
classes. I chose an s-expression-inspired syntax to kick off my own compiler
project by basically treating it as a serialization format for the parse
tree, and only adding a parser on top later; but I did that first to be able
to toy with the semantics of something I didn't intend to make into its own
language, and then to act as the "guts" of my in-progress Ruby compiler, not
because I'd be willing to work with it more than that.

If anything, I've found it incredibly painful to work with, and I'd never have
"held out" for very long without bolting a more human-friendly parser on top
very early on. The experience has made me more insistent - not less - on if
not starting with the syntax, then at the very least co-evolving semantics and
syntax from the outset.

~~~
inimino
I am beginning to think there must be something about Ruby that poisons the
mind in this way.

There is nothing (major) wrong with Forth, Lisp, or Python syntax, and if you
think there is, it's just a prejudice that is slowing you down. Stop being so
delicate and get over it.

~~~
abecedarius
I think you're being kind of mean. I love Forth, Lisp, and Python, and I'd
encourage people prejudiced against them to give them more of a chance, but I
can also understand how a language can just rub you the wrong way. (Personal
example: Go. It has a lot to like pragmatically, but it wants you to code in a
particular style. I'd rather code in mine.)

~~~
vidarh
The thing is I can see tremendous value in both Forth and Lisp in terms of the
semantics, but in some cases we just have to accept that people favour
different styles and cross-fertilise the ideas rather than try to make people
use the same languages.

One of my old pet peeves that I wish I had time to pursue was to experiment
with a language that tried to separate presentation from the semantic model
- I know there have been other attempts. I wish that got more attention.
It's a tremendously hard thing to get right without ending up hampering
communication more than you enable it, but the need for high-fidelity
interchange of programs is one of the biggest barriers to more rapidly
iterating on improvements to the presentation of code.

~~~
inimino
You can't design a language with Haskell's semantics and Ruby's syntax. So if
you want to use Haskell, you just have to get over your dislike of the syntax.
It has a beauty of its own, but you'll never see it until you get familiar
with it.

~~~
vidarh
> You can't design a language with Haskell's semantics and Ruby's syntax.

Not identical, of course, no, but the same style would not be a problem.
There's nothing in Haskell's grammar that can't easily be adapted to a less
terse style. It wouldn't solve all the issues I have with Haskell by far,
but it would go a long way toward making me consider it readable.

> but you'll never see it until you get familiar with it.

I am familiar with it. It is how I came to detest the syntax.

------
logicallee
It is possible to completely avoid syntax, period, by putting boxes of various
colors on a whiteboard and labeling or connecting them with pictographic
symbols representing what you want to do.

As soon as you do so you will realize that all modern programming languages
are as dumb as a sack of rocks, and the functionality you are coding is
trivial.

You can then design the syntax, which is what separates your language from
every other dumb as a sack of rocks language.

Quick, name a language where I can do something simple like mention "Whenever
this function is called, make sure there is enough free memory (at least ----
MB) before actually calling it; if there isn't, first swap out any objects
from memory to disk, starting with the ones that had been used the longest
time ago, and after the function has finished running, swap those back into
memory."

That's pretty straightforward and well-specified. Name a language I can do
that in?

Or how about this: "analyze this library and include a logically simplified
version that only needs to address these cases:" (a list of conditions).
What language will even try to simplify included source code?

Or take debugging. Name a language I can add this line to: "if variables A
and B ever both change as a result of the same function call, print the
following debugging message:" Trivial. Name a language that can do it.

Languages are dumb. They do almost nothing. I can't _wait_ for the future, it
can't get here fast enough.

~~~
coldtea
> _Quick, name a language where I can do something simple like mention
> "Whenever this function is called, make sure there is enough free memory (at
> least ---- MB) before actually calling it; if there isn't, first swap out
> any objects from memory to disk, starting with the ones that had been used
> the longest time ago, and after the function has finished running, swap
> those back into memory."_

The first fallacy is thinking that this is "simple" merely because it is "well
specified".

The second fallacy is thinking that this should be the task of a language.

~~~
logicallee
Hey you're right, a language shouldn't let me just describe what my executable
should do - and then give me that executable. My mistake!

A language should just be a super tiny syntactic framework that elegantly
lets library writers rewrite the same thing they already wrote libraries for
in C++, Python, Perl, Go, Java, etc. - but this time in my language!

It shouldn't really "do" anything. Who needs a programming language with any
abilities! That's not what languages are for!

/s

~~~
Samis2001
Here's the problem with that idea: How do you convert your abstract
high-level description into a set of specific operations? First, you need to
parse your specification, and parsing natural language is already hard.
Second, you have to infer everything else that was requested and will be
needed but was not originally specified.

Alright, now you have your list of requirements. First, you must break each
requirement down into a long series of small steps that each perform exactly
one operation - such as adding 1 and 1 together. You must also find the
correct order for all of the steps, as well as the operations to store and
manage all required data in memory as needed.

Finally, the list must either be translated to the native language of your
computer's CPU, or run by a program running on the CPU.

As you can see, this is actually a large and difficult task, even if you
manage not to encounter any unsolvable problems along the way. The
conclusion follows naturally: the difference between your language and the
'dumb-as-rocks' ones is that the latter are implementable and usable.

~~~
logicallee
>Here's the problem with that idea: How do you convert your abstract high-
level description into a set of specific operations? _First you need to parse
your specification_ , and parsing natural language is already hard. [my
emphasis]

If that were true it would be impossible to make a flow-chart based interface
to a "compiler" that produces an executable based on the flowchart (without
any keywords, variable names, etc). But plainly it is possible to make such a
flowchart->executable compiler+visual IDE, so you are incorrect regarding the
claim that the specification needs to be parsed lexically. Functionality does
not depend on lexical parsing. It's that simple.

