
Building an extensible syntax - gozzoo
http://breuleux.net/blog/oexprs.html
======
breuleux
Didn't expect to see this here (I am the author). For the record, I
implemented this syntax as a Racket #lang called LISO[1], and I also used a
variant of that syntax for a language that compiles to JavaScript, Earl
Grey[2]. It's fun to play with.

[1] [http://breuleux.net/blog/liso.html](http://breuleux.net/blog/liso.html)

[2] [http://breuleux.github.io/earl-
grey/repl/](http://breuleux.github.io/earl-grey/repl/)

Edit: I forgot to mention I also based a markup language on operator
precedence parsing: Quaint
([http://breuleux.net/blog/quaint.html](http://breuleux.net/blog/quaint.html)).
It benefits from similar ease of extension.

------
TheLoneWolfling
Here's an interesting way to do operator priority:

Specify priority relative to other operators, including associativity, and let
the language figure it out. If something is ambiguous or inconsistant, emit an
error.

The nice thing about this is that it removes the entire concept of "first-
class" operators. If you want to inter-operate with something, specify the
relative priorities as needed and you're done.

The down side is that the worst case time to detect cycles is O(n^2) with the
number of operators, at least if you include the relative priority of '!='.
Although, that being said, I'm not sure if it's a good idea to include '!=' as
a relative priority.

It also means that your parsing becomes... less than amusing. In particular,
it suddenly becomes rather context-dependent. This breaks one of the "laws"
described here. Still something interesting to muse about, however.

~~~
breuleux
That's more or less what I meant by "priority graph" in the article: priority
is defined using a directed acyclic graph. In principle, yes, you could have a
way to add new nodes in the graph; you don't really expect n to be large in
this case.

Parsing wouldn't necessarily be very complicated. An operator precedence
grammar[1] can be fully implemented in about 50 lines of code. If you run a
simple sequential algorithm, completed expressions could be executed
immediately and could modify the precedence matrix dynamically. You do lose,
however, the option of parallel parsing.

There's an idea I am toying with, though, which is to split source files in
"zones". A separator, say, "=====", would act as a hard delimiter for parts of
the file. The first part could define the syntax of the second part, etc.

[1] [http://en.wikipedia.org/wiki/Operator-
precedence_grammar](http://en.wikipedia.org/wiki/Operator-precedence_grammar)

~~~
TheLoneWolfling
Again, you introduce context-dependance - which is something that you
explicitly say you shouldn't have. What AST a <foo> b <bar> c is turned into
depends on what priorities <foo> and <bar> are defined as.

This starts to get nightmareish very quickly, at least if you're trying to
leave open the possibility of compilation as opposed to interpretation.

Zones are an interesting possibility to solve this, although it doesn't have
generality - there's no way to express "zone C depends on zone B and zone A,
but zone B doesn't depend on zone A, nor vice versa", for instance.

~~~
breuleux
Well, the idea of zones is that the second zone depends on the first zone, the
third depends on the second, and so on. It is meant to allow definition and
use of a language within the same file, so there is little point in having a
different dependency structure, I think.

------
stormbrew
I think things get a little lost here when you start treating infix operators
as special. When talking about creating uniform syntaxes, I think people don't
look hard enough at the basic structure of SmallTalk, where operators work
mostly like you'd expect, are infix, and are uniform with all other kinds of
calls because the convention is always (subject verb object) instead of the
lispy (verb object subject). You don't get operator precedence, but I've
largely come around to the idea that maybe we should be questioning if it's
something we actually need.

Of course that leaves you with the problem of when a verb lacks a subject.

~~~
wtetzner
Honestly, most of the time I think we would be better off without operator
precedence. It just adds another thing you need to keep in your head when
reading code.

Unfortunately, nearly everyone -- even non-programmers -- learned the operator
precedence for arithmetic operators, which means they might be confused if
those operators in a language didn't use it.

~~~
vorg
A limited hierarchy like Go's could be OK:

    
    
        parens (...)
        paths  . [] and args (x,y)
        unary  ! -
        binary * / &
        binary + - |
        binary == != <= >= < >
        binary &&
        binary ||
        comma  ,
        assign = += -= *= /= &= |= :=

------
agumonkey
I think it's even wrong to think about syntax at all when thinking about Lisp.
It's a symbolic computational system. It's about anything. Other languages are
DSLs with aesthetics. Whereas lisp is about semantics. Don't even think about
syntax, think about meaning:

(<idea> <data>...)

That's it. Now you can focus on ideas instead of looks. (but yeah the added
bonus of self-similar representation is a pretty large one too, both in
expressiveness and in concepts reduction)

------
azeirah
There's a library for lisp called readable lisp. It has three features
([http://readable.sourceforge.net/](http://readable.sourceforge.net/))

1: Indentation instead of parentheses

    
    
      (defn [a b]
        (+ a b)
      )
    

Becomes

    
    
      defn add [a b]
        + a b
    

2: Function names before the parens instead of after

    
    
      (add 3 5)
    

Becomes

    
    
      add(3 5)
    

3: Infix operations

    
    
      (+ 3 5 6)
    

Becomes

    
    
      {3 + 5 + 6}
    

(Note the curly braces instead of parens)

These combined make Lisp code a whole lot more readable:

    
    
      (define (factorial n)
        (if (<= n 1)
          1
          (* n (factorial (- n 1)))))
    

Becomes

    
    
      define factorial(n)
        if {n <= 1}
          1
          {n * factorial{n - 1}}
    

To me, this made common-lisp python-like readable. Now to get it working in
Clojure

~~~
akkartik
Except it doesn't stop at 3. Then you have (copying from
[http://sourceforge.net/p/readable/wiki/Solution](http://sourceforge.net/p/readable/wiki/Solution)):

 _4\. A \\\ (aka SPLIT) starts a new line at the current indentation. If it 's
immediately after indentation (aka GROUP in that case), it represents no
symbol at all (at that indentation) - this is useful for lists of lists.

5\. A $ (aka SUBLIST) in the middle of list restarts list processing; the
right-hand-side (including its sub-blocks) is the last parameter of the left-
hand side (of just that line). If there's no left-hand-side, the right-hand-
side is put in a list. This useful abbreviation from Haskell simplifies many
common cases.

6\. A leading traditional abbreviation (quote, comma, backquote, or comma-at)
or datum comment "#;", at the beginning of a sweet-expression line, and
followed by space or tab or end-of-line, is that operator applied to the
following sweet-expression. Otherwise, it applies to the next neoteric-
expression.

7\. The markers “<∗” and “∗>” surround a collecting list, and MUST accept a
list of 0 or more un-indented sweet-expressions. The “<∗” and “∗>” represent
opening and closing parentheses, but restart indentation processing at the
beginning of the line. A collecting list is only terminated by its matching
“∗>” and not by a blank line. These are valuable for defining libraries and
for short let-style constructs.

8\. The marker “$$$” is reserved for future use._

Got all that? I sure didn't. I spent some time on the mailing list a couple of
years ago and grew frustrated with this gradual creep of operators, which led
to absurd code like this (from
[http://srfi.schemers.org/srfi-110/srfi-110.html](http://srfi.schemers.org/srfi-110/srfi-110.html)):

    
    
      let
        \\
          c $ cos a
          s $ sin a
        body...
    

I mean, would it hurt to add a couple of parens here and there?! Even _Java_
has a few of them.

(My competing proposal, which wasn't hampered by all the bending-over-
backwards to preserve backwards compatibility and get merged upstream that did
them in: [http://akkartik.name/post/wart](http://akkartik.name/post/wart). The
final straw that pushed me to find my own syntax was this example from SICP:
[http://arclanguage.org/item?id=16699](http://arclanguage.org/item?id=16699))

------
jessaustin
I'm going to have to reread this to really understand it, but it prompted me
to imagine the possibility of a simpler antidote to parentheses than described
here. If the tokenizer generated INDENT and DEDENT tokens as in python, then
the combination of those tokens and the spaces and parens that lisp already
has could be translated to other paren patterns and then interpreted as in
normal lisp. (Of course strictly speaking even parens are already translated
into hierarchical cons-pairs, but that just proves that this sort of
translation is not foreign to lisp.) Hmmm, something to think about.

~~~
akkartik
I did this last year:
[http://akkartik.name/post/wart](http://akkartik.name/post/wart)

------
btown
Perhaps I'm missing something: In Iteration 1, it says juxtaposition has
higher priority than any operator. In that case, how is the lambda declaration
(x y -> (...)) _not_ interpreted as a function call to the function x with the
argument y?

Or is the author simply saying that this will unambiguously create an AST of
((x y) -> (...)) regardless of what -> does?

------
kazagistar
Post it to Lambda the Ultimate if you want the serious language designers to
take a look at the ideas and make suggestions.

~~~
JadeNB
That's such a good idea that he did it 9 months ago: [http://lambda-the-
ultimate.org/node/4879](http://lambda-the-ultimate.org/node/4879).

------
sklogic
Why parser extensions are opposed to AST transforms? They work together well,
see an example implementation here:
[https://github.com/combinatorylogic/mbase](https://github.com/combinatorylogic/mbase)

------
jhallenworld
I've had very similar ideas in my language Ivy:
[https://github.com/jhallen/joes-
sandbox/tree/master/lang/ivy](https://github.com/jhallen/joes-
sandbox/tree/master/lang/ivy)

