
Grammars for Programming Languages (2018) - bo0tzz
https://medium.com/@mikhail.barash.mikbar/grammars-for-programming-languages-fae3a72a22c6
======
Joker_vD
I've manually written more recursive-descent parsers than I'm comfortable
admitting (hey, they're fun to write), and most of the "beyond the
context-free!" approaches described in this article seem to me to fall into
two categories:

1\. There is this one common pattern in recursive-descent parsers that allows
you to write more concise recognizing code — let's make it directly
expressible! Ordering rules, "not an ending quote"/"not an end of a comment",
"identifier that's not a keyword" fall into this category.
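The "identifier that's not a keyword" pattern, for instance, is a one-line check in a hand-written recursive-descent parser. A minimal Python sketch (the function and token shapes are illustrative, not from the article):

```python
KEYWORDS = {"if", "while", "return"}

def parse_identifier(tokens, pos):
    """Accept the token at `pos` only if it is identifier-shaped AND not a keyword."""
    tok = tokens[pos]
    if tok.isidentifier() and tok not in KEYWORDS:
        return tok, pos + 1   # matched: return the name and advance
    return None, pos          # no match: let the caller try another alternative

# "if" is rejected even though it is lexically identifier-shaped:
print(parse_identifier(["if", "x"], 0))   # (None, 0)
print(parse_identifier(["x", "="], 0))    # ('x', 1)
```

The "not directly expressible in a CFG" part is just the negative condition `tok not in KEYWORDS`, which is trivial as code.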

2\. Semantic actions work really well in practice, but they're gauche. Let's
instead encode some restricted, non-Turing-complete, but still useful PL
directly in the grammar; that'll improve things somehow! Detecting "undeclared
identifier used" and "identifier redeclaration" falls into this category: this
"quite knotty" Boolean grammar has a dict-like structure threaded through it —
isn't that amazing? And this grammar-with-context has built-in support for
scoped environments, so you don't need your symbol tables anymore (unless you
have user-defined types, presumably)!
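For comparison, the dict-like structure such a grammar threads through its rules is, in a parser with semantic actions, literally a dict or set. A hypothetical sketch of the two checks mentioned above, over an already-parsed statement list:

```python
def check_declarations(stmts):
    """Walk ('decl', name) / ('use', name) statements, threading a plain
    set through the pass -- the 'environment' a context grammar encodes."""
    declared = set()
    errors = []
    for kind, name in stmts:
        if kind == "decl":
            if name in declared:
                errors.append(f"identifier redeclaration: {name}")
            declared.add(name)
        elif kind == "use":
            if name not in declared:
                errors.append(f"undeclared identifier used: {name}")
    return errors

print(check_declarations([("decl", "x"), ("use", "x"),
                          ("use", "y"), ("decl", "x")]))
# ['undeclared identifier used: y', 'identifier redeclaration: x']
```

Ten lines of ordinary code versus a "quite knotty" grammar — which is the point being made.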

Of course you can parse the input according to any of these more expressive
grammars in the same time and space as with a boring old CFG with rule
priorities and semantic actions: because that's essentially what they are,
just expressed differently. And is there much gained by expressing it
differently? It's the same story as with DSLs, really: you keep making them
more and more complex and feature-rich up until the point you arrive at a
full-fledged programming language with awkward semantics and realize that you
probably should have stayed with a limited DSL that could invoke arbitrary
actions written in a proper general-purpose programming language.

So, yes, "Devising an adequate and convenient formalism on top of grammars
with contexts is a challenging task", and it's a task that must be solved
before any of these approaches will be more useful than what we have today.

------
carapace
See also Definite Clause Grammars (DCG):

[https://en.wikipedia.org/wiki/Definite_clause_grammar](https://en.wikipedia.org/wiki/Definite_clause_grammar)

[https://www.metalevel.at/prolog/dcg](https://www.metalevel.at/prolog/dcg)

------
drallison
There is a tension between the language (grammar) used to describe a language
and the algorithm that is used to parse the language. In practice, a Pratt
Parser
([https://en.wikipedia.org/wiki/Pratt_parser](https://en.wikipedia.org/wiki/Pratt_parser))
is often a good choice since it has the simplicity of a recursive descent
parser but avoids the recursive plunge for objects like expressions.
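The "avoids the recursive plunge" point is that a Pratt parser turns the usual tower of per-precedence-level grammar rules into a single loop driven by a binding-power table. A minimal sketch (names and token handling are mine, not from the linked article):

```python
# Minimal Pratt-style expression parser: precedence is data (a table),
# not a stack of mutually recursive grammar rules.
PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse_expr(tokens, pos=0, min_bp=0):
    """Parse tokens[pos:] into a nested tuple AST, consuming operators
    whose binding power is at least min_bp."""
    lhs = tokens[pos]   # assume a bare number/identifier on the left
    pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        bp = PRECEDENCE.get(op)
        if bp is None or bp < min_bp:
            break
        # bp + 1 makes same-precedence operators left-associative
        rhs, pos = parse_expr(tokens, pos + 1, bp + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos

ast, _ = parse_expr("1 + 2 * 3 - 4".split())
print(ast)   # ('-', ('+', '1', ('*', '2', '3')), '4')
```

Adding a new operator is one table entry; a classic recursive-descent grammar would need a new rule per precedence level.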

------
snidane
Controversial opinion but I believe that eventually these grammars and complex
parsers will be found to have been a huge mistake for computer science. A
mistake made for the sole reason of making programming languages resemble
natural languages, even though they are not meant to be read fluently.

Everyone who has experienced the magic of Lisp understands how beneficial it
is to have the textual representation as close as possible to the parsed
abstract syntax tree. One can create new languages fit for the purpose of a
given class of problems and thus reduce the size of a codebase 10 to 100 fold
(or even a million times in the case of modern multi-million-LOC software
projects).

The biggest mistake Lisp ever made was the unintuitive parenthesized prefix
notation, which could be thrown away by making all operations strictly infix.

I.e., instead of doing combinations of the following:

    
      lisp:     (plus 1 2)
      methods:  1.plus(2)
      algol, C: plus(1, 2)
      haskell:  plus 1 2
    

One can use the simplest form ever and achieve an even simpler Lisp:

    
    
      1 + 2
    

but use it consistently for every function call or operator within the
language (even for functions of multiple arguments).
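One way to read this proposal: every operation is binary and expressions chain strictly left to right, with no precedence at all. A toy evaluator (all operator names are hypothetical) shows both the uniformity and the consequence:

```python
# Toy "strictly infix, no precedence" evaluator: every operator or
# function is binary, and chains associate left to right, so
# "1 plus 2 times 3" means ((1 plus 2) times 3), NOT 1 + (2 * 3).
OPS = {
    "plus": lambda a, b: a + b,
    "times": lambda a, b: a * b,
    "max": max,   # an ordinary function, called with the same infix syntax
}

def eval_infix(tokens):
    value = int(tokens[0])
    for i in range(1, len(tokens), 2):
        op, arg = tokens[i], int(tokens[i + 1])
        value = OPS[op](value, arg)
    return value

print(eval_infix("1 plus 2 times 3".split()))   # 9, strictly left to right
print(eval_infix("1 max 5 plus 2".split()))     # 7
```

Functions of more than two arguments would have to be handled by chaining partial results through successive binary applications — essentially currying spelled infix.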

APL came close, but for some reason decided to ignore the important Phoenician
discoveries of 1. reading from left to right and 2. using a phonetic alphabet
instead of hieroglyphs.

~~~
AnimalMuppet
> One can create new languages fit for the purpose of a given class of
> problems and thus reducing size of codebase 10 to 100 fold (or even million
> times in case of modern multi million LOC software projects).

I find that a bit hard to believe. Here's a 100 million line codebase; I
seriously doubt that you can express it in 100 lines in _any_ language. Lisp
may be good, but it isn't _that_ good.

~~~
chubot
Yeah, to be honest, one of the things that made me skeptical of the code
compression / productivity claim is looking at the implementations of Chez
Scheme and Racket (after also looking at 20+ compilers / interpreters, and
working on a language for a few years).

I'm pointing to them as very long-lived and valuable codebases written in Lisp
dialects. Chez Scheme is a 35-year-old codebase, and Racket is also decades
old.

So I'm not saying there's anything wrong with them, but I am saying that it
doesn't appear that they're 10x easier to understand or modify than LLVM or
CPython (Chez being a compiler and Racket being an interpreter as far as I
remember). Or that you can get a 10x better result.

Basically for the claim to be true, why can't you write something like Racket
in Racket 10x faster? Like 3 years instead of 30 years. And why doesn't it do
way better things than CPython or Ruby? Those claims might be "slightly" true
depending on who you are, but they're not devastatingly true. There's been
more than enough time to evaluate the claims empirically.

In other words they would have already proven themselves in the market if that
were the case. You would have extraordinarily productive teams using these
languages -- along the lines of what PG hypothesized 15+ years ago.

[http://www.paulgraham.com/avg.html](http://www.paulgraham.com/avg.html)

In fact, the thing I found interesting is that at the core of Racket is a
big pile of C code, just like CPython. A year or two ago I watched a talk about
them self-hosting more and moving to Chez Scheme's backend, but I don't
recall the details now.

[https://github.com/cisco/ChezScheme](https://github.com/cisco/ChezScheme)

[https://en.wikipedia.org/wiki/Chez_Scheme](https://en.wikipedia.org/wiki/Chez_Scheme)

[https://github.com/racket/racket/tree/master/racket/src/rack...](https://github.com/racket/racket/tree/master/racket/src/racket)

(FWIW I also looked at and hacked on femtolisp around the same time, since I
was impressed by how Julia uses it.)

correction: it looks like Racket has a JIT too, written in C. Still, the same
point applies: it's not magic and looks a lot like similar codebases in C.
Chez is more self-hosted AFAIR, but it's also hundreds of thousands of lines
of code.

~~~
carapace
> In other words they would have already proven themselves in the market if
> that were the case. You would have extraordinarily productive teams using
> these languages -- along the lines of what PG hypothesized 15+ years ago.

There's K and Kdb, et al. (I think they're on to "Q" now.)

~~~
vidarh
I'd argue very little of what makes K and Kdb special has to do with the
language per se, but with the focus on array operations, and with the
particular runtime environment of Kdb (heavily focused on distributed services
that log (and can replay) operations against a columnar database with most
operations in-memory).

Q to me is a tacit admission that the obtuse syntax of K is not necessary to
get the benefits of K, and that K presents too steep a learning curve for
most.

They're fascinating, and I think vastly underrated, but they're hard to grasp
even without the syntax hurdle...

~~~
anonu
Kdb benefits from a combo of things:

1\. The terse language is actually quite expressive and very well suited to
vector programming, so less code is more. It also gets you thinking in a
particular way...

2\. The database, a mix of transactional, in memory, and fast analytics
(aggregations, joins..) concepts

3\. The runtime framework: Kdb is an ideal building block for a network
distributed system. Opening sockets and connecting to other Kdb instances is
simple.

~~~
vidarh
2/3 I buy, but for 1 I think you need to decouple the syntax from the
underlying array-programming model. K-level terseness and the semantic
structure that makes K well suited to vector programming are pretty much
entirely orthogonal.

You can implement the core of K's array manipulation in other languages and
end up with similarly terse code if that's what you want. But you can also
produce something that will be far more readable to most people without losing
the expressiveness, e.g. by naming some of the operations.
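As a sketch of that decoupling: two K primitives — scan-sum (`+\`) and monadic where (`&`) — keep their array semantics when spelled out with ordinary names (pure-Python illustration, the function names are mine):

```python
def scan_sum(xs):
    """K's +\\ over a vector: the running sum, spelled out."""
    total, out = 0, []
    for x in xs:
        total += x
        out.append(total)
    return out

def where(flags):
    """K's monadic & on a boolean vector: indices of the true entries."""
    return [i for i, f in enumerate(flags) if f]

print(scan_sum([1, 2, 3, 4]))    # [1, 3, 6, 10]
print(where([0, 1, 0, 1, 1]))    # [1, 3, 4]
```

The semantics (whole-array operations composed into pipelines) survive intact; only the single-glyph terseness is traded for readability.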

~~~
carapace
> I'd argue very little of what makes K and Kdb special has to do with the
> language per se, but with the focus on array operations, and with the
> particular runtime environment of Kdb

I'm also not convinced that K as a language is essential to kdb as an
application but it's undeniable that they have "proven themselves in the
market", eh? It's a niche, but a well-paid one, and they don't have any
competitors (with e.g. easier syntax.)

Are you familiar with arcfide's Co-dfns compiler?
[https://github.com/Co-dfns/Co-dfns](https://github.com/Co-dfns/Co-dfns)
I ask because it might serve as a counter-example of a "killer app" that
relies on the concision of the syntax.

> I think you need to decouple the syntax from the underlying array-
> programming model. K-level terseness and the semantic structure that makes K
> well suited to vector programming are pretty much entirely orthogonal.

FWIW, I think so too. You could say that things like Numpy and Pandas are
moving towards integrated array processing, eh?

~~~
vidarh
They have proven themselves in the market, but it's impossible to decouple the
language-agnostic benefits of Kdb from K's syntax in that respect. That said,
they have plenty of competitors - most of them are just run-of-the-mill
languages.

