
META II: A Syntax-Oriented Compiler Writing Language (1964) [pdf] - espeed
http://www.ibm-1401.info/Meta-II-schorre.pdf
======
espeed
This is one of the papers referenced by Alan Kay in the talk posted to HN one
day ago:

Joe Armstrong Interviews Alan Kay [video]
([https://www.youtube.com/watch?v=fhOHn9TClXY](https://www.youtube.com/watch?v=fhOHn9TClXY))

Discussion:
[https://news.ycombinator.com/item?id=13033299](https://news.ycombinator.com/item?id=13033299)

------
adamnemecek
This seems very similar to OMeta
[https://en.wikipedia.org/wiki/OMeta](https://en.wikipedia.org/wiki/OMeta),
which has been ported to a lot of languages:
[http://www.tinlizzie.org/ometa/](http://www.tinlizzie.org/ometa/)

OMeta really changed my view on how to prototype languages. TL;DR, OMeta lets
you map patterns in your language to patterns in another language (say python)
and thereby, you can implement a whole language roughly as easily as defining
a BNF.
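As a hedged sketch of that idea (this is not OMeta's actual notation, just the shape of it): rules that recognize a pattern in the source language and emit the corresponding pattern in the target language. Here a toy prefix-arithmetic language is translated straight into Python expressions:

```python
# A minimal sketch of pattern-to-pattern translation (not OMeta's syntax):
# each rule consumes a source pattern and emits the matching Python pattern.
def translate(tokens):
    """Consume one prefix expression, return equivalent Python source."""
    tok = tokens.pop(0)
    if tok == 'add':
        return '(%s + %s)' % (translate(tokens), translate(tokens))
    if tok == 'mul':
        return '(%s * %s)' % (translate(tokens), translate(tokens))
    return tok  # number literals pass through unchanged

py = translate('add 1 mul 2 3'.split())
print(py)        # (1 + (2 * 3))
print(eval(py))  # 7
```

Real OMeta adds backtracking, parameterized rules, and pattern matching over structured data, but the core move is the same: the grammar and the code generator are one artifact.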

~~~
chubot
I looked into OMeta a while back, which sounded cool although somewhat
overblown, and wrote my own toy backtracking PEG implementation (somewhat
based on LPeg), which isn't too difficult. I read most of the VPRI papers
because I'm interested in compact software and DSLs.

But then I had occasion to implement a "real" language, and I just went with a
hand-coded parser instead, despite a desire to avoid that.

I think these kinds of tools are good for prototyping toy languages, as you
say, but unfortunately they're still TWO steps away from production-quality
implementations of real languages. They're not good even for prototyping
"real" languages (think shell, C++, Perl, Make), or for production-quality
implementations of "toy" languages.

Possible reasons for this:

1) Developer experience... you end up having to reason about control flow and
performance for "real" languages, breaking out of the declarative abstraction.

2) Most parsing tools are frameworks rather than libraries -- they kind of
force you into a single lexer -> parser -> tree architecture (ANTLR being a
prime example, re2c being a great counterexample for lexing). But this
architecture doesn't work for many languages.

3) They make other unwarranted assumptions, e.g. about whitespace, which
Bourne shell violates.

There are a bunch of other reasons I can list if anyone is interested. A
good research topic would be bridging this gap.

I wrote a very complete parser for shell in Python and sort of "discovered"
what the language is... I would love to now port it to a meta-language and
generate C/C++ code, rather than port it to C++ by hand.

[http://www.oilshell.org/blog/2016/11/17.html](http://www.oilshell.org/blog/2016/11/17.html)

But I think it's impossible. I was thinking of putting out a "meta-language
bounty" to help me solve this problem.

I think what's more likely is that I will end up writing a custom code
generator to port my subset of Python to C++. But I would love to be proven
wrong. The Python parser is 3500 lines and the AST is 2000 lines. I would like
to describe this with 200-500 lines of code in a meta-language (factor of 10
reduction). Although I think some things like good error messages are sort of
"irreducible" so maybe that's impossible.

Basically the goal is: describe bash with a meta-language. I have described it
in ~5K lines of Python, which is a pretty good compression over the actual
bash source code, where the parser is between 10K and 20K lines, spread out
over many files and many stages.

I wonder what Alan Kay would say about bash. I think he would probably say
"don't use such a complex tool". But I think that's just an indication of the
gap between research and practice. More people have used bash than any of his
languages, and it consumes more CPU cycles worldwide as well. Although his
ideas were influential, they had to be modified to actually work in practice.

~~~
dangom
Very interesting blog. Trying not to go too much off-topic here, I read in one
of your posts that "Awk and shell are similar in an interesting way: they lack
garbage collection. This leads to arrays that can neither be nested nor
returned from functions." Would you be willing to explain why that is the
case? I'm a CS enthusiast without any formal CS education.

~~~
chubot
Another way to say it is that bash and awk have no references, and hence no
aliasing (and no possibility of cycles). Consider:

    d = [1, 2, 3]
    e = {'key1': d}
Now you have two names for the array [1,2,3] -- d and e['key1']. They are
aliases.
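The aliasing is directly observable in Python (a quick sketch): a mutation through one name shows up under the other.

```python
# Two names, one object: d and e['key1'] are aliases.
d = [1, 2, 3]
e = {'key1': d}
d.append(4)            # mutate through one name...
print(e['key1'])       # [1, 2, 3, 4] -- ...visible through the other
print(e['key1'] is d)  # True
```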

Now consider something like this:

    def f():
        d = [1, 2, 3]
        g(d)       # The interpreter has no idea what this does, without runtime inspection
        e = {'key1': d}
        return e

How do you figure out how to free memory? There's no way to do it without
garbage collection -- an algorithm that runs alongside your program and tracks
which objects are still reachable. For contrast, Rust eliminates the runtime
bookkeeping with compile-time annotations like borrowing.

So basically if you have references and aliasing, and you can move them across
function boundaries, then you must have garbage collection if you want to free
memory automatically. (Function boundaries are important, because they use the
idea of a stack. A stack is a (limited) way to automatically manage memory.)
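The escape across a function boundary is easy to see in a language that does have GC; here is a Python sketch of the same shape (CPython happens to free the array later via reference counting plus a cycle detector):

```python
# Python version of the pseudocode above: the array built inside f()
# escapes through a reference, so it must outlive f's stack frame.
# A stack discipline alone can't free it; the garbage collector does.
def g(x):
    x.append(4)   # the caller can't know this without inspecting g

def f():
    d = [1, 2, 3]
    g(d)
    e = {'key1': d}
    return e

e = f()
print(e['key1'])  # [1, 2, 3, 4] -- still alive after f() returned
```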

But bash and awk both get around this by not having references. This
eliminates the possibility of nested compound data structures, or returning
values by reference. But in return you don't have to implement a garbage
collector. Compound data structures are flat and confined within a stack
frame.

(You could still copy the whole array to another function's stack frame if you
wanted, but they don't do that.)

So you can't express that program in either bash or awk. They have "flat" hash
tables and arrays, where you can only COPY primitive values like ints and
strings into them. You can't reference other arrays like {key1: d}.
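A sketch of what that "flat" model enforces, in Python as a stand-in for the semantics described (the `flat_insert` helper is hypothetical, not anything in bash or awk):

```python
# Hypothetical sketch of the "flat" model: tables may hold only
# primitive values, so there's never a reference to another array.
def flat_insert(table, key, value):
    if not isinstance(value, (int, float, str)):
        raise TypeError('flat tables hold only primitive values')
    table[key] = value

t = {}
flat_insert(t, 'n', 42)        # fine: primitives are stored by value
d = [1, 2, 3]
try:
    flat_insert(t, 'key1', d)  # rejected: would create an alias
except TypeError as err:
    print(err)                 # flat tables hold only primitive values
```

With no way to store a reference, every value's lifetime is bounded by the table (or stack frame) that holds it, and no collector is needed.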

Hopefully that helps. I hacked on awk a bit here [1] and that is when I
noticed the limitations in the language. I thought it was sort of like
JavaScript -- regexes and hash tables. But it's actually much more limited,
due to its lack of garbage collection.

[1] [https://github.com/andychu/bwk](https://github.com/andychu/bwk)

------
pierre_d528
Work is still being done in that direction see:

[https://github.com/harc/ohm](https://github.com/harc/ohm)

~~~
andreaorru
That's very cool. Are you aware of similar tools written in languages other
than JavaScript?

~~~
abecedarius
Hi Andrea --
[https://github.com/darius/parson](https://github.com/darius/parson) is in
Python. It's probably not very similar because I haven't got around to
studying Ohm, but it shares the basic goal of not writing semantic actions
inline in some particular language. (I wrote it in my first Recurse Center
stint back in 2012.)

~~~
andreaorru
Hi Darius (it's a small world).

That looks really good, I'll have a look at it!

------
walterbell
Previous threads on meta-compilers, including a Javascript-based workshop/demo
of techniques:

[https://news.ycombinator.com/item?id=8297996](https://news.ycombinator.com/item?id=8297996)

[https://news.ycombinator.com/item?id=8441041](https://news.ycombinator.com/item?id=8441041)

