Swift's Abstract Syntax Tree (ankit.im)
93 points by aciid on Feb 29, 2016 | 34 comments


After learning lisp, I just feel like every other language is a weird set of macros. Looking at this AST confirms that.

Wish I could live the rest of my life in Sexp-land.


Seeing this "dump AST" option for Swift is truly refreshing.

I've come to the opinion that when sending data in or out of software, it should always preserve as much structure as possible in an easily parseable form accessible to off-the-shelf parsers; e.g. s-expressions, JSON or, if it can't be avoided, XML. Even thrown-together debugging messages. Basically I should be able to reconstruct all of the structure in basically any programming language with something like one module import and a function call (or equivalent); I shouldn't have to define my own parsing rules, no matter how simple their author thinks they are.
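For instance, here is a minimal sketch of the "one module import and a function call" idea in Emacs Lisp, using the built-in json.el (the payload is made up):

    (require 'json)                                          ;; one module import
    (json-read-from-string "{\"name\": \"swift\", \"args\": [1, 2]}")
    ;; => ((name . "swift") (args . [1 2])), the structure back as plain data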

Not only this, but any format which, for some reason, cannot be represented in this way, e.g. programs in languages which consciously avoid s-expressions, should pass through a layer which does convert to a structured form, and this form should be obtainable, e.g. by setting an environment variable or a commandline flag. That's why I commend this "dump AST" flag.

It's basically the opposite of the many "Lisp-on-top-of-X" approaches. Instead of being able to write s-exprs which translate to some underlying language, like Swift, I should be able to translate any Swift into s-exprs to be manipulated. Why? Because there will always be strictly less code written in a Lispy language than there will be for the underlying platform as a whole. For example, there will always be less code written in Clojure than code written for the JVM.

The benefit of s-expressions (and accessible AST structures in general) is that they can be inspected, transformed, manipulated, pulled apart, recombined, etc. This is of some use when programming, e.g. having macros manipulate our Clojure ASTs, but not a huge amount; after all, we could write our code in a different way which didn't need the macros (we could even, in principle, write it in Java!).
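For concreteness, this is roughly what "macros manipulating the AST" amounts to; a sketch in Emacs Lisp, with `my-unless` as an invented example:

    ;; A macro is just a function from code (a list) to code (another list).
    (defmacro my-unless (condition &rest body)
      `(if ,condition nil ,@body))

    (macroexpand '(my-unless done (message "still working")))
    ;; => (if done nil (message "still working"))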

In contrast, being able to manipulate code which isn't ours is much more useful; since we don't have the option of writing it a different way (since we didn't write it at all!).


Interesting. I've been thinking about an alternate way of structuring programming tools that's along these lines. I've written about it here: http://westoncb.blogspot.com/2015/06/how-to-make-view-indepe...


Very interesting, especially tracing a path through the grammar; I saw "path" and was expecting something more unwieldy like an infinite trie of all valid programs.

Whilst I agree that the visual and on-disk representations of programs are too tightly-coupled, there are many structure-based IDE experiments out there, but few see much adoption due to the immense momentum of existing infrastructure, and the subtlety of integrating nicely with things like version control.

It's a less drastic change to make existing language compilers/interpreters more modular, so that we can use them for things like parsing, type-checking, optimisation, etc. in a "standalone" way, without having to reinvent everything from scratch each time we want to do something the implementors didn't consider.


Hey Chris, thanks for checking it out.

The way I see it, the fact that our programming tools operate on huge character sequences is a historical accident. Rather than coming up with a particular editor, I'm trying to think of what a more 'correct' structure for representing programs in general would be, as a replacement for character sequences.

Of course this is essentially suggesting a paradigm shift, and as you say (approximately), the current paradigm has lots of momentum.

It feels like this might be closer to describing planetary orbits with the more natural ellipses rather than circles—but maybe I'm just suggesting triangles or something :)


Also, probably the most subtle thing I do in there is change what it is that the grammar is actually describing, so that the 'grammar' I talk about paths through actually describes higher-level things than grammars ordinarily do.


Boy do I know how you feel. So many things that used to seem interesting and deep to me (often involving syntax or parsing or data formats) now feel like busywork and noise. Or like I'm visiting a land where everyone wears extra layers of heavy, stiff clothing and speaks in long circumlocutions. I feel claustrophobic and can't wait to get back to Sexp-land where I can breathe.

Some of it's just habit, of course, but the lightness and simplicity also seem objective to me. It becomes hard to understand why everyone else keeps jumping through all those pointless hoops, you get ever more impatient and irritable with it all, and one day you fall prey to full smugweeniness.


Of course, the sentiment in this reply could probably have been expressed with far less text and/or syntax (probably in less than a tweet even).

IMO people prefer to communicate with some nuance and some degree of redundant expression. Terseness is itself even a kind of nuance when you're allowed to not use it.

Drives for widespread purity in language never seem to pan out, in either human<->human communication or human<->computer communication. I suspect there are reasons for this that might be somewhat under-researched.


Well, let me see. Conversation isn't code, natural language isn't programming language, I don't think I care about "purity in language", and I'm pretty sure I can make a case for everything in my comment needing to be there.

I do agree that people often desire to make things more complicated than they need to.


Explain in more detail why code isn't conversation? You are communicating a set of expectations about what a computer should do to both the computer and other humans who either read it or interact with it.


What I said is that conversation isn't code. Not a commutative statement!

That is, you seemed to be arguing that there was redundancy in my comment, similar to what I called busywork and noise in code. An interesting point, but my comment was conversation—not code—so it struck me as a non sequitur.

Your comment hints at an insight into why we make code overcomplicated. But the superfluous complexity I'm talking about rarely strikes me as nuance, i.e. valuable enhancement for humans. It's mostly boilerplate to satisfy machines. It's no accident that "Programs must be written for people to read, and only incidentally for machines to execute" is a line that came out of the Lisp mentality.


Look then at human languages: two of the giants are English and Spanish, both pretty simple. They are certainly the most widely spoken second languages, and they would not hold that position if it were not for their simplicity compared to other languages.

Mandarin is an outlier in that comparison - but also not popular as a second language.


English and Spanish hold their positions as popular languages based on the fact that they were both spoken in world-spanning empires in the past, not because they are particularly simple.

English, in particular, is known for being difficult to learn: it uses phones that are very rare ("th" and the English "r"), it has a bizarrely large vocabulary, and, somewhat separately, its spelling is nonsensical.


You are correct about the cause of dominance, but Spanish is absolutely the simplest language in wide use.


Your opinion is just as valid. Can't say syntax is _simpler_ than s-expressions though. But so is ASM.


Yeah, wow, such complicated! Brings to mind the Rich Hickey quote "There's no syntax you will ever devise that will beat pure data".


Contrast with Bill Gosper's quote from Levy's Hackers: "Data is just a dumb kind of programming."


There are other interesting homoiconic languages, like Rebol (brackets instead of parens, and more whitespace).

I could add a stack-based language like Factor too.


It would be useful to add a command line tool with a linq/jquery-like syntax over that.

A few weeks ago I discovered the same feature (a bit hidden) in GCC:

    gcc -fdump-translation-unit=ast.txt file.c
It gives the AST in a flat map (parent/child with ids).


Well, since it is s-expressions (those paren things; it looks like lisp), it is also very similar to JSON objects or XML elements.

If you want to convert it to HTML, you could do so pretty easily. Use the car & cdr functions in lisp, and transform every car into a string like this `"<" + symbol + ">"`. Convert the closing paren to `"</" + symbol + ">"`. In the middle you put the result of cdr. https://en.wikipedia.org/wiki/CAR_and_CDR
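A rough sketch of that transformation in Emacs Lisp (`sexp->html` is a made-up helper, and the input is an invented mini-AST rather than real Swift output):

    (defun sexp->html (node)
      (if (not (consp node))
          (format "%s" node)                         ;; leaf: just print the atom
        (concat "<" (symbol-name (car node)) ">"     ;; car becomes the opening tag
                (mapconcat #'sexp->html (cdr node) "")
                "</" (symbol-name (car node)) ">"))) ;; ...and the closing tag

    (sexp->html '(func_decl (param_list x y) (body x)))
    ;; => "<func_decl><param_list>xy</param_list><body>x</body></func_decl>"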

Writing LISP is basically writing the AST without hiding it behind a textual syntax. The only textual syntax is the paren symbols that delimit the nodes in the syntax tree. It's a simpler way to live.


That looks like lisp?


You are right! Lisp is homoiconic, which basically means that the language mirrors the AST: https://en.wikipedia.org/wiki/Homoiconicity
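A quick Emacs Lisp illustration of what that means in practice:

    (setq form '(+ 1 2))   ;; the same parenthesised thing is...
    (car form)             ;; => +   ...data you can take apart,
    (eval form)            ;; => 3   ...and code you can run.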


Hmmm, I am learning a lisp now. I've been reluctant to dive in, but this makes me all the more willing.


Your remark made me curious, because I initially felt the same. Then I thought, well, I still find the original Swift much more readable, so isn't it like preferring to code in ASM because you've seen the compiled result?


It's simpler than it seems; how strange people make it out to be put me off for a long time.

There are cons cells. A cons cell is basically an untyped tuple. The syntax for a cons cell containing a number looks like this: (1 nil) (where nil is the zero byte (I think, please correct me if I'm wrong)).

You can nest them, like this: (1 (2 (3 nil))).

Because nobody can be bothered to type those parens, the part of the compiler/interpreter called the Reader compiles syntax like this

(1 2 3)

to this

(1 (2 (3 nil))).

After being parsed by "the reader" (you could also call it "parser"), those nested tuples/cons cells are sent to the eval function. The eval function takes the first element of the list (aka `car` in lisp-speak; my mnemonic is that `a` is on the left of the keyboard and therefore means the first element). That first element is regarded as the function, and the second element in the cons cell is the argument to the function.

The function to get the second item in the cons cell is `cdr`. I remember that by `d` being to the right on my keyboard. (car 1 2) => 1. (cdr 1 2) => 2. (cdr 1 (2 (3 nil))) => (2 (3 nil)). That is how things are nested.

I digress. The eval function evaluates every symbol (aka atom) in these nested cons cells, and looks up the corresponding meaning of them in a big table of defined symbols. So the + symbol might map to an addition function, and the string-join symbol might map to a function for joining strings.

What is returned from eval, you could say, is the true AST, which still looks pretty darn similar to the lisp syntax. The function is then sent to the apply function, which applies the argument to the function. Remember that the argument can be a nested set of cons cells, so it can nest infinitely.

That's it! I'm no lisp expert, so if someone is, please correct me. But that's the gist of it. Syntax is read by the read function, the symbols are interpreted (a lookup in a big key-value table) by the eval function, and these functions get apply-ed with their arguments. Very few primitives are needed to build a system from that. Here is a list of the primitives needed - these can be implemented in binary, C, assembly, Java, Javascript or whatever: http://stackoverflow.com/a/3482883.
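To make that concrete, here is a tiny Emacs Lisp sketch of the read -> eval -> apply round trip (the variable name is just for illustration):

    (setq form (read "(+ 1 2)"))    ;; the reader turns text into nested cons cells
    form                            ;; => (+ 1 2)
    (eval form)                     ;; => 3
    (apply (car form) (cdr form))   ;; => 3, i.e. look up `+' and apply it to (1 2)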

BONUS SYNTAX:

If you want eval to skip over a cons cell (and of course its children), you can prepend it with a `'` symbol. So (cdr 1 '(this is never going to be evaled, but is just a list)) => (this is never going to be evaled, but is just a list)

This is called a 'quote'. Any Javascript array basically works like a quoted list. If the first item in a Javascript array was a function, and you applied the cdr of that array as the argument to that function, it would be like using lisp with a non-quoted list.
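A couple of quoted-vs-unquoted examples, in Emacs Lisp:

    (+ 1 2)         ;; => 3, the list is evaluated as a function call
    '(+ 1 2)        ;; => (+ 1 2), the quote keeps it as plain data
    (car '(+ 1 2))  ;; => +, so you can pull the "function slot" out of the data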

BONUS VIDEO:

This is just lovely. I'm not American, and do not have a CS degree, so I feel like attending a place like this is a distant dream. But I'm happy to be able to enjoy it over the interwebs. Incredibly thankful to the giants of computing on whose shoulders we can stand: https://www.youtube.com/watch?v=2Op3QLzMgSY&list=PLF4E3E1B72...


BONUS FUN:

If you want to play around with lisp interactively, I recommend checking out the program Emacs. It's a little lisp interpreter written in C that comes with a text editor and such. Download it and run it. When you write some lisp code into the editor, place your cursor after the expression and hit Ctrl-x Ctrl-e. That will read, eval and apply the expression, and display the result at the bottom of the window.

Here is a screenshot of me doing that with this code (apply (quote +) (quote (1 2))) ;; http://i.imgur.com/SrhMxS8.png

Play around with that - it's fun enough for an evening.

Some other fun code to run is this

    (require 'cl)  ;; needed for `reduce'; newer Emacsen spell it `cl-reduce'
    (reduce '+ '(1 2 3 4))

    (defun say-hello (&optional name) ;; a wild lambda appeared!
      (if (stringp name)
          (concat "hello " name)
        "I'm a lonely program"))
    (say-hello "Martin")
    (say-hello)


A note on notation:

The standard notation for a cons cell with car "x" and cdr "y" is (x . y), not (x y). The latter is a list, equivalent to (x . (y . nil)).

So (1 2 3) is equivalent to (1 . (2 . (3 . nil))), not (1 (2 (3 nil))). The latter is a nested list, which is equivalent to (1 . ((2 . ((3 . (nil . nil)) . nil)) . nil))
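You can check that equivalence directly in a Lisp, e.g. Emacs Lisp:

    (equal '(1 2 3) '(1 . (2 . (3 . nil))))   ;; => t, the list syntax is just sugar
    '(1 . (2 . (3 . nil)))                    ;; printed back as (1 2 3)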


Good catch, thank you.

To anyone who wants to play around with lisp, I recommend downloading the emacs text editor. Just open it, navigate around with the arrow keys. Type an s-expression, and place the point after the ")". Then type Ctrl-x Ctrl-e, and the expression will be sent thru read -> eval -> apply. Fun to just play around with.

(+ 1 2)_ ;; _ means the cursor, ";" means a comment


It's because Lisp is basically an abstract syntax tree.


Programming in lisp is basically manipulating the ast.


Not just "basically", that actually is the so-called ‘Zen of Lisp’. With something like paredit, your editing isn't text editing so much as manipulating the AST in a very literal sense. Then you can write macros to do that for you, since programmers are lazy and Lisp programmers are particularly so.


No, it looks like some kind of s-expression version of the Swift AST. It could have been in XML format or something else. Lists seem to be popular for it - somehow.

Not everything which uses a notation similar to s-expressions has anything to do with Lisp.


True, but it could be turned into executable Lisp code without much effort. As could a XML format version of it.

Besides, square brackets are the most common representation of lists. It's difficult to find round brackets denoting lists outside of Lisp and its dialects. I doubt that's a coincidence.

Given that Apple already uses Scheme in its sandbox implementation [1], it wouldn't be too outlandish to assume those are related somehow. The notation is a bit off, but it's nothing a few macros wouldn't solve.

[1] http://lemonodor.com/archives/2007/10/sexprs_in_leopard.html


> True, but it could be turned into executable Lisp code without much effort

Unlikely.

> Given that Apple already uses Scheme in its sandbox implementation [1], it wouldn't be too outlandish to assume those are related somehow. The notation is a bit off, but it's nothing a few macros wouldn't solve.

They both use parentheses. They are related. Somehow.



