
Swift's Abstract Syntax Tree - aciid
http://ankit.im/swift/2016/02/29/swift-abstract-syntax-tree/
======
yjgyhj
After learning lisp, I just feel like every other language is a weird set of
macros. Looking at this AST confirms that.

Wish I could live the rest of my life in Sexp-land.

~~~
chriswarbo
Seeing this "dump AST" option for Swift is truly refreshing.

I've come to the opinion that when sending data in or out of software, it
should always preserve as much structure as possible in an easily parseable
form accessible to off-the-shelf parsers; e.g. s-expressions, JSON or, if it
can't be avoided, XML. Even thrown-together debugging messages. Basically I
should be able to reconstruct all of the structure in basically any
programming language with something like one module import and a function call
(or equivalent); I shouldn't have to define my own parsing rules, no matter
how simple their author thinks they are.
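
As a rough illustration of the "one module import and a function call" point, here is a minimal Python sketch. The debug message and its field names are invented for the example; only `json.loads` is a real standard-library call.

```python
import json

# Hypothetical debug message emitted by some tool; the field names are made up.
message = (
    '{"level": "warn", '
    '"source": {"file": "main.swift", "line": 12}, '
    '"tags": ["parser", "ast"]}'
)

# One import, one call: the nested structure is fully recovered.
record = json.loads(message)

print(record["source"]["line"])
print(record["tags"])
```

No hand-written parsing rules are involved; the same message could be read just as easily from most other languages.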

Not only this, but any format which, for some reason, cannot be represented in
this way, e.g. programs in languages which consciously avoid s-expressions,
should pass through a layer which _does_ convert to a structured form, and
this form should be obtainable, e.g. by setting an environment variable or a
commandline flag. That's why I commend this "dump AST" flag.

It's basically the opposite of the many "Lisp-on-top-of-X" approaches. Instead
of being able to write s-exprs which translate to some underlying language,
like Swift, I should be able to translate any Swift into s-exprs to be
manipulated. Why? Because there will always be strictly less code written in a
Lispy language than there will be for the underlying platform as a whole. For
example, there will always be less code written in Clojure than code written
for the JVM.

The benefit of s-expressions (and accessible AST structures in general) is
that they can be inspected, transformed, manipulated, pulled apart,
recombined, etc. This is of some use when programming, e.g. having macros
manipulate our Clojure ASTs, but not a huge amount; after all, we could write
our code in a different way which didn't need the macros (we could even, in
principle, write it in Java!).

In contrast, being able to manipulate code which _isn't_ ours is much more
useful, since we _don't_ have the option of writing it a different way (since
we didn't write it at all!).

~~~
westoncb
Interesting. I've been thinking about an alternate way of structuring
programming tools that's along these lines. I've written about it here:
[http://westoncb.blogspot.com/2015/06/how-to-make-view-indepe...](http://westoncb.blogspot.com/2015/06/how-to-make-view-independent-program.html)

~~~
chriswarbo
Very interesting, especially tracing a path through the grammar; I saw "path"
and was expecting something more unwieldy like an infinite trie of all valid
programs.

Whilst I agree that the visual and on-disk representations of programs are too
tightly-coupled, there are many structure-based IDE experiments out there, but
few see much adoption due to the immense momentum of existing infrastructure,
and the subtlety of integrating nicely with things like version control.

It's a less drastic change to make existing language compilers/interpreters
more modular, so that we can use them for things like parsing, type-checking,
optimisation, etc. in a "standalone" way, without having to reinvent
everything from scratch each time we want to do something the implementors
didn't consider.

~~~
westoncb
Hey Chris, thanks for checking it out.

The way I see it, the fact that our programming tools operate on huge
character sequences is historical accident. Rather than coming up with a
particular editor, I'm trying to think of what a more 'correct' structure for
representing programs in general would be, as a replacement for character
sequences.

Of course this is essentially suggesting a paradigm shift, and as you say
(approximately), the current paradigm has lots of momentum.

It feels like this might be closer to describing planetary orbits with the
more natural ellipses rather than circles—but maybe I'm just suggesting
triangles or something :)

------
vmorgulis
It would be useful to add a command line tool with a linq/jquery-like syntax
over that.

I discovered the same feature (a bit hidden) in GCC a few weeks ago:

    gcc -fdump-translation-unit=ast.txt file.c
    

It gives the AST in a flat map (parent/child with ids).
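
To make the "flat map (parent/child with ids)" idea concrete, here is a small Python sketch that rebuilds a tree from such a map and runs a simple query over it. The table layout and node kinds are assumptions for illustration only; this is not GCC's actual dump format.

```python
# Hypothetical flat dump: each node has an id, a kind, and the ids of its
# children. This layout is an assumption, not GCC's real output format.
flat = {
    1: {"kind": "function_decl", "children": [2, 3]},
    2: {"kind": "identifier", "children": []},
    3: {"kind": "return_stmt", "children": [4]},
    4: {"kind": "integer_cst", "children": []},
}


def build_tree(node_id, table):
    """Recursively replace child ids with nested (kind, children) tuples."""
    node = table[node_id]
    return (node["kind"], [build_tree(c, table) for c in node["children"]])


def find_all(tree, kind):
    """Tiny query helper: collect every subtree whose kind matches."""
    node_kind, children = tree
    found = [tree] if node_kind == kind else []
    for child in children:
        found.extend(find_all(child, kind))
    return found


tree = build_tree(1, flat)
print(find_all(tree, "integer_cst"))
```

A jquery-like tool would presumably offer richer selectors, but a recursive match on node kind is the core of it.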

~~~
yjgyhj
Well, since it is s-expressions (those paren things that look like lisp), it is
also very similar to JSON objects or XML elements.

If you want to convert it to HTML, you could do so pretty easily. Use the car &
cdr functions in lisp, and transform every car into a string like this: `"<" +
symbol + ">"`. Convert the closing paren to `"</" + symbol + ">"`. In the
middle you put the result of cdr.
[https://en.wikipedia.org/wiki/CAR_and_CDR](https://en.wikipedia.org/wiki/CAR_and_CDR)
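
The conversion described above can be sketched in Python, using nested lists as a stand-in for s-expressions (the first element plays the role of car, the rest of cdr). The function name `to_html` is made up for this example, and the sketch assumes non-empty lists and does no HTML escaping.

```python
def to_html(sexp):
    """Turn a nested list like ["ul", ["li", "a"]] into "<ul><li>a</li></ul>"."""
    if not isinstance(sexp, list):
        return str(sexp)  # an atom: emit it as text
    head, rest = sexp[0], sexp[1:]  # head ~ car, rest ~ cdr
    inner = "".join(to_html(child) for child in rest)
    return "<" + head + ">" + inner + "</" + head + ">"


page = to_html(["ul", ["li", "one"], ["li", "two"]])
print(page)  # prints <ul><li>one</li><li>two</li></ul>
```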

Writing LISP is basically writing the AST without hiding it behind a textual
syntax. The only textual syntax is the paren symbols that delimit the nodes in
the syntax tree. It's a simpler way to live.

------
Kinnard
That looks like lisp?

~~~
chumich1
You are right! Lisp is homoiconic, which basically means that the language
mirrors the AST:
[https://en.wikipedia.org/wiki/Homoiconicity](https://en.wikipedia.org/wiki/Homoiconicity)

~~~
Kinnard
Hmmm, I am learning a lisp now. I've been reluctant to dive in, but this makes
me all the more willing.

~~~
yjgyhj
It's simpler than it seems; how strange people make it out to be put me off
for a long time.

There are cons cells. A cons cell is basically an untyped tuple. The syntax
for a cons cell containing a number looks like this: (1 nil) (where nil is the
zero byte (I think, please correct me if I'm wrong)).

You can nest them, like this: (1 (2 (3 nil))).

Because nobody can be bothered to type those parens, the part of the
compiler/interpreter called the Reader compiles syntax like this

(1 2 3)

to this

(1 (2 (3 nil))).

After being parsed by "the reader" (you could also call it "parser"), those
nested tuples/cons cells are sent to the eval function. The eval function takes
the first element of the list (aka `car` in lisp-speak; my mnemonic is that `a`
is on the left of the keyboard, and therefore means the first element). That
first element is regarded as the function, and the second element in the cons
cell is the argument to the function.

The function to get the second item in the cons cell is `cdr`. I remember that
by `d` being on the right of my keyboard. (car 1 2) => 1. (cdr 1 2) =>
2. (cdr 1 (2 (3 nil))) => (2 (3 nil)). That is how things are nested.

I digress. The eval function evaluates every symbol (aka atom) in these nested
cons cells, and looks up the corresponding meaning of them in a big table of
defined symbols. So the + symbol might match an addition function, and the
string-join symbol might match a function for joining strings.

What is returned from eval you could say is the true AST, which still looks
pretty darn similar to the lisp syntax. The function is then sent to the apply
function, which applies the function to the argument. Remember that the
argument can be a nested set of cons cells, so it can nest infinitely.

That's it! I'm no lisp expert, so if someone is, please correct me. But that's
the gist of it. Syntax is read by the read function, the symbols are
interpreted (looked up in a big key-value object) by the eval function, and
these functions get apply-ed to their arguments. Very few primitives are needed
to build a system from that. Here is a list of the primitives needed - these
can be implemented in binary, C, assembly, Java, Javascript or whatever:
[http://stackoverflow.com/a/3482883](http://stackoverflow.com/a/3482883).
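
The read -> eval -> apply pipeline can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real Lisp is implemented: it uses Python lists instead of cons cells, supports only integers and two symbols, and assumes balanced parentheses. All function names here are chosen for the example.

```python
import operator


def tokenize(text):
    """Split source text into parens and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Reader: consume tokens from the front and build nested Python lists."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is treated as a symbol


# The "big table of defined symbols"; only two entries in this sketch.
ENV = {"+": operator.add, "*": operator.mul}


def evaluate(expr):
    """Eval: look up symbols, evaluate sub-expressions, then apply."""
    if isinstance(expr, str):
        return ENV[expr]
    if isinstance(expr, int):
        return expr
    function = evaluate(expr[0])
    arguments = [evaluate(arg) for arg in expr[1:]]
    return apply(function, arguments)


def apply(function, arguments):
    """Apply: call the function on the already-evaluated arguments."""
    return function(*arguments)


result = evaluate(read(tokenize("(+ 1 (* 2 3))")))
print(result)  # prints 7
```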

BONUS SYNTAX:

If you want eval to skip over a cons cell (and of course its children), you
can prepend it with a `'` symbol. So (cdr 1 '(this is never going to be
evaled, but is just a list)) => (this is never going to be evaled, but is just
a list)

This is called a 'quote'. Any Javascript array basically works like a
quoted list. If the first item in a Javascript array was a function, and you
applied that function to the cdr of the array as its arguments, it would be
like using lisp with an unquoted list.
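
The same analogy works with a Python list, shown here as a rough sketch: the list is inert data until its head is explicitly applied to the rest. The variable names are chosen for the example.

```python
# An unevaluated ("quoted") expression is just data: a list whose head
# happens to be a function.
expr = [max, 3, 9, 4]

# Nothing has been called yet; the list can be inspected or rewritten freely.
head, rest = expr[0], expr[1:]

# Evaluating the "unquoted" form means applying the head to the rest.
value = head(*rest)
print(value)  # prints 9
```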

BONUS VIDEO:

This is just lovely. I'm not American, and do not have a CS degree, so I feel
like attending a place like this is a distant dream. But I'm happy to be able
to enjoy it over the interwebs. Incredibly thankful to the giants of computing
on whose shoulders we can stand:
[https://www.youtube.com/watch?v=2Op3QLzMgSY&list=PLF4E3E1B72...](https://www.youtube.com/watch?v=2Op3QLzMgSY&list=PLF4E3E1B72A58B492)

~~~
dragonwriter
A note on notation:

The standard notation for a cons cell with car "x" and cdr "y" is (x . y), not
(x y). The latter is a list, equivalent to (x . (y . nil)).

So (1 2 3) is equivalent to (1 . (2 . (3 . nil))), _not_ (1 (2 (3 nil))). The
latter is a nested _list_, which is equivalent to (1 . ((2 . ((3 . (nil .
nil)) . nil)) . nil))
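
The difference between the two notations can be checked with a small Python model, where a cons cell is a 2-tuple and `None` stands in for nil. `cons` and `make_list` are helper names invented for this sketch.

```python
NIL = None  # stand-in for nil in this sketch


def cons(car, cdr):
    """A cons cell modelled as a 2-tuple: (car . cdr)."""
    return (car, cdr)


def make_list(*items):
    """Build the cons chain that list notation (a b c) abbreviates."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


# (1 2 3) is (1 . (2 . (3 . nil)))
flat = make_list(1, 2, 3)

# (1 (2 (3 nil))) is a list whose second element is itself a list
nested = make_list(1, make_list(2, make_list(3, NIL)))

print(flat)
print(nested)
```

Printing both values shows that the nested list carries extra cells, one terminating nil per level, which the flat chain does not have.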

~~~
yjgyhj
Good catch, thank you.

To anyone who wants to play around with lisp, I recommend downloading the
emacs text editor. Just open it, navigate around with the arrow keys. Type an
s-expression, and place the point after the ")". Then type Ctrl-x Ctrl-e, and
the expression will be sent thru read -> eval -> apply. Fun to just play
around with.

(+ 1 2)_ ;; _ means the cursor, ";" means a comment

