Hacker News new | past | comments | ask | show | jobs | submit login

One does not write 'directly in AST' in Lisp.

Lisp is written in nested lists of prefix expressions.

Instead of

  foo(bar(1,2));
one writes

  (foo (bar 1 2))
Instead of

  function foo(bar) {
    bar.baz = "hello";
  }
one writes

  (defun foo (bar)
    (setf (slot-value bar 'baz) "hello"))
Instead of

  function(a) {
    a + 1;
  }
one writes

  (lambda (a)
    (+ a 1))
Basically it's writing hierarchical lists of tokens in a externalized data structure: symbolic expressions.

An AST would represent more information. For example a s-expression representation of an AST might look like:

  (anonymous-function-definition
    :argument-list ((variable :name a))
    :body          (function-application (function :name +)
                               :arguments ((variable :name a)
                                           (constant :value 1))))
It would actually represent from parsing what the meaning of the tokens is (function, variable, constant, ...) and what the syntactic constructs are: function definition, function application, argument list, ... -> an AST is the result of a syntax analysis.

see: https://en.wikipedia.org/wiki/Abstract_syntax_tree

The Lisp source representation lacks this information and needs to be parsed/interpreted to construct this information. The Lisp reader does not do a syntactic analysis of a program - it does not even know the difference between program and data. On the Lisp source level a program is just data and (+ a 1), (a + 1) and (a 1 +) are all valid expressions - but only the first is a valid Lisp expression.




> The Lisp source representation lacks this information and needs to be parsed/interpreted to construct this information.

That information doesn't have to go anywhere into the tree though; a compiler can generate code directly from a macro-expanded top-level expression without constructing any class-based concoctions involving inheritance.

The parsed Lisp data structure meets the textbook definition of AST in every regard, particularly after macro-expansion. All the nodes that are compound forms are clearly labeled and classified by symbols which denote the special operators. Recursive walks of the tree can readily branch to cases according to these symbols.

It's just a smarter version of the diagram you see in that Wikipedia page's top diagram.


> That information doesn't have to go anywhere into the tree though; a compiler can generate code directly from a macro-expanded top-level expression without constructing any class-based concoctions involving inheritance.

Sure, but then it does not use an Abstract Syntax Tree.

> Recursive walks of the tree

That's the parsing part. You can only macroexpand and walk the tree with knowledge of the Lisp language: special forms, calling forms, arg lists, environments, ... etc.

> All the nodes that are compound forms are clearly labeled and classified by symbols which denote the special operators.

  (let ((a a)) (flet ((a (a) a))
    (a a))
Nothing is labelled: without walking the tree based on the Lisp syntax you won't know what the 'a' symbols are - there is zero syntactic information represented. In the s-expression, they are symbols and not elements in a syntax tree.


Note that "nothing is labeled" in the node of an AST that is written in C or Pascal; fields of a record or structure are accessed by position. The fields have compile-time names only. I don't see anything in the Wikipedia article about things having to be labeled.

If we obtain a pointer into the middle of an AST node without any context, we also don't know anything about it; we have to work with a correctly aimed pointer to the base of the object.

There is no reason anything in the compiler or interpreter would obtain a pointer to just the (a a), and treat it as a form, other than as some sort of severe bug. The (let ...) thing is an object, and has to be respected as such.

Basically your argument boils down to "lists are not proper polymorphic tree node data structures, even with discriminating type symbol in the car; and an AST must be made out of structured data with named fields."


> Basically your argument boils down to

No, my argument boils down to that an Abstract Syntax Tree must represent Abstract Syntax. A stream/list/tree of tokens isn't a tree of syntax elements.

In Lisp (+ a b) and (a b +) are valid token lists and we have no idea what these tokens are other than symbols. In an abstract syntax tree we would represent syntactic categories like operation, variable, constant, function call, etc. and it would be the result of a parse of a particular syntax.

The Lisp reader is not a Lisp parser. It doesn't know if (+ a b), ((a) (+) (b)), (a (b) +) or (a b +) are valid Lisp constructs.

The Lisp reader is an S-Expression parser. Its input is a s-expression and the syntax of s-expressions. Thus the result of its parse can't be an abstract syntax tree for a Lisp program - it simply can't assign any syntactic information: neither implicit, nor explicit.

Definitions (stolen):

Abstract Syntax is a model consisting of the language constructs from which a language is composed.

Abstract Syntax Tree is a directed acyclic graph (DAG) consisting of instances of Abstract Syntax language constructs.


Because Lisp uses a flexible representation, the representation has to be validated to see that it conforms to a given syntax (such as that of Lisp source). This can happen naturally in a code-expanding tree walk. The subsequent structure can then be processed without additional validation; every form has the right kinds of pieces in the right positions. If (1 a) or ((a) b) are not valid forms, their presence can be diagnosed at that stage. Subsequent processing stages, can trust that the pieces are where they are supposed to be. The invalid forms like (1 a) and ((a) b) are assumed not to occur, just like any bad inputs to a module that have been checked away.

You're arguing for a definition of AST that basically coincides with "decorated AST": AST plus some semantic attributes, and that a generalized tree form that encodes supersets of a syntax cannot be an AST.


> This can happen naturally in a code-expanding tree walk

just because something can be interpreted as a tree, it's not suddenly an Abstract Syntax Tree. It's mainly nothing more than a hierarchical tokenizer format.

A flexibility of the Lisp source representation comes because it is NOT an AST and thus not bound to a particular correctly represented syntax. A macro for example can turn any wild combination of tokens into more wild combination of tokens - regardless of what the symbols/numbers/lists/strings, ... actually might 'mean'.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: