

In Lisp, should lists be replaced with trees? - bawatski
http://www.hxa.name/notes/note-hxa7241-20110515T0907Z.html

======
ohyes
A list structure is already a 'dictionary tree', except it has lousy
performance in access. (only really visible on large data sets).

* (second (assoc 5 '((1 2) (3 4) (5 6))))

> 6

The lousy performance doesn't matter, most of the time, because you have other
data-structures for data, and your compiler generally is only interested in
sequential access (constructing a parse tree out of it). The lisp code already
being a tree makes parsing fairly simple...

So, I guess my point is that I don't see the point of this?

The only reason to really upgrade to a dictionary would be if you wanted to
have non-sequential access to the contents of of your large-scale code-data-
structure... generally the lists aren't large enough to merit it. You can do
all of the dictionary operations on a list, and they are 'fast enough'. \----

I feel like the rationale here is created by conflating the idea of a cons and
a list. I think generally, a list already accomplishes what the author sets
out to accomplish.

They are a single step higher level than conses, and give you a hierarchical
structure with the possibility for named associations. I don't really see how
using something more complicated and memory intensive to build this basic data
structure qualifies as an improvement...

~~~
fexl
_> The lousy performance doesn't matter, most of the time, because you have
other data-structures for data, and your compiler generally is only interested
in sequential access ..._

Exactly. Much of the time you don't care. The rest of the time, you can
_still_ use assoc-lists, except you create a branching structure on the keys
such that no key in the list is a prefix of any other. The first entry of a
node contains the value "at" that node, and the tail of the node is a list of
branches, where each branch is a pair consisting of a string key and a child
node.

The "get" and "put" functions in this structure are very simple recursive
functions, and you can maintain _very_ large dictionaries ("maps") this way.
The get and put operations are essentially O(1), constant time, if you ignore
the sheer _length_ of the keys (which are usually reasonably short anyway).
I've done tests where I insert hundreds of thousands of pseudo-random keys and
values, and read them back, all very quickly and reliably. The code splits the
keys into branching structures automatically. It's also a purely functional
structure, so you can keep old versions of the maps.

The value at each node can be anything you like, not just a string, though to
detect the absence of a value you need to use a "Maybe" type. In Fexl you can
define constructors:

    
    
      \absent  = (       \absent\present absent)
      \present = (\value \absent\present present value)
    

So then the default value of a newly created branch is always absent, but a
"put" operation replaces it with (present value). When you delete a key, it
automatically detects if the node is empty (i.e. its value is absent and it
has no branches) and then eliminates the node from its parent.

You also detect singleton nodes, which have an absent value and only one
branch. In that case you can concatenate the key of that branch with the
parent key upstairs, eliminating the extra branch.

------
riffraff
not directly lisp related, but the team working on Fortress has some
interesting work on using "conc lists" (pairs, and trees as a combination of
pairs) as a basic unit of work for parallel processing[1], the analysis quite
naturally considers its relation with lisp ad cons lists in other languages.

[1]
[http://labs.oracle.com/projects/plrg/Publications/ICFPAugust...](http://labs.oracle.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf)

~~~
p4bl0
Also, Guy Steele is one of the two inventor of Scheme. So it may not be
directly lisp related, but it definitely is indirectly lisp related ;-).

------
cswetenham
A better reason to replace lists with trees might be for parallel execution. I
enjoyed this talk by Guy Steele on replacing lists with trees in Fortress:
<http://vimeo.com/6624203>

------
Turing_Machine
"Branchs/trees are a superset of pairs/lists"

No, they aren't. A tree is acyclic by definition. Cons cells allow circular
structures.

~~~
Turing_Machine
To take a familiar example, the Web is not a tree. While you might argue that
each page has unique children (outlinks), there is no unique parent -- a page
can have an arbitrarily large number of incoming links, and the page can link
back to those pages in turn, producing cyclic structures of immense
complexity.

For some interesting discussion on the limits of tree structures with respect
to urban planning see A City is Not a Tree by Christopher Alexander (who is
also the inspiration for the Design Patterns movement in software
engineering). <http://www.rudi.net/node/317>

~~~
Deestan
All good, just a nitpick: You don't need a unique parent to have a tree. You
don't even need directed edges. :)

The purest definition of a tree graph is "a graph with no cycles".

~~~
sesqu
We use different definitions. My trees are acyclic connected graphs with one
root.

------
jefffoster
Will Thimbleby (<http://will.thimbleby.net/misc-an-experimental-language/>)
also explored similar ideas.

------
fexl
It's easy to implement both lists and name-value associations in terms of
pairs, so I'm not sure why they need to be built in.

In Fexl (<http://fexl.com/code>), the only built in concept is the _function_
, and all data is structured out of functions. I won't go into all the details
here, but concepts like pair, list, key-value assoc, branch, and everything
else are readily expressed as pure functions. So-called "circular" structures
are created with the Y combinator, so they really are not circular in memory,
only in concept.

~~~
jefffoster
There's an interesting write up about the idea of "everything is a function"
in Haskell at [http://conal.net/blog/posts/everything-is-a-function-in-
hask...](http://conal.net/blog/posts/everything-is-a-function-in-haskell/).

What's your rationale vbehind everything being a function?

~~~
randallsquared
_Although I keep hearing “everything is a function” (and 3 is a nullary or
constant function), I don’t hear people say “everything is a list”, and 3 is
really the singleton list [3]._

Well, in musings around Arc, pg did mention defining everything as a list,
including 3 as [[] [] []], I believe. So, it's not that no one says it. :)

~~~
fexl
Although Fexl does have built-in functions for long int and double float
values, it is possible to do without them.

There are a couple of ways to define integers as pure functions. One is unary
notation, which is equivalent to the "3" example you cite above. So you'd have
these two constructors:

    
    
      \zero = (   \zero\succ zero)
      \succ = (\N \zero\succ succ N)
    

And then the "add" would be:

    
    
      \add == (\x\y x y \n succ (add n y))
    

Or, using the ';' (right-pivot) to avoid nesting on the right (though
sometimes that can be too clever by half):

    
    
      \add == (\x\y x y \n succ; add n y)
    

That works fine for some applications. But you can use binary notation as
well, defining an integer as a list of bits, with the lowest-order bit first.
Then the "add" function becomes:

    
    
      \add == (\x\y x y \bx\nx y x \by\ny
         \sum=(add nx ny)
         bx
            (by (cons F (inc sum)) (cons T sum))
            (cons by sum))
    

Where inc is:

    
    
      \inc == (\x x (cons T null) \b\n
         b (cons F (inc n)) (cons T n))
    

I think I transcribed that right, but I haven't tested it in a while. It's all
surprisingly fast -- I was actually doing long division of 100 digit numbers
years ago in reasonable time in this notation, including the conversion to
decimal notation by repeated division by 10.

One of these days I need to link a big number library into Fexl, but it hasn't
been a priority for me. (Actually I need to think seriously about dynamically
linked libraries, with linkage control done in Fexl itself. This can be
sandboxed for secure applications, of course.)

Oh and here's my old division function, which again I haven't tested in a
while, so no warranties expressed or implied:

    
    
      #---------------------------------------------------------------------------
      # (nat:div x y) divides x by y.  It yields a pair <q,r>, where q is the
      # quotient and r is the remainder.
      #
      # The result satisfies the equation x = q*y + r,  0 <= r < y.
      #
      # NOTE:  If you divide by zero, the function yields the pair <0,0>.
      #---------------------------------------------------------------------------
      \nat:div==(\x\y\:
          x (: null null) \bx\nx
          y (: null null) \by\ny
          by
          (
              # divide by odd - recur on x only
              nat:div nx y \q\r
                  \r=(bx nat:2x1 nat:2x r)
                  \d=(nat:sub r y)
                  int:ge0 d
                      (: (nat:2x1 q) (int:abs d))
                      (: (nat:2x q) r)
          )
          (
              # divide by even - recur on x and y
              nat:div nx ny \q\r
                  : q (bx nat:2x1 nat:2x r)
          )
      )

------
pmb
Lisp is already made of trees. A Lisp program is a pre-order traversal of the
abstract syntax tree of a yacc'ed program in another programming language.
This fact is WHY Lisp is powerful. By eliminating yaccing you can create new
syntax on the fly without having to recompile your compiler.

------
dkarl
By "replacing" lists with trees, am I supposed to understand that working with
trees will be natural and easy, and lists will feel like second-class
citizens? I thought we were beyond thinking like that. Building languages
around a favorite data structure was never the right thing to do; it was
always tunnel vision and bad design. Different data structures have different
features and different performance characteristics. There's no reason for a
language designer to pick a favorite data structure and force anyone who has
to use a different one to deal with warts and second-class library and
language support.

------
mahmud
He seems to have heard of Lua but that didn't stop him from raising the
question.

Next article: " _Should airplanes be fitted with big rotor on top to allow for
vertical take off and hovering? .. like helicopters?_ "

~~~
daleharvey
not that jets need a big rotor on top to do vertical takeoff and hovering

<http://en.wikipedia.org/wiki/Harrier_Jump_Jet> :)

------
ScottBurson
I wouldn't say lists should be _replaced_ with trees, but adding a rich set of
tree-based data structures to Lisp improves it substantially, in my view. To
see what I mean, have a look at my FSet functional collections package, which
does exactly that:

    
    
      http://common-lisp.net/project/fset/
    

Unlike what the OP is suggesting, FSet doesn't expose the internal structure
of its trees. Instead it wraps them in a rich set-theoretic interface.

------
agentultra
You can just build your tree data structure and develop operators to work with
them in Lisp. There's nothing that restricts you to using lists.

So what's the point of "replacing" them (even if such a thing were possible
without "replacing" Lisp)?

~~~
jasonwatkinspdx
Cons cells give you binary trees. It is possible to encode any arbitrary tree
as a binary tree. However, this does not mean that the expression of
algorithms over such left child right sibling encoded trees is identical to
one over trees with some higher branching factor. So the cons/binary situation
will at the least require some additional layers of abstraction to simulate
various traversal patterns of the n way tree on the encoding binary tree.

There also physical/efficiency concerns. There's a reason databases use highly
branching trees.

~~~
agentultra
And you can create the abstractions necessary to work with such a data
structure and add them straight to the regular grammar of the language.

Lisp also has all of the other low-level primitives we're used to when working
closer to the machine. Structs, indexed arrays, hashes, etc. If efficiency is
a concern, just pick an implementation that compiles to machine code. SBCL is
pretty good for this depending on your target platform.

Replacing conses seems like a waste of time.

------
hxa7241
A few people have asked, what is the point? since trees and associative access
can just be implemented with lists.

It is the same basic point as having 'let' or 'case' etc. (which can all just
be implemented with 'lambda' and 'if' etc.) -- it is just a bit higher-level
to program with.

With something that makes programming a little easier (maybe), probably the
question ought really be, why _shouldn't_ we have it?

~~~
fexl
I think the answer is we _should_ have it, but it's just a library.

~~~
hxa7241
OK. Then I suppose the question is: why should it be library not built-in?

The main thing _not_ addressed in the article is: what are the language
implementation implications? What are the pros and cons there?

~~~
fexl
In my way of thinking, the distinction between "library" and "built-in" is
very fluid. That is, if you run an executable which automatically loads a
library for you, then voila it is now built-in. If nothing else, you can set
up your own local environment that you favor. I realize this has portability
implications, but keep in mind that not everyone will want "bignums" or
"socket operations" or "openssl" available all the time in every program.

For me, the language implications are nil. I would not change a single thing
in Fexl to support name-value "trees" versus lists. It's all programmable
using the system as it is. The S, C, Y, I, L, R combinators are what they are,
forever and ever, world without end. :) And I'm quite sure the same thing is
true of Lisp (i.e. just use defun or maybe defmacro). You'll notice that even
in the original article the author didn't propose any new syntax. It's all
just more functions.

I mean, even in _Perl_ for goodness sake all you have to do is this:

    
    
        # A map is a list of branches.  Each branch is pair(key,data).  The data is
        # pair(val,map).  No two keys in a map have any prefix in common.  A key cannot
        # be null.  If you put a null val at a key, it deletes that key from the map.
    
        sub map_put
            {
            my $map = shift;
            my $key = shift;
            my $val = shift;
    
            die if !defined $key || ref($key) ne "" || $key eq "";
            die if !defined $val;
    
            if (is_atom($map))
                {
                return $map if is_null($val);
                return pair(pair(atom($key),pair($val,null())), $map);
                }
    
            my $branch = left($map);
            my $old_key = name(left($branch));
    
            my $cmp = $key cmp $old_key;
    
            if ($cmp == 0)
                {
                my $sub_map = right(right($branch));
    
                if (is_null($val))
                    {
                    return right($map) if is_atom($sub_map);
    
                    if (is_atom(right($sub_map)))
                        {
                        # Combine singleton map upstairs.
    
                        my $top_key = name(left($branch));
                        my $next_key = name(left(left($sub_map)));
    
                        my $new_branch = pair(atom($top_key.$next_key),
                            right(left($sub_map)));
                        return pair($new_branch,right($map));
                        }
                    }
    
                my $new_branch = pair(left($branch), pair($val,$sub_map));
                return pair($new_branch,right($map));
                }
    
            if (substr($key,0,1) eq substr($old_key,0,1))
                {
                my $len_common = len_common_prefix($key,$old_key);
    
                my $key_suffix = substr($key,$len_common);
                my $old_key_suffix = substr($old_key,$len_common);
                my $common_prefix = substr($key,0,$len_common);
    
                if ($key_suffix eq "")
                    {
                    return $map if is_null($val);
    
                    my $sub_map =
                        pair(pair(atom($old_key_suffix),right($branch)),null());
                    my $new_branch = pair(atom($common_prefix),pair($val,$sub_map));
                    return pair($new_branch,right($map));
                    }
    
                if ($old_key_suffix eq "")
                    {
                    my $old_sub_map = right(right($branch));
                    my $new_sub_map = map_put($old_sub_map,$key_suffix,$val);
                    my $old_val_atom = left(right($branch));
    
                    if (name($old_val_atom) eq "")
                        {
                        return right($map) if is_atom($new_sub_map);
    
                        if (is_atom(right($new_sub_map)))
                            {
                            # Combine singleton map upstairs.
                            # TODO hey wait!  This is the same code as above.  Unify this.
    
                            my $top_key = name(left($branch));
                            my $next_key = name(left(left($new_sub_map)));
    
                            my $new_branch = pair(atom($top_key.$next_key),
                                right(left($new_sub_map)));
                            return pair($new_branch,right($map));
                            }
                        }
    
                    my $new_branch = pair(left($branch),
                        pair($old_val_atom,$new_sub_map));
                    return pair($new_branch,right($map));
                    }
    
                return $map if is_null($val);
    
                my $old_branch = pair(atom($old_key_suffix),right($branch));
                my $sub_map = map_put(pair($old_branch,null()),$key_suffix,$val);
    
                my $new_branch = pair(atom($common_prefix),pair(null(),$sub_map));
                return pair($new_branch,right($map));
                }
    
            if ($cmp < 0)
                {
                return $map if is_null($val);
                return pair(pair(atom($key),pair($val,null())), $map);
                }
    
            return pair($branch,map_put(right($map),$key,$val));
            }
    
        sub map_get
            {
            my $map = shift;
            my $key = shift;
    
            die if !defined $key || ref($key) ne "" || $key eq "";
    
            return null() if is_atom($map);
    
            my $branch = left($map);
            my $old_key = name(left($branch));
    
            my $cmp = $key cmp $old_key;
    
            return left(right($branch)) if $cmp == 0;
    
            if (substr($key,0,1) eq substr($old_key,0,1))
                {
                my $len_common = len_common_prefix($key,$old_key);
                my $key_suffix = substr($key,$len_common);
                my $old_key_suffix = substr($old_key,$len_common);
    
                return map_get(right(right($branch)),$key_suffix)
                    if $old_key_suffix eq "";
                return null();
                }
    
            return null() if $cmp < 0;
            return map_get(right($map),$key);
            }
    

That would be a lot cleaner in Fexl or Lisp, but it's totally do-able in any
language.

------
sambeau
This is either a reinvention of ASTs or it's Graph Reduction.

<http://en.wikipedia.org/wiki/Abstract_syntax_tree>
<http://en.wikipedia.org/wiki/Graph_reduction>

:)

~~~
jasonwatkinspdx
You provided two things that the original post is a reinvention of. Which is
it? Why not read the article enough to decide which it is a reinvention of,
then present your thinking in your comment?

------
skewsymmetry
Lists are trees.

~~~
ericmoritz
Wait, I thought trees are lists.

~~~
sesqu
No, trees are multidimensional lists. Lists are one-dimensional lists, which
are degenerate trees and a simple primitive for the more general (always a
loaded term) structure.

------
frobozz
Wouldn't that make it a new language called Trep?

