

Show HN: New Lisp data structure - daniel-cussen
https://github.com/daniel-cussen/Trellis

======
rntz
The two most common operations on lists in typical functional programs are
consing and deconsing, which are O(1). The most analogous operations on
trellises are appending and removing the tail element, which are O(log n). So
it seems inappropriate to propose trellises as general replacements for lists.

Indeed, I'm not sure in what situations this tradeoff of cons/decons speed
for lookup speed _is_ helpful. If I just need fast lookup, I'll use a mapping
structure (hashtable, balanced binary tree, patricia trie, etc). If I need
reasonable lookup speed and also standard sequence manipulations, finger trees
or ropes seem like a better choice. They're more complex, but if performance
is an issue, then it's probably worth it.

Also, a bug/design flaw: You can't store NILs in the terminal position:

    > (trellis 1 2)
    (1 (2))
    > (trellis 1 2 nil)
    (1 (2))

~~~
daniel-cussen
About the bug: I discuss it at the end of trellis2.lisp. I didn't want to
weird people out by adding a second sentinel to the list.

I realize these aren't the best at any one thing, but in situations where
you'd normally use lists (exploratory programming, say) you can swap to this
for a performance boost.

~~~
jules
That's just broken, and will lead to horrible bugs. There is a reason why
normal Lisp lists end with NIL instead of storing the last element in the last
cdr.

~~~
daniel-cussen
> That's just broken, and will lead to horrible bugs.

You mean you think people will use this? Thanks :)

Here's how I was thinking of fixing it:

(setf nol 'nol) ; make nol evaluate to itself

(defun my-null (x) ; redefine null
  (if (or (null x) (eql nol x)) t nil))

Then, change add so that if it's adding a nil, it changes the sentinel at the
end of the linked list part of the list to nol.

So (trellis 1 2) returns (1 . ((2 . NIL) . NIL))

And (add nil (trellis 1 2)) returns (1 . ((2 . NIL) . NOL))

As soon as someone adds something other than a nil to a cdr of a cons in a
tree, add changes nol back to nil.
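
Under that scheme the sentinel prints, but tests as empty (a sketch using the
printed forms above):

    > (cddr (add nil (trellis 1 2)))
    NOL
    > (my-null (cddr (add nil (trellis 1 2))))
    T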

~~~
alec
Use gensym to make a unique sentinel.

~~~
Shamiq
To clarify, you mean: (setf nol (gensym)) instead of (setf nol 'nol)?
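
In full, something like this, I'd guess (a sketch; the earmuffed *nol* is
just special-variable convention, and my-null follows the naming upthread):

    (defvar *nol* (gensym "NOL")) ; uninterned: nothing a user types is EQL to it

    (defun my-null (x)
      ;; true for NIL and for the private end-of-trellis sentinel
      (or (null x) (eql x *nol*)))

Since the gensym'd symbol is uninterned, no list anyone conses up can collide
with it, so the second sentinel never leaks into user data.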

------
somnium
How do these differ and what advantage do they offer over Chris Okasaki's
Random Access Lists? <http://www.eecs.usma.edu/webs/people/okasaki/fpca95.ps>

Scheme SRFI with implementation
<http://srfi.schemers.org/srfi-101/srfi-101.html>

~~~
jbapple
> How do these differ and what advantage do they offer over Chris Okasaki's
> Random Access Lists?

These are inferior, as Okasaki's structure offers O(1) worst-case cons, even
when used in a functional setting. As far as I can tell from the Wikipedia
page on Exponential-Golomb coding, trellises require Omega(lg n) for cons.

~~~
daniel-cussen
I get the feeling these can do things Random Access Lists can't (I haven't
experimented yet, but I think trellises can support both data and code).

~~~
somnium
Perhaps you could elaborate on that feeling? (and also clarify what you mean
by supporting data and code)

The linked SRFI advocates replacing Scheme's traditional pairs/lists with
RALists.

~~~
daniel-cussen
I'm looking at that and McCarthy's eval now.

------
jules
This "list of trees of increasing size" data structure is not new. You can
find plenty of these structures in Okasaki's Purely Functional Data Structures
(look in the chapter about numerical representations). His random access lists
have O(1) adding an element compared to this O(log n).

------
gwern
I've never been very good at data structures, but the increasing size of trees
makes it sound like finger trees: <http://en.wikipedia.org/wiki/Finger_tree>

~~~
pjscott
Finger trees are even more impressive than this (very clever) data structure.
A sequence based on finger trees supports:

* O(lg n) lookup, split, concatenate, random insert/delete

* amortized O(1) push and pop at either end of the sequence

* Caching the value of a monoid reduction of each subsequence. Surprisingly useful.

Trellises are easier to understand, though.

------
daniel-cussen
Update: I feel I'm not nailing the questions about why this is better than
other data types or why this is not identical to other data types. But I found
this: <http://en.wikipedia.org/wiki/Composite_data_type>

The trellis is a composite data type in Lisp, and of those composite data
types (which also includes assoc-lists and...?), it is the only one with
O(logn) for any of its functions (particularly search).

I hope that conveys what I felt was exciting about discovering (or
rediscovering) this data type.

~~~
jbapple
> of those composite data types (which also includes assoc-lists and...?), it
> is the only one with O(logn) for any of its functions (particularly search).

I think you meant "all", not "any", since an assoc list offers O(lg n)
performance for at least one of its functions. In any case, this is
incorrect. You can build any tree-like data structure you like with cons and
nil, and, as pointed out elsewhere, Okasaki (as well as others before him)
has demonstrated several tree-like data structures with O(lg n) cons, car,
cdr, and nth equivalents.
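
For instance (a throwaway sketch, not any particular structure of Okasaki's),
any tree shape is just nested conses, and access cost is the depth traversed:

    ;; a complete binary tree over four leaves, built from bare conses
    (defparameter *tree* (cons (cons 1 2) (cons 3 4)))

    (car (car *tree*)) ; => 1, in two pointer steps: depth = lg 4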

~~~
daniel-cussen
Which function on assoc lists is O(log n)?

~~~
jbapple
> Which function on assoc lists is O(log n)?

acons is O(1), and the O() notation is for upper bounds, so anything O(1) is
also O(lg n).

~~~
varjag
Strictly speaking, no. Precisely because O-notation defines the worst case.
O(1) is not O(log n), since the list insert will never exhibit log behavior.

~~~
aperiodic
Strictly speaking, yes. O-notation describes the growth of a function as the
argument tends to infinity. Formally, the statement "f(x) is O(g(x))" means
that there's some point x_0 such that f(x) is less than a constant factor
times g(x) for any x > x_0.

This is clearly the case for f(n) = 1, g(n) = lg(n), hence the constant
function is O(lg(n)).
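
To make the witnesses explicit (a worked instance of the definition above):
take c = 1 and x_0 = 2; then

    f(n) = 1 <= 1 * lg(n) = c * g(n)    for all n > 2,

so the constant function satisfies the definition of O(lg(n)).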

An algorithm can be O of different g(x)s, depending on the properties of the
input. For example, the runtime of a naive quicksort implementation (which
always chooses the leftmost value as a pivot) is O(n^2) if the input list is
sorted, while it has runtime O(n*lg(n)) on average and in the best case (where
the pivot is always the median of the section of the list being partitioned).

~~~
varjag
> This is clearly the case for f(n) = 1, g(n) = lg(n), hence the constant
> function is O(lg(n)).

It is not, because a constant function has no logarithmic behavior.
Distinguishing that is the whole point of big-O notation. When you tell
someone "this is an O(log n) operation" they expect log performance.

Yes, O(c) falls within O(n) too, but so what? We could just as happily
declare most functions double-exponential in complexity, but what use is
that? It would be one of those formally correct but practically useless
definitions.

~~~
aperiodic
> It would be one of those formally correct but practically useless
> definitions.

"Hi, my name is Aperiodic, and I'm... a mathematician."

"Hi Aperiodic."

"It all started out so easily; you know, a few lemmas with the boys in the
evenings. But before I knew it, I was picking up Bourbaki as soon as I got
home from work. I would wake up in the mornings, surrounded by loose sheets of
paper covered in commutative diagrams, without a clear idea of what I did last
night..."

------
daniel-cussen
I know this is a big claim, but there are mathematical reasons why it works.
And it works, at least on my computer.

~~~
jbapple
> I know this is a big claim

Perhaps, but it needn't be to get the same asymptotic performance.

Okasaki's simplest random access list has performance proofs that depend only
on the simplest properties of binary numbers, rather than those of the
Exponential-Golomb coding, and the code is very easy to write:

        data BRAL a = Empty
                    | Full (Maybe a) (BRAL (a,a))
                      deriving (Show)
    
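        -- cons is binary increment: Full Nothing is a 0 digit, Full (Just _)
        -- a 1 digit, and pairing (x,y) carries into the next, wider level.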
        cons :: a -> BRAL a -> BRAL a
        cons x Empty = Full (Just x) Empty
        cons x (Full Nothing ys) = Full (Just x) ys
        cons x (Full (Just y) ys) = Full Nothing (cons (x,y) ys)
    
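        -- nth reads the digits back: a filled head costs one step, and each
        -- deeper level halves the index, picking fst or snd by parity.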
        nth :: Integer -> BRAL a -> a
        nth 0 (Full (Just x) _) = x
        nth 0 (Full Nothing xs) = fst (nth 0 xs)
        nth n (Full (Just _) xs) = nth (n-1) (Full Nothing xs)
        nth n (Full Nothing xs) =
            let pair = nth (n`div`2) xs
            in if n`mod`2 == 0
               then fst pair
               else snd pair

------
aidenn0
What is the insertion time on this? It seems like it would be O(n) to insert
at the head, at which point why don't you just use a vector?

[edit] It is always O(log n) to insert at the end, which is better than a
vector in the worst case.

~~~
daniel-cussen
Insertion into the middle is O(n) because you have to rearrange everything
after that. Adding something to the end (#'add) is O(logn). Inserting at the
head is indeed slow (unless you are willing to break the structure) but there
are many reasons not to use a vector; insertion time isn't everything. With
vectors, you have to make sure your vector has empty space left or is a
dynamic array. Here, you don't have to worry about anything (short of blowing
the stack).

------
jemfinch
How are these better than balanced binary trees that keep subtree size in the
node?

~~~
daniel-cussen
Predictable structure and smaller nodes.

And most of the other advantages linked lists have over binary trees, though
some of those advantages come in slower form.

~~~
jemfinch
If I'm using both as abstract data types, does predictable structure really
help me at all?

~~~
daniel-cussen
If you're using it for code (the way Lisp uses linked lists) you know when one
function ends and a nested one begins.

I'd like to point out, however, that this is _in theory,_ as I haven't seen
this data structure doing anything other than moving data around.

------
j_baker
Why would one want to use this over the VList? It provides O(1) access on
average and O(log n) access in the worst case, plus O(log n) length
computation.

<http://en.wikipedia.org/wiki/VList>

~~~
munificent
From skimming Wikipedia and the code, it looks like a trellis is more or less
a VList where each array in the VList is represented by a complete binary tree
of an appropriate depth. That gives you the VList's O(1) average performance
to find the right tree, but then you have to do an O(log n) walk down the tree
to the desired element.

Am I interpreting that right?

~~~
daniel-cussen
Pretty much. Finding the right tree takes O(log n), the same as walking down
that tree.

------
puredanger
Kind of reminds me intuitively of skip lists, where the additional partial
lists provide a "fast track" for searching, inserting, and removing in the
list in log n rather than n time.

------
nickik
Very intressting. Could the be used in clojure (I mean do the provid the same
imutabillity concells do)

What are the benefits of these over clojure vectors?

~~~
daniel-cussen
You know, I tried Clojure out, but the cons cells are implemented in a weird
way; they can't combine two objects, only an object and a sequence. So I could
probably do something _like_ trellises, but they wouldn't be as elegant.

~~~
cgrand-net
I think one can use [a b] and destructuring instead of cons/car/cdr and be as
elegant.

