
Data Structures in Clojure: Singly-Linked Lists - llambda
http://macromancy.com/2014/01/16/data-structures-clojure-singly-linked-list.html
======
aredington
One of Clojure's fundamental opinions, asserted by the language eagerly, is
that vector quantities should be treated as values the way scalar quantities
are in most languages. Specifically, this means that when you operate with a
list and append to it, instead of modifying the value of the next (cdr)
pointer in the last node, you create a new list and return the new list as the
result of the append operation. These immutable collections afford certain
classes of algorithm that destructively modified ones do not. Rich reviews
this concept pretty well here: [http://www.infoq.com/presentations/Value-
Values](http://www.infoq.com/presentations/Value-Values)

This story doesn't spend much time discussing this particular opinion of
Clojure's or why you would choose to circumvent it by making a mutable linked
list. Additionally by virtue of skipping the immutability discussion, it fails
to discuss how append to the head of a list is crucially different from append
to the tail when working with data structurally sharing underlying data.

Additionally it prefers to define an interface over a protocol (using
definterface instead of defprotocol), and it uses camelCase naming instead of
Clojure's culturally embraced hyphen-cased naming.

In short this describes a way to build a data structure in Clojure, but is not
a particularly good as an example to be followed of how you should build data
structures in Clojure. johnwalker cited a couple more idiomatic examples, e.g.
data.finger-trees: [https://github.com/clojure/data.finger-
tree/blob/master/src/...](https://github.com/clojure/data.finger-
tree/blob/master/src/main/clojure/clojure/data/finger_tree.clj)

------
stiff
Those days I guess the phrase "linked list" shouldn't really appear without
the phrase "locality of reference" somewhere near. From what I understand, the
only practical context in which classic linked lists could be better than
dynamic arrays on current hardware, maybe except some really atypical
applications (huge objects in the nodes?), is when there is a lot of appending
and removal done on the ends of the list, or if you need a lot of insertions
somewhere in the middle.

The wikipedia article on linked lists linked in the article explores a lot of
angles and is really excellent. There are some newer, cache-friendlier
versions of linked lists, I hope they will start being covered more commonly:

[http://en.wikipedia.org/wiki/Unrolled_linked_list](http://en.wikipedia.org/wiki/Unrolled_linked_list)

[http://en.wikipedia.org/wiki/CDR_coding](http://en.wikipedia.org/wiki/CDR_coding)

Given linked lists are the workhorse of every Lisp, I wonder if there has been
work on using those kinds of data structures in Lisp implementations, or is
for example Clojure still using classic linked lists internally.

~~~
johnwalker
Some time ago, hypirion wrote about how persistent vectors are implemented in
Clojure.

[http://hypirion.com/musings/understanding-persistent-
vector-...](http://hypirion.com/musings/understanding-persistent-vector-pt-1)

There are actually several different implementations of the core data
structures:

[https://github.com/clojure/data.finger-
tree](https://github.com/clojure/data.finger-tree)

[https://github.com/clojure/core.rrb-
vector](https://github.com/clojure/core.rrb-vector)

Of course, the short answer is that Clojure isn't using classic linked lists,
and that regular Java collections can be used and created pretty easily. [0]
[1]

[0] [https://stackoverflow.com/questions/4313505/converting-
cloju...](https://stackoverflow.com/questions/4313505/converting-clojure-data-
structures-to-java-collections)

[1] [http://blog.raek.se/2011/01/24/executors-in-
clojure/](http://blog.raek.se/2011/01/24/executors-in-clojure/)

~~~
stiff
Thanks for the great resources. I still wonder to what extent is it at all
possible to do cache-friendly data structures on the JVM. Besides the somewhat
abstract data structure considerations of the kind posted, there is still the
issue of whether in the end references are stored in the collections memory or
the values themselves. On the surface level it looks like Java collections
generally use references for everything, except perhaps for arrays and (maybe)
ArrayLists of primitive types. See for example:

[http://java.dzone.com/articles/slab-guaranteed-heap-
alignmen...](http://java.dzone.com/articles/slab-guaranteed-heap-alignment)

~~~
pron
Java _currently_ only stores primitives inline (in arrays or in objects).
However, the memory allocation scheme (at least in HotSpot) places objects
that are created one after another by the same thread consecutively in memory.

Nevertheless, cache-friendly data structures are still very much possible on
the JVM. Consider something like a B-Tree node, with an array of references to
the children and an array of keys. The node object and the arrays will be
stored together in memory, so the only cache-missing pointer following you'd
do is when you jump from one node to another, just as would happen in C/C++.
So the memory placement is certainly not as flexible as in C/C++, but you can
certainly be pretty cache friendly. In fact, this is a very active topic, as a
lot of people are starting to use Java in low latency applications. For
example, look at the discussions here:
[https://groups.google.com/forum/#!forum/mechanical-
sympathy](https://groups.google.com/forum/#!forum/mechanical-sympathy)

There is work being done to support value types in Java, that are immutable,
cannot be referenced (hence only compared by value), and stored inline in
arrays (well, the latter is an implementation issue and up to the particular
JVM to decide). This may be included in Java 9.

------
drcode
Certainly a nice intro to Clojure types, though it might be a bit confusing to
a beginner, given that linked lists are already built in to Clojure and
there's nothing I can see in the tutorial saying "You don't actually have to
do any of this, this is just an exercise it for pedagogical purposes."

~~~
aaronem
I was wondering that myself. "Isn't Clojure a Lisp? Doesn't it thus give you
linked lists for free? What's the point of reimplementing them at the JVM
level?"

------
dschiptsov
Why not just stop at the point that a Pair is just two pointers and a List is
a bunch of nested Pairs, without any classes, deftypes or monads?)

Btw, the classic trick from SICP lectures, where a Pair was implemented as a
closure (a procedure) that accepts messages (to illustrate code-is-data
principle) is much more interesting abstraction than this overengineered OO
stuff.)

~~~
dschiptsov
Now I have a full keyboard instead of that clumsy phone.)

The very idea "to implement a List" in what is supposed to be called a Lisp
is.. well, I do respect the amount of work that has been done, but.

Such kind of confusion, it seems, comes from a wrong perspective of what a
Lisp is. Let's look first at what it is not.

It is _not_ some "modern OO" language with parenthesized prefix notation, it
could be so boring. Second, a List is not some "data structure in Lisp", it is
the most important, non-separable fundamental feature of the language. Without
"native conses" a Lisp is nonsense.

The explanations is going to grow lengthy, but it worth a try.

A Lisp is a set of layered DSLs and ADTs. Its "miracle" in what it is _small
and good-enough_ , in accordance with "Less is more" maxima.

There is no fucking "Class Node" in what a List is. It is just (cons 1 (cons 2
(cons 3 '()))) or 1:2:3:[] if you wish.

The big ideas is to separate clearly the use from the implementation, which is
called an Abstraction Barrier.

A complete definition of the basic building block - a cons cell, or the Pair
ADT is _cons, car, cdr, pair?_ and _nil_ while a List is just one more
"constructor" \- _list_. This is minimal and it is good-enough.

It is so "good" that it "created" another small miracle - _cadr /cddr_
notation, which is just an accidental piece of beauty created by a human mind
(like a Lisp itself). Only complete tasteless person could abandon this in
order to use dull and mediocre _first /rest_ or any other names for selectors
that doesn't "chain" and reflect the notion that almost everything could be
made out of conses.

The problem is that we have way too many of those who cannot grasp the beauty
of what a _cadr_ or _cddr_ is.

Implementation in a language's runtime is quite another topic. _There_ we
probably should have a notion of a pool-based allocation (in order to not call
malloc for each cons) and locality (to properly implement persistence) etc.
but these implementation specific things _must never leak int "high-level"
use_. Again, there is no fucking Class Node.

Speaking of Lisp apart form Lists is like speaking of water apart from the
waves. Water has waves, Lisp has Lists in its own nature.

Giving this nature of Lists we could have a general mapper procedure, or
_map_. The idea of a Functor from Math (or these control abstractions from
Haskell) has nothing to do with it. It is just the recursive nature of List's
structure allows us to define recursive processes.

Even trees not necessarily must have any Class Node. Here is a classic
example:

    
    
       (define make-tree cons)
       (define datum car)
       (define children cdr)
       (define (leaf? n)
          (null? (children n)))
    
       (define (tree-map fn tree)
          (make-tree (fn (datum tree))
    	     (map (lambda (ts) (tree-map fn ts))
    		  (children tree))))
    

There is no "Single-Linked List Data Structure in Lisp" because there is no
Lisp without conses.

Last but not least. This very site is build in a dialect of Lisp which never
tries to redefine what a List is.

More about Lisps - [http://karma-
engineering.com/lab/wiki/Bootstraping](http://karma-
engineering.com/lab/wiki/Bootstraping)

More about piling up useless abstractions - [http://karma-
engineering.com/lab/wiki/Monads](http://karma-engineering.com/lab/wiki/Monads)

------
djb_hackernews
Pretty indepth stuff, it'd be useful if you point out why you even need the
INode interface, unless I missed that.

~~~
Anderkent
`deftype` only lets you implement functions that match an interface, protocol,
or override Object. It's meant to narrow down the meaning of a type, and
usually any non-interface methods would just be clojure functions that take an
object.

It does play weirdly with the fact that mutable fields are private, I agree.
But that is a very rare case, and in general you'd usually create a type to
implement an interface or protocol.

Quoting Rich Hickey:
([https://groups.google.com/forum/#!msg/clojure/pZFl8gj1lMs/qV...](https://groups.google.com/forum/#!msg/clojure/pZFl8gj1lMs/qVfIjQ4jDDMJ))

"AOT deftype vs gen-class touches on the same Clojure semantics vs Java/ host
semantics, with the objectives from before - support implementing interfaces
but not concrete derivation. So, no concrete base classes, no super calls,
self-ctor calls, statics, methods not implementing interface methods etc."

------
blt
Get rid of that s-t ligature. It's distracting.

~~~
aaronem
What ligature would that be? I'm afraid I don't follow.

~~~
mej10
On words that have "st" there is a ligature between the s and t. It looks like
this:
[http://typophile.com/files/Letter_st_6439.jpg](http://typophile.com/files/Letter_st_6439.jpg)

There is also one on "ct".

~~~
simcop2387
I'm not seeing that either, can you take a screenshot?

~~~
eurleif
I'm seeing it too: [http://imgur.com/puK8OiW](http://imgur.com/puK8OiW)

~~~
simcop2387
Hmm, i was using firefox. I wonder if it's a webkit only thing?

