
Tree traversal without recursion: the tree as a state machine (2007) - userbinator
http://plasmasturm.org/log/453/
======
malisper
I've always found these kinds of tricks fascinating. There's all sorts of
crazy things you can do to save some bytes here and there. One of my personal
favorites is the XOR linked list[0]. It's a doubly linked list that uses as
much memory as a singly linked list. Instead of separate previous and next
pointers, each node stores a single link, _prev XOR next_. As you're doing a
traversal you can recover the next pointer from the stored link with _next =
link XOR prev_, giving you the address of the next node in the list.
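A sketch of the trick (simulating addresses with indices into a pool, since XORing real pointers needs a language like C; index 0 plays the role of NULL, and all names here are illustrative):

```python
# XOR linked list sketch: "addresses" are integer indices into `pool`,
# so they can be XORed like machine addresses. Slot 0 is the NULL sentinel.
# Each node is (value, link) where link = prev XOR next.
NULL = 0
pool = [(None, 0)]  # slot 0: NULL sentinel

def build(values):
    """Append nodes for `values` to the pool; return the head index."""
    n = len(values)
    first = len(pool)
    for i, v in enumerate(values):
        prev = first + i - 1 if i > 0 else NULL
        nxt = first + i + 1 if i < n - 1 else NULL
        pool.append((v, prev ^ nxt))
    return first if n else NULL

def traverse(start):
    """Walk the list from `start`: next = link XOR prev."""
    out, prev, cur = [], NULL, start
    while cur != NULL:
        value, link = pool[cur]
        out.append(value)
        prev, cur = cur, link ^ prev
    return out
```

Because the representation is symmetric, calling `traverse` on the tail index walks the list backwards with the same code, which is the whole point: one stored word gives both directions.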

Anyways, I did some digging into TAOCP. In section 2.3.1 Knuth mentions
threaded binary trees[1]. They are trees that you can traverse without a stack
_and do not use more memory than a regular binary tree_. This is compared to
the method used in the post which requires you to store the parent pointers.
The trick is that in a binary tree, lots of pointers are going to be null.
Instead of storing null there, you can store a special value that provides
information on how to traverse the tree. Indicating the difference between a
regular pointer and one of these special values requires only a single bit.
You can often store that bit within the pointer itself, requiring no
additional space over a regular binary tree.

For a threaded binary tree, if the left child is null, you instead store the
predecessor node. If the right child is null, you store the successor node.
Finding the successor now becomes:

    
    
        1) Check if the right child is null. If so the right child pointer points to the successor. We are done.
        2) Follow the right child pointer. Then keep following the left child pointer until the left child is null. This gives us the smallest value greater than the current value.
    

One small advantage of threaded binary trees is you traverse the tree slightly
less. If the right child is null, instead of traversing back up the tree as in
the post, you can jump straight to the successor node.

[0]
[https://en.wikipedia.org/wiki/XOR_linked_list](https://en.wikipedia.org/wiki/XOR_linked_list)

[1]
[https://en.wikipedia.org/wiki/Threaded_binary_tree](https://en.wikipedia.org/wiki/Threaded_binary_tree)

~~~
xxs
>You can often store that bit within the pointer itself

Alternatively you can follow what Java did with compressed oops and use 4
bytes per pointer on 64-bit archs (instead of 8), shifting the pointers prior
to dereference - effectively enabling 32 GB heaps with 4-byte pointers.
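The arithmetic behind that, as a toy sketch (numbers only, not the real JVM mechanism): with 8-byte object alignment the low three address bits are always zero, so a 32-bit stored value shifted left by 3 on dereference spans 2^32 * 8 = 32 GB.

```python
SHIFT = 3  # log2 of the 8-byte object alignment

def compress(addr):
    # Only 8-byte-aligned addresses below 32 GB are representable.
    assert addr % 8 == 0 and addr < (1 << 35)
    return addr >> SHIFT  # fits in 32 bits

def decompress(compressed):
    return compressed << SHIFT
```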

~~~
svat
It's a good idea IMO, but the two aren't alternatives, i.e. not mutually
exclusive — you can do both. See Knuth's “A Flame About 64-bit Pointers” from
2008
([https://cs.stanford.edu/~knuth/news08.html](https://cs.stanford.edu/~knuth/news08.html),
starting with “It is absolutely idiotic to have 64-bit pointers when I compile
a program that uses less than 4 gigabytes of RAM”). Work on making this
possible in Linux started as the x32 ABI in 2011 (see Wikipedia
[https://en.wikipedia.org/w/index.php?title=X32_ABI&oldid=887...](https://en.wikipedia.org/w/index.php?title=X32_ABI&oldid=887611857)
or this LWN article:
[https://lwn.net/Articles/456731/](https://lwn.net/Articles/456731/));
unfortunately it looks like there's discussion about removing it (Dec 2018
thread starting here
[https://lkml.org/lkml/fancy/2018/12/10/1145](https://lkml.org/lkml/fancy/2018/12/10/1145)
though apparently I can't figure out how to navigate the LKML tree using
constant space).

~~~
dragontamer
I've found myself using 32-bit "array indexes" to halve my pointer sizes for
some code I'm writing (for fun). In linked data structures (linked lists,
trees, etc.), a huge amount of the data ends up being pointers.

Consider a typical binary tree with a 64-bit integer as its value. You use
24 bytes per node (value, left child, right child) with 64-bit pointers, but
only 16 bytes per node with 32-bit pointers (or array indexes).

Now, a 32-bit pointer can address at most 4 billion nodes across the whole
tree. But 4 billion is more than enough for many programs.

EDIT: Consider that 4 billion nodes of 16 bytes each (where each node is the
value + left child + right child struct discussed earlier) will take up 64 GB
of RAM.

EDIT2: And half the time, my brain gets short-circuited and I end up
recreating some terrible form of segmented memory before having to slap myself
for thinking up such a horrible idea. In any case, a surprising amount of
memory is used up as pointers in almost all the code I write. If you care
about fitting as much data into L1 cache (64kB on modern machines), you will
absolutely want to minimize your data usage.
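A sketch of that layout using parallel arrays (illustrative, not dragontamer's actual code; index 0 is reserved as the null "pointer", and the `'q'`/`'I'` item codes are assumed to be 8 and 4 bytes, giving 16 bytes per node):

```python
import array

NULL = 0
values = array.array('q', [0])  # 64-bit values; slot 0 is the null sentinel
left   = array.array('I', [0])  # 32-bit child indices instead of pointers
right  = array.array('I', [0])

def new_node(val):
    values.append(val)
    left.append(NULL)
    right.append(NULL)
    return len(values) - 1

def bst_insert(root, val):
    """Plain unbalanced BST insert; returns the (possibly new) root index."""
    if root == NULL:
        return new_node(val)
    cur = root
    while True:
        if val < values[cur]:
            if left[cur] == NULL:
                left[cur] = new_node(val)
                return root
            cur = left[cur]
        else:
            if right[cur] == NULL:
                right[cur] = new_node(val)
                return root
            cur = right[cur]
```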

------
vnorilo
If you require the highest performance and your trees are write-once, read-
often (and immutable), it's worth looking into whether you should just sort
the nodes in traversal order in one contiguous block of memory. Storing the
edges as indices into the block is handy in this case.

The article used parent pointers in nodes to avoid using a stack (native or
custom-made). One downside of a parent pointer is that your identical subtrees
can no longer share structure.

~~~
Axsuul
Could you expand more on what you mean by "Storing the edges as indices into
the block is handy in this case."?

This is super relevant to me right now, thanks!

~~~
nosianu
You add integer slots to each node and store the index of the child nodes in
them. So you have, for a binary tree,

    
    
      [actualTreeNodeData1,child1Idx,child2Idx][actualTreeNodeData2,child1Idx,child2Idx][...]
    

instead of just

    
    
      [actualTreeNodeData1][actualTreeNodeData2][...]
    

Vary this to include whatever you actually need, could also be links to
parents instead of or in addition to links to child nodes.
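One way to produce such a block is to flatten a pointer-based tree in traversal order, replacing child pointers with indices (an illustrative sketch; -1 marks a missing child):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def flatten_preorder(root):
    """Return a list of [data, leftIdx, rightIdx] entries in preorder."""
    block = []
    def visit(node):
        if node is None:
            return -1
        i = len(block)
        block.append([node.val, -1, -1])  # reserve slot before recursing
        block[i][1] = visit(node.left)
        block[i][2] = visit(node.right)
        return i
    visit(root)
    return block
```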

~~~
rocqua
That feels like it'll start hurting when you don't know the (maximum) degree
of your nodes, as you won't have fixed-size objects anymore.

~~~
silasdavis
Hurting because of memory alignment or something else?

Given that we store indices into the serialisation for the child nodes, I
would have thought that nodes not starting at consistent intervals doesn't
hurt us, for traversal at least.

I'd quite like to use this idea for a file-backed immutable tree. The array of
children would have variable size, with a bit field to index the nth child in
the sparse array.

~~~
rocqua
Constant time array indexing depends on a fixed array member size. So, the
actual memory layout of storing the nodes with non-fixed size could not use an
array of nodes.

I suppose you could (should) write your own allocator to ensure nodes are
allocated in a contiguous piece of memory, and have a separate array of
indexes / pointers that tells you where the n-th node starts. That is another
array that'll have to be kept in cache though.

------
derriz
> You can get rid of any stacks whatsoever by keeping a parent pointer in the
> tree node data structure.

You can actually traverse a binary tree in order without any memory overhead
at all (no parent pointer, nor an implicit or explicit stack) using Morris
traversal.

I found it surprising that this can be done, but effectively it uses the NULL
slots as temporary storage.

Here's one explanation:
[https://yuyuan.org/MorrisAlgorithm/](https://yuyuan.org/MorrisAlgorithm/) -
although there seem to be a few YouTube videos on it also.
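For reference, a sketch of the traversal: it temporarily writes a thread into the rightmost null of each left subtree, then removes it on the second visit, so the tree is unchanged when it finishes.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def morris_inorder(root):
    out, cur = [], root
    while cur is not None:
        if cur.left is None:
            out.append(cur.val)
            cur = cur.right
        else:
            # Find the in-order predecessor of cur within its left subtree.
            pred = cur.left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur   # first visit: install temporary thread
                cur = cur.left
            else:
                pred.right = None  # second visit: remove thread, emit node
                out.append(cur.val)
                cur = cur.right
    return out
```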

------
ttctciyf
Reminds me of Sean Parent's recounting[1] of how his son's beads and string
toy could represent a tree data structure with stateless traversal and some
other interesting properties, like:

> ...to erase elements from some point in my model to some other point in my
> model, what I do is grab those two locations on the string, and I pick the
> whole thing up, and the beads that fall off are the ones that get erased ...
> (48:41)

1: [https://youtu.be/sWgDk-o-6ZE?t=2721](https://youtu.be/sWgDk-o-6ZE?t=2721)

------
alkonaut
It seems like by adding a parent pointer one shifts complexity from the
traversal into the data structure, in order to avoid a stack. I know there are
places where avoiding an explicit stack can be nice (perhaps a BVH tree in a
shader, or other places where constant space is required?) but are there more
good reasons for it? A DAG (such as a singly linked tree) has so many benefits
for memory safety, GC performance, the ability to extend it from a binary tree
to N children, etc., that it seems like an odd tradeoff in most scenarios.

~~~
HelloNurse
Obviously, adding read-only data to the tree to save mutable data in the
traversal state can be amortized if there are many ongoing traversals rather
than one - for example, a massively parallel GPU computation running an
arbitrary number of tree searches simultaneously in a fixed amount of memory.

------
segmondy
Learn some Prolog, one of the best languages for studying your algorithms and
data structures.
[https://rosettacode.org/wiki/Tree_traversal#Prolog](https://rosettacode.org/wiki/Tree_traversal#Prolog)

    
    
      preorder(nil).
      preorder([Node, FG, FD]) :- format('~w ', [Node]), preorder(FG), preorder(FD).
      
      inorder(nil).
      inorder([Node, FG, FD]) :- inorder(FG), format('~w ', [Node]), inorder(FD).
      
      postorder(nil).
      postorder([Node, FG, FD]) :- postorder(FG), postorder(FD), format('~w ', [Node]).
    

------
kazinator
> _You can get rid of any stacks whatsoever by keeping a parent pointer in the
> tree node data structure. Effectively, this turns the tree into a (sort of)
> state machine._

How can it be a state machine if you aren't mutating it?

In fact, you can traverse a tree without recursion (or any equivalent stack-
like structure) _and_ without parent pointers. And then you _actually_ use the
tree as a state machine: you stash the reverse path pointer in the tree itself
by overwriting the downward links temporarily and then putting them back.

This is useful for garbage collection; you can have a garbage collector that
is guaranteed never to blow the stack in the marking phase.
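That is the Deutsch–Schorr–Waite idea. A sketch of the link-reversal walk, using an explicit per-node `state` field where a real collector would pack two tag bits into the object header (one-shot per tree unless the states are reset, which is exactly what a GC's mark bits are for); every child link is restored by the time it finishes:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right
        self.state = 0  # 0: unvisited, 1: in left subtree, 2: in right subtree

def reversal_preorder(root):
    out = []
    prev, cur = None, root  # prev = top of the reversed-link chain
    while True:
        if cur is not None and cur.state == 0:
            # First arrival: visit, then descend left, reversing the link.
            out.append(cur.val)
            cur.state = 1
            nxt = cur.left
            cur.left = prev
            prev, cur = cur, nxt
            continue
        if prev is None:
            break
        if prev.state == 1:
            # Left subtree done: restore left link, descend right instead.
            prev.state = 2
            nxt = prev.right
            prev.right = prev.left  # move the back link to the right slot
            prev.left = cur         # cur is the finished left child
            cur = nxt
        else:
            # Right subtree done: restore right link, retreat one level.
            up = prev.right
            prev.right = cur
            prev, cur = up, prev
    return out
```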

~~~
normalhuman
> How can it be a state machine if you aren't mutating it?

You are confusing a state machine (which is an abstract model of computation)
with the behavior of a program implementing it. The set of possible states and
the conditions of transition from one state to the next never change.

These conditions can depend on mutable state, as is the case described by the
author (in this case, mutable state is kept in the current and previous node
pointers).

------
dragontamer
> You can get rid of any stacks whatsoever by keeping a parent pointer in the
> tree node data structure. Effectively, this turns the tree into a (sort of)
> state machine. While traversing, you need no memory other than the current
> and previous node/state. The traversal algorithm is very simple:

While this sounds innocent, truly consider the ramifications of this decision.

Your node has now grown to be +8 bytes (the size of a 64-bit pointer). If you
have 1-million nodes in your binary tree, you are using +8MB of memory.

-------

Lets consider the alternative: lets say you have 1-million nodes, and you have
a red-black tree (imperfect balance: instead of a perfectly balanced 20-depth
tree you have some branches which are 40-depth).

Traversing to 40-depth requires 40x pointers on the stack, or 320 bytes.

----------

In effect, you're spending 8 bytes _per node_ (which could be millions, or
even billions of nodes) to save O(depth) space off of your stack.

While the state machine is very "clean", it seems like a very bad tradeoff
from an algorithmic / memory space point of view. I'd rather spend O(depth) =
O(log n) space on the stack than O(8 bytes * number of nodes) in the heap.

I think there's something to be said for the clarity and cleanliness of the
state-machine design. It's quite possible that some algorithms are easier to
write with this "parent pointer in node" methodology. However, anyone seeking
the highest performance per unit of memory will have to see that the +8 bytes
per node is a terrible, terrible tradeoff.

-------------

> Update: Todd Lehman pointed out that given node-level locks, this algorithm
> allows concurrent traversal and update of the tree. Any atomic operation
> other than detaching a non-leaf node is safe.

Hmm... the parallel angle I hadn't thought about. But it does seem to be a
potential building block, maybe even for a lock-free data structure.

Or just use locks, since locks are easier to reason about.

Insertion and deletion into the tree seem possible, but "rebalance" is very
difficult for me to think about (since it possibly requires modifying many,
many nodes). Locking all nodes involved would be a heavy approach but would
probably work.

Red-black colors are kept track of for self-balancing purposes. Maybe more
colors are needed to make the methodology "clean" for multi-threaded / lock
free purposes.

------
roddux
A complete aside: the site design of the linked page is beautiful. Very
minimalist and totally functional.

~~~
ajuc
I don't know, maybe I get a different color scheme than everybody else? It's
light gray on white for me and the text is almost invisible.

[https://i.imgur.com/YM3dIcG.png](https://i.imgur.com/YM3dIcG.png)

Surely that's not considered a great design?

EDIT: ok, it was the fault of a Chrome extension, "NightModePro" :)

------
FeepingCreature
Note that if your language has non-precise garbage collection, having parent
pointers guarantees that any spurious pointer into any node on the tree will
keep the entire tree alive.

------
willvarfar
Excellent!

Generally, trees are poor for performance as you zig-zag around memory and
stall on cache misses. But in situations where you have to do tree traversal
and have to be as fast as possible, trading the memory for the parent pointer
(which probably fits within the same cache line as the rest of the node
anyhow, so is in practice basically free) against recursion or an explicit
stack, which stress memory in the tight loop, is probably a winner! Definitely
worth profiling in those situations.

~~~
lmilcin
Man, hold your horses.

Assume you have 64GB of memory devoted to a binary tree (of just integers),
and assume the tree is balanced.

Your node is two pointers + an integer (24 bytes). That gives 2666666667
nodes. A balanced tree will have 32 levels (log2(2666666667) ~= 31). When
doing recursion you just need to keep a pointer per level, so this is 32
pointers, or just about 256 bytes of your cache.

Also, even if we were storing that in memory (and not in cache) it is still
orders of magnitude less effort than accessing the memory for the large data
structure.

~~~
wahern
> When doing recursion you just need to keep pointer to the level, so this is
> 32 times pointer or just about 256 bytes of your cache.

What language do you use where a recursive function invocation only requires a
call frame large enough to hold a single parameter? By what magic does the
function know where to return? And what are you doing in that function that
doesn't require any temporaries whatsoever?

At best you're looking at a couple of kilobytes. For all but the most
constrained environments (toaster, a 4Kb kernel thread stack, etc), this isn't
really worth worrying about. And compared to the pointer chasing, all those
pushes and pops aren't much of a problem, either.

However, in a language like C without function closures, non-recursive
traversal can keep code much more concise and clear. This is the real win,
IMO.

~~~
rrobukef
You could simulate the recursion with an iteration, a depth byte and a
stack-array of 32 pointers (64 for overkill). This removes the call-frame
overhead and makes the temporaries explicit too.
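A sketch of that scheme (illustrative names; overflow of the fixed array would mean the tree is deeper than anticipated):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def preorder_array_stack(root, max_depth=64):
    stack = [None] * max_depth  # the fixed "stack-array"
    depth = 0                   # the "depth byte"
    out = []
    if root is not None:
        stack[0], depth = root, 1
    while depth > 0:
        depth -= 1
        node = stack[depth]
        out.append(node.val)
        # Push right first so the left subtree is visited first.
        for child in (node.right, node.left):
            if child is not None:
                assert depth < max_depth, "tree deeper than stack"
                stack[depth] = child
                depth += 1
    return out
```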

~~~
wahern
But that's not recursion, which pretty much by definition implies the use of
repeated invocations of the same function(s). Explicitly building a stack is
exactly what the article was discussing, though it's far more clever than the
obvious approach. I've seen AVL implementations that use an explicit array for
maintaining a stack in lieu of recursion or parent pointers.

In any event, you always need a stack of some sort. Recursion is one way to
accomplish that, but saying that you're simulating recursion by explicitly
building a stack reverses the categories.

------
Hitton
Honestly I'm quite surprised that this gets upvoted although it uses both Perl
and goto. Sometimes I'm pleasantly surprised by the HN community.

~~~
lioeters
The gotos did get my nerves bristling a bit at first sight - but it's the
right tool for the job (or at least gets the job done), so I was OK with it. I
guess other languages may have better/cleaner ways to achieve it.

I love the discussion that it sparked here too. It's wonderful and educational
to see so many smart people breaking down the problem space, discussing
de/merits of this particular technique, presenting alternative approaches with
analysis on memory use, performance..

------
huhtenberg
Asking to write a tree iterator was our "FizzBuzz" question when interviewing
people for the kernel dev position. Worked exceptionally well.

~~~
andreaorru
Are you still hiring?

~~~
huhtenberg
No. It was over 10 years ago, sorry.

------
inimino
I missed the introductory part where you mentioned that these are _binary
trees only_ and not just trees. I think you should mention it more
prominently.

~~~
kragen
There's an isomorphism between binary trees and general ordered trees, man.
That's how we can have Lisp.

~~~
inimino
Yes, sure, there's an isomorphism. Using that fact, is it then so trivial to
show how this FSM approach works out if you don't have a binary tree to start
with? I think it would make an interesting follow-up blog post. But failing
that, it would have been nice to call out at the top that these are binary
trees.

~~~
kragen
It definitely doesn't work if you don't change your in-memory data
representation, and it requires you to change it in a way that forecloses the
possibility of FP-persistent tree structures or efficient DAGs. Compared to
that, representing your ordered-tree nodes as cons chains is a barely-
significant change.

It _does_ generalize, though, to ordered-tree nodes that aren't represented as
cons chains. You just have to walk through the _n_ child pointers when you're
moving back up the tree to search for the one you're returning from, adding an
extra runtime cost factor proportional to _n_ but no extra memory. Moreover,
if you devote three or four registers to an amnesic stack, you can avoid that
cost for the bottom two or three levels of the tree, which will necessarily
contain the vast majority of tree nodes.

------
kofejnik
right, traverse a maze by always turning left

------
DoctorOetker
Brilliant, I need something like this for one of my projects (attempt to
decrypt Beale cipher)!

------
enz
> then, when discussing how to turn recursive functions into iterators using
> an explicit stack (which permits breadth-first searching);

How about a depth-first search?

------
nudpiedo
I thought everyone learns that at university.

~~~
inimino
Not everyone went to university.

~~~
nudpiedo
Just an observation: for some reason people take it as elitism to observe
that having a complete, well-rounded curriculum has certain advantages. And
this article is a good illustration that there is value in taking an engineer
rather than a boot-camp code fighter.

Data structures are a very specific subject of IT engineering; every person
studying the subject should take the whole curriculum, including recursively
scanning data structures and how to transform those recursive procedures into
iterative code (or an FSM, as the article says).

~~~
inimino
I'd take an English major who is self-taught in computer science over a CS
grad any day. And I'd take a bootcamp kid who went on to study the theory
themselves over a Stanford kid who only knows this shit because he was tested
on it. I'm honestly, at this point, not sure why we even teach software
engineering in schools, as we so manifestly fail at imparting anything to the
kids, and the ones that come out as decent engineers would have been so
anyway. Oh right, it's signalling.

~~~
nudpiedo
Whatever comparison you do, you should compare a representative subset of both
populations in similar situations in life, not just cherry-pick the
self-selected ones, the successful and the survivors, from one population
against the potatoes from the other.

There is no point of comparison between five dedicated years of highly
academic study and six months of touching the techs du jour. Will the boot
campers also have a graph theory background? Compilers and assembly, or just
webpack preprocessors? Database design, or just picking stacks? Network
protocols?

Are you sure you are not comparing a 35-year-old responsible person who did a
boot camp to sustain his family against an entitled post-grad IT engineer who
is still a bit high from the graduation party? It would be hypocritical not to
admit that a hungry, serious, battle-tested individual will have the right
attitude to build a role, rather than the unserious entitled individual who
just wants to see what he can get out of it and makes no effort to
continuously improve.

Once on a plane I met a CEO who told me he would any day take an engineer for
decision-making or business-related roles. Was he crazy, or was it just his
previous experience conditioning him? You can take whoever you want for your
company, even someone who took gender studies and ended up taking a code camp
hoping to pay off their former debt, and who now makes high-impact blog posts
about their experience in the digital IT world as a person of XYZ gender.

~~~
inimino
> Whatever comparison you do, you should compare a representative subset of
> both populations in similar situations in life

Well, yes and no.

If you're an 18 year old who wants to program computers and you read Turing
and Dijkstra for fun, then go study math or physics, which you won't have much
time to do later. Or pick a subject in the humanities if you like to read and
write. Is it a great way to become a programmer? Maybe not, it would be better
to just go and get an internship or a job, but (1) smart companies are hard to
find and (2) you'd be missing out on a chance to get a university education at
the age when most people do so.

On the other hand, if you're picking coworkers, go with the self-motivated one
who had to learn everything they learned because they were interested in it,
over the one who had to learn it to get a degree in a field that everyone
knows is a ticket to a high-paying job.

> Once in a plane I met a CEO who told me he would take any day an engineer
> for the decision making role or business related roles. Was he crazy

Sure, this is the same point: if the only thing you're good at is your
specialization, then you're probably not going to be very good at it. Unless
it's something like chess that really has minimal relation with anything else
in society.

------
jnordwick
Congratulations, you've just recreated the tree iterators from CS Intro Data
Structures 61? Constant space via a parent pointer is the normal way.

