
Enumerating Trees (2013) - espeed
https://www.cs.virginia.edu/~lat7h/blog/posts/434.html
======
lgessler
As a person, Luther Tychonievich is one of the most interesting professors I
ever had.

This was one of my first impressions: he explained to us on the first day of
class that (despite (because of?) his having a PhD in computer science) he did
not have an internet connection at home and would not be able to respond to
our emails outside of working hours.

He has also blogged publicly and extensively[1]. He has written about math and
computer science--his explanation of Cantor diagonalization[2] is the first
one that clicked with me. On the other hand, he has blogged about his inner
life[3]. Though I am personally irreligious, it makes me happy to see him
blogging about something so personal and, frankly, unfashionable. I'd hazard a
guess that being publicly Christian would probably not be in your interest if
you're trying to make a career for yourself in academia, but he has clearly
not tried to hide it. I admire the commitment he has to the dictates of his
soul.

[1]:
[https://www.cs.virginia.edu/~lat7h/blog/index.html](https://www.cs.virginia.edu/~lat7h/blog/index.html)

[2]:
[https://www.cs.virginia.edu/~lat7h/blog/posts/124.html](https://www.cs.virginia.edu/~lat7h/blog/posts/124.html)

[3]:
[https://www.cs.virginia.edu/~lat7h/blog/posts/563.html](https://www.cs.virginia.edu/~lat7h/blog/posts/563.html)

~~~
User23
Thank you so much for this comment. While the original submission is
interesting, these links increase my appreciation of the author greatly.

As an aside, Don Knuth is another example of a world class computing scientist
who is also an avowed Christian.

------
lalaland1125
EDIT: Actually, in retrospect, this is not quite right. It maps every tree to
a natural number, but not every natural number to a tree. Regardless, it can
be handy in certain circumstances.

Another simpler way of doing this is to encode a tree with n levels as 2^n - 1
bits. The first bit is 1 if and only if this tree has a root. The next 2^(n-1)
- 1 bits represent the left tree. The following 2^(n-1) - 1 bits represent the
right tree. Recurse as needed until you get to a tree with only one level
(which is either a single node when the bit is 1, or is nothing when the bit
is 0).

I'm fairly sure this is not optimal, but it is much simpler.

~~~
User23
This is quite like the trick for flattening binary trees into an array.

------
exDM69
Enumeration techniques can be really useful in surprising situations.

I am working on a planetary terrain rendering system for a game about
spaceflight. The technique is based on conforming quadtrees but at no point
there is an actual tree structure, it is encoded in a single integer. Starting
from an origin point, the tree is traversed by moving to the adjacent subtree
or parent node using simple bit twiddling.

The goal is to write a vertex shader than can efficiently generate triangle
number n in quadtree m. This will be then fed to tesselation shaders for LOD
subdivision.

~~~
espeed
Have you seen the VDB tree...?

VDB: High-Resolution Sparse Volumes with Dynamic Topology (Academy Award
Winning data structure)

[http://www.museth.org/Ken/Publications_files/Museth_TOG13.pd...](http://www.museth.org/Ken/Publications_files/Museth_TOG13.pdf)

OpenVDB is an open-source C++ implementation with Python wrappers:
[https://ken.museth.org/OpenVDB.html](https://ken.museth.org/OpenVDB.html)

[https://github.com/dreamworksanimation/openvdb](https://github.com/dreamworksanimation/openvdb)

Ken works at SpaceX now, but a few years ago he gave a lecture on VDB at the
Berkeley Institute for Data Science (BIDS)...

OpenVDB: An Open Source Data Structure and Toolkit for High-Resolution Volumes
[video]
[https://www.youtube.com/watch?v=7hUH92xwODg](https://www.youtube.com/watch?v=7hUH92xwODg)

------
drdeca
For unlables rooted trees, there are also the matula numbers which naturally
map between these trees and the natural numbers.

Also, if you want binary trees, another way to do that could be to take any
bijection between positive natural numbers and pairs of natural numbers
(including 0), and, well, it is fairly straightforward.

Let 0 correspond to the tree of one vertex (no children), and then for any
binary tree, associate it with the natural number corresponding to the pair of
natural numbers which are associated with the two child subtrees.

The specific bijection used to pair them is up to you. This page seems to
describe one example of this.

~~~
sagebird
I have explored Matula numbers/trees before knowing what they were called. The
following document has example trees and an explanation :
[http://williamsharkey.com/integer-tree-
isomorphism.pdf](http://williamsharkey.com/integer-tree-isomorphism.pdf)

------
utopcell
Knuth teaches how to generate all trees in TAOCP [1].

[1]
[http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf](http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf)

~~~
shoo
Knuth also has some interesting material on BDDs (binary decision diagrams),
which can be used to encode, count, enumerate and uniformly sample a variety
of combinatorial structures.

[http://www.cs.utsa.edu/~wagner/knuth/](http://www.cs.utsa.edu/~wagner/knuth/)

------
iams-
I didn't see that mentioned the number of binary trees with n nodes -- T(n) --
is precisely the Catalan numbers

These can also be interpreted as the number of strings of parentheses with n
matching pairs

So P(2) would be (()) and ()()

Surely there is a bijective map between P and T somewhere in there

------
jesuslop
The concept of entropy pushes one to think of binary trees. Huffman coding
used in real compressors use Huffman trees [1] that express the prefix code
used to define entropy as in Colah's blog
[http://colah.github.io/posts/2015-09-Visual-
Information/](http://colah.github.io/posts/2015-09-Visual-Information/)
Entropy is the expected lenght, so it's the weighted sum of the lenght of the
paths from root to leaves, weighted by their frequency.

[1]
[https://www.siggraph.org/education/materials/HyperGraph/vide...](https://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html)

------
psykotic
There's a simple recipe for arithmetically encoding recursive algebraic data
types (in the functional programming sense) which is related to this.

What you might have seen is Goedel numbering where a finite sequence of
natural numbers a_0, a_1, ..., a_n (where n isn't fixed but can vary per
sequence) is mapped bijectively onto p_0^a_0 a_1^p_1 ... a_n^p_n where p_0,
p_1, ... is an enumeration of the primes.

However, if you want to represent trees instead of sequences, you have a
better, simpler option. The key is the existence of a bijective pairing
function between N^2 and N, which you can write as <m, n> for m, n in N.

You have a lot of choices for how to construct the pairing function. But a
curious fact is that there is essentially one polynomial pairing function and
it's the one you saw in class when you learned that the rationals are
countable:
[https://en.wikipedia.org/wiki/Fueter%E2%80%93P%C3%B3lya_theo...](https://en.wikipedia.org/wiki/Fueter%E2%80%93P%C3%B3lya_theorem).

From this perspective it should be clear that bit interleaving provides one
such pairing function.

Here's a larger example of how you can use this for encoding algebraic data
types. The data type in this article is an unlabeled binary tree:

    
    
        data Tree = Leaf | Node Tree Tree
    

I will use square brackets to denote the encoding. Then

    
    
        [Leaf]     = 0
        [Node a b] = 1 + 2*<[a], [b]>
    

So the tag for the sum type is the residue modulo 2, and if the tag is 1 you
can divide by 2 (shift right by one bit) to extract the encoded pair of
subtrees. If the sum type had three terms you could use 3 instead of 2:

    
    
        data Tree = WhiteLeaf | BlackLeaf | Node Tree Tree
    
        [WhiteLeaf] = 0
        [BlackLeaf] = 1
        [Node a b]  = 2 + 3*<[a], [b]>
    

This also works with labeled trees:

    
    
        data Tree = Leaf N | Node Tree Tree
    
        [Leaf n]   = 0 + 2*n
        [Node a b] = 1 + 2*<[a], [b]>
    

If the label type wasn't a natural number, you'd just recursively encode it.

In the above I treated the <m, n> pairing as a black box that can be
implemented in different ways (e.g. Cantor pairing, bit interleaving). You can
abstract out the encoding of sums as well. You're looking for a bijection
between N and Inl N | Inr N and I made the particular choice

    
    
        [Inl n] = 0 + 2*n
        [Inr n] = 1 + 2*n
    

But you have other options as well, though this is probably the simplest. And
once you have an encoding for sums and products you can apply them recursively
to encode any recursive algebraic data type.

Note that the sum and pair encoding functions are bijective on the natural
numbers, but the corresponding encodings for algebraic data types aren't
necessarily bijective, only injective. For the unlabeled binary tree example,
the number 2 has a tag of 0 (leaf) but isn't the encoding of any tree.
However, the encoding for labeled binary trees is bijective if the label type
is bijectively encoded. The problem with this kind of recursively constructed
encoding that terminates in finite types is that you obviously cannot have a
bijection between a finite set and an infinite set like N.

To tie it back into the article, he uses this encoding:

    
    
        data Tree = Leaf | Node Tree Tree
    
        [Leaf]     = 0
        [Node a b] = 1 + <[a], [b]>
    

You can extend this to my example with two kinds of leaves:

    
    
        data Tree = WhiteLeaf | BlackLeaf | Node Tree Tree
    
        [WhiteLeaf] = 0
        [BlackLeaf] = 1
        [Node a b]  = 2 + <[a], [b]>
    

Note that the tag ordering is all-important here. This only works when the tag
ordering is "tail recursive". If you used 0 as the tag for Node, then there'd
be an ambiguity between leaves and proper subtrees; you wouldn't know if 1
corresponded to a leaf or 0 + encoded_subtree_pair.

So, this kind of prefix-sum encoding is bijective but can only accommodate one
unbounded term, which must be assigned the final tag. No tag ordering can work
for this:

    
    
        data Tree = Leaf | WhiteNode Tree Tree | BlackNode Tree Tree
    
        [Leaf] = 0
        [WhiteNode a b] = 1 + <[a], [b]>
        [BlackNode a b] = 2 + <[a], [b]> // Wrong!
    

I haven't thought about ordinals for a long time, but this feels related to
the fact that 1 + ω = ω is not equal to ω + 1.

~~~
espeed
Re: pairing functions, two of my favorite are:

1\. Moser–de Bruijn sequence
[https://en.wikipedia.org/wiki/Moser%E2%80%93de_Bruijn_sequen...](https://en.wikipedia.org/wiki/Moser%E2%80%93de_Bruijn_sequence)

    
    
      n = x + 2y
      x = n & 0x55555555
      y = (n - x) / 2
    

2\. A 'Binary' System for Complex Numbers, by Walter Penny

[https://www.nsa.gov/news-features/declassified-
documents/tec...](https://www.nsa.gov/news-features/declassified-
documents/tech-journals/assets/files/a-binary-system.pdf)

Previous:
[https://news.ycombinator.com/item?id=10648847](https://news.ycombinator.com/item?id=10648847)

~~~
psykotic
Yeah, if you use the Moser-de Bruijn sequence for constructing a pairing
function you just get bit interleaving. You define the map expand(x_0, x_1,
..., x_n) = (x_0, 0, x_1, 0, ..., 0, x_n), apply it to x and y, multiply
expand(y) by 2 to shift it over, and add them to merge the now disjoint
even/odd positions. Incidentally, this decomposition of the problem is exactly
how you efficiently implement bit interleaving in software except you don't
even need addition:

    
    
        z = interleave(x, y) = expand(x) | (expand(y) << 1)
    

For the inverse you define the map compact(x_0, x_1, ..., x_2n) = (x_0, x_2,
..., x_2n) and then

    
    
        uninterleave(z) = (compact(z), compact(x >> 1)) = (x, y)
    

The expand map can be implemented efficiently as a logarithmic number of
shift-and-merge stages, and compact can be constructed by inverting each stage
separately and composing them in reverse order. A stage has this form:

    
    
        x' = x | (x << n)
    

You sometimes also see it written with xor. That's not necessary. What you
care about is that the operation is bitwise (e.g. no carries from addition)
and it must have 0 as an identity.

If x has 2n bits then this is reversible: the lower n bits in x and x' are the
same, and the upper n bits in x and (x' >> n) are the same, so you can recover
all 2n bits of x from x' with low(x') | high(x' >> n). You can cascade stages
like this if you alternate them with appropriate masking to allow data-
parallel operation. The net effect of the first stage, after masking, is to
shift the upper n bits of x up by n bits while leaving the lower n bits in
place. The next stage then does the same thing on the two halves in parallel
with a shift of n/2, and so on until you're done. The masking prevents cross-
talk between subvectors so you can operate on a single vector as if it were a
set of independent subvectors. Or put another way, it enforces the
precondition for the reversibility of each stage.

Anyway, just some random connections to low-level hacking.

------
the_grue
I wonder if this can be used to improve an algorithm [1] I worked on a year
ago for building nearly-comlete binary trees out of pre-sorted array of items.

[1] [https://residualchaos.blogspot.com/](https://residualchaos.blogspot.com/)

