
Representing Trees in PostgreSQL - mathie
http://woss.name/articles/representing-trees-in-postgresql/
======
Todd
A good book on the subject is Joe Celko's Trees and Hierarchies in SQL for
Smarties.

[http://www.amazon.com/Hierarchies-Smarties-Edition-
Kaufmann-...](http://www.amazon.com/Hierarchies-Smarties-Edition-Kaufmann-
Management/dp/0123877334)

He spends a chapter on each of the models outlined in this post: adjacency,
path, and nested set models.

~~~
platz
I've done nested set before; it's interesting - very fast for queries, but
requires a lengthy insert/update cost. Also, team members were absolutely
clueless as to what was really going on. I'm not sure nested sets are really
much faster than what modern rdbms's can provide today.

~~~
tempVariable
In addition to expensive insert/update, it is necessary to keep the left and
right boundaries across _ALL_ of your entries in perfect order. If the
boundaries get out of whack, fixing the tree is a nightmare scenario.

I worked on a multi-tenant application with distinct trees present in one
table and with one tree per table and so on. Fun fun fun!

~~~
alistairbayley
I think the nested intervals model, a refinement of nested sets, solves this
slow update problem: [http://www.rampant-
books.com/art_vadim_nested_sets_sql_trees...](http://www.rampant-
books.com/art_vadim_nested_sets_sql_trees.htm)

But I've never had to use it, so I am just guessing.

Same article, different site:
[https://communities.bmc.com/docs/DOC-9902](https://communities.bmc.com/docs/DOC-9902)

And a paper: [http://www.sigmod.org/publications/sigmod-
record/0506/p47-ar...](http://www.sigmod.org/publications/sigmod-
record/0506/p47-article-tropashko.pdf)

Here's a comparison of the different approaches in a matrix:
[http://vadimtropashko.wordpress.com/2008/08/09/one-more-
nest...](http://vadimtropashko.wordpress.com/2008/08/09/one-more-nested-
intervals-vs-adjacency-list-comparison/)

------
kpmah
There is also ltree
[http://www.postgresql.org/docs/9.3/static/ltree.html](http://www.postgresql.org/docs/9.3/static/ltree.html)

Adjacency lists also don't perform that badly with recursive queries in my
experience.

~~~
rickmode
This.

The project I'm on used materialized paths, which lead to great pain. I
investigated nested intervals ... and they could't achieve the tree depths we
needed (we were modeling a file system tree).

We are back to adjacency lists (using a parent ID) but redesigned to avoid the
need for recursive ancestor and descendant queries.

_But_ the RDBMS doesn't support recursive queries and I've been curious about
PostgreSQL's recursive query support. I played with it, but not on a fully
loaded database with deep trees of data.

Does PostgreSQL recursive query support work well with deep trees (> 100
levels) on tables with tens of millions or more rows?

~~~
andrewflnr
Can you elaborate on how materialized path caused pain? Was it more
performance or maintenance? I would have expected MP to be a good fit for
modeling a file system.

~~~
rickmode
First, renames and moves require updating all descendants.

Second, the materialized path's length exceeded the database's indexable
length limit. (MySQL, the DB in question, has a default index limit of 767
bytes, so only first 767 bytes are indexed.)

There are ways around these issues (like using "UPDATE ... WHERE" rather than
using an ORM to walk the tree and update... _sigh_ ). We also had other app-
specific / design-specific issues too that swamped these issues performance
wise.

I'm hopefully we won't need ancestor and descendant queries again in our re-
design. But I'm keeping PostgreSQL with its Common Table Expression stuff--the
thing that facilities recursive queries--in my back pocket. It's that or use a
stored procedure to build the capability by hand.

------
coleifer
Another option for storing tree structures is the closure table [1]. It acts
as a sort of "many-to-many" junction table between the tree nodes, storing the
relationships between each.

[1] [https://coderwall.com/p/lixing/closure-tables-for-
browsing-t...](https://coderwall.com/p/lixing/closure-tables-for-browsing-
trees-in-sql)

~~~
chdir
+1, Related reading (all from Bill Karwin of SQL Antipatterns fame):

[http://stackoverflow.com/questions/192220/what-is-the-
most-e...](http://stackoverflow.com/questions/192220/what-is-the-most-
efficient-elegant-way-to-parse-a-flat-table-into-a-tree/192462#192462)

[http://www.slideshare.net/billkarwin/models-for-
hierarchical...](http://www.slideshare.net/billkarwin/models-for-hierarchical-
data)

[http://karwin.blogspot.in/2010/03/rendering-trees-with-
closu...](http://karwin.blogspot.in/2010/03/rendering-trees-with-closure-
tables.html)

------
rosser
Other comments have mentioned this as well, but recursive CTEs are a very
effective technique for representing trees, and they work very well in
PostgreSQL.

~~~
gtaylor
We have used these for our discussion features on Pathwright. With ltree, you
don't have as much flexibility when sorting by multiple values. With the
Recursive CTE, you can go nuts and still end up very efficient.

I believe Disqus uses this, as well: [http://cramer.io/2010/05/30/scaling-
threaded-comments-on-dja...](http://cramer.io/2010/05/30/scaling-threaded-
comments-on-django-at-disqus/)

------
techtalsky
There's a great Ruby on Rails gem called acts_as_sane_tree (after the non-
recursive acts_as_tree) that uses postgresql's recursive queries. I'm using it
on a project and have found it useful with good performance:
[https://github.com/chrisroberts/acts_as_sane_tree](https://github.com/chrisroberts/acts_as_sane_tree)

~~~
btown
Looks like the main fork hasn't been updated since 2012. Is there an updated
version you recommend for Rails 4 compatibility?

~~~
techtalsky
Hm, I didn't realize that. I'm using it on a Rails 4 project and haven't had
any problems.

------
savagej
I've used django-mptt to represent a tree of human phenotypes before. It was
fast and quite an easy api to use. I was using PostgreSQL as the backend.

[https://www.djangopackages.com/packages/p/django-
mptt/](https://www.djangopackages.com/packages/p/django-mptt/)

~~~
robertfw
I've also used django-mptt and just wanted to throw another vote behind it in
case anyone is considering using it. Our data structure was a good fit for
MPTT (many reads, few writes) so I can't comment on how it would behave in the
opposite scenario. Nice API, very easy to use from a developer standpoint. We
were using MySQL.

~~~
jhgg
Likewise. I've done some heavy modification ontop of django-mptt to support
limiting tree depth when querying, calculating total children, querying
siblings, etc...

[https://gist.github.com/jhgg/32a379e34c8a56303295](https://gist.github.com/jhgg/32a379e34c8a56303295)

------
batbomb
First off: The first example is exactly what a relational database is for.
It's a "tree" structure only because it's several 1:many joins. It is true
that ORMs aren't that amenable to composition and lots of dynamic joins, but
that's not the fault of the database, it's the fault of the ORMs.

That being said, I've tried all these methods before across a few different
DBMS.

IMO a good way to go about this all is to actually just reimplement a file
system; You have a caching virtual file system in your application, and store
data as key:parentKey:name (equivalent-ish dentry:parentDentry:fileName) in
every table which contains child nodes. It's fast, often more predictable, and
definitely more portable (as you aren't relying on DMBS-specific constructs).
It's also amenable to partitioning/sharding by parent key. You can drastically
reduce the amount of queries that are being sent. Also, if you use a b+tree as
an index for your paths, you can invalidate cache/subtrees pretty fast in your
application.

Of course, you end up duplicating functionality of the database, and if there
is a lot of latency between you and the DB, this might not be the best method
(or it might, depending).

It would be nice if MySQL finally supported CTEs.

------
dharbin
There's also the ltree module which is built-in to postgres:
[http://www.postgresql.org/docs/9.1/static/ltree.html](http://www.postgresql.org/docs/9.1/static/ltree.html)

------
barosl
Does anyone remember how Drupal handles the hierarchical comments? It
maintains a "sort key" field for each comment, which consists of multiple
index numbers for all levels, much like the section numbers in Wikipedia.

1

1.1

1.2

2

2.1

2.1.1

2.1.2

2.1.2.1

In this way, displaying comments in a tree form is trivial. Just ORDER BY the
sort key. I find it brilliant for applying to such an application.

~~~
twerquie
This pattern is commonly called "materialized path", if anyone is trying to
search for it.

~~~
benjohnson
A similar system is also heavily used in the construction world in their "Work
Breakdown Structure" \- a borrowing on how they document airplane parts that
are part of an assembly, that are part of a module, that part of a structure.

[http://en.wikipedia.org/wiki/Work_breakdown_structure](http://en.wikipedia.org/wiki/Work_breakdown_structure)

------
m0ppers
with recursive is actually VERY quick in postgresql (adjacency model).
currently using that and thoroughly tested it. simple, very quick and no
hassle to update it (as the other options).

Compared to a mysql with nested set postgres using with recursive is a life
changer :D

------
netghost
There are some more tricks you can do using postgres' arrays to efficiently
query the data:
[http://monkeyandcrow.com/blog/hierarchies_with_rails/](http://monkeyandcrow.com/blog/hierarchies_with_rails/)

Namely using the && operator let's you make use of indices on the materialized
paths. We've been using it in production for years to great effect.

------
joesmo
> We could have retrieved all the appropriate data with a single query, but
> that means reconstructing the tree from a flat set of rows we got back from
> PostgreSQL. Doing that sort of thing in the view would be hienous.

Implement a view helper to take the flat data and return a hierarchical
structure that you can render.

~~~
smoyer
And store the data in PostgreSQL as a modified-preordered list so that the
data is already in the order you need to efficiently recreate the object graph
(note that this implies many reads for each write, since tree modifications
become the costly operation).

------
jimbokun
If it's a small tree, you could encode it as a single JSON object and query on
it using PostgreSQL's JSON support.

[http://schinckel.net/2014/05/25/querying-json-in-
postgres/](http://schinckel.net/2014/05/25/querying-json-in-postgres/)

------
mappu
Random data point: I have a hierarchal authentication system under MySQL (no
recursive query support). My schema does the most obvious thing of having
user.parent as a user_id (adjacency model).

My most common query is to find whether user X is a child of user Y.

My solution was to use application-level triggers to maintain a separate
lookup table, so i can simply do `user_id IN (SELECT child_id FROM
lookup_table WHERE parent_id = Y)` as an additional search clause. With
appropriate indexes it's very fast to query, and i can do partial updates to
maintain the lookup table.

If my most common tree query was something else (e.g. enumerate children in
sorted order) then i'd need some other data structure.

------
Walkman
I don't know if ActiveRecord is capable of doing this, but here is how to do
it with Django ORM:

in models.py:

    
    
        class Node(models.Model):
            parent = models.ForeignKey('self', related_name='children')
    
    

in the view:

    
    
        nodes = Node.objects.prefetch_related('children')
    
    

in template:

    
    
        {% for node in nodes %}
            {% for subnode in node.children.all %}
            ...
            {% endfor %}
        {% endfor %}
    
    

This will make exactly two queries; one for all the parent nodes, and one for
the children nodes. Django will make the pairing automagically, so you don't
have to do it in view code.

~~~
caust1c
Unfortunately, this still only gets the children of the root. If you want to
be able to recursively get all the children and grandchildren of a root,
you're going to be doing (N - leaves) queries where N is the number of Nodes
in the tree.

However, there is of course a django package to help with this and gathers all
the nodes you care about into one query:

[https://github.com/django-mptt/django-mptt](https://github.com/django-
mptt/django-mptt)

------
mattdeboard
What are the upsides to this vice using a graph database like Neo4j?

~~~
mikesname
You get all the benefits of a mature and very full-featured RMDBS.

Neo4j is a great database, and I use it myself for a project that involves
lots of deep hierarchies. But it has a fairly sparse feature-set where schema
enforcement and data integrity checking are concerned; you basically have to
add all that stuff yourself at the application level which can amount to a lot
of work.

~~~
btown
To second this, if you're using multiple languages to write to your database,
you almost certainly want that schema enforcement at the DB level. We decided
not to go with Neo as our canonical database for exactly that reason. Its main
strength is probably as a follower or slave to a RDBMS or log file, which is
how it seems to be used in enterprise - when you want to do graph-based
analytics, you bulk-load a snapshot of your non-graph-stored DB into Neo, then
run read-only workloads.

------
ashmud
I used to use code based on Kendall Willets' code. His original site is
offline. Archive here:
[https://web.archive.org/web/20110928135313/http://willets.or...](https://web.archive.org/web/20110928135313/http://willets.org/sqlgraphs.html)

Line from the page: "I just picked up a copy and it looks great! You are right
about the whole approach and my stuff stinks." \- Joe Celko, author of SQL for
Smarties.

------
jeremyevans
Not sure about ActiveRecord, but Sequel has supported using recursive common
table expressions for loading all descendants in a given branch (or branches)
of a tree for over 4 years using the rcte_tree plugin:
[http://sequel.jeremyevans.net/rdoc-
plugins/classes/Sequel/Pl...](http://sequel.jeremyevans.net/rdoc-
plugins/classes/Sequel/Plugins/RcteTree.html)

------
maaku
> There’s also a twist on the Nested Sets model, called the nested interval
> model, where the nodes are given two numbers that represent the numerator
> and denominator of a fraction, but it doesn’t seem so popular, and was too
> complex to wrap my head around!

...and the author just wrote-off the best performing generic tree solution,
because "it was too complex to wrap my head around!"

:facepalm:

------
chx
Bojan Živanović and I have a writeup on this whole tree storage business
[https://bojanz.wordpress.com/2014/04/25/storing-
hierarchical...](https://bojanz.wordpress.com/2014/04/25/storing-hierarchical-
data-materialized-path/) here.

------
quipo
Covered this and several other techniques here:
[http://www.slideshare.net/quipo/trees-in-the-database-
advanc...](http://www.slideshare.net/quipo/trees-in-the-database-advanced-
data-structures)

------
jaunkst
A graph database comes to mind. It's a different design problem when dealing
with deep relationships vs top level relationships. Sometimes you need more
deep inherent relational mappings for alogithms to be effecient.

------
glibgil
Should be titled representing paths. Trees are easy: id, parent_id. Paths are
tricky.

