Hacker News new | past | comments | ask | show | jobs | submit login

A good book on the subject is Joe Celko's Trees and Hierarchies in SQL for Smarties.


He spends a chapter on each of the models outlined in this post: adjacency, path, and nested set models.

If you're storing a tree in an RDBMS, please look into the closure table algorithm rather than adjacency, nested set, or materialized paths.

If you're using Rails (3.2 through 4.1), try this: http://mceachen.github.io/closure_tree/

Like it says in the README:

* Fetch your whole ancestor lineage in 1 SELECT. * Grab all your descendants in 1 SELECT. * Get all your siblings in 1 SELECT. * Fetch all descendants as a nested hash in 1 SELECT. * Find a node by ancestry path in 1 SELECT. * 2 SQL INSERTs on node creation * 3 SQL INSERT/UPDATEs on node reparenting

None of the other approaches above have even remotely similar performance characteristics. If your tree is small (tens of nodes), you won't care. If it's bigger, you will.

I wasn't aware of closure trees before, thanks. The presentation that you link to by Bill Karwin, along with a few other resources are in the comments by coleifer and chdir.

I've done nested set before; it's interesting - very fast for queries, but requires a lengthy insert/update cost. Also, team members were absolutely clueless as to what was really going on. I'm not sure nested sets are really much faster than what modern rdbms's can provide today.

In addition to expensive insert/update, it is necessary to keep the left and right boundaries across ALL of your entries in perfect order. If the boundaries get out of whack, fixing the tree is a nightmare scenario.

I worked on a multi-tenant application with distinct trees present in one table and with one tree per table and so on. Fun fun fun!

I think the nested intervals model, a refinement of nested sets, solves this slow update problem: http://www.rampant-books.com/art_vadim_nested_sets_sql_trees...

But I've never had to use it, so I am just guessing.

Same article, different site: https://communities.bmc.com/docs/DOC-9902

And a paper: http://www.sigmod.org/publications/sigmod-record/0506/p47-ar...

Here's a comparison of the different approaches in a matrix: http://vadimtropashko.wordpress.com/2008/08/09/one-more-nest...

One just needs to be aware that nested sets eventually hit a limit.

I experimented with a variation by Vadim Tropashko using something called "Farey fractions" [1]. These represent the intervals as a 2x2 matrix of four integers rather than two floating point values.

The numbers are effectively limited to 32-bit values in the matrix since a multiplication is required (resulting in 64-bit intermediate results).

It was very interesting, but couldn't model a file system hierarchy well. It could roughly 10^32 items in the best case, but a very small number (hundreds) in edge cases.

For example, it maxes out at a depth of 17 with 100 items at each level, or a depth of 34 with 10 items each. This might be fine for modeling some hierarchies, but definitely not a file system. The edge case is if the fraction extends along one edge. So if we have 10 items per level, and add a child at the leftmost edge each time, we create the most costly fractional subdivision. Do this 35 times and you hit an math overflow.

[1] Check out the Chapter 5 and Errata links: http://vadimtropashko.wordpress.com/“sql-design-patterns”-bo...

Applications are open for YC Winter 2022

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact