Hacker News new | past | comments | ask | show | jobs | submit login
Left-leaning Red-Black Trees (2008) [pdf] (princeton.edu)
33 points by dolftax on Apr 6, 2015 | hide | past | favorite | 25 comments

and here is the critique of it : http://www.read.seas.harvard.edu/~kohler/notes/llrb.html (left-leaning-rb-trees-considered-harmful)

I'd go even further: all forms of red-black trees considered harmful. Red-black trees are isomorphic to B-trees with a maximum branching factor of 4, but where each node is exploded into several binary nodes. General B-trees are easier to implement and more efficient than red-black trees because on modern computers a high branching factor results in better cache behaviour.

seconded & thirded

OMG. I have yet to see a working implementation of 'delete' from Sedgewick. Further, a repaired delete (from the description on the slides) will rotate back an forth horribly. Awful stuff, awful reasoning. Symmetric RB trees are similarly compact when implemented using a direction flag to abstract away the symmetry.

A good (and working!) implementation for a top-down no-parent pointer RB tree is by Julienne Walker: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_t.... No back-and-forth rotations here. Very good explanations and very good code.

Or use AVL trees instead, they work naturally (i.e., without clever restructuring) more cache-locally by storing the balance info one node up the tree.

For most applications where you want these, I've found that skip lists give you most of the performance advantages you want at a far lower complexity to implement and debug. Yeah, it's a probabilistic algorithm, but if you know the distribution well enough to tune correctly (and I usually do for things like time series and high scores) the performance is much better than LLRBT while also being rather cache-friendly, which is overlooked in a lot of implementations of RBTs.

I was in my first data structures class when this paper was published. We had just covered red black trees, so this made for an interesting deviation from the standard curriculum. Considering the maturity of the data structures covered in the standard undergrad CS curriculum, I wonder how often this type of situation comes up. That is, how often are papers on new data structures published that are this close to something a typical undergrad student studies?

Sedgewick's course on Algorithms, through Coursera, is pretty good, for anyone who wants to learn more about basic algorithms, or (like me) needs a refresher.


Here is a Python port I wrote based on the linked paper: https://github.com/peterhil/leftrb (I haven't used it much, so there might be bugs...)

This tool can generate LLRB trees in Go (can also use as a lib):

Otherwise there's also github.com/petar/GoLLRB/llrb.

Nice. My project already have btree, but it is complex concurrent version. I am thinking to add something simpler and reuse some code.

I'm wondering, are there any efficient immutable (i.e., purely functional) implementations out there?

Here is a nice starting point for a generic rb tree structure in SML.

        (* generic red-black-tree in Standard Jersey ML *)

        type key = string

        datatype color = R | B

        datatype tree = E | T of (color * tree * key * tree)

        fun rbmem (x, E) = false
          | rbmem (x, T (_,a,y,b)) =
            if x < y then rbmem (x,a)
            if x > y then rbmem (x,b)

        fun balance ( (B,T (R,T (R,a,x,b),y,c),z,d)
                    | (B,T (R,a,x,T (R,b,y,c)),z,d)
                    | (B,a,x,T (R,T (R,b,y,c),z,d))
                    | (B,a,x,T (R,b,y,T (R,c,z,d)))) = T (R,T (B,a,x,b),y,T (B,c,z,d))
                    | balance body = T body

        fun insert (x,s) =
            let fun ins E = T (R,E,x,E)
                  | ins (s as T (color,a,y,b)) =
                    if x < y then balance (color,(ins a),y,b)
                    else if x > y then balance (color,a,y,(ins b))
                    else s
                val T (_,a,y,b) = ins s (* guaranteed to be non-empty *)
            in T (B,a,y,b)

If you mean functional red-black trees in general, then there's http://matt.might.net/articles/red-black-delete/

One should encode the color as last bit of one of the left or right pointers

Their implementation is in Java. I am not too familiar with Java, but I don't think that is possible; it's a tradeoff of memory efficiency for type safety.

Why wouldn't it be possible? Here's a different (I believe more efficient) implementation of the Node container class by Sedgwick himself, which does exactly that:

  private Node insert(Node h, Key key, Value val) {
 if (h == null)
 return new Node(key, val, RED);
 if (isRed(h.left) && isRed(h.right))
 int cmp = key.compareTo(h.key);
 if (cmp == 0) h.val = val;
 else if (cmp < 0)
 h.left = insert(h.left, key, val);
 h.right = insert(h.right, key, val);
 if (isRed(h.right))
 h = rotateLeft(h);

  if (isRed(h.left) && isRed(h.left.left))
 h = rotateRight(h);
 return h;

I've looked at this code from several angles, and I still can't see how type safety might be compromised by adding an extra bit for color. The Java compiler requires you to specify the static object type that your LLRBT returns, specifically so the JVM can make as few assumptions about the underlying data types as necessary at runtime. And the color bit, that's only passed as an argument to the object constructor to expedite the elementary rotations and color flips that happen frequently in a LLRBT, to enable the creation of black ones and same-colored ones. I don't think type safety is affected by this or even related.

The comment was about adding an extra bit for color in the pointer itself—not in the object.

It's not a question about type safety, Java simply has no mechanism to make this possible (a decision which was influenced by type safety concerns).

That's interesting, I hadn't considered the possibility of attaching bytes to pointers before (although the flavor of the idea reminds me a little bit of certain SICP exercises I struggled with, but in a slightly different terminology). It sounds like a powerful idea. For one, it sounds like weighted graphs would lie in your hand, so to speak: you could traverse and compose them (anonymously, if you like) without necessarily creating any new objects. Is this right, or am I just daydreaming?

I think you're reading too much into the idea. If all memory is word-aligned, you only get 2 bits on a 32-bit machine and 3 bits on a 64-bit machine. If your property can't be encoded in 2 bits, it's probably not worth the trouble.

OCaml uses the low bits to encode some basic type information for the GC. Pointers have both low bits 0, integers have the lowest bit set, and sum types have the lowest bit unset and the second bit set. This means integers are 31/63 bits long and using native-sized integers carries a significant performance penalty.

Again, I have only used Java a little bit, but I don't see how a language that allows you do pointer arithmetic (flipping one of the bits in the pointer), and then de-reference the modified pointer, could be type safe. I understood that Java was pretty much type safe, so I assumed it doesn't allow this in general.

You don't attempt to dereference the tagged pointer, you attempt to dereference the result of masking out the tag. If all pointers are aligned on an 8byte (they are, at minimum), then the lowest three bits are always zero. So if you twiddle them, you can recover the actual pointer with (ptr & ~((size_t)0x7)).

Type safety would be easy; there are several routes one could go: A) Suppose we have tagged pointer types with a really terrible syntax like {MyClass + 3bits}. Then we can say {MyClass + 3bits} is not a subtype of Object, but is rather a subtype of {Object + 3bits}. MyClass is a subtype of {MyClass + 3bits}, if you like, since it doesn't hurt to mask out zero bits. This is all at compile time, and I suppose all erased types must be {Object + 3bits} now for safety. But note that the value referenced by an {Object + 3bits} can still be assigned to an Object, since the bits live in the pointer you just lose the 3 bits.

B) Simply change all existing pointers to have these extra bits, with some sort of syntax for that. Note that these bits live in the pointer, so even null would have the bits. Then there's no change whatsoever to the type system.

It's a moot point in practice since the runtime already uses those tag bits itself.

How anything is made safe is that the feature is given well-defined semantics: all correct uses are spelled out in the specification, and diagnosis is required for any other use. The implementation then does whatever is necessary.

A language with references/pointers could have a provision for storing a user-defined bit in a pointer. All uses of pointers (at the language implementation level) would then be aware of this requirement. All compiler-generated code which dereferences pointers would clear the bit, and so on.

The specification would also define details like whether or not two pointers to the same object, but with a different setting of the user bit, compare equal. (Or do they compare equal and unequal under different kinds of equality tests.)

This kind of thing is, of course, not safe if the language simply leaves an "escape hatch" for flipping a bit in a pointer, without adding any other requirements.

(Java has no such escape hatch; I'm assuming the topic in this sub-thread is how such a thing can be safe in a language, including possibly a dialect of Java.)

Exactly. This is in fact done in some languages to mark generations for garbage collection.

Definitely not practical in Java.

It's possible (and practical!) if you allocate nodes in a List<T> and use indices instead of pointers. Then you can use bit 0 to encode the colour. Using bit 0 instead of sign bit gives the advantage of branch-free code to retrieve the child index (always shift right by one).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact