This feels written by ChatGPT, certainly the first and last paragraphs. And it omits a few important characteristics: balanced trees were already known, but balancing was expensive. AVL trees achieve O(log n) cost for search, insertion and deletion, and they do this not by keeping the tree perfectly balanced but by keeping it almost balanced.
It was mostly written by me, although I do admit to using ChatGPT to help generate some of my material. Most of the recent blog posts are Feynman-style lecture notes I wrote for myself to learn new concepts; this piece I actually wrote well over 10 years ago, and I figured I'd share it in case anyone else wanted a quick and visual overview. Apologies for leaving out any important concepts. At the time I wrote down the material I found important, so there are definitely things I may not have explained correctly.
You would only rotate a node when it becomes unbalanced (i.e. the only case you deal with is when a new node is added beneath an existing child within the tree). There are no cases where you would need to re-balance when you add a 2nd direct child to a node: the balance factor can only exceed 1 once a child's own subtree grows a level deeper than its sibling, so you could never find cases like this (unless I'm misunderstanding what you're asking).
In the case of 3 with children 1 and 5, and 5 with children 4 and 6: 3 and 5 each have two children. If you add 7 as a child of 6, the tree is unbalanced. This is not one of the cases shown.
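To make that scenario concrete, here's a small Python sketch (the `Node` class and helper names are mine, not from any library) that builds the tree described above, adds 7 under 6, and shows the root's balance factor reaching 2, the right-right case fixed by a single left rotation:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    # Height of an empty subtree is 0, of a leaf is 1.
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

def balance(n):
    # AVL balance factor: right-subtree height minus left-subtree height.
    return height(n.right) - height(n.left)

def rotate_left(x):
    # Right-right case: lift the right child above x.
    y = x.right
    x.right, y.left = y.left, x
    return y

# 3 has children 1 and 5; 5 has children 4 and 6.
root = Node(3, Node(1), Node(5, Node(4), Node(6)))
assert balance(root) == 1          # still within AVL bounds

root.right.right.right = Node(7)   # add 7 as a child of 6
assert balance(root) == 2          # the root is now unbalanced

root = rotate_left(root)           # single left rotation at the root
assert root.key == 5               # 5 is the new root: 3(1,4) | 6(7)
assert all(abs(balance(n)) <= 1 for n in (root, root.left, root.right))
```

Note that the imbalance surfaces at the root (balance factor 2) even though the node that gained a child is two levels down; in a real AVL insert you'd walk back up from the insertion point to find the lowest unbalanced ancestor.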
There can be nodes with a single child. Otherwise an AVL tree would always have 2^n - 1 or 2^n - 2^(n-2) - 1 nodes. Imagine adding a node to a balanced tree with 3 nodes.
They won't outperform b-trees in memory either. The cache lines and cache hierarchies of modern computers (as in, built after the 90s) mean that neither AVL trees nor red-black trees have any place anymore.
Yup, b-trees should be the default these days. Even in memory, latencies are high enough that the high fanout and shallow depth dominate other concerns. Red-black trees can be straight-up terrible on modern hardware.
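A quick back-of-envelope sketch of why fanout dominates: each level of a tree is roughly one dependent cache miss, and a balanced binary tree over a million keys is about 20 levels deep, while a B-tree with fanout 100 is 3. The fanout values here are illustrative, not from any particular implementation:

```python
def levels(n, fanout):
    """Smallest number of levels a tree with the given fanout
    needs to index n keys (ignoring node-occupancy details)."""
    depth, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        depth += 1
    return depth

n = 1_000_000
assert levels(n, 2) == 20     # binary tree: ~20 pointer chases per lookup
assert levels(n, 100) == 3    # B-tree with fanout 100: 3 node visits
```

The intra-node search a B-tree does at each of those 3 nodes is over contiguous memory, which the prefetcher handles far better than 20 scattered pointer dereferences.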
There are radix tree variants that still make sense vs. b-trees, though. They typically take up more space than b-trees, but their operations are bounded by the key length rather than the dataset size. They also form the same structure regardless of insertion order, which makes them useful in cryptographic contexts.
As with linked lists, pretty much the only reasons for which you might want them are:
a) If your keys are really big, copying them around may be slower than using pointer manipulations; you'd have to benchmark this to be sure for particular cases.
b) You can use them as intrusive data structures, with one structure in multiple trees or lists at a time, and you can insert one without dynamic allocation.
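The intrusive idea in (b) can be sketched in Python, with the caveat that the no-allocation benefit is really a C/C++ property (there the link fields are embedded in the object's own memory); this sketch only shows the one-object-in-several-structures part, and all the names are made up for illustration:

```python
class Order:
    """The link fields live inside the object itself, so the same
    Order can sit in two lists at once with no per-list container
    nodes; in C/C++ this is also where the no-malloc benefit comes
    from, since inserting never allocates."""
    def __init__(self, order_id):
        self.order_id = order_id
        self.next_by_time = None   # link for the arrival-order list
        self.next_by_price = None  # link for the price-sorted list

def push(head, node, link):
    # Prepend node to the list threaded through the given link field.
    setattr(node, link, head)
    return node

a, b = Order(1), Order(2)

# One object, two lists, zero container nodes allocated:
by_time = push(push(None, a, "next_by_time"), b, "next_by_time")
by_price = push(push(None, b, "next_by_price"), a, "next_by_price")

assert by_time is b and by_time.next_by_time is a
assert by_price is a and by_price.next_by_price is b
```

Removing an object from one list leaves it in the other, which is exactly what makes intrusive trees/lists handy for things like an object that must be on both an LRU list and a hash chain.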
C++ (mentioned elsewhere in the thread). It's the main language that people bother to implement the full gamut of containers and their optimizations in; Rust is only barely starting to intrude.
Most other languages don't support things like value types, stuffing bits inside aligned pointers, etc. So even if someone implements a particular container it's sure to be quite suboptimal.
Ah. C++ was mentioned in a different sub-thread. I may be wrong but I don't think the C++ standard says anything about red-black trees. It just provides big-O bounds on operations that mean that you need to pick some sort of balanced binary tree to implement std::set and std::map.
C++ containers are usually suboptimal compared to writing the code you actually need for the situation you're really in. In most programs you have a lot of situations where you can just use a pretty basic data structure like a linked list or a dynamic array and the performance characteristics don't matter much. Then you have one big data structure or a couple of big data structures that need to use something a bit esoteric. Those are where you should probably just write your own.
For example, a text editor is going to have a lot of collections of things. But most of them are very small, and AVL tree vs RB tree vs B-tree simply does not matter. But the buffer representation itself will be some sort of custom data structure: maybe a piece table, a piece tree, a gap buffer, or a rope, or maybe something new. Either way, it's not going to be a reusable data structure.
What I'm saying is that in practice, there's very little actual value in having generic data structures beyond the very most basics. See Go, see C. So I don't think the facility for implementing 'optimal' generic data structures as libraries is particularly useful, and in practice they're suboptimal anyway.
AVL vs RB (and a few others) remains an implementation choice, but B-trees (and a few others, e.g. splay trees) are forbidden unless you add gratuitous overhead: the standard requires that insert/erase on std::map and std::set not invalidate iterators or references to other elements, which node-per-element trees satisfy for free and B-trees (which shift elements around inside nodes) do not.
> C++ standard libraries commonly implement the map and set data structures using Red-Black trees, which store one element per node, requiring three pointers plus one bit of information per element to keep the tree balanced. By comparison, B-tree containers store a number of elements per node, thereby reducing pointer overhead and saving a significant amount of memory. For small data types, B-tree containers typically reduce memory use by 50 to 80% compared with Red-Black tree containers.
> However, there are many reasons why B-trees are often a good choice for in-memory collections. The first reason is cache locality. When searching for a key in a binary tree, the algorithm would visit up to logN elements that are very likely dispersed in memory. On a B-tree, this search will consist of two phases — intra-node search and descending the tree — executed one after another. And while descending the tree doesn’t differ much from the binary tree in the aforementioned sense, intra-node search will access keys that are located next to each other, thus making much better use of CPU caches.
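The 50-80% figure quoted above is easy to sanity-check with rough arithmetic (assuming 64-bit pointers and 8-byte keys; real numbers vary with allocator overhead and padding):

```python
PTR = 8                      # 64-bit pointer
KEY = 8                      # e.g. an int64 key

# Red-black tree: every element gets its own node with
# left/right/parent pointers (the color bit hides in padding).
rb_per_element = KEY + 3 * PTR                 # 32 bytes

# A B-tree node holding B keys shares its B+1 child pointers
# among all B of them.
B = 16
btree_per_element = KEY + (B + 1) * PTR / B    # ~16.5 bytes

saving = 1 - btree_per_element / rb_per_element
assert rb_per_element == 32
assert 0.4 < saving < 0.6    # roughly a 50% saving for 8-byte keys
```

With smaller keys (or once per-node malloc headers are counted against the red-black tree), the saving climbs toward the 80% end of the quoted range.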
> In this paper, we conduct a comprehensive performance study on five state-of-the-art indices, Masstree, BwTree, FAST, ART and PSL, with respect to throughput, scalability, latency, memory consumption and cache miss rate. According to our results, PSL achieves a similar query performance in terms of get throughput to a read-only index, FAST, and meanwhile performs up to 5x, 0.3x and 2.5x faster than Masstree, ART and BwTree respectively. Another interesting point observed in the results is that the performance of BwTree drops significantly when the workload is write-only, probably due to the contention caused by the frequent fails on the hot blocks. FAST and ART win the competition on the cost of memory space, leading other indices with a saving factor between 20%-100%. Our experiments also reveal that for the dense dataset, ART and BwTree, which belong to indices built based on tries, may be the suitable candidates for a better overall performance.
It could be good for in-memory stores / log-structured merge trees / other data store applications, although it isn't used much nowadays. I find them simpler to implement and understand than red-black trees, although that's a matter of taste I suppose. They beat red-black trees in read-heavy loads: writes/updates are more costly for AVL trees than for red-black trees, but the stricter balancing makes lookups cheaper. You can find another implementation in Illumos (an open source Unix operating system) available here: https://github.com/illumos/illumos-gate/blob/master/usr/src/...