
Demystifying Radix Trees: How Radix trees made blocking IPs 5000 times faster - TaikiSan
https://blog.sqreen.io/demystifying-radix-trees/
======
armon
I gave a talk at GoSF about Radix trees, and how they are used heavily in
HashiCorp products (Terraform, Consul, Vault, Nomad, etc). The slides are
available here for those interested: [https://speakerdeck.com/armon/radix-
trees-transactions-and-m...](https://speakerdeck.com/armon/radix-trees-
transactions-and-memdb)

Radix trees are one of my favorite data structures, and widely underused. For
many "dictionary" type lookups, they can be faster and more efficient than
hash tables. While hash tables are commonly described as being O(1) for
lookup, this ignores the need to first hash the input, which is typically an
O(K) operation, where K is the length of the input string. Radix trees do
lookups in O(K) without needing to hash first, and have much better cache
locality. They also preserve ordering, allowing you to do ordered scans, get
min/max values, scan by shared prefix, and more.
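
For illustration, here's a minimal sketch in Python (an uncompressed,
character-wise trie, not HashiCorp's implementation): lookup walks one node
per character, so it's O(K) with no hashing, and an ordered prefix scan falls
out naturally by walking the subtree in sorted order.

```python
class RadixNode:
    __slots__ = ("children", "is_leaf")

    def __init__(self):
        self.children = {}   # next character -> child node
        self.is_leaf = False

class RadixTree:
    """Character-wise trie: O(K) insert/lookup for a key of length K."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, RadixNode())
        node.is_leaf = True

    def contains(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_leaf

    def scan_prefix(self, prefix):
        """Yield all stored keys sharing `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return
        stack = [(prefix, node)]
        while stack:
            key, node = stack.pop()
            if node.is_leaf:
                yield key
            # push children in reverse-sorted order so pops come out sorted
            for ch in sorted(node.children, reverse=True):
                stack.append((key + ch, node.children[ch]))

t = RadixTree()
for k in ["romane", "romanus", "romulus", "rubens"]:
    t.insert(k)
print(t.contains("romane"))        # True
print(list(t.scan_prefix("rom")))  # ['romane', 'romanus', 'romulus']
```

A production radix tree would also path-compress runs of single-child nodes;
this sketch leaves that out for brevity.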

If you combine them with an immutable approach (such as
[https://github.com/hashicorp/go-immutable-
radix](https://github.com/hashicorp/go-immutable-radix)), you can support more
advanced concurrency, such as lock free reads. This is important for highly
concurrent systems to allow scalable reads against shared memory.
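
A hypothetical sketch of the path-copying idea behind that (not
go-immutable-radix's actual API): inserting copies only the nodes along the
insertion path, so readers holding the old root keep seeing a consistent
snapshot with no locks.

```python
# Node: (children_dict, is_leaf). Nodes are never mutated after creation.
EMPTY = ({}, False)

def insert(node, key):
    """Return a new root; the old root (and tree) is left untouched."""
    children, is_leaf = node
    if not key:
        return (children, True)
    child = children.get(key[0], EMPTY)
    new_children = dict(children)                  # copy this level only
    new_children[key[0]] = insert(child, key[1:])  # recurse down the path
    return (new_children, is_leaf)

def contains(node, key):
    children, is_leaf = node
    if not key:
        return is_leaf
    child = children.get(key[0])
    return contains(child, key[1:]) if child else False

v1 = insert(EMPTY, "cat")
v2 = insert(v1, "car")   # v1 is unchanged: readers of v1 never see "car"
print(contains(v2, "car"), contains(v1, "car"))  # True False
```

In a concurrent setting, a writer publishes the new root with a single atomic
pointer swap, which is what makes the lock-free reads work.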

~~~
loeg
I'll plug a shout-out to your C implementation
[https://github.com/armon/libart](https://github.com/armon/libart) as well.
:-)

------
antirez
Radix trees are extensively used inside Redis. The new Stream data type is
completely backed by radix trees, also Redis Cluster tracks keys using a radix
tree and so forth. We have an implementation that is not Redis-specific and
can be used for many other applications here:
[https://github.com/antirez/rax](https://github.com/antirez/rax).

~~~
jackfraser
Many thanks for that and for all your other contributions to the world of open
source!

------
tptacek
Cool radix trie trick:

Set up a fixed-size binary radix trie of 32-bit IP addresses, say 1000 entries.
Track the nodes of the trie in a list, LRU order; insert an IP, its node goes
to the top of the list.

When you exhaust the available nodes, reclaim from the bottom of the LRU list
--- but first, find either a sibling for the node already in the trie, or a
parent, or a sibling of what that parent would be, and "merge" the IP address
you're losing.

(So in reclaiming 10.0.0.1/32, merge with 10.0.0.0/32 to make 10.0.0.0/31,
etc).

Over time, "important" /32s --- really, important prefixes, period, not just
/32s --- will "defend" their position towards the top of the LRU, while the
noise will get aggregated up, into /16s, /15s, /4s, whatever.

What you're doing here is inferring prefix lengths (netmasks), which is kind
of magical.

You can do the same thing with memory addresses in a debugger to infer
(coarse-grained, but without much effort) data structures and allocation
patterns. There are probably other integers you can do this with that nobody's
thought of.

(The data structure is called Aguri).
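
The merge step is just prefix arithmetic; a sketch with Python's stdlib (not
the actual Aguri code):

```python
import ipaddress

def merge_up(prefix):
    """Collapse a prefix into its parent (one bit shorter), as done when
    reclaiming an LRU-evicted node in an Aguri-style trie."""
    net = ipaddress.ip_network(prefix)
    return net.supernet(prefixlen_diff=1)

# Reclaiming 10.0.0.1/32 merges it with 10.0.0.0/32 into their parent:
print(merge_up("10.0.0.1/32"))  # 10.0.0.0/31
print(merge_up("10.0.0.0/31"))  # 10.0.0.0/30
```

The full structure would also carry a hit counter per node so that the merged
parent inherits the counts of the children it absorbed.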

------
summerlight
Reading the OP's comments here, I think it'd be better to update the article
with some description of the requirements and why they chose a linear scan
initially. Without this understanding, it's really easy to ask "why not a hash
table?".

~~~
erentz
Not certain, but there are two advantages to using a trie in this case: 1) you
don’t have to hash the IP address (that takes time too, and happens for every
packet, remember), and 2) you can easily represent a whole IP subnet in a trie
as one node (though the article doesn’t say if they do this, it’s common to
block a subnet in a blacklist).

~~~
viraptor
> you don’t have to hash the IP address

That's not necessarily slow. Or at least not as slow as multiple potential
cache misses while traversing a tree. Once you know how many addresses you're
expecting to have, you can make a hash indexing into a preallocated array
which will never get smaller - if it fits into a few cache lines, it's great.
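
A sketch of that idea: a preallocated, power-of-two, open-addressing table
indexed directly by the 32-bit address (no resizing or deletion; 0.0.0.0 is
reserved as the empty marker for simplicity).

```python
import socket
import struct

class IPSet:
    """Fixed-size open-addressing hash set of IPv4 addresses. The table is
    preallocated and never shrinks, so lookups scan one contiguous array
    (good cache behavior). Sketch only: no resizing, no deletion, and
    0.0.0.0 cannot be stored because 0 marks an empty slot."""
    EMPTY = 0

    def __init__(self, capacity=8192):   # must be a power of two
        self.mask = capacity - 1
        self.slots = [self.EMPTY] * capacity

    @staticmethod
    def _key(ip):
        # 32-bit integer form of the dotted-quad address
        return struct.unpack("!I", socket.inet_aton(ip))[0]

    def add(self, ip):
        k = self._key(ip)
        i = k & self.mask
        while self.slots[i] not in (self.EMPTY, k):  # linear probing
            i = (i + 1) & self.mask
        self.slots[i] = k

    def __contains__(self, ip):
        k = self._key(ip)
        i = k & self.mask
        while self.slots[i] != self.EMPTY:
            if self.slots[i] == k:
                return True
            i = (i + 1) & self.mask
        return False

s = IPSet()
s.add("192.168.1.1")
print("192.168.1.1" in s)  # True
print("10.0.0.1" in s)     # False
```

Here the address itself serves as the hash (an identity hash), which is the
cheapest possible "hash the IP" step; a real implementation would mix the
bits to avoid clustering on adjacent addresses.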

------
ComputerGuru
I mean, compared to _sequentially comparing thousands of IP addresses_
(really???), _anything_ would be fast.

~~~
groestl
People here dismiss the linear search, but for a small number of items (and
yes, even thousands can be a small number), I wouldn't be so quick to dismiss
it without testing. Cache locality also has a say here, and depending on the
exact circumstances, a linear search might make sense (even more so as a first
version).

~~~
gweinberg
I think the only time it makes sense to use a linear search is if you think
the number of items on your list is low enough that it doesn't even matter
what data structure you are using. But I would probably use a set rather than
a list anyway if only because that more accurately reflects the nature of the
data.

~~~
lloeki
IIRC MySQL just skips using indices and does a linear search when a table size
is small enough.

~~~
machinecoffee
Probably a lot of DB Servers do that - which is why the plan for a query can
change as the amount of data per table increases.

------
lucb1e
> Whenever a threat is detected and blocked, our [software] insert the blocked
> IP address in a list. This list was then sequentially checked every time a
> request was evaluated by an agent, thus increasing the latency of our
> customer’s application. This was a O(n) operation

Right, any database where you just go ALTER TABLE blocklist ADD INDEX ('ip');
is faster than that homebrew O(n) solution. No need for another homebrew
solution.

Ended up reading the whole post anyway because now I'm curious if this radix
tree might be better than a tree a database would use, but they never compared
the performance between any old index and their radix tree.

~~~
jerf
The CPU time and memory bandwidth needed to search a 5000-entry radix tree
does not leave you with a lot of budget to beat it by making any sort of
network query to a database, not even one on the same machine. Even in the
worst case, the radix tree is likely to be done with its lookup before you
even get memory allocated to start building the query. The database might just
barely win if the radix tree has to go all the way down and do all of its
memory accesses, but it will still lose overall. Advantageous early bailouts
also mean that the radix tree is often done after just three or four pointer
hops, with all of those three or four lookups having a decent chance of being
in L1 and almost certain to be in L2, if this is the primary thing the CPU is
doing on that machine.

~~~
trhway
And that is basically what happens in database software when the query is
executed the 2nd and all the following times. The mildly interesting
difference is just the radix tree vs. the various [usually highly optimized]
types of indexes and index/search-optimized storage types available in a DB
engine.

While a correctly implemented embedded radix tree would probably be faster on
smallish/medium datasets, the gain may not be worth the costs of basically
implementing your own DB engine: when the dataset wouldn't normally sit
entirely in memory all the time, and/or when the complexity of the queries to
run against it forces you to implement your own full-blown query language/API,
and/or when updating the dataset by downloading/reloading the whole thing
becomes infeasible and you need to implement your own partial-update
machinery.

~~~
jerf
Yes, even if I assume the happy path here, once the prepared query is looked
up on the client side, the parameters filled out, the packet constructed to
send to the database, the OS kernel switched in to send the packet to the DB
process, the DB switched in to select on it, the DB reads the packet, the DB
parses the packet to determine what prepared query it's running, the DB finds
the correct bytecode (or whatever) to run and passes it to an execution engine
to run, yes, at that point the DB will make a highly optimized radix tree or
similar lookup itself.

Additional non-"happy path" things the DB may encounter include using a
library or DB that doesn't support prepared queries, so you're starting from
scratch to construct a query; the DB being across any sort of actual network;
and the DB library being thread-safe and having to deal with thread-safe
primitives, any one of which is expensive enough to cover a radix tree's
entire lookup operation, honestly, because even 50ns is likely to be longer
than the entire radix operation.

My point is that the in-process radix tree is done before "the packet is
constructed" and long before it can be sent. Depending on the details the
radix tree might very well outrace "the prepared query is looked up on the
client side" if that involves retrieving it from any sort of structure because
you don't just have it in a variable, though under the circumstances assuming
that it's just on hand is reasonable.

This is one of those things where you have to be thinking about those "order
of magnitude" maps that are often posted about memory access operations,
because computers span so many orders of magnitude that our intuition breaks
down. Assuming L1 or L2 cache, the radix tree is made almost entirely of
single-digit-nanosecond instructions, and not even all that many of
them, whereas reaching a database is going to involve microsecond-scale
interactions. (To the extent that the radix tree may have to hit real RAM,
well, so might we have to hit it for the DB interaction, and the DB is
certainly bringing in a _lot_ more code for the code cache, so even then, it's
not a win for the DB-based code.)

~~~
trhway
I'm not arguing about the performance cost of the network. Comparing apples to
apples would be more about embedded DB engines, or a DB engine on the same
machine using a "local mode" driver, not a network-mode one.

------
abc_lisper
I would at least use a hashmap rather than a linear scan for lookup. My guess
is you would be at most 10x faster in that case.

~~~
lsllc
If the list doesn't change that often, then sort the list and do a binary
search for a worst case of O(log n) time.
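
For example, with Python's `bisect` (note: comparing dotted-quad strings
gives lexicographic rather than numeric order, which is fine for exact
membership checks but not for range queries; for those you'd convert to
integers first):

```python
import bisect

def blocked(sorted_ips, ip):
    """Binary search over a sorted list: O(log n) per lookup."""
    i = bisect.bisect_left(sorted_ips, ip)
    return i < len(sorted_ips) and sorted_ips[i] == ip

ips = sorted(["10.0.0.5", "192.168.1.1", "172.16.0.3"])
print(blocked(ips, "192.168.1.1"))  # True
print(blocked(ips, "8.8.8.8"))      # False
```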

~~~
loeg
Binary search is cache and branch-predictor inefficient, although certainly
not as bad as linear scan (over large datasets).

------
faragon
Try adding the following optimizations to your binary radix tree code:

1) Path compression: avoid having intermediate empty nodes when possible.
E.g. replace root-left(void)-left(void)-left(void)-left(item) with
root-[000]left(item). Because of the node reduction, you'll get better data
cache usage.

2) Use a memory pool for the nodes: you'll save by avoiding malloc()
overhead, plus the possibility of using e.g. 32-bit indexes instead of
pointers. Like in (1), this helps you reduce memory usage, letting you fit
more nodes in the data cache.

3) Nth-level lookup table: instead of starting from the root node, you could
jump directly to the Nth level. For a minor cost on insertion, to keep the
LUT updated, you could get an important speed-up (2x-3x, more, or less,
depending on your data and how you tune the LUT -fixed-level LUT,
adaptive-level LUT, etc.-).
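
Optimization (1) might look like this in a Python sketch: each edge stores a
whole run of bits instead of a single bit, and inserts split an edge only
where keys diverge.

```python
class Node:
    """Binary radix-tree node with path compression: the incoming edge
    carries a whole bit-string, not one bit per node."""
    def __init__(self, label="", is_leaf=False):
        self.label = label   # compressed run of bits on the incoming edge
        self.children = {}   # first bit ('0'/'1') -> child
        self.is_leaf = is_leaf

def insert(node, bits):
    while bits:
        child = node.children.get(bits[0])
        if child is None:
            node.children[bits[0]] = Node(bits, is_leaf=True)
            return
        # length of the shared prefix between key remainder and edge label
        n = 0
        while n < min(len(bits), len(child.label)) and bits[n] == child.label[n]:
            n += 1
        if n < len(child.label):
            # split the edge: new intermediate node holds the shared part
            mid = Node(child.label[:n])
            child.label = child.label[n:]
            mid.children[child.label[0]] = child
            node.children[bits[0]] = mid
            child = mid
        node, bits = child, bits[n:]
    node.is_leaf = True

def contains(node, bits):
    while bits:
        child = node.children.get(bits[0])
        if child is None or not bits.startswith(child.label):
            return False
        node, bits = child, bits[len(child.label):]
    return node.is_leaf

root = Node()
for key in ("0001", "0000", "1"):
    insert(root, key)
print(contains(root, "0000"), contains(root, "000"))  # True False
```

Optimizations (2) and (3) are layered on top of a structure like this:
replace the dict and object allocations with indexes into one preallocated
array, and keep a table mapping the first N bits straight to the node where
the walk should start.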

~~~
loeg
Or start here:
[https://github.com/armon/libart](https://github.com/armon/libart)

------
dougb
Radix Trees are used extensively at Akamai. They are very useful in dealing
with blocks of network addresses. When I was there, they had 2 libraries which
shared the on disk format. One was mutable and had all sorts of operations you
could do on them. The other was immutable and optimized for very fast lookups.

------
marknadal
Radix trees are great, they are used extensively in GUN and give us incredible
storage and routing performance even in decentralized networks:
[https://github.com/amark/gun/blob/master/lib/radisk.js#L87](https://github.com/amark/gun/blob/master/lib/radisk.js#L87)

------
loeg
Can someone summarize the tradeoffs between radix trees and (various kinds of)
B-tree? (I.e., LSM-trees, B-epsilon trees, classic B(+)-trees.)

For something like this application, which seems heavily read-biased, it seems
like a classic B-tree would be great. Or even a bloom filter, although the
tradeoff there for false positives is maybe not suitable.

~~~
rrobukef
Most search trees only need a total ordering on the elements. A radix tree has
stricter requirements: the data can be split into parts (e.g. x into
x[0]..x[n] and y into y[0]..y[n]) such that each part has a total order and
the original order is the prefix order of the parts.

Now you can already save a small constant by putting a B-tree at each radix
level. Then you won't compare x[0] to y[0] again at levels (1...n) when you
already know they are equal (for example, a database of all sentences starting
with 'The').

True radix trees go further: when every part is finite with a small contiguous
domain (say 256 elements) you can build a jump table (1 index instead of 8
comparisons and 1 index). Additionally: no balancing needed, and less memory
needed than for a B-tree.

Problems with radix trees: you normally don't store the elements, only rebuild
them, so there's no referencing. The extra constraints on your datatype result
in higher coupling.

------
aluminussoma
I wish everyone knew this. I still work with vendors who do this sequentially.
My company has a consecutive block of 64 IPs, and many vendors insist it is
too many to whitelist.

------
exabrial
Fascinating use of a data structure... but why they are blocking by IP address
is the better question.

------
derpherpsson
I am appalled that the understanding of basic complexity theory is missing in
a company like this, and that people apparently think this is worth reading.

This SHOULD NOT be news.

Maybe you guys should revisit your school books.

No, this comment is not "too harsh" or elitist.

~~~
groestl
The first part of your comment might be true, but regarding the newsworthiness
of the post: there is always someone reading this from today's lucky 10,000
[0]. It's generally great if complexity theory gets more coverage, here and on
companies' blog posts. Even if you know this inside out, your colleague might
just be hearing it for the first time, or they might want to read up on it
again because they didn't pay attention in class. Which is good for you too.
Also, tomorrow you might be one of those lucky 10k.

[0] [https://xkcd.com/1053/](https://xkcd.com/1053/)

~~~
appletrotter
I think it's great to put out an article like this, for the reader. But it's
terribly embarrassing for the company.

------
the_jeremy
My education is equivalent to sophomore in CS with a couple years professional
experience tacked on.

I'm pretty disappointed in your explanation of a radix tree, since I'd only
had a very cursory exposure to one before. I don't think it's a good or even
decent explanation. The point you try to make about how the tree only stores
the end node is pretty incomprehensible when you are coming from a
single-digit BST to this. You should have sorted something that at least made
sense to sort with a radix tree in your example.

------
AnaniasAnanas
Please do not engage in IP blocking, this is one of the most evil things that
you can do on a site, along with displaying a white page when the user has JS
disabled and using Google reCAPTCHA.

~~~
GordonS
As long as the blocks are _temporary_, and of course there is a good reason
for the block, I actually think it makes a lot of sense.

~~~
AnaniasAnanas
I agree with you; sadly, it seems that most who do ban IPs tend to do it
permanently and without a good reason. (I do not think that banning all the
IPs of all dedicated servers and Tor nodes in the world has a good reason.)

