Radix trees are one of my favorite data structures, and widely underused. For many "dictionary"-type lookups, they can be faster and more efficient than hash tables. While hash tables are commonly described as O(1) for lookup, this ignores the need to first hash the input, which is typically an O(K) operation, where K is the length of the input string. Radix trees do lookups in O(K) without needing to hash first, and they have much better cache locality. They also preserve ordering, allowing you to do ordered scans, get min/max values, scan by shared prefix, and more.
If you combine them with an immutable approach (such as https://github.com/hashicorp/go-immutable-radix), you can support more advanced concurrency, such as lock-free reads. This is important for highly concurrent systems, since it allows scalable reads against shared memory.
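For a feel of the API, here's a minimal sketch using go-immutable-radix (v1-style API as I remember it; double-check against the library's docs):

    package main

    import (
        "fmt"

        iradix "github.com/hashicorp/go-immutable-radix"
    )

    func main() {
        r := iradix.New()
        // Each Insert returns a new tree; the old root stays valid,
        // which is what makes lock-free concurrent reads possible.
        r, _, _ = r.Insert([]byte("10.0.0.1"), true)
        r, _, _ = r.Insert([]byte("10.0.0.2"), true)
        r, _, _ = r.Insert([]byte("10.1.0.1"), true)

        // O(K) point lookup, no hashing of the key needed.
        if _, ok := r.Get([]byte("10.0.0.1")); ok {
            fmt.Println("10.0.0.1 is present")
        }

        // Ordered scan of everything sharing the "10.0." prefix.
        r.Root().WalkPrefix([]byte("10.0."), func(k []byte, v interface{}) bool {
            fmt.Println(string(k))
            return false // false = keep walking
        })
    }

Since every Insert hands back a new root, a reader holding the old root keeps a consistent snapshot with no locks at all.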
Set up a fixed-size binary radix trie of 32-bit IP addresses, say 1000 entries. Track the nodes of the trie in a list in LRU order; when you insert an IP, its node goes to the top of the list.
When you exhaust the available nodes, reclaim from the bottom of the LRU list, but first find either a sibling of the node already in the trie, or a parent, or a sibling of what that parent would be, and "merge" the IP address you're losing.
(So in reclaiming 10.0.0.1/32, merge it with 10.0.0.0/32 to make 10.0.0.0/31, etc.)
Over time, "important" /32s (really, important prefixes, period, not just /32s) will "defend" their position toward the top of the LRU, while the noise gets aggregated up into /16s, /15s, /4s, whatever.
What you're doing here is inferring prefix lengths (netmasks), which is kind of magical.
You can do the same thing with memory addresses in a debugger to infer (coarse-grained, but without much effort) data structures and allocation patterns. There are probably other integers you could do this with that nobody's thought of.
(The data structure is called Aguri).
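A sketch of just the merge arithmetic, assuming addresses held as uint32 (hypothetical helper names, not Aguri's actual code):

    package main

    import "fmt"

    // mergeUp collapses an evicted prefix into its parent: a /32
    // becomes half of a /31, a /31 half of a /30, and so on. A real
    // Aguri implementation would also fold the evicted node's traffic
    // counters into the parent; this shows only the prefix math.
    func mergeUp(addr uint32, prefixLen int) (uint32, int) {
        if prefixLen == 0 {
            return 0, 0 // already at 0.0.0.0/0, nowhere left to merge
        }
        parentLen := prefixLen - 1
        mask := ^uint32(0) << (32 - parentLen)
        return addr & mask, parentLen
    }

    func cidr(addr uint32, prefixLen int) string {
        return fmt.Sprintf("%d.%d.%d.%d/%d",
            addr>>24, (addr>>16)&0xFF, (addr>>8)&0xFF, addr&0xFF, prefixLen)
    }

    func main() {
        // Evicting 10.0.0.1/32 merges it up into 10.0.0.0/31.
        addr, plen := mergeUp(0x0A000001, 32)
        fmt.Println(cidr(addr, plen)) // 10.0.0.0/31
    }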
That's not necessarily slow, or at least not as slow as the multiple potential cache misses of traversing a tree. Once you know how many addresses you're expecting, you can build a hash index into a preallocated array that never shrinks; if it fits into a few cache lines, it's great.
It still wouldn't be the greatest idea... but don't underestimate a linear scan on a modern CPU.
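Something like this minimal sketch (illustrative size and hash constant; a real table would need a deletion and overflow policy, which I've left out):

    package main

    import "fmt"

    // A fixed-capacity open-addressing set of IPv4 addresses. The
    // backing array is allocated once and never grows; sized a bit
    // above the expected population, a small blocklist stays within
    // a predictable handful of cache lines.
    const tableSize = 1 << 11 // power of two, so we can mask instead of mod

    type ipSet struct {
        slots [tableSize]uint32
        used  [tableSize]bool
    }

    // hash is a cheap multiplicative hash; good enough for a sketch.
    func hash(ip uint32) uint32 { return ip * 2654435761 }

    // insert assumes the table never fills up completely.
    func (s *ipSet) insert(ip uint32) {
        i := hash(ip) & (tableSize - 1)
        for s.used[i] {
            if s.slots[i] == ip {
                return
            }
            i = (i + 1) & (tableSize - 1) // linear probing
        }
        s.slots[i], s.used[i] = ip, true
    }

    func (s *ipSet) contains(ip uint32) bool {
        i := hash(ip) & (tableSize - 1)
        for s.used[i] {
            if s.slots[i] == ip {
                return true
            }
            i = (i + 1) & (tableSize - 1)
        }
        return false
    }

    func main() {
        var s ipSet
        s.insert(0xC0A80001) // 192.168.0.1
        fmt.Println(s.contains(0xC0A80001), s.contains(0xC0A80002)) // true false
    }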
Right, any database where you just go ALTER TABLE blocklist ADD INDEX ('ip'); is faster than that homebrew O(n) solution. No need for another homebrew solution.
I ended up reading the whole post anyway because now I'm curious whether this radix tree might be better than the tree a database would use, but they never compared the performance of any old index against their radix tree.
While a correctly implemented embedded radix tree would probably be faster on small-to-medium datasets, the gain may not be worth the costs of basically implementing your own DB engine: when the dataset wouldn't normally sit in memory in its entirety, when the complexity of the queries forces you to implement your own full-blown query language/API, or when updating the dataset by downloading/reloading the whole thing becomes infeasible and you need your own partial-update machinery.
Additional non-"happy path" things the DB may encounter include: a library or DB that doesn't support prepared queries, so you construct every query from scratch; the DB sitting across any sort of actual network; and the DB library being thread-safe and so having to pay for thread-safe primitives. Honestly, any one of these is expensive enough to cover a radix tree's entire lookup, because even 50ns is likely longer than the whole radix operation.
My point is that the in-process radix tree is done before "the packet is constructed", and long before it can be sent. Depending on the details, the radix tree might very well outrace even "the prepared query is looked up on the client side", if that involves retrieving it from any sort of structure rather than just having it in a variable; though under the circumstances, assuming it's just on hand is reasonable.
This is one of those things where you have to think about those "order of magnitude" maps of memory access costs that are often posted, because computers span so many orders of magnitude that our intuition breaks down. Assuming L1 or L2 cache, the radix tree is made almost entirely of single-digit-nanosecond instructions, and not even all that many of them, whereas reaching a database is going to involve microsecond-scale interactions. (To the extent that the radix tree may have to hit real RAM, well, so might the DB interaction, and the DB is certainly bringing in a lot more code for the code cache, so even then it's not a win for the DB-based code.)
Because of that, we believed using a database (even in-memory) would have resulted in an unacceptable overhead.
For example, the SQLite library counts as a database for most people, and it runs in your process, in your memory. It is essentially a collection of functions for fast data structure manipulation.
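A quick sketch of that in-process use from Go, assuming the mattn/go-sqlite3 driver (any SQLite binding would look similar):

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
    )

    func main() {
        // An in-memory database: no server, no network, just library calls.
        db, err := sql.Open("sqlite3", ":memory:")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // The PRIMARY KEY gives an indexed lookup for free.
        if _, err := db.Exec(`CREATE TABLE blocklist (ip TEXT PRIMARY KEY)`); err != nil {
            panic(err)
        }
        if _, err := db.Exec(`INSERT INTO blocklist VALUES ('192.168.0.1')`); err != nil {
            panic(err)
        }

        var n int
        err = db.QueryRow(`SELECT COUNT(*) FROM blocklist WHERE ip = ?`,
            "192.168.0.1").Scan(&n)
        if err != nil {
            panic(err)
        }
        fmt.Println("blocked:", n > 0)
    }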
There is a data structure called Judy arrays, invented by Doug Baskins. Judy arrays were at one point used in Erlang to back its ets tables.
The paper (2003) comparing various data structures for that task is here
For unsorted tables with populations of 70,000 keys or more, performance improvement by using the judyeh table type instead of set is easily measurable and significant.
This improvement can be seen with keys in sequential order and random order during operations involving table insertion, lookup, and counter updates.
The deletion operation is not as fast as set, but deletion's extra cost is smaller than the benefit of insertion, lookup, and update operations combined.
Furthermore, the additional RAM required by judyeh tables is quite modest.
The judyeh table type requires only about 6% more than set tables, which is smaller than the additional 15% required for the same data in an ordered set table.
The only operation this research examines which is significantly worse for judyeh tables than set tables is table traversal using the ets:next/2 function.
It would be interesting to see how Judy arrays compare to radix trees.
(same as the postgres keyword lookup posted last week to HN using perfect hashing)
I think it would have been a much more interesting and useful read if you had included the reasons a radix tree is a better solution for your use case than hashing, and how you implement range blocking.
I also wasn't able to understand the advantage of path compression or why being fixed-length is important.
Going with a hash map was an option, but we feared a ballooning memory footprint when a large number of IPs is blocked, as well as performance outliers (if you're unlucky enough for a lot of blocked IPs/ranges to end up in the same bucket), and we wanted the solution to work just as well with IPv6 (where checking each bit takes a bit longer).
As for using existing optimal constant-time membership tests, we couldn't go with overly complex algorithms because we would have to reimplement and maintain them in each of our agents' languages. Radix trees were attractive in this regard because they are a widely available data structure, and where they're not, they're fairly easy to implement from scratch.
As for range blocking, it's actually very simple: we block the network IP and tag the node as a "range". This way, when we hit a range node while looking up an IP, we know we don't have to look any further. The downside is that we can't treat specific parts of the range differently, but that's not a problem in our architecture.
Path compression lets you save on nodes, which is good for memory and makes looking up an IP faster (because you go through fewer nodes before hitting a leaf or a divergence). Fixed length makes using bit masks to compare a value against a node simpler and takes care of alignment issues, but we could do without it if we had to.
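If it helps, here's a minimal sketch of the range-node idea on an uncompressed binary trie (so without the path compression discussed above; the names are mine, not the article's):

    package main

    import "fmt"

    // node is one bit-level of a binary trie over IPv4 addresses.
    // isRange marks a node that covers an entire prefix: lookups
    // stop as soon as they hit one.
    type node struct {
        children [2]*node
        isRange  bool
        blocked  bool
    }

    // insert blocks the first prefixLen bits of ip (prefixLen == 32
    // blocks a single address; anything shorter blocks a whole range).
    func (n *node) insert(ip uint32, prefixLen int) {
        cur := n
        for i := 0; i < prefixLen; i++ {
            bit := (ip >> (31 - i)) & 1
            if cur.children[bit] == nil {
                cur.children[bit] = &node{}
            }
            cur = cur.children[bit]
        }
        cur.blocked = true
        cur.isRange = prefixLen < 32
    }

    // lookup reports whether ip is blocked, stopping early at range nodes.
    func (n *node) lookup(ip uint32) bool {
        cur := n
        for i := 0; i < 32; i++ {
            if cur.blocked && cur.isRange {
                return true // whole range blocked; no need to look further
            }
            bit := (ip >> (31 - i)) & 1
            if cur.children[bit] == nil {
                return false
            }
            cur = cur.children[bit]
        }
        return cur.blocked
    }

    func main() {
        root := &node{}
        root.insert(0x0A000000, 8)           // block 10.0.0.0/8 as a range
        fmt.Println(root.lookup(0x0A0A0A0A)) // true: inside 10.0.0.0/8
        fmt.Println(root.lookup(0x08080808)) // false: 8.8.8.8 not blocked
    }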
If query time became a bottleneck, would you consider putting a Bloom filter in front of the radix tree? It would have low memory usage and a low false-positive rate. Any positive hits would then do an exact check against the radix tree. So for, say, 99% of non-blacklisted IPs, you would get constant-time blacklist checks.
And regarding fixed-length: "This lets us optimize the lookup code." More detail here would be cool but it wasn't really the point of the article.
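In case it's useful, a toy version of that front-filter (two FNV hashes and illustrative sizes; a production filter would size its bit array and hash count from the expected population and target false-positive rate):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // A small Bloom filter to sit in front of the radix tree: if the
    // filter says "no", the IP is definitely not blocked and we skip
    // the tree walk entirely; if it says "maybe", we confirm against
    // the tree.
    const filterBits = 1 << 16

    type bloom struct {
        bits [filterBits / 8]byte
    }

    // indexes derives two probe positions from FNV hashes of the key.
    func (b *bloom) indexes(key []byte) [2]uint32 {
        h1 := fnv.New32a()
        h1.Write(key)
        h2 := fnv.New32()
        h2.Write(key)
        return [2]uint32{h1.Sum32() % filterBits, h2.Sum32() % filterBits}
    }

    func (b *bloom) add(key []byte) {
        for _, i := range b.indexes(key) {
            b.bits[i/8] |= 1 << (i % 8)
        }
    }

    // mayContain can return false positives, never false negatives.
    func (b *bloom) mayContain(key []byte) bool {
        for _, i := range b.indexes(key) {
            if b.bits[i/8]&(1<<(i%8)) == 0 {
                return false
            }
        }
        return true
    }

    func main() {
        var f bloom
        f.add([]byte("192.168.0.1"))
        fmt.Println(f.mayContain([]byte("192.168.0.1"))) // true
        fmt.Println(f.mayContain([]byte("10.0.0.1")))    // almost certainly false
    }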
1) Path compression: avoid intermediate empty nodes where possible, e.g. replace root-left(void)-left(void)-left(void)-left(item) with root-left(item). Because of the node reduction, you'll get better data cache usage.
2) Use a memory pool for the nodes: you save by avoiding malloc() overhead, plus you get the possibility of using e.g. 32-bit indexes instead of pointers (see the sketch after this list). Like (1), this reduces memory usage, letting you fit more nodes in the data cache.
3) An Nth-level lookup table, so that instead of starting from the root node you can jump directly to the Nth level. For a minor cost on insertion (updating the LUT), you can get an important speed-up (2x-3x, more or less, depending on your data and how you tune the LUT: fixed-level, adaptive-level, etc.).
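A sketch of (2), with hypothetical names, on a plain binary trie: 32-bit indexes into one preallocated slice replace per-node pointers, so the links are half the size and there's one allocation for the whole pool instead of a malloc per node.

    package main

    import "fmt"

    const nilIdx = ^uint32(0)

    // poolNode keeps child links as 32-bit indexes into a shared slice
    // rather than 64-bit pointers.
    type poolNode struct {
        children [2]uint32
        blocked  bool
    }

    type trie struct{ nodes []poolNode }

    func newTrie(capacity int) *trie {
        t := &trie{nodes: make([]poolNode, 0, capacity)}
        t.alloc() // index 0 is the root
        return t
    }

    func (t *trie) alloc() uint32 {
        t.nodes = append(t.nodes, poolNode{children: [2]uint32{nilIdx, nilIdx}})
        return uint32(len(t.nodes) - 1)
    }

    func (t *trie) insert(ip uint32) {
        cur := uint32(0)
        for i := 0; i < 32; i++ {
            bit := (ip >> (31 - i)) & 1
            if t.nodes[cur].children[bit] == nilIdx {
                n := t.alloc()
                t.nodes[cur].children[bit] = n
            }
            cur = t.nodes[cur].children[bit]
        }
        t.nodes[cur].blocked = true
    }

    func (t *trie) lookup(ip uint32) bool {
        cur := uint32(0)
        for i := 0; i < 32; i++ {
            bit := (ip >> (31 - i)) & 1
            cur = t.nodes[cur].children[bit]
            if cur == nilIdx {
                return false
            }
        }
        return t.nodes[cur].blocked
    }

    func main() {
        t := newTrie(1 << 16)
        t.insert(0xC0A80001) // 192.168.0.1
        fmt.Println(t.lookup(0xC0A80001), t.lookup(0xC0A80002)) // true false
    }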
For something like this application, which seems heavily read-biased, a classic B-tree seems like it would be great. Or even a Bloom filter, although the false-positive tradeoff there is maybe not suitable.
Now you can already save a small constant by putting a B-tree at each radix level. Then you won't compare x to y again at levels 1...n once you know they are equal (for example, in a database of all sentences starting with 'The').
True radix trees go further: when every part is finite with a small contiguous domain (say 256 elements), you can build a jump table (1 index instead of 8 comparisons plus 1 index). Additionally: no balancing needed, and less memory needed than a B-tree.
Problems with radix trees: normally you don't store the elements themselves, you only rebuild them, so there's no referencing. And the extra constraints on your datatype result in higher coupling.
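Here's a toy byte-wise version of that jump-table layout (the names are mine; note the full 256-pointer fan-out trades memory for the single-index step, and real implementations compress sparse nodes):

    package main

    import "fmt"

    // byteNode is one level of a byte-wise radix tree: each level is
    // a 256-entry jump table, so consuming one byte of the key is a
    // single array index rather than a run of comparisons. No
    // rebalancing is ever needed.
    type byteNode struct {
        children [256]*byteNode
        terminal bool // a key ends exactly here
    }

    func insert(root *byteNode, key []byte) {
        cur := root
        for _, b := range key {
            if cur.children[b] == nil {
                cur.children[b] = &byteNode{}
            }
            cur = cur.children[b]
        }
        cur.terminal = true
    }

    func lookup(root *byteNode, key []byte) bool {
        cur := root
        for _, b := range key {
            if cur.children[b] == nil {
                return false
            }
            cur = cur.children[b]
        }
        return cur.terminal
    }

    func main() {
        root := &byteNode{}
        insert(root, []byte("The quick"))
        fmt.Println(lookup(root, []byte("The quick"))) // true
        fmt.Println(lookup(root, []byte("The slow")))  // false
    }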
I'm pretty disappointed in your explanation of a radix tree, since I'd only had a very cursory exposure to one before, and I don't think it's a good or even decent explanation. The point you try to make about how the tree only stores the end node is pretty incomprehensible when you're coming from a single-digit BST. You should have sorted something in your example that at least made sense to sort with a radix tree.
This SHOULD NOT be news.
Maybe you guys should revisit your school books.
No, this comment is not "too harsh" or elitist.
Would you be ok if the people who worked on your house didn't know about construction basics -- and worse, created stuff that's widely known to be ineffective?
> Customers PAY that company for their services.
If the impact of poor programming is material, customers will send their money elsewhere. Customers evaluate products in a marketplace; unless the sale explicitly mentioned the developers' qualifications, it's not like they are defrauding customers.
Once again, it's OK that you (a person, a programmer) don't know that. But as a company, there should be a strong guarantee that you (the company) have the knowledge to build a good product. It's fine for a company to have some people who don't know this stuff, but not to have nobody capable of noticing these kinds of mistakes (and in a position that allows them to).
> Maybe programmers should educate themselves in the basics before working on products people depend on?
That remark clearly puts the onus on individual workers, and there is no qualifying statement that anyone who doesn't know this particular thing should be allowed to work under supervision.
Ever heard of the principle of charity? Programmers should educate themselves, and that includes senior supervision. And they should make sure they use what's best for the job when they ship, while still learning -- and if they're unsure, they could check with someone...
Nope, I didn't write off any of those. In fact I encouraged them, saying programmers should educate themselves (and all of those are ways to do that).
What I did write off was letting ineffective stuff built by interns, apprentices, or on-the-job trainees ship to customers...
Do customers pay for a quality product, or to have whatever was thrown together by cheap or unpaid interns shipped to them?
Any serious hoster filters out DDoS with a router. Routers are basically built to test IPs against subnets, so they can run each IP address against a million hardware subnet testers in parallel and decide what to do depending on the result.
And if you don't have a router, any OS worth its salt already has a radix tree implementation for its routing table. The only thing you need to turn the Linux routing table into a DDoS filter is to enable reverse path filtering and then add blackhole routes with iproute2.
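Something along these lines (203.0.113.0/24 is a placeholder prefix; the exact rp_filter knobs vary per interface and setup):

    # Strict reverse-path filtering: drop packets whose source
    # address doesn't have a valid return route.
    sysctl -w net.ipv4.conf.all.rp_filter=1

    # Route the offending range into a black hole; with rp_filter on,
    # inbound packets *from* that range now fail the reverse-path
    # check and get dropped.
    ip route add blackhole 203.0.113.0/24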
Guess what, sometimes it's needed. Even knowing that a given data structure exists is sometimes enough.
It's 5000 times faster than a sequential list lookup.
It is a good discussion of radix trees, though.
But it's not OK for a company to offer a service built in such a naive way (naive from a comp-sci point of view). To me, this kind of mistake is a symptom of having a poor CTO, poor hiring/dev practices, etc.
However, in the case of mitigation of a Denial of Service attack, using IP blocking is an effective temporary measure.
From my experience dealing with HTTP DDoS-style attacks, there's usually a pretty consistent block of IP addresses the attack originates from, e.g. an AWS or Hetzner IP range. When the choices are to block IPs temporarily or have the entire service go down due to resource exhaustion, the choice is pretty clear cut.
We'd implement the block to keep our service up, then report attacker IPs to AWS or whoever maintained the range in question. As the attack waned, we'd unblock the range.
Of course, IP blocks usually did not work as well against UDP-based reflection-style DDoS, but we'd still see certain IP ranges responsible for a significant portion of the attack.
We ran our own hardware though, so resources like CPU and inbound bandwidth weren't elastic enough to outpace an escalating DDoS past 120-140 Gbps. We also eventually added DDoS mitigation through third parties like CloudFlare.