
Judy, an efficient sparse dynamic array implementation - tillulen
http://judy.sourceforge.net/
======
tptacek
I've always thought Sean Barrett's take on Judy Arrays was a really good
exemplar of casual computer science writing:

<http://nothings.org/computer/judy/>

 _There are a few potential disadvantages to keep in mind about the Judy
approach. Its author argues that approaching the problem as engineering means
accepting that complexity is how you get good performance; simplicity is for
theoreticians. But this complexity leads to things like a data structure with
20,000 lines of code designed on the assumption that a cache-line is 64 bytes
being used primarily on machines with 32-byte cache lines; a redesign
optimized for 32-byte cache lines would come close to starting from scratch._

...

 _If your data is strictly sequential; you should use a regular array. If your
data is often sequential, or approximately sequential (e.g. an arithmetic
sequence stepping by 64), Judy might be the best data structure to use. If you
need to keep space to a minimum--you have a huge number of associative arrays,
or you're only storing very small values, Judy is probably a good idea. If you
need an sorted iterator, go with Judy. Otherwise, a hash table may be just as
effective, possibly faster, and much simpler._

 _Furthermore, there aren't many ways to improve Judy, but there's a lot of
room to improve hash tables_ ... read the article for the rest of this (very
good) graf.

Oh, and if it helps his credibility any...

<http://nothings.org/computer/knuth.jpg>

:)

~~~
enqk
I ported his test suite to the gnu toolchain at:
<http://www.uucidl.com/git/?p=hashperf.git;a=summary>

------
vicaya
According to Nikolas Askitis' PhD thesis, Judy Array actually underperforms
Burst trie (also a sorted collection) by significant margins in both space and
time. Burst trie is much easier to implement (a few hundred lines of C++ or
Java code) and tune but he has so far refused to release his code for people
to independently verify his claims.

cf. <http://www.naskitis.com/>

------
moonpolysoft
Judy arrays are really great. I have a local cache implementation in erlang
which uses them: <http://github.com/cliffmoon/cherly>

Even with a small number of items they are relatively fast, however they do
not really start outperforming ordinary hash tables until you get above around
100,000 or so items.

