Hacker News new | comments | show | ask | jobs | submit login
Judy, an efficient sparse dynamic array implementation (sourceforge.net)
25 points by alexkon on Oct 3, 2009 | hide | past | web | favorite | 4 comments



I've always thought Sean Barrett's take on Judy Arrays was a really good exemplar of casual computer science writing:

http://nothings.org/computer/judy/

There are a few potential disadvantages to keep in mind about the Judy approach. Its author argues that approaching the problem as engineering means accepting that complexity is how you get good performance; simplicity is for theoreticians. But this complexity leads to things like a data structure with 20,000 lines of code designed on the assumption that a cache-line is 64 bytes being used primarily on machines with 32-byte cache lines; a redesign optimized for 32-byte cache lines would come close to starting from scratch.

...

If your data is strictly sequential; you should use a regular array. If your data is often sequential, or approximately sequential (e.g. an arithmetic sequence stepping by 64), Judy might be the best data structure to use. If you need to keep space to a minimum--you have a huge number of associative arrays, or you're only storing very small values, Judy is probably a good idea. If you need an sorted iterator, go with Judy. Otherwise, a hash table may be just as effective, possibly faster, and much simpler.

Furthermore, there aren't many ways to improve Judy, but there's a lot of room to improve hash tables ... read the article for the rest of this (very good) graf.

Oh, and if it helps his credibility any...

http://nothings.org/computer/knuth.jpg

:)


I ported his test suite to the gnu toolchain at: http://www.uucidl.com/git/?p=hashperf.git;a=summary


According to Nikolas Askitis' PhD thesis, Judy Array actually underperforms Burst trie (also a sorted collection) by significant margins in both space and time. Burst trie is much easier to implement (a few hundred lines of C++ or Java code) and tune but he has so far refused to release his code for people to independently verify his claims.

cf. http://www.naskitis.com/


Judy arrays are really great. I have a local cache implementation in erlang which uses them: http://github.com/cliffmoon/cherly

Even with a small number of items they are relatively fast, however they do not really start outperforming ordinary hash tables until you get above around 100,000 or so items.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: