
How Judy arrays work and why they are so fast (2002) - dmazin
http://judy.sourceforge.net/doc/10minutes.htm
======
bangonkeyboard
Independent performance comparisons:
[https://nothings.org/computer/judy/](https://nothings.org/computer/judy/)

~~~
nudpiedo
This illustrates the premature optimization paradox perfectly: unless you have
huge datasets and need to scale to very large numbers, it won't be worth much,
and hash maps are still going to outperform Judy in certain use cases. Clarity
of code and being able to patch a critical bug are also a priceless
competitive advantage, and honestly, diving into a 20k-line codebase of
algorithmically intensive code is nothing to underestimate... I wonder if
anyone uses Judy in production, and whether they also made some measurements.

~~~
megous
I used Judy for spkg (spkg.megous.com) to efficiently store a refcounted list
of all files on the system in RAM. It performed excellently for this purpose
even on a 486DX2 CPU with 16 MB of RAM. Any space saving is great in that
situation.

I also use it as part of a custom indexing/inverted index search engine daemon
for PostgreSQL, in one of the bookstore e-shops I made more than a decade ago.

~~~
nudpiedo
Glad to read a real-world experience from someone. Memory saving is a great
point. Did you have the chance to compare it against other structures before
you settled on Judy?

~~~
megous
I compared it with a plain newline-separated list of absolute paths, and the
savings were such that I could not imagine beating it with anything else I
knew at the time that had similar characteristics.

The dataset has a lot of shared prefixes, and Judy excels at storing such
data.

It was 15 years ago so don't ask me for details.

~~~
nudpiedo
There is no point comparing it with something from 15 years ago, but in that
case the natural competing data structure would have been a trie, which would
have made lookups in time proportional to the key length (the path) and saved
as much memory as possible.

------
dang
Some previous threads:

2013
[https://news.ycombinator.com/item?id=5639013](https://news.ycombinator.com/item?id=5639013)

2013
[https://news.ycombinator.com/item?id=5043667](https://news.ycombinator.com/item?id=5043667)

2012
[https://news.ycombinator.com/item?id=3675759](https://news.ycombinator.com/item?id=3675759)

2010
[https://news.ycombinator.com/item?id=1419526](https://news.ycombinator.com/item?id=1419526)

2009
[https://news.ycombinator.com/item?id=859336](https://news.ycombinator.com/item?id=859336)

------
viluon
See also
[https://db.in.tum.de/~leis/papers/ART.pdf](https://db.in.tum.de/~leis/papers/ART.pdf)

------
cmrdporcupine
I read up on these many years ago (around the time of the original essay); it
seemed like impressive work.

But are the assumptions made still applicable to newer hardware?

------
ChrisSD
In a way Judy arrays illustrate the difference between C's memory model (flat
memory) and real hardware (with cache lines).

~~~
kazinator
C doesn't have a flat memory model, other than within objects. ISO C doesn't
even define the behavior of ordering two pointers to distinct objects, as in
p0 < p1; only exact equality (p0 == p1) is defined.

