
Minimal Perfect Hash-Tables in Common Lisp - vseloved
http://lisp-univ-etc.blogspot.com/2018/01/minimal-perfect-hash-tables-in-common.html
======
zokier
The algorithm used (EPH) seems a bit curious. The paper says "The EPH algorithm
was implemented in the C language and is available at
[http://cmph.sf.net](http://cmph.sf.net)", but that page has no mention of EPH,
and I even checked archive.org. I wonder why they never actually released a
version of cmph with that algorithm. Two years later they seem to have come up
with another algorithm, CHD, which was actually released in cmph. Interestingly
enough, the CHD paper has no comparisons to EPH either.

~~~
rurban
Yes, interesting. Never heard of EPH before. cmph contains only: BDZ, BDZ_PH,
BMZ, BMZ8, BRZ, CHD, CHD_PH, CHM and FCH.

EPH seems to be better than BDZ, CHM and FCH, but only for really huge tables,
like >100,000 entries. For smaller hash tables a simple perfect hash or even an
optimized memcmp switch table is still faster; see
[https://github.com/rurban/Perfect-Hash#benchmarks](https://github.com/rurban/Perfect-Hash#benchmarks).
cmph is usually 2-3x slower than a trivial PH, and much slower for small
tables.
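
Roughly, a "memcmp switch table" here means a hand-written dispatch like the
sketch below (illustrative keys, not output of cmph or Perfect-Hash): branch on
the key length, then memcmp the few candidates of that length. For a handful of
keys that is one predictable branch plus one comparison, which is hard to beat.

    /* Sketch of a memcmp switch table for a tiny, fixed key set.
       The keys and enum names are made up for illustration. */
    #include <stdio.h>
    #include <string.h>

    enum { KEY_NONE = -1, KEY_FOO, KEY_QUUX, KEY_HELLO };

    static int lookup(const char *s, size_t len)
    {
        switch (len) {
        case 3: return memcmp(s, "foo", 3)   == 0 ? KEY_FOO   : KEY_NONE;
        case 4: return memcmp(s, "quux", 4)  == 0 ? KEY_QUUX  : KEY_NONE;
        case 5: return memcmp(s, "hello", 5) == 0 ? KEY_HELLO : KEY_NONE;
        default: return KEY_NONE;
        }
    }

    int main(void)
    {
        printf("%d\n", lookup("quux", 4)); /* prints 1, i.e. KEY_QUUX */
        return 0;
    }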

------
taeric
I love the reference to static structures. Indeed, I have a pet theory that
most love of immutable lists is actually a love of static ones. Though I
typically expand that to mean structures that are statically visible in the
code.

------
resource0x
Perfect map in Dart: [https://github.com/tatumizer/pigeon_map#how-it-works](https://github.com/tatumizer/pigeon_map#how-it-works)

------
kruhft
Relevant, alternate implementation for C and C++:

[https://www.gnu.org/software/gperf/](https://www.gnu.org/software/gperf/)

~~~
rurban
Nope, not at all. gperf creates totally different and unoptimized perfect
hashes, not minimal at all.
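
To spell out the difference (made-up tables, not actual gperf output): a
perfect hash maps k keys into m >= k slots with no collisions but possibly with
holes, while a minimal perfect hash has m == k, so the lookup table wastes no
slots.

    /* Illustration only: 3 keys. A merely perfect hash might place them
       collision-free in 7 slots, leaving 4 holes; a minimal perfect hash
       uses exactly 3 slots. */
    #include <stdio.h>

    static const char *perfect_table[7] = { "foo", 0, "bar", 0, 0, "baz", 0 };
    static const char *minimal_table[3] = { "bar", "foo", "baz" };

    int main(void)
    {
        printf("perfect: %zu slots, minimal: %zu slots, keys: 3\n",
               sizeof perfect_table / sizeof *perfect_table,
               sizeof minimal_table / sizeof *minimal_table);
        return 0;
    }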

~~~
kruhft
Good to know, thanks.

------
justinhj
“(although, for many methods, it can still be bounded by amortized O(1) time)”

I don’t follow this. Does it mean that you can build the perfect hash table in
constant time? Surely you can’t beat linear.

~~~
aidenn0
Insertion into a non-static hash table is measured per element, so the blog is
also saying O(1) per element, which works out to linear time for n insertions.

~~~
martincmartin
tl;dr: Each insertion is O(n), but n insertions are also O(n). Saying
"amortized time is O(1)" just means the time for n operations is O(n).

To be pedantic: some individual insertions can take more than O(1) time;
namely, when you have to rehash, an insertion can take O(n). So the tight upper
bound on each insertion is O(n), and doing n of them looks like it might take
O(n^2).

Except you can't rehash on every insertion. So even though the time to insert
one is O(n), the time to insert n is also O(n). If you amortize that time over
all n insertions, you get O(n) / n == O(1).
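
Concretely, with the usual doubling strategy a rehash only happens when the
table is full, at sizes 1, 2, 4, ..., n, so the total rehash work is
1 + 2 + 4 + ... + n < 2n; add the n inserts themselves and the total stays
under 3n, i.e. O(1) per insert amortized. A tiny program to check the sum
(a doubling array standing in for the rehashing table, not the blog's code):

    /* Count the total work of n appends into a doubling array, where each
       resize ("rehash") copies every element currently stored. */
    #include <stdio.h>

    int main(void)
    {
        long n = 1000000, cap = 1, size = 0, work = 0;
        for (long i = 0; i < n; i++) {
            if (size == cap) {   /* full: copy everything into a bigger table */
                work += size;
                cap *= 2;
            }
            work += 1;           /* the insert itself */
            size++;
        }
        printf("n = %ld, total work = %ld, per insert = %.2f\n",
               n, work, (double)work / n);   /* about 2.05, well under 3 */
        return 0;
    }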

