
Dynamic Hash Tables (1988) [pdf] - tosh
http://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
======
antirez
Hash tables for me were never just a data structure; I had some form of love
for this way of organizing data. I can't forget the Dr. Dobb's articles about
the latest hashing schemes, or how new processor caches and access locality
could change the game, and so forth. But at this point, in 2018, I believe
that hash tables should be _rarely used_.

Take Redis for instance. I used a hash table to start. Very early in the
project I discovered that people were going to use it to store large amounts
of data, and rehashing pauses were too big a price to pay, so I switched to an
incremental rehashing scheme, with all its complexity. Then I had to implement
some form of iteration (see the Redis *SCAN family of commands), which meant
fighting the complexity of an incrementally rehashing hash table in the
context of a stateless cursor iterator. And so forth.

Basically a decent hash table implementation ends up being slower and more
complex to handle than a good radix tree implementation (not to implement:
full-featured radix trees, especially with a stateful iterator, are a
nightmare to get right). So once you have a good radix tree (check for
instance the "rax.c" file inside the Redis source code, or simply
[https://github.com/antirez/rax](https://github.com/antirez/rax)), you start
to see that, hey! you have all the advantages of a hash table, but it's
faster, the key space is sorted, you can iterate it in many ways, and the
memory usage is much smaller.

So at this point, other than as "the simplest dictionary data structure I can
implement just recalling from memory", I'm not sure there are great
applications for in-memory hash tables as data structures. They are perhaps
starting to be more interesting for on-disk applications.

For all my new projects I imagine myself just copying rax.[ch] and using it,
unless there is some very odd reason why a hash table would be preferable.

~~~
Veedrac
The issues you give make sense when your data structure grows very large or
contains hash-unfriendly data, but it seems a plain hash table would still be
appropriate in day-to-day scenarios. A simple int→int mapping is very fast in
a hash table, and even string-keyed hash tables are near unbeatable if you can
reliably cache the hashes. The cases where radix trees win seem fairly small,
as do the times you need incremental resizing.

