
Ask HN: Which hash function for join algorithms - ritchie46
I am implementing join algorithms and used FNV hash. Before doing any benchmarks, I wonder: is there a de facto standard hash function to use for join algorithms? Does anybody know what Postgres uses, for instance?
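For context, the 64-bit FNV-1a hash and the kind of in-memory hash join it would plug into can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Postgres's implementation or the poster's. The names `fnv1a_64` and `hash_join`, the `repr`-based key encoding (which assumes join keys on both sides have the same type), and the fixed bucket count are all hypothetical choices for illustration.

```python
# 64-bit FNV-1a constants (offset basis and prime from the FNV spec).
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR in each byte, then multiply by the prime (mod 2^64)."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & MASK64
    return h


def hash_join(build, probe, build_key, probe_key, n_buckets=1024):
    """Hypothetical in-memory hash join: build on one input, probe with the other.

    Keys are encoded with repr() before hashing (an illustrative assumption
    that only works if both sides use the same key type).
    """
    buckets = [[] for _ in range(n_buckets)]
    # Build phase: hash every row of the (ideally smaller) build input.
    for row in build:
        k = build_key(row)
        buckets[fnv1a_64(repr(k).encode()) % n_buckets].append((k, row))
    # Probe phase: look up each probe row in its bucket.
    out = []
    for row in probe:
        k = probe_key(row)
        for bk, brow in buckets[fnv1a_64(repr(k).encode()) % n_buckets]:
            if bk == k:  # compare real keys; different keys can share a bucket
                out.append((brow, row))
    return out


users = [(1, "ann"), (2, "bob")]
orders = [(10, 1), (11, 2), (12, 1), (13, 3)]
print(hash_join(users, orders, lambda u: u[0], lambda o: o[1]))
```

The equality check in the probe loop is what keeps the join correct regardless of hash quality; the hash function only affects how evenly rows spread across buckets, which is exactly what benchmarking different hashes would measure.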
======
brudgers
The fastest algorithm will depend on the nature of the data. So Postgres
maintains statistics, allows indexes, and uses a query optimizer to find an
"unterrible" approach. The measure of terribleness is IO.

Or to put it another way, there isn't a silver-bullet method based on theory.
In practice, the right choice is specific to the job at hand. That's why
database tuning is a thing.

Good luck.

~~~
ritchie46
Right... so testing, testing, testing it is.

~~~
brudgers
It’s not so much testing as reasoning about the data and access patterns. If
you’re joining two tables, putting the smaller in memory and streaming the
larger past it will often result in fewer IOs than vice versa, because the
larger table will need to be read fewer times. How many fewer is a question of
the size of each table and the size of memory. This can be reasoned about with
a pen and paper.
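The pen-and-paper reasoning can be made concrete with the usual textbook estimate for a block nested-loop join: load the in-memory side one memory-sized chunk at a time and scan the streamed side once per chunk. This is a rough sketch that counts only page reads; the function name and the table and memory sizes are made-up assumptions, not measurements from any real system.

```python
import math


def bnl_join_page_reads(mem_side_pages: int, streamed_pages: int, mem_pages: int) -> int:
    """Estimated page reads for a block nested-loop join (hypothetical helper).

    The in-memory side is read once, `mem_pages` at a time; the streamed
    side is scanned once per chunk. Output writes are not counted.
    """
    chunks = math.ceil(mem_side_pages / mem_pages)
    return mem_side_pages + chunks * streamed_pages


# Made-up sizes: 100-page table, 1000-page table, 50 pages of memory.
small, large, mem = 100, 1000, 50
print(bnl_join_page_reads(small, large, mem))  # smaller in memory: 2100
print(bnl_join_page_reads(large, small, mem))  # larger in memory: 3000
```

With the smaller table in memory, the larger one is scanned twice (2 chunks); the other way round, the smaller one is scanned 20 times. If memory holds the whole smaller table, the larger table is read exactly once, which is the best case.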

