
Hash indexes are faster than btree indexes? - dhd415
http://amitkapila16.blogspot.com/2017/03/hash-indexes-are-faster-than-btree.html
======
craigkerstiens
This is buried in a few other comments, but it bears repeating as it's perhaps
far more important than any performance gains you'd receive from hash indexes
over btree... Hash indexes are not crash safe and not WAL-logged. This means
that if you're running with high availability, or even using something for
disaster recovery like WAL-E[1], and need to recover your database, your index
would not be in a working state after recovery. Instead you'd need to reindex
that data.

For any real production database hash indexes should be used with extreme
caution if used at all.
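
A minimal sketch of what that recovery step implies (table and index names here are hypothetical): after restoring from a base backup plus WAL, any hash index has to be rebuilt before it can be trusted.

```sql
-- Hypothetical table with a hash index (9.6 and earlier: not WAL-logged).
CREATE TABLE accounts (id bigint, email text);
CREATE INDEX accounts_email_hash ON accounts USING hash (email);

-- After crash recovery or failover, the hash index may be stale or
-- corrupt; rebuild it before relying on it.
REINDEX INDEX accounts_email_hash;
```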

[1]. [https://github.com/wal-e/wal-e](https://github.com/wal-e/wal-e)

~~~
Veratyr
This seems to be mentioned in the very first sentence of the article:

> PostgreSQL supports Hash Index from a long time, but they are not much used
> in production mainly because they are not durable.

And there's a good reason for bringing them up anyway in the second sentence:

> Now, with the next version of PostgreSQL, they will be durable.

~~~
rosser
Saying "they are not durable" downplays the extent of how non-durable they
are. That kind of "non-durability" needs flashing neon signs prominently
featuring very large arrows, pointing at the specifics, not _four vague
words_.

~~~
arez
Why are they not durable? And why shouldn't you use them for big tables? For
example, for PRIMARY KEY lookups I never ask for a range of PRIMARY KEYs, so
why not use a hash?

I think I'm missing the important point.

~~~
ksherlock
Durability (the D in ACID) means that if the power goes out or somebody kill
-9s it, the database will be in a consistent state and anything that was
committed was in fact committed. In current and previous versions of Postgres,
pulling the plug at the wrong time could leave hash indexes in an inconsistent
state. That's fixable by deleting and recreating the index, but the time
required to do so is proportional to the table size. In the next version of
Postgres, it won't be a problem and hash indexes can be used.

In the case of a primary key, say a row is inserted or deleted but the power
goes out before the hash index is updated. When the power comes back, the hash
index still points to rows that were deleted and is missing rows that were
inserted.

------
ioltas
Something that the article does not mention: equality operations are cheaper
with hash indexes than btree because fewer data pages need to be fetched. This
is particularly good on columns with high cardinality.
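
A hedged sketch of what that looks like in practice (table and column names are made up). Equality is the only operator a hash index supports, and on a high-cardinality column an equality probe touches fewer pages than descending a btree:

```sql
-- Column with high cardinality: nearly every value is distinct.
CREATE TABLE events (session_id text, payload text);
CREATE INDEX events_session_hash ON events USING hash (session_id);

-- The planner can use the hash index for equality probes like this:
EXPLAIN SELECT * FROM events WHERE session_id = 'abc123';
```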

------
makmanalp
Hash indexes have benefits beyond just algorithmic complexity: traversing and
updating tree structures requires a lot of pointer chasing, so in cases where
your btree index doesn't fit well in a reasonable level of cache (L3, etc.),
this means paging stuff in and out of memory a lot, which can get very
expensive very quickly.

Updates to hash tables (e.g. in transactional workloads) are also very quick,
whereas btree updates require traversing the tree structure.

~~~
martincmartin
Hash table updates are quick until you have to rehash.

Rehashing can be done in the background, but then you've added code
complexity, you'll be doing significantly more I/O during the rehash (did
you think about rehashing when sizing your machines?), and in pathological
cases you can run out of hash table space before the resize is complete. Now
what?

~~~
neilc
Postgres hash indexes use linear hashing, which effectively means that
rehashing/expansion is done incrementally:
[https://en.wikipedia.org/wiki/Linear_hashing](https://en.wikipedia.org/wiki/Linear_hashing)

------
koolba
Question for the Postgres developers who I know are lurking around here ...

What's the disk usage of hash indexes vs. regular btrees?

Does it depend on the underlying type or is it a fixed size regardless of the
key fields?

How's this compare to the poor man's hash index, i.e. BTREE(HASH(keys...))?
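
For reference, the "poor man's hash index" pattern could be sketched like this (assuming Postgres's built-in `hashtext()` function; the recheck on the raw key is needed because a 32-bit hash can collide):

```sql
-- Btree over a 32-bit hash of the key: fixed-width index entries
-- regardless of how long the key values are.
CREATE TABLE docs (k text, v text);
CREATE INDEX docs_k_hash_btree ON docs (hashtext(k));

-- Probe via the hashed expression, then recheck the real key,
-- because hashtext() can collide.
SELECT * FROM docs
WHERE hashtext(k) = hashtext('some-key') AND k = 'some-key';
```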

 _EDIT: I got curious so I ran a test of building btrees vs hash indexes. This
is done on 9.6 (not HEAD with the new WAL-logged hash indexes). Looks like the
sweet spot is when the key size is around 20 bytes. After that, the fixed size
of the hash index gives it a win (on space usage) vs. the btree, which has to
repeatedly store the key values._

    
    
        CREATE TABLE IF NOT EXISTS hash_test_results (
          key_size    int,
          btree_size  bigint,
          hash_size   bigint
        );
        
        CREATE OR REPLACE FUNCTION pg_temp.test_hash_index(p_key_size int)
        RETURNS void
        AS
        $BODY$
          DROP TABLE IF EXISTS index_test;
          CREATE TABLE index_test as select lpad('' || x, p_key_size, '0') AS a FROM generate_series(1,100000) x;
          CREATE INDEX index_test_ix_btree ON index_test USING btree (a);
          CREATE INDEX index_test_ix_hash ON index_test USING hash (a);
        
          INSERT INTO hash_test_results
            (key_size, btree_size, hash_size)
          SELECT
            p_key_size AS key_size,
            (SELECT pg_relation_size(c.oid) FROM pg_class c WHERE c.relname = 'index_test_ix_btree') AS btree_size,
            (SELECT pg_relation_size(c.oid) FROM pg_class c WHERE c.relname = 'index_test_ix_hash') AS hash_size;
        $BODY$
        LANGUAGE SQL;
        
        SELECT pg_temp.test_hash_index(x * 8) FROM generate_series(1, 16) x;
        
        => SELECT
          key_size,
          pg_size_pretty(btree_size) AS btree_size,
          pg_size_pretty(hash_size) AS hash_size
        FROM hash_test_results;
         key_size | btree_size | hash_size 
        ----------+------------+-----------
                8 | 3104 kB    | 4112 kB
               16 | 3992 kB    | 4112 kB
               24 | 4880 kB    | 4112 kB
               32 | 5792 kB    | 4112 kB
               40 | 6648 kB    | 4112 kB
               48 | 7592 kB    | 4112 kB
               56 | 8464 kB    | 4112 kB
               64 | 9352 kB    | 4112 kB
               72 | 10 MB      | 4112 kB
               80 | 11 MB      | 4112 kB
               88 | 12 MB      | 4112 kB
               96 | 13 MB      | 4112 kB
              104 | 14 MB      | 4112 kB
              112 | 15 MB      | 4112 kB
              120 | 15 MB      | 4112 kB
              128 | 16 MB      | 4112 kB

~~~
threepipeproblm
I'm not a Postgres specialist but, if memory serves, when joining two tables
the entire hash must be loaded on one side (typically the smaller side, of
course).

~~~
koolba
You're mixing up hash joins with hash indexes.

~~~
threepipeproblm
I know they're different things, but isn't the main reason to use hash indexes
in a relational DB to then do a hash join?

More to the point: even if you're just talking indexes, you can't load/use half
a hash table, so wouldn't the point also apply to isolated use of a hash
table?

I know it's not what was asked for, just a related point.

~~~
koolba
> I know they're different things, but isn't the main reason to do hash
> indexes in a relational db to then to a hash join?

I think the primary use case is O(1) lookup by key, i.e. similar to how
key/value stores work.

I'm not aware of the planner being able to reuse an existing hash index to
perform a hash join. AFAIK the hash buckets are created on the fly. That does
sound like a cool idea though.

> More to the point: even if you're just talking indexes, you can't load/use
> half a hash table, so wouldn't the point also apply to isolated use of a
> hash table?

What do you mean by half a hash table?

~~~
threepipeproblm
Thanks for this clarification

>> What do you mean by half a hash table?

What I'm trying to get across is my impression that you don't have to load an
entire B-tree index into memory to use it... you can _traverse_ it. Whereas a
hash index can't be "streamed" or traversed like that, you have to use the
whole thing.

I read some papers on hash indexes and hash joins several months ago, and know
little about the Postgres specifics... just wanting to wrap my head around it,
so please help me understand if I'm off base with this.

~~~
threepipeproblm
>> That means you don't load it into memory at all, you just jump to
Bucket[Hash(KEY)] and then check there for KEY (it may or may not be there due
to false positives and the bucket being full).

Right, of course. You're making me realize this really _is_ my confusion
between hash indexes and hash joins. The issue I'm thinking of is all about
dynamically constructed hash joins. Sorry, you called it right from the start
but it took me a minute to see... thanks for helping talk through it.

------
almost
What does durable mean in the context of a Postgres index? I can sort of guess,
but from the way it's used here it seems like it has a well-known definition. A
quick Google search (on my phone) returned this article a few times but not
much else that explains it.

~~~
ivoras
Durable in the ACID sense, or specifically that write operations go through
the WAL (write-ahead log, i.e. a journal), with all the consequences this
brings (both durability and replication).

Before this work was done, hash indexes could literally be corrupted by power
outages, etc.

~~~
almost
Oh right, that's scary! In the case of corruption would the index just be
rebuilt (I realise that's a big problem with big indexes) or would it just
silently corrupt query output?

~~~
fabian2k
> Caution
>
> Hash index operations are not presently WAL-logged, so hash indexes might
> need to be rebuilt with REINDEX after a database crash if there were
> unwritten changes. Also, changes to hash indexes are not replicated over
> streaming or file-based replication after the initial base backup, so they
> give wrong answers to queries that subsequently use them. For these reasons,
> hash index use is presently discouraged.

That is the warning in the current version of Postgres about hash indexes.

------
aikorevs
How do they work for range retrievals ?

EDIT: Isn't that the point of btree ?

~~~
Tostino
No range lookups. No ORDER BY speedups. One thing that I can see hash indexes
being really useful for is people who use UUID primary keys. Pretty much the
only thing you do with a UUID is equality anyway, so if a hash index is faster
than btree, it makes sense to use that instead.
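
A sketch of that setup (hypothetical names). One caveat: hash indexes don't enforce uniqueness, so this is a lookup index on a uuid column rather than the index backing a PRIMARY KEY constraint:

```sql
-- A uuid column whose only access pattern is equality lookup.
CREATE TABLE items (id uuid, body text);
CREATE INDEX items_id_hash ON items USING hash (id);

-- Equality is pretty much the only thing you do with a UUID:
SELECT * FROM items
WHERE id = '6f1f6eb2-7a2a-4e1e-9a2a-0a4d3c2b1e00';
```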

~~~
bpicolo
Until you hit the problem of needing to iterate over your table (batches).

As far as I can tell, the only real way to do efficient iteration over
Postgres tables is to continually bound the ID at one end and take a limit
(SELECT * FROM <foo table> WHERE id > <largest id from last batch> ORDER BY id
LIMIT 10). With btrees on UUID you get lexicographic ordering here, I imagine,
so it'll work. With hash indexes, no dice.
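
The iteration pattern described above, written out as a sketch (hypothetical table name; the placeholder value stays a placeholder). The ORDER BY is what lets a btree on `id` drive the scan, and it's exactly what a hash index can't provide:

```sql
-- Keyset pagination: each batch resumes after the largest id
-- seen in the previous batch.
SELECT *
FROM foo_table
WHERE id > '<largest id from last batch>'
ORDER BY id
LIMIT 10;
```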

Cursors don't end up working out because if you kill long-running queries or
idle connections (which most websites should, in production) you'll kill
applications using cursors. With pgbouncer, at least, reading from cursors
doesn't seem to bump idle timeouts.

~~~
throwaway91111
Surely hash indexes have some stable ordering you can order by, even if its
meaning is obtuse.

~~~
deathanatos
Presumably, hash indexes are based on hash tables underneath, which don't
generally have a stable ordering, even an obtuse one. (Any write could cause
the underlying table to get resized, which will cause the contents to get
reordered.) _Perhaps_ if you can guarantee that writes aren't happening, they
might, but I don't think this is generally applicable to most people's use
case.

(But range queries / being able to iterate are generally why I prefer btrees,
unless I have a good reason otherwise. Keep in mind that the O(log n) on
B-trees is a log with a _huge_ base (the branching factor), not base two. It
takes very few disk reads for a b-tree to answer its query; it's just not the
O(1) of a hash table.)

~~~
throwaway91111
Ah, yes I see now.

------
qjighap
These articles drive me crazy when they don't mention the testing process.
Yes, he is using a "server class" machine, but did he disable the
hardware-level features that would skew the test? Answer: probably. I'll
assume he did some sort of clearing to make sure that the data wasn't cached
on the hard drive, or assume that all the reads were in memory. I can't even
tell if he ran the test over several iterations.

It isn't that I am doubting the results, but without some trail of how he got
the results I can't make this useful information. For example, if I were
architecting a system and using this as input, I'd have no idea whether,
running this in a VM, I'd need this much memory to get these results.

~~~
mohaine
Did you read the article?

Quote:

To start with let us see the impact of work done to improve the performance of
hash indexes. Below is the performance data of the pgbench read-only workload
to compare the performance difference of Hash indexes between 9.6 and HEAD on
IBM POWER-8 having 24 cores, 192 hardware threads, 492GB RAM.

The workload is such that all the data fits in shared buffers (scale factor is
300 (~4.5GB) and shared_buffers is 8GB).

And the chart itself says it is the median of three 5-minute runs.

~~~
qjighap
I believe I did.

Why would I assume that consecutive runs == separate runs, or that other
caching mechanisms aren't in place, or really anything? Computers do really
odd things trying to optimize, and assuming your system behaves the same over
a period of 30 minutes, when you don't even know whether those 30 minutes were
consecutive, is risky. There is all sorts of stuff that gets in the way of
performance tests, and I would like to know how it was mitigated. Again, I am
sure there is a proper process, but why wouldn't you want to know what it is?

------
ThePhysicist
Great news if they really make hash indexes durable! That will allow some
interesting new developments, such as efficient O(1) lookups regardless of
table size. This in turn will make it easier and more appealing to build graph
databases on top of Postgres.

~~~
jeltz
Yes, they are durable now in the master branch.

------
btreecat
I'm still dubious of any micro benchmark that says "x is better than y", since
these are often not real-world workloads.

Tuning/optimization starts with using the right schema.

~~~
marcosdumay
Hash indexes are "the right way" to speed up index lookups.

They aren't used in Postgres because their implementation there has many
problems. One of them is that they are slower than the theoretically
suboptimal B-trees. In the process of fixing those problems, people will
compare both indexes many times.

------
Twirrim
Okay, maybe so, but this benchmarking was done against POWER8 architecture,
and not many people are running that. How does it stack up on x86_64?

------
godmodus
The problem with hashes is that they have a likelihood of collision, which, in
practice and for very large sets, does happen.

Trees don't, and are easier to reason about and prove. That's especially
important for critical systems or projects at massive scale; you'd want a tree
ADT on that Mars rover.

That's the gist of my conversation with my algorithms tutor about why my use
of hashing to solve problems, while correct, was flawed.

I did get the points for those exercises and my lower bounds were a lot better
than what was demanded, but the problem lies in the fact that it's not
guaranteed.

------
neeleshs
Would be interesting if postgres started supporting sorted keys, like HBase.

------
faragon
How do they show sorted data when they have hashes, not a tree?

------
threepipeproblm
Unless you need to fetch items in order, for some reason.

~~~
Tostino
Which is a nice side effect if you can trust the order of your primary key.

