
PostgreSQL's Hash Indexes Are Now Cool - rachbelaid
http://rhaas.blogspot.com/2017/09/postgresqls-hash-indexes-are-now-cool.html?m=1
======
tofflos
Are there any plans for allowing hash indexes in uniqueness constraints such
as the ones created for primary keys? It seems like a good fit for an index
that is specialized for equality checks.

~~~
snaky
Maybe it would be easier just to remove the "A foreign key must reference
columns that either are a primary key or form a unique constraint"
restriction, stating that any unique index is enough.

~~~
anarazel
> Maybe it would be easier just to remove the "A foreign key must reference
> columns that either are a primary key or form a unique constraint"
> restriction, stating that any unique index is enough.

The problem is that only btree indexes support uniqueness atm. That's the
relevant unsupported feature, not the ability to have constraints (which
essentially just requires uniqueness support in the underlying index):

    
    
        postgres[27716][1]# SELECT amname, pg_indexam_has_property(oid, 'can_unique') FROM pg_am;
        ┌────────┬─────────────────────────┐
        │ amname │ pg_indexam_has_property │
        ├────────┼─────────────────────────┤
        │ btree  │ t                       │
        │ hash   │ f                       │
        │ gist   │ f                       │
        │ gin    │ f                       │
        │ spgist │ f                       │
        │ brin   │ f                       │
        └────────┴─────────────────────────┘
        (6 rows)
    
    

Edit: the different uses of the word constrain (to constrain, and a
constraint) seemed too confusing.

~~~
jontro
Is the hash guaranteed to be unique though? There is always a possibility of a
hash collision.

~~~
anarazel
> Is the hash guaranteed to be unique though? There is always a possibility of
> a hash collision

I mean, my point is that hash indexes do _not_ support uniqueness right now.
But hash collisions wouldn't be a problem there. It mainly requires some
tricky concurrency-aware code (consider cases like where one transaction just
deleted a conflicting row but is still in progress, and a new value like that
is inserted, etc.).

------
misterbowfinger
Wasn't sure what a hash index was vs. btree.

Short version - hash indexes are faster in PG 10, but they only apply to
"where col = foobar" equality queries, giving O(1) time. Btree indexes are
O(log n).

But hash indexes can't be applied to range clauses, like "where col < 50". You
can still use a btree index for those, however.

SO post:

[https://stackoverflow.com/a/398921](https://stackoverflow.com/a/398921)
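
For illustration, a minimal sketch (table and column names are made up):

    -- hypothetical table, purely for illustration
    CREATE TABLE users (id bigint, email text);

    CREATE INDEX users_email_hash ON users USING hash (email);
    CREATE INDEX users_email_btree ON users USING btree (email);

    -- a hash index can serve this equality lookup:
    SELECT * FROM users WHERE email = 'foo@example.com';

    -- but not this range clause; only the btree qualifies:
    SELECT * FROM users WHERE email < 'f';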

~~~
colanderman
Keep in mind that the log n factor of a B+tree is generally very low; B+tree
branching factors are typically in the 100s. Also, the first few levels are
generally kept in cache, so you'll only have to hit disk for inner nodes past
100 million entries or so.

Finally, hash indexes always require that the found row be confirmed in the
data table, even for simple existence queries, since the keys themselves
aren't stored in the hash table. (This is why hash indexes can't be UNIQUE.)
B+trees can often answer such queries without the extra lookup (an "index-only
scan"). If your B+tree is so large that its inner nodes spill onto disk
(necessitating a second disk seek), chances are the equivalent hash index will
as well, which, combined with the trip to the data table, kind of negates the
benefit.
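
A rough sketch of the index-only-scan point (hypothetical table; whether the
planner actually chooses an index-only scan also depends on the visibility
map):

    CREATE TABLE t (k text);
    CREATE INDEX t_k_btree ON t USING btree (k);

    -- the btree stores the keys themselves, so this lookup can be answered
    -- from the index alone (an "Index Only Scan" in EXPLAIN output); a hash
    -- index would have to visit the heap to compare the actual key:
    EXPLAIN SELECT k FROM t WHERE k = 'abc';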

~~~
ankrgyl
You're right, but I don't think the use case you mentioned (as a competitor to
B+ trees for direct lookups) is the target use case for a hash index.

A hash index can be a big win for nested loop joins, especially at high
concurrency. It is quite common to build a hash table over a subset of the
inner table of an equijoin. This is (a) slow to construct and (b) memory-
intensive (especially if many of these queries are run concurrently).

With a hash index, a lot of cases that required building a hash table to speed
up a query can just use the hash index directly. Furthermore, every concurrent
instance of the query can use the same hash index. This is a big win for both
performance of a single query (latency) and query scalability.
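
A sketch of the shape of query I mean (schema invented for illustration):

    -- invented schema: an equijoin whose inner side is large
    CREATE TABLE orders (id bigint PRIMARY KEY, placed_at timestamptz);
    CREATE TABLE events (order_id bigint, payload text);
    CREATE INDEX events_order_hash ON events USING hash (order_id);

    -- without a usable index on events.order_id, the planner may build a
    -- transient hash table over events (a Hash Join) for every execution;
    -- with the hash index, a nested loop can probe the shared index instead:
    EXPLAIN SELECT *
      FROM orders o
      JOIN events e ON e.order_id = o.id
     WHERE o.placed_at > now() - interval '1 day';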

~~~
koolba
That's a hash join which has existed in Postgres for many years. Hash indexes
are unrelated.

The primary use case of hash indexes is situations where the indexed fields
are very large. The hash index uses a fixed-size hash code regardless of the
width in bytes of the key, so it "wins" storage-wise when the key is wide.
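
A quick way to see this (names invented; exact sizes will vary with the data):

    -- invented example: a wide text key
    CREATE TABLE docs (url text, body text);
    CREATE INDEX docs_url_btree ON docs USING btree (url);
    CREATE INDEX docs_url_hash ON docs USING hash (url);

    -- the hash index stores fixed-size hash codes rather than the URLs
    -- themselves, so its size doesn't grow with key width:
    SELECT relname, pg_size_pretty(pg_relation_size(oid))
      FROM pg_class
     WHERE relname IN ('docs_url_btree', 'docs_url_hash');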

~~~
anarazel
> That's a hash join which has existed in Postgres for many years. Hash
> indexes are unrelated.

I think the OP ankrgyl is aware of that. To quote:

>> It is quite common to build a hash table over a subset of the inner table
>> of an equijoin. This is (a) slow to construct and (b) memory-intensive
>> (especially if many of these queries are run concurrently).

~~~
koolba
Ah, I think I misread that section, with the "nested loop join" stuck in my
mind.

------
zitterbewegung
I hope Postgres keeps on getting Cooler. Reading all the changelogs makes me
feel warm and fuzzy inside.

------
JohnCohorn
I remember someone commenting a while back that with hash indexes allowing you
to navigate relationships in O(1) time, relational DBs can approximate the
perf characteristics of a graph DB. When I did a quick and dirty test a couple
of years ago (before they would have been usable anyway, due to durability and
replication issues), I found that hash indexes performed noticeably worse than
btree for navigating a few test tables with 10-100m rows each. Curious whether
this is a major enough improvement to hash indexes that btree won't still be
faster for many common equality lookups.

------
qaq
Any good articles that go into detail on when to use hash indexes in PG 10?

------
alexnewman
We have been talking about this for a while. With the coming of more
distributed and NVMe-based storage, a lot of these things eventually make
sense.

------
Annatar
Is it just me, or has PostgreSQL kept on getting better and better, to the
point of being the #1 DB in the open source market?

------
frandroid
> A report from a tester who goes by "AP" in July tipped us off to the need
> for a few further tweaks. AP found that trying to insert 2 billion rows into
> a newly-created hash index was causing an error.

This is what I call trying things "at Indian scale" :D

