
Hypothetical Indexes in PostgreSQL - davidquilty
https://www.percona.com/blog/2019/06/21/hypothetical-indexes-in-postgresql/
======
fake-name
I wrote a custom SP-GiST index for fuzzy image searching. Along the way, I
also wound up with a decent example in how to do a simple binary-tree SP-GiST
index.

SP-GiST indexes are specific to optimizing data that can fit into a metric-
space, but if your application can use that, it's on github here:
[https://github.com/fake-name/pg-spgist_hamming](https://github.com/fake-
name/pg-spgist_hamming)

~~~
tsomctl
Hey, fake-name, I've been following your work. Have you looked into faiss from
Facebook or annoy from Spotify? Their biggest drawback is that you can't
modify the index, you have to create a new one if you want to add additional
entries.

~~~
fake-name
> Their biggest drawback is that you can't modify the index

I hadn't looked into them, but being read only would make them useless for my
use case.

Basically, my application is online zips-of-images set deduplication, with a
continuous input stream of files. I have to be able to add new values
continuously.

------
d33
Apologies for being negative, but... Is it just me or does it look like a bad
abstraction trying to solve a problem that was created by SQL not being direct
enough in the first place?

I had been working with a couple billion-row-scale PostgreSQL setups and
trying to please query planner was sometimes a project in itself. Many times I
just wished I could specify an algorithm on my own instead of having SQL
interpreted as one plan for one variable, but as another if I replace it. Did
anyone else have a similar impression?

~~~
SigmundA
Yes I run into this a lot in SQL server but SQL server does let you do hints
which in my understanding PG does not. No hints would be excruciating, I don't
care how good the optimizer is, sometimes I know better and sometime the
optimizer is brain dead.

~~~
snuxoll
> No hints would be excruciating, I don't care how good the optimizer is,
> sometimes I know better and sometime the optimizer is brain dead.

PostgreSQL treats poor planning from the optimizer as a bug, not something
that humans should be forced to work around. File bug reports on the mailing
list, don't hack up your SQL.

~~~
btilly
Yes, yes. PostgreSQL developers don't care about messy real world use cases.
News at 11.

I reiterate that no optimizer is perfect. Which means that if PostgreSQL redid
some obscure statistic and a common query just switched to a far worse query
plan, you've got a disaster and no easy way to fix it. Even if you know the
right query plan, there is no easy way to fix it.

Similarly "this query worked fine in development and staging, but mysteriously
sucks in production" is not what you want to learn while deploying to
production.

Yes, I've seen both failure modes. No it wasn't pretty. And no, "File a bug
report, wait for the next version of PostgreSQL to have a fix" is not an
acceptable action plan.

In general if you've got a working site and working queries, there are two
possible outcomes to changing the query plan. Either nobody notices because
the plan was already good enough. Or everybody is upset because the new plan
is worse. I don't care if 99% of the time it turns out well and nobody
notices, that 1% keeps people up at night.

The top missing feature in PostgreSQL is this. If the query plan is good,
allow the query plan to get locked and not change. Predictability beats
perfection in production environments.

~~~
derefr
> In general if you've got a working site and working queries, there are two
> possible outcomes to changing the query plan. Either nobody notices because
> the plan was already good enough.

This assumes a use-case where there is a "site" (i.e. a business layer in
front of PG that can abstract away deficiencies in the query plan by adding
e.g. caching.)

If you're directly working with PG, interactively writing OLAP reporting
queries, you can have "working queries" that take ~300s (still practical
enough to build your report) switch to taking milliseconds after an autovacuum
(if, for example, a table being joined against was recently populated via ETL
and hadn't been vacuumed yet.) People _do_ notice that.

------
hinkley
My C is too awful at this point to get over my impostor syndrome and
contribute to Postgres, but I've always been curious about the prospects of
scaling Postgres higher by having different indexes on different replicas.

Lacking anything like this, we have to make the OLAP vs OLTP workflow splits
sooner than we likely would have. Or at least, for companies that are growing
organically. For VC backed companies this may be a difference of only 3-6
months.

~~~
chillydawg
Using logical replication you can do this. Effectively, it propragtes only the
data changes and you can control the ddl changes. I use this to great effect
to have a prod db logically replicating to a data warehouse with different
indexes and views AND to a hotstandby via the usual binary wal shipping
method.

------
airstrike
Why does this website want to send me notifications?...

~~~
tomnipotent
Marketing teams that are clueless about their customers and read "Top 10"
tricks found after a Google search to create BS metrics to send to managers &
execs at the end of the week because "progress".

~~~
hinkley
I've noticed that the preponderance of articles posted on Hacker News now,
while usually quite good in and of themselves, are often absolutely rotten
with clickbait links.

The last one I read had an outbound link to something like "Top 20 harshest
breakup messages of all time." And that was amongst an array of 8 equally
awful options.

I'm not sure what's been going on lately but it's not good.

~~~
2T1Qka0rEiPr
My approach when I am confronted with 1) cookie-bar, 2) GDPR-bar, 3) fixed
nav, and 4) notification popup, is just to hit F9 (Win, annoyingly worse in
Ubuntu) within Firefox to just go into reader mode. It makes the internet more
bearable.

~~~
hinkley
Sounds like a potential add-on to me.

Like the "remember password" popup but for reader mode.

(but please only prompt me if I exhibit a pattern of using it, not every
single time I do)

