
PostgreSQL Indexes: First principles - craigkerstiens
http://eftimov.net/postgresql-indexes-first-principles
======
ketralnis
I've come to believe that designing systems like SQL to have queries that work
transparently whether or not an index exists is wrong. There may be people
that don't care how their query is actually executed, but I doubt it's the
common case.

So many times I've fought, or watched other developers fight, with getting a
DBMS to use an index. Or been surprised to find that production doesn't have
an index that staging does. Or found that someone added a lower() to a query
that stopped the index from being used. Or found that an index turns out to
be more expensive than scanning the table directly, under circumstances that
differ between instances of the same software.

Contrast that to an API like the Cassandra Thrift interface. You know if
you're doing something expensive because the API only exposes the cheap
things. It doesn't have an implicit sequential scan. You know you're doing one
because you had to type out `while(next=...)` right in your code.

Something as simple as

    SELECT mycolumn FROM mytable WHERE INDEXED b=5

vs

    SELECT mycolumn FROM mytable WHERE SCAN b=5

and refusing to run if the assertion isn't met would go a long way toward
making the API predictable.
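Today the closest approximation is checking the plan by hand; a sketch, with a hypothetical table and index:

    EXPLAIN SELECT mycolumn FROM mytable WHERE b = 5;
    -- Either: Index Scan using mytable_b_idx on mytable ...
    -- Or:     Seq Scan on mytable  Filter: (b = 5)

But nothing stops the plan from silently flipping between the two later.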

~~~
rpedela
> There may be people that don't care how their query is actually executed,
> but I doubt it's the common case.

I bet it is pretty common. Personally, as long as the query is fast enough for
my needs I don't care how it executes. I only start caring when performance
sucks. I wouldn't be surprised if many other developers have a similar
opinion.

~~~
ak4g
Confirming. The vast majority of SQL isn't written by developers or engineers
to begin with; it's not even close. I administer a few DBs that are almost
entirely queried by other departments, and they'll ask for performance help
(which, to date, has always consisted of eyeballing EXPLAIN output and adding
an index) when something gets too slow. This happens maybe once a quarter.
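For what it's worth, that whole ritual usually amounts to just this (table and column names hypothetical):

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
    -- If that shows a slow Seq Scan, add the obvious index and re-check:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);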

~~~
protomyth
I think that depends. I was on a project where we absolutely as developers had
to care about DB performance. We even wrote a tool for Sybase that monitored
the query plans of long running queries. We discovered that the optimizer was
really poor on Sybase 11.X and we needed to force indexes a lot.

Once you learn that DB performance is not a gradual thing, you get into the
groove of it. It goes from good to horrible once you hit that magic memory /
table size threshold.

Truthfully, if your developers are the ones writing tools to figure out real-
time performance, you probably want to start thinking about what's going on.
(obviously this does not apply to the parent poster)

------
wfn
> _Sequential isn’t Always the Worst_

Indeed! And sometimes it gets even more nuanced: say the way the indexes
would be used implies a lot of disk seeking (or page faults, if the indexes
aren't entirely in memory; you can "pre-warm" them into memory, but if you
have to do that you may want to go back and re-assess your schema and index
design anyway). Sometimes a bulldozer-like, mostly (hopefully!) sequential
disk read works better.

The latter becomes tricky with, say, SSDs: you may then need to inform
Postgres of the differing costs for (e.g.) disk seeks (or memory reads vs.
disk reads) so it can better decide which way to go.
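On Postgres that tuning is mostly the planner cost settings; for example, on SSDs it's common to drop random_page_cost close to seq_page_cost (the defaults are 4.0 and 1.0, tuned for spinning disks):

    -- Tell the planner random reads are nearly as cheap as sequential ones.
    SET random_page_cost = 1.1;  -- per-session; set it in postgresql.conf to persist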

~~~
tominous
It's also tricky with a COW filesystem like ZFS in which the order of data on-
disk may not match the logical order of the file. So your database may decide
to dispatch a bunch of sequential 1MB reads to the filesystem, only for the
filesystem to translate those large sequential reads into many small random
8kB reads. It depends on how fragmented the file is, so you can start off with
great performance which degrades significantly over time.

Luckily SSDs have a much lower penalty for random reads compared to spinning
disks, so in future this will be less of an issue.

In the meantime all you can do is use a different filesystem, hide the problem
with cache, or manually defragment your data files from time to time.

~~~
pkaye
Actually, SSDs themselves implement a COW scheme internally, since NAND
cannot be rewritten without an erase step.

~~~
pmalynin
What about TRIM?

~~~
pkaye
TRIM is the free() equivalent to the SSD's malloc(): it releases a NAND page
back to the free state for other uses. It doesn't require any NAND access or
writes per se, but if persistent TRIM is needed (vs. best effort), it gets
much more complicated.

------
MichaelBurge
Index scans imply extra disk seeks to fetch the index. A seq scan is like
Pacman chewing through a row of food; an index scan is like a pied piper
dancing as he plays his flute, each footfall triggering a new disk seek. The
seq scan is way faster for pure bandwidth.

If you're fetching the blog post with id 10, an index is preferable because
the query's selectivity is low: it matches only a tiny fraction of the rows.
The per-row cost is going to be higher than a seq scan's, but since you can
exclude most of the rows it can still be better.

If you're doing analytics, you often actually want to do as many seq scans as
possible. The per-row cost of sequential access on disk is so much better: if
you know you're going to want to read most rows, it's a mistake to force an
index scan. It depends on the selectivity of the query.
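A quick way to watch the planner make this call, with a hypothetical posts table indexed on id:

    -- Highly selective: expect an Index Scan.
    EXPLAIN SELECT * FROM posts WHERE id = 10;

    -- Matches nearly every row: expect a Seq Scan despite the index.
    EXPLAIN SELECT * FROM posts WHERE id > 0;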

However, index scans aren't the only way you could do low-selectivity queries.
Amazon Redshift doesn't support index scans at all, but it keeps the data
sorted on disk which often allows it to skip blocks as needed. The whole
database is just seq scans and you have to spend a lot of effort designing
your table structure to support it, but it's still blindingly fast when
everything aligns. People don't use Redshift to grab 1 blog post at a time
from the database, though.

~~~
vog
_> Index scans imply extra disk seeks to fetch the index._

This is no longer universally true: since PostgreSQL 9.2 there are also
index-only scans, which take the data directly from the index without
visiting the underlying rows at all. Of course, this works only under certain
conditions. See also:

[http://michael.otacoo.com/postgresql-2/postgresql-9-2-highlight-index-only-scans/](http://michael.otacoo.com/postgresql-2/postgresql-9-2-highlight-index-only-scans/)

[http://www.postgresql.org/docs/9.2/static/index-scanning.html](http://www.postgresql.org/docs/9.2/static/index-scanning.html)

"If the index stores the original indexed data values (and not some lossy
representation of them), it is useful to support index-only scans, in which
the index returns the actual data not just the TID of the heap tuple. This
will only work if the visibility map shows that the TID is on an all-visible
page; else the heap tuple must be visited anyway to check MVCC visibility. But
that is no concern of the access method's."
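A sketch of triggering one, with a hypothetical table (the index must cover every column the query touches, and the visibility map needs to be reasonably current):

    CREATE INDEX users_email_idx ON users (email);
    VACUUM users;  -- refreshes the visibility map
    EXPLAIN SELECT email FROM users WHERE email = 'a@example.com';
    -- Hopefully: Index Only Scan using users_email_idx on users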

------
ars
Despite the title saying PostgreSQL, this doesn't go into any
PostgreSQL-specific topics like function-based indexes or partial indexes.

The stuff here is universal and applies to any database (perhaps with some
terminology change).
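For reference, the two Postgres features mentioned look like this (table names hypothetical):

    -- Expression (function-based) index: makes WHERE lower(email) = ... indexable.
    CREATE INDEX users_lower_email_idx ON users (lower(email));

    -- Partial index: covers only rows matching the predicate, so it stays small.
    CREATE INDEX orders_pending_idx ON orders (created_at)
        WHERE status = 'pending';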

~~~
officialchicken
A simple demonstration of spatial indexes used with WKT/WKB or various shape
and geometry column types in GIS databases would be helpful in understanding
how an index is used in a typical spatial query (e.g. distance,
contains/bounds, or intersects).

There are also video and image (e.g. histogram) indexes in Postgres, but
they're much rarer.

------
brobinson
The article doesn't mention it when talking about clustered indexes, but you
can force a table to be reordered on disk to match an index using CLUSTER.

[http://www.postgresql.org/docs/current/static/sql-cluster.html](http://www.postgresql.org/docs/current/static/sql-cluster.html)
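E.g., with a hypothetical table and index (note that CLUSTER takes an exclusive lock and is a one-off rewrite; later writes aren't kept in order):

    CLUSTER mytable USING mytable_b_idx;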

~~~
SigmundA
Unfortunately this is not the same as a clustered index as seen in other DBs
like MS SQL Server.

With a true clustered index, the table IS the index. This saves space and
time by not having to maintain a secondary index and then look up the result
in the table, which can be very important with large datasets.

Really wish true clustered indexes were implemented in PG. Kind of odd the
article doesn't mention this while starting out explaining clustered vs.
non-clustered.

~~~
dsp1234
Indeed, the issue that trips up many of the people I've worked with is that
CLUSTER will make a table seem like the clustered index of other systems (in
as much as it physically orders the table), but without an appropriate
fillfactor (and sometimes not even then), new rows and updates are not stored
in the index's physical ordering.

So at first glance it's super speedy for clustered-index-like queries, but it
becomes slower and slower over time as the data changes and consecutive index
reads are no longer sequential on disk.
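For reference, fillfactor is a per-table (and per-index) storage parameter; a sketch with hypothetical names:

    -- Leave 30% of each heap page free so updates can stay near their row.
    ALTER TABLE mytable SET (fillfactor = 70);
    CLUSTER mytable USING mytable_b_idx;  -- the rewrite honors the new setting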

------
DBCerigo
Not Postgres, but I also found this at least as enlightening while I was
building my first RDB using SQLite:
[http://use-the-index-luke.com/sql/anatomy](http://use-the-index-luke.com/sql/anatomy)

------
hemant19cse
Hey, thanks for sharing. For more on indexing principles, especially for
MySQL, see this SlideShare presentation:
[http://www.slideshare.net/hemant19cse/improving-query-performance-using-index-covering-index](http://www.slideshare.net/hemant19cse/improving-query-performance-using-index-covering-index)

------
schmooser
There is a specific website explaining indexing basics and different usage
techniques - [http://use-the-index-luke.com](http://use-the-index-luke.com)

------
jldugger
I really wish there were a pgsql-oriented translation of Use the Index, Luke.
It seems fairly Oracle-heavy, and I seem to recall the MySQL stuff being
explained in terms of obscure Oracle features.

------
zurn
The world needs an integration of EXPLAIN + block I/O profiling.
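EXPLAIN (ANALYZE, BUFFERS) gets part of the way there, reporting buffer hits and actual block reads per plan node (and, with track_io_timing enabled, I/O time as well):

    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM mytable WHERE b = 5;
    -- Each node reports e.g.: Buffers: shared hit=123 read=45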

