
A Collection of PostgreSQL Indexing Tips - rdegges
http://www.craigkerstiens.com/2013/05/30/a-collection-of-indexing-tips/
======
lobster_johnson
Indexing _all_ columns and then reading the statistics about which indexes are
being used is an interesting approach, but it is only able to give you simple
single-column indexes. Postgres has multi-column (composite) indexes and can
be very clever about using them; additionally, there are functional and
partial indexes.
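
For illustration (table and column names here are hypothetical), the index
types mentioned above look like this in Postgres:

```sql
-- plain single-column index
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- multi-column (composite) index
CREATE INDEX orders_customer_date_idx ON orders (customer_id, created_at);

-- functional (expression) index
CREATE INDEX users_lower_email_idx ON users (lower(email));

-- partial index: only rows matching the WHERE clause are indexed
CREATE INDEX orders_pending_idx ON orders (created_at)
    WHERE status = 'pending';
```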

A better approach, I think, would be a tool that takes an SQL log and figures
out the optimal indexes based on those SQL statements. It could run "explain"
statements on the recorded queries to determine which columns need indexes,
including ones that are queried together, which would benefit from multi-
column indexes. The "explain" output would also give some information about
common filter values which could be used to generate partial indexes.
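
As a sketch (the query and table are made up), running EXPLAIN over a logged
statement shows which conditions end up as filters on a sequential scan,
i.e. candidates for indexing:

```sql
EXPLAIN SELECT * FROM orders
 WHERE customer_id = 42 AND status = 'shipped';
-- If the plan shows a Seq Scan with
--   Filter: ((customer_id = 42) AND (status = 'shipped'))
-- then a multi-column index on (customer_id, status) is a candidate.
```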

Another way would be to use genetic algorithms to create random indexes, run
the queries on a dummy database, and then read the index usage stats to
determine the most efficient combinations.
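
The index usage stats mentioned here live in pg_stat_user_indexes; such a
tool could read them like this:

```sql
-- how often each index has been scanned; rarely-used indexes
-- are candidates for removal
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 ORDER BY idx_scan ASC;
```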

What would be lovely is a tool that you could run autonomously every week or
month, and which would simply add/remove indexes as it sees fit.

The ultimate solution would be for Postgres itself to figure out and report
potential plan improvements. After all, the planner knows best.

~~~
craigkerstiens
Original author here; adding an index on every column is very much a joke. A
better solution would be taking advantage of pg_stat_statements in Postgres
9.2. It normalizes all queries which are run and tracks the total time spent
in your DB on those queries, which is a clear way to determine where indexes
can likely improve your performance. There's actually an earlier post I did
which talks a bit more about this -
[http://www.craigkerstiens.com/2013/01/10/more-on-postgres-
pe...](http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/)
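
A minimal sketch of that approach (the extension also has to be listed in
shared_preload_libraries before it will collect anything):

```sql
CREATE EXTENSION pg_stat_statements;

-- normalized queries ranked by total time spent in the database
SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;
```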

~~~
ajtaylor
Features like pg_stat_statements are a major reason why I love Postgres. The
dev team makes sure the foundations are rock solid, and then starts pouring on
the features. If only $work used Postgres...

------
AlisdairO
There are a few other features of Postgres indexing that are interesting:

\- Indexes over expressions (e.g. upper(value)) are very helpful.

\- You can add a WHERE clause to an index to index only part of the table,
which is fantastically useful.

\- On a non-C locale, LIKE won't (ever) use a plain btree index.
varchar_pattern_ops can help you with this to an extent, but it's not perfect.
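
Sketches of the three features above (table and column names invented):

```sql
-- index over an expression
CREATE INDEX users_upper_name_idx ON users (upper(name));

-- partial index: only the un-deleted rows are indexed
CREATE INDEX users_active_email_idx ON users (email)
    WHERE deleted_at IS NULL;

-- varchar_pattern_ops lets left-anchored LIKE use the index
-- even under a non-C locale
CREATE INDEX users_name_pattern_idx ON users (name varchar_pattern_ops);
```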

~~~
einhverfr
I think LIKE can use indexes where the left side is anchored with enough text
for an index to be useful. LIKE 'this is fun%' is more likely to use one than
LIKE 'a%'. LIKE '%fun vacation' won't, but if you have an index on
reverse(mycolumn) and you write WHERE reverse(mycolumn) LIKE reverse('%fun
vacation'), it might.

You can also add pg_trgm as an extension and then create a GIN index. Not
perfect at all but better than nothing.
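
A sketch of both tricks (names invented):

```sql
-- suffix matching via an expression index on reverse():
-- reverse('%fun vacation') is 'noitacav nuf%', a left-anchored pattern
CREATE INDEX files_name_rev_idx ON files (reverse(name));
SELECT * FROM files WHERE reverse(name) LIKE reverse('%fun vacation');

-- trigram GIN index for arbitrary substring matches (Postgres 9.1+)
CREATE EXTENSION pg_trgm;
CREATE INDEX files_name_trgm_idx ON files USING gin (name gin_trgm_ops);
SELECT * FROM files WHERE name LIKE '%fun%';
```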

Also indexes over expressions can be used to precalculate interesting sets if
you have expensive routines in the database (the typical hypothetical example
is image recognition).

~~~
AlisdairO
> I think LIKE can use indexes where the left side is anchored with enough
> text for an index to be useful. LIKE 'this is fun%' is more likely to use
> one than LIKE 'a%'. LIKE '%fun vacation' won't, but if you have an index on
> reverse(mycolumn) and you write WHERE reverse(mycolumn) LIKE reverse('%fun
> vacation'), it might.

That's the case for the C locale, but (as I understand it) not for other
locales.

~~~
einhverfr
Reading a bit more, it looks like you have to specify text_pattern_ops on the
index. This makes some sense actually, since it means the index is ordered by
simple character-by-character comparison rather than the locale's collation
order (so the index can be used for prefix matches).
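
E.g. (hypothetical table):

```sql
-- text_pattern_ops orders the index by character-by-character
-- comparison rather than the locale's collation, so left-anchored
-- LIKE can use it
CREATE INDEX articles_title_pattern_idx
    ON articles (title text_pattern_ops);
SELECT * FROM articles WHERE title LIKE 'Postgres%';
```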

------
gbog
To the article's author: I think you could go much deeper and get into the
discussion of what can be indexed and what can't, e.g. timestamp with or
without timezone, and how the indexability of a column or a function affects
the design of the schema itself.

One solution I found after many struggles was to require all event tables to
have a timestamp with timezone column, index its date(), and require all
queries attacking these tables to be selective on the date.
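
A sketch of that setup (names invented; note that date() on a timestamptz is
not IMMUTABLE because it depends on the session timezone, so the index has to
go over an explicit conversion):

```sql
CREATE INDEX events_date_idx
    ON events (date(occurred_at AT TIME ZONE 'UTC'));

-- queries must filter on the same expression for the index to apply
SELECT count(*)
  FROM events
 WHERE date(occurred_at AT TIME ZONE 'UTC') = DATE '2013-05-30';
```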

~~~
craigkerstiens
There's definitely a lot deeper I can go on this; this post is largely a test
of a series which could go quite deep. My initial post yesterday simply
highlighted one method of indexing, and out of that came many of the questions
in here. In general Postgres is pretty flexible about what you can index, with
not many limitations, but its impact on the rest of the schema isn't always so
simple, so I would love to hear what you have in mind as impact. Please feel
free to drop me a line; I'd love to turn more of that into a deeper post.

------
forgotAgain
When you get into page splits and such, I would think the I/O required to
maintain indexes during transactions would be the most important cost. Has
Postgres found a way around that?

Edit: I ask because it's not included in the article's "Costs of Indexing".

