

Understanding Postgres Performance - neilmiddleton
http://craigkerstiens.com/2012/10/01/understanding-postgres-performance/

======
btilly
I would be more cautious about indexes than this article is.

The trade-off with indexes is that they make reads faster and writes slower.
But if you're running a high performance application, it is not that hard to
have read-only slaves to offload read queries to. (I won't say that it is
trivial, because there are issues, but it is doable. I've done it.) However,
fixing poor write performance is much more complicated.

Therefore, create all of the indexes that you need, but no more. Any
indexes that are not carrying their weight should be dropped. Furthermore, if
you have intensive queries that you have to run (e.g. for reports), consider
offloading those to another database.
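
A minimal sketch of how you might spot indexes that aren't carrying their
weight, using the pg_stat_user_indexes statistics view (this is an
illustrative query, not one from the article):

    SELECT
      schemaname,
      relname,
      indexrelname,
      idx_scan,                                              -- how often this index has been used
      pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM
      pg_stat_user_indexes
    ORDER BY
      idx_scan ASC,
      pg_relation_size(indexrelid) DESC;

Indexes with idx_scan near zero and a large size are the usual candidates for
dropping (after checking they aren't there to enforce uniqueness).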

Of course if you don't have significant data/volume/etc, then don't worry
about this. Any semi-sane approach will work for you. (That applies to most of
you.)

~~~
andrewcooke
am i wrong in thinking that a missing index can change an O(log n) query into an
O(n) one, while an additional index is going to add a constant cost to writes?

because if that's correct then it makes sense to err on the side of too many
indices.

of course, it is better to be perfect. but if you have an error, which side
would you prefer to be on?

and you don't need "significant" volumes for a full table scan to hurt (i.e. it
_will_ hurt most of you).

[what surprised me was the 99% cache hit rate. that is very dependent on your
application, but i guess it is common for web apps?]
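
As a rough illustration of the O(n) vs. O(log n) point, EXPLAIN shows whether a
query falls back to a full table scan (the table, column, and index names below
are made up, and the planner only switches to the index once the table has
enough rows and fresh statistics):

    -- hypothetical table with no index on email
    CREATE TABLE users (id serial PRIMARY KEY, email text);

    EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';
    -- ... Seq Scan on users ...                              (reads every row: O(n))

    CREATE INDEX users_email_idx ON users (email);
    ANALYZE users;

    EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';
    -- ... Index Scan using users_email_idx on users ...      (O(log n))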

~~~
Someone
I do not think an extra index adds a constant cost to writes. If it is an
index that allows for range queries (I haven't seen 'equality-only' indices,
but that probably is because I haven't looked hard enough), writing an index
entry is (or should be) like an insert into a sorted dictionary. That will be
O(log n).

~~~
btilly
You are right that indexes are usually (though not always!) implemented as a
tree. But in practice the number of levels of that tree is small and effectively
fixed, so the cost is constant. (And the number of writes to disk per write to
the index averages under 1: the top levels generally get overwritten multiple
times in memory before they have to hit disk on a checkpoint, and sometimes the
leaf page does too.)

~~~
Someone
But that would mean you would have to make it wider and wider. That cannot be
done in O(1) time, can it? Or are they just starting a second, third, etc.
tree whenever one fills up? If so, what is the benefit over adding another
level? More localized disk I/O during writing? Do you have a reference?

~~~
btilly
<http://www.stanford.edu/class/cs276a/projects/docs/berkeleydb/ref/am_conf/cachesize.html>
describes the performance of BTree data structures in real world databases.
(It is discussing a standalone key/value store, but the internals of a
database index work the same way.)

As noted, 3 levels suffice for a moderately sized table, and it would be rare
to need more than 5 levels. The "levels" in question are pages of memory that
have to be looked at. The reason that this is less than you expect is that
inside of each page you essentially have a tree that has several levels to
traverse. However the time to traverse that tree is less than the time to
fetch the next page, so the operations of concern are fetches of pages.

The most important number noted in that link is that under normal operation,
once caches are warmed up, it should take at most one disk seek to traverse
the index. That single disk seek, if it happens, takes longer than all other
operations.

The long and short of it is, "in theory log(n), in practice a reasonably small
constant."

~~~
Someone
Thanks. But a disk seek is not a small constant. I would almost describe that
as "in theory log(n), in practice a disk seek, so half the speed" (assuming
that the record itself can be written with one disk seek).

~~~
jeltz
A hash index will also require one disk seek, which gives b-tree indexes and
hash indexes more similar performance than the theoretical O(log n) vs. O(1)
would indicate.
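
For reference, Postgres lets you pick the index type explicitly; a minimal
sketch (the table and column names are made up):

    -- equality lookups only (WHERE email = '...')
    CREATE INDEX users_email_hash_idx ON users USING hash (email);

    -- equality and range lookups (WHERE email = '...', WHERE email > '...')
    CREATE INDEX users_email_btree_idx ON users USING btree (email);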

------
krat0sprakhar
Slightly OT: I've been using MySQL for quite some time now, and whenever I
start a new project thinking that I'll use Postgres this time, I find myself
lazily opting for something I know and going back to MySQL. Is there a
Postgres guide that is concise, easy to follow, and illustrates its advantages
over other relational databases? The Postgres guide[1] is currently too sparse,
and the beginner Apress book[2] mentioned is 644 pages long.

1- <http://www.postgresguide.com/>

2 - <http://www.amazon.com/gp/product/1590594789>

~~~
craigkerstiens
What content would you like to see within Postgresguide?

~~~
jcampbell1
Probably a good idea to add a bit about serial columns and sequences. These
tend to confuse MySQL people like myself.

For instance, I don't know what happens when I copy, or dump and restore, a
Postgres table that relies on a sequence. MySQL auto-magically takes care of
it by saving the next auto-increment id in the table dump or copy.

~~~
joevandyk
If you use pg_dump to dump/restore the table, it will reset the sequence to
the correct value. Look at the dump file; there will be a setval of the
sequence.
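
For example, a dump of a table with a serial column typically contains
something along these lines (the table and sequence names here are
illustrative, and the exact output varies by pg_dump version):

    CREATE TABLE widgets (
        id   serial PRIMARY KEY,   -- implicitly creates the sequence widgets_id_seq
        name text
    );

    -- after the table data, pg_dump advances the sequence to the right value:
    SELECT pg_catalog.setval('widgets_id_seq', 42, true);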

------
CCs
I really liked the NewRelic graph and wanted to know whether I could get it
running on my setup: Scala, sbt, liftweb, Jetty.

It worked! I just downloaded the ZIP, unpacked it, and started:

    sbt ~container:start -J"-javaagent:/path/myapp/newrelic/newrelic.jar"

Now I have a nice graph that matches up with the numbers I see in my logs. As
I suspected, there's some work that needs to be done on the DB access side.

------
Axsuul
A better query to see how indexes are used, written to avoid division-by-zero
errors:

    
    
      SELECT 
        relname, 
        CASE 
          WHEN seq_scan + idx_scan = 0 THEN 0 
          ELSE 100 * idx_scan / (seq_scan + idx_scan) 
        END AS percent_of_times_index_used, 
        n_live_tup rows_in_table
      FROM 
        pg_stat_user_tables 
      ORDER BY 
        n_live_tup DESC;

------
madelfio
I think the ratio expressions are slightly off. Using the queries from the
article can result in negative ratios, which shouldn't be possible if we're
measuring the percent of accesses that hit the buffer. Each access is either a
cache hit or a read, not both. So instead of subtracting in the numerator,
they should be added in the denominator:

    
    
      SELECT 
        sum(heap_blks_read) as heap_read,
        sum(heap_blks_hit)  as heap_hit,
        sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
      FROM 
        pg_statio_user_tables;

------
rpq1480
Off topic: grok is now my new favorite word.

~~~
Someone
New? It's over 50 years old: <http://en.wikipedia.org/wiki/Grok>

~~~
adamtj
"new favorite word" not "favorite new word". Also, see <http://xkcd.com/1053/>

