

PostgreSQL partitioning explained - keithgabryelski
https://github.com/fiksu/partitioned/blob/master/PARTITIONING_EXPLAINED.txt

======
FiloSottile
This is quite a good informal high-level explanation.

A thing I can't understand is why this isn't made automatic.

If there are 100 companies, you don't want to create 100 tables, then add 100
FOREIGN KEY constrains, and then again each time a new company appears, do
you? Wouldn't a "syntactic sugar" like "PARTITION BY company_id" that does all
this automatically be possible?

Just asking if there is a reason something like this is not implemented.

~~~
windsurfer
Why can't they just improve the indexing to do this automatically behind the
scenes? Isn't the problem of finding things quickly in exchange for more work
during a write exactly what indexing is supposed to solve?

~~~
calpaterson
Well, be specific about what indexing is: it layers a B-tree on top of what
programmers would think of as an array structure. Indexes are not just
something which is always faster than a sequential scan in exchange for
slightly slower writes - there are important cases where the index is slower
than just doing a sequential scan. Obviously you hope the query planner will
avoid using an index in these cases, but the important thing to bear in mind
is that indexes are not magic go-faster stripes :)

The problem happens when this B-tree is too large to fit into your working
memory: swapping parts of the B-tree in and out of memory repeatedly to answer
a query is very slow. The solution that partitioning offers is to allow you to
split your indexes (and your data) so that, depending on your access-patterns
you could see less swapping of that index.

~~~
windsurfer
Why can't the index store the branches of the b-tree as separate objects on
disk instead of loading all branches into memory piece by piece?

------
rattray
Reading this reminded me a little of old works of philosophy. I found it
tremendously educational, not just about postgres partitioning, but about
databases in general. That degree of patience and thoroughness on the part of
the teacher isn't something you see every day.

I wish more such conversations A) took place, and B) were transcribed and
shared in a similar fashion.

~~~
keithgabryelski
Well, that is about the best compliment I've every heard. thank you!

------
perlgeek
Somehow [http://www.postgresql.org/docs/current/static/ddl-
inherit.ht...](http://www.postgresql.org/docs/current/static/ddl-
inherit.html#DDL-INHERIT-CAVEATS) has always stopped me from using table
inheritance; you don't get unique constraints over the table and its child
tables, so for example you can't have an autoincrement primary key that works
safely across them all. (Disclaimer: I haven't run into large enough tables
yet that I'd need partitioning).

If this constraint was lifted, table inheritance would allow some really cool
things, but currently I'm too conservative to use it in production.

~~~
pilif
Sequences are guaranteed to produce unique values. Just manually configure
each child table to use the same sequence for the primary key column. Yes,
uniqueness won't be enforced on the parent table, but the values will still be
unique (unless you manually update them in your queries)

~~~
trippy_biscuits
Until the sequence rolls over. Why design a schema with built in failure (an
integer PK that overflows)?

~~~
pilif
Sequences won't roll over by default. nextval() fails if you're out of values.
You can create a sequence which is allowed to roll over (by specifying cycle),
but it's not the default.

Yes. It's not optimal and you can shoot yourself in the foot, but it works in
its default configuration. I just gave a workable workaround for a problem, I
wasn't saying it's the perfect solution (a working unique constraint on the
parent table would be)

------
goldmar
Great introduction to partitions. I did not know how exactly they are used.
And now I think that I do. Thank you!

What I noticed (some improvement suggestions):

1) "the planner could tell (using knowledge from the check constraint) that
employees_1 was the only table it needed to look at."

Well, I think it still checks the parent table, right?

2) You could mention that it is possible to create a trigger function which
inserts new data automatically into the correct child table (as described in
your final link, here: [http://www.postgresql.org/docs/9.3/static/ddl-
partitioning.h...](http://www.postgresql.org/docs/9.3/static/ddl-
partitioning.html)).

I think it would be cool if PostgreSQL could create these trigger functions
automatically based on the check constraints...

I'm also wondering why foreign key constraints are not inherited?

~~~
keithgabryelski
yes, the parent is also a target table -- but for partitioning you never put
anything in the parent. I could probably talk about the troubles of putting
data in the parent table.

and yes, a trigger function is possible -- and I could have mentioned it
(although this was written as an explanation of partitioning using the gem I
created). I should mention the alternatives and why they weren't employed.

------
doronlevari
Great intro to Pg partitioning. I understand partitions can improve
performance of a single query, scanning less rows, but any of you guys have
idea about throughput implications? Is partitioning better or worse when it
comes to 1000 small queries/updates per second? Thanks!

------
Oculus
So in large web applications would the partitioned tables be distributed among
multiple DB servers?

~~~
keithgabryelski
you could -- that isn't "partitioning" in the way I discuss it in the linked
article

when it is split across machines it is generally referred to as sharding:
[http://en.wikipedia.org/wiki/Shard_(database_architecture)](http://en.wikipedia.org/wiki/Shard_\(database_architecture\))

~~~
Oculus
I see, thanks for clearing that up!

------
RobotCaleb
Can this be used to enforce a temporal limit on data? That is, if the check is
for time to be within 24 hours of now will it purge older than 24 hours data
as time progresses? My gut tells me no, but my gut is often wrong when it
comes to databases.

~~~
keithgabryelski
if I understand you correctly, this type of thing is implemented by
partitioning by hour in a day and then having a janitor process drop tables
that are older than 24 hours.

~~~
morgo
ALTER TABLE .. TRUNCATE PARTITION should be much faster than deleting as well.
At least it is in MySQL.

------
mrtimuk
Could you use a check constraint of FALSE on the parent table to prevent rows
from being erroneously inserted there instead of the actual partition tables?

