
Pgslice: Postgres partitioning as easy as pie - craigkerstiens
https://github.com/ankane/pgslice
======
ak4g
> Battle-tested at instacart

If you don't mind me asking, for how long?

I've found using EXECUTE in insert triggers to be very problematic - it
consumes a txid, AIUI, and every 200m inserts, a bunch of anti-wraparound
autovacuums have to scan allofthethings. My insert triggers have big blocks
of IF/ELSIF branches (although it's straightforward enough to construct
_those_ in pl/pgsql, just requires an extra level of indirection when
preparing the partitioning).
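
The static-branch approach described here might look like the following sketch
(the `events` table, column names, and partition names are illustrative, not
from pgslice):

```sql
-- Sketch of an IF/ELSIF routing trigger for monthly partitions of a
-- hypothetical "events" table. The INSERT statements are static, so
-- plpgsql can cache their plans instead of running dynamic SQL via EXECUTE.
CREATE OR REPLACE FUNCTION events_insert_trigger()
RETURNS trigger AS $$
BEGIN
  IF NEW.created_at >= DATE '2017-09-01'
     AND NEW.created_at < DATE '2017-10-01' THEN
    INSERT INTO events_201709 VALUES (NEW.*);
  ELSIF NEW.created_at >= DATE '2017-08-01'
     AND NEW.created_at < DATE '2017-09-01' THEN
    INSERT INTO events_201708 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'created_at % out of range', NEW.created_at;
  END IF;
  RETURN NULL;  -- the row has already been routed to a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert
  BEFORE INSERT ON events
  FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
```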

~~~
akane
We've been using the tool for around 5 months, but the same approach/triggers
for over a year. I wasn't aware the IF/ELSIF trigger approach had potential
advantages over EXECUTE, but will definitely look into it.

------
keithgabryelski
I wrote this a while back:
[https://github.com/fiksu/partitioned](https://github.com/fiksu/partitioned)
supports partitioning in rails.

It's best to start here:
[https://github.com/fiksu/partitioned/blob/master/PARTITIONIN...](https://github.com/fiksu/partitioned/blob/master/PARTITIONING_EXPLAINED.txt)

for an overview.

The examples should demonstrate its use.

------
takeda
Why an external program and not something in PL/pgSQL, or at least a PG
extension?

Also, how does it compare with pg_partman[1] which also has similar goals?

[1]
[https://github.com/keithf4/pg_partman](https://github.com/keithf4/pg_partman)

~~~
aisofteng
>Why an external program and not something in PL/pgSQL, or at least a PG
extension?

I have no association with the linked project, but here are some reasons why I
might take the same approach:

1\. Source control. Code at the level of PG/SQL in source control isn't as
easily deployed and managed as, say, Java code.

2\. Not being strongly tied to a single piece of software. It's true that this
is currently intended specifically for PG, but keeping it separate would make
it easier to adapt to another database in the future.

3\. Lack of expertise with PG internals. This one is just a reason, not a
justification.

Note that I'm not defending any software design decisions here, just
speculating in a way I find reasonable. I also have the same questions and
would be curious to hear the author(s)'s responses.

(Edited for formatting issues with the numbered list - and then again for
spelling errors)

~~~
gaius
_Code at the level of PG/SQL in source control isn't as easily deployed and
managed as, say, Java code._

Why not?

~~~
danneu
You're updating code that needs to be executed against an external layer
instead of just the application layer.

~~~
gaius
But you could say the same for any code that say, touches the filesystem.

~~~
danneu
I meant that the code has to be deployed to a separate system.

Application code that touches the file system is still deployed along with the
application code.

You might commit schema.sql into your application repository, but that's not
the state of your database.

~~~
takeda
I think you are confused. A PG extension is code like any other. For a use
case such as partitioning, you should not need to create any schema; what you
would be doing is writing code that acts on the schema.

A schema is usually needed by the application that will be using the database.
I also don't understand the fear of using schemas. A schema is FAR FAR FAR
simpler to deploy than a Java application.

------
michaelbryzek
Great to see more options for partition management in postgresql.

We took a similar approach, using a Ruby library to manage partitions at Gilt
Groupe starting many years ago (2009?). It worked quite well and is still in
use today.

Since then we have also used the referenced
[https://github.com/keithf4/pg_partman](https://github.com/keithf4/pg_partman)
on RDS - and it works great! Very happy to have all the partition management
in the core database vs. in other languages, and many of the features Keith
provides in pg_partman just work. He's done a great job with the project and I
would encourage folks to take a serious look at his work.

If you are interested in the details of how we applied pg_partman to RDS
without requiring extensions, take a look at:

[https://github.com/flowcommerce/lib-postgresql/blob/master/README.md](https://github.com/flowcommerce/lib-postgresql/blob/master/README.md)

We created a simple process to load the scripts directly into the database,
similar to any one of our other migrations. We then created higher-level
wrappers for our teams to ensure consistent use.

------
qaq
Looks useful, are you using this in production?

~~~
akane
Yes, we do at Instacart.

~~~
aisofteng
Could you speak a bit about the use cases where you've found this to be a
better tool than competing alternatives, whatever they may be, if you're able
to?

~~~
akane
We didn't really find anything that fit our needs. We wanted a tool that was
simple to use with the ability to do time-based partitioning of existing
production tables without downtime.
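
For context, the no-downtime part of this style of migration typically hinges
on backfilling an intermediate partitioned table and then swapping it in with
a rename. A rough sketch of the final step (the `_intermediate`/`_retired`
names are illustrative, not necessarily pgslice's exact naming):

```sql
-- After a partitioned copy of "events" has been created as
-- events_intermediate and backfilled in batches, the swap is a pair of
-- renames in one transaction, so readers and writers never see a gap.
BEGIN;
ALTER TABLE events RENAME TO events_retired;
ALTER TABLE events_intermediate RENAME TO events;
COMMIT;
```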

------
postwait
Ummm
[https://github.com/keithf4/pg_partman](https://github.com/keithf4/pg_partman)
has been around for a long time and is used all around the world... AND it
doesn't require cron jobs. Much tighter solution.

------
pacuna
I also wrote a Rails gem that makes the create and read processes completely
transparent with Active Record when using date partitions. In case someone is
interested: github.com/pacuna/partitionable

------
sargun
Can I ask why you decided to partition Postgres like this? It looks like you
have time-series data based on the examples, and you wanted quick dropping, am
I right?

~~~
akane
That's correct. It allows us to manage the database size over time for tables
where we can archive data after a specific period.

------
tomc1985
This is cool. Now if only it somehow magically allowed a unique index over the
whole setup, it'd make my day!

~~~
elmigranto
What do you mean? For uniqueness, a simple scheme like "Partition ID + SERIAL"
will be unique enough (or just a GUID or similar). For performance, a
per-table index is fine, since you already know which partition the request is
for.
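
A minimal sketch of the "Partition ID + SERIAL" idea, using a made-up events
table:

```sql
-- Each partition stamps its own id; (partition_id, seq) is then unique
-- across the whole setup even though each child only enforces it locally.
CREATE TABLE events_p1 (
  partition_id int NOT NULL DEFAULT 1 CHECK (partition_id = 1),
  seq          bigserial,
  payload      jsonb,
  PRIMARY KEY (partition_id, seq)
);
```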

~~~
tomc1985
The problem is the unique IDs are GUIDs generated independently (and offline)
by the client and then passed to the database; we need to make sure that no
event is processed twice.

Currently there's one big log table and a uniqueness constraint on the GUID
field. I'd love to partition the table out but until I can identify a clean
route to satisfying that requirement I'm keeping one large log.

Though, now that I think about it, maybe partitioning on the first character
of the GUID string is feasible, as we can then ensure uniqueness per table.
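
Partitioning on the leading character would indeed make per-table uniqueness
sufficient, since a CHECK constraint pins any given GUID to exactly one child.
A sketch with inheritance-style children (table and column names are
hypothetical):

```sql
-- One child per leading hex digit of the guid; the CHECK constraint means
-- a given guid can only ever land in this child, so the per-child UNIQUE
-- constraint adds up to global uniqueness.
CREATE TABLE events_guid_0 (
  CHECK (left(guid::text, 1) = '0')
) INHERITS (events);

ALTER TABLE events_guid_0
  ADD CONSTRAINT events_guid_0_guid_key UNIQUE (guid);
```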

