
PostgreSQL 10: Partitions of partitions - joaodlf
https://joaodlf.com/postgresql-10-partitions-of-partitions.html?utm_source=hackernews
======
da_chicken
No primary/unique keys on the partition table and only on the partitions still
seems like a common dealbreaker.

~~~
marmaduke
Naive question but this means no joins or foreign keys right?

~~~
da_chicken
No, it means you might have duplicates of fields you don't want duplicates of.
In general, JOINs work much more efficiently when there's an index on the
columns used to join the tables, but they're not _required_.

Let's say you want a table for your account transactions:

    
    
      create table transaction (
        tran_id int primary key not null,
        tran_date timestamp(0) not null,
        account int not null,
        amount decimal(30,4) not null
      );
    

Now if there's already a transaction with an id of 8043, you can't insert
another transaction with that same id. There's only one transaction with 8043
allowed in the whole table. However, if we partition the table:

    
    
      create table transaction (
        tran_id int not null,
        tran_date timestamp(0) not null,
        account int not null,
        amount decimal(30,4) not null
      ) partition by range (tran_date);
    
      create table transaction_y2018m01 partition of transaction
      for values from ('2018-01-01 00:00:00') to ('2018-01-31 23:59:59');
    
      create table transaction_y2018m02 partition of transaction
      for values from ('2018-02-01 00:00:00') to ('2018-02-28 23:59:59');
    
      alter table transaction_y2018m01 add constraint ux_transaction_y2018m01_tran_id unique (tran_id);
      alter table transaction_y2018m02 add constraint ux_transaction_y2018m02_tran_id unique (tran_id);
    

See, the only uniqueness restrictions are on the partitions, not the overall
table. Now I could potentially have a transaction with an id of 8043 in both
January and February of 2018, as well as any number of transactions with an id
of 8043 not in either of those two months. If my application assumes that
transaction ids are always unique, that's got the potential to cause a
problem. If multiple applications or multiple users use the same database,
it's possible that an error or a race condition might cause a duplicate id.

~~~
sk5t
I suppose a mitigation might be a table of just (unique_id, partition_field)
as arbiter of validity. Assuming such a table to be compatible with system
design... one would need a whole lot of rows and/or heavy fragmentation to
rule it out, no?

------
kdv
Is anyone that was using pg_partman before migrated to native partitioning
yet? No support for ON CONFLICT and PKs are serious limitations that are
available with pg_partman.

~~~
craigkerstiens
Most I know that are using partitioning even with Postgres 10 are still taking
advantage of pg_partman. Pg_partman overall makes things much more usable in
general.

~~~
pgaddict
I agree pg_partman is awesome, and it's a great testament to the extensibility
baked into PostgreSQL.

------
simonw
Can anyone offer a really basic summary of what problems can be solved with
PostgreSQL partitions?

~~~
yashap
A big one that nobody has mentioned yet - reduce memory requirements for your
db. In general, for a database to be performant, it needs to be able to keep
the hot part of the dataset and REGULARLY USED INDEXES in memory, you don't
want to be paging indexes in and out of memory every time you have to use
them.

Say you've got a giant table for of emails, and there's a bunch of indexed
fields (people maybe want to filter their emails by tags, read/unread, etc.).
If you keep them in one giant table, you need to keep those giant indexes
fully in memory to get good performance. BUT, you know 99% of requests are for
recent emails, people only occasionally look at old ones. So you partition
your table by timestamp, and instead of running queries like `select * from
email where status = 'unread' order by timestamp desc limit 50`, you run
queries like `select * from email where status = 'unread' AND TIMESTAMP >=
<pretty-recent> order by timestamp desc limit 50`. This only hits the first
partition, and almost always returns the 50 emails you need - if it doesn't,
you run subsequent queries to hit older partitions.

But now, say you partition by month, and keep 2 years of data, almost all
queries are being serviced by a partition containing 1/24th of the total data,
so keeping indexes for that 1 partition in memory is WAY easier than keeping
indexes in memory for the entire giant table. When you have to query old
partitions, it's slower and there's still some swapping happening, but that's
rare anyways. So you can get away with having way less memory for your db,
without performance degradation on most queries.

~~~
LogicX
With your example, I believe you’ve just described TimescaleDB.
[https://www.timescale.com/](https://www.timescale.com/)

We use it to store 150M records a day, and be performant enough to have a
customer facing analytics dashboard querying against it for real-time results.
Pretty amazing approach.

~~~
mfreed
TimescaleDB person here. Thanks for the call-out.

For those who don't know, TimescaleDB is a extension to Postgres that makes it
easy to use and scale for time-series data. It works with both PG9.6 and PG10.

Some folks might be interested in two recent blog posts we wrote about PG10
partitioning:

1\. Technical write-up about PG10 partitioning:
[https://blog.timescale.com/scaling-partitioning-data-
postgre...](https://blog.timescale.com/scaling-partitioning-data-
postgresql-10-explained-cd48a712a9a1)

2\. Why TimescaleDB's partitioning is easier-to-use and more performant than
PG10's native partitioning _for time-series data_ :
[https://blog.timescale.com/time-series-data-
postgresql-10-vs...](https://blog.timescale.com/time-series-data-
postgresql-10-vs-timescaledb-816ee808bac5)

------
gregn610
How is this better than inherited tables with indexes and constraints ala
[https://stackoverflow.com/a/3075248](https://stackoverflow.com/a/3075248)

