
Plans for partitioning in PostgreSQL v11 - anarazel
http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html
======
gfodor
One of the things that is continually great about PostgreSQL is how they
consistently deliver a solid v1 of new features that are reliable, well
engineered, and semantically sane. I'm sure partitioning will be another
example of this kind of thing where in a few years you will have both all the
bells and whistles and also firm footing in solid RDBMS engineering.

~~~
X86BSD
PostgreSQL and BSD are standouts in that regard.

~~~
ksec
And I wish both are used more often.

------
rorygibson
As a long time user of MySQL, I've switched to PostgreSQL for my most recent
startup precisely because the PostgreSQL team continue to innovate in a
sensible way.

One of the core requirements is a stable, fast and reliable initial import of
existing data from 3rd party services, and because of throttling concerns that
led us to a batched, retryable solution - and a requirement for simple, fast
job control.

We've built this around a parallelised solution using the FOR UPDATE/SKIP
LOCKED feature from PostgreSQL 9.5 onwards.

It's not bleeding edge, but for us it was a great example of how the pg team
pick up the useful features from e.g. Oracle, and incorporate them into the
product in a solid way, without all the enterprisey marketing crap that Oracle
make you swallow before you can understand the actual feature-set.

[the startup is a SaaS search tool that helps you index across Trello, email,
GitHub, Slack, Drive - [https://getctx.io](https://getctx.io) if you're
interested.]

~~~
grogers
It might be a good choice to switch to postgres, but it isn't like MySQL isn't
innovating too. 8.0 will have a ton of features. This includes a bunch of
"catch up" things like CTEs and SKIP LOCKED, but also stuff like synchronous
replication built in.

~~~
jimktrains2
I know the defaults have been changed to be safe, but I will forever feel like
my trust was broken with MySQL the day I learned how bad the defaults are. Not
just inefficent/generic, but actually bad.

Additionally, I feel that postgresql is better engineered, more featureful,
and has support for a much larger variety of use-cases (e.g. hstore, json,
postgis, pg-routing, uuids, trigram indecies, &c). So between engineering
perks, and the loss of trust, I only use MySQL in legacy applications that I
cannot port.

------
joaodlf
The direction being taken with data partitioning is great. I truly believe
this will solve the issues of many companies when it comes to "big data"...
This, to me, is the middle ground we have been missing between rdbms and
nosql.

Congrats to the postgres team, what an interesting product.

~~~
cwyers
Every other major RDBMS has had this for years. The only people who have been
missing this are Postgres users.

~~~
dfox
On the other hand, PostgreSQL had some notion of partitioning essentially
since it's inception. Ten years ago PostgreSQL's rules and table inheritance
based partitioning was probably only DBMS-integrated solution that was
actually usable.

------
andy_ppp
This looks very good but after using Cassandra I'd be scared of bumping into
features that are not supported or suddenly very slow if used. For example I'm
sure sorted aggregations and group by queries across partitions will take a
long time.

I think there should be a list of Postgres partitioning gotchas somewhere to
accompany this, e.g. partitioning goes hand in hand with denormalisation (due
to the effects of the chosen partition key being so important). With Cassandra
I can simply add more machines as I increase the data duplication, I'm not
sure how easy it is to rebalance data when adding new Postgres instances?

~~~
pgaddict
I'm not sure what you mean by "adding new Postgres instances." Postgres is not
a distributed database, at least not on it's own. There are ways to scale
using e.g. hot standbys (streaming replicas for read-only queries) and
multiple projects that implement distributed databases on top of Postgres
(Citus, Postgres-XL, ...). But then it depends mostly on that project, not on
Postgres.

Sure, Partitioning does negatively affect some queries. But the thing is -
it's a significant improvement compared to the previous partitioning
implementation, which had almost no insight into the partitioning rules. And
so optimizer could not really do advanced tricks (e.g. partition-wise joins)
etc.

~~~
andy_ppp
I'm not sure what the point of partitioning would be if the data isn't
distributed between nodes? I'm probably not understanding.

All I'm saying is I can go and read about the pain points in say Cassandra for
this type of stuff because of where it comes from and it tries to protect you
from doing things that are slow. With Postgres I think it's going to be harder
to know when I can't use a feature that's suddenly going to kill my
performance.

It would be great if there was a resource for how to avoid such issues in
partitioned Postgres but it'd take a lot of work to write such a guide - I'm
guessing things like JSONB queries, Geo, Transactions, Aggregations across
partitions and many extensions are all going to cause pain points?

~~~
rorygibson
EDIT: oops, link updated. Thanks justinclift!

Performance, mainly.

Partitioning mean your data can be spread across more physical areas (or FS
partitions, or whatever) at lower level, giving the database (and you) more
control of performance by optimising what gets stored where at a lower level.

Official pg doc here: [https://www.postgresql.org/docs/10/static/ddl-
partitioning.h...](https://www.postgresql.org/docs/10/static/ddl-
partitioning.html) \- the pertinent bits are as follows:

\---

Partitioning can provide several benefits:

\- Query performance can be improved dramatically in certain situations,
particularly when most of the heavily accessed rows of the table are in a
single partition or a small number of partitions. The partitioning substitutes
for leading columns of indexes, reducing index size and making it more likely
that the heavily-used parts of the indexes fit in memory.

\- When queries or updates access a large percentage of a single partition,
performance can be improved by taking advantage of sequential scan of that
partition instead of using an index and random access reads scattered across
the whole table.

\- Bulk loads and deletes can be accomplished by adding or removing
partitions, if that requirement is planned into the partitioning design. ALTER
TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation.
These commands also entirely avoid the VACUUM overhead caused by a bulk
DELETE.

\- Seldom-used data can be migrated to cheaper and slower storage media.

The benefits will normally be worthwhile only when a table would otherwise be
very large. The exact point at which a table will benefit from partitioning
depends on the application, although a rule of thumb is that the size of the
table should exceed the physical memory of the database server.

~~~
andy_ppp
Okay - so the partitioning gives better per machine performance by allowing
you to parallelise certain things and spread out on disk. I thought this was
similar to Cassandra's ring based partitioning that allows data to be
distributed around many machines. Thanks for clarifying.

~~~
Tostino
That will eventually be coming in some form. In the OP link, there was a
mention of foreign tables participating in partitioning: > While foreign
tables can be partitions in v10, they are currently subject to an unfortunate
limitation: tuple inserts that route to a foreign table will fail, since this
feature has not been implemented yet. Etsuro Fujita (NTT) is working to
address this limitation by revising code originally written by Amit Langote.

So there is work on getting this to scale on multiple machines, rather than
just single machine. We're just not there yet.

------
btilly
Hopefully the work on better indexing of partitioned tables will also apply to
inherited tables. I regularly work with tables with over 1000 child tables,
using single value lookups which would be _trivial_ in an index. But having to
query 1000 indexes makes the inheritance useless and we have to jump hoops to
query the right child table instead.

That this is a serious limitation has been recognized for many releases. But
the limitation is still there.

------
qaq
The rate of evolution in PG is truly inspiring at this rate commercial RDBMS
will not have much market left for them to play in.

~~~
lobster_johnson
While Postgres is indeed fantastic, Oracle and Microsoft SQL Server are both
ahead of Postgres in many areas. For example, the feature that this article is
talking about — table partitioning — has been in Oracle for at least 20 years.

Oracle's replication support is also extensive, and supports many scenarios
such as multimaster, not to mention sophisticated solutions such as RAC and
Dataguard. Then there's materialized views, flashback queries, clustered
tables (this is manual in Postgres), parallel queries, etc.

So Postgres is still mostly catching up to other databases. It still has
features others don't, of course, such as transactional DDL statements. And
it's open source.

~~~
qaq
It's catching up fast which was the point :) if at present it's suitable for
85% (or whatever the proper estimate is) of use cases then at this rate it
will reach 99% in not so distant future.

~~~
lobster_johnson
I would say it's catching up extremely slowly, but that it's also way past the
"sweet spot" where it's as good as the commercial offerings for the stuff
people use databases for.

