
PostgreSQL Parallel Aggregate - petergeoghegan
http://blog.2ndquadrant.com/parallel-aggregate/
======
atemerev
At last! This was the only feature I was missing in Postgres for years.

Postgres is so amazing I still find it hard to believe it's free. I abused it
many times (as a graph database; as a real-time financial data analytics
engine with thousands of new ticks coming in each second; as a document
store with several TBs of data per node), and amazingly, it just worked.
Magic.

If in doubt, choose Postgres.

~~~
lamby
> I still find it hard to believe it's free

PostgreSQL is only free if your time has no value..

(Paraphrased. In jest..)

~~~
atemerev
Do you know of any commercial database with a comparable level of
documentation and community support?

I worked with a certain other database from a company starting with "O" and
ending with "racle", and the support/documentation quality was abysmal. Our
support contract was quite basic, but still, I expected it to be better than
the free alternatives.

It was very NOT.

~~~
thornygreb
I'm sorry, but documentation is one thing Oracle is NOT lacking.

[http://docs.oracle.com/cd/E11882_01/nav/portal_booklist.htm](http://docs.oracle.com/cd/E11882_01/nav/portal_booklist.htm)

~~~
esaym
He probably meant "readable" documentation....

~~~
hackbinary
I took it to mean that Oracle's support was worse than Postgres's, despite
paying a fee for it up front.

------
aorth
It seems the parallel aggregation increases efficiency of queries linearly
with regard to CPUs, close to "perfect parallelisation." I have some 8-core
boxes running PostgreSQL and this is still good to know.

~~~
kitd
It looks good up to 30 parallel worker processes. It does start to diverge a
bit after that, but still very impressive.
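
For anyone wondering why this can scale almost linearly: as far as I understand it, each worker reduces its share of the rows to a small partial state, and the leader only has to combine those states and finalize the result. Here is a minimal Python sketch of that two-phase idea for an average. The function names are made up for illustration, and the chunks are processed one after another here rather than in real worker processes.

```python
from functools import reduce


def partial_avg(chunk):
    """Worker step: reduce one chunk of rows to a small (sum, count) state."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Leader step: merge two partial states into one."""
    return (a[0] + b[0], a[1] + b[1])


def finalize(state):
    """Leader step: turn the combined state into the final average."""
    total, count = state
    return total / count if count else None


def parallel_style_avg(values, workers):
    # Split the rows into `workers` chunks; each chunk stands in for the
    # rows one worker would scan.
    chunks = [values[i::workers] for i in range(workers)]
    # In a real system these partial aggregates would run concurrently.
    states = [partial_avg(chunk) for chunk in chunks]
    return finalize(reduce(combine, states, (0, 0)))
```

The expensive part (scanning and summing rows) is spread across the workers, while the combine step only touches one small state per worker, which is why adding cores helps so much until something else becomes the bottleneck.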

------
hvo
I just love PostgreSQL. It is an awesome gift to the open source community. I
just can't believe it is free. Seriously, if you have not tried PostgreSQL
before, please find the time to check it out.

------
JimmyAustin
I'm looking at evaluating Postgres for the data work I do at work, which would
replace the (expensive) MS SQL server we are currently using. Aside from
performance boosts from being able to throw more hardware at the problem due
to lower costs, is Postgres as performant as MS SQL?

~~~
travjones
I think that is a tough question to answer because it really depends on what
you're doing with the data/database. Some queries may run faster on one RDBMS
and some may run faster on another RDBMS. I would like an answer to this
question as well. I did a quick search and I haven't found any modern
comparisons of performance between databases, but it would be great if there
was a more modern and expanded version of the SQLite comparison[0].

I can say that I use Postgres and have not been disappointed with performance
and it costs me $0. I encourage you to try it out. If you're a heavy user of
SSMS, then you will have to change your flow a bit to accommodate psql.
However, once you get the hang of psql, it's actually a pleasure to use.

[0]: [https://www.sqlite.org/speed.html](https://www.sqlite.org/speed.html)

------
lafay
Parallel aggregate over cores in a node is fast, but parallel aggregate over
many cores across many nodes is even faster:

[https://www.kentik.com/metrics-for-microservices/](https://www.kentik.com/metrics-for-microservices/)

[https://www.kentik.com/postgresql-foreign-data-wrappers/](https://www.kentik.com/postgresql-foreign-data-wrappers/)

------
heyplanet
I would assume that when you query billions of rows, the disk is the
bottleneck, not the CPU. What am I missing?

~~~
lazyjones
Several things:

\- disks (SSDs) are very fast now, so cores are saturated more easily (when
queries actually process the data instead of just reading it)

\- multiple parallel (random) reads will likely be faster on HDD and SSD to
some extent (esp. on larger RAID setups)

\- the best optimization is still lots of RAM and people have that these days,
so 100% CPU utilization during queries happens more often than not (the
benchmarking setup seems suitable for more than 1 billion rows...).

------
ogrisel
...on a machine with 32 physical cores.

~~~
rodgerd
And? 32 cores are very affordable.

~~~
taspeotis
They're not always sitting idle waiting for one big OLAP query (well suited to
parallel aggregation) to come in to exhibit a 25x speedup.

Let's say you've got a web application with some cookie-cutter OLTP queries
(which are well suited to serial plans ... no parallel aggregation benefit
here). If your CPUs are already running at 50%, then some big OLAP query is
going to come in and get maybe ... 13x faster.

Which is a nice performance improvement, but not 25x.

These numbers are a good validation of the work done in PostgreSQL, but you
can't look at it and look at your X core machine, and say "my workload is
going to be 25x faster!"
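
To make the back-of-envelope arithmetic explicit, here is a rough Python sketch. It assumes the speedup scales linearly with the number of idle cores, and that the 25x figure was measured with all 32 cores available; both are simplifying assumptions for illustration, not measurements, and the function name is made up.

```python
def estimated_speedup(total_cores, full_speedup, idle_fraction):
    """Rough estimate of the speedup one big query sees on a busy machine.

    Assumes speedup is proportional to the number of idle cores, which is
    a simplification of how a real workload behaves.
    """
    efficiency = full_speedup / total_cores   # speedup contributed per core
    idle_cores = total_cores * idle_fraction  # cores the query can actually use
    # A query never gets slower than its serial plan in this simple model.
    return max(1.0, efficiency * idle_cores)


# 32 cores, 25x when the machine is idle, but half the CPU is already busy:
print(estimated_speedup(32, 25.0, 0.5))
```

Under those assumptions the estimate comes out at roughly half the headline figure, which is where the "maybe 13x" comes from.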

~~~
pilif
> Let's say you've got a web application with some cookie-cutter OLTP queries
> (which are well suited to serial plans ... no parallel aggregation benefit
> here).

In my case, the OLAP queries run against a dedicated slave that's mostly busy
doing nothing, waiting for the next OLAP query to happen. So for me this will
be very, very nice :-)

