
Postgresql 9.6: Support parallel aggregation - eMerzh
http://www.postgresql.org/message-id/flat/E1ahzxY-0004qA-GJ@gemulon.postgresql.org#E1ahzxY-0004qA-GJ@gemulon.postgresql.org
======
pilif
I am totally impressed by the work 2ndQuadrant is doing: many of the recent
innovations in Postgres have come from them, and all of that without any
obligation to do so. The BSD license would allow them to add all of these
things to a proprietary fork that they could sell.

Or they could just release their own fork under an open license and focus on
just adding features.

But that's not how they work. All of their contributions are pushed upstream,
which is a very considerable effort given how conservative Postgres is about
accepting new functionality.

Aside from that: there are 2ndQuadrant employees in the #postgres IRC chat room,
helping people with daily support issues. This is their core business and yet
they still help people for free (within reason). This is bloody impressive.

If I'm ever at a point where I need help with a Postgres issue, they will be
the very first on the list of companies I would consider.

Thank you very much for all that you are doing.

~~~
sandGorgon
I wish they would build an RDS alternative. I would use them in a heartbeat.
There are all these companies/startups who are not in the target market for
consulting... but who would gladly pay for hosted Postgres.

~~~
pilif
Just a little heads-up: Postgres is very easy to run on your own, and even in
the default configuration it runs well for a considerable number of users.

Over here, we only started seriously thinking about what we were doing once
we were handling on the order of 10K transactions per second.

Once you are at that level, you're probably going to need optimizations
specific to your application, and a generic database host might not be able
to help you anyway.

I get that as a startup you don't have people for everything, but can you
really afford to outsource the knowledge about the central piece of your
application where all the value is stored?

~~~
sandGorgon
I have a fairly sophisticated Docker VM with WAL-E, replication and everything
running. Yes, I am aware of what you're saying.

Hosting the DB has never been knowledge about the central piece of the
application. You may disagree (and I respect that), but for me it has been
similar to building a RAID-10 dedicated server vs. using AWS.

You can argue that maintaining data resiliently is a critical part of the
organization - but at what stage? In the first 3 years of a startup, you are
iterating on the product. You are pretty much agonizing over drop rates and
conversion rates for every single minute of your life. The possibility that
your data store will crash NEEDS to come a distant second.

Dropbox has only just moved out of AWS. Storage was probably the most critical
part of Dropbox, but it chose (rightly) to focus on customers first.

What you are talking about will come - but it will come after some time. And
until then I would love to pay some Postgres devs to run a hosted DB for
me... like RDS.

------
pgaddict
FWIW, a simple benchmark by David Rowley (one of the authors of the patch) is
here:

[http://blog.2ndquadrant.com/parallel-aggregate/](http://blog.2ndquadrant.com/parallel-aggregate/)

~~~
andruby
The article claims a near linear speedup for a very large query. On a 64 core
machine, the query takes:

1375s with 0 workers

131s with 10 workers

56s with 30 workers
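(For context, how many workers a query may use is governed by the
parallel-worker settings; a sketch using the setting name 9.6 shipped
with and a hypothetical `sales` table:)

```sql
-- Allow up to 10 workers per Gather node (9.6 setting name; earlier
-- development snapshots called this max_parallel_degree).
SET max_parallel_workers_per_gather = 10;

-- An aggregate over a large table can now be split across workers,
-- each computing a partial result that is combined at the end:
SELECT count(*), sum(amount)
FROM sales;   -- "sales" is an illustrative large table
```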

------
harel
It looks like in 9.4 PG went after the document databases, and now they are
after Oracle. I don't know why I get excited about point releases of Postgres,
but I do.

~~~
stuartaxelowen
Because they are building an open source, industry-leading database. What's not
exciting about that?

~~~
pbreit
Is anyone still starting with MySQL anymore?

~~~
harel
"MySQL" is dead to me. Literally. Anything Oracle touches becomes a potential
legal liability. It's MariaDB now, and although I do maintain a legacy 100GB+
MariaDB database, I will never use it again for new projects. The thing about
Postgres is that once you start using it, you understand there is really very
little competition in either the open or commercial worlds. My previous big
project relied on MongoDB for statistical data. At the time there was no other
compelling document database for our use case. Today, that decision would no
longer be relevant, because if I can get the same functionality and performance
out of Postgres I will choose Postgres. The ability to combine SQL with
"NoSQL" is just too compelling.

------
chris_wot
There is going to come a point where Postgres passes Oracle in terms of
features and performance.

There will also come a point where Postgres has enough of the features needed
by most businesses that they'll choose it even though it doesn't match Oracle
on a feature-by-feature basis.

We have hit peak Oracle. From this point forward it's going to be hard for
Oracle to regain momentum. Expect a lot of FUD against Postgres - the more you
see, the more worried you know Oracle execs are becoming.

~~~
reactor
First-hand experience: I work for a Fortune 100 bank, and our department was
heavily reliant on Oracle Exadata, paying between $1-2 million (I don't know
the exact figure). We moved to PostgreSQL and Cassandra. The cost factor was
not the only motive ($1-2 million was actually pennies for this bank), but
our Director was convinced that Exadata was unnecessary.

~~~
chris_wot
Out of interest, what are the benefits of Exadata, or is it really a bit of
the emperor's new clothes? Can you not get similar performance with similarly
good hardware?

~~~
jerven
Exadata hardware used to be competitively priced (at least in the X4 age);
it's the licensing for the software on top that makes it really expensive.

The other thing is that some Oracle features only work on Exadata or other
engineered systems: hybrid columnar compression for one, and query offload,
as well as some forms of the in-memory stuff. The problem is that these are
very high-end features, and most of us live happily without them. Postgres is
going after the bread in the Oracle DB market, while Exadata is the exclusive
cheese shop: a much smaller market, even if the margins are better.

------
filereaper
Short of doing a deep dive into PostgreSQL, do any universities use PostgreSQL
as a basis for their DB implementation courses? I wanted to dive into Postgres
and was hoping for some training wheels. Thanks.

~~~
__jal
PG is, IMHO, probably one of the best code bases for this.

I'm far and away not a database hacker, but have read parts of the code at
various times to better understand what I was seeing. The codebase is
incredibly well written and organized, and the documentation (both developer
and user) is top-notch.

The only problem I can think of with using it in a course would be choosing
what to focus on with only a semester. Add in the long revision history and I
think there are also multiple theses in there on the sociology of open source.

------
kazagistar
Blog post with some details and benchmarks by one of the authors of the patch:

[http://rhaas.blogspot.com/2015/11/parallel-sequential-scan-is-committed.html](http://rhaas.blogspot.com/2015/11/parallel-sequential-scan-is-committed.html)

~~~
pgaddict
That's not about parallel aggregate but about parallel scan.

Also, Robert is not the author of the patch; he reviewed and committed it.
The actual authors are listed in the commit message: David Rowley and Haribabu
Kommi.

------
kbenson
I would love to see some sample benchmarks for the type of gains you might see
from this. I always see these interesting new PostgreSQL features being
posted, and they sound cool, but it's hard to know how much they help in
practice. I understand it's often highly workload and data dependent, but
something would be better than nothing.

~~~
olavgg
It would help a lot for typical data warehouse queries that involve
aggregating millions to billions of rows. EXPLAIN ANALYZE will tell you if
it's relevant for you ;-)
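(A quick way to check, sketched with illustrative table and column names:
run the query under EXPLAIN ANALYZE and look for parallel nodes in the plan.)

```sql
EXPLAIN ANALYZE
SELECT region, sum(amount)
FROM sales            -- hypothetical fact table
GROUP BY region;
-- A parallel plan shows a "Finalize Aggregate" above a "Gather"
-- node whose workers each run a "Partial Aggregate"; a serial
-- plan shows a single plain "Aggregate" node instead.
```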

~~~
kbenson
> EXPLAIN ANALYZE will tell you if it's relevant for you

That assumes I run PostgreSQL already, which I don't. I _am_ interested in
possibly switching at some point if it's worthwhile, but it's hard to muster
the effort to do concerted testing of a representative sample of my data,
including possibly changing how queries are done to take advantage of specific
features, when I have little information to go on.

Not that I expect PostgreSQL to do in-depth analysis of everything, but it
would be great from both a promotional and technical standpoint if there was
something like "we've seen something like X% speedup of queries utilizing Y,
and up to Z% speedup in extreme cases." I mean, I assume they at least have
rudimentary numbers for this, otherwise they would be making blind changes
without knowing whether it improved or _degraded_ performance. Providing just
enough to get people interested in doing their own benchmarking (and possibly
publishing them) would be great for everyone.

Edit: One of the top comments is actually what I'm talking about (but
apparently for a different feature). So it does get done, which is really nice. :)

Edit2: Now there's the link to the blog post for this feature. :)

~~~
Someone
If you aren't using it now, and thus don't know how fast or slow it would be
on your data and queries, why would you care about percentage speed ups?

It could be a 1% speed up from insanely fast to slightly more insanely fast,
or a 100% speed up from unbearably slow to just extremely annoyingly slow.

I gather that people considering migrating who want some reassurance before
investing time in testing should be more interested in such things as
benchmarks that show whether PostgreSQL can saturate your hardware for various
query types, benchmarks that compare its query planner against those of
competitors, and benchmarks that show how well it works under load.

~~~
kbenson
>> I am interested in possibly switching at some point if it's worthwhile

> If you aren't using it now, and thus don't know how fast or slow it would be
> on your data and queries, why would you care about percentage speed ups?

I'm working under the assumption that similar operations between PostgreSQL
and MySQL for operations that aren't using an advanced feature that the other
plainly doesn't support will be _roughly_ comparable in performance. I
understand this isn't true, but it's useful in that it gives me a baseline to
work with when reasoning without forcing me to migrate my application (or at
least a significant portion of the data) to PostgreSQL just to test. If I'm
then confronted with a feature that one supports and the other doesn't that
has the capability to in some cases reduce the query time by an order of
magnitude, then it starts approaching the criteria needed for me to invest
that time to see if it's worth migrating.

> I gather that people considering migrating who want some reassurance before
> investing time in testing should be more interested in such things as
> benchmarks that show whether PostgreSQL can saturate your hardware for
> various query types

Isn't that exactly what this is? Throwing CPU cores at aggregating data to
reduce time? I asked for benchmarks regarding this specific feature, to see
what type of gain we were talking about with this. I didn't ask for
comparisons to other databases because those benchmarks are generally harder
to perform fairly, so I would have been (and am) happy with comparative gain
benchmarks.

------
HoyaSaxa
A slight tangent, but I am still shocked that many open source projects lack
downloads via HTTPS (including PostgreSQL). Sure, you can offer some layer of
security by signing the distributions, but ultimately users are lazy.

~~~
harel
Out of curiosity what does it matter if your postgres download is via http or
https? Is there any benefit of hiding your download from prying eyes?

~~~
brazzledazzle
To put it another way it's less about prying eyes and more about MitM attacks.

~~~
derefr
I really do wonder whether there's a verified "path of trust" where I can get
a copy of e.g. Debian (without knowing what the canonical website for Debian
is), install it on a machine, connect it to the internet and update it,
without being at all vulnerable to a MitM attack at any step.

I would be suspicious that it's the very first step that poses the most
danger: there's little way to know whether minor-distro-X is "properly" hosted
at minordistrox.com or distrox.org or even distrox.github.io, other than by
blindly trusting the relative PageRanks of each.

---

Something that would be very nice, I think, would be a way to rely on the
package-signing infrastructure of the OS you've already got (and trust), to
guarantee for you the copies of any new OS images—even for different OSes!—you
download. Sort of the same way you download new versions of OSX through the
Mac App Store, but going a bit further.

Now that I think about it, the browser-preloaded HSTS list might do the
trick... _if_ it was coupled with a mapping of 'important well-known
downloads' to a single (HSTS-preloaded) domain you should be allowed to get
those downloads from. (Presumably with heuristic detection, so it could find
"something looking mostly, from its metadata, like a Debian ISO image.") Then
your browser would just tell you you're being phished if you're trying to get
a Debian ISO from debianisamazing.info.

------
allan_s
It's pretty interesting, especially for those of us who are creating analytics
tools on top of PostgreSQL. The increased performance would certainly permit
fewer denormalized workarounds, like creating "manual" aggregates directly in
the database (with all the problems of keeping them in sync with the rest).

And I'm still here waiting for 9.5 to arrive in Amazon RDS :(
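(The "manual" aggregate workaround mentioned above might look something like
this, with hypothetical names; faster aggregation makes the raw query
underneath it cheap enough to run directly:)

```sql
-- Denormalized workaround: a summary table that must be kept
-- in sync with the raw data by application code or triggers.
CREATE TABLE daily_sales_summary (
    day   date PRIMARY KEY,
    total numeric NOT NULL
);

-- The raw aggregate it stands in for, which parallel
-- aggregation can now speed up without the extra table:
SELECT date_trunc('day', created_at)::date AS day,
       sum(amount) AS total
FROM sales
GROUP BY 1;
```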

------
tiglionabbit
Does it have generic upserts yet?

~~~
ddorian43
What do you mean by generic? Upserts were implemented in 9.5.

~~~
tiglionabbit
As in you don't need to write a unique PL/pgSQL routine for each kind of
upsert you want to do.

Oh, it does have upserts now. Awesome =]
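(For reference, the generic upsert added in 9.5 needs no PL/pgSQL; table and
column names here are illustrative:)

```sql
-- Insert a row, or on a unique-key conflict update it in place.
INSERT INTO counters (key, value)
VALUES ('pageviews', 1)
ON CONFLICT (key)
DO UPDATE SET value = counters.value + EXCLUDED.value;
```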

