
Parallel Sequential Scan is Committed to PostgreSQL 9.6 - turrini
http://rhaas.blogspot.com/2015/11/parallel-sequential-scan-is-committed.html
======
sgt
Postgres is going in one direction - for the better. And as a long-time
Postgres user I've been really happy with it for years. Now with JSONB support
I'm even happier, as I can build powerful RDBMS + "NoSQL" apps that
outperform traditional NoSQL databases.

To my disappointment several members of my team have been championing Oracle
recently though - presumably because it's a safer option and can be managed by
others under an SLA.

How does one argue against this? With Postgres we're not going to get SLAs
because we know it very well, but with Oracle we would. It's mostly perception
management IMHO, and I'm mostly against it. What killer core features does
Oracle have that Postgres doesn't?

~~~
glogla
> What killer core features does Oracle have that Postgres doesn't?

While I don't really like Oracle as a company, there are some features that
Oracle has and Postgres doesn't that can be important for production
deployment:

- flashback

- partitioning that's much more mature

- merge /*+ parallel

- materialized views that refresh on commit

- and Oracle is quite a bit quicker - especially with badly written queries

But those (except for optimizer) are all Oracle Enterprise features which
means you're paying ridiculous money per-core for enterprise edition itself,
and then you're paying ridiculous money for the features one by one.

There's mostly no reason to use Oracle Standard Edition instead of Postgres.
And I have seen Oracle XE in production, and that's just sad.

EDIT: oh and I forgot - Postgres has that stupid "we don't want to include
optimization hints in the language, so we use CTEs as an optimization boundary"
policy that the developers are stuck on and refuse to change, which means you
have to decide between readable code and performant code. Meh.
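(A sketch of the trade-off being described, with a hypothetical `orders` table. At the time of this thread - and until PostgreSQL 12 added CTE inlining - Postgres always materialized a CTE, treating it as an optimization fence:)

```sql
-- CTE form: readable, but Postgres materializes the CTE first,
-- so the customer_id predicate is applied only afterward and
-- cannot use an index on orders(customer_id).
WITH recent_orders AS (
    SELECT * FROM orders
    WHERE created_at > now() - interval '30 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- Equivalent subquery form: the planner can push the
-- customer_id predicate down into the subquery and pick
-- an index scan, at some cost in readability.
SELECT *
FROM (SELECT * FROM orders
      WHERE created_at > now() - interval '30 days') AS recent_orders
WHERE recent_orders.customer_id = 42;
```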

~~~
sgt
Flashback is indeed a killer feature. I'd like to see something like Flashback
in Postgres. PITR is possible with Postgres, but it takes a very long time to
replay log files to a certain point if you need to restore data. I know this
because I've had to do this quite a few times.
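(For reference, PITR in the 9.x era was driven by a `recovery.conf` file placed in the data directory on top of a base backup; the archive path and target timestamp below are hypothetical:)

```ini
# recovery.conf (PostgreSQL 9.x) -- paths and timestamp are examples
restore_command = 'cp /mnt/wal_archive/%f "%p"'
recovery_target_time = '2015-11-11 14:30:00'
```

The replay cost the parent comment mentions comes from the fact that every archived WAL segment between the base backup and the target time must be fetched and applied, which is why restores to a recent point can still take a long time if base backups are infrequent.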

~~~
je42
Flashback is useless if there are schema changes.

------
losvedir
Naive question here, but why wouldn't parallel scans actually be worse when
reading from a spinning disk? And even for cached data I'd expect the
bottleneck to be I/O still. Or is this just important because it's laying the
groundwork for parallel queries, and we shouldn't expect any improvements from
this feature alone?

~~~
andy_ppp
What do SSDs look like when reading in parallel?

~~~
meshko
SSDs contain multiple NAND chips, so a well-designed SSD controller will take
advantage of that (including data striping, essentially turning your drive
into a tiny RAID). Of course it's not clear how good consumer SSDs are at
that, but any server-grade drive you'd use in your db server should support
parallel reads. It gets tricky, though, because the drive reads data so
quickly it might saturate the I/O bandwidth even with a single request; I've
never done that math. All I know is that SSDs make my queries magical :)

------
mixmastamyk
PG is awesome, but I remember people often complaining about its replication
features a few years ago. Have they been improved?

~~~
dagw
Improved, yes. Fixed, no. Basically, if you want proper full replication, you
won't get it out of the box but will have to use various third-party tools.
That being said, 9.4 added a bunch of the fundamental changes that laid the
foundation for full replication; now we're just waiting for PG to finish
exposing those features with userland tools.

~~~
DrJokepu
What do you mean by "proper full replication"?

~~~
barrkel
Master-master

~~~
rgacote
The EnterpriseDB folk ([http://www.enterprisedb.com/products-services-training/products-overview/xdb-replication-server-multi-master](http://www.enterprisedb.com/products-services-training/products-overview/xdb-replication-server-multi-master))
and 2ndQuadrant ([http://2ndquadrant.com/en-us/resources/bdr/](http://2ndquadrant.com/en-us/resources/bdr/))
both have multi-master replication.

------
gtaylor
It seems like this could make Postgres even better for large-scale data
warehousing. Very exciting!

~~~
lobster_johnson
Take a look at Greenplum, which was open-sourced recently. It's an MPP fork of
Postgres: [http://greenplum.org](http://greenplum.org).

------
mrmondo
This is going to be fantastic for us. I'm assuming 9.6 is still some way away
though.

~~~
narsil
Dates selected during PgCon this year for 9.6 show a release in October 2016:
[https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Meeting#9.6_Schedule](https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Meeting#9.6_Schedule)

~~~
elchief
That's what they said for 9.4 and 9.5 too. More likely December 2016

------
TheCondor
Has anyone tried Postgres-XL? It has this kind of logic at the server level,
no?

~~~
lobster_johnson
I don't think any of the current open-source forks are ready for production.

The oldest one is Postgres-X2 [1] (formerly Postgres-XC), a project sponsored
by NTT. It implements fully consistent multi-master replication and
partitioning. After many, many years it apparently still isn't
production-ready, and it seems to have a significant scalability bottleneck
that's tied to its design. Another issue is that, being a fork, it has to be
continually updated from the mainline; it's currently based on 9.3.

Postgres-XL [2] is apparently a merger of Postgres-XC and a different
implementation called StormDB that was bought by a small company called
TransLattice. (TransLattice also sponsored or acquired an earlier project that
went nowhere, Postgres-R.) Unlike XC/X2, they say they aim to contribute
changes back to the mainline, and they also claim their distributed query
model is superior.

With the commercial support behind it (it's used as the basis of a commercial
product), it's possible that this is something that will be usable.
Unfortunately, it seems very quiet and not very open-sourcy; most of the
development seems to be by just one guy [3], and nobody seems to be using it
in production at this point.

[1]
[https://github.com/postgres-x2/postgres-x2](https://github.com/postgres-x2/postgres-x2)

[2] [http://www.postgres-xl.org/](http://www.postgres-xl.org/)

[3] [http://git.postgresql.org/gitweb/?p=postgres-xl.git;a=summary](http://git.postgresql.org/gitweb/?p=postgres-xl.git;a=summary)

------
meshko
That example benchmark he has is weird. You'd probably want to test this on
something with at least a million rows, taking more than 100ms, and probably
actually returning some data. A sub-second difference like this can be
attributed to solar flares. Also, it's not clear that it isn't just the result
of the table getting cached. Did they implement a feature like this and not
run a real performance test?

~~~
quizotic
Robert Haas is the real deal, and will certainly have tested this feature.

Perhaps I missed it, but I don't think the post specified how many rows were
in the source table; it could have been more than a million. Having a
relatively expensive filter (a regex) that eliminates all results keeps the
performance picture clearer: anything done with a tuple that passes the filter
would apply to both the regular and the parallel scan cases, so eliminating
the results highlights just the difference between regular and parallel scans.
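(A sketch of that kind of test, with a made-up table and row count. In the 9.6 development tree at the time the relevant setting was `max_parallel_degree`; it was later renamed `max_parallel_workers_per_gather` before release:)

```sql
-- Hypothetical test table, large enough that the scan itself dominates.
CREATE TABLE scan_test AS
    SELECT g AS id, md5(g::text) AS payload
    FROM generate_series(1, 10000000) g;

-- Baseline: parallelism disabled.
SET max_parallel_degree = 0;
EXPLAIN ANALYZE SELECT * FROM scan_test WHERE payload ~ 'nonexistent';

-- Parallel scan: the regex matches no md5 hex string, so no rows are
-- returned and the timing reflects only scan + filter cost.
SET max_parallel_degree = 4;
EXPLAIN ANALYZE SELECT * FROM scan_test WHERE payload ~ 'nonexistent';
```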

~~~
meshko
I really, really doubt that it can process a million uncached rows in under a
second. But yeah, I just tested it: a million was not a good example, because
that takes only a bit over a second on decent hardware. So perhaps I should
have said 10 million. I agree that not returning any results is the right test.

~~~
Jweb_Guru
It can definitely process a million uncached rows in well under a second :P
Have you used Postgres recently?

~~~
anarazel
You can even insert well above a million rows/second...

