To my disappointment, several members of my team have been championing Oracle recently though - presumably because it's a safer option and can be managed by others under an SLA.
How does one argue against this? With Postgres we're not going to get SLAs because we know it very well, but with Oracle we would. It's mostly perception management IMHO, and I'm mostly against it. What killer core features does Oracle have that Postgres doesn't?
While I don't really like Oracle as a company, there are some features Oracle has and Postgres doesn't that can be important for production deployments:
- partitioning that's much more mature
- MERGE statements and parallel execution hints (/*+ parallel */)
- materialized views that refresh on commit
- and oracle is quite a bit quicker - especially with badly written queries
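To make the materialized view point concrete, here's a rough sketch of the difference (table and view names are made up; Oracle's fast refresh also needs a materialized view log and extra COUNT columns, glossed over a bit here):

```sql
-- Oracle: keep an aggregate fresh at commit time
CREATE MATERIALIZED VIEW LOG ON orders
  WITH ROWID (customer_id, amount) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW order_totals
  REFRESH FAST ON COMMIT AS
  SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt
  FROM orders GROUP BY customer_id;

-- Postgres: the view is a snapshot; you refresh it yourself
CREATE MATERIALIZED VIEW order_totals AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders GROUP BY customer_id;

REFRESH MATERIALIZED VIEW order_totals;               -- locks out readers
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;  -- 9.4+, needs a unique index
```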
But those (except for the optimizer) are all Oracle Enterprise features, which means you're paying ridiculous money per core for Enterprise Edition itself, and then paying ridiculous money for the features one by one.
There's mostly no reason to use Oracle Standard Edition instead of Postgres. And I have seen Oracle XE in production, and that's just sad.
EDIT: oh, and I forgot - Postgres has that stupid "we don't want to include optimizer hints in the language, so we use CTEs as an optimization boundary" policy that the developers are stuck on and refuse to change, which means you have to choose between readable code and performant code. Meh.
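A rough sketch of that trade-off, with a hypothetical big_table: the planner will push a predicate into a plain subquery, but not through a CTE, which acts as an optimization fence:

```sql
-- Inlined subquery: the planner pushes the WHERE clause down,
-- so an index on id can be used
SELECT *
FROM (SELECT * FROM big_table) t
WHERE t.id = 42;

-- CTE: an optimization fence, so the predicate is NOT pushed in;
-- big_table may be scanned and materialized in full first
WITH t AS (
  SELECT * FROM big_table
)
SELECT * FROM t WHERE t.id = 42;
```

(Much later releases let you opt out with `WITH t AS NOT MATERIALIZED (...)`, but for the versions discussed here the fence is unconditional.)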
EnterpriseDB, Fujitsu, and others.
> presumably because it's a safer option
Spoken like people who have never dealt with Oracle licensing and compliance. "Safer" for values of "well, I guess we didn't need those millions in this year's budget anyway."
* empty string == NULL in oracle
* ROWNUM: it's kinda like a limit, but not
* DUAL: hah. postgres eat your heart out.
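Sketching those three quirks side by side (emp is a hypothetical table):

```sql
-- Oracle: the empty string IS NULL
SELECT 1 FROM dual WHERE '' IS NULL;   -- returns a row
-- Postgres: '' is an empty string, distinct from NULL
SELECT '' IS NULL;                     -- false

-- Oracle: ROWNUM is assigned before ORDER BY runs, so this picks
-- five arbitrary rows and then sorts them -- not the top five
SELECT * FROM emp WHERE ROWNUM <= 5 ORDER BY salary DESC;
-- the classic workaround nests the ORDER BY:
SELECT * FROM (SELECT * FROM emp ORDER BY salary DESC) WHERE ROWNUM <= 5;
-- Postgres:
SELECT * FROM emp ORDER BY salary DESC LIMIT 5;

-- Oracle needs DUAL for table-less selects; Postgres doesn't
SELECT sysdate FROM dual;  -- Oracle
SELECT now();              -- Postgres
```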
And of course there are business practice differences:
... but tongue out of cheek, if you can afford it, Oracle has solutions for just about every database problem in just about every vertical domain. They have really good engineers, really good testing, and they're #1 for a reason.
* empty string == NULL in oracle
Personally I'd see this as a bug, not as a feature. But your mileage may vary.
I am not sure if I am more or less confused.
+1 for PostgreSQL on this one.
select *, row_number() over (order by id desc) as rownum
They will come back after speaking with people at very large banks (if they go to PgConf NY), very large insurance companies, and very large enterprise tech companies. All of them are moving more of their business into postgresql. Many of them are moving away from oracle.
They will change their perception very quickly.
Say what you want about enterprise software, but Oracle is a beast... if you work for a company willing to shell out for it. That cost will seem sillier, though, as Postgres closes the feature gap.
As for "killer features", the zero-downtime version upgrades have always meant a lot to me.
Third-party applications that support Oracle but not Postgres are sadly the most common reason I see people deploying Oracle these days. Also, Oracle used to be a lot better with larger databases (400+ TB or so), but I have no idea if that is still true.
There are multi-petabyte Oracle RAC instances in existence, while on the other hand a 10 TB+ PG database is considered very large.
This is a first step in closing that gap. Some form of block-level compression would really help as well. For analytics, columnar support will be needed too, and an in-memory option would be nice.
You might be surprised how many queries are bound by memory bus bandwidth and CPU, rather than I/O ... but even I/O bound queries may benefit:
Remember that postgres is a "demand-pull" system. A sequential scan node returns a single tuple at a time, whenever its parent node asks for a tuple. So there can be considerable time between requests for an actual disk block.
Parallelizing the scan should result in more frequent requests to the disk controller, which could then have better opportunity to optimize reads - even for a single disk. Since most of the latency for spinning disks is seek time, having more outstanding block-read requests should help performance.
No but seriously, if your "spinning disk" is even a semi-modern storage controller, it will queue and reorder your parallel requests for the most efficient access to the several, or dozens, or hundreds of physical disks it manages. So why not slam it with all the requests in a single sequence? Because a) if you are set up to handle parallel returns in your code, you don't care about the ordering in which blocks are returned so much and b) the way this is physically implemented is n queues of d depth and you want to fill them all with work.
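If you want to experiment, a rough sketch of the 9.6-era knobs (GUC names changed during development, so check your version's docs; big_table and payload are made up):

```sql
-- allow extra workers per Gather node for this session
SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*)
FROM big_table                       -- hypothetical large table
WHERE payload ~ 'expensive-regex';   -- CPU-heavy filter benefits most
-- look for "Gather" and "Parallel Seq Scan" nodes in the plan
```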
Nowadays you actually can skip a lot of setup automation with RDS, with backups, read mirrors and automated failover taken care of for you, although you frequently hear folks complain about some of those scenarios not working as nicely as you'd expect.
Besides that, sync & async replication have been there nicely since 9.2. 9.3 added timeline switches, needed in practice for cascaded slave setups. 9.4 added replication slots, so the master can keep track of slave sync status, plus the foundations for multi-master & logical replication. 9.5 added pg_rewind, which speeds up resyncs.
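A minimal streaming-replication sketch using those pieces, with made-up host, user, and slot names (exact settings vary by version):

```
# postgresql.conf on the primary (illustrative values)
wal_level = hot_standby                  # called "replica" in later releases
max_wal_senders = 3
synchronous_standby_names = 'standby1'   # omit this line for async replication

-- run on the primary: reserve WAL for the standby (9.4+)
SELECT pg_create_physical_replication_slot('standby1_slot');

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary.example port=5432 user=replicator'
primary_slot_name = 'standby1_slot'
```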
The oldest one is Postgres-X2 (formerly Postgres-XC), a project sponsored by NTT. It implements fully consistent multi-master replication and partitioning. After many, many years it apparently still isn't production-ready, and it seems to have a significant scalability bottleneck tied to its design. Another issue is that, being a fork, it has to be continually updated from the mainline; it's currently based on 9.3.
Postgres-XL is apparently a merger of Postgres-XC and a different implementation called StormDB that was bought by a small company called TransLattice. (TransLattice also sponsored or acquired an earlier project that went nowhere, Postgres-R.) Unlike XC/X2, they say they aim to contribute changes back to the mainline, and they also claim their distributed query model is superior.
With the commercial support behind it (it's used as the basis of a commercial product), it's possible that this is something that will be usable. Unfortunately, it seems very quiet and not very open-sourcy; most of the development seems to be by just one guy, and nobody seems to be using it in production at this point.
Perhaps I missed it, but I don't think the post specified how many rows were in the source. It could have been more than a million. Having a relatively expensive filter (a regex) that eliminates all results keeps the performance picture clearer: anything done with a tuple that passes the filter would apply to both the regular and the parallel scan cases, so eliminating the results highlights just the difference between regular and parallel scans.
Keep in mind these are only the first pieces; not all of the patches for this feature have been merged yet. Some are still in progress.