At 22 years old, Postgres might just be the most advanced database yet (arcentry.com)
314 points by max_sendfeld 9 days ago | 172 comments

I'm a big fan of Postgres and it's basically the only DB I use, but the title is naive and the post is light on content. SQL Server, Oracle, and DB2 are crown-jewel products; each has substantial, hard-to-implement niche or extreme-scale features and other sweet spots that no open source database has come close to matching after two decades, and they weren't sitting still during that time either.

Agreed. I have been using Postgres and Oracle in production since ~2000 (Oracle 8i). Postgres is an amazing project, but the Oracle core database has billions of R&D dollars behind it. Don't confuse the Oracle core database with the awful applications they hang off of it. Oracle will still destroy PG for most OLAP workloads; you just have to decide whether the insane licensing cost is worth it.

Also just want to add an Oracle developer's post that I find hilarious...


> https://news.ycombinator.com/item?id=18442941

This was a gem. Really makes you appreciate not having to work on such a codebase.

One thing Postgres definitely doesn't do, and I wish it did, is run embedded. This is usually where people use SQLite: for example, apps that are clients but still need to do substantial work on the client side.

The problem is, SQLite has a few limitations that it doesn't make obvious: some of the more complex SQL syntax that Postgres supports is missing, but more importantly, it can only ever do one write or read to the database predictably. (Don't believe me? Try this: https://gist.github.com/mrnugget/0eda3b2b53a70fa4a894)

I know that SQLite technically supports concurrent writes and reads, but it predictably leads to 'database is locked' issues. In practice you have to use it in a fashion where it only ever does one thing: read, or write. No multiple writes. No single writer with multiple readers, no single writer with a single reader. Just read — or write. (Edit: WAL doesn't help with this either.)

Or you're in a world of hurt. I'd love to be able to move to Postgres just to get away from that.

> some of the more complex SQL syntax that Postgres supports are missing, but more importantly, it can only ever do one write or read to the database predictably

I don't know...it's recently added upsert compatible with Postgres syntax as well as support for window functions.
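For illustration, a minimal sketch of that Postgres-compatible upsert syntax, available in SQLite since 3.24 (the `kv` counter table below is invented):

```python
import sqlite3

# Hypothetical "kv" counter table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO kv VALUES ('hits', 1)")

# The same ON CONFLICT ... DO UPDATE clause Postgres uses; "excluded"
# refers to the row that failed to insert.
conn.execute("""
    INSERT INTO kv (k, v) VALUES ('hits', 1)
    ON CONFLICT (k) DO UPDATE SET v = v + excluded.v
""")
v = conn.execute("SELECT v FROM kv WHERE k = 'hits'").fetchone()[0]
print(v)  # 2
```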

Additionally, sqlite has a sophisticated json extension.
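A small, hedged illustration of those JSON functions (assuming a build with the JSON1 extension, which most modern SQLite builds include; the document contents are invented):

```python
import sqlite3

# json_extract pulls scalars out of a JSON document; json_each
# turns a JSON array into rows.
conn = sqlite3.connect(":memory:")
doc = '{"name": "ada", "tags": ["db", "sql"]}'

name = conn.execute("SELECT json_extract(?, '$.name')", (doc,)).fetchone()[0]
tags = [row[0] for row in
        conn.execute("SELECT value FROM json_each(?, '$.tags')", (doc,))]
print(name, tags)  # ada ['db', 'sql']
```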

> it can only ever do one write or read to the database predictably

WAL mode addresses this, allowing an arbitrary number of readers with a writer.

For concurrent applications just use a dedicated write thread and enqueue writes...your example is contrived. Connection management, transaction scope management, etc, can all work in your favor. Writes occur in a few milliseconds, meaning you only need to hold the exclusive lock very briefly in most situations.

More information here: http://charlesleifer.com/blog/going-fast-with-sqlite-and-pyt...
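A rough sketch of that dedicated-write-thread pattern (the queue and function names here are invented, not from any library):

```python
import os
import queue
import sqlite3
import tempfile
import threading

DB_PATH = os.path.join(tempfile.mkdtemp(), "app.db")
write_q = queue.Queue()

def writer_loop():
    # The only connection that ever writes; all writes serialize here.
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    while True:
        item = write_q.get()
        if item is None:                   # sentinel: shut down
            conn.commit()
            conn.close()
            return
        conn.execute("INSERT INTO log (msg) VALUES (?)", (item,))
        conn.commit()

t = threading.Thread(target=writer_loop)
t.start()
for i in range(100):                       # any thread may enqueue writes
    write_q.put(f"event {i}")
write_q.put(None)
t.join()

n = sqlite3.connect(DB_PATH).execute("SELECT count(*) FROM log").fetchone()[0]
print(n)  # 100
```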

The gist I linked is extremely basic, just insert and read, really, and it does use WAL by default. It'll still end up with a locked database, and from there it's murky waters — recovering from a locked database is not trivial and can cause data loss if handled improperly.

Not all writes take a couple of milliseconds. I regularly have transactions that take 15+ seconds, and it's not because the input data is large; it's that the SQL is complex enough to warrant recursion (storing graph data).

This is almost always a driver/wrapper problem, which seems to be the case here. SQLite supports multiple readers without issue that only block slightly when a write transaction is committed, and there's also WAL mode with even better concurrency that avoids locks altogether for reads.

I've had the same problem with Python, not just Go. If it's a driver problem, it's a very widespread one, which would suggest the SQLite driver API is in need of some love if so many folks who write SQLite drivers are making the same mistake.

The Python driver is buggy-by-default. This has been documented in recent versions [1], and is kinda sane for the undo log, but not so good if you want to use snapshot isolation. I guess many bindings would make the same decision. Basically they delay the emission of BEGIN until the first DML/DDL is executed.

If you fix this (which is easy), then WAL works just fine. Note that transactions, as is standard, span from BEGIN to COMMIT/ROLLBACK.

See e.g. https://docs.sqlalchemy.org/en/latest/dialects/sqlite.html#t... and https://docs.sqlalchemy.org/en/latest/dialects/sqlite.html#p...

[1] Really it's more like "encoded": https://docs.python.org/3/library/sqlite3.html#controlling-t... -- you already need to know the issue to understand that this section specifies the root cause.
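Concretely, the easy fix looks something like this: pass isolation_level=None so the sqlite3 module stops injecting its own deferred BEGIN, and issue transaction control yourself.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# driver no longer delays BEGIN until the first DML statement.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v INTEGER)")

conn.execute("BEGIN IMMEDIATE")   # grab the write lock at BEGIN, not later
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("COMMIT")

n = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(n)  # 1
```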

Interesting! The last time I used this with Python was 2014, and this, alongside the botched migration to 3 was the reason I moved on from Python to Go. Glad to hear things have been improving.

SQLite has billions of global installs and is tested and certified under incredibly stringent conditions for use in applications like aircraft avionics. Unless there is serious evidence to the contrary, it's pretty much guaranteed that it's not the cause of the issue.

How many drivers have you actually tested? There are 2 comments from 2016 on that thread saying other drivers do work fine.

> it predictably leads to 'database is locked' issues

Do you mean your application actually throws the error "database is locked," and dies without simply waiting for the lock to release? This behavior definitely goes against the grain of the documentation. From the SQLite website:

"SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds." --- https://www.sqlite.org/whentouse.html

I have to wonder, with the others here, whether the bug is in the driver or anywhere but SQLite. If it were that fragile, it would have blown up by now. SQLite is ubiquitous: the iPhone, iTunes, Android, Chrome, Windows 10, McAfee, the Red Hat Package Manager (RPM), commercial flight software, and many other pieces of software --- https://www.sqlite.org/famous.html
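The "writers queue up" behavior is visible through the sqlite3 module's `timeout` parameter (default 5 seconds): a blocked writer waits for the lock rather than failing immediately. A small sketch (temp path and table name invented):

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "queue.db")

# check_same_thread=False so a timer thread may commit on connection A.
a = sqlite3.connect(path, timeout=10, check_same_thread=False)
a.execute("CREATE TABLE t (v INTEGER)")
a.commit()

a.execute("BEGIN IMMEDIATE")              # A holds the write lock...
threading.Timer(0.5, a.commit).start()    # ...and releases it shortly

# B queues up: this blocks until A commits instead of raising
# "database is locked", because the timeout exceeds the wait.
b = sqlite3.connect(path, timeout=10)
b.execute("BEGIN IMMEDIATE")
b.execute("INSERT INTO t VALUES (1)")
b.commit()

n = b.execute("SELECT count(*) FROM t").fetchone()[0]
print(n)  # 1
```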

As a data point with SQLite, there's a branch called the "Server Process Edition" which has support for 16 concurrent writers:



I haven't played around with it myself yet, and I'd expect it to be more experimental than something to use in production.

Can you be more specific? I've always wondered why people use them.

SQL Server has columnstores, in-memory tables with compiled procedures (which can be combined with columnstores), graph processing, built-in machine learning extensions, AlwaysOn availability/replication groups, Stretch and Polybase to read from external sources, integrated encryption, fantastic tooling, and the best optimizer with batch/vectorized processing.

It still doesn't have a proper upsert or JSON columns, so PostgreSQL wins on small usability features, but SQL Server is a serious workhorse when it comes to large-scale operational and analytical systems, with unmatched performance. PG barely has parallel queries or decent connection management, and is a decade away from catching up to SQL Server in these areas.

> It still doesn't have proper upsert or JSON columns

Upsert (MERGE) was added in SQL Server 2008 (although it's chock full of known bugs): https://www.mssqltips.com/sqlservertip/3074/use-caution-with...

JSON was added in SQL Server 2016, although indexing it is still tricky: https://docs.microsoft.com/en-us/sql/relational-databases/js...

That's what I meant by "proper". It can do it, but PG does it better. It's one area that I wish they would invest time into considering the large developer impact it would have.

My understanding is they got burned by all the work put into the XML datatype and don't want to repeat it with JSON, hence the more conservative approach.

How do Stretch and Polybase compare to Foreign Data Wrappers?

Is the integrated encryption comparable to pgcrypto?

FDWs are much more flexible, as they are basically an endpoint based on the SQL/MED standard that can be implemented by any module and there are dozens of options out there for connections.

Stretch and Polybase are much more limited to Azure storage and Hadoop systems with deep integration into T-SQL, and scale-out using clustering.

At work, we run SQL Server, because we use several third-party applications that use it. And once it's there and paid for, if we need another database, it is easier to just dump it on the server that is already there, rather than run a second RDBMS server.

I am a little torn about this - on the one hand it strikes me as laziness that all these Windows developers will just pick SQL Server because it is so convenient (good integration with Visual Studio, drivers are present by default in .Net Framework). On the other hand, in terms of performance and reliability, I cannot say a single bad word about SQL Server. Another thing I like is that one can use accounts from an Active Directory domain as database accounts, so you do not have to duplicate group structures and such, if you use elaborate database permissions.

I have come to like T-SQL, which integrates some procedural features into SQL, so SQL Server does not need a separate language for writing stored procedures, triggers, and so forth. Where T-SQL is not enough, one can use basically any .Net language, but there are some restrictions to it, and I have never used this feature.

Postgres offers many features SQL Server lacks, e.g. the array of data types available in Postgres is amazing, or the fact that it has enums. SQL Server has some features Postgres lacks, for example the Service Broker, which is very interesting, but apparently non-trivial to use correctly.

> Where T-SQL is not enough, one can use basically any .Net language, but there are some restrictions to it, and I have never used this feature.

Embedded R in SQL Server as well.

Postgres is my default database choice, or SQLite if it's small, but SQL Server is a solid, high performance product and I'll gladly run it for the most demanding workloads.

I'm not advocating their use and the default should be right and smallest fit for the job, which means sqlite or Postgres nearly always. But too often people get in this mindset that old commercial products are meritless or whatever.

An example for Oracle would be RAC. This is a mature shared data architecture to provide high availability and horizontal server scaling to a DB. Deploying this is a fairly standard Oracle DBA activity. Getting something similar with open source tools is a big lift and permanent support commitment and will be for a while. Meanwhile these three commercial DBs have provided continuity for decades for features like this.

> But too often people get in this mindset that old commercial products are meritless or whatever.

There is a lot of that assumption. And then, there is the vendor lock-in licensing terms that commercial DB vendors utilize, in that database switching cost is just prohibitively high and impractical.

The lock-in is not from licensing, it's from the features and performance. If you aren't using those capabilities then you didn't need that product anyway, and can use any number of migration tools to switch to a different database.

Yes, most factors in lock-in are features, ecosystem, and replacement cost. But in licensing negotiations, license up-sells and auxiliary products & services definitely make the lock-in tighter - since you've just made a big investment, why switch?

Scaling/replication is the usual thing. Postgres certainly has a better story here than it used to, but if you want to do serious scale replication it's very much in roll-your-own territory.

Like, imagine you have a few hundred terabytes of data you need to keep in-sync globally, between multiple datacenters, with many servers in each DC.

I was under the impression that Spanner-like databases (CockroachDB and FaunaDB) are the only ones that can do cross-datacenter multi-row transactions at scale. They were designed exactly for this; otherwise the tradeoffs are just too severe for that use case (e.g. too slow, or too brittle in availability).

By cross-datacenter I mean datacenters far away from each other, in different geo regions.

You can do single master, many read replicas, which works fine for many, many use cases.

Oracle SQL Server is like 50 years old

I thought Oracle was founded in 1977 so it would be closer to 40 years.

Correct! First available release was in 1979 according to the wiki, so almost 40 years.


I've recently used T-SQL / MSSQL and was surprised how starkly it differs from your typical "foss sql" (be it postgres, sqlite or mysql).

One really obvious example is how [] is used to quote identifiers, or how there is no LIMIT clause (instead you use SELECT TOP(n), though you can still use an OFFSET n ROWS clause after the ORDER BY clause for an offset; there is also OFFSET n ROWS FETCH NEXT m ROWS ONLY). Another example is the curious limits on programmability, e.g. TEXT can't be used for procedure parameters. There also seem to be small limits on BLOBs. And no NATURAL JOIN (which I mostly use for ad-hoc queries).

It is also very different deployment wise (as are all Microsoft products). You don't have a client library or anything like that, but a system-wide database driver instead. Applications use a driver interface and could (most don't) support other database versions or even databases. You can't "just" throw a MS SQL install on a machine, it needs to be properly installed system-wide and register all its components or it won't work properly etc. — so spinning an instance up for testing really isn't nearly as easy as with postgres.

In 2004 I remember working in a larger enterprise where I was forced to use MSSQL. Now... I know there's a lot of 'good' in it, and compared to MySQL at the time, many benefits (though many of those benefits were for the admins, not so much the developers in my org).

We had a requirement for paginating through records, and the group of MSSQL admins were... flummoxed at the request. "Why would you need to do that? Just use top(X)!" Umm... when there's 48000 records, and you're trying to see records 47000-47100.... wth?

They spent days going back and forth trying to come up with reusable sprocs using odd combinations of top/reversing/top/sorting (SQL Server 2000 at the time). I'm not sure ROW_NUMBER() was available yet. And... there was much grumbling about this 'stupid' request. I was honestly a bit shocked (I was still a bit naive about 'enterprise' folks) that this was such a chore. We also needed in-place encryption/decryption, which one of the prototypes had with MySQL's aes_encrypt(), but we had to use MSSQL, so... thousands of dollars later for some DB tool, this was available.

I know MSSQL has many advantages and is very powerful in many situations, but MS-first people are sometimes oblivious to 'standard' functionality other products have.
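For the record, the ROW_NUMBER() approach (added in SQL Server 2005, shortly after this story) looks roughly like this. The sketch below uses SQLite, which gained window functions in 3.25, with an invented single-column table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO records (id) VALUES (?)",
                 ((i,) for i in range(1, 48001)))

# Number every row by sort order, then keep the slice you want:
# records 47001-47100 out of 48000.
page = conn.execute("""
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM records
    ) WHERE rn BETWEEN 47001 AND 47100
""").fetchall()
print(page[0][0], page[-1][0], len(page))  # 47001 47100 100
```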

> You can't "just" throw a MS SQL install on a machine, it needs to be properly installed system-wide and register all its components or it won't work properly etc. — so spinning an instance up for testing really isn't nearly as easy as with postgres.

As another commenter pointed out, you can run MSSQL in Docker. We do this in CI.

You can also install it pretty easily on a standard Linux machine, without much more complexity than Postgresql. For example, you can `apt install mssql-server`


I cringe every time I have to use SQL Server.

1) Weird gaps in ANSI compatibility (no concatenation with ||, no current_date, top n instead of limit/offset, etc.)

2) Just yesterday a client's SQL Server started selecting deadlock victims to be killed, so the DBA's workaround was to throw (nolock) hints on every table and read dirty data. It happens when concurrent queries update records in a different sequence, but the same application on Postgres never deadlocks.

3) The query planner seems amazingly naive for such a mature product. It's hard to understand how a query with three where-clause constraints can execute in 20 ms while adding a fourth makes it run for 240,000 ms, yet I see this kind of insanity every day.

> It happens when concurrent queries update records in a different sequence, but the same application on Postgres never deadlocks.

SQL Server has like 4 different isolation levels + MVCC, you just pick the one you need. Postgres just does MVCC so it sounds like you should just switch SQL Server to that mode. A poor workman blames his tools. Unless that tool is MongoDB.

> just switch SQL Server to MVCC

In the real world of mediocre DBAs and databases managed by application vendors, it's never that easy, is it?

Postgres just works with default settings while SQL Server chokes on the same transactions with the same volumes.

SQL Server is legendary [0] for deadlocks even when the workload is mostly reads and rarely writes. You just don't see this with Oracle, Postgres or MySQL with default settings.

Not sure why I'd pay for the privilege of extra headaches.

[0] It was a problem in 2008, still not resolved https://blog.codinghorror.com/deadlocked/

Yes, the defaults are more pessimistic, but why is it such a problem to change the locking level?

SQL Server has one of the most advanced query optimizers. Are you running on an old version or using out of date statistics?

You don't need a system-wide database driver, it has its own client protocol and uses drivers just like any other database:


It can also run in Docker containers on Windows, Mac or Linux and spins up in seconds, along with the usual package manager installs on Redhat/Ubuntu:


T-SQL might have some oddities, but Microsoft distributes SQL Server as a Docker image, very easy to setup.


That the setup is complex enough that they need a container should tell you enough.

Containers have nothing to do with the complexity of the software, they are just a packaging format.

Yes, SQL server is complex software, and can be installed via apt-get as well.

> You don't have a client library or anything like that, but a system-wide database driver instead.

That's a choice, à la ODBC. A simple TDS client library works just fine. In fact, it's not a complex protocol for basic uses, and it's fairly easy to write a library for.

I've dived into the postgres code recently and it was incredible. Lisp legacy is all over the place, it's literally lisp in C, though unlike many "Langname-styled C" codebases it looks really clean and organic.

I'm still confused how people prefer Oracle [1] over Postgres.

[1] https://news.ycombinator.com/item?id=18442941

Seriously. After using Oracle for 15+ years, moving to Postgres has been amazing. Just moving from clob to text has nearly brought me to tears of happiness. Nearly all of our complex sql queries are simpler in postgres. The documentation is light years better.

What is "Lisp style C"? Can you link an example?

I tend to think of George Carette's SIOD Scheme interpreter when I think of "Lisp style C":

    LISP difference(LISP x,LISP y)
    {if NFLONUMP(x) err("wta(1st) to difference",x);
     if NULLP(y)
       return(flocons(-FLONM(x)));
     if NFLONUMP(y) err("wta(2nd) to difference",y);
     return(flocons(FLONM(x) - FLONM(y)));}

Of course, that's a Scheme interpreter, so it might be at least a little expected.

I can't find it, but I remember seeing a linked list library for C, which supported a functional style like Lisp. I don't know how closely it was related to S-expressions, but I'd be interested if somebody could help me remember what I'm thinking of here.

Postgres has led the world in trash-talking other databases for 22 years.

I used to call it CrashGreSlow back when MySQL was maintained. Postgres had a manifesto that trash-talked Oracle, but it was not a reliable product at that time. Maybe the people who said it was fast were running it on RAM disks with fsync turned off or something.

After mysql got bought by Sun and put on ice, postgres caught up with the hype and now it is pretty reliable. That wasn't always the case.

MySQL has been continually developed and maintained. At no point was it ever "put on ice". There are more developers devoted to it in recent years than at any point in its history.

Recent major versions have provided substantial enhancements to performance, replication, JSON/document support, and observability, to name just a few.

MySQL is still under development, and there are numerous forks. I use it for my company; it's still a very good database option, albeit perhaps a little basic.

However the fact that it's owned by Oracle does make it a little bit scary, even though we have no exposure to their code or anything linked at all. I sort of wish we'd chosen Postgres instead.

When Sun bought mysql they promptly stopped development. Around that time they were also neglecting Java.

Oracle has been a reasonable steward of mysql. They started making releases again. They haven't really moved it forward but I am sure they are taking care of their customers.

So far I haven't been impressed with MariaDB and the other forks. For instance, I downloaded it after hearing some hype, but the installer didn't work.

> When Sun bought mysql they promptly stopped development.

This simply is not true. I personally know a bunch of engineers who worked for MySQL AB and stayed on through the Sun and then Oracle acquisitions, and worked on MySQL that entire time.

Additional source: Sun acquired MySQL AB in early 2008. MySQL 5.1 was released in late 2008, and major releases continued on a cadence of every 2 to 2.5 years to this day. You can verify this via release notes, or via Wikipedia, among other sources.

> They haven't really moved it forward

This is purely a matter of opinion. From my own perspective, as someone who has been using MySQL for 15 years, I see substantial ongoing improvement in each major release.

If MySQL was as stagnant as you claim, why would the majority of the largest internet properties continue to rely on it as their primary data store?

Seriously though, this is true, especially of their users.

It's never just "PostgreSQL is a great database." It's always that MySQL, Oracle, MongoDB and the hundreds of NoSQL databases are all unusable junk with no unique benefits.

Oracle is OK if you can afford it and afford a nice battery backed storage array that (truthfully) fsyncs quickly.

MongoDB is crap.

I think the people who use Arangodb don't talk about it because they see it as a competitive advantage.

Right now I am working on an abstraction layer for document databases and using couchdb side by side with Arangodb. I guess I'll have to write a N1QL parser to go with my AQL parser.

The database I have the most fun with is SQLite. It is single-user, which means you can't log in with the SQL monitor when your program is running, but if you can accept that it's just great.

I've never heard this, but I don't know any hardcore database folks either. Mostly it's just "I need a SQL database for my application and Postgres seems to be a good default" (note that I mostly roll with a Linux/Unix crowd, otherwise it'd probably be the same thing except s/Postgres/SQLServer).

I'm taking a look at this now and can't make the same conclusion. For instance, the "bag-like" List and ListCell stuff is completely non-Lispy:


Mind you, I'm biased because I've written a fair amount of actual "Lisp style C", e.g.:


My detector for "this C is Lispy" has a rather high threshold / low gain.

Yeah, we replaced the old Lisp-style List implementation with a different implementation a while ago: https://github.com/postgres/postgres/commit/d0b4399d81f39dec...

You can still see the Lisp heritage in a few places, but overall I wouldn't say Postgres is "literally lisp in C".

I see that this diff contains a lot of fluff: in many places, functions and variables change names while the code that uses them remains the same.

In code that truly embraces "Lispy" lists, you cannot switch to an encapsulated list representation without making substantial changes to the code. For instance if you have any cdr recursion going on, that will have to be restructured, because a bag-style list doesn't have a cdr that is itself a bag-style list. (Perhaps the function has to be split into an external one that takes the List and then a local recursive part that works with the ListCell.)

I would say that to a fair extent, it looks to me as if the code anticipated a future retargeting to a different list representation, whereas the "Lispy" approach is to embrace a particular list representation.

Lisp programs sometimes need to improve the performance of adding to the tail of a list; it's done with some wrapping, like a structure that keeps a pointer to the tail. Usually that is only locally used; it doesn't "travel" as part of the representation of the list. It is not an encapsulation device, but only a process device. Of course there is also the common pattern of building up a list in reverse followed by a destructive nreverse. And the meta-approach of designing things to avoid doing work at the tail end of the list.

I just remembered the existence of some C code that has nothing to do with any Lisp implementation internals, which pushes and nreverses: http://www.kylheku.com/cgit/c-snippets/tree/autotab.c#n228

A double-ended queue (deque) can be formed in Lisp by using a pair of lists. So both ends of the queue are list heads, from which we cdr into the interior. I developed such a thing which is hosted here:


With this deque, we simply use the push macro to add items to either end of the queue. pop-deque will remove an item from either end. Underflows are handled by rebalancing the deque: shuffling items from one list to the other. The cost of that is amortized. (BTW I see that pop-deque has a multiple-evaluation-of-arguments problem; it really should be written using get-setf-expansion.)
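The two-list idea translates directly to other languages; here is a minimal Python sketch (not the linked Lisp code) that rebalances by shuffling the whole other list, which still gives amortized O(1) operations:

```python
class TwoListDeque:
    """Double-ended queue built from two stacks (Python lists)."""

    def __init__(self):
        self._front = []  # front of deque is the END of this list
        self._back = []   # back of deque is the END of this list

    def push_front(self, item):
        self._front.append(item)

    def push_back(self, item):
        self._back.append(item)

    def pop_front(self):
        if not self._front:
            # Underflow: shuffle items over from the other list.
            # Reversing flips back-order into front-order.
            self._front = self._back[::-1]
            self._back = []
        return self._front.pop()

    def pop_back(self):
        if not self._back:
            self._back = self._front[::-1]
            self._front = []
        return self._back.pop()

d = TwoListDeque()
for x in (1, 2, 3):
    d.push_back(x)
print(d.pop_front(), d.pop_back())  # 1 3
```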

> In code that truly embraces "Lispy" lists, you cannot switch to an encapsulated list representation without making substantial changes to the code.

I think changing the list representation made sense in part because the rest of the codebase didn't use lists in a deeply Lispy way, for the most part.

You should teach at colleges and schools.

Oracle, like many other databases, has clustered indexes. PostgreSQL does not, and that can have big performance implications.

Yes they really should look at adding index organized tables, aka real clustered indexes.

Better space usage and performance for that single index, since the table is the index; slightly worse performance for non-clustered indexes due to a double index traversal.

That's not how a real clustered primary key works, because "the changes are not clustered.".

can confirm.

That said, one can often capture much of the performance benefit with Postgres indexes - basically, instead of trying to tightly-pack the base tables, copy the data into indexes which are tightly packed exactly as you need. Put another way, Postgres indexes are synchronously updated materialized views - and yes, the Postgres planner/optimizer will automatically answer queries from an index if it's faster and all the columns are in there.

I especially recommend reading about function and expression indexes, as well as the (new) covering indexes.

http://blog.scoutapp.com/articles/2016/05/31/3-postgresql-in... https://www.endpoint.com/blog/2013/06/10/postgresql-function... https://paquier.xyz/postgresql-2/postgres-11-covering-indexe...
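Expression indexes aren't Postgres-only; as a quick illustration, SQLite (3.9+) supports them too. The table and index names below are invented, and the query plan shows the index being picked up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
# Index the *expression* lower(email), not the raw column.
conn.execute("CREATE INDEX idx_email_lower ON users (lower(email))")
conn.execute("INSERT INTO users VALUES ('Ada@Example.com')")

# The planner can satisfy a lower(email) = ? predicate from the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE lower(email) = ?",
    ("ada@example.com",)).fetchall()
print(plan)
```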

Surely the writes will be much slower if you have to copy the data to the index

What do you think a (non clustered) index is? Of course it slows down writes.

It was a reply to the proposed workaround, to show that it still has significant drawbacks.

Legacy code + Enterprise = a life so long it feels like butter spread a little too thin over toast, as Bilbo Baggins put it.

Have you tried to scale horizontally with Postgres?

Have you tried to scale horizontally with Oracle?

Oracle scales horizontally perfectly... even up to hundreds of cores, it runs just as fast.

Yes, and I cried when I saw the bill.

FWIW I much prefer Postgres to Oracle, but not because of the technology as much as the price.

Because "nobody was fired for choosing oracle" https://www.google.com/search?hl=en&q=hackernews%20nobody%20...

I don't think anyone "prefers" Oracle; it just persists on some combination of politicking, inertia, vendor lock-in, and contracts that assure regulatory compliance.

I'm confused why anyone prefers Oracle at all. You can basically swing a bat at their codebase and default creds come raining down. Everything they touch has a steroid-infused YOLO quality to it.

I'll tell you why - try to update one column in 100GB worth of row data.

Postgres makes a copy of _every_ row, and you need 100GB of extra space on disk until you commit the transaction. Now extrapolate to a 1TB table that needs updating.

Oracle has a way of doing this w/o copying the entire row.

> Postgress makes a copy of _every_ row and you need 100GB of extra space on the hard-drive until you commit the transaction.

This only happens if the column is indexed, heap-only-tuples will allow in-place updates otherwise. This doesn't dismiss it as a potential problem entirely, but depending on your needs you may never run into this.

I would argue that's an issue with your architecture at that point - you may want to use table partitioning at sizes that big, or have some other mechanism in place to lock access while updating such large data sets in one go.

I would generally agree, though there are unfortunately limitations imposed on you the moment you start using partitioning with PostgreSQL (foreign keys remain a big one).

Not as of PG 11

You still can’t make FK references TO partitioned tables, unfortunately.

So what's the workaround? Drop the index, update, then re-establish the index?

> Postgress makes a copy of _every_ row

That's what the zHeap storage engine for Postgres fixes - in-place updates for fixed width data.


"for eventual integration into PostgreSQL"..

And is being actively developed through the mailing list. Feel free to follow the progress. It's great to watch it happen in real time.

> Oracle has a way of doing this w/o copying the entire row.

And I have seen Oracle 12c servers completely lock because of that way.

Thanks to these engines and extensions, Postgres has become that rare tool for me where I have to ask "why use ___ instead of Postgres?"

If they got their clustering story to be as easy as MongoDB's was 5 years ago (From what I read, Citus does this well), it's yet another excuse to stick with it.

> why use ___ instead of Postgres?

MSSQL has DataDude. You write your database as CREATE statements. This is then parsed into an AST and semantic model, diffed against a live database (or another script) and you get an ALTER script. You also get working intellisense, build errors, and everything you'd expect from something like C++. It completely changes how you develop databases into something a whole lot more modern. Especially in source control: migrations are storing history on top of the history which source control already provides, which is nonsensical.

I still want to develop something for Postgres someday that replicates this.

Liquibase is quite nice for doing this in a cross-database way - you describe the changes you want to make to the schema and it applies them as needed to the db (and because you're properly describing the changes it can also do rollback for a change).

We use the same migrations across hsqldb, postgresql and sql server and it just works

> MSSQL has DataDude. You write your database as CREATE statements. This is then parsed into an AST and semantic model, diffed against a live database (or another script) and you get an ALTER script.

Oh wow, I've wanted exactly this for Postgres. Never knew how to search for it though.

I used to use RedGate's SQL Compare (and SQL Delta) for this for years. Compare two databases and spit out an alter script to get you from one state to another (for schemas and data).

The tooling is definitely lacking in the Postgres world. I like using Postico on the mac for basic data tasks, since it's rather polished. But doing anything else is a slog using an ugly app.

Is the product name really DataDude, or is that a typo? Do you have a link to that product?

It was the codename, it got rolled into Visual Studio and is now nameless (DB projects).


And even if you have to use some other tool (Redis, Cassandra, MySQL!), you can access it from Postgres – https://wiki.postgresql.org/wiki/Foreign_data_wrappers
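For the Postgres-to-Postgres case, a sketch using the built-in postgres_fdw (server name, host, and credentials are made up for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER legacy_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'legacy.example.com', dbname 'orders');

CREATE USER MAPPING FOR CURRENT_USER SERVER legacy_srv
    OPTIONS (user 'reporting', password 'secret');

CREATE SCHEMA legacy;
IMPORT FOREIGN SCHEMA public FROM SERVER legacy_srv INTO legacy;

-- Foreign tables now join against local ones like any other table:
SELECT c.name, o.total
FROM customers c
JOIN legacy.orders o ON o.customer_id = c.id;
```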

PG has some warts, and the config defaults are really more suitable for a Raspberry Pi than for production use (so people sometimes get the wrong impression out of the box), but it is rock solid and can support so many different use-cases.

It's one of the few pieces of software I could feel confident that, due to the way it's designed, my data will still be there even if someone yanks a server power cord.
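To make the "Raspberry Pi defaults" point concrete, here are a few of the settings people typically raise first in postgresql.conf (the values shown are illustrative for a dedicated 32 GB server, not recommendations; tune to your hardware):

```ini
# postgresql.conf -- shipped defaults in parentheses
shared_buffers = 8GB          # (128MB) ~25% of RAM is a common rule of thumb
effective_cache_size = 24GB   # (4GB) planner hint: OS cache + shared_buffers
work_mem = 64MB               # (4MB) per sort/hash node, so size carefully
maintenance_work_mem = 1GB    # (64MB) used by VACUUM, CREATE INDEX
```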

Are you running at a scale where a single master DB isn't sufficient?

Because nowadays you can get servers with about 192GB of RAM and 32 CPU cores.

Clustering seems like one of those "what if"[0] scenarios where maybe if you were operating at roflcopter scale you might need it but for 99.9999999999999% of cases, a single master DB is more than enough -- at least with a well designed SQL db like Postgres.

[0]: https://nickjanetakis.com/blog/optimize-your-programming-dec...

As others said, it isn't about scale, but reliability, and even things like patching.

With a load balancer, you can easily patch your web servers, but very often, DB servers go unpatched and un-upgraded for years since they aren't in a self-healing cluster.

To add some detail, I worked on an e-commerce system that was like this. We would've lost something like $60k an hour if we rebooted the database server (yes, one server, nobody expected this sort of growth). So we didn't touch it for 3 years until it was about to run out of disk space. Then we had no choice, but then we set it up as a fully replicated system with a ton of disk space, making future patching easier.

A single master DB isn't sufficient for any production app; you should have no single points of failure, and that includes having, at a minimum, two DBs. Clustering isn't a what-if scenario, it's a must-have for any serious business; single points of failure are not acceptable.

100% uptime is impossible and the vast majority of companies do not need anywhere near the 99.99% they claim is necessary, nor can they actually pay for it.

A 30-second outage to switch to a replica is perfectly fine for production.

If you have a replica, you have two DBs already, as I've suggested. And no one said anything about uptime; the point was single point of failure, not uptime. When I said single master, I just meant a single DB as opposed to a master/slave setup.

The GP said single master, not single database.

I am the GP, I think I know what I meant better than you, which is why I clarified that in the comment you just replied to.

> Because nowadays you can get servers with about 192GB of RAM and 32 CPU cores.

You can get servers with 2T of memory and 96 cores off the shelf for a few years now. Real world workloads that you can't run on these boxes are few and far between. But everyone worries about "scalability" like they'll ever run into those limits.

Or maybe you just want to have a hiccup on a single machine not take down your entire application?

Hot standbys work well in Postgres. As OP said, the issue is clustering, not replication.

It’s not about scale it’s reliability. Servers especially in the cloud can go down pretty quickly. Not just for hardware reasons either.

I can’t imagine any company crazy enough to run one server like you’re describing. Definitely wouldn’t pass any enterprise service continuity testing/review that’s for sure.

Clusters for reliability are very different from clusters for performance, and very well supported by Postgres.

> If they got their clustering story to be as easy as MongoDB's was 5 years ago

You must have been using a different mongodb than I did.

One of the things I liked about Mongo (compared to Postgres) was the ability to configure a list of connected servers, an election pattern for deciding the master, and... that's it. With that, servers on multiple hosts were load-balancing read-only queries, propagating writes, and failing over in a way that worked for us (a few seconds of write failures if the master ever died was fine for the application we were making).

Meanwhile, postgres has this replication scheme available ("Trigger-Based Master-Standby Replication"), but the steps to implement it were much harder, and required an evaluation of possible replication/failover needs that we didn't know enough to do. It also requires a dive through pages of the documentation at https://www.postgresql.org/docs/10/different-replication-sol... to understand how to implement any of the above strategies.

Postgresql's availability of different replication methods, and the resources available for each, are impressive for sure. But there's something to be said for having Mongo's easily-understandable list of steps to get multiple servers talking to each other right away.

Out of interest, have you tried https://github.com/zalando/patroni ? It seems to fill this niche.

I've been meaning to try it out for the HA postgres use case, but haven't gotten around to it.

On paper. Add that to the end of that sentence, and it will be true again.

I won't speak to data integrity, but as the other poster said, you just had to set up the replication, election mechanisms and you were done.

I bet Citus is like this, but I haven't used it yet.

Couldn't agree more on the extensions part. We wrote a post recently highlighting how extensions have been key into what Postgres has become and is becoming. It got some pickup here on HN, in case you missed it in the initial post you can find it at https://www.citusdata.com/blog/2018/11/27/postgres-more-than....

SSMS is a really good free database management IDE/GUI/tool for MS SQL Server. It can do graphical live query profiling, index management, management of views/stored procedures/triggers/constraints/functions etc.

Postgres currently has nothing comparable. DataGrip, though proprietary, is the closest in terms of functionality, but even it doesn't have the tight integration with the database that SSMS has.

If you've used SSMS, other database GUIs feel underpowered. (Oracle SQL Developer is ugh)

Agreed, I actually really like SQL server especially because of SSMS.

One of SSMS's strengths, as opposed to a generic database GUI IDE, is that it only works with one database: SQL Server (and variants like Azure SQL).

This is important for accessing the specific features that distinguish that database from others, e.g. indexed views, clustered columnar indices, user-defined types, functions, linked servers, replication, HA, etc.

There are so many cool and interesting features in MS SQL Server. I don't go for Microsoft products in general, Microsoft really got SQL Server right.

I use DBeaver. It's not SSMS, but it's a very handy tool.

I've used it too. It's one of the best free tools of its ilk. But yes, it's not SSMS.

While it's indeed advanced, I find its focus on reliability and ACID more useful than triggers or extensions. Triggers increase write load, and extensions aren't available on all hosting services. Replication, interchangeable storage, and standard SQL/PL support are getting better, but still lag competitors like MySQL.

I've a love and hate relationship with triggers, but I just need to say triggers don't have that much overhead if done correctly. 99% of businesses don't have a scaling concern, and triggers will do them well if they use it to capture mandatory rules instead of trying to implement it at the application layer. It's a pain when misused. The key thing is to reason about your user permission/roles correctly so that you don't end up with cascading failures when adding new triggers.
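For instance, a mandatory business rule pushed into the database instead of the application layer (the `orders` table is hypothetical; `EXECUTE FUNCTION` is PostgreSQL 11+ syntax, older versions use `EXECUTE PROCEDURE`):

```sql
CREATE FUNCTION check_order_total() RETURNS trigger AS $$
BEGIN
    -- Reject any write that would leave the order with a negative total,
    -- no matter which application or user performs it.
    IF NEW.total < 0 THEN
        RAISE EXCEPTION 'order % would have a negative total', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_nonnegative_total
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION check_order_total();
```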

I'd like to see support for computed/derived columns with options to be materialized or non-materialized.

After a few Google searches, it looks like that's coming in version 12.
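For what it's worth, the PostgreSQL 12 feature covers only the materialized half of that wish: generated columns must be `STORED` (computed on write), with no virtual/non-materialized flavor yet. A sketch with a hypothetical table:

```sql
CREATE TABLE line_items (
    price_cents integer NOT NULL,
    quantity    integer NOT NULL,
    -- Computed on INSERT/UPDATE and stored on disk; cannot be written to.
    total_cents integer GENERATED ALWAYS AS (price_cents * quantity) STORED
);

INSERT INTO line_items (price_cents, quantity) VALUES (250, 4);
-- total_cents is 1000 without the application computing it
```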

The title should include “free” before database, and it’s generally true.

But the article doesn’t really delve into anything advanced at all. Triggers? Please.

Started using PG recently at a new job but can't really see any benefit over MySQL. What am I missing?

The big one I am aware of is transactional DDL, which I believe got integrated into MySQL 8.
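A quick sketch of why transactional DDL matters for migrations (the `users` table is hypothetical):

```sql
BEGIN;

ALTER TABLE users ADD COLUMN last_seen timestamptz;
CREATE INDEX users_last_seen_idx ON users (last_seen);

-- In Postgres, the DDL above is part of the transaction: if any
-- statement fails, or you ROLLBACK, the schema is untouched --
-- no half-applied migration to clean up by hand.
COMMIT;
```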

If you lose the root password, you can fix it in Postgres without taking the database down.

There are several ways to update the root password in MySQL with zero down-time, either by manipulating the user table, using replication, or using other accounts (ie. most people grant excessive privileges to role accounts.)


Data :)

A fun article about PostgreSQL and fsync():

"PostgreSQL's fsync() surprise"


Anyone else getting a bad certificate when visiting in Firefox?

Check your clock and date. If it's off by a lot you'll get weird cert errors.

Works fine for me - Latest Firefox Developer Edition on Linux

The article doesn't have anything new or that I didn't know about, but yet another reminder of how Postgres is, IMO, the best piece of software ever created is nice.

I started using PostgreSQL a long time ago, when I tried to do a subquery in MySQL and it didn't support subqueries. Haven't looked back since.

I do not know if this is still the case, but JSON support was nonexistent in MSSQL four years ago, while in PostgreSQL it is absolutely phenomenal.
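A small taste of the jsonb feature set (table and data are made up for illustration):

```sql
CREATE TABLE events (payload jsonb);
INSERT INTO events VALUES ('{"user": {"id": 7, "tags": ["admin", "beta"]}}');

-- Path extraction and containment queries:
SELECT payload #>> '{user,id}' FROM events;              -- '7'
SELECT * FROM events WHERE payload @> '{"user": {"id": 7}}';

-- The containment operator can be served by a GIN index:
CREATE INDEX events_payload_idx ON events USING gin (payload);
```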

One word: Exadata.

Postgres doesn't come close to the enterprise-level features AND support as that of Oracle or SQL Server.

I’m curious, what features are you missing? Support is always going to be a bit different with open source products, but I’m sure there are companies that’ll give you a support contract worth more than the equivalent from MS or Oracle.

Out of the box clustering with automatic failover and recovery that "just works". It can be a sticking point when selling product which only uses Postgres to some enterprises. They want to start a few instances, put a load balancer in front of at least a couple of them, and then call it a day.

I think the Citus folks would like a word with you. If you are willing fork over what Oracle asks for, you can probably strike a deal with those folks.

Why would Citus folks like a word with someone wanting something that "just works" "out of the box"? It is reasonable to want/expect this feature to be part of the core DB distribution and handled, documented, maintained the same as other core DB features.

There are commercial solutions that provide that, EnterpriseDB will happily sell you licenses for their solution if you don't want to roll your own.

Distributed transactions, raw filesystems, nice graphical admin tooling with reporting, stored procedure development and debugging capabilities (pgAdmin is nice but that is about it).

A few key missing features:

- You can't write off or expense three martini lunches and golf excursions with a PostgreSQL sales rep.

- When you deal with unavailability, you can't point to the PostgreSQL support SLA's four hour response and tell your boss "not our problem".

- It's harder to justify larger budgets for your IT fiefdom and thus your sense of self-importance when you don't pay for unnecessary license fees.

So I know your comment is a bit tongue in cheek, but there are real concerns about the last two points that people may not realize. At many large organizations the outside vendor may be the _only_ real technical capability that team has, so there is no solution other than "outsourcing" to the vendor's support team.

Budgets have similar incentives that make teams want to keep a large budget around even if they could get the job done with less resources. That slush money comes in handy to have the vendor do all sorts of work that isn't really about getting their system running ( i.e. doing all the integration with other parts of your stack )

I think lots of open source products or smaller teams think winning those "Enterprise" deals is about having the better product and checking some boxes on a requirements sheet. When you get into enterprise, it's all about the relationships.

> You can't write off or expense three martini lunches and golf excursions with a PostgreSQL sales rep.

That's one of the Oracle features supported by EnterpriseDB’s proprietary downstream version.

> When you deal with unavailability, you can't point to the PostgreSQL support SLA's four hour response and tell your boss "not our problem".

Also, this, but it is a 24 hour resolution window for Severity 1 issues.

> It's harder to justify larger budgets for your IT fiefdom and thus your sense of self-importance when you don't pay for unnecessary license fees.

EnterpriseDB addresses this, but certainly doesn't usually match Oracle in the budget impact department.

I know your post is sarcastic, but some of these (eg. 2) are real concerns for large enterprises. Enterprises want not just a product but a solution and that includes support and all that comes with it.

You can pay for Postgres support as well.

Sometimes it’s clustering, upgrading servers, replication layouts that work and don’t have random issues with WAL, encryption at rest without requiring the os to do it (has this been fixed?).

Maybe having clustered indexes, good partitioning, etc.

Still, I’m hard pressed to ever recommend anything other than PostgreSQL or MySQL.

What features of Oracle or SQL Server would you say it is lacking?

Regarding support, there are plenty of fantastic companies providing support and consulting services for PostgreSQL, and having used plenty of commercial databases, I know of no other database that has as vibrant a community providing amazing free support as well.

SQL Server has a column-oriented index type. Oracle has a thing that lets you query based on what the data was at a particular (recent) timestamp. They both have lots of useful features like that.

However from my experience with Oracle and from what I've heard about their source code, its fundamentals are just disgusting. Decades of haphazardly piled-up features and fixes means that if you're the lucky customer to find a new bug, good luck understanding what the heck is going on by yourself.

I'd like to think MSSQL is in a bit better shape, but they have lots of the same incentives so who knows.

Oracle is like America: it's not all the same. The database itself is now one of the most rock-solid pieces of software ever built. Other products - yeah, less so.

I think this is perhaps referencing a recent HN comment about the code quality of Oracle DB: https://news.ycombinator.com/item?id=18442941

Ha, that's pretty interesting. Not too sure how much credit to give it though based on this:

> It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).

There's no way in the world I would have imagined that adding AD authentication to any database would be a "single small feature".

For a long time Postgres was the quintessential example of how open source projects are driven by developers who love to add all kinds of cool, whiz-bang features... but there weren't boring enterprise features like incremental backups, because that's no fun to write and test. But that characterization is long out of date. I think.

plsql, packages and 'as of timestamp'

> What features of Oracle or SQL Server would you say it is lacking?

An ass to drag onto the carpet. In enterprise speak, "support" means "legal liability". You have to be able to sue someone if it fails and loses your company money. This was taught to me by an old IT hand back when I was a young naïve Linux kiddie: "you have to be able to drag somebody's ass onto the carpet when the shit breaks".

Oh yes, and there's also brand recognition and a large pool of certified Oracle or SQL Server DBAs to draw from when making hiring decisions. Yes, technical people will know that Postgres is solid and will know how to spot good people to maintain a Postgres installation -- but the person signing off on the IT budget is a total normie. He will smile when he sees familiar names and make a lolwut face when he sees postgres.

Relatedly, Postgres doesn't have a yacht. Oracle's yacht won the America's Cup.

I’ve worked for dozens of enterprise companies. Support means support.

It means that anyone in the company can ring a vendor 24/7 and get the best quality help on the product. It means your outages are smaller and less frequent. You usually can’t get that with a generic open source product or from smaller companies, and it’s especially an issue in non-US countries, where it’s often impossible to get decent support.

I’ve never once seen a company sue a vendor or vice versa.

Including audits with threats to pay ever increasing amounts every year?

Ouch. Also add in that the budget changes dramatically in year two...

Every enterprise product I’ve ever seen has different prices for each year.

And if you can’t handle this I think you have bigger problems to worry about.

>And if you can’t handle this

What a retarded position to hold.

You have forgotten what the purpose of these providers was in the first place.

"Company X provides service/software Y so you can do business"

But that's not what they do.

"Company X optimizes to extract as much rent as possible from its client in order to show ever increasing amounts of profitability to the stock market"

You do know that most vendors offer increasing discounts in later years of the contract. It’s not just that prices increase.

And I’m guessing you’ve never dealt with a vendor before because often companies negotiate better terms than list price.

Of course, when managing my product or running my business, chasing discounts for and (re)negotiating with the vendor of one of the core components of my product/service are things I very much look forward to.

My mistress is retarded and I find this comment very offensive.

Why is flagging the fact that the term retarded is offensive to some people getting downvoted?

Maybe I've just seen the inside of too many "enterprise" companies. But I suspect the most important enterprise feature is "Supports the architecture that we came up with 20 years ago and that was the hot choice when the CIO first started coding."

Buying Oracle or SQL server is often more about compatibility with other products or specific features.

And CIO/CTOs rarely unilaterally make decisions on technology choices. You have Enterprise Architects for that.

Sure, but I think it's a CIO's job to push an organization forward in technology. But I have seen plenty of places whose current reason for using Oracle is, "Welp, that's what was hot when we got started and now it's baked in to our architecture." And that's the same reason I've seen older "enterprise" companies keep their mainframe stuff going.

In contrast, I recently heard about a company with a market cap in the tens of billions where the execs said, "By date X, we won't own hardware or run datacenters anymore." I'm told it forced a lot of teams to reevaluate fundamental technology decisions made decades ago.

So sure, the decisions are proximately about compatibility or specific features. But I think it's reasonable to ask why those specific features are necessary. Because places like Google and Facebook make it clear that it's not scale alone that makes it necessary to give Oracle a truck of cash every year.
