New Features Coming in PostgreSQL 10 (rhaas.blogspot.com)
517 points by ioltas on April 8, 2017 | 135 comments



This bit about ICU support vs. glibc:

    > [...] Furthermore, at least on Red Hat, glibc regularly whacks
    > around the behavior of OS-native collations in minor releases,
    > which effectively corrupts PostgreSQL's indexes, since the index
    > order might no longer match the (revised) collation order.  To
    > me, changing the behavior of a widely-used system call in a
    > maintenance release seems about as friendly as locking a family
    > of angry racoons in someone's car, but the glibc maintainers
    > evidently don't agree.
Is a reference to the PostgreSQL devs wanting to make their index order a function of strxfrm() output and not have it change when glibc updates, whereas some on the glibc list think that output should only be used for feeding the likes of strcmp() in the same process:

    > The only thing that matters about strxfrm output is its strcmp
    > ordering.  If that changes, it's either a bug fix or a bug
    > (either in the code or in the locale data).  If the string
    > contents change but the ordering doesn't, then it's an
    > implementation detail that is allowed to change.
-- https://sourceware.org/ml/libc-alpha/2015-09/msg00197.html


Florian Weimer's reply is also interesting:

"Why do you think that? I don't see this documented anywhere, and I doubt it is something many readers of the C standard, the man page, or the glibc manual would expect.

The manual suggests to store the strxfrm output and use it for sorting. I expect that some applications put it into on-disk database indexes as a result. This will lead to subtle breakage on glibc updates.

(The larger problem is that there are definitely databases out there which use B-tree indexes in locale collation order, which break in even more subtle ways if we make minor changes to the collation order.)"


Which manual suggests storing the output of strxfrm? The glibc man page doesn't seem to.

I don't know that this is resolvable. The documented behavior of strxfrm() is just about its output properties. Improvements to the transformation algorithm would be expected to be made, if it's improvable.

If a database needs this to be static over time it needs to pick a particular transformation algorithm and specify it exactly, not just rely on whatever the C library happens to provide.

I mean, not only are PostgreSQL locale-sorted indexes not portable across glibc releases, they aren't portable across any other system change either. No moving between distros or doing distro upgrades, etc... Those are all misfeatures probably worth fixing.


BTW, the new amcheck tool, available in Postgres 10, lets you validate that an index is consistent with its designated sort order (B-Tree operator class). At least you now have some way of detecting the kind of inconsistency you describe.

I wrote amcheck, and maintain a version targeting earlier releases of PostgreSQL on Github: https://github.com/petergeoghegan/amcheck
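
For example (the index name here is just a placeholder):

    -- Available as an extension in Postgres 10 (contrib/amcheck)
    CREATE EXTENSION amcheck;

    -- Verify that the B-Tree is consistent with its operator class ordering
    SELECT bt_index_check('my_table_pkey'::regclass);

    -- A stronger check that also verifies parent/child relationships
    -- (takes a heavier lock)
    SELECT bt_index_parent_check('my_table_pkey'::regclass);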


This somewhat mixes two issues, really.

Firstly, there are issues with glibc versioning, when the collation rules change even without a proper version bump. I don't have a link to details at hand, but I've heard a number of complaints about it over the years from various PostgreSQL hackers, including Peter Geoghegan, who did a lot of sort optimizations in 9.5 and 9.6. I'm sure there's something deep in the PostgreSQL mailing list archives. You'll have to dig deep, because we learned the lesson years ago.

Regarding strxfrm(), PostgreSQL certainly does not store the results anywhere - it's only used during the actual sort, e.g. during CREATE INDEX or to help with ORDER BY in a query, and then thrown away. So either I'm forgetting something, or Robert mixed this up in the blog post.

There's however an issue with strxfrm() disagreeing with strcoll(), which is a major problem, of course, because we build an index using strxfrm() and then fail to find the values in it using strcoll(). For more details, including links to pgsql-bugs and RH bugtracker, see https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issu...

ICU seems not to have these issues, because it has a better and more carefully maintained versioning scheme. Perhaps we'll still get broken indexes after an ICU upgrade, but at least we'll know about it.


PostgreSQL never stored strxfrm() output in indexes, but did at one time use it to build indexes (that is, strxfrm() was used to build values when sorting). However, it would be nice to be able to store strxfrm() output on disk, in indexes, which as you say leaves the implementation sensitive to changes in the transformation algorithm, and not just the underlying behavior defined by a collation.

Fortunately, it is possible to do this safely with ICU, which explicitly decouples implementation issues from behavioral issues. Maybe this will happen at some point in the future. For now, abbreviated keys are only used for sorting to build an index, and not in any subsequent index scan.

If you want the much faster text sorting ("abbreviated keys") back, you can use ICU collations in Postgres 10. With the C standard library, abbreviated keys were disabled when strxfrm() was found to be buggy within glibc [1].

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issu...
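
In practice, using an ICU collation in Postgres 10 looks roughly like this (table, column, and collation names are just examples):

    -- Collations such as "de-x-icu" are imported from ICU at initdb time
    -- when Postgres is built with ICU support
    CREATE TABLE t (name text COLLATE "de-x-icu");
    CREATE INDEX ON t (name);

    -- You can also define an ICU-backed collation explicitly:
    CREATE COLLATION german (provider = icu, locale = 'de');
    SELECT * FROM t ORDER BY name COLLATE german;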


> I mean, not only are PostgreSQL locale-sorted indexes not portable across glibc releases, they aren't portable across any other system change either. No moving between distros or doing distro upgrades, etc... Those are all misfeatures probably worth fixing.

Hence ICU support, which makes it possible to address these (because collations are properly versioned).


Well what do you expect?

Patch releases are for bug fixes. If you can't handle any change in behavior, including a bug fix, then you shouldn't be upgrading.

I understand the problem, and kudos to PostgreSQL for figuring out a solution, but railing on glibc for fixing bugs in patch releases makes about as much sense as breeding raccoon families to chuck into people's cars.


Ok, I'm not a database manager for enormous projects, so these changes may be great, but I don't understand them and don't care about them. Postgres is already the most awesome thing on Earth to me.

Still, if my opinion counts, I think SELF-UPDATING MATERIALIZED VIEWS should be the next priority.


The work that has been done on transition tables is intended to enable future work on automatically updated materialized views; the idea is that the system will automatically derive a query to update the view based on the deltas between the set of old rows and the set of new rows. That will take more work, though. I do agree it would be valuable. It's possible to set up similar things by writing your own triggers, and having transition tables available in PL/pgsql will make it easier, but it's not necessarily easy to figure it all out by hand for a complex view involving joins and aggregates.
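
As a rough illustration, a PG10 statement-level trigger can now see the full set of changed rows through a transition table (the tables and names here are made up):

    CREATE FUNCTION refresh_order_totals() RETURNS trigger AS $$
    BEGIN
      -- new_rows is the transition table holding every row inserted
      -- by the statement that fired this trigger
      INSERT INTO order_totals (customer_id, total)
      SELECT customer_id, sum(amount) FROM new_rows GROUP BY customer_id
      ON CONFLICT (customer_id)
      DO UPDATE SET total = order_totals.total + EXCLUDED.total;
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_totals_trg
      AFTER INSERT ON orders
      REFERENCING NEW TABLE AS new_rows
      FOR EACH STATEMENT
      EXECUTE PROCEDURE refresh_order_totals();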


I wonder why not go for a changelog-based implementation. Instead of modifying the materialized view directly, write the changes into a changelog, and then update the matview in the background. More efficient, fewer locking issues, etc.


How do you know when the change will affect the matview? I don't know the syntax for creating it off the top of my head, but imagine a materialized view limited to the top ten records of a table.


I assume there will be some limitations on which matviews can be updated automatically/incrementally. E.g. updatable views (regular, not materialized) have a range of such restrictions, see: https://www.postgresql.org/docs/devel/static/sql-createview....

Hopefully, some of those restrictions will disappear, because matviews are often used to store pre-aggregated data, so a restriction on GROUP BY would be unfortunate.


Thank you.


Some work has started in this direction. I didn't follow the whole thread closely, but I don't think it got committed in PG10.

https://www.postgresql.org/message-id/flat/20170119213859.GA...

More info in the EDB roadmap: https://wiki.postgresql.org/wiki/EnterpriseDB_database_serve...

Postgres has been amazing at shipping the foundations required to deliver complex features. Logical replication is an example of this: all the pieces committed over the last 6 years made this patch achievable.


How do you mean? Couldn't you use a trigger to update the view?


IMHO triggers are almost always best avoided. There are some exceptions, but most of the time you want changes to be explicit, not happening "by magic" behind the scenes as a side-effect of something else.


Triggers on materialized views are really error-prone and tedious. It's cache invalidation, which is hard.


Effectively yes, but if it's that simple why not make this a built in functionality? Other DBs have it.


A trigger on what? Every update, insert, delete, etc.? On every table in the view?

Even if that is possible, it may be a major performance killer. This has to be done internally, I think.


I am not sure it necessarily would be that bad. After all, foreign keys are implemented with triggers and they are usually fast enough. You just need to write trigger functions that are fast enough.


It is possible (that's how I do it), and it doesn't really affect performance at all. You just don't do it in real time: the triggers kick an id into a queue, and a background task works the queue, updating the materialized table from the view it mirrors, just for the ids in the queue. Roughly as sketched below.
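
Something along these lines (all names invented; error handling and the background worker are left out):

    CREATE TABLE matview_refresh_queue (id bigint PRIMARY KEY);

    CREATE FUNCTION enqueue_for_refresh() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'DELETE' THEN
        INSERT INTO matview_refresh_queue (id) VALUES (OLD.id)
        ON CONFLICT DO NOTHING;
      ELSE
        INSERT INTO matview_refresh_queue (id) VALUES (NEW.id)
        ON CONFLICT DO NOTHING;
      END IF;
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    -- One trigger per source table feeding the materialized copy
    CREATE TRIGGER source_table_enqueue
      AFTER INSERT OR UPDATE OR DELETE ON source_table
      FOR EACH ROW EXECUTE PROCEDURE enqueue_for_refresh();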


So what would make the MV self-update?


I deeply appreciate the great care that Postgres committers take in writing their merge messages.

I think of it as a sign of respect for future developers to take the time to write a clear account of what has happened.


Postgres is one of the few projects that still use a strict patch-oriented development process that's based almost entirely around mailing-list communication.

While core team members can commit directly to the repo, everyone else must submit their code changes for review to the pgsql-hackers mailing list as a clean, self-contained patch [1], where they're discussed and considered for inclusion. An accepted patch might be committed right away, or it will be queued up for the next scheduled "commitfest" [2], when patches are reviewed and finally committed to mainline. (I don't know how the commitfest interacts with git exactly; the commitfest database doesn't even link to git, only to email discussions.)

From the outside it seems a bit antiquated, but it's apparently been working well for them. The Postgres team is a pretty conservative bunch; they only switched from CVS to git in late 2010, for example.

They also really care about code quality, getting the design right early, and covering all possible edge cases. As a result, Postgres is solid and clean, has unusually few legacy oddities, and almost never any subtle, surprising breaking changes. If you read the MySQL manual, it's absolutely littered with sloppy little breakages throughout its history: Like how, until 5.0.something, when comparing a "date" value with a "datetime" value, the time portion would be silently ignored and ('2017-04-08 14:04' = '2017-04-08') would return true; but they fixed that, and broke a lot of client code because they didn't stop to realize that a lot of developers depended on that behaviour.

[1] https://wiki.postgresql.org/wiki/Submitting_a_Patch

[2] https://commitfest.postgresql.org


> If you read the MySQL manual, it's absolutely littered with sloppy little breakages throughout its history: Like how, until 5.0.something, when comparing a "date" value with a "datetime" value, the time portion would be silently ignored and ('2017-04-08 14:04' = '2017-04-08') would return true; but they fixed that, and broke a lot of client code because they didn't stop to realize that a lot of developers depended on that behaviour.

This is an interesting comment for two reasons. Firstly because a lot of people also complain about MySQL's archaic defaults which often stay too long because of upgrade concerns (though they fortunately are fixing a lot of them already or for MySQL 8.0 - hooray).

But also because it speaks volumes, in my opinion, about the MySQL documentation that these are documented in the first place. I worked at MySQL for 9 years, and though it was always clear our manual was a good source of information, now that I am working on Ubuntu & OpenStack it is painfully obvious just how good the MySQL documentation team and processes were compared to many other projects. Even just the version ChangeLog.

I'm not saying other projects don't get it right (and have no opinion at all about postgresql's documentation state), but MySQL seems to get it pretty right in general.


But MySQL's problem is that those "archaic defaults" shouldn't have happened in the first place.

No matter how good MySQL is at documenting its bugginess, weirdness, flakiness and overall history of legacy warts, the point was that PostgreSQL eliminates pretty much all of this by promoting, at every single stage of the development process, the same strict, uncompromising QA principles. By doing this, legacy behaviour generally disappears.

Postgres isn't bug-free, of course. But they take great care to not be continually chased by a tsunami of technical debt. If you build a house on a crappy foundation, you get a crappy house, so it's a good idea to spend time on the foundation before building the house. The Postgres team spent years on the foundation, before building the higher-level parts, and the rewards are obvious.

Meanwhile, MySQL has spent years and years slowly mopping up tech debt. Things have gotten a lot better with the tighter semantics, such as preventing invalid dates like "February 29th" (in a non-leap year) from being inserted, rejecting "0000-00-00" as a date, no longer silently ignoring data coercion errors, and so on. But those things shouldn't have happened in the first place if the developers had been better at QA. So while you're right that the documentation is decent at describing various legacy semantics, it also encodes a history of carelessness that makes for rather embarrassing reading.

By the way, the date/datetime regression I mention isn't in the documentation at all. It's in their bug tracker.


The Linux kernel is another such project, I think.

One of the reasons why it's done this way (through mailing lists and not, e.g., through pull requests on GitHub) is that all the history is tracked in a way that's fully under control of the community. So it's fairly easy to find who submitted the patch and when, what it looked like, etc.

Of course, another reason is history - most of the process was established long before git, when CVS was the VCS.


Wine is another such project :)

But I gotta say it's only working for those projects because they have an extremely high barrier to entry to the code itself in the first place (working on projects like Wine and Postgres is scary, even though you can get started with easy stuff).

It also works for them because they have maintainers and core committers used to the workflow, already tooled up for the workflow, etc. But I wonder how much productivity would be gained by using a GitHub-like flow, maybe enhanced a bit.


In my opinion github isn't quite ready for more complex things:

- Lengthy reviews yield PRs that measure many megabytes and become unusably slow

- There's no reasonable way to do PRs based on other PRs, without closing and reopening them when the underlying branch is merged

- Large changes can't even be displayed

- It doesn't work offline. I do a fair amount of review while traveling, and not being able to do reviews while offline would prevent me from doing so

Then there's political issues: there's absolutely no guarantee GitHub will be around in a few years. Postgres is old, and will hopefully live long into the future. When it started there was no SourceForge, and now there's again no SF. I very regularly look back into old development discussions...


Yeah absolutely, hence why I said GitHub-like. What I mean by that is something more modern that supports such a review style. If only GitHub supported mailing lists...


There's still the aspect of handing over the control to a third party that may disappear next month, discontinue the features you rely on, etc.

FWIW, I don't think the workflow is a major barrier. Doing 'git format-patch' and submitting it to the mailing list is about as simple as it can be. It's all the other stuff (complex code-base, going through review, ...) and the fairly high bar for quality that makes it challenging.


Git itself is another such project.


Being one of them, though not a native speaker, which is more than sometimes noticeable, I'd not even describe it as caring for future developers. It's self-care. I've spent enough time staring at code changes made long ago, trying to understand the reasoning, that providing enough context for my future self is justification enough.


I agree with that sentiment. I consider my future self to be an example of "other developers".

When I am working with peers on writing a commit message, I sometimes use the analogy of a newspaper. Any given newspaper is out of date very quickly. But we keep newspaper archives and store copies of every single newspaper.

Why? Because we don't know when we will need to refer to them, or which ones we will refer to. All that we know is that some of them will be vital in the future, as the journal of record.

And so it is with commit messages. We owe readers the courtesy of explaining our thinking.


Even a single feature from the list would make 10 an amazing release, all of them together is just unbelievable. Very happy we are using PG :)


I'm so excited for table partitioning. I use table inheritance in several places in my current project, but have felt the pain of foreign key constraints not applying to inherited children. Reading about table partitioning, I'm realizing that this is a much better fit for my use case.

Postgres continues to amaze me with the speed at which they introduce the right features into such a heavily-used and production-critical product. Thanks Postgres team!
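
For anyone who hasn't looked yet, the new declarative syntax is roughly this (hypothetical table, based on the PG10 docs):

    CREATE TABLE measurements (
      logdate date NOT NULL,
      value   numeric
    ) PARTITION BY RANGE (logdate);

    CREATE TABLE measurements_2017 PARTITION OF measurements
      FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

    -- Rows are routed to the matching partition automatically
    INSERT INTO measurements VALUES ('2017-04-08', 42);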


Unfortunately, foreign keys won't be supported right away.

Read about the new feature and its limitations here: https://www.postgresql.org/docs/devel/static/ddl-partitionin...


Thanks, I hadn't read this. That's too bad, hopefully we'll see that in the future (if it's technically possible at all?). That'd be a huge feature for me.


How is Postgres so consistently the best open-source DB project from features to documentation? It's unreal.


Not just a DB project, either. I'd say it's one of the best executed (in a very broad sense of the word) open source projects around, in general.

From an end user perspective, they have stable, quality releases with a predictable cycle and subsequent maintenance releases. They have great documentation - one of the best in the industry, let alone open source. Things generally work as you'd expect them to, and when they don't (e.g. for historical or implementation reasons), you have clear and convincing explanations. And so on.

I haven't seen their developer side, but based on other people's feedback, it's also good - high quality bar for code, stringent review process etc. More importantly, they seem to be making the right (= leading to more stable quality releases with great features) technical decisions consistently, which to me is a hallmark of a very well run team.

I also can't remember any publicized "drama" around Postgres, either on the inside (dev disagreements etc), or between the team and the users. It looks like everyone's happy, or at least happy enough.

I don't know what the magic sauce is here, but it feels like many other open source projects could learn a lot from the Postgres team and community.


I second this. The Postgres contributors are consistently setting the bar for the rest of the OSS projects out there; it's been my favorite part of the stack for a long time now.

Don't forget about their phenomenal #postgresql channel on Freenode. The folks working on Postgres have been gracious enough to patiently answer my not always fully baked questions for the past 5 years on there; they're a bottomless treasure trove of best practices and pragmatic advice.


I'd say it is the best DB hands down. Open source or otherwise.


Will DDL replication for the logical replication be landing in 10 or later?

We have some use cases where logical replication would be very helpful, but keeping the schema in sync manually seems like a pain - will there be a documented workaround if DDL replication doesn't make it in?


I did read the article, but I can't find any mention of addressing the "Write amplification" issue as described by Uber when they moved away from postgres. https://eng.uber.com/mysql-migration/ I had heard talk on Software Engineering Daily that this new major revision was supposed to address that.

Is this issue resolved by the new "Logical replication" feature? It doesn't seem directly related, but it seems like maybe that is what he is referring to in this blog post?


There's a patch reducing write amplification (when caused by indexes) by a significant degree. Unfortunately it didn't quite get ready in time for the feature freeze of 10; as it affects the on-disk format, we considered the risk to be too high.


As the author of the patch I don't quite agree with that. But it's true that the patch did not receive adequate review, even though most of the on-disk changes were known and coded at least 7 months before the feature freeze. So it's hard to tell which part of the patch wasn't ready. But there is always a next cycle, so let's work towards getting it ready for v11.


Write amplification is a result of PostgreSQL's decision not to use clustered indexes; there's not much that can be done to avoid it without a massive redesign of the storage engine, though there are patches out there to reduce the penalty in some cases. In all reality though, Uber wanted a key-value store and not an RDBMS. MySQL was a better choice for this since InnoDB isn't much more than a fast K/V store (hence why MySQL uses clustered indexes).


> Write amplification is a result of PostgreSQL's decision to not used clustered indexes, there's not much that can be done to avoid it without a massive redesign of the storage engine

I don't think that's entirely accurate - the issue is more that indexes contain pointers to the heap position (simplified) of a tuple, rather than being indirect and pointing to the primary key, which then is resolved by another index (be that clustered / primary or not).

Updates already don't have to update indexes if none of the indexed columns change (HOT - Heap-Only Tuples). The proposed change (WARM - write amplification reduction method) makes it possible to avoid updating indexes whose columns didn't change, even if other indexed columns did.

https://www.postgresql.org/message-id/CABOikdMNy6yowA+wTGK9R...

> In all reality though, Uber wanted a key-value store and not an RDBMS

Agreed on that.


Not arguing with your assessment of Uber's requirements; but in general, why do you view InnoDB as not much more than a K/V store? And why do you equate clustered indexes with K/V stores?

InnoDB is a complex piece of software, supporting transactions, row-level locking, MVCC, schemas, secondary indexes, crash recovery, hot copy/backup, complex caching and buffering, many tunables, and extensive metrics visibility. Just because it's more appropriate for Uber's rather unusual EAV-like use-case, this doesn't mean InnoDB is a glorified K/V store.

Re: clustered indexes, it's a storage engine architecture choice with well-known trade-offs, both positive and negative. SQL Server also uses clustered indexes and is widely respected among database experts.

Regarding the topic overall, there are use-cases where Postgres is the best choice, and there are use-cases where it isn't. That doesn't inherently mean that other databases are uniformly worse. People like to trash MySQL, sometimes for completely valid reasons, but other times for FUD. But fwiw, several of the major features in Postgres 10 have already been supported in MySQL/InnoDB for a long time, in some cases for over a decade. Of course, that goes both ways; there are awesome major features that Postgres has had for a decade that MySQL still lacks.


Clustered indexes aren't an all-or-nothing choice. SQL Server allows heap tables without any clustered indexes, tables with a clustered index on the primary key, and tables clustered on a secondary index with the primary key as a non-clustered index. It is really nice to have all three of those options available.


> massive redesign of the storage engine

Has the Postgres team thought about adding support for more than one storage engine? Then they could implement new ideas in a fork, and one could run them side-by-side and migrate over to it.

https://www.postgresql.org/message-id/4CB597FF.1010403@cheap...

For example, MySQL had been mocked for its old ISAM storage engine. Then MySQL added InnoDB as another storage engine; the SQL interface stayed the same.


Pluggable storage engines for databases don't work that well in practice. Either you end up with the MySQL situation where the storage engine is so dumb that you can't push any smart optimizations into it (making having pluggable engines moot in the first place), or you have to write such a large interface that it's not worth providing.


That depends on what you mean by pluggable storage and how it's implemented ...

For example PostgreSQL supported custom index access methods, which you might see as a custom storage format (although only for secondary storage). You had to modify the source code and rebuild PostgreSQL, but there was a fairly clear separation / internal API that allowed that. Since PostgreSQL 9.6 you can do that without the custom build (i.e. you can create a new index in an extension and use CREATE ACCESS METHOD to plug it into the server).
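
For example, the contrib "bloom" module ships a whole new index type this way (a sketch; the extension script essentially runs CREATE ACCESS METHOD for you):

    -- Roughly what the extension does internally:
    --   CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler;
    CREATE EXTENSION bloom;

    CREATE TABLE t (a int, b int);
    CREATE INDEX t_bloom_idx ON t USING bloom (a, b);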

We don't have such clear internal separation for the primary storage, but it's hard to ignore the possible benefits of alternative storage types. Another reason is that we're constantly scavenging for free bits in various places (e.g. flags in tuple headers needed by new features etc), and allowing multiple formats would help with this by supporting "old" and "new" table format. So it'll likely follow what happened to indexes - build a clear internal API, allow multiple storage formats internally, eventually make it usable from extensions.

(These are just my personal opinions, of course.)


Yes. But it's not a settled matter. The storage engines in MySQL didn't work out that well, each of them duplicating a lot of work, having significantly different behavior, ...


Can anyone recommend a decently up to date book on postgres administration? Or are docs really the only way? I've used SQL Server for years but would likely choose postgres for an independent project if I intended to commercialize it. That said, I don't use it at work so it's hard to get in depth experience.


I'm not familiar with any books, but the docs really are excellent and have various sections for beginners and getting to know the system.

A good way in is to look at external tools like Barman, which manage dumps and streaming replication along with point-in-time restoration automatically for you, rather than manually invoking all the stuff directly.

Mostly, postgres just works.


There are no better docs than Postgres docs.


I understand your need for a book. I prefer to read a book when diving into a new tech. That being said, when I started out using PostgreSQL years back, there were only the online docs, and I must say they are, although lengthy at times, very good.

Also, pgAdmin?


There's PostgreSQL 9 Admin Cookbook from Simon Riggs, for example (disclosure: I work for Simon).

Packt has several other good books about PostgreSQL, but always check the author - they started publishing books authored by people entirely unknown in the community that are "inspired" by books published before (you might also use "plagiarism" instead).


Yeah packtpub is a real crapshoot. They're great in that they'll seemingly publish whatever tech subject you want to write about. The downside is they publish anything...


Yeah, although there's a difference between "publishing whatever" and "publishing books that copy from other books".

Ultimately it's not the publisher but the author that matters, I guess.


I just remembered there's also "PostgreSQL: Up and Running" published by O'Reilly. It deals with more stuff than just administration, but Regina O. Obe and Leo S. Hsu are good authors.


Extended Statistics! I was following the replication changes, but have just discovered the extended statistics and am more excited about them.

The directory renaming at the bottom of the post is interesting - I wonder if many other projects have to do things like this?


> The directory renaming at the bottom of the post is interesting - I wonder if many other projects have to do things like this?

The background is that, over the years, a number of people deleted the pg_xlog and pg_clog directories when they noticed they were running out of space, thinking it was just server logs. Unfortunately those are the directories containing the database journal and transaction status (committed/aborted/in-progress), which means they'll lose data. The idea is to rename them to something that's less likely to be mistaken for unimportant data.


To be fair, the extended statistics available in PG10 are about the most primitive ones possible. If the PG10 stats help with your queries, great, but otherwise it's mainly laying infrastructure for the more advanced stuff - histograms, MCV lists, expression statistics, and ultimately join statistics (which is about the main source of estimation issues). Oleg also mentioned it might be useful for JSON statistics, which would be cool.

Hopefully those bits will get in faster - I first submitted the patch in 2014, just before pgconf.eu, I think. OTOH I can't really complain, because I'm a shitty developer, so the initial versions were far from committable. The quality requirements for PostgreSQL patches are damn high these days.

BTW, if you have examples of real-world queries hurt by poor estimates, report them to the pgsql-performance mailing list. It's an important piece of information about which cases to look at first. Obviously, we have already collected various queries, but having more is good.


Having good directory names helps a lot; I'd be very surprised if other projects didn't also need to make it clear to admins what's happening.

(On the other hand, some projects need to make things _less_ clear - https://github.com/mackyle/sqlite/blob/3cf493d4018042c70a4db... - "users would (...) call [the developers] to wake them up at night and complain".)


It would be great if some Linux distros cleared up the directory mess. There are directories in there with names that no one remembers what they originally meant in the UNIX of the 1970s, or whatever. For compatibility they could just be hard/soft-links to a more sane directory structure.

Well, the same goes for Windows. With Win95, WinNT 3.5, WinXP, and WinVista they restructured the internal directory tree and renamed things. It was okay with WinXP; just the long user folder was troublesome because of the 260-char MAX_PATH limit. But with Vista and 64-bit support they fucked up, and it's now a big mess in Win7+ (syswow64, system32, registry, winsxs, dotNet folders, ... such a big mess and sometimes also a waste of HDD space due to duplicate files).


winsxs uses hard links - space wastage is more likely from having more versions than just dupes. Also, many Windows tools won't account correctly for hard links in disk usage stats.


The feature I'd really love is master selection with Raft or similar and automatic query redirection to the master for all write queries (and maybe for reads with a query keyword).

That would make it very easy and robust to cluster pg without requiring a big complicated (a.k.a. high admin overhead and failure prone) stack with lots of secondary tools.

This kind of fire and forget cluster is really the killer feature of things like MongoDB and RethinkDB. Yes people with really huge deployments might want something more tunable, but that's only like 1% of the market.

Of course those NoSQL databases also offer eventual and other weaker but more scalable consistency modes, but like highly tuned manual deployment these too are features for the 1% of the market that actually needs that kind of scale.

A fire and forget cluster-able fully consistent SQL database would be nirvana for most of the market.


About redirection of write queries to the master: from 10 on, you will be able to specify all members of the cluster in the connection string and demand to connect to the master (like "postgresql://host1:5432,host2:5432/somedb?target_session_attrs=read-write"); libpq will then do this automatically for you, see the parameters "host" (now plural) and "target_session_attrs" in section 33.1.2 here: https://www.postgresql.org/docs/devel/static/libpq-connect.h...

About Raft-based leader election, I believe the current recommendation is to look at Patroni (https://github.com/zalando/patroni), which has been built for Docker and is now being integrated with Kubernetes; however, I don't think there is an inherent limitation that would prevent it from running on bare metal.


With 2ndQuadrant working on Postgres-XL (http://www.postgres-xl.org/), I think you can be confident that you will see a lot of these features being proposed for core Postgres. It will just take some time to build the necessary building blocks: global indexes, distributed sequences, repartitioning ...

I'm quite confident that the PostgreSQL of 5 years from now will be quite different in terms of storage / server topology support. I won't be surprised if pgbouncer-like capability finally makes its way into core once we have a coordinator.

Postgres has a steady progression (even if not fast enough for some people), but they are moving without compromising the robustness of their product for the users.


Yes on the first one, and a big nope on the second for now. I passionately loathe Rube Goldberg machine deployments and am the kind of engineer who constantly asks "do we really need that?". I love exterminating complexity. But maybe that will change when we get to millions of concurrent users and tens of millions of devices and actually need Kubernetes to scale.

Raft is not complex. I doubt leader election would be terribly hard to implement.


Can't pgbouncer/pgpool2 solve query redirection? I don't understand the desire for all-in-one solutions.


That desire comes from three places:

1. Minimize cognitive load by minimizing the number of things you have to learn.

2. Minimize deployment complexity and dependencies.

3. Complexity is just evil in general. Linear increases in complexity result in exponential increases in bugs, vulnerabilities, and failure modes. It's just combinatorics.


patroni builds upon etcd and (optionally) haproxy, two rather mature pieces of infrastructure (which can both be made HA on their own if SPOFs are to be avoided).

I understand where you are coming from, but having this kind of multi-server-who-is-master knowledge baked into Postgres itself will surely take another couple of releases, if it will be included at all.

Probably BDR (bi-directional replication, master-master-like logical replication) will be there first, but the question is whether it will help a lot for local scale-out workloads (as opposed to gluing two datacenters together and allowing transactions on both sides).


Number 2 is the only point I see merit in. With respect to 3, isn't that an argument against increasing the complexity of postgresql?


> I don't understand the desire for all-in-one solutions

You really don't understand it? It's fewer moving pieces to think about, worry about, read about, deploy, maintain, and fix, and it almost always leads to better performance and security.

Something as simple as connection pooling should've already been part of the database and query redirection is even more important to have included.


Is it actually fewer moving pieces? The sum seems about the same. Your statement seems like FUD.

The Unix philosophy of focusing on a single problem and solving it well seems to do just fine. I don't see why baking in features that the vast majority of users will never need is a good investment. If anything, it seems like it will only increase the complexity of PostgreSQL. The protocol is well documented and open. Libpq provides all the functionality necessary. Why not bake the higher order logic into a discrete component?


That philosophy is just that - a philosophy that sounds nice and applies sometimes. It's not a rule.

The deployment and upkeep of components is a major hassle and yes, having it all bundled into the same running application (which is the only thing it can be used with anyway) decreases the amount of stuff the end-user has to worry about.

Scaling connections reasonably is something basic any database should handle natively - or, better put, hitting connection limits without yet another program to help is rather silly in 2017.

You can make the same argument about replication, but half this thread is about logical replication being moved to core so we can get scalable distributed clusters in the future - something even fewer users would use, but which clearly has many benefits from being included in the main release.


Dumb question: does declarative partitioning pave the way for native sharding in Postgres? I'm not super super familiar, but it seems like along with some other features coming in Postgres 10, like parallel queries and logical replication, that this is eventually the goal.


I hope that it will have that effect. We need a few other features first: partitionwise join, partitionwise aggregate, asynchronous query, and ideally hash partitioning.


Doesn't this basically replicate all of the work done by Citus?


I see -- thanks! Really cool stuff.


Wow, so awesome. I do hope at some point we can see some language improvements to PL/pgSQL. More basic data structures could go a long way in making that language really useful, and I still consider views/stored procedures a superior paradigm to client-side SQL logic.


I agree with you that stored procedures are superior to client-side logic, because it means that you can have multiple routes of access to the database and all of them enforce the same business logic. But what exactly do you mean by "more basic data structures"?


PL/SQL has various types of collections, for example, that are super-useful when you need to do more complicated processing without having to create temporary tables and such.


What's the ops experience for a replicated setup like these days? i.e. assuming you want basic fault-tolerance at non-exotic size / activity levels, how much of a job is someone acquiring if, say, there are reasons they can't just use AWS RDS?


Streaming replication isn't hard at all: http://davide.im/setting-up-a-failover-database-for-postgres...


It's not hard to setup initially, but I'll admit that it's not very good.

It's not very good in a long-lived scenario where you're changing your replication topology for routine maintenance tasks. Changing from master to replica is easy, but then you have to rebuild the original master off of the former replica. Completely start over. You can't just start up again from a given transaction ID. MySQL's GTID implementation is much better in this regard: you can change masters and replicas and repoint them all without rebuilding. You can't do that (currently) with PostgreSQL. It's a major pain point.


> Changing from master to replica is easy, but then you have to rebuild the original master off of the former replica. Completely start over. You can't just start up again from a given transaction ID. MySQL's GTID implementation is much better in this regard: you can change masters and replicas and repoint them all without rebuilding. You can't do that (currently) with PostgreSQL.

Have you heard of pg_rewind? https://www.postgresql.org/docs/current/static/app-pgrewind....


I had not. Looks like it requires 9.5 or later? We're running 9.4 so we'll have to upgrade to use it. Thanks!


You can get pg_rewind for 9.4 (and 9.3 in its branch) here:

https://github.com/vmware/pg_rewind/tree/REL9_4_STABLE

It's from the people who wrote it upstream, they provide the code there for earlier Postgres releases.


Does this help at all with upgrades? Upgrading from 9.4 to 9.5 means you need to rebuild your entire replication topology because the master's identifier has changed (in the initdb step of the official docs).


Yes but here's the problem. Consider common scenarios like:

Master goes down. Slave takes over. Master comes back. Slave goes down 10 minutes later. Repeat.

This is common in e.g. multi data center replication and is often due to transient network failures. Netflix has a great open source tool called chaos monkey that can induce lots of random failure scenarios like this or much worse. Don't get me started on transient partial failures due to latency and packet loss spikes.

The manual nature of pg replication setup makes me really nervous here. What happens when it finds itself in a state where manual intervention is needed? You are now down.

This is tolerable for big companies with dedicated SREs and DBAs and enough of them that it's easy to always have someone on call, but it's a nightmare for smaller ventures. Even for larger ventures this adds a lot of cost overhead.

Like I said elsewhere this was really the true killer feature of the more successful NoSQL document store type databases. Everything else was largely hype.

We switched recently to RethinkDB for this reason. We miss the richness of SQL (to the point that we still use PG too for warehousing and analytics) but in return we got incredible robustness across three data centers. Of course our app does not need rich queries or strong consistency 99% of the time so YMMV. For some jobs ACID and complex queries on live data are not optional.


Note that starting from Postgres 10 (which this thread is about), you don't need to adjust wal_level and max_wal_senders (or max_replication_slots, for that matter) anymore. You still have to enable hot_standby=on on the standbys, though.

(and it is in general a good idea to keep the configuration the same as much as possible between primary and standbys).


Thanks! I was asking here mostly out of curiosity about how people felt after running it for a while, since it has certainly sounded like it has improved massively since I last dealt with it in the 8.x era.


Get two or three dedicated servers, stick a 2-disk mirrored RAID in each one, run one primary, and have Barman manage your snapshots + streaming replication. Hook in some cron jobs to monitor the age of stuff and failure of stuff. That is pretty much it for anything less than, say, 5TB. Beyond that your copy times are too high and making snapshots becomes extremely difficult.


For analytical loads the following is going to be great:

  While PostgreSQL 9.6 offers parallel query, this feature 
  has been significantly improved in PostgreSQL 10, with new 
  features like Parallel Bitmap Heap Scan, Parallel Index 
  Scan, and others.  Speedups of 2-4x are common with 
  parallel query, and these enhancements should allow those 
  speedups to happen for a wider variety of queries.
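
A quick way to see whether a given query picks up one of these plans (the table is hypothetical; the setting is only for illustration, not tuning advice):

    SET max_parallel_workers_per_gather = 4;

    EXPLAIN (ANALYZE)
    SELECT count(*) FROM big_table WHERE some_col > 100;

    -- Look for Gather, Parallel Seq Scan, and (new in 10)
    -- Parallel Bitmap Heap Scan / Parallel Index Scan nodes in the plan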


Impressive feature list. Glad to see logical replication is finally making it in.


What is your use case for it? My only thought was sending just one table to a replica to do analytics on.


Replication across major versions, for example to upgrade without downtime. Partial replication, to distribute shared data across a series of clusters, or for analytics and reporting as you mention. Replicating the data without replicating any table bloat. Being able to do limited writes (e.g. to temporary tables) on the standby. http://rhaas.blogspot.com/2011/02/case-for-logical-replicati...
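
With the built-in logical replication in 10, the basic setup is roughly this (table name and connection string are placeholders):

    -- On the publisher:
    CREATE PUBLICATION my_pub FOR TABLE users;

    -- On the subscriber, which must already have a matching users table:
    CREATE SUBSCRIPTION my_sub
      CONNECTION 'host=primary.example.com dbname=app user=replicator'
      PUBLICATION my_pub;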


Indeed--anything where you want the secondary to be other than a bit-for-bit copy of the primary. It's also convenient for HA in some cases due to the fact that the DBMS copies are fully independent, hence free from propagated bit-level errors and also available for unimpeded reads.

MySQL started with logical replication very early on and it has proven extraordinarily useful. One of the more interesting use cases is feeding log transactions into data warehouses, which should be possible in PostgreSQL 10.


Isn't analytics a massive usecase in itself?


I am considering more and more a move back from MongoDB to PostgreSQL. I will miss being schemaless so much, though. Migrations - particularly Rails migrations - left a bad taste in my mouth. Has anyone done the move recently, and what are your feelings?


I did just that move when I realized that I was doing a lot of work to impose schemas on my "schemaless" data and another bunch of work to implement joins in my application.

I found the best way to do migrations is with vanilla SQL. I wrote a little tool to read migrations from SQL files in a directory, send them to the server and keep track of which ones have already been applied. Simple and easy.

The big benefit of migrations is that your app code doesn't have to deal with every possible schema that you've ever used; it can rely on the data being uniform.

I'm very happy with the switch; wouldn't go back to Mongo for anything.


Yeah, all the things that need flexible schema can go in a jsonb column. While querying on JSON has gotten less painful, it's still a bit of a chore. But I've found that I rarely need to do that. Or if I do, I just denormalize a bit and put those fields in a regular ol' column.

We did the move maybe 4-5 years ago? At least in the JavaScript world, this makes your life so so much better.

Of course, you still have to handle migrations, but at least you have transactions :)


You can easily have schema-less with Postgres's jsonb data type.
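
A minimal sketch (table name invented): keep the fixed, relational parts as normal columns and put the flexible part in a jsonb column, optionally with a GIN index for containment queries:

    CREATE TABLE docs (
      id         bigserial PRIMARY KEY,
      created_at timestamptz NOT NULL DEFAULT now(),
      data       jsonb NOT NULL
    );
    CREATE INDEX docs_data_idx ON docs USING gin (data);

    INSERT INTO docs (data) VALUES ('{"kind": "invoice", "status": "open"}');

    -- @> is the jsonb containment operator, and it can use the GIN index
    SELECT * FROM docs WHERE data @> '{"status": "open"}';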


Not really. Postgres ORMs are not meant to do schema-less and tables still need to be created.


You're right that ORMs aren't really set up to use jsonb, but it can be done. I've had pretty good success with a kind of "hybrid": putting most relational data in regular columns (with FK constraints) and adding flexible data in jsonb. The main trick is understanding how to build custom column types in whatever ORM you actually use.

I can imagine a future with specialized ORMs designed around jsonb, but the current state of the art is probably not as bad as you think.


In the .NET world, ServiceStack ORMLite has custom Postgres type attributes to use the jsonb and other fields automatically.

http://docs.servicestack.net/releases/v4.5.6#postgresql-data...

There is also Marten, which is an ORM that offers a full document database interface, all backed by Postgres JSON columns.

http://jasperfx.github.io/marten/documentation/documents/


You are implying you need an ORM for the simple key-value (jsonb) data access. I don't think you would benefit much.


Any recommendations for best practices for scaling PostgreSQL? Really appreciate it.


I know that several RDF data stores use PostgreSQL as a backend data store. With new features like better XML support, as well as older features for storing hierarchical data, I am wishing for a plugin or extension for handling RDF with limited (not RDFS or OWL) SPARQL query support. I almost always have PostgreSQL available, and for RDF applications it would be very nice to not have to run a separate service.

I tend to view PostgreSQL as a "Swiss Army knife" and having native RDF support would reinforce that.


As one of the people regularly working on Postgres, I'm honestly a bit doubtful that it's realistic to add a SPARQL frontend. Doing that well is a considerable amount of work, and there's relatively little overlap in experience between the communities. I suspect that focusing on other areas will have a bigger ROI.

But that's just my personal opinion, and other contributors and companies might very well disagree.


Which RDF data store uses Postgres as its DB backend? And can one import WikiData? (Does it scale?) (I would rather avoid these old-school RDF special-case stores from the SemanticWeb days 10 years ago.)


https://github.com/cayleygraph/cayley is currently on the frontpage of HN, and it does use PGSQL as backend.


"Does it scale?" is a very valid question for Cayley. Running sparql.uniprot.org, I know Virtuoso scales easily to 27 billion triples. The best benchmark I have seen for Cayley is 21 million RDF triples.

RDF and relational are similar, but free-form RDF is hard to put into a pure relational schema, as the schema is derived from the data in RDF. There are ways to recover relational schemas from RDF, but these are not yet in a production state.


I helped build a triple store on top of PostgreSQL back in '06-'08. The company is gone so I don't know where the IP ended up, but it was a good foundation. We were competitive with the other players at the time.


Can you say which RDF data stores these are? Are they generic or special purpose?


A question on this statement in the SCRAM authentication description: "stealing the hashed password from the database or sniffing it on the wire is equivalent to stealing the password itself"

How is that the case? That's exactly the thing that hashed passwords prevent. Of course, an MD5 hash is feasibly vulnerable to brute-forcing today, but it's still not "equivalent" to having the clear-text password.


The point is that you only send the hash to the database to connect. If you steal the hash, you can connect to the database using said hash, not needing the plaintext. The password might as well be the hash in this case. Hence the equivalency.

Using that scheme, all you prove is that you know the hash of the password. SCRAM allows you to prove you know the plaintext password without actually transmitting it.
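
For what it's worth, switching to SCRAM in 10 is roughly this (the role name is a placeholder; the pg_hba.conf line is not SQL):

    -- Store SCRAM verifiers instead of md5 hashes for newly set passwords
    SET password_encryption = 'scram-sha-256';
    ALTER ROLE app_user PASSWORD 'correct horse battery staple';

    -- and in pg_hba.conf:
    --   host  all  all  0.0.0.0/0  scram-sha-256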


If you steal the hash from the database, yes. I don't know how stealing the hash over-the-wire is equivalent to having the password, since it is salted (with a salt generated by the server) and is not reusable.


Because the next time you connect to the server you provide the same hash. The person doesn't know your plain text, but they can get into the server just fine.


You can't provide the same salted wire hash. You'd need the pre-salted hash, which only the client and server know. I fail to see how this answers my question.


This is great, because I couldn't go to production with earlier releases of logical decoding. Now we don't have to depend on a third-party add-on!


We're currently experimenting with logical decoding in 9.6, so I'd be curious to hear what problems you've been running into.


I could use a count of the number of file I/Os that each query takes, in order to optimize my queries further...


That's been there for a while:

    EXPLAIN (ANALYZE, BUFFERS) yourquery;

If you enable track_io_timing (it has some overhead on platforms with slow timestamps, e.g. older VMware), you even get timing.

If you want that aggregated, rather than for an individual query, you should look into pg_stat_statements.
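
For example, assuming pg_stat_statements is in shared_preload_libraries and track_io_timing is on:

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    SELECT query,
           calls,
           shared_blks_hit,
           shared_blks_read,     -- blocks not found in shared buffers
           blk_read_time,        -- ms spent reading (needs track_io_timing)
           blk_write_time
    FROM pg_stat_statements
    ORDER BY blk_read_time DESC
    LIMIT 10;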


The BUFFERS count is more for row-count info, as it operates on large chunks of data, rather than for index optimization, which needs to count how many times index structures are accessed. Counting I/Os directly would be more useful for tuning indexes.


Huh? It shows you the number of io operations.


Fascinating that the road to improving the expression evaluator is better opcode dispatch and JIT -- the same tradeoffs every programming language project is looking at right now.


DECLARE @please VARCHAR(3) = '???';


You guys are awesome - keep up the good work!


The join speedup for provably unique operands sounds awesome.



