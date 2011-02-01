Hacker News
New Features Coming in PostgreSQL 10 (rhaas.blogspot.com)
I know that several RDF data stores use PostgreSQL as a backend data store. With new features like better XML support, as well as older features for storing hierarchical data, I am wishing for a plugin or extension for handling RDF with limited (not RDFS or OWL) SPARQL query support. I almost always have PostgreSQL available, and for RDF applications it would be very nice to not have to run a separate service.

I tend to view PostgreSQL as a "Swiss Army knife" and having native RDF support would reinforce that.

This bit about ICU support vs. glibc:

    > [...] Furthermore, at least on Red Hat, glibc regularly whacks
    > around the behavior of OS-native collations in minor releases,
    > which effectively corrupts PostgreSQL's indexes, since the index
    > order might no longer match the (revised) collation order.  To
    > me, changing the behavior of a widely-used system call in a
    > maintenance release seems about as friendly as locking a family
    > of angry racoons in someone's car, but the glibc maintainers
    > evidently don't agree.
is a reference to the PostgreSQL developers wanting their index order to be a function of strxfrm() output that does not change when glibc is updated, whereas some on the glibc list think that output should only be fed to the likes of strcmp() within the same process:

    > The only thing that matters about strxfrm output is its strcmp
    > ordering.  If that changes, it's either a bug fix or a bug
    > (either in the code or in the locale data).  If the string
    > contents change but the ordering doesn't, then it's an
    > implementation detail that is allowed to change.
-- https://sourceware.org/ml/libc-alpha/2015-09/msg00197.html
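
To make the quoted claim concrete, here is a minimal sketch using Python's `locale` module, which wraps the C library's strxfrm() and strcoll(). The word list is made up, and the locale is whatever the environment provides, so the resulting order will differ between machines. The only point is that sorting by transformed keys and sorting by direct collation comparisons agree within one process:

```python
import functools
import locale

# Use whatever collation the environment provides; fall back to "C"
# if the configured locale is not installed.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

# Made-up sample data.
words = ["zebra", "Apple", "apple", "Zebra", "mango"]

# Sort by comparing the transformed keys byte-wise...
by_strxfrm = sorted(words, key=locale.strxfrm)

# ...and by asking the collation routine to compare the strings directly.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Within a single process and a single library version, both agree.
print(by_strxfrm == by_strcoll)
```

Nothing here says the transformed keys stay the same across library versions, which is exactly the disagreement in the thread.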

Florian Weimer's reply is also interesting:

"Why do you think that? I don't see this documented anywhere, and I doubt it is something many readers of the C standard, the man page, or the glibc manual would expect.

The manual suggests to store the strxfrm output and use it for sorting. I expect that some applications put it into on-disk database indexes as a result. This will lead to subtle breakage on glibc updates.

(The larger problem is that there are definitely databases out there which use B-tree indexes in locale collation order, which break in even more subtle ways if we make minor changes to the collation order.)"
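
A toy illustration of that last point, assuming a sorted list stands in for a B-tree and two hypothetical collation functions (`collate_v1`, `collate_v2`) stand in for the library before and after an update. This is not how PostgreSQL stores indexes; it only shows why a binary search over data sorted under one ordering can miss rows when the comparison rule changes:

```python
import bisect

def collate_v1(s):
    # Hypothetical "old" collation: plain code-point order.
    return s

def collate_v2(s):
    # Hypothetical "new" collation: case-insensitive first,
    # original string as tie-breaker.
    return (s.lower(), s)

def index_lookup(sorted_keys, value, collate):
    """Binary search over the stored order, comparing with `collate`."""
    keys = [collate(k) for k in sorted_keys]
    pos = bisect.bisect_left(keys, collate(value))
    return pos < len(sorted_keys) and sorted_keys[pos] == value

rows = ["apple", "Banana", "cherry", "Date"]

# The "index" was built while the old collation was in effect.
index = sorted(rows, key=collate_v1)

print(index_lookup(index, "apple", collate_v1))   # found with the old rules
print(index_lookup(index, "apple", collate_v2))   # missed with the new rules
print(index_lookup(index, "cherry", collate_v2))  # still found, by luck

# Rebuilding the index under the new collation repairs the lookup.
rebuilt = sorted(rows, key=collate_v2)
print(index_lookup(rebuilt, "apple", collate_v2))
```

Some lookups keep working and others silently fail, which is what makes this kind of breakage subtle.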

For analytical loads the following is going to be great:

  While PostgreSQL 9.6 offers parallel query, this feature 
  has been significantly improved in PostgreSQL 10, with new 
  features like Parallel Bitmap Heap Scan, Parallel Index 
  Scan, and others.  Speedups of 2-4x are common with 
  parallel query, and these enhancements should allow those 
  speedups to happen for a wider variety of queries.

I did read the article, but I can't find any mention of addressing the "write amplification" issue described by Uber when they moved away from Postgres: https://eng.uber.com/mysql-migration/ I heard on Software Engineering Daily that this new major revision was supposed to address that.

Is this issue resolved by the new "Logical replication" feature? It doesn't seem directly related, but it seems like maybe that is what he is referring to in this blog post?

There's a patch that reduces write amplification (when caused by indexes) by a significant degree. Unfortunately it wasn't quite ready in time for the feature freeze of 10 - as it affects the on-disk format, we considered the risk to be too high.
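
For readers wondering what "write amplification caused by indexes" means, here is a toy counting model. It is an assumption-laden sketch, not PostgreSQL's storage code and not a description of the patch's mechanism: the index and column names are hypothetical, and the baseline assumes an update that changes an indexed column, so that the new row version gets an entry in every index.

```python
def index_writes_per_update(indexes, changed_columns, only_affected):
    """Count index entries written by one row update in a toy model.

    indexes: dict mapping index name -> set of indexed column names.
    changed_columns: set of column names modified by the update.
    only_affected: if False, every index gets an entry for the new row
        version; if True, only indexes covering a changed column do.
    """
    if only_affected:
        return sum(1 for cols in indexes.values() if cols & changed_columns)
    return len(indexes)

# Hypothetical table with four indexes.
indexes = {
    "users_pkey": {"id"},
    "users_email_idx": {"email"},
    "users_name_idx": {"last_name", "first_name"},
    "users_created_idx": {"created_at"},
}

changed = {"email"}  # the update touches a single indexed column

baseline = index_writes_per_update(indexes, changed, only_affected=False)
targeted = index_writes_per_update(indexes, changed, only_affected=True)
print(baseline, targeted)
```

In this model the baseline writes to all four indexes while the targeted variant writes to one; the gap grows with the number of indexes on the table.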

Even a single feature from the list would make 10 an amazing release, all of them together is just unbelievable. Very happy we are using PG :)

Extended Statistics! I was following the replication changes, but have just discovered the extended statistics and am more excited about them.

The directory renaming at the bottom of the post is interesting - I wonder if many other projects have to do things like this?

> The directory renaming at the bottom of the post is interesting - I wonder if many other projects have to do things like this?

The background is that, over the years, a number of people deleted the pg_xlog and pg_clog directories when they noticed they were running out of space, thinking they were just server logs. Unfortunately those are the directories containing the database journal and the transaction status (committed/aborted/in-progress), which means they'll lose data. The idea is to rename them to something that's less likely to be mistaken for unimportant data.

Having good directory names helps a lot; I'd be very surprised if other projects didn't also need to make it clear to admins what's happening.

(On the other hand, some projects need to make things _less_ clear - https://github.com/mackyle/sqlite/blob/3cf493d4018042c70a4db... - "users would (...) call [the developers] to wake them up at night and complain".)

Dumb question: does declarative partitioning pave the way for native sharding in Postgres? I'm not super super familiar, but it seems like along with some other features coming in Postgres 10, like parallel queries and logical replication, that this is eventually the goal.

I hope that it will have that effect. We need a few other features first: partitionwise join, partitionwise aggregate, asynchronous query, and ideally hash partitioning.
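
As a rough picture of why partitionwise aggregate matters for sharding, here is a sketch of the general idea in plain Python: aggregate each partition independently, then combine the partial results. The yearly range partitioning and the sample rows are made up for illustration, and this says nothing about how the PostgreSQL planner would actually implement it.

```python
from collections import Counter
from datetime import date

def partition_for(day):
    # Hypothetical range partitioning: one partition per calendar year.
    return day.year

def partitionwise_count(rows):
    """Count orders per customer by aggregating each partition separately."""
    partitions = {}
    for day, customer in rows:
        partitions.setdefault(partition_for(day), []).append((day, customer))

    # Each partial aggregate only needs its own partition's rows, so in a
    # sharded setup it could run on the node that holds that partition.
    partials = [Counter(c for _, c in part) for part in partitions.values()]

    # Combine step: merge the partial counts into the final result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Made-up sample data: (order date, customer).
rows = [
    (date(2015, 3, 1), "alice"),
    (date(2015, 7, 9), "bob"),
    (date(2016, 1, 5), "alice"),
    (date(2016, 2, 8), "alice"),
    (date(2017, 4, 2), "bob"),
]

result = partitionwise_count(rows)
print(result["alice"], result["bob"])
```

The combined result matches a single pass over all rows, but only the small partial results have to move between partitions.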

I see -- thanks! Really cool stuff.

This is great because I couldn't go to production with earlier releases of logical decoding. Now we don't have to depend on a third-party add-on!

We're currently experimenting with logical decoding in 9.6, so I'd be curious to hear what problems you've been running into.

Impressive feature list. Glad to see logical replication is finally making it in.

What is your use case for it? My only thought was sending just one table to a replica to be used for analytics.

Replication across major versions, for example to upgrade without downtime. Partial replication, to distribute shared data across a series of clusters, or for analytics and reporting as you mention. Replicating the data without replicating any table bloat. Being able to do limited writes (e.g. to temporary tables) on the standby. http://rhaas.blogspot.com/2011/02/case-for-logical-replicati...

Isn't analytics a massive use case in itself?

You guys are awesome - keep up the good work!

