date 2011-02-01
I tend to view PostgreSQL as a "Swiss Army knife" and having native RDF support would reinforce that.
> [...] Furthermore, at least on Red Hat, glibc regularly whacks
> around the behavior of OS-native collations in minor releases,
> which effectively corrupts PostgreSQL's indexes, since the index
> order might no longer match the (revised) collation order. To
> me, changing the behavior of a widely-used system call in a
> maintenance release seems about as friendly as locking a family
> of angry raccoons in someone's car, but the glibc maintainers
> evidently don't agree.
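To make the index-corruption point concrete, here is a minimal Python sketch. The two collation functions are hypothetical stand-ins (an "old" rule that ignores hyphens and a "new" plain code-point rule), and a sorted list plays the role of a B-tree. Entries stored under the old order are then searched with the new comparison, which is roughly what happens when the OS collation changes underneath an existing index.

```python
from bisect import bisect_left
from typing import Callable, List


# Hypothetical "old" collation: ignore hyphens, break ties on the raw string.
def old_collation(s: str):
    return (s.replace("-", ""), s)


# Hypothetical "new" collation after an OS update: plain code-point order.
def new_collation(s: str):
    return s


def build_index(values: List[str], collate: Callable) -> List[str]:
    """Stand-in for building a B-tree: store the values sorted by `collate`."""
    return sorted(values, key=collate)


def index_lookup(index: List[str], value: str, collate: Callable) -> bool:
    """Binary search that trusts the index is already sorted under `collate`."""
    keys = [collate(v) for v in index]
    pos = bisect_left(keys, collate(value))
    return pos < len(index) and index[pos] == value


if __name__ == "__main__":
    idx = build_index(["ad", "a-c", "ab"], old_collation)
    for v in idx:
        # Same stored order, two different comparison rules.
        print(v, index_lookup(idx, v, old_collation), index_lookup(idx, v, new_collation))
    # Rebuilding under the new rule makes every lookup consistent again.
    rebuilt = build_index(idx, new_collation)
    print(all(index_lookup(rebuilt, v, new_collation) for v in rebuilt))
```

In this toy, lookups for "a-c" and "ab" silently miss under the new rule even though the entries are present. Rebuilding the index (in PostgreSQL, `REINDEX`) restores consistency.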
> The only thing that matters about strxfrm output is its strcmp
> ordering. If that changes, it's either a bug fix or a bug
> (either in the code or in the locale data). If the string
> contents change but the ordering doesn't, then it's an
> implementation detail that is allowed to change.
"Why do you think that? I don't see this documented anywhere, and I doubt it is something many readers of the C standard, the man page, or the glibc manual would expect.
The manual suggests storing the strxfrm output and using it for sorting. I expect that some applications put it into on-disk database indexes as a result. This will lead to subtle breakage on glibc updates.
(The larger problem is that there are definitely databases out there which use B-tree indexes in locale collation order, which break in even more subtle ways if we make minor changes to the collation order.)"
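The property the glibc maintainer describes can be checked directly: comparing two strxfrm results must order strings exactly as strcoll does, while the transformed strings themselves carry no stability promise. A small sketch using Python's `locale` module, which wraps the C library's collation functions; the helper names and candidate locale names are assumptions, and the script falls back to "C" if none of the others is installed.

```python
import locale
from itertools import combinations


def sign(x: int) -> int:
    """Collapse a comparison result to -1, 0 or 1."""
    return (x > 0) - (x < 0)


def use_collation(candidates=("en_US.UTF-8", "C.UTF-8", "C")) -> str:
    """Select the first LC_COLLATE locale this system actually provides."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_COLLATE, name)
        except locale.Error:
            continue  # locale not installed here, try the next one
    raise RuntimeError("no usable collation locale")


def xfrm_matches_coll(words) -> bool:
    """True if comparing strxfrm() keys orders every pair exactly like strcoll()."""
    for a, b in combinations(words, 2):
        ka, kb = locale.strxfrm(a), locale.strxfrm(b)
        by_key = (ka > kb) - (ka < kb)
        if by_key != sign(locale.strcoll(a, b)):
            return False
    return True


if __name__ == "__main__":
    chosen = use_collation()
    words = ["apple", "Apple", "a-b", "ab", "banana", "zebra"]
    print(chosen, xfrm_matches_coll(words))
```

A pairwise check like this only says something about one library version at a time. It cannot detect that keys saved to disk under an older glibc now sort differently, which is the concern raised in the reply.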
While PostgreSQL 9.6 offers parallel query, this feature
has been significantly improved in PostgreSQL 10, with new
features like Parallel Bitmap Heap Scan, Parallel Index
Scan, and others. Speedups of 2-4x are common with
parallel query, and these enhancements should allow those
speedups to happen for a wider variety of queries.
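Roughly, the idea behind a parallel heap scan is that workers pull block numbers from a shared counter, each computes a partial result, and a Gather step combines them. The sketch below is a toy Python model of that division of work only; the class and function names are hypothetical, it is not PostgreSQL's implementation, and in CPython these threads will not actually speed up a CPU-bound loop.

```python
import threading
from typing import List


class SharedScanState:
    """Hypothetical stand-in for shared scan state: which block to hand out next."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.next_block = 0
        self.lock = threading.Lock()

    def claim(self) -> int:
        """Return the next unclaimed block number, or -1 once the scan is done."""
        with self.lock:
            if self.next_block >= self.nblocks:
                return -1
            blk = self.next_block
            self.next_block += 1
            return blk


def worker(state: SharedScanState, table: List[List[int]], partials: List[int], slot: int) -> None:
    """Each worker sums whatever blocks it manages to claim (a partial aggregate)."""
    total = 0
    while (blk := state.claim()) != -1:
        total += sum(table[blk])
    partials[slot] = total


def parallel_sum(table: List[List[int]], nworkers: int) -> int:
    state = SharedScanState(len(table))
    partials = [0] * nworkers
    threads = [
        threading.Thread(target=worker, args=(state, table, partials, i))
        for i in range(nworkers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "Gather": the leader combines the workers' partial results.
    return sum(partials)


if __name__ == "__main__":
    table = [[i] * 3 for i in range(100)]  # 100 "blocks" of 3 rows each
    print(parallel_sum(table, 4))
```

Because blocks are claimed one at a time rather than pre-assigned, a worker that finishes early simply takes more blocks, so the work stays balanced without knowing each block's cost in advance.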
Is this issue resolved by the new logical replication feature? It doesn't seem directly related, but maybe that is what he is referring to in this blog post.
The directory renaming at the bottom of the post is interesting - I wonder if many other projects have to do things like this?
The background is that, over the years, a number of people deleted the pg_xlog and pg_clog directories when they noticed they were running out of space, thinking they contained only server logs. Unfortunately, those directories hold the database journal (the write-ahead log) and the transaction status (committed/aborted/in-progress), so deleting them means losing data. The idea is to rename them to something less likely to be mistaken for unimportant data.
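For reference, PostgreSQL 10 renamed pg_xlog to pg_wal and pg_clog to pg_xact. Scripts that touch the data directory across versions then need to pick the right name. A minimal sketch, with hypothetical helper names, keyed on `server_version_num` (90600 for 9.6.0, 100000 for 10.0):

```python
import os

# Directory renames made in PostgreSQL 10 (server_version_num >= 100000).
RENAMED_DIRS = {"pg_xlog": "pg_wal", "pg_clog": "pg_xact"}


def data_dir_name(old_name: str, server_version_num: int) -> str:
    """Name of a data-directory subdirectory for the given server version."""
    if server_version_num >= 100000:
        return RENAMED_DIRS.get(old_name, old_name)
    return old_name  # pre-10 servers keep the historical names


def wal_path(pgdata: str, server_version_num: int) -> str:
    """Path of the write-ahead log directory inside a data directory."""
    return os.path.join(pgdata, data_dir_name("pg_xlog", server_version_num))


if __name__ == "__main__":
    print(wal_path("/var/lib/postgresql/data", 90600))
    print(wal_path("/var/lib/postgresql/data", 100000))
```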
(On the other hand, some projects need to make things _less_ clear - https://github.com/mackyle/sqlite/blob/3cf493d4018042c70a4db... - "users would (...) call [the developers] to wake them up at night and complain".)
