
PostgreSQL 12 Beta 1 Released - jakobegger
https://www.postgresql.org/about/news/1943/
======
jakobegger
The biggest news for me is that PostgreSQL 12 now really supports case-
insensitive collations!

    
    
        CREATE COLLATION case_insensitive (
          provider = icu,
          locale = 'und-u-ks-level2',
          deterministic = false
        );
        
        CREATE TABLE names(
          first_name text,
          last_name text
        );
        
        insert into names values
          ('Anton','Egger'),
          ('Berta','egger'),
          ('Conrad','Egger');
        
        select * from names
          order by
            last_name collate case_insensitive,
            first_name collate case_insensitive;
    

And you'll get them in the correct order, despite uppercase/lowercase
differences in the last name column.

~~~
gmac
I think the citext extension[1] already made much of this straightforward?

What I'm most excited about is the fact that WITH clauses will no longer
necessarily be optimization fences — I have some big queries that are really
only manageable this way, and now they may well be very much faster.

[1]
[https://www.postgresql.org/docs/11/citext.html](https://www.postgresql.org/docs/11/citext.html)
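
On the CTE point, a sketch of what changes (table and column names invented):
in 12 a side-effect-free CTE that is referenced once can be inlined, so outer
predicates can reach the base table's indexes.

    
    
      WITH recent AS (
        SELECT * FROM events WHERE created_at > now() - interval '1 day'
      )
      SELECT * FROM recent WHERE user_id = 42;
      -- PG 11: "recent" is always materialized first, then filtered
      -- PG 12: it can be inlined, so an index on (user_id) is usable
    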

~~~
ako
Why are you not using views instead of WITH clauses?

~~~
jeffdavis
Creating views makes more sense sometimes and CTEs make more sense other
times.

Creating views just for one query is kind of like creating temporary variables
for each subexpression. Sure, it works, but it's not a great solution.

Also, other databases do optimize through a CTE. So if you port to PG 11, you
might see a lot of regressions in plans unless you rewrite to use views
instead. PG 12 will work a lot better for existing queries.

------
move-on-by
I’m not a DBA, but this seems like it would be huge if I were:

> Additionally, PostgreSQL 12 adds the ability to rebuild indexes
> concurrently, which lets you perform a REINDEX operation without blocking
> any writes to the index. This feature should help with lengthy index
> rebuilds that could cause downtime when managing a PostgreSQL database in a
> production environment.
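
For illustration, the new syntax looks like this (object names are
placeholders):

    
    
      -- PG 12: rebuild one index without blocking writes
      REINDEX INDEX CONCURRENTLY my_index;
      -- or rebuild all indexes on a table
      REINDEX TABLE CONCURRENTLY my_table;
    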

~~~
vbezhenar
Can't you just create a second index online and remove the first one when the
second index is built?

~~~
anarazel
You can. But sometimes you can't drop indexes without further consequences. It
might be e.g. a primary key, or a unique index used to enforce uniqueness on
the referred-to side of a foreign key. You can work around those, but that
requires holding an exclusive lock for at least a short amount of time
(problematic in itself, but also can increase deadlock risk unless you're
really careful).
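
For concreteness, a sketch of the pre-12 workaround for swapping out a primary
key index (names invented; assumes nothing references the constraint); the
final ALTER is where the exclusive lock bites:

    
    
      CREATE UNIQUE INDEX CONCURRENTLY users_pkey_new ON users (id);
      -- brief exclusive lock here; referencing foreign keys make it trickier
      ALTER TABLE users
        DROP CONSTRAINT users_pkey,
        ADD CONSTRAINT users_pkey PRIMARY KEY USING INDEX users_pkey_new;
    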

------
GordonS
> PostgreSQL 12 now allows execution of JSON path queries per the SQL/JSON
> specification in the SQL:2016 standard. Similar to XPath expressions for
> XML, JSON path expressions let you evaluate a variety of arithmetic
> expressions and functions in addition to comparing values within JSON
> documents.

I'm not familiar with JSON path queries - how do they compare to the jsonb
querying syntax that Postgres already has?

~~~
anarazel
jsonpath does allow a lot more complicated expressions than before. See
[https://www.postgresql.org/docs/devel/functions-
json.html#FU...](https://www.postgresql.org/docs/devel/functions-
json.html#FUNCTIONS-SQLJSON-PATH) for example, and also
[https://www.depesz.com/2019/03/19/waiting-for-
postgresql-12-...](https://www.depesz.com/2019/03/19/waiting-for-
postgresql-12-partial-implementation-of-sql-json-path-language/)
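
A tiny taste of the new functions (data inlined for illustration):

    
    
      SELECT jsonb_path_query(
        '{"items": [{"price": 50}, {"price": 150}]}'::jsonb,
        '$.items[*] ? (@.price > 100)');
      -- returns {"price": 150}
    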

------
nathan_long
depesz has a nice series of posts about new features:
[https://www.depesz.com/tag/pg12/](https://www.depesz.com/tag/pg12/)

"Generated columns" and "foreign keys to partitioned tables" are two that
caught my eye.
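
Generated columns in particular are easy to show (a minimal sketch with
invented names):

    
    
      CREATE TABLE people (
        first_name text,
        last_name  text,
        full_name  text GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
      );
    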

~~~
gigatexal
Fkeys to partitioned tables are huge.

~~~
brobdingnagians
Does that open the possibility of doing fkeys to partitioned tables in Citus?
Or is that a completely different technical challenge for distributed
databases?

~~~
gigatexal
IMO Citus is the RHEL of Postgres -- so one would have to wait until Citus
rebases on PG 12, I guess. Running an HA DB is so hard that Citus making it a
bit easier is super nice.

------
I_am_tiberius
Wow, cool. Does anyone know if sharding is currently somewhere on the roadmap?
I know Microsoft just acquired Citus Data and their offering, but is there an
official sharding solution planned?

~~~
craigkerstiens
Some of the primitives exist for sharding within a single node, but rolling
something equivalent to Citus would require a bit more work.

Citus is very much focused on simplifying all the extra work you'd otherwise
have to do: to your application it still appears to be a single node, so you
don't have to wire together FDWs or teach your application about different
nodes/shards.

At Citus we're pretty excited about a lot of the work going into 12, because
it allows us to continue tackling the scaling piece and to leverage more of
what is in the box.

~~~
I_am_tiberius
I'm currently working on an application platform based on Postgres and was
very happy to hear that Azure will soon offer Citus scaling (or already does).
During the development phase it doesn't make sense for me to spend a lot of
money to have my database hosted in a sharded mode, but in the long term I
would like to use the Citus offering on Azure. Do you have any advice on what
to consider in my database (table) design to avoid issues when migrating to
Citus at some point? I've only read that it makes sense to have one column
(e.g. customerId) in each table so the sharding mechanism knows how the data
should be distributed among the nodes. In my case I guess that kind of key
would be organizationId (= a group of users).
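
For illustration, that pattern usually looks something like the following
(table and column names invented; create_distributed_table is Citus's API for
naming the distribution column later):

    
    
      -- every tenant-scoped table carries the tenant key, and it is part of
      -- the primary key, so rows co-locate by organization
      CREATE TABLE projects (
        organization_id bigint NOT NULL,
        project_id      bigint NOT NULL,
        name            text,
        PRIMARY KEY (organization_id, project_id)
      );
      
      -- later, once on Citus:
      -- SELECT create_distributed_table('projects', 'organization_id');
    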

------
GordonS
> PostgreSQL 12 improves the overall performance of the standard B-tree
> indexes

I know to a large extent "it depends", but it would be nice to get an _idea_
of some numbers here - kind of surprised it doesn't say.

~~~
petergeoghegan
I am the primary author of this work. It's very hard to give a representative
example, but I can give you a simple real-world example.

The mouse genome database is perhaps the only real-world Postgres database
that is publicly available and isn't trivially small. It is available from
[http://www.informatics.jax.org](http://www.informatics.jax.org). There are
quite a number of publicly available datasets, of course, but this is
specifically a raw pg_dump that includes indexes that are actually useful to
the underlying application. A living, breathing database.

One of the test cases used during development involved restoring the dump, and
then re-inserting the data with retail insertions (pristine indexes generated
by CREATE INDEX are not very interesting). Something like the following,
repeated for each table:

    
    
      create table __foo (like foo including indexes);
      insert into __foo select * from foo;
    

Afterwards, all of the indexes taken together (all indexes on all "__foo"-like
tables) were about 15% smaller than they were when the same procedure was
performed on a v11 Postgres. Since the database has many indexes on each
table, the overall size of the database was reduced by about 10%.

That is only one anecdote, involving a database that happened to have a lot of
indexes that each stored a lot of duplicates. And, this tells you nothing
about how latency or throughput are affected for the application itself, which
is usually much more interesting. Your mileage may vary, but this is what
happened in the one real-world case that I actually tested. It was not cherry-
picked.

Andres provided a link to some TPC-C benchmark results that I published,
showing significant improvements despite the fact that that benchmark involves
few or no duplicates in indexes. That's due to _another_ specific issue with
how we choose where to split leaf pages.

Overall, the best way to think of the v12 work is as something that fixes
cases where Postgres previously didn't do so well. It makes performance more
_consistent_ overall.

~~~
isoprophlex
I'm extremely thankful for all the cool stuff that's being done on Postgres.

To me, it's the no. 1 piece of software where every new update brings super
exciting things.

Thanks for the efforts you're putting in.

------
jonstaab
Stoked about jsonpath! The current collection of json(b) functions is a huge
pain to use. If you want to do something as trivial as filtering a JSON array,
you have to select jsonb_array_elements in a subquery, do a filter, and then
roll the thing back up using jsonb_agg, giving you a three-deep subquery
structure.
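
For comparison, both styles on an inlined document (made-up data; the jsonpath
variant uses PG 12's jsonb_path_query_array):

    
    
      -- pre-12: unnest, filter, re-aggregate
      SELECT jsonb_agg(elem)
      FROM jsonb_array_elements('[{"n": 1}, {"n": 2}, {"n": 3}]'::jsonb) AS elem
      WHERE (elem->>'n')::int > 1;
      
      -- PG 12: one jsonpath expression
      SELECT jsonb_path_query_array('[{"n": 1}, {"n": 2}, {"n": 3}]'::jsonb,
                                    '$[*] ? (@.n > 1)');
    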

Also excited about generated columns; we do a fair amount of denormalization
in postgres, and managing that stuff in code can get pretty error-prone.

------
tux1968
I read a while back about a new storage backend named zheap that was meant to
obviate the need for vacuum. That was from a year or two ago, and at the time
they were aiming for PG 12. I wonder where they stand now. Looking at their
GitHub, it's still active:

[https://github.com/EnterpriseDB/zheap](https://github.com/EnterpriseDB/zheap)

~~~
atombender
Zheap's design [1] sounds almost exactly like Oracle.

The way Oracle works, when you do transactional row update (e.g. UPDATE), the
previous version is read and copied to the undo log, and then the update is
performed in-place in the table even though it's not committed yet.

Each row (or maybe it's per page — it's been a while) has a "transaction
slots" header, and the transaction then writes its ID to this list. If
another, concurrent transaction does a SELECT against the same row, it will
see in the slot list that this row is from an uncommitted transaction, and if
your transaction isolation level is high enough, it will then look into the
undo log for the currently committed row.

This design means that there's no transactional overhead for reads done
against rows that aren't being written to, so it's optimized for the general
case — performance is only impacted when a transaction has dirty rows and
other transactions have to dip into the undo log to find the old versions. The
downside is that commits and rollbacks are slower; for commits, all the
transaction slots need to be updated, and for rollbacks the undo log needs to
be copied back into the table.

Postgres's current MVCC is very elegant in theory — MVCC means never updating
anything in-place, always appending, so there's never any read/write
contention, and commits/rollbacks are near-instantaneous. But the garbage
collection aspect ("vacuum") has always been a challenge, and Postgres has
mitigated a lot of performance issues over time with clever hacks that
compromise the purity of the MVCC model (e.g., HOT tuples, index visibility
bits). I think it's definitely time for alternative approaches.

[1]
[https://github.com/EnterpriseDB/zheap/blob/master/src/backen...](https://github.com/EnterpriseDB/zheap/blob/master/src/backend/access/zheap/README)

~~~
iracic
>> The downside is that commits and rollbacks are slower

Commit is not slower. Some of the work is deferred: e.g. the first SELECT
after a change may have to check and update the row status in the data block
by peeking at the transaction state in the UNDO log. A huge rollback is the
real pain, something you don't want to do, and in some situations you would
prefer to go UNDO-less.

------
GrayShade
> Inlined WITH queries (Common table expressions)

That's pretty nice.

~~~
LoneWolf123
Isn't it a breaking change? Using WITH subqueries to fence off the query
planner seemed to be a thing, right?

~~~
GrayShade
Yes, it can affect (for better or worse) the performance of a query.
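
For queries that relied on the fence, 12 also adds an explicit keyword to keep
the old behavior (names invented):

    
    
      -- restores the PG 11 fencing behavior for this CTE
      WITH fenced AS MATERIALIZED (
        SELECT * FROM events WHERE payload IS NOT NULL
      )
      SELECT * FROM fenced WHERE user_id = 42;
    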

------
etxm
> the ability to use foreign keys to reference partitioned tables is now
> permitted

Hell yeah!
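
A minimal sketch of what is now possible (names invented; note the referenced
table's primary key must include the partition column):

    
    
      CREATE TABLE orders (
        order_id   bigint,
        created_at date,
        PRIMARY KEY (order_id, created_at)
      ) PARTITION BY RANGE (created_at);
      
      CREATE TABLE orders_2019 PARTITION OF orders
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
      
      -- PG 12: a foreign key may reference the partitioned table itself
      CREATE TABLE order_items (
        order_id   bigint,
        created_at date,
        item       text,
        FOREIGN KEY (order_id, created_at) REFERENCES orders
      );
    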

------
electrotype
Still no real MERGE, sadly?

[https://commitfest.postgresql.org/21/1446/](https://commitfest.postgresql.org/21/1446/)

~~~
LoneWolf123
This might be excellent: fully implemented MERGE in other RDBMSs has seemed
highly unreliable. Given Postgres's focus on quality rather than on number of
features, not implementing MERGE fully might be a great choice.

Just imagine how many permutations we'd need to unit-test a full MERGE
implementation. Sounds scary. Better to develop the frequently used subset,
UPSERT, and stop there.
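
For reference, that subset has been in Postgres since 9.5 (a minimal sketch
with invented names):

    
    
      CREATE TABLE counters (key text PRIMARY KEY, value bigint);
      
      INSERT INTO counters (key, value) VALUES ('hits', 1)
      ON CONFLICT (key) DO UPDATE
        SET value = counters.value + EXCLUDED.value;
    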

------
GordonS
> Pluggable Table Storage Interface

This sounds interesting - I wonder if we'll see a whole bunch of weird and
wonderful plugins built on this!

Really intrigued to see what people come up with.

~~~
latch
Maybe this is some groundwork to support zheap storage?
[http://rhaas.blogspot.com/2018/01/do-or-undo-there-is-no-
vac...](http://rhaas.blogspot.com/2018/01/do-or-undo-there-is-no-vacuum.html)

~~~
anarazel
Indeed, that's one of the major reasons to introduce the tableam ("table access
method") infrastructure. Interest in integrating at least one form of columnar
storage is another.
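
The SQL-visible surface is small (a sketch; the alternative access method and
its handler function are named here purely for illustration):

    
    
      -- PG 12: a table can name its access method; 'heap' is the default
      CREATE TABLE t (id int) USING heap;
      
      -- an extension would register an alternative roughly like this:
      -- CREATE ACCESS METHOD myam TYPE TABLE HANDLER my_tableam_handler;
      -- CREATE TABLE t2 (id int) USING myam;
    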

------
dikei
Does anyone know if PG 12 can split the default partition into new partitions
yet?

Currently, we have to manually detach the default partition, create the new
partitions, move the data from the default partition to the newly created
partitions, then re-attach the default partition. That's a lot of work
compared to MySQL's REORGANIZE PARTITION feature.

------
jteppinette
late 2019 release - need to fast forward life

~~~
anarazel
The more people test, the more likely we'll feel comfortable releasing sooner
;)

~~~
jteppinette
Do you encourage people to report successful deployments of the beta / rc?

~~~
anarazel
Yes, please do. Best with a short description of what the workload is, so we
know which areas have been tested, and how.

Just one word of caution: I'd be careful deploying betas into production. Even
leaving possible bugs aside, while they'll be upgradable using pg_upgrade,
they're not guaranteed to be on-disk compatible with the final version. In
fact, we're already certain that there'll be some small changes between beta1
and beta2.

~~~
jteppinette
Awesome! Thanks for the info. I will drop it in PERF and report back. It’s a
sizable deployment, billions of rows. I’m interested in seeing the b-tree
index update impacts.

------
jteppinette
It doesn’t look like the betas get support in the official Postgres docker
image. Any suggestions on running the beta in Docker?

Edit: _for testing purposes_

------
lajith
Any news regarding returning result sets from stored procedures in Postgres 12?

------
lcnmrn
Any support for big data? I tried to ingest the HaveIBeenPwned password hashes
into PostgreSQL 11, and after that Postgres became unusable; not even a simple
query could run.

~~~
sterwill
What was your schema? What did you index? I don't have a copy of the HiBP
archive at hand, but the web site says it contains 551 million entries. If we
assume the 11 GB archive expands to 30 GB, that's certainly not a small table,
but it's not what I'd call "big data".

~~~
lcnmrn
A 'hash' column (varchar). An index created in advance slowed down the import
so much it wasn't usable, and I couldn't create an index after the import
either; too slow. I just want to find whether one or more hashes are present
in their data set.

I used ClickHouse previously with data this large and it worked much better.
Obviously I'm comparing apples with oranges, but couldn't PostgreSQL support a
columnar data engine or some kind of index?

~~~
anarazel
> I couldn't create an index after the import, too slow.

What made that "too slow"?

I just tested it, and it's not too bad. This is on my development environment
(~3yo laptop, reasonably powerful), with all kinds of compile-time debugging
enabled - but with compiler optimizations turned on.

    
    
      CREATE TABLE hashes(hash text, count int);
      BEGIN;
      TRUNCATE hashes;
      \copy hashes FROM program '7z e -so ~/tmp/pwned-passwords-sha1-ordered-by-count-v4.7z pwned-passwords-sha1-ordered-by-count-v4.txt' WITH (DELIMITER ':', FREEZE);
      Time: 1245024.360 ms (20:45.024)
      COMMIT;
    

The bottleneck is 7z decompressing:

    
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                   
      24403 andres    20   0   31480  22292   4544 R  73.8   0.1  14:54.09 7z                                                                                        
      23701 andres    20   0 4380008  11488   9024 S  27.9   0.0   5:42.04 postgres
    

And with a bit of config, the PG side could be made considerably faster.

Creating an index:

    
    
      CREATE INDEX ON hashes(hash COLLATE "C");
      Time: 709595.370 ms (11:49.595)
    

And yes, without the COLLATE and FREEZE above, it'd have taken a bit longer.
But not that much.

This is all while I was also working on the same laptop, including recompiling
etc.


~~~
lcnmrn
How long does it take to search for a hash? I'm interested in both WHERE
cases: hash = 'hash1' and hash in ('hash1', 'hash2').

~~~
anarazel
A few ms, with cold cache (both OS and postgres). The IN() roughly scales
linearly with the number of elements, although it's often a bit better, as the
upper tree levels are all in cache after the first few IN() element lookups.
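
(For the index above to be used, the lookup has to agree on the collation; a
sketch with placeholder values, assuming the table from the earlier comment:)

    
    
      SELECT count FROM hashes WHERE hash = 'HASH1' COLLATE "C";
      SELECT hash, count FROM hashes
       WHERE hash COLLATE "C" IN ('HASH1', 'HASH2');
    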

~~~
lcnmrn
I managed to replicate your performance, but I used a hash index on the hash
column.

