
PostgreSQL 10 Beta 1 Released - ahachete
https://www.postgresql.org/about/news/1749/
======
pilif
While everybody is going to be rightfully excited about logical
replication, for me personally, CREATE STATISTICS and the new ROW syntax for
UPDATE are probably the additions with the biggest effect on me
since I moved to Postgres exclusively when 7.1 was released.

Especially CREATE STATISTICS (wonderful explanation here:
[https://www.postgresql.org/docs/10.0/static/multivariate-sta...](https://www.postgresql.org/docs/10.0/static/multivariate-statistics-examples.html))
is the one single knob I wanted to be able to turn so many
times in the past (especially the n-distinct counts).

Most of the time the planner does an OK job, but sometimes you have tables in
a peculiar shape, and if you're unlucky the planner will pick horribly bad
plans. Whenever this happened to me, it was due to one of the two things
CREATE STATISTICS will let me tune in the future.
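For the curious, a minimal sketch of what that knob looks like (the table and column names here are hypothetical):

```sql
-- Tell the planner that city and zip are correlated, so it stops
-- multiplying their selectivities as if the columns were independent.
CREATE STATISTICS addr_stats (ndistinct, dependencies)
    ON city, zip FROM addresses;

ANALYZE addresses;  -- collect the new extended statistics
```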

Thank you, thank you, thank you to whoever gave us this wonderful feature

~~~
walrus
What's the significance of the `update .. set (..) = row (..) ..` syntax?

    
    
      update comment set modified = now(), body = 'edited comment' where id = 123;
    

vs

    
    
      update comment set (modified, body) = row (now(), 'edited comment') where id = 123;
    

It doesn't seem to provide any new functionality, just a minor difference in
syntax.

~~~
colanderman
EDIT: Turns out the below is not true, though you can achieve the same effect
with the sub-select syntax.

The row value can be a single value from some other query. (Rather than having
to pick apart each column from the row value.) That said I think the feature
has been there for a while, and now simply the "ROW" keyword is optionally
allowed.

~~~
walrus
Thanks for the explanation!

    
    
      update comment set (modified, body) = (select now(), 'edited comment') where id = 123;
    

You're right, it works the same without the `row` keyword in 9.6.

~~~
colanderman
Actually, now that I re-read the grammar, sub-selects are yet another accepted
syntax. Looks like the ROW syntax doesn't support row expressions like I
thought. I'm not really sure of the benefit besides making it easier to
programmatically construct UPDATEs.

~~~
mjw1007
It's for when you want to update more than one column with values from the
same record in another table, eg:

    
    
      UPDATE books
        SET (title, isbn) = (
        SELECT title, isbn FROM other_books
      WHERE other_books.foo = books.bar);
    
    

You can also do it with UPDATE ... FROM (which may be more efficient), but
that's a PostgreSQL extension, while the added-in-9.5 syntax is SQL standard.

    
    
      UPDATE books
        SET title = other_books.title, isbn=other_books.isbn
        FROM other_books
        WHERE other_books.foo = books.bar;

~~~
colanderman
That's not the row syntax though, that's the sub-select syntax (my initial
confusion). i.e., there are three options according to the grammar:

    
    
      SET { column_name = { expression | DEFAULT } |
            ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
            ( column_name [, ...] ) = ( sub-SELECT ) }
    

The second option, "row" syntax (to which the optional "ROW" keyword was
recently added), doesn't allow for a row expression, only column expressions.
The sub-select syntax suffices for row expressions (as your example
demonstrates).

------
jph
Congratulations to the team. The replication/partition improvements are
significant and much appreciated.

My favorite improvements are full text search of JSON & JSONB; this makes pg a
full replacement for Mongo for my use cases.
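A quick sketch of what that looks like (the JSON document here is made up):

```sql
-- PG 10 adds json/jsonb variants of to_tsvector, so JSON values
-- can be full-text searched without casting them to text first.
SELECT to_tsvector('english',
         '{"title": "Beta released", "body": "logical replication"}'::jsonb)
       @@ to_tsquery('english', 'replication');  -- true
```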

~~~
zachmax
I feel like this feature replaces almost all of Mongo's use cases!

~~~
htsh
I'm interested. When you say almost, can you elaborate on any remaining use
cases when you'd use Mongo?

~~~
byefruit
At my previous company we made heavy use of its lossy compression feature.

~~~
X86BSD
You would get compression with Postgres running on ZFS.

~~~
sqeaky
A database plus a copy-on-write file system sounds like a bad idea. I am
imagining a modest 100 GB database being rewritten for every change.

I am sure there is some way to work around this, but wouldn't this be the
default behavior with a typical database and a typical COW file system?

~~~
X86BSD
You might be interested in reading this paper:
[https://people.freebsd.org/~seanc/postgresql/scale15x-2017-p...](https://people.freebsd.org/~seanc/postgresql/scale15x-2017-postgresql_zfs_best_practices.pdf)

~~~
sqeaky
I am interested and I appreciate the thought and link; however, these appear
to be the slides for a talk without the actual talk. If so, they are of
limited use, because slides never contain all the information the presenter
has; often they are just a summary.

Is this the correct talk:
[https://youtu.be/dwMQXLOXUco?t=5380](https://youtu.be/dwMQXLOXUco?t=5380) ?

~~~
X86BSD
Yes, sorry about that, that is the corresponding talk to the slides. Thanks
for pointing that out.

------
paukiatwee
Important features:

Native Table Partitioning: [https://www.keithf4.com/postgresql-10-built-in-partitioning/](https://www.keithf4.com/postgresql-10-built-in-partitioning/)

Logical Replication: [https://blog.2ndquadrant.com/logical-replication-postgresql-...](https://blog.2ndquadrant.com/logical-replication-postgresql-10/)

Or see what's new in PG 10:
[https://wiki.postgresql.org/wiki/New_in_postgres_10](https://wiki.postgresql.org/wiki/New_in_postgres_10)

PostgreSQL is really awesome!

------
ilanco
PostgreSQL is an amazing project. A no-nonsense database that delivers what it
promises. I'm amazed at what a talented group of people can accomplish when
they are driven and put their mind to it. Thanks for a wonderful product.

~~~
irrational
Plus their documentation is top notch. After using Oracle for 15 years, I
nearly cried when I moved to Postgres recently and saw how wonderful their
documentation is.

~~~
swasheck
Completely agree!

I've often said that I didn't learn English grammar until I studied German, I
didn't understand Linux until I read the Arch documentation, and I didn't
comprehend relational data until I read the Postgres documentation.

------
jakobegger
The biggest news for me is ICU support for collations (text sorting).

Previous versions of PostgreSQL relied only on strcoll, which is horribly
broken on BSD and macOS. On platforms where it wasn't completely broken, it
had the potential for subtle data corruption bugs (eg. an update to glibc
might change sort order, causing indexes to become corrupt).

Now, you can optionally use ICU for collations, which gives you reliable,
versioned collations. This is a big step forward!

ICU collations are not the default, you need to add them with CREATE
COLLATION. You have a lot more collations available to choose from, but I
think it's not yet possible to change any of the advanced settings that ICU
provides.
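A minimal sketch of the new syntax (the collation name and table are hypothetical):

```sql
-- Define an ICU-backed collation; 'de-DE' is handled by ICU, not libc.
CREATE COLLATION german (provider = icu, locale = 'de-DE');

-- Use it ad hoc in a query:
SELECT name FROM users ORDER BY name COLLATE german;
```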

(Also, when I tried it, it seems that the ICU collations are case insensitive
-- but I think case insensitive collations aren't fully supported yet.)

~~~
petereisentraut
ICU collations are prepopulated; see
[https://www.postgresql.org/docs/devel/static/collation.html#...](https://www.postgresql.org/docs/devel/static/collation.html#collation-managing).

Also, ICU collations are case sensitive, just like libc locales.

~~~
anarazel
s/insensitive/sensitive/?

~~~
petereisentraut
fixed, thanks

------
jakobegger
If you're looking for a quick way to try PostgreSQL 10 on a Macbook, we made a
special build of Postgres.app that includes PostgreSQL 10beta1.

You can download it from Github:
[https://github.com/PostgresApp/PostgresApp/releases](https://github.com/PostgresApp/PostgresApp/releases)

~~~
petereisentraut
Or if you use Homebrew you might like: brew install --devel
petere/postgresql/postgresql@10

------
alfalfasprout
The native table partitioning makes me so happy. I'd been doing this for years
with really hacky external modules and tons of triggers. Sadly, even then
there were always weird edge cases.
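For anyone who hasn't seen the new syntax yet, a minimal sketch (names are made up):

```sql
-- Declarative range partitioning, PG 10 style: no triggers, no rules.
CREATE TABLE measurements (
    ts    timestamptz NOT NULL,
    value numeric
) PARTITION BY RANGE (ts);

CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- Rows are routed to the matching partition automatically.
INSERT INTO measurements VALUES ('2017-06-01', 42);
```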

Postgres really has become the most versatile database out there. I cringe
whenever I have to work with MySQL again...

~~~
jdubs
Every time I see a job post mentioning mysql I realize they just haven't
discovered postgres, or they have some really gross problem. :/

~~~
ubertaco
Or they want to allow for case-insensitivity of some data, like for example
email addresses on login forms.

As much as postgres is overall better than MySQL in so many ways, it's still
ridiculously difficult to set things up such that

    
    
        SELECT id FROM users WHERE email='foo@example.com' 
    

returns the same result as

    
    
        SELECT id FROM users WHERE email='Foo@example.com'

~~~
mgkimsal
not sure why you got downvoted so much.

Everyone's answer is "just lowercase everything".

I'll respond just a bit:

1. You don't always have control over all the queries that have been written
against your database.

2. You would probably lose the ability to use ORMs without a moderate amount
of customization.

3. If you're migrating from a different database, you may have checksums on
your data that would all need to be recalculated if you change the case of
everything stored.

4. Doing a runtime lowercase() on everything adds a bit of overhead, doesn't
it?

citext on postgresql seems a decent option - the citext docs even mention
drawbacks of some of the other recommended options.
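For reference, a minimal citext sketch (the table is hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE users (
    id    serial PRIMARY KEY,
    email citext UNIQUE   -- uniqueness and comparisons are case-insensitive
);

INSERT INTO users (email) VALUES ('foo@example.com');
SELECT id FROM users WHERE email = 'Foo@example.com';  -- finds the row
```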

~~~
openasocket
Is there a single ORM out there that doesn't support the lower() function? I
googled "case insensitive search" plus a couple of ORMs, and each of them
could implement it as a one-liner.

And doing the runtime lower() on everything will generally not be slower than
citext. If you look at the source for the citext comparison
([https://github.com/postgres/postgres/blob/aa9eac45ea868e6dda...](https://github.com/postgres/postgres/blob/aa9eac45ea868e6ddabc4eb076d18be10ce84c6a/contrib/citext/citext.c#L113))
you'll see it is internally converting the values to lowercase and comparing
them. All it saves you is the overhead of a SQL function invocation, and you'd
have to do a lot of comparisons to make that difference measurable. But if
you're doing a lot of those comparisons, then unless you're just running the
calculation on the same couple of values over and over, memory and disk
latency will dominate performance, not the minimal overhead of the SQL
function invocation.
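The usual lower()-based approach, sketched against a hypothetical users table:

```sql
-- An expression index makes the lower() comparison indexable.
CREATE INDEX users_email_lower_idx ON users (lower(email));

SELECT id FROM users
 WHERE lower(email) = lower('Foo@example.com');
```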

I agree you should probably use citext if you need case-insensitive unique or
primary key values, but be aware of the drawbacks.
[https://www.postgresql.org/docs/current/static/citext.html](https://www.postgresql.org/docs/current/static/citext.html)

------
cygned
I wish they would implement more of SQL:2011. I have a lot of applications
that would benefit from system-versioned tables.

This is a good summary of useful modern SQL features:
[https://www.slideshare.net/MarkusWinand/modern-sql](https://www.slideshare.net/MarkusWinand/modern-sql)

~~~
saurik
I was expecting that presentation to be stuff you were missing, but the
presentation is saying that essentially every feature they describe is
implemented in PostgreSQL. Of the SQL:2011 ones, only one wasn't (temporal
tables), and even the SQL:2016 features had partial support (with the summary
slide at the end of other features they didn't do in detail having stuff that
looks familiar in a PostgreSQL context). Do you have a reference of features
not in PostgreSQL you are hoping to use?

~~~
cygned
I really would like to use temporal tables (slides 137 and following) to say
something like "give me the record as of last week".

I can of course do that manually already, but it is tedious, and I hope it
will be faster if implemented directly.
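The manual version being described might look roughly like this (the schema is made up):

```sql
-- Hand-rolled history table: each version carries its validity range.
CREATE TABLE accounts_history (
    id    int,
    data  jsonb,
    valid tstzrange
);

-- "Give me the record as it was a week ago":
SELECT data FROM accounts_history
 WHERE id = 42
   AND valid @> now() - interval '7 days';
```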

~~~
phonon
[https://github.com/arkhipov/temporal_tables](https://github.com/arkhipov/temporal_tables)
seems like a good option?

~~~
cryptonector
Thanks for the link!

~~~
phonon
Here's two more--

[http://clarkdave.net/2015/02/historical-records-with-postgre...](http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/)

[https://pgxn.org/dist/temporal_tables/](https://pgxn.org/dist/temporal_tables/)

------
crgwbr
Will having logical replication make doing a DB version upgrade in production
easier? We're using Postgres 9.4 on RDS right now, and there doesn't seem to
be an upgrade path that doesn't involve some downtime.

~~~
fjordphylogen
> E.1.2. Migration to Version 10
>
> A dump/restore using pg_dumpall, or use of pg_upgrade, is required for
> those wishing to migrate data from any previous release.
> ([https://www.postgresql.org/docs/devel/static/release-10.html](https://www.postgresql.org/docs/devel/static/release-10.html))

This means no, I guess. This is what I don't get. How can anyone with a
sizeable db in production do that?

~~~
pritambaral
That is only if you want to upgrade in-place. Upgrading using logical
replication is different: it requires running another instance and switching
over to it when all the data has been replicated.

------
zitterbewegung
The new features related to clustering look exciting. Also SCRAM support and
full-text search for JSON and JSONB. Postgres is cementing itself as not only
the safe choice for databases but also the Swiss Army knife.

------
qeternity
As someone with little to no Postgres experience, it seems like they are
heading in the direction of providing the type of massively parallel, scale
out features that Citus provides.

Would love to hear thoughts from someone with real expertise.

~~~
dhd415
Citus is building functionality on PostgreSQL to provide sharding for data
sets too large for a single machine. There aren't really any PG 10 features
that duplicate that. Some of the new PG features such as increased query
parallelism allow for better utilization of single-machine resources. Other
features such as logical replication may allow for some horizontal scaling by
splitting read workloads across replicas, but none of them provide any
sharding capabilities.

~~~
bjt
Aggregate push down to foreign servers seems to overlap with Citus. It's not
as transparent, but you could have a dataset sharded across several PG
instances, mount them all into a master with foreign data wrappers, and UNION
them (for some queries) to aggregate across the set.
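A rough sketch of that setup (the server names and tables are invented):

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1 OPTIONS (user 'app');

CREATE FOREIGN TABLE events_s1 (id bigint, amount numeric)
    SERVER shard1 OPTIONS (table_name 'events');
-- ... repeat for shard2 / events_s2 ...

-- In PG 10 the inner sums can be pushed down to each shard,
-- then combined on the "master":
SELECT sum(amount) FROM (
    SELECT sum(amount) AS amount FROM events_s1
    UNION ALL
    SELECT sum(amount) AS amount FROM events_s2
) t;
```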

~~~
amitlan
BTW, you can also create partitions as foreign tables:
[https://www.postgresql.org/docs/10.0/static/sql-
createforeig...](https://www.postgresql.org/docs/10.0/static/sql-
createforeigntable.html)
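Combining the two features looks something like this (assuming a partitioned `measurements` table and a foreign server `shard1` already exist):

```sql
-- A partition that physically lives on another server:
CREATE FOREIGN TABLE measurements_2016
    PARTITION OF measurements
    FOR VALUES FROM ('2016-01-01') TO ('2017-01-01')
    SERVER shard1;
```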

------
Tostino
So glad that GiST indexes now support UUID and ENUM data types. That was a big
wart for me due to needing exclusion constraints.

~~~
15155
I always just cast to text in this scenario.

~~~
Tostino
The issue with that is performance and index size. At first that was my
solution, but as more data got into the tables that just wasn't working. For
UUIDs I had to move to casting to bytea using the uuid_send(uuid) function.

For enums, I had to create a function to cast an enum to a unique integer, and
used that in my exclusion constraints. It's hacky as hell and not something I
like, but it's worked for the past 3 years.

I'll be really really happy to get rid of those hacks come PG 10 though.
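For anyone facing the same thing, the PG 10 version without the hacks might look like this (the table is hypothetical; btree_gist supplies the = operator class for GiST):

```sql
CREATE EXTENSION btree_gist;

-- uuid now works directly in a GiST exclusion constraint:
CREATE TABLE bookings (
    room   uuid,
    during tstzrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```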

------
snuxoll
I know this sounds icky to some, but what I _really_ want from Postgres is a
proper equivalent to MSSQL's FILESTREAM.

I know, I know, "databases are bad for files" - but let's take something like
an ECM suite where images and documents are literally part of a transaction;
having to synchronize those between filesystem and database breaks the Atomic
constraint in so many ways. PostgreSQL has LOB support, but OIDs being
32 bits severely limits the usefulness of the feature without convoluted
workarounds (multiple databases).

~~~
anarazel
Postgres dev here.

> but what I really want from Postgres is a proper equivalent to MSSQL's
> FILESTREAM.

What sizes of files and such are you interested in? What kind of read/write
patterns?

I do think we need some improvements in the area. Not enough that I'll drop
the other stuff I'm working on, which I think is higher priority, but enough
to discuss approaches and review patches.

It'd be cool if you could comment on the pgsql-hackers list.

~~~
snuxoll
Small files, 25-50K on average (images of patient charts), and they're all
write-once read-many.

~~~
anarazel
Hm, in that case, what stops you from just using a bytea column?

~~~
snuxoll
My major issue with bytea is twofold: there's the same OID issue that you
have with LOBs (we have tens of billions of pages), so you're forced into
less-than-optimal solutions like table inheritance for large numbers of
records; plus it severely bloats the size of the heap file, making a VACUUM
FULL take a century if needed.

~~~
derefr
> so you're forced to less than optimal solutions like table inheritance for
> large numbers of records

Would the "native table partitioning" feature introduced in this release be a
solution (or, at least, more optimal) for this?

Alternately, if you want to go off-heap, what about using Foreign Data
Wrappers
([https://wiki.postgresql.org/wiki/Foreign_data_wrappers#File_...](https://wiki.postgresql.org/wiki/Foreign_data_wrappers#File_Wrappers))?
These two both sound like they might solve your problem:

•
[https://github.com/ZhengYang/dc_fdw/wiki](https://github.com/ZhengYang/dc_fdw/wiki)

• [http://multicorn.org/](http://multicorn.org/)

~~~
snuxoll
The native table partitioning removes one of the headaches of inheritance, but
it mostly only relieves some of the administrative headaches - the foreign key
issues remain.

fdw's are great, but they don't participate in replication - which is a pretty
big issue. Unfortunately, the "proper" way to do this would be with a custom
type, but I can't find any way you could write one with the extension API that
wouldn't be stored in the database heap (you could create some native
functions similar to the existing LOB functions, but then you run into
atomicity issues again).

Honestly, this would be easy to write if the extension API supported off-heap
storage of custom types - though I understand there's a ton of technical
complexity in implementing such a feature.

~~~
derefr
To be clear—you want Postgres to manage the data (so that it gets replicated
through Postgres replication), but you also want the data to exist in
somewhere other than the DB heap?

As far as I can tell, these two conditions together form a bit of a bind: for
Postgres to manage the data in a way that would enable replication under MVCC,
it has to have said data mmap(3)ed and indexed and have it participate in the
WAL log and so forth. Just the space overhead of this management metadata will
then be prohibitively costly in memory, if you have "tens of billions of
files" to keep under management.

Personally, I think the pragmatic solution would be to replicate the files
themselves outside of Postgres, and have Postgres just hold opaque references
to their paths, which you would join through an FDW to fill a column with
their contents.

As you say, the files are Write-Once-Read-Many—so you (presumably) aren't
worried about how Isolated or Consistent writes to the file data would be
without Postgres. Plain filesystems provide Atomicity and Durability
guarantees all by themselves. It's less _convenient_ to do things this way—you
have to manage e.g. "WAL-E + a separate rsync" rather than just WAL-E—but it's
not particularly _bad_ in terms of the engineering trade-offs.

~~~
anarazel
> To be clear—you want Postgres to manage the data (so that it gets replicated
> through Postgres replication), but you also want the data to exist in
> somewhere other than the DB heap?

> As far as I can tell, these two conditions together form a bit of a bind:
> for Postgres to manage the data in a way that would enable replication under
> MVCC, it has to have said data mmap(3)ed and indexed and have it participate
> in the WAL log and so forth. Just the space overhead of this management
> metadata will then be prohibitively costly in memory, if you have "tens of
> billions of files" to keep under management.

There's really not too much of a technical problem here. What the sub-OP is
complaining about is mainly just some work that needs to be put into parts of
postgres that were designed long ago. Not entirely trivial due to
compatibility concerns, but also not super hard. The first thing would be to
have a separate type of TOAST table with 64-bit IDs (we can't drop support
for 32-bit IDs without making in-place upgrades impossible), and to have
per-table TOAST IDs. Then use the same infrastructure for the LOB piece.

Btw, postgres doesn't mmap() data. On some systems it uses mmap(MAP_ANONYMOUS)
to allocate its buffer cache, but that's just a configurable size.

~~~
derefr
Right, I didn't mean to suggest that Postgres currently mmap(3)s all data;
rather, I meant to get across that—in the scenario where you really want to
keep all this data laying around _as files_ rather than as LOBs—Postgres would
need to have some way to guarantee that the data that's sitting around in
those files is always in the state Postgres _thinks_ it's in, if the
corresponding file-streams are going to join in the same MVCC transactions as
everything else. From what I can tell, that would require Postgres to do
whatever it does for each table backing-file it keeps, to _each_ of those
files: both on the OS level (at least fopen(3) + flock(3)), and in terms of
book-keeping in the system catalog.

Let me put it this way: how much overhead do you think there would be if you
split your Postgres data across "ten billion" _tablespaces_? Because that's
essentially what's being talked about here—little tablespaces containing one
{oid, LOB} table, with one row, with a "storage engine" that represents that
tablespace as a file.

...of course, if you _do_ just (fix and then) use LOBs rather than insist on
externally-visible files, none of that matters. :)

------
gigatexal
unrelated to this particular release, but since we're talking about
postgres... how do people find the ability to write stored procedures and
functions in languages other than just SQL? I'm coming from a MSSQL shop and
curious how writing a query in python for example has benefited anyone if at
all -- did you / can you use modules like numpy or pandas in a postgres python
procedure?

~~~
zie
Python under PG can do anything Python can do. But that said, just because
you _CAN_ import numpy/pandas and go to town doesn't mean you _SHOULD_. In
fact you probably really shouldn't :)

That said, we use Python inside of PG, and it's awesome. It's not really any
more powerful than PL/pgSQL (the built-in procedural language) in terms of
what you can do in the DB itself.

Our main app is also python, so we have a continuity of language from front-
end to back-end. We do have some DB functions that take advantage of python's
flexibilities, but like I said, we try to keep external modules and craziness
out of the PG backend if possible. Sometimes there is no help for it, and you
do what you must, and it's very handy that you can just go do it.
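For the original question, a minimal PL/Python sketch (assuming the plpython3u extension is available on the server):

```sql
CREATE EXTENSION plpython3u;

CREATE FUNCTION py_median(vals float8[]) RETURNS float8 AS $$
    import statistics   # stdlib works; numpy/pandas would import the same way
    return statistics.median(vals)
$$ LANGUAGE plpython3u;

SELECT py_median(ARRAY[1.0, 3.0, 2.0]);  -- 2
```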

~~~
gigatexal
Sometimes I'd rather do python magic than self join correlated sub-queries or
write window functions and have it all become a convoluted mess. But I see
your point.

edit: i'm still trying to train myself to think in a relational way.

~~~
zie
Relations are awesome, but it definitely takes training! I'd suggest avoiding
window functions for pagination; see
[http://use-the-index-luke.com/no-offset](http://use-the-index-luke.com/no-offset) as one method.
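The no-offset idea in a nutshell (the table and values are made up):

```sql
-- Keyset ("seek") pagination: remember the last row you showed,
-- then ask for rows strictly past it instead of using OFFSET.
SELECT id, created_at
  FROM posts
 WHERE (created_at, id) < ('2017-05-01 00:00+00', 12345)
 ORDER BY created_at DESC, id DESC
 LIMIT 20;
```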

------
combatentropy
So it looks like PostgreSQL will finally be web scale ;)

(ducks and runs)

~~~
deadliftpro
That, I suppose, was meant to be a joke. See
[https://www.youtube.com/watch?v=b2F-DItXtZs](https://www.youtube.com/watch?v=b2F-DItXtZs)
and just replace MySQL with PG.

~~~
combatentropy
Yes, it was a reference to that. My comment is currently sitting at -2. I'm
still getting the hang of Hacker News' sense of humor. On Slashdot, I would be
at +5. Ho hum.

Anyway, just to make my intentions clear, I love Postgres, have used it for
over a decade, and have yet to use MongoDB or any other NoSQL database.

~~~
grzm
In general, strive for substantive and constructive comments on HN. If you've
got that, a little humor added in can be appreciated. Comments that are
submitted only for humor value (which you knew yours was, given your
parenthetical addendum) are likely to be less appreciated on HN than on other
sites.

Here's a recent thread where this has been discussed:

[https://news.ycombinator.com/item?id=13760333](https://news.ycombinator.com/item?id=13760333)

There are likely others that describe it better, but I don't have them at
hand.

Edit to add: Of course there are. Here's some comments by 'dang on the topic:

[https://hn.algolia.com/?query=author:dang%20humor&sort=byPop...](https://hn.algolia.com/?query=author:dang%20humor&sort=byPopularity&prefix&page=0&dateRange=all&type=comment)

~~~
combatentropy
So what I want to know is how the members of Hacker News act as one.
Presumably these are the same people who appreciate humor on other sites,
upvote it, and participate in it. But on Hacker News they somehow all know to
downvote it. It's eerie.

~~~
grzm
I'm not sure I follow. People often behave differently depending on the
context (e.g., work, home, school, out with friends), so acting differently on
different sites doesn't seem very surprising. Also, different sites have
different, though perhaps overlapping, populations, so the "average" behavior
or culture is going to be different.

One example (mentioned in the HN guidelines) is Reddit. There are people who
frequent both HN and Reddit, yet it's clear that members on HN—even those who
use Reddit as well—don't want HN and Reddit to be the same. As such, they
behave differently on each site. That's not to say one is better than any
other: they're just different.

And HN members don't act as one—just as they don't act as one on any other
site. If they did, your comment would have been downvoted to the point that it
was flagged dead (given it's likely been seen by hundreds, if not thousands)
or not flagged at all. Given its current shade of gray, I suspect you've
received only a few downvotes. (Edit to add: I see you did say it is at -2.)

Does that make sense? Or am I misreading you?

Anyway, this is quite off-topic now. I posted these in the hopes of providing
a bit more insight into the HN community. I hope they've been more helpful
than frustrating.

~~~
combatentropy
Thank you. No, you're not misreading me. I guess like you say, it could be
worse than -2. (Although now it's at -3!)

I don't frequent Reddit. It sounds like it's so jocular that people come here
to get away from it all.

------
noobsquared
What is a good resource for learning databases in general (besides just
queries), and then focusing on PostgreSQL?

~~~
pgaddict
So you want something that covers all types of database systems? I don't think
any such book exists, but maybe look at the database-oriented courses at CMU,
Berkeley, Stanford, MIT ... Those are typically comprehensive and include
links to source books/papers.

In particular, look at Joseph M. Hellerstein at Berkeley: db.cs.berkeley.edu/jmh/
(doesn't load for me at the moment, not sure what's wrong).

Also, the Redbook (prepared by Hellerstein, Bailis and Stonebraker) is great:
[http://www.redbook.io/](http://www.redbook.io/) but it's more an update on
various topics than introduction.

------
alixaxel
Wasn't primary-primary replication from 2ndQuadrant's BDR tool supposed to go
GA with PostgreSQL 10?

~~~
petereisentraut
The new logical replication feature is based on the experiences from BDR and
pglogical, but it isn't multimaster yet.

------
milquetoastaf
Postgres is awesome, and thankfully it's now much easier to set up with high
availability and replication baked in (remember pgpool, anyone?). An AWS
CloudFormation Docker Swarm stack + a Postgres primary/replica compose file +
Datadog and you're good to go.

