
PostgreSQL 10 Released - Tostino
https://www.postgresql.org/about/news/1786/
======
thom
It's 2017, and despite all this parallel stuff, query planners are still dumb
as bricks. Every day I get annoyed at various obvious-to-the-eye WHERE clause
pushdown opportunities, or situations where I can manually copy a view to a
table, add indexes, and build a query on top of that quicker than letting the
database fill up the entire disk with temporary files trying to do it as a
subquery. It's utterly maddening if you spend your life on ad-hoc
analytics/reporting workloads.

I strongly believe there are huge opportunities for AI to come along and,
given a logical description of the data and query, do a better job of
physically organising and serving the data. I realise some database management
systems do this to an extent, but it's pretty weak sauce in my experience.

~~~
lobster_johnson
A lower-hanging fruit may be a dynamic profiler that continuously analyzes
query patterns, and automatically reorganizes indexes, table clustering,
partitioning, and query planning based on the actual database activity.

This could even be something that operates as a layer on top of Postgres,
although it would probably need more precise data than what is currently
available through stats tables, and since Postgres doesn't have query hints
there's no way to modify query plans. It would also need to look at logs to
see what the most frequent queries are, and possibly require a dedicated slave
to "test run" queries on to gauge whether it has found an optimal solution (or
maybe do it on the actual database during quiet hours).
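Some of the raw material for this already exists: the pg_stat_statements extension tracks per-statement execution counts and timings, which covers the "most frequent queries" part without log scraping. A rough sketch (column names as of PostgreSQL 10):

```sql
-- needs shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION pg_stat_statements;

-- the most frequently executed statements, a starting point for an
-- automatic tuner deciding which queries are worth optimizing for
SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```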

AI would definitely be interesting, though.

~~~
derefr
I've always wondered why there doesn't exist something like this, but
architected like Apache's mod_security: something you run for a while in
"training" mode to let it learn your usage patterns, which then spits out a
hints file (in mod_security's case, a behaviour whitelist; in this case, a
migration to add a set of indexes + views).

As a bonus, it could (like mod_security) also have a "production mode" where
the DB consumes that hints file and applies the instructions in it as an
_overlay_ , rather than as a permanent patch to your schema; that way, you'd
be able to re-train after changes and then swap out the hints in production by
just swapping out the file and HUPping the DB server (which would hopefully do
the least-time migration to turn the old indexes into the new indexes), rather
than needing to scrub out all the old indexes before building new ones.

~~~
tatersolid
This is also exactly what Microsoft SQL Server has done _since 2005_ with
dynamic management views. They show, on a running system, what indexes would
be optimal for the actual workload.
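For reference, a rough sketch of the missing-index DMVs being described (the ranking heuristic here is illustrative, not an official formula):

```sql
-- indexes the optimizer wished it had, based on the actual workload
SELECT TOP 10
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```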

------
simonw
I feel like they’re burying the lede a bit in this one:

“PostgreSQL 10 provides better support for parallelized queries by allowing
more parts of the query execution process to be parallelized. Improvements
include additional types of data scans that are parallelized as well as
optimizations when the data is recombined, such as pre-sorting. These
enhancements allow results to be returned more quickly.”

This sounds HUGE. I want to see detailed benchmarks and examples of the kind
of workloads that are faster now.

~~~
riffraff
maybe the benchmarks are not so impressive and the workloads not super common,
so they didn't want to mislead you? :)

Anyway, some more info: [http://rhaas.blogspot.hu/2017/03/parallel-query-v2.html](http://rhaas.blogspot.hu/2017/03/parallel-query-v2.html)

~~~
simonw
Thanks, that's a really detailed article.

------
brianskarda
The wiki entry linked at the bottom of the page has a ton of good information
[https://wiki.postgresql.org/wiki/New_in_postgres_10](https://wiki.postgresql.org/wiki/New_in_postgres_10)

------
disconnected
I've used both mysql and postgresql and they worked just fine for my needs.

But I have always been curious: how does postgresql (or even mysql) stack up
vs proprietary databases like Oracle and Microsoft sql server?

~~~
mohaine
The warts from a dev perspective:

1) Oracle is really lacking in modern features/usability. Features were
frozen in roughly 1999 and are still pretty much the same (mostly). (You are
STILL limited to 30 chars for a table name FFS.) They do add new stuff from
time to time but anything existing isn't modified. Works, but not fun to work
with.

2) MSSQL needs NOLOCK everywhere (I've seen codebases with this on EVERY
query). The default locking really sucks. I'm sure a DBA can make the locking
sane system-wide, but I've never seen that on any of the DBs I've worked with.
Also, SQL Server Management Studio is a PITA IMHO. Toad it is not. Almost all
DB interaction going through a 1 GB Windows-only install is a "bad idea".

3) MySQL is nice but will just silently eat your data from time to time.
Auto-truncation is evil, as are missing enum values. Both have hit me multiple
times in production. Note: not sure if this is still the case, since I avoid
it now for this reason.

4) Postgres. Lots of nice features and easy to work with, but the optimizer
will sometimes do something silly. Sometimes you have to cast your bound
variable just to use an index (id=? => id=?::numeric, just because you do a
setObject in JDBC).
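A sketch of that last point (the table and column are hypothetical; the fix is the explicit cast on the bind parameter):

```sql
-- id is NUMERIC; if the JDBC driver binds the parameter with a
-- non-matching type, the planner may fall back to a sequential scan:
SELECT * FROM orders WHERE id = ?;

-- casting the parameter in the SQL itself lets the index be used:
SELECT * FROM orders WHERE id = ?::numeric;
```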

~~~
beefield
> MySQL is nice but will just silently eat your data from time to time.
> Auto-truncation is evil, as are missing enum values. Both have hit me
> multiple times in production. Note: not sure if this is still the case,
> since I avoid it now for this reason.

Looks like I am forced to use MySQL (or one of its variants) in the near
future. This thing about MySQL eating data is something I have read about
occasionally. Is there any way to identify and be wary of use cases where this
could happen? Is there any more thorough documentation of this issue anywhere?

~~~
mohaine
Older versions would silently allow you to insert 75 chars into a 50-char
column. The extra was just gone. Since there's no error, nobody notices until
somebody notices the missing data. That's usually an end user in production,
and by then the data is just gone.
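A minimal sketch of that failure mode, assuming a hypothetical table t with a VARCHAR(50) column. Modern MySQL with strict mode enabled turns the silent truncation into an error:

```sql
-- hypothetical table: CREATE TABLE t (name VARCHAR(50));

-- with strict mode off (the old default), this silently keeps only 50 chars:
INSERT INTO t (name) VALUES (REPEAT('x', 75));

-- with strict mode on, the same statement fails instead of losing data:
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO t (name) VALUES (REPEAT('x', 75));
-- ERROR 1406 (22001): Data too long for column 'name'
```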

Also, watch out for (or avoid) enums.

Example:

CREATE TABLE shirts ( ... size ENUM('x-small', 'small', 'medium', 'large',
'x-large'));

You have to specify all the values in the ALTER, so to add xx-small it is

('xx-small','x-small', 'small', 'medium', 'large', 'x-large')

and then later, if you add xx-large and forget about the xx-small addition:

('x-small', 'small', 'medium', 'large', 'x-large', 'xx-large')

You just silently lost all the xx-small values; they have been remapped to
different values that do exist. (Unless this has been fixed as well.)
Migration scripts are the real issue, as they don't know about any custom
values that may have been added out of band.
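In MySQL syntax, the hazard looks like this: every change to the enum restates the full value list, so a migration written against a stale list drops values without any error:

```sql
-- adding xx-small means restating the entire list:
ALTER TABLE shirts MODIFY COLUMN size
    ENUM('xx-small', 'x-small', 'small', 'medium', 'large', 'x-large');

-- a later migration, written before xx-small existed, omits it;
-- rows whose value is no longer in the list are silently lost:
ALTER TABLE shirts MODIFY COLUMN size
    ENUM('x-small', 'small', 'medium', 'large', 'x-large', 'xx-large');
```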

~~~
beefield
Thank you. Helpful.

------
jarym
Postgres continues to astound me and the dedication of the people working on
it is wonderful. Thank you to everyone who ever contributed a line of code or
to testing or to documentation. Everything is top-notch!

------
tabeth
As someone who's familiar with and uses Postgres, but not familiar with the
more detailed database/Postgres topics, where would be a good place to start?

I don't know what I don't know and am not familiar with situations in which
the new functionality introduced here should be used.

TLDR: am database/postgres noob. Help?

~~~
zitterbewegung
The manuals are really well written and easy to access. I would start at the
release notes from the manual and then look at the corresponding manual pages.

[https://www.postgresql.org/docs/10/static/release-10.html#id...](https://www.postgresql.org/docs/10/static/release-10.html#idm46046834158304)

[https://www.postgresql.org/docs/](https://www.postgresql.org/docs/)

~~~
tibbon
I find they are well written for some things, but not others. They are good as
reference documentation, but not as guides. I was recently trying to read
about certain lock events and understand their impacts, and aside from a
single reference in the manual there was absolutely nothing.

They tell you the what, but rarely the why.

------
jfbaro
Congratulations to the PG team! Another great release. Does anyone know when
PostgreSQL 10 will be available on a public cloud (AWS in particular) as a
managed service (RDS)?

~~~
oskari
PostgreSQL 10 will be available in Aiven (managed Postgres and other databases
in multiple clouds including AWS and GCP, see
[https://aiven.io](https://aiven.io)) within a couple of days after we've
completed our validation process with the final version.

~~~
esseti
well done people!

~~~
oskari
PG 10 is now available on AWS, Google Cloud, Azure, DigitalOcean and UpCloud
with Aiven.

See [http://blog.aiven.io/2017/10/aiven-is-first-to-offer-postgresql-10.html](http://blog.aiven.io/2017/10/aiven-is-first-to-offer-postgresql-10.html) for more information.

------
lwansbrough
Native partitioning, as well as the advancements on the replication and
scaling side of things look like good first steps for a distributed SQL
system.

Can anyone speak to how much closer this brings Postgres to being able to work
like Spanner/CockroachDB? Partitioning is great but having to define my own
partitions ahead of time isn’t a great solution for us folks who want to be
able to spin up new databases on the fly (without the overhead of assigning
new ranges for the tables on these new instances.)

Obviously CockroachDB has a long way to go before it works as well as a
single-instance Postgres DB. But how far does Postgres have to go before it
works as well as a multi-instance CockroachDB?

~~~
qaq
What is the largest production instance of CockroachDB ?

~~~
MycroftH
I can't talk about the largest production database, but the largest internal
cluster we've tested on so far is 128 machines, hitting each one with our own
continuous load generators and a chaos monkey (which turns machines on and off
at random).

~~~
qaq
Cool but how much data was it storing and what was the performance?

------
coldcode
I haven't worked with databases in a while, at my employer we are moving to
MariaDB (from MySQL) - is there some reason why we wouldn't be considering
PostgreSQL? Is there some drawback to P?

~~~
YorickPeterse
Not so much a reason not to use it, but something to keep in mind: queries
such as `SELECT COUNT(*)` tend to be a bit more expensive in PostgreSQL
compared to MySQL/MariaDB. This doesn't necessarily mean they're always
slower, but it's something you should take into account.

Another thing to take into account is that upgrading between major versions
(what were "minor" versions before the 10.x numbering) is a bit tricky, since
IIRC the WAL format can change. This means that upgrading from e.g. 10.x to
11.0 requires you to either take your cluster offline or use something like
pglogical. This is really my only complaint, but again it's not really a
reason to _not_ use PostgreSQL.

~~~
ZitchDog
With logical replication in PG 10, the WAL issue shouldn't be a factor going
forward :)

~~~
snuxoll
Here's hoping someone writes an alternative to pg_upgrade to handle this
automatically. Hell, I'd be willing to throw some money in.

------
tomp
Hm... looks like it's mainly focused on distributed stuff. I would hope they
focus on the bare essentials as well.

I've recently started working with PostgreSQL and stumbled upon a problem with
their UPSERT implementation.

Basically, I want to store an id -> name mapping in a table, with the usual
characteristics.

    
    
        CREATE TABLE entities (id SERIAL PRIMARY KEY, name VARCHAR(10) UNIQUE NOT NULL);
    

Then, I want to issue a query (or queries) with some entity names that would
return their indices, inserting them if necessary.

It appears that's impossible to do with PostgreSQL. The closest I can get is
this:

    
    
        INSERT INTO entities (name) VALUES ('a'), ('b'), ('c')
        ON CONFLICT (name) DO NOTHING
        RETURNING id, name;
    

However, this (1) only returns the newly inserted rows, and (2) it increments
the `id` counter even in case of conflicts (i.e. if the name already exists).
While (2) is merely annoying, (1) means I have to issue at least one more
SELECT query.

We like to mock Java, C++ and Go for being stuck in the past, but that's
nothing compared to the state of SQL. Sure, there are custom extensions that
each database provider implements, but they're often lacking in obvious ways.
I really wish there was some significant progress, e.g. TypeSQL or CoffeeSQL
or something.

~~~
fusiongyro
I'm having trouble imagining why you need this. Often when people come up with
"interesting" scenarios involving UPSERT, they are trying to use the database
as persistent memory for some algorithm, rather than as a relational database.
What are you actually trying to do? Why would it be insufficient to do
something like:

    
    
        BEGIN;
    
        INSERT INTO entities (name)
        (VALUES ('a'), ('b'), ('c') EXCEPT SELECT name FROM entities);
    
        SELECT id, name FROM entities WHERE name IN ('a', 'b', 'c');
    
        COMMIT;

~~~
kuschku
You can even go further.

Use WITH select id, name from entities where name in :data AS existing

Then return a UNION of existing and (INSERT into entities (name) :data except
existing returning id, name)
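Spelled out as a single data-modifying CTE, that idea looks roughly like this (note it is still racy under concurrent inserts, since there is no ON CONFLICT handling):

```sql
WITH data(name) AS (
    VALUES ('a'), ('b'), ('c')
), existing AS (
    SELECT id, name FROM entities
    WHERE name IN (SELECT name FROM data)
), inserted AS (
    INSERT INTO entities (name)
        SELECT name FROM data
        WHERE name NOT IN (SELECT name FROM existing)
    RETURNING id, name
)
SELECT id, name FROM existing
UNION ALL
SELECT id, name FROM inserted;
```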

~~~
jimktrains2
That still executes two queries under the hood. The basic issue is a
misunderstanding of what INSERT RETURNING does.

I've actually seen the specific misunderstanding many times on forums and on
IRC. I wonder if it'd be possible to change the syntax to something like
"RETURNING [ ALL | NEW ] * | output_expression [ [ AS ] output_name ]" where
when empty "NEW" would be used and be equivalent to the current semantics
where "ALL" would include rows that were not inserted on conflict. "ALL" would
have no different meaning on "ON CONFLICT DO UPDATE" or when no conflict
resolution was specified.

~~~
fusiongyro
I balk at the added complexity. Just use two queries. This is why God created
transactions.

~~~
tomp
I think your solution is good. I'll try it. Assuming entries are only ever
added (and never changed), you don't even need a transaction!

~~~
fusiongyro
You always need a transaction, even just for reads. Otherwise you are not
reading from a single consistent snapshot of the world.

~~~
tomp
So you're saying that a SELECT statement that happens strictly _after_ an
INSERT statement (on the same connection) can possibly _not see_ the data that
was written by the INSERT statement?

I guess that would be possible, but I would be very surprised if that were
so... I would also expect writes to rows to be atomic (i.e. you wouldn't see a
row with half-written data, or a row that has an `id` but not (yet) a `name`)
- again, that kind of behaviour would be possible, but surprising to the point
of the DB being unusable.

~~~
fusiongyro
This is why almost all ORMs create a transaction and have a "unit-of-work"
that encompasses a single transaction.

Race conditions are always surprising. Fortunately, we have a simple remedy
for that: transactions. :)

Suppose you have another connection and it runs DELETE FROM t WHERE id =
(SELECT MAX(id) FROM t). If that winds up in the middle, it could screw things
up for you. Is it likely to happen? No, but again... the remedy is simple.
Also, the scenario you describe seems simple enough, but what happens when you
bulk insert a few thousand rows?

By the way, with Postgres, there really is no way to not have a transaction.
If you don't issue BEGIN, it treats the statement as a small transaction. So
you're getting the same overhead on a single statement, if your plan was to
somehow avoid the overhead.

If this makes you question the database's usability, I have some bad news for
you: all ACID compliant databases work this way. The "atomic" unit is the
transaction, not the connection. The consistency guarantees pertain to the
transaction, not the connection. This is not a weird thing about Postgres,
this is how they all work (except, of course, that MySQL can't always roll
back aspects of a transaction, like DDL).

------
maxpert
Really love the builtin sharding and parallelism. Would wait a little before
using it in a project.

~~~
m12k
In chess, the pawns go first

~~~
irishsultan
Not really, knights can storm ahead.

------
Tostino
I'm super excited for the better parallelism, and all the little quality of
life improvements (like enums / uuids being able to participate in exclusion
constraints). An amazing amount of work has gone into this release.

Amazing job everyone who has worked on it.

------
intsunny
Congrats on releasing logical replication! No more WAL horror stories, or
whole "host to host" replication.

I suspect many larger shops will wait till PGSQL 10.1 before going whole-hog
on this feature, but exciting stuff nonetheless!

~~~
Tostino
Unlike previous releases, 10.1, 10.2, 10.3 and so on will be just bug-fix
releases. The next major release will be PostgreSQL 11.

~~~
ddorian43
He's saying new software always has bugs (rare in pg, but still), so you wait
a few minor versions before rolling it out.

------
netcraft
Congrats to the team! Can't wait to take the performance improvements and hash
indexes for a spin.

------
ciconia
A heartfelt thank you to all PostgreSQL contributors for such a wonderfully
solid, well-documented and versatile tool. One of my absolute favorite pieces
of software.

------
ropeladder
Just upgraded and it's been rocky so far. (I'll spare you the upgrade details,
just make sure you don't try to upgrade using apt if you initially installed
the enterprisedb.com bundle)

But now that I've got v10 running I'm unable to do a fairly simple CREATE
TABLE command that worked fine in 9.6: it maxes out my RAM and my swap until
the db crashes. (which never happened once in 9.6)

~~~
majewsky
Did you file a bug report?

(This is not directed at you personally, but a _lot_ of people just complain
about bugs on public forums like this one without filing a bug report, thus
not giving the developers a chance to address the problem. I'm guilty of this
myself way too often.)

~~~
ropeladder
Just started writing it up! Thanks for the encouragement.

------
buro9
For someone running a bunch of sites on a single Postgres 9.2... what's the
best guide for upgrading?

~~~
striking
Give [https://blog.2ndquadrant.com/untangling-the-postgresql-upgrade/](https://blog.2ndquadrant.com/untangling-the-postgresql-upgrade/) a read, perhaps.

~~~
buro9
Thanks, that's helpful.

For the database size it seems the recommended approach is the pg_dump >
pg_restore... with a maintenance window and a new VM holding Postgres 10.

This looks pretty safe and I'd be happy following that process, and it would
let me practise it a few times first which I like.

~~~
snuxoll
You can practice with pg_upgrade as well: back up your PG_DATADIR, restore it
on your new machine and run the upgrade. I typically don't bother doing it on
a separate VM myself; I just run the upgrade without the --link flag so it
copies all the data files and WAL segments, test, and then open it up.

~~~
buro9
Also helpful :) Thank you.

------
seaghost
Postgres is my go-to database for any project size.

------
LeicaLatte
I have moved to PostgreSQL in all my production apps and never been happier.
Looking forward to migrating to 10 by the year end.

------
swasheck
Excited for the release. Disappointed that EnterpriseDB is now making me sign
up for an account to download (for Windows).

~~~
latch
I just tried and wasn't forced to sign up.

I'd link straight to the exe (which doesn't require a token / session) as
proof, but I'd rather they have the analytics data.

~~~
swasheck
Ah. I generally get the zipped binaries and that's where I'm bumping into the
account creation requirement.

------
romanovcode
How to properly install it on Xenial? When I go to how-to link[0] it still
shows instructions for 9.6.

[0]
[https://www.postgresql.org/download/linux/ubuntu/](https://www.postgresql.org/download/linux/ubuntu/)

~~~
anarazel
You should be able to install it by simply specifying 10 instead of 9.6. I
just checked, and it's in the repository, i.e.

    
    
        apt-get install postgresql-10
    

All supported versions can be installed that way.

~~~
romanovcode
Oh, OK. Great! Thank you.

------
esseti
Does anyone have a tutorial on how to upgrade the version on Debian? If I
install it from apt, the service remains connected to 9.6 rather than 10.
Plus, I have to migrate all the configs; is there anything that helps with
this?

------
jcadam
Hmm... I have some tables partitioned using the popular pg_partman extension.
I wonder how that will interact with pg 10's new partitioning functionality.

I half expect everything to be borked if I try to upgrade to 10 :/

~~~
scott_karana
The new partitioning requires explicit declaration, so your existing triggers
and tables shouldn't be affected.

[https://www.keithf4.com/postgresql-10-built-in-partitioning/](https://www.keithf4.com/postgresql-10-built-in-partitioning/)
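For context, a minimal sketch of the new declarative syntax (table and ranges are hypothetical); the pre-10 approach used inheritance plus triggers, which is what pg_partman automates:

```sql
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');
```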

~~~
jcadam
Good, that makes it less 'urgent' at least. Though, I really hate the clunky
way partitioning works in Postgres<10... I might have to take the time to
rework my partitioning at some point.

------
arrty88
what is the best way to shard on postgres?

~~~
zie
There are tons of ways to do that; the lazy way would probably be to pay
someone like Citus
([https://www.citusdata.com/product](https://www.citusdata.com/product)).
Alternatively, use a 3rd-party tool, do manual sharding, or use the PG FDW, or
... the list is pretty endless. It depends a lot on your needs, requirements,
how much sharding you have to do, etc. It's not a simple topic.

First I'd make sure your DB sizes really NEED that before going down that
road; it's a lot of hassle, and I'd personally push for just sizing your DB
box up and up and up before going down that road if at all possible. Simple
solutions (read: not sharding) usually win, especially when production falls
over and you have to prop it back up.

------
thebiglebrewski
Thank you PostgreSQL team you are awesome!!

------
med_abidi
It's also about time that they upgrade their website. It's so 1999.

~~~
Tostino
It works fine for me. If you think it's holding things back, why not
collaborate with the community and work on something you feel is better?

------
softinio
Wish they had called it PostgreSQL X :)

~~~
jeltz
It is confusing enough with Postgres-XC and Postgres-XL without also having a
PostgreSQL X.

