
PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data - ahochhaus
http://www.postgresql.org/about/news/1636/
======
saosebastiao
Whenever there is a version update, I can't help but be grateful for the
documentation ethic of Postgres. For the vast majority of my projects I have
to wade through unaffiliated and incomplete blog tutorials that may or may not
be relevant to the version I'm trying to use. With anything related to
Postgres, I may read about something on a blog post, but I _always_ know that
I can count on the Postgres documentation if I need supplemental information,
or sometimes I'll skip the post and go straight to the official docs. The
PostgreSQL project, in my mind, sets the standard globally for software
documentation.

I should add that Postgres was the first database I ever used, and I
literally learned pretty much everything I know about Postgres, SQL, and
relational and set logic from the official docs. And that was with no
background in software development and an undergraduate business degree with
Excel being my most technologically advanced toolset. _That_ is a
documentation success story.

~~~
jrapdx3
Remarkably similar to my own history with PostgreSQL, which I started using
in ~1998, around the time of its early public releases. PostgreSQL's
documentation has indeed been the exemplar for all software, open source or
not. It's been the SQL textbook I've relied on.

With the steady addition of features, it's gotten much more complex, and there
will come a time when using just the documentation won't be enough to learn
how to use Postgresql to full advantage. With release of 9.5 we might be there
now.

Perhaps the logical extension of the documentation is some form of coursework
to enable users to learn the DB systematically. I haven't looked into it, this
might already be offered.

~~~
AlisdairO
(self plug) for coursework on the read-only SQL side, you might want to give
[http://pgexercises.com](http://pgexercises.com) a try. I have ambitions to
expand it to include arrays and json, but alas haven't found the time so
far...

~~~
Alex3917
This is awesome, thanks for making it!

~~~
AlisdairO
Cheers, hope it's useful :-)

------
lugus35
With a REST layer like
[https://github.com/begriffs/postgrest](https://github.com/begriffs/postgrest)
you don't need any extra application server layer to serve your data, securely
(RLS). Bye bye Java EE ?

~~~
reitanqild
You just got an accidental downvote as I came back after bookmarking your
link. Sorry.

Dang/mods: Can we please please get something to avoid accidentally downvoting
good stuff on mobile phones?

~~~
danieltillett
Happens all the time, and it has been the most requested feature of all time.
I like to take the charitable view and say the current design is there to let
your ego explain away downvotes: "I'm only being downvoted by accident", not
"my post is terrible".

------
tracker1
Really nice to see the direction things are moving in... I do feel that the
replication/failover story needs a lot of work, but there's been some progress
towards getting it in the box. Even digging into any kind of HA strategy is
cumbersome to say the least (short of a 5-6 figure support contract). It's one
of the things that generally stops me from considering PostgreSQL for a lot of
projects. As a side note, I really like RethinkDB's administrative interface
and its failover handling. It would be great to see something similar reach an
integration point for PostgreSQL.

I also think that PLv8 should probably make it in the box in the next release
or two. With the addition of JSON and Binary JSON data options, having a
procedural interface that leverages JS in the box would be a huge win IMHO.
Though I know some would be adamantly opposed to this idea.

~~~
roeme
_> I do feel that the replication/failover story needs a lot of work, but
there's been some progress towards getting it in the box. Even digging into
any kind of HA strategy is cumbersome to say the least (short of a 5-6 figure
support contract)_

Eeeh... a simple HA solution can be developed in about a week (I was able to
do so on 9.3, and so far it has held its ground). Also, now with 9.5's
pg_rewind
you can easily switch back and forth between nodes
([http://www.postgresql.org/docs/9.5/static/app-
pgrewind.html](http://www.postgresql.org/docs/9.5/static/app-pgrewind.html)),
simplifying things a great deal. Can't imagine that's 5-6 figures.

I agree that you don't get a Plug&Play-Solution out of the box, but from
anecdotal evidence they often don't quite work as advertised anyway (remember
1995? And I'm sure your friendly DBA has some stories to share as well).

------
jeltz
There are plenty of nice minor improvements in the release notes. One of my
favorites is "Allow array_agg() and ARRAY() to take arrays as inputs (Ali
Akbar, Tom Lane)", it will come in handy when writing ad hoc queries to
understand stored data. Right now I have to build a string which I use as
input to array_agg().
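
A minimal sketch of the new behavior (the table and values are illustrative):
in 9.5, array_agg() accepts array inputs directly and produces an array of one
higher dimension, where before you had to build strings or use workarounds.

```sql
-- Hypothetical table holding a text[] per row.
CREATE TABLE docs (id int, tags text[]);
INSERT INTO docs VALUES (1, ARRAY['a','b']), (2, ARRAY['c','d']);

-- 9.5: aggregate the arrays directly into a 2-D array.
SELECT array_agg(tags) FROM docs;  -- {{a,b},{c,d}}
```

One caveat: the aggregated arrays must all have the same dimensions, or the
aggregate raises an error.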

~~~
Twisell
I'm so pleased to see that I was not the only freak out here doing that!

------
spamizbad
I'm very excited about CUBE and ROLLUP -- I am just about to start a project
that barely requires an OLAP shim. Now it looks like I can do it all with just
database features. Yay for fewer dependencies!

~~~
paulsmith
I'm not familiar with those statements, can you provide an example?

~~~
jsmeaton
They're useful for providing summaries like row totals and column totals.
Rather than just get an aggregated count for the total GROUP, you can also get
aggregated counts for each unique combination of columns within the GROUP.

    
    
      col1 col2 count  
      ---- ---- -----  
      a    b    10  
      a    null 5  
      null b    5  
    

There's more to it than that obviously, but you can read about them here:
[http://www.postgresql.org/docs/devel/static/queries-table-
ex...](http://www.postgresql.org/docs/devel/static/queries-table-
expressions.html) (7.2.4. GROUPING SETS, CUBE, and ROLLUP)
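
A hedged sketch of the 9.5 syntax (table and column names are placeholders):
ROLLUP adds subtotal rows for each prefix of the column list, while CUBE does
so for every combination, with NULL marking the "all values" level.

```sql
-- Subtotals per col1, plus a grand total:
SELECT col1, col2, count(*)
FROM t
GROUP BY ROLLUP (col1, col2);

-- Subtotals for every combination of col1 and col2:
SELECT col1, col2, count(*)
FROM t
GROUP BY CUBE (col1, col2);
```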

------
tommoor
I might get some hate, but I also think upsert was one of the best features
that MongoDB offered that PG didn't, so this is a big win from that
perspective too.

~~~
bpicolo
Which makes sense, because the atomicity of the upsert is really the tricky
part.

------
ropiku
Great that Heroku sponsored upsert and has support for 9.5 right now (in
beta): [https://blog.heroku.com/archives/2016/1/7/postgres-95-now-
av...](https://blog.heroku.com/archives/2016/1/7/postgres-95-now-available-on-
heroku)

------
elchief
Regarding Row Security, yes you can use it with a web application and still
use connection pooling.

From web server, connect to db as one user then SET ROLE to the database user.
This gives you Column Security and easier auditing as well. See
[http://stackoverflow.com/questions/2998597/switch-role-
after...](http://stackoverflow.com/questions/2998597/switch-role-after-
connecting-to-database)
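
A rough sketch of the pattern being described (role names are illustrative;
the pooled login role needs a prior `GRANT alice TO app_pool` to be allowed to
switch):

```sql
-- The pool connects once as a generic login role, then per request:
SET ROLE alice;   -- alice's privileges and RLS policies now apply
-- ... run the request's queries as alice ...
RESET ROLE;       -- restore the pool role before releasing the connection
```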

~~~
jimktrains2
The thing is that each application user now needs a corresponding DB user to
use RLS. While this isn't a huge problem, it's different than how most (if not
99%?) of applications work.

~~~
anarazel
No, RLS does _not_ necessarily require separate database users. Using database
users is one relatively obvious way to use the feature, but you can very well
do something like 'SELECT myapp_set_current_user(...)' or something, and use a
variable securely set therein for the row restrictions.
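
One possible shape of that approach, assuming a hypothetical `accounts` table
and a custom `myapp.user_id` setting (all names are illustrative, not an
official API):

```sql
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Restrict rows based on a session variable rather than the DB user:
CREATE POLICY per_user ON accounts
  USING (owner_id = current_setting('myapp.user_id')::int);

-- The application sets the variable once per request/session:
SELECT set_config('myapp.user_id', '42', false);
```

Note that anything able to run arbitrary SQL can also call set_config, so this
only restricts users whose SQL is mediated by the application.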

~~~
jimktrains2
Interesting. I didn't think about doing that. 9.2 makes that much easier to do
[http://dba.stackexchange.com/questions/97095/set-session-
cus...](http://dba.stackexchange.com/questions/97095/set-session-custom-
variable-to-store-user-id)

------
overcast
Interesting, I wasn't aware until now that PostgreSQL didn't have UPSERT.
MySQL has similar INSERT ... ON DUPLICATE KEY UPDATE functionality.
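
Roughly equivalent statements, side by side (table and columns are
illustrative):

```sql
-- MySQL: any unique key on the table can trigger the update path.
INSERT INTO counters (id, n) VALUES (1, 1)
ON DUPLICATE KEY UPDATE n = n + 1;

-- PostgreSQL 9.5: the conflict target is named explicitly.
INSERT INTO counters (id, n) VALUES (1, 1)
ON CONFLICT (id) DO UPDATE SET n = counters.n + 1;
```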

~~~
avidal
Yep. It's been a long-requested feature, and one of the reasons the post
states:
"This feature also removes the last significant barrier to migrating legacy
MySQL applications to PostgreSQL."

~~~
avar
What do they mean by "legacy" in this context? INSERT ... ODKU is not a legacy
feature of MySQL, it's a currently supported first-class feature of the
database, nor is MySQL itself a "legacy" database.

~~~
brlewis
I think they're referring to mysql. The sentence would have worked just as
well without the word "legacy". I say this as someone who prefers PostgreSQL.

~~~
X-Istence
No, they are referring to an application that is being moved from MySQL to
PostgreSQL. The application is "legacy" in that it is an older version and the
new version is "current".

The English is ambiguous, however, so there can be some debate as to what they
meant. In this case "legacy" most likely describes the "MySQL application",
not MySQL itself.

~~~
brlewis
You're wrong about which meaning is more likely.

A meaning that adds something to a sentence is a more likely meaning than one
that adds nothing to a sentence. If you take "legacy" to mean "being migrated
from" then the sentence becomes

This feature also removes the last significant barrier to migrating _being-
migrated-from_ MySQL applications to PostgreSQL.

It's more likely that if "being migrated from" was the intended meaning, they
would have simply left the word out.

~~~
elbear
Your comment assumes the author of the release notes has perfect command of
the English language and they thought through in detail what the word "legacy"
would mean in this context.

~~~
brlewis
Not at all.

First, my comment says "more likely" so it isn't assuming anything.

Second, if we change "more likely" to "definitely", the assumption is merely
that the sentence in question is written with the same command of the English
language as the rest of the announcement, i.e. no egregiously redundant words.

------
jayess
Can anyone suggest a good "getting started" tutorial for
PostgreSQL/debian/php? I've been using Mysql for years and would like to give
Postgres a try.

~~~
olefoo
Install it, and then build something.

A few things you will want to look at that are different:

1. Data types are much richer and more useful than in MySQL.

2. Transactional DDL means migrations are atomic.

3. Schemas are what MySQL refers to as databases. Remember to set
`search_path`.

4. Roles and grants are somewhat more expressive and work differently than in
MySQL, but not that differently for the simpler use cases.

5. Database functions (aka stored procedures) are awesome, as are extension
languages.
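
Points 2 and 3 in action, as a sketch (object names are illustrative):

```sql
-- 2. DDL participates in transactions; a failed migration rolls back whole.
BEGIN;
ALTER TABLE users ADD COLUMN email text;
CREATE INDEX users_email_idx ON users (email);
ROLLBACK;  -- both changes are undone; MySQL auto-commits DDL instead

-- 3. Schemas are namespaces inside one database; set the lookup path.
SET search_path TO myapp, public;
```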

~~~
tracker1
On point 5... love PLv8, which imho makes working with the newer JSON data
types really nice.

------
btilly
Yay! I like the changes.

But they are doing absolutely nothing about my biggest beef with PostgreSQL.
Which is that there is absolutely no way to lock in good query plans. It
always reserves the right to switch plans on you, and sometimes gives much,
much, much worse ones. No other database does this to me. Even MySQL's stupid
optimizer can be reliably channeled into specific query plans with the right
use of temporary tables and indexes.

This is a problem because improvements don't matter if the query plan is "good
enough". But they will care if you screw up. PostgreSQL usually does well, but
sometimes screws up spectacularly.

The example that I have been struggling the most often with in the last few
months is a logging table that I create summaries from. Normally I only query
minutes to hours, but I set it up as a series of SQL statements so I first put
the range in a table, and then have _happened BETWEEN range_start AND
range_end_. PostgreSQL really, Really, REALLY wants to decide that the index
on the timestamp is a bad idea, and wants to instead do a full table scan.
Every time it does, summarization goes from under a second to taking hours.
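
An illustrative version of the pattern described (all names are made up). With
the bounds hidden in another table, the planner only has a generic selectivity
estimate for the BETWEEN, so it can wrongly favor a sequential scan; with
literal bounds, the estimate is accurate:

```sql
-- Range supplied via a helper table: estimate is generic, seq scan likely.
SELECT count(*)
FROM log l
JOIN query_range r
  ON l.happened BETWEEN r.range_start AND r.range_end;

-- Hard-coded range: the planner can see it is narrow and use the index.
SELECT count(*)
FROM log l
WHERE l.happened BETWEEN '2016-01-07 10:00' AND '2016-01-07 11:00';
```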

Hopefully the new BRIN indexes will be understood by the optimizer in a way
that makes it happier to use the index. But I'm not optimistic. And if I lean
on it harder, I'm sure from past experience that I'll find something else that
breaks.

~~~
ProblemFactory
There is some discussion on why the Postgres team dislikes query hints here:
[https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion](https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion)

But perhaps I also have some practical advice to try.

I had a similar issue: I have a few tables with sensor data, 300-500 million
rows, indexed among other things by event type. Some counting queries kept
defaulting to full table scans. It turned out that this was because of limited
statistics on the distribution of counts by event type.

The default_statistics_target config parameter sets how many entries Postgres
keeps in the histogram of possible values per column, the default is 100 I
think. Because my event types were not evenly distributed, the less frequent
ones were missing from the statistics histogram altogether, and somehow this
resulted in bad query plans.

As a fix, I upped default_statistics_target to 1000, and set it to 5000 for
the biggest tables. Then, after a VACUUM ANALYZE, the query planner started
making sensible choices.

Another thing to try is reducing the random_page_cost config parameter from
its default of 4.0. On SSDs, the cost of a random page read relative to a
long sequential read is much closer to 1 than to 4.
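
The knobs mentioned above, sketched (table and column names are placeholders):

```sql
-- Raise the cluster-wide histogram/most-common-values target,
-- then reload with: SELECT pg_reload_conf();
ALTER SYSTEM SET default_statistics_target = 1000;

-- Per-column override for the biggest tables, then re-gather stats:
ALTER TABLE sensor_data ALTER COLUMN event_type SET STATISTICS 5000;
ANALYZE sensor_data;

-- On SSDs, tell the planner random reads are nearly as cheap as sequential:
SET random_page_cost = 1.1;
```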

~~~
btilly
This is all good optimization advice, but I don't think it is applicable to my
specific case.

My problem is not that PostgreSQL does not understand the distribution of my
data. It does. The problem is that it comes up with a query plan without
realizing that I'm only querying for a very small range of timestamps.

If this happens again, I'll have to try rewriting code to send it queries with
hard-coded timestamps, cross fingers and pray. I find prayer quite essential
with PostgreSQL sometimes because as ineffective as it is, at times I've got
nothing else.

~~~
anarazel
> My problem is not that PostgreSQL does not understand the distribution of my
> data. It does. The problem is that it comes up with a query plan without
> realizing that I'm only querying for a very small range of timestamps.

Which version did you reproduce that on? While the problem has not been
generally addressed, the specifically bad case of looking up values at the
"growing" end of a monotonically increasing data range has been improved a bit
over the years (9.0 and then some incremental improvement in 9.0.4).

~~~
btilly
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2
20140120 (Red Hat 4.8.2-16), 64-bit

If it matters, it is an Amazon RDS instance.

------
reactor
Congrats to everyone involved, indeed an amazing opensource project that
displays true integrity and discipline.

~~~
petergeoghegan
Thanks

------
wingsonfire
I am looking forward to Row Level Security, though cell-level security would
be even more awesome to have. It's difficult to achieve in SQL; only Apache
Accumulo in the NoSQL space has it. But once we have it, we can make sure no
one has access to, say, an SSN column, and we'll be protected to one degree
against data breaches.

~~~
uberneo
Any practical use cases of Row level security ?

~~~
Sanddancer
Imagine you're working at a place with a rather large sales/marketing
department. Sales is a pretty cutthroat job, and some people will try any
cheat possible to get ahead of their co-workers. In a typical sales database,
someone who manages to borrow a bit of SQL from a techie friend could
potentially go in and grab sales leads from co-workers, poaching their deals.
With row level security, the sleazier co-worker isn't allowed to look at those
rows in the first place.
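
A minimal sketch of such a policy (the schema is hypothetical): each lead row
records its owning rep, and the policy hides everyone else's rows even from
ad-hoc SQL.

```sql
ALTER TABLE leads ENABLE ROW LEVEL SECURITY;

-- Each rep (connected as their own DB role) sees only their own leads.
CREATE POLICY own_leads ON leads
  FOR ALL
  USING (owner = current_user);
```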

~~~
brunoqc
Do you mean as another security layer? I'd guess the sales people don't
normally have direct access to the database, and the software already
restricts which data they can see.

Maybe you mean just in case they manage to bypass the software restrictions.

~~~
tracker1
In some cases, people will defer to database-level security restrictions. It
really depends on how much application logic is in the database. Some
applications are designed with as much logic as possible in the DB, including
each user being a DB user with credentialed access. Others treat the DB as
dumb storage with all access through a programmatic API. With thin API shims
over the DBMS, the database's own security is paramount.

------
ankimal
[http://www.postgresql.org/docs/9.5/static/brin-
intro.html](http://www.postgresql.org/docs/9.5/static/brin-intro.html) If like
me you were looking for what BRIN index is all about.

~~~
avita1
Out of curiosity, has anyone managed to find something more detailed about the
guts?

It sounds like it's basically a BTree that stops branching at a certain
threshold, but I'm almost certainly wrong.

~~~
anarazel
No, that's not really it, although you could see it as a very degenerate form
of a btree. Basically it's using clustering inherent to the data - say a
mostly increasing timestamp, autoincrement id, model number ... - to build a
coarse map of the contents. E.g. saying "pages from 0 to 16 have the date
range 2011-11 to 2011-12" and "pages from 16 to 48 have the date range
2012-01-01 to 2012-01-13". With such range maps (where obviously several
overlapping ranges can exist) you can build a small index over large amounts
of data.

Obviously single row accesses in a fully cached workload are going to be
faster if done via a btree rather than such range maps, even if there's
perfect clustering. But the price for having such an index is much lower,
allowing you to have many more indexes. Additionally it can even be more
efficient to access via BRIN if you access more than one row, due to fewer
pages needing to be touched.
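
Creating one is a one-liner; the sketch below assumes a hypothetical logs
table whose happened_at column correlates with physical row order:

```sql
-- Tiny index: one min/max summary per range of table pages.
CREATE INDEX logs_ts_brin ON logs USING brin (happened_at);

-- Smaller page ranges mean a bigger index but more precise scans
-- (the default pages_per_range is 128):
CREATE INDEX logs_ts_brin_fine ON logs
  USING brin (happened_at) WITH (pages_per_range = 32);
```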

------
sandGorgon
Anybody know how quickly RDS upgrades to newer versions of Postgres?

I'm really, really keen to use 9.5 jsonb with its insert/update changes.

~~~
andor436
While you wait, check out
[http://stackoverflow.com/a/23500670/229006](http://stackoverflow.com/a/23500670/229006)

My plan is to rely on these functions for now, and switch to the native
implementations once 9.5 is production ready on RDS.
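
For comparison, the native 9.5 versions look roughly like this (table name is
illustrative):

```sql
-- Update one nested key in place:
UPDATE docs
SET body = jsonb_set(body, '{address,city}', '"Berlin"'::jsonb)
WHERE id = 1;

-- The new || operator overwrites/merges top-level keys:
SELECT '{"a":1}'::jsonb || '{"b":2}'::jsonb;  -- {"a": 1, "b": 2}
```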

~~~
sandGorgon
this is so cool... thanks!!

------
dandigangi
I swear... I will redesign and develop Postgres's site for free.

~~~
spacemanmatt
I'm curious, what do you think needs to be redesigned?

~~~
dandigangi
I wish I could say it's circa Web 2.0 but it's still stuck even farther back
than that. I mean it works which is great but I loathe spending time on it
because it's such a poor experience.

------
systems
How does PostgreSQL's upsert compare to MS SQL's MERGE statement?

I want to look deeper into this but didn't have the time; from the little I
read, it seems MS SQL's MERGE is more powerful.

~~~
jeltz
The new PostgreSQL syntax is more convenient to use in the UPSERT use case
while the MERGE syntax is more convenient to use when doing complicated
operations on many rows of data (for example when merging one table into
another, with non-trivial merge logic).

The reason PostgreSQL went with this syntax is that the goal was to create a
good UPSERT, and getting the concurrency considerations right with MERGE is
hard (I am not sure of the current status, but when MERGE was new in MS SQL it
was unusable for UPSERT). Even once you have done that, MERGE would still be
cumbersome to use for UPSERT.

EDIT: The huge difference is that PostgreSQL's UPSERT always requires a unique
constraint (or PK) to work, while MERGE does not. PostgreSQL relies on the
unique constraint to implement the UPSERT logic.
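
Concretely, the 9.5 syntax looks like this (illustrative table): the conflict
target must correspond to a unique index or constraint, and EXCLUDED refers to
the row that failed to insert.

```sql
INSERT INTO counters (id, hits) VALUES (1, 1)
ON CONFLICT (id) DO UPDATE
  SET hits = counters.hits + EXCLUDED.hits;

-- Or silently skip rows that would conflict:
INSERT INTO counters (id, hits) VALUES (1, 1)
ON CONFLICT DO NOTHING;
```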

~~~
dsp1234
_I am not sure of the current status, but when MERGE was new in MS SQL it was
unusable for UPSERT_

I've used MERGE as an UPSERT using MATCHED/NOT MATCHED and
SERIALIZABLE/HOLDLOCK since it was introduced in mssql 2008. It was one of the
first features I upgraded my code to use, and it worked out of the box with no
issues.

~~~
jeltz
See this blog post for what I am talking about:
[https://www.mssqltips.com/sqlservertip/3074/use-caution-
with...](https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-
servers-merge-statement/)

If PostgreSQL had gone the same route as MS SQL, I would have expected a
similar set of bugs. I suspect all of this has been fixed by now, but I do not
follow MS SQL.

------
cjauvin
I wonder how long it will take to appear in the Ubuntu apt repo? Should it
already be there (I don't see it yet)?

Edit: I meant apt.postgresql.org of course, not the official Ubuntu repo..

~~~
jeltz
No idea, but the PostgreSQL community distributes official Debian and Ubuntu
packages at apt.postgresql.org. They should already have 9.5, or if not have
it very soon.

------
chbrown
Every time there's a minor version update I have to remind myself the sequence
of upgrade incantations. It's pretty simple, but here's a gist that might help
anyone upgrading from 9.4 to 9.5 with Homebrew:

[https://gist.github.com/chbrown/647a54dc3e1c2e8c7395](https://gist.github.com/chbrown/647a54dc3e1c2e8c7395)

------
jhealy
pglogical ([http://2ndquadrant.com/en-
us/resources/pglogical/](http://2ndquadrant.com/en-us/resources/pglogical/))
claims to allow cross-version upgrades from 9.4 to 9.5 with minimal downtime,
but the documentation seems fairly light on details.

Has anyone come across a guide to using it for upgrades?

------
desmondrd
Upsert is something I've come to expect from modern databases nowadays. Happy
to see it arrive in Postgres.

------
gcb0
Search in 5% of the time it would take to search a btree? Has anyone seen that
with actual data?

~~~
gcb0
[http://pythonsweetness.tumblr.com/post/119568339102/block-
ra...](http://pythonsweetness.tumblr.com/post/119568339102/block-range-brin-
indexes-in-postgresql-95)

The very first example shows BRIN indexes resulting in a smaller index than
btree but with a much longer search time... so I guess the 5% time figure was
very use-case specific?

------
tmaly
This is excellent news. I really enjoy using PostgreSQL in one of my current
projects. I look forward to using upsert and the new indexes.

------
omarforgotpwd
What an absolutely fantastic project. The recent releases have all been very
exciting.

------
gionn
Bye bye mongodb.

------
mmaunder
Unless I'm mistaken MySQL has had this for almost a decade with "ON DUPLICATE
KEY UPDATE". I'm seeing a lot more about PSQL here and in the news. I've
always found it to be unfriendly and slow. Why the new attention? Is there
really something about PSQL that makes it better than MySQL these days? It
used to be transactions, but InnoDB made that moot years ago.

We do over 20,000 queries per second on one of our production MySQL DBs and
I'm not sure I'd trust anything else with that:
[http://i.imgur.com/sLZzXhS.png](http://i.imgur.com/sLZzXhS.png)

Just curious if I'm missing out on some new awesomeness that PostgreSQL has or
if it's just marketing.

~~~
sigil
> Is there really something about PSQL that makes it better than MySQL these
> days?

In a word: correctness.

Yes, MySQL has an UPSERT implementation. Like so many things MySQL rushed out
the door, it's also buggy and unpredictable. Did you know UPSERTing into a
MySQL table with multiple unique indexes can result in duplicate records? Did
you know MySQL's INSERT IGNORE will insert records that violate NOT NULL
constraints? [1]

I've used both MySQL and PostgreSQL for over a decade, and working around the
many MANY misbehaviors and surprises in MySQL requires continuous dev effort.
PostgreSQL on the other hand is correct, unsurprising, and just as performant
these days.

MySQL is what happens when you build a database out of pure WAT [2].

[1]
[https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_......](https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_..._ON_DUPLICATE_KEY_UPDATE)

[2]
[https://www.destroyallsoftware.com/talks/wat](https://www.destroyallsoftware.com/talks/wat)

~~~
mmaunder
Nah. MySQL rocks. I've been using it since 1998 at Credit Suisse, 2000 at
eToys.com where we used it to run the entire company from warehouse to web. I
used it at the BBC in 2003 for a high traffic Radio 1 application and I've
used it since then on my own companies with serious volume including a job
search engine featured in NYTimes and Time Mag in 2005 and feedjit.com doing
real-time traffic on over 700,000 sites. We use it for Wordfence now which is
where the image link comes from I posted earlier with 20K TPS. All very high
traffic with consequences if it screws up. I've never run into any of the
issues you mention.

You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB?
I find MySQL to be both reliable and incredibly durable, i.e. it handles
yanking the power cord quite well. Performance also scales up linearly with
InnoDB, even for very high-traffic, high-concurrency applications.

We use redis, memcached and other storage engines - by no means are we tied to
mysql. But for what it does, it does it incredibly well.

I'm also completely open to using PostgreSQL and I was hoping someone could
give me a compelling reason to switch to it or to use it.

~~~
brobinson
I've worked at a company that was doing > 20,000 TPS (transactions/sec) on a
single PG instance with no problem. That is baby-tier usage for a DB.

As far as why you should give up MySQL like I did four years ago:
[http://grimoire.ca/mysql/choose-something-
else](http://grimoire.ca/mysql/choose-something-else)

