
Postgres gets support for upsert - anarazel
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65
======
pbnjay
I think I squealed with delight when I read this title. This is really a great
addition, and will simplify so much of my code once I'm able to upgrade.

I try not to think too deeply about all the DB internals (some people are much
smarter than me about this stuff), but clearly there are a lot of things that
need maintained internally to make this correct, so kudos to all the people
who contributed to this.

~~~
craigkerstiens
A big +1 to this. Peter Geoghegan is the primary author and has been
championing it for over 2 Postgres releases now. I know he's especially
excited to see it land.

Also it couldn't have been possible without the support of Heikki and Andres
(who actually submitted this to HN as his first ever submission).

~~~
anarazel
Hah, where from do you know that I don't have my sock puppet accounts ;)

------
anilshanbhag
This is actually huge. A common problem that arises when you write
applications is you want to INSERT if key does not exist else UPDATE. The
right way of doing this without an upsert is using a transaction. However this
will make life easier as you can do it directly in one SQL statement.

~~~
colanderman
Just "using a transaction" is insufficient. You must be prepared to handle the
case that neither the INSERT nor the UPDATE succeeds (with READ COMMITTED
isolation), or that the transaction fails (with REPEATABLE READ isolation or
better), by repeating the transaction. And if latency is at all a concern to
you, you must wrap this all in a stored procedure to avoid the necessary
round-trip-time between the commands.

Hence this is more than just saving typing a couple lines -- this saves
writing entire stupid loops to do what is conceptually a simple (and very
common) operation.

Postgres gets better and better.

~~~
saurik
...and then you additionally have to verify that the specific reason both the
update and the insert failed was due to concurrency, or you can make the
mistake I did a couple years ago where I had transactions spinning against my
database for weeks on end due to an unrelated constraint failure I had failed
to notice with some transactions that I kept retrying in a loop as part of my
hacked together upsert implementation (which is why my user id numbers jumped
from ~8m to ~460m... it is a meaningless mistake, but one I am reminded of
every day due to that "46" staring at me :/).

~~~
stingraycharles
Is my conclusion correct that this implies sequence increments are run outside
of the transaction?

~~~
sirclueless
Sequences in Postgres are updated even if a transaction is rolled back. It's
pretty easy to see why that's important, especially when dealing with
concurrent transactions.

You can't give out the same number to multiple transactions without causing a
bunch of unique constraint violations. And you can't retroactively decrease a
sequence number without affecting all the other transactions currently
executing. You could in theory go back and plug in the gaps in a sequence, but
it's more important to guarantee that a single thread of execution will always
see increasing numbers than it is to guarantee that all numbers will
eventually be handed out to a successful transaction.

------
viraptor
I love that it's there now. I've been waiting for it for a long time, but one
thing I don't get is... why does every single implementation have to have
their own slightly different syntax?

pgsql -> on conflict

mysql -> replace / on duplicate

oracle -> merge

mssql -> merge

sqlite -> insert or replace

firebird -> merge / update or insert

~~~
anarazel
MERGE is actually a separate feature. We're still considering implementing it.
But it's syntax is too cumbersome for many usages. There's also some unrelated
complexity - so implementing merge support does get easier by the
infrastructure in here, but a fair amount of work remains.

~~~
buremba
It's great that you guys implemented UPSERT but I hope MERGE support won't be
considered as insignificant because of this feature.

~~~
anarazel
I'm pretty sure it's not (going to be) considered insignificant.

Unfortunately that does not equate to resources (i.e. time by somebody
sufficiently crazy^Wdetermined) for implementing it being available. A large
part of postgres development is driven by individuals. Some of it on company
time, but usually not most of it.

------
DigitalSea
Yet another reason why I absolutely love Postgres. It might have taken four
years, but we finally got what many of us have been asking for. Nothing ruined
my day more than having to write insert-update loops, I am beyond ecstatic for
this.

~~~
dan15
You love Postgres because it takes four years to get requested features?

~~~
colanderman
Because the devs pay attention, and take the time to do things right. See
[https://wiki.postgresql.org/wiki/UPSERT](https://wiki.postgresql.org/wiki/UPSERT)
for a glimpse into the design that needed to go behind this.

I feel a lot of Postgres fans (myself included) put their "money" on Postgres
circa late version 7 or early version 8, back when MySQL was the more
featureful and performant of the two, while Postgres had the reputation for
being more, shall we say, _robust_. (Remember, those were the days before
InnoDB was the default in MySQL.)

The payout for investment in Postgres the past few years has been substantial
-- native replication, true serializable transactions, foreign data wrappers,
index-only scans, native JSON support, updatable views, and materialized views
are among the features added to Postgres in the last 4 years. It's matured
from being an "entry-level" RDBMS with few features that all work reliably, to
a much more enterprise-friendly RDBMS with many features that _still_ work
reliably.

~~~
pilif
_> late version 7 or early version 8, back when MySQL was the more featureful
and performant of the two_

MySQL was never more featureful. The reason I started migrating back in the
7.1 time frame (when TOAST tables were added and you could finally store more
than 8K of text in a TEXT column) was the lack of sub-selects in MySQL.

Even aside of that, Postgres was far ahead when considering basic SQL support:
stored procedures, views, subselects, check constraints, triggers, actually
enforcing foreign key constraints and so on.

It was significantly slower than MySQL, but it also scaled much better under
load. Back then, when you had low load, MySQL would be about twice as fast as
PostgreSQL but then as the load increases, MySQL's performance would drop
sharply and Postgres would stay consistent.

By now, MySQL has mostly caught up feature-wise, but there's still stuff left
that Postgres just does better. Also, even plain ideological reasons
(community project vs. oracle open-core project) would want me to stay with
postgres.

I also have anecdotal evidence that MySQL still has serious issues in the
robustness department which I've yet to see with postgres.

~~~
threeseed
MariaDB, WebScaleSQL, Percona are just three MySQL forks that are completely
open source in every single way. Likewise MySQL-Server is GPL so it definitely
should be considered as open source.

And if MySQL wasn't robust then YouTube, Facebook, Twitter, Alibaba, LinkedIn
etc wouldn't be using it for core parts of their infrastructure. It's
definitely robust.

No doubt that PostgreSQL is better at MySQL in many areas though and probably
could do with a self contained, single download PostgreSQL Cluster edition.

~~~
slapresta
> And if MySQL wasn't robust then YouTube, Facebook, Twitter, Alibaba,
> LinkedIn etc wouldn't be using it for core parts of their infrastructure.
> It's definitely robust.

Popularity is not an argument for quality. See: crocs, Justin Bieber, PHP.

Those companies you mentioned, like many others, probably use it because
they're locked in that technology, not because it's a superior one. Just like
banks still use COBOL.

If you're interested on knowing why using MySQL isn't a good idea, there's
this (highly opinionated, obviously) post about it:
[http://grimoire.ca/mysql/choose-something-
else](http://grimoire.ca/mysql/choose-something-else)

~~~
threeseed
I never said popularity was an argument for quality. You did.

What I am saying is that if MySQL wasn't robust then those companies simply
wouldn't be using it. Since at their scale any bug or weakness will manifest
at a level far greater than say at a startup. And they have the skills, time
and money to choose any technology they wan't so I don't buy your argument
that they are "locked in". Some like Facebook and LinkedIn have even created
their own databases.

And I never said MySQL was a good idea. You did. I am saying that to claim it
is not robust flies in the face of available evidence.

~~~
qooleot
" never said popularity was an argument for quality. You did."

I'd tend to think your statement:

"And if MySQL wasn't robust then YouTube, Facebook, Twitter, Alibaba, LinkedIn
etc wouldn't be using it for core parts of their infrastructure. It's
definitely robust."

..points out that many (popular) sites use it, and a prerequisite is
robustness, which thesaurus-wise, sounds a lot like quality.

I get that

a) you didn't actually say it, and

b) you could mean something much more specific, such as "companies with many
highly-starred, complex open source projects which have also re-written major
parts of their tech stack, but chose to leave mysql in place".

I know thats a bit more verbose, but it just seems to obviously close to the
other author's interpretation with the ambiguity of the statement.

~~~
philsnow
> many (popular) sites use it

threeseed's argument is not

    
    
        this handful of popular companies use it, therefore it is good
    

but rather

    
    
        this handful of companies, each of which is considered by some/many to be 'excellent' or 'top', use it, therefore it is good

------
petergeoghegan
The documentation covering the new feature has now built:
[http://www.postgresql.org/docs/devel/static/sql-
insert.html](http://www.postgresql.org/docs/devel/static/sql-insert.html)

~~~
electrotype
Postgres's documentation is one of the best I've seen. Thanks to everybody
working on it!

------
endymi0n
For the backstory on why this has taken so long:
[http://www.depesz.com/2012/06/10/why-is-upsert-so-
complicate...](http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/)

------
kbendyk
I know it doesn't matter to some developers out there, but for me, the only
thing left is the ability to add a column after another one. Even if it's only
logical order, it's easier for me to read a table structure when fields are
ordered nicely.

~~~
narsil
Unsure what you mean. A subset of columns can be selected in any order, across
tables. Were you referring to the select order used by "*"?

~~~
pmontra
Yes, I think it's about changing the default presentation order. That could be
a feature of the client, psql, pgcli, pgadmin3, but it won't be portable
across them unless it's some metadata stored inside the db. Is there any db
with that feature?

Edit: munro answered that. PostgreSQL does it but it's expensive because it
has to rewrite the table.

~~~
kbendyk
Yes, MySQL: ALTER TABLE t1 ADD COLUMN `count` SMALLINT(6) NOT NULL AFTER
`lastname`

------
ozgune
So excited about upserts in Postgres!

This was the most requested feature in PostgreSQL:
[https://postgresql.uservoice.com/forums/21853-general](https://postgresql.uservoice.com/forums/21853-general)

------
olalonde
Which version of Postgres will this land in?

~~~
amitlan
9.5, that is, upcoming!

~~~
elchief
Which will hopefully drop in September.

------
piker
I am extremely happy to hear about this. We currently insert, catch the
exception and update if necessary. That can lead us to race conditions in
concurrent transactions. Thank you very much for your excellent work on
Postgres. It continues to be an amazing product to work with.

------
abannin
I want to give a giant digital hug to all PG committers who made this happen.

------
taspeotis
Who decided the syntax? Why not use MERGE ... INTO ... USING ... ON ... WHEN
like Oracle and MSSQL and apparently ANSI SQL [1]? I'm not aware of ON
CONFLICT being standard.

[1]
[http://en.wikipedia.org/wiki/Merge_(SQL)](http://en.wikipedia.org/wiki/Merge_\(SQL\))

~~~
veddan
There's some extended discussion of the issue on the wiki[1]. A future full
MERGE implementation might have slightly different semantics or performance
characteristics than ON CONFLICT (which might also differ from other DBMS).

[1]
[https://wiki.postgresql.org/wiki/UPSERT#SQL_MERGE_syntax](https://wiki.postgresql.org/wiki/UPSERT#SQL_MERGE_syntax)

------
ivoras
Does the description seems like it's a two-pass operation to anyone else? Can
anyone with more detailed knowledge tell why can't this be a single-pass
"INSERT in a unique index, UPDATE if it fails" operation?

~~~
anarazel
To understand the problem one first has to know that for better or worse
postgres does not use index organized tables. Additionally this ought to work
not only on a primary key but also on additional unique constraints. Including
partial unique indexes, potentially over expressions (say lower(username) or
something).

That makes some approaches harder.

The bigger problem is that 'update if fails' is easier than it sounds. When
you find a conflicting row in the index, that row's creating transaction might
not yet have committed. So you need to first wait for it to commit, and then
retry. But that implies that the transaction could actually roll back or
delete the row again. And suddenly there's not actually a row to UPDATE. Thus
you need to retry, right?

There's also some issues around avoiding deadlocks. If you "naively" just
acquire a 'value lock' on the to-be-inserted row, you can easily get into
deadlock territory if more than one row is inserted in one transaction. Such
value locks have to be released after an attempt to update. Which then
necessitates a looping attempt...

Does that start to explain the problems a bit?

------
smegel
I remember about 10 years ago using UPSERT in a Teradata database to do delta-
diff updating of tables.

Basically, I was updating a table from an external file, and I wanted to add
what was new, leave what was the same, and delete any records no longer in the
external table. Using UPSERT to insert new rows, update a timestamp on
existing rows, and then afterwards delete any rows who's timestamp was before
the UPSERT operation. Because of the nature of the file, doing an external
patch/diff approach was difficult, but the UPSERT operation worked like a
breeze!

------
donw
Damnit, I just wrote a sweet gem that handles this... ah, okay, but it
requires an index constraint, whereas I needed to merge on arbitrary
conditions. So I didn't totally waste my time.

Way to go Postgres team!

~~~
loopdoend
Is the gem open source? I checked your github and didn't see it, I think
people would still find it useful!

------
corford
I've been looking forward to this for ages! The solution we use at the moment
for upsert is a create or replace rule on the table but it isn't great as you
need to remember to update the rule whenever you change the table schema...

    
    
      CREATE OR REPLACE RULE "replace_row" AS
      ON INSERT TO my_table
      WHERE EXISTS(SELECT 1 FROM my_table WHERE id=NEW.id)
      DO INSTEAD
      (UPDATE my_table SET
      col1=NEW.col1,
      col2=NEW.col2,
      col3=NEW.col3,
      WHERE id=NEW.id);

~~~
anarazel
That unfortunately is not safe on several fronts.

For one, the WHERE EXISTS() will not see concurrent insertions by transactions
that are still in progress, which means you'll still get constraint violation
errors.

For another, rules generally have very surprising behaviour. E.g. with this
RULE you'll get into trouble if "id" isn't passed in as a explicit value, but
uses a non-deterministic DEFAULT or directly passed in expression. Every
reference to NEW.id will not be replaced by the result of that expression, but
rather with the expression itself. For the common serial id column (aka
autoincrement) you'll have a separate nextval() call in each reference. Which
can make this explode in pretty damn confusing ways.

~~~
corford
Unfortunately, there's no good all round solution for upsert in current
postgres versions. For our use case (one logger daemon feeding results off a
message queue in to a table in an idempotent way) this approach was the
simplest and worked the best.

~~~
anarazel
Can't you just do the "usual" looping pattern around a subtransaction? If
events are inserted by a single callsite that shouldn't prove to onerous?

C.f. [http://www.postgresql.org/docs/9.4/static/plpgsql-control-
st...](http://www.postgresql.org/docs/9.4/static/plpgsql-control-
structures.html)

~~~
corford
Could have done but coming from a mysql background (i.e. where replace
statement is supported), just crafting a simple replace rule was easier/faster
to grok and implement. Also helped that the table schema was very unlikely to
change much in the future - so we didn't have to worry too much about
remembering to keep the rule up to date (and the fact the id column was
defined as VARCHAR NOT NULL and had no default assigned to it).

------
agotterer
Does anyone know when this is expected to be released?

~~~
elchief
New versions of Postgres usually ship in September. Except 9.4, which slipped
to December.

------
asnyder
As someone that's been using PostgreSQL way before it was cool (since 2005),
I'm super happy to see this. Previously I had to do EXISTS, IF FOUND UPDATE
ELSE INSERT. Now I can just do upsert, so nice.

Very thankful to all the hard working Postgres devs. PostgreSQL was really
good when I first started using it, and just keeps getting better. Glad to see
the general community finally getting on the PostgreSQL bandwagon.

Now let's see if I can get more people to use internal PostgreSQL functions
instead of constantly writing statements directly in their code.

------
Mister_Snuggles
This reminds me of a common idiom in the Pick-style databases that I've worked
on...

    
    
        READU REC FROM FILE,KEY ELSE
            REC<1> = 'stuff'
            * Set other fields as appropriate for a brand new record - this is the 'insert'
        END
        REC<1> = 'blah'
        * Set other fields as appropriate for an update
        WRITE BLAH TO FILE,KEY ;* Write record and release locks
    

This can, of course, be wrapped in a transaction.

------
andrewstuart2
Seriously, IIRC the wiki page for this feature was like 5 years old. I'll have
some statements I'll be very glad to update as soon as this is released.

I was so surprised when I ran into the lack of this feature as I'd had it on
both MSSQL and MySQL.

------
pcl
Upserts are generally useful tools for applications that run in active-active
multi-datacenter configurations. Have you guys explored the implications of
this feature for such deployments on top of Postgres yet?

------
nsxwolf
I come from a SQL Server background and recently jumped over to a Postgres
shop. It seems like I came in at a really exciting time.

I was really missing SQL Server's powerful MERGE statement, so this is welcome
news.

------
morekozhambu
So to simplify, what I understand is

1\. INSERT .. ON CONFLICT is an implementation UPSERT technique

2\. MERGE is something meant to be different from UPSERT but has UPSERT
feature?

~~~
jeltz
Basically true except for many implementations of MERGE not being concurrency
safe, which means you would have to loop anyway on unique violation making it
no better than the CTE based techniques that already exist in 9.4.

------
whistlerbrk
For anyone who has ever had to write scripts to import incremental batch data,
CSV, etc this seems huge.

------
DiabloD3
When is Postgres going to get built in fault tolerant multi-master support in
any form?

------
qubyte
I found engineering around the lack of this feature very _upserting_.

I'll get my coat.

------
silon3
Does it handle locking (aka no deadlocks) correctly?

~~~
amitlan
There is a section in UPSERT wiki about this -
[https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipl...](https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking)

EDIT: rather the whole "Value Locking" page -
[https://wiki.postgresql.org/wiki/Value_locking](https://wiki.postgresql.org/wiki/Value_locking)

------
ape4
I think an UPSERT keyword would be more clear.

------
rabee
Oh happy day. PostgreSQL is the best.

------
ExpiredLink
Say goodbye to CRUD. Because 'create' and 'update' are syntactically and
semantically the same, right?

------
jedi_master
its about F __ __*G time

------
karmakaze
Yes! it's about bloody time.

------
DotHide
Great! I'm just like it!

------
morekozhambu
Slightly OT, why do we see these errors with Oracle's MERGE statements?

ORA-01779: cannot modify a column which maps to a non key-preserved table

ORA-30926: unable to get a stable set of rows in the source tables

~~~
njharman
Asking a support question on a feature announcement post for a completely
unrelated product is EXTREMELY off topic.

~~~
morekozhambu
It was not meant as a support question. It was in the context of MERGE/UPSERT
was I asking. I wanted to know if similar thing happens in PG.

------
ianamartin
In my crazy-ass opinion, if you need upserts, there are problems with your
application logic and problems with your database.

Get off my lawn, etc.

~~~
taco_emoji
Why bother posting if you're not going to explain yourself? It's 2015,
nobody's going to be impressed by mere contrarianism.

