
A Case for Upserts - ash
http://lucumr.pocoo.org/2014/2/16/a-case-for-upserts/
======
chrisfarms

      > If Postgres would implement a shitty and 
      > inefficient version of an upsert statement at 
      > the very worst it could be as bad as the current 
      > implementation that people write on their own
    

Wow. Please don't do that.

Many people do not understand why an upsert is a hard problem and would
blindly use any solution presented without understanding its limitations. At
least with the current situation, if a developer finds themselves in need of an
upsert, they are forced to seek out a solution and, in the process, understand
why it's tricky.

~~~
joeyespo
You left out a crucial part of that sentence.

    
    
        > implementation that people write on their own
        > and then at least, there is an established syntax
        > and a way to improve it further.
    

That was the point.

Being aware of the problem doesn't mean you'll actually find a good solution.
At least this way, any lingering problems will eventually be fixed instead of
draining more of your own resources. Chances are, the Postgres team will be
able to figure out the subtleties much better than you.

The more domain knowledge your system requires, the worse off everybody is.
Here's a story.

At my previous company, I implemented something close to 'upsert'. After hours
of researching, implementing, and testing in as many environments as I could
think of, I wondered why I had wasted so many hours trying to get it right.
Sure, I now understand the subtleties involved. (Well, I think so anyway. I
still don't know what I don't know.) But am I better off now? Is the company?
Not really. What we have is a tangled hack.

And worse, I no longer work with this company. So the next guy will have to
figure out what's going on when there's an issue. To be safe, I spent another
day re-organizing the code for clarity and documenting the complicated
workaround. A scrappy syntax shipped today and fixed tomorrow would have been
ideal. That way, the new guy could search for the symptom and see that the
current implementation of UPSERT is the cause. They can then use a
recommended workaround, or simply upgrade Postgres to fix it entirely.

~~~
einhverfr
The problem though is that the corner cases of concurrency etc. with upserts
need to be hammered out and agreed on at the beginning. Otherwise you will
have "well, we improved concurrency handling in upserts and sorry, that broke
your application."

Once you have a public API you have a contract, and that contract really
shouldn't be broken without good reason. If PostgreSQL offered an upsert
syntax with the same limitations as writeable CTEs, and promised to solve the
concurrency issues in future releases, I wouldn't use it. I would rather have
a hand-written implementation which is guaranteed to work in the future with
the same caveats than an implementation I have no control over that may pose
backwards-compatibility issues if I am supporting multiple Pg versions with
my app.
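For readers who haven't run into it, the writeable-CTE pattern just mentioned
is roughly the following (a sketch with made-up table and column names); it has
exactly the limitation described: two concurrent runs can both miss the UPDATE
and both attempt the INSERT:

    
    
      WITH updated AS (
          UPDATE my_table SET value = 'new' WHERE key = 'key'
          RETURNING key
      )
      INSERT INTO my_table (key, value)
      SELECT 'key', 'new'
      WHERE NOT EXISTS (SELECT 1 FROM updated);
    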

Stability in APIs is important. The idea that we will just throw something
together now, and then fix it later, will result in PostgreSQL getting a bad
reputation.

------
btilly
I hate this pattern.

People hear, "PostgreSQL is like MySQL except that it is done right." They
then go to PostgreSQL and say, "PostgreSQL does not implement my favorite
MySQL feature in my favorite MySQL way."

Part of what makes PostgreSQL what it is is that they care so much about
getting features right, and making them standards compliant. (Because if you
do things differently than the standard, there is a chance that you will find
yourself forever maintaining both the way that you did it and the way the
standard said to do it.)

I like upserts. I understand the value and the functionality. But please don't
add it until you can do it right. And please, please, please make it work like
the standard says to. The standard is explicit, flexible, and a standard. Plus
if you haven't imprinted on the MySQL solution, it really isn't very hard to
figure out how to do what you want to do. (I certainly didn't find it hard
when I had to do it.)

------
hosay123
2226 words and not a single mention of CREATE RULE. Postgres has had something
far more general than "upserts" for over a decade.

    
    
      CREATE RULE Pages_Upsert AS ON INSERT TO Pages
      WHERE EXISTS (SELECT 1 from Pages P where NEW.Url = P.Url)
      DO INSTEAD
         UPDATE Pages SET LastCrawled = NOW(), Html = NEW.Html WHERE Url = NEW.Url;
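
For anyone who wants to try the rule, a minimal table it could apply to might
look like this (the columns are just assumed from the rule above):

    
    
      CREATE TABLE Pages (
          Url         text PRIMARY KEY,
          Html        text,
          LastCrawled timestamptz
      );
    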

~~~
masklinn
1\. Note that there are no plans for postgres to implement a specialised
UPSERT. The plan was/is to implement MERGE, which is a fair bit more generic
(especially with the 2008 and 2011 extensions). In fact, TFA very specifically
disagrees with this route in his conclusion.

2\. IIRC the rule method does not work correctly for multi-row inserts that
include duplicates.
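
For comparison, the standard MERGE mentioned in point 1 would look roughly like
this against the Pages example above; this is a sketch of the SQL:2003 syntax,
not something current Postgres accepts:

    
    
      MERGE INTO Pages p
      USING (VALUES ('http://example.com/', '<html>...</html>')) AS src (Url, Html)
      ON p.Url = src.Url
      WHEN MATCHED THEN
          UPDATE SET Html = src.Html, LastCrawled = NOW()
      WHEN NOT MATCHED THEN
          INSERT (Url, Html, LastCrawled) VALUES (src.Url, src.Html, NOW());
    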

~~~
the_mitsuhiko
> 1\. Note that there are no plans for postgres to implement a specialised
> UPSERT. The plan was/is to implement MERGE, which is a fair bit more generic
> (especially with the 2008 and 2011 extensions). In fact, TFA very
> specifically disagrees with this route in his conclusion.

Does the SQL standard actually guarantee that a MERGE is atomic? After all, the
MERGE statement involves a subselect, which seems to be about as concurrency-
safe as a SQLite REPLACE insert with a join.

~~~
mjn
The MERGE itself is atomic, but it doesn't prevent duplicate insertions
without a lock: [http://stackoverflow.com/questions/9871644/is-merge-an-
atomi...](http://stackoverflow.com/questions/9871644/is-merge-an-atomic-
statement-in-sql2008)

~~~
jeltz
If PostgreSQL goes with an implementation of MERGE like that then it would not
be a replacement for UPSERT. With UPSERT you can make sure to never get any
constraint violations.

------
sujeetsr
"The reason for this is that transactions just serialize the execution, they
don't guarantee any atomicity of independent row updates. After the delete
happens the second transaction gets a chance to run and the update will fail
because it no longer sees a row". Umm, I thought everything in a transaction
could be treated as atomic wrt other transactions... i.e. they don't see "in
between" states?

~~~
ash
"…it is possible for an updating command to see an inconsistent snapshot: it
can see the effects of concurrent updating commands on the same rows it is
trying to update, but it does not see effects of those commands on other rows
in the database."

[http://www.postgresql.org/docs/current/static/transaction-
is...](http://www.postgresql.org/docs/current/static/transaction-iso.html)

~~~
anonymouz
That only applies for the read committed isolation level. It seems to me that
at least serializable should not show inconsistent state in this situation?

~~~
the_mitsuhiko
A row lock is released on a delete.

~~~
anonymouz
What does that have to do with serializable transactions?

"The Serializable isolation level provides the strictest transaction
isolation. This level emulates serial transaction execution for all committed
transactions; as if transactions had been executed one after another,
serially, rather than concurrently."

If you don't get the right behavior with serializable transactions, as you
seem to be claiming, it seems to me that serializable transactions should be
considered buggy. In this case they do not provide the guarantees they are
claimed to provide.

~~~
jeltz
You get the right behavior (a serialization conflict) in PostgreSQL. If you do
not in some other database I would consider it to be buggy.

------
jhh
I understand that MERGE or something like that would be a nice convenience.

But if I personally had this situation, what I would do is (in pseudo code, but
I think this would be in e.g. Python, or in a stored procedure):

    
    
      begin transaction
      does row with unique key x exist?
      if exists: update all relevant fields
      else: insert new row
      commit
    

How does this have a different concurrency behavior than e.g. an equivalent
Oracle MERGE statement or MySQL ON DUPLICATE KEY?

If another process tries to write to this row before the change is committed,
it would have to wait (in any case).

If another process tries to read from this row before commit then what happens
depends on the transaction isolation level and maybe even on implementation
details. But I don't see how this is changed from having a built-in MERGE-like
utility.
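
Spelled out in plain SQL against a hypothetical my_table(key, value), the same
flow is roughly:

    
    
      BEGIN;
      SELECT 1 FROM my_table WHERE key = 'x';
      -- if a row came back:
      UPDATE my_table SET value = 'v' WHERE key = 'x';
      -- otherwise:
      INSERT INTO my_table (key, value) VALUES ('x', 'v');
      COMMIT;
    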

~~~
einhverfr
What happens is this: Concurrent transactions marked A and B:

A: Does row exist? Answer no.
A: Ok, so insert row.
A: ....

B: Does row exist? Answer no (A has not committed yet)
B: Insert row (locks, waits to see if A commits or rolls back)

A: Commits
B: Unique constraint violation error.

Now here's where it gets really tricky: What is the desired behavior? Do we:

1\. Just raise a constraint violation error? That's what we do with LedgerSMB
because, frankly, unique constraints are far more likely to be violated by
human error than concurrency issues and we can tackle other related errors
elsewhere. This is specific to our app. Other apps may differ.

2\. Should the upsert lock, wait, and retry if the upsert criteria is
violated by a pending transaction? That's the cleanest option. However, this
requires being aware of _which_ unique indexes are affected by the upsert
criteria (a rough sketch of one way to approximate this follows at the end of
this comment).

A quick and dirty upsert like yours is good enough for most applications based
on human input. It is not good enough, IMO, for applications with lots of
direct automation coming in from various sources. So those are very different
use cases and they have very different criteria.
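
For what it's worth, option 2 can be approximated today with an advisory lock
keyed on the upsert key, at the cost of every writer agreeing to take the same
lock first. A rough sketch (hashtext() here only maps the text key to a lock
id; table and columns are made up):

    
    
      BEGIN;
      SELECT pg_advisory_xact_lock(hashtext('key'));
      UPDATE my_table SET value = 'value' WHERE key = 'key';
      INSERT INTO my_table (key, value)
      SELECT 'key', 'value'
      WHERE NOT EXISTS (SELECT 1 FROM my_table WHERE key = 'key');
      COMMIT;
    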

~~~
jhh
Thanks for this explanation. I would have expected #1 to happen in the case
that you outline.

You're right: since a built-in merge utility has more information about the
semantic intent of the operation, it can essentially repeat the complete
operation, whereas my naive implementation would not.

------
lazyjones
The UPSERT implementations in this article are simply wrong (he should call
them "DELSERT" perhaps).

A simple Postgres function like the following works for me:

    
    
      -- parameters prefixed with p_ so they don't clash with the column names in plpgsql
      CREATE OR REPLACE FUNCTION upsert_my_table(p_key text, p_value text) RETURNS void
      LANGUAGE plpgsql AS $$
      BEGIN
        UPDATE my_table SET value = p_value WHERE key = p_key;
        IF NOT FOUND THEN
          INSERT INTO my_table (key, value) VALUES (p_key, p_value);
        END IF;
        -- add PERFORM pg_sleep(10) for concurrency testing
      END
      $$;
    

(according to my tests, it has no concurrent UPDATE problems ... duplicate
keys are another matter and best handled with restarting transactions /
savepoints)
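
The duplicate-key case can be folded into the function itself with a retry loop
and an exception block, along the lines of the upsert example in the PostgreSQL
documentation. A sketch (same hypothetical my_table; parameters prefixed to
avoid clashing with the column names):

    
    
      CREATE OR REPLACE FUNCTION upsert_my_table(p_key text, p_value text) RETURNS void
      LANGUAGE plpgsql AS $$
      BEGIN
        LOOP
          UPDATE my_table SET value = p_value WHERE key = p_key;
          IF FOUND THEN
            RETURN;
          END IF;
          BEGIN
            INSERT INTO my_table (key, value) VALUES (p_key, p_value);
            RETURN;
          EXCEPTION WHEN unique_violation THEN
            -- another writer inserted the row first; loop and try the UPDATE again
          END;
        END LOOP;
      END
      $$;
    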

~~~
the_mitsuhiko
> according to my tests, it has no concurrent UPDATE problems

You have concurrency issues with deletes as well as with inserts. That's
literally the worst way to implement upserts currently. Even CTEs are superior
to that.

------
hcarvalhoalves
Isn't upsert a bad design smell though?

If you have high concurrent updates, you should be dealing with append-only
database design and collapsing into a view to query, like Datomic. All the
locking required for atomic upserts would wreak havoc on performance anyway,
no?

~~~
the_mitsuhiko
> If you have high concurrent updates

When do you not have concurrency?

> you should be dealing with append-only database design and collapsing into a
> view to query, like Datomic

True, but postgres is very bad at that. So you need to work with the tools
that postgres gives you. Yes, postgres currently does not give you an upsert,
but an upsert is much closer to postgres' design than an append-only database
that denormalizes on inserts.

~~~
hcarvalhoalves
> When do you not have concurrency?

Highly concurrent _updates_? By that I understand a workload where you have
multiple clients updating the same rows.

In my experience it's either rare or the result of doing something
inefficiently (e.g., doing analytics as a "view_count" column). Every time I've
faced that I've remodeled the database, both for performance (you avoid
invalidating caches for the entire table) and auditing reasons (if you have a
bunch of people replacing each other's changes, you might want to retain the
history).

------
amirouche
This article is missing the solution brought by MVCC.

Namely, invalidating any query that comes after the update and asking to redo
the transaction. This is done without a lock.

~~~
ash
"Asking to redo the transaction" pushes complexity into the client. Armin did
mention that: "Unfortunately this problem does not go away if you tell the
client to solve it."

~~~
amirouche
Asking to redo the transaction doesn't mean "push the error to client".

In this case, the server knows the transaction can pass, so you don't push the
error to the client.

The server pushes errors to the client when:

\- it is sure that the transaction cannot pass.

\- it doesn't want to guess whether the transaction can pass.

\- it has reached a max retry (because of heavy writes, for instance)

~~~
the_mitsuhiko
What you are suggesting is a loop where you try update/insert until it
succeeds. Because there is no lock involved there is no guarantee that it will
ever finish. That's exactly what's being mentioned in the blog as a workaround
until an upsert exists.

~~~
spacemanmatt
It is guaranteed to return after either inserting or updating. Most
implementors terminate the loop after some arbitrary number because an error
elsewhere could prevent both insertion and deletion, and it is nice to get an
exception rather than a hang from your upsert.

~~~
the_mitsuhiko
> It is guaranteed to return after either inserting or updating. Most
> implementors terminate the loop after some arbitrary number because an error
> elsewhere could prevent both insertion and deletion, and it is nice to get
> an exception rather than a hang from your upsert.

Which is why I said multiple times that this needs database support, at which
point there is a lock. That would guarantee that the contention at the lock is
fair. With a loop around that section, the winner is chosen by network latency
and retry speed if high contention happens. Also there is not even a guarantee
that it will ever finish.

~~~
amirouche
> Which is why I said multiple times

First time I read that...

> that this needs database support

Transaction retry is part of MVCC databases.

> at which point there is a lock.

Not necessarily.

> That would guarantee that the contention at the lock is fair.

It's not fair, it's a lock. Locking the whole table for one row is not fair. I
don't know what you mean by fair.

> With a loop around that section the winner is chosen by network latency and
> retry speed if high contention happens.

Yes, and? Anyway, this must be in the database: the database knows the
transaction can pass, so it doesn't need to send it back to the user, except if
there are heavy writes on _this_ row, which would mean the application is under
attack or not correctly programmed.

> Also there is not even a guarantee that it will ever finish.

There is a MAX_RETRY, it is guaranteed to return.

\---

Also:

\- You don't state your problem upfront, just say you need this COMMAND. Which
is forbidden by netiquette.

\- You never state that it's a Write/Write transaction problem.

\- What about Read/Write?

\- There is a solution: use a migration or MySQL.

------
jeltz
He is incorrect about isolation levels, at least in PostgreSQL. Serializable
with DELETE + INSERT will make a concurrent serializable update fail with a
serialization error. I just tested this.

    
    
        1=# CREATE TABLE my_table (key text primary key, value text not null);
        1=# INSERT INTO my_table VALUES ('key', 'foo');
        1=# BEGIN ISOLATION LEVEL SERIALIZABLE;
        1=# DELETE FROM my_table WHERE key = 'key';
        2=# BEGIN ISOLATION LEVEL SERIALIZABLE;
        2=# UPDATE my_table SET value = 'bar';
        1=# INSERT INTO my_table (key, value) VALUES ('key', 'value');
        1=# COMMIT;
        2 ERROR:  could not serialize access due to concurrent update

~~~
ash
How is it different from other isolation levels? It looks like it still fails
compared to what upsert would do.

~~~
jeltz
Other isolation levels will make the concurrent update find zero rows.
Serializable gives an error instead, on which you can reliably retry the
transaction.

They fail in different ways (for serializable, in this case, a more useful way),
which should have been pointed out by the author. It does not seem like he
actually tested what the isolation levels do, or he tested it in a database
which does not provide serializable isolation.

~~~
ash
I think Armin didn't go into isolation levels details because he didn't see a
meaningful distinction in the context of emulating upsert.

------
justintocci
I'm confused. Are we designing accounting software or for high concurrency?
Upsert is not magic; there are always trade-offs.

If this is accounting software then use transactions.

If you're doing a research paper on accounting software then have fun, there
are lots of trade-offs to play with.

If you are designing for concurrency then drop your constraints and
transactions and just make everything an insert. Then when you want to select
just look for the first record from the bottom of the table.

No feature, be it upsert or whatever, can ever beat proper design. The best it
can ever aspire to do is equal it.
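
In Postgres terms that last read would be something like the following, a
sketch that assumes an inserted_at timestamp column on the table:

    
    
      SELECT value
      FROM my_table
      WHERE key = 'key'
      ORDER BY inserted_at DESC
      LIMIT 1;
    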

~~~
jeltz
How do transactions remove the need for upserts? And things do not get
faster if you do not use transactions; in fact, they generally get slower.

------
AaronBBrown
The DBA in me is having a difficult time understanding PostgreSQL's behavior
with regard to the delete/insert within a transaction.

In MySQL, I get the exact behavior that I expect with anything above RC
isolation level (which is the default for PostgreSQL):

    
    
      session 1> create table foo ( k varchar(10) not null, v varchar(10) not null ) ;
      session 1> set transaction isolation level read committed;
      session 2> set transaction isolation level read committed;
      session 1> insert into foo (k,v) values ('a','b');
      session 1> begin;
      session 2> begin;
      session 1> delete from foo where k = 'a';
      session 2> delete from foo where k = 'a';
      -- transaction hangs here as expected since session 1 has a lock on this record
      session 1> insert into foo (k,v) values ('a','c');
      session 1> commit;
      -- session 2 executes the delete
      session 2> insert into foo (k,v) values ('a','e');
      session 2> commit;
      session 2> select * from foo;
      +---+---+
      | k | v |
      +---+---+
      | a | e |
      +---+---+
    
    

When I do this in PostgreSQL, I end up with 2 rows in the table, which _seems_
fundamentally incorrect and to me implies that PostgreSQL's implementation of
read committed is flawed and should be fixed. The transaction in session 2, at
RC isolation, should see the most recently committed result, which is the
single record from session 1. I'm not C.J. Date, though, so I am very likely
missing some subtlety.

PostgreSQL after the sequence above:

    
    
      test=# select * from foo;
       k | v
      ---+---
       a | c
       a | e
      (2 rows)
    
    

What am I missing? Why is this considered correct behavior?

~~~
skybrian
I'm not an expert, but as I understand it, both deletes see row ('a','b').
Session 2 sees that that row is locked so it waits, and when the lock is
released it sees that it's already deleted. However, it doesn't redo the query
so it never sees ('a','c'). As far as PostgreSQL is concerned, ('a','c') is a
completely different row from ('a','b') and it was created too late to have
any effect on session 2.

These transactions aren't serializable but you didn't ask for that.

To get intuitive behavior, a delete followed by an insert with the same
primary key in the same transaction would have to be treated as a row update,
holding onto a single row lock, rather than as two independent rows with
different row locks.

~~~
AaronBBrown
This violates the promise of read committed. In RC, all transactions are
supposed to see the most recently committed version of a record. The fact that
it was a delete, then an insert is an implementation detail and "not my
problem" as a DBA. At the time the transaction in session 1 commits, the most
recently committed version of k='a' is v='c'. PostgreSQL, in the situation,
seems to be operating in read uncommitted isolation level...session 2 is
treating the uncommitted delete in session 1 as the most recent version.

I'd like to give PostgreSQL the benefit of the doubt here, but this behavior
seems fundamentally incorrect to me.

~~~
skybrian
That's the subtle part: they aren't two versions of the same record. They're
two entirely different records, so all bets are off. (In your example, they
don't even have the same primary key because no primary key is declared.
PostgreSQL doesn't know that k is supposed to be a key; it's just another
column.)

~~~
AaronBBrown
That is an implementation problem. The behavior is still incorrect. MySQL
manages to do this correctly, so it's clearly not impossible. I can't come up
with any other conclusion except that transaction isolation in pg is
fundamentally broken when I see this type of case show up.

------
callesgg
Before I knew about "on duplicate key update" I used to first run an update,
check the results for how many rows were matched, then insert if there were
none.

Another way I remember doing it was having a unique key and inserting; if an
exception was thrown, the update statement would run.

Both were slow as hell, due to the back-and-forth required.

~~~
byroot
And worse than that, both are subject to concurrency issues.

------
pedrocr
Why doesn't taking a row lock instead of a table lock fix this? Every
transaction that wants to read/write to that row id blocks on that lock but
you're not blocking the whole table.

~~~
amirouche
The row may not exist. OP wants to update a row or create it.

~~~
pedrocr
You mean that with current semantics if you do a "delete from table where
id=X" and X doesn't exist you don't get a lock on that id? So why not just
change that or add a "lock my_table in id=X" syntax. Isn't what's needed here
fine-grained locking by row id?

~~~
jeltz
You are correct, and the solutions I have read are based around locking the
index entry when doing an upsert. The question is how it is best done.
Simplified explanations: one solution is to insert a row and lock the new row,
another solution is to lock the index entry itself.

~~~
pedrocr
Thanks for the answer. I'm sure there's plenty of complexity I'm missing here.
It just seems like having an ID-indexed set of mutexes shouldn't be too hard.
Does anyone have a good link discussing the design tradeoffs for a performant
upsert?

------
PeterisP
I'm a bit confused - can anyone clarify what the problem is in the TFA example

    
    
      begin;
      delete from my_table where key='key';
      insert into my_table (key, value) values ('key', 'value');
      commit;
    

? As far as I understand, any outside transaction would either see the old
value (assuming one existed) or the new value, not the middle result where the
row doesn't exist. Am I mistaken, and it doesn't work that way in
PostgreSQL?

~~~
nuclear_eclipse
I think his point is quite exactly that a concurrent write transaction (starts
after the first transaction starts but before it commits) that relies on the
existence of that key/value pair will fail because of how Postgres works.

~~~
PeterisP
OK, I tried this on a random PG server I had and replicated the issue with a
concurrent update; running two such transactions in an 'overlapping' way
inserted two duplicate entries for the same key.

That is interesting - is the current behavior really considered WAD? I.e.,
shouldn't the 'correct' solution then be fixing this behavior instead of
extending the syntax for 'upserts'?

------
dakridge
I'm sorry, I only just started learning SQL a few days ago... But I was
building a simple web application and I wanted to add a user if the email
didn't exist, or else update that user's fields. And I came across this article:
[http://www.the-art-of-web.com/sql/upsert/](http://www.the-art-of-
web.com/sql/upsert/)

Is that not an upsert?

~~~
zwily
If you read the whole page, you see that it still requires a full table lock
to avoid duplicates. A true upsert would not require a table lock.
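
The table-lock variant boils down to roughly this shape (a sketch with a
hypothetical users table, not the article's exact code):

    
    
      BEGIN;
      -- SHARE ROW EXCLUSIVE conflicts with itself, so concurrent upserters serialize here
      LOCK TABLE users IN SHARE ROW EXCLUSIVE MODE;
      UPDATE users SET name = 'x' WHERE email = 'a@example.com';
      INSERT INTO users (email, name)
      SELECT 'a@example.com', 'x'
      WHERE NOT EXISTS (SELECT 1 FROM users WHERE email = 'a@example.com');
      COMMIT;
    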

------
Pxtl
I'm not a postgres guy so I don't know how possible this is - couldn't the
pythonic "forgiveness not permission" pattern work here?

That is, a proc that just attempts the insert, handles the unique or primary
key constraint violation error, and then proceeds to the update as a fallback?
Sure, you push the concurrency risk to deletion instead of creation, but
deletions are generally rarer than creations, so a table lock on deletion is
more palatable.
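
A sketch of that pattern in plpgsql, assuming a hypothetical users table with a
unique constraint on email (as noted above, a concurrent delete between the
failed insert and the update can still leave the UPDATE matching nothing):

    
    
      CREATE OR REPLACE FUNCTION add_or_update_user(p_email text, p_name text) RETURNS void
      LANGUAGE plpgsql AS $$
      BEGIN
        INSERT INTO users (email, name) VALUES (p_email, p_name);
      EXCEPTION WHEN unique_violation THEN
        -- the insert lost the race (or the user already existed); fall back to updating
        UPDATE users SET name = p_name WHERE email = p_email;
      END
      $$;
    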

~~~
amirouche
Postgres is implementing both permission and forgiveness for performance
reasons.

~~~
amirouche
What is troubling for me is to have a database that works sometimes and is
sometimes fast, but no database that works all the time and is /slow/.

~~~
spacemanmatt
This is a common trouble. It is, however, a structural artifact of data size.
You cannot solve this problem with a "code" solution because it is an actual
computational complexity issue, not an implementation weakness.

This is why data design, like performance and security, is an attribute that
you must bake in. You cannot bolt on data design after the fact.

------
moron4hire
I tend to think that UPSERT is the right solution to the wrong problem.

Take the example of a user profile. The problem for me is not typically "the
first time the user shows up, I need to create a record, and all other times,
I need to overwrite it." In the systems I build, there is almost always a use
case for having full access to the history of data. So the better solution is,
"I must ALWAYS update whatever record exists AND insert a new record".

That means my UserProfile table also has two extra columns, CreatedOn _and_
InvalidatedOn. Then, creating/updating a user's email address is always the
same two-statement operation:

    
    
        -- If UserName does not yet exist, this does nothing
        UPDATE UserProfile
        SET InvalidatedOn = NOW()
        WHERE UserName = ?
        AND NOW() BETWEEN CreatedOn AND InvalidatedOn;
    
        INSERT INTO UserProfile(UserName, Email, <etc.>, CreatedOn, InvalidatedOn)
        VALUES(?, ?, <etc.>, NOW(), MAX_DATE());
    

In practice I would use default value constraints for the CreatedOn and
InvalidatedOn columns. Writing it explicitly is just for this example.
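
Reading the currently-valid row under this scheme is then just (sketching with
the same columns as above):

    
    
      SELECT Email
      FROM UserProfile
      WHERE UserName = ?
        AND NOW() BETWEEN CreatedOn AND InvalidatedOn;
    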

This becomes really important once you've lived with your database for several
months/years, and _especially_ when dealing with customers. For example, say
you send an important email to a user, then they change their email address,
then you send more important emails referring to the first one, which they now
claim they never received in the first place. With an UPSERT, that user's
email address for all of time looks like the most current version, and you
can't figure out why they didn't receive the first one.

It's better to capture more data and realize down the road that you don't need
it than it is to not capture enough and realize you need more. Databases are
temporal objects. They grow over time and they change over time.

Another example from my past: a client would send us a dump of their data,
every morning. We were performing analysis of their data and were supposed to
wipe out our database and re-import it every morning. This meant on day N+1 we
were re-importing the first N days again. I implemented this as a temporal
database, which caught me some hell when the database grew in size. But, after
6 months, we discovered someone had attempted to alter the historical data. If
we had blindly taken the data, they would have gotten away with their crime
(and it literally was a crime, we had several government regulations we had to
fulfill, the project being used in both a health-care and a financial
capacity).

I've never regretted implementing my tables temporally. I've sometimes changed
the implementation to a data warehouse setup, after learning that the
historical data wasn't necessarily important to day-to-day operation, but I've
never regretted having the historical data available, and I have almost always
learned to regret NOT having historical data.

~~~
einhverfr
I think you'd need partial unique indexes to make that remotely concurrency
safe.

~~~
moron4hire
I don't think a partial unique index would have an impact on concurrency
safety. An appropriate one would be useful for making the query for the most
recent value faster, but that doesn't mean it would be concurrency-safe without
a transaction.

~~~
the_mitsuhiko
How do you implement unique constraints then?

~~~
moron4hire
The main take-away from my original comment should be "never destroy data".
You may write new data, but you may never delete or overwrite data. How you
design the database after that is up to you. The particular example I used is
of course not going to be good for enforcing, say, uniqueness of email
addresses across users.

It has tended to make me really consider the notion of uniqueness within a
database. What does the unique constraint mean? I think that people think of
uniqueness to mean "at this point in time", but if you enforce it in the
database, it means "for all time." In relational algebra, the unique
constraint is really about enforcing a candidate key, a field that could
satisfactorily be used as a key, but we just happened to have chosen a
different field as the actual primary key. In that context, given that we
would want someone to be able to change their email address, I don't think
"unique" email addresses make sense.

So many things change over time and so many seemingly "unique" fields turn out
to not be unique in the real world that, for the most part, I've given up on
the unique constraint. Full names are neither unique, nor are they final
(people get married and change their name). Not even Social Security Numbers
are necessarily unique, for a variety of reasons
([http://www.computerworld.com/s/article/300161/Not_So_Unique](http://www.computerworld.com/s/article/300161/Not_So_Unique)).

~~~
einhverfr
That's what partial unique indexes allow you to put in the database. Take this
example, which would work in PostgreSQL:

    
    
        CREATE UNIQUE INDEX profile_username_valid_idx_u
        ON profile(username)
        WHERE invalidated_at IS NULL;
    

The problem with enforcing this in app-space is that it doesn't get around the
isolation problem. In both cases, you are assuming that the same profile won't
get concurrently updated twice by two different transactions. Since only the
database can referee this issue, you _must_ enforce the uniqueness there. I.e.
this check _must_ happen on transaction commit and your application has no way
of knowing if it did.

------
briteside
[https://github.com/seamusabshere/upsert](https://github.com/seamusabshere/upsert)

~~~
the_mitsuhiko
Yes. And it's implemented on the client and suffers from contention problems.
I already have that :-/

