
SQLite Query Language: upsert - coleifer
https://www.sqlite.org/draft/lang_UPSERT.html
======
jedberg
This is great! Given that SQLite is generally used for small projects and not
in any sort of cluster or HA environment, pushing this kind of work (the
if/then/else block that you don't have to write in your code anymore) makes a
ton of sense.

I actually like this feature more in SQLite than in Postgres. In Postgres I
worry about pushing that kind of work into the database in clustered
situations, as there are places where it can go wrong and it would be more
likely that the code wouldn't be able to handle that failure properly (as
opposed to having to write the if/then/else yourself, which if you're a good
coder would also involve some try/catch and proper fallback behavior in case
of an error).

~~~
ams6110
I would trust postgres to manage concurrency better than I could write an
if/then/else that did it properly in all cases.

If you're writing if/then/else when dealing with a database, in my mind you're
likely doing it wrong.

~~~
jedberg
Before upsert, how would you handle the examples in the article? Like, how
would you "insert Alice's phone number unless it's already there, in which
case update it".

Edit: Downthread it was pointed out that you could just run an update and
insert on every request and let one fail, which seems like a fair tradeoff for
the extra network round trips.

~~~
cobythedog
UPDATE phonebook SET phonenumber = 704-555-1212' WHERE name = 'Alice';

INSERT INTO phonebook(name, phonenumber) VALUES ('Alice','704-555-1212') WHERE
NOT EXISTS (SELECT 1 FROM phonebook WHERE name = 'Alice');

~~~
anarazel
Except that that's not concurrency safe in a transaction, because you might
not yet see the row inside the NOT EXISTS() subquery if it hasn't yet
committed.

~~~
jessaustin
The update isn't inserting anything. Either the row exists already, or it
doesn't. Maybe I'm missing something?

~~~
aseipp
I believe the issue is a concurrent transaction (i.e. a separate one from the
one you're currently in) may perform the INSERT and COMMIT that insert _after_
your transaction's SELECT subquery returns nothing (thus allowing INSERT to
proceed, by the WHERE clause), but before the actual INSERT itself is
committed.

This means there is a window between the sub-SELECT and the INSERT itself
where a secondary transaction can insert something, causing both your original
update to do nothing, _and_ the insert to fail. Thus you've lost the write
entirely.

~~~
jessaustin
In that case the other effort (I'm reluctant to call it a _transaction_ )
wins. Maybe it was later and should win, maybe it was earlier and shouldn't.
It takes a lot more care than seen here to preserve a meaning of "earlier" and
"later" that isn't vulnerable to this issue. The way the schema is designed,
there can be only one "Alice". I'm trying and failing to imagine a way in
which both this effort and the other effort can fail.

~~~
aseipp
I think the particular point made by Andres (and elsewhere in this thread) is
really not about what the schema for a phone book is like, and more that
"Accurately replicating what UPSERT does without it is exceedingly hard and
you will probably get it wrong or suboptimal in a number of scenarios you
can't possibly imagine until they happen".

Isolation levels will for example have impact on how transactions proceed and
what results you see (REPEATABLE READ vs READ COMMITTED); the error handling
(did your transaction _actually_ fail from a concurrent INSERT succeeding, or
did the update fail, or did something else happen entirely? UPSERT is often
not a standalone operation in a transaction) must be carefully decided for
each use-case or else you're likely to redo needless transactions or throw
away good ones, and you have to work around incidental problems like latency
using stored procedures to keep everything server-side, etc etc.

It's just not the kind of thing you want to replicate a number of places at
_each use site_ in a number of tools. It's something only the database knows
how to manage properly by having "global knowledge".

~~~
endymi0n
Exactly this. It‘s so hard that it took PostgreSQL almost two decades to
finally implement that feature and they had to support a lot of internal
features first, like speculative insertion. If you want to do UPSERT 100%
correctly and ACID, the list of edge cases just grows and grows.

Here‘s a great article describing these problems in detail if you‘re
interested: [https://www.depesz.com/2012/06/10/why-is-upsert-so-
complicat...](https://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/)

------
cyberferret
It's been a while since I've used SQLite (over a decade ago), but didn't they
already have this sort of functionality in there with an 'INSERT OR UPDATE'
modification to the standard SQL INSERT command?

In fact, it always made me wonder why other systems (at the time) didn't
support this critical and useful feature? I was getting tired of writing a
SELECT, comparing the returned row against the one I was about to update, then
branching to an UPDATE or INSERT command depending. Doing it all in one
command was a breeze.

~~~
anorwell
If I understand, the difference is that this allows insertion of only some
columns, including the primary key. If there is an existing row, the union of
all columns will be the result. On the other hand, `INSERT OR REPLACE`
completely replaces the existing row, if one exists.

~~~
cyberferret
Thanks for the quick and informative explanation. I'll admit that I didn't
have time to read the entire article right now, but I will go back and look at
it in more detail.

------
coleifer
Just a note: this feature is scheduled to be included in the next official
release of SQLite (3.24.0). Additionally, for some time SQLite has supported
INSERT OR REPLACE which offers a simpler, all-or-nothing upsert query.

------
Normal_gaussian
The diagram used on this page [0], and the ones on the JSON site [1] are
pretty awesome diagrams. What are they called? Is there a good programmatic
tool to generate them?

[0] - [https://www.sqlite.org/draft/images/syntax/upsert-
clause.gif](https://www.sqlite.org/draft/images/syntax/upsert-clause.gif)

[1] - [https://www.json.org/](https://www.json.org/)

~~~
wyldfire
Often called "railroad diagram" [1] [2]

[1]
[https://en.wikipedia.org/wiki/Syntax_diagram](https://en.wikipedia.org/wiki/Syntax_diagram)

[2]
[https://stackoverflow.com/a/7143467/489590](https://stackoverflow.com/a/7143467/489590)

~~~
easytiger
I thought these are simple EBNF diagrams. As taught in British schools to 16
year olds

------
twodave
At an old job we had a mammoth awful database running on MSSQLSERVER and we at
one point ran into concurrency using the MERGE statement in a high-traffic
area of the database. It turns out MERGE isn’t transactional so in between the
check to see if a row exists and the insert, a concurrent call would sometimes
create the row and we would get a constraint violation (it was a unique index
basically). Tracking down the source of the error was only the first of many
horrors on the way to fixing the problem for good.

Fun times.

~~~
sixdimensional
Did you put the merge in a transaction to resolve the issue?

------
Grollicus
I hope they build this in a way that returns the correct changed row count.
Currently upserts in sqlite are a bit complicated because there is no feedback

~~~
Apocryphon
Agreed that it should return the updated count automatically, but it's
possible to get that amount by pairing it with a query for the rowid of that
new record:

[https://www.sqlite.org/c3ref/last_insert_rowid.html](https://www.sqlite.org/c3ref/last_insert_rowid.html)

And if it's a table with autoincrement primary key, try:

[https://stackoverflow.com/a/2217015/223327](https://stackoverflow.com/a/2217015/223327)

------
anonu
Cool. Just as a side note - I was first introduced to the upsert concept in
kdb. Similar behaviour to the SQLite one..

------
binarymax
I'm torn on whether always forcing an explicit SET for ON CONFLICT UPDATE is
good practice. It would make for much cleaner code to just update every column
with the supplied insert values (as an option). Or maybe I'm just lazy (heck
that's why I like UPSERTs in the first place!)

~~~
konsumer
INSERT OR REPLACE does this. [http://www.sqlitetutorial.net/sqlite-replace-
statement/](http://www.sqlitetutorial.net/sqlite-replace-statement/)

------
dzonga
I actually started using Sqlite as my go to SQL db. Logic being, the products
I make will never have more than a thousand people that pay me $1/day. For
that reason, Sqlite suits me well and will serve my users too reliably.

~~~
tomkinson
At what level would you say SQLite doesn't scale?

~~~
ioddly
I'd say it depends more on what you're doing -- if you're not doing a whole
lot of writes, you could get really far. See the "websites" section of the
sqlite When to Use article:
[https://www.sqlite.org/whentouse.html](https://www.sqlite.org/whentouse.html)

------
zenmollusc
Excellent news that they are finally including this. I was using SQLite3
heavily a few years ago, and one of the only complaints I had was the lack of
this feature.

------
jcrites
For those who may be interested: I was curious about the difference between
MERGE [1] and UPSERT, and why database engines seem to support a custom
(nonstandard) UPSERT instead of the MERGE introduced in standard SQL in 2003,
so I did some web research and came across a reasonable-sounding answer in a
comment on the following article.

[https://www.depesz.com/2012/06/10/why-is-upsert-so-
complicat...](https://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/)

> As noted, MERGE is complicated and slow. If two different database servers
> implement it, then it is going to be implemented with different schematics
> [sic]. Oracle and MySQL already have different implementation’s, and to
> nobody’s surprise, MySQL’s implementation allows for risky behavior that
> does not protect the data. PostgreSQL users expect that any data put into
> the database will reliably come out of the database. So MERGE is slow and
> inconsistently implemented. The correct thing to do is make the application
> aware of if a record is new or existing. Any application making use of MERGE
> should open a bug to replace it with more predictable and faster logic using
> INSERT and UPDATE.

> If an application does use MERGE, it has to account for implementation
> specific behavior and not will not be portable, in a safe way, to other
> database servers. So it is not suitable for ORM’s or database independent
> applications.

> So what is a legitimate use case for MERGE where not knowing if a record is
> new or existing is not possible prior to the transaction? They only thing
> that I can think of is one-off scripts that do not have any concurrency. But
> if you design for that, then some misguided ORM for web applications is
> going to use your MERGE function and users are going to wonder about what
> happened to their data when concurrency concerns were ignored or not
> understood.

(I believe this comment is by Michael McLaughlin, the author of books on
Oracle PL/SQL programming.)

Does anyone happen to know if this is still accurate today, or whether there's
been any attempt to incorporate UPSERT support into standard SQL? I could
imagine the SQL standard maintainers not wanting to offer UPSERT if MERGE is
already a superset of its functionality; but a case could be made that, since
MERGE's complexity has apparently discouraged adoption by implementations and
users, it may be sensible to support UPSERT nonetheless. Is this a reasonable
way to think about it?

I don't mean to imply that standardizing UPSERT into SQL is an especially
important concern for anyone; this is merely a trail of thought that
originated from wondering about why the UPSERT statement is nonstandard.

[1]
[https://en.wikipedia.org/wiki/Merge_(SQL)](https://en.wikipedia.org/wiki/Merge_\(SQL\))

~~~
viraptor
> So what is a legitimate use case for MERGE where not knowing if a record is
> new or existing is not possible prior to the transaction?

I think he concentrates on "possible" where the interesting use case is
"convenient". The are many cases where you want to go from at-least-once
execution of idempotent tasks to a consistent view of data. In those cases
it's irrelevant if the data existed before or not - you're likely doing an
upsert with exactly the same data under exactly the same key and care only the
resulting log of events.

------
z92
MySQL named this action as "REPLACE" instead of UPSERT which isn't an english
word.

~~~
willvarfar
MySQL has REPLACE INTO. SQLite has REPLACE INTO too.

MySQL has INSERT INTO ... ON DUPLICATE KEY UPDATE. This is the functionality
that SQLite now has.

The difference between REPLACE and ON DUPLICATE KEY UPDATE is subtle but
useful.

------
bmpafa
I'd love it the SQLite page wrapped content in a div of a reasonable max-
width. Text running edge-to-edge is hard to read.

Anyone know if the site is hosted on GitHub?

~~~
lmm
I've never understood this position - if you want sites to be no wider than a
particular width, can't you just make your browser that width?

~~~
Sean1708
I usually have things set up such that my browser only takes up half of one of
my monitors and the amount of websites that don't handle this properly is
ridiculous, even Hacker News has a scroll bar at the bottom because the bar at
the top is too wide.

That was somewhat tangential, sorry, but it's been really bugging me recently
and I wanted to vent.

~~~
bmpafa
I do a lot of front end work and this is a pet peeve of mine. And I _still_
screw it up sometimes. For whatever reason, layout is still really hard to get
bug-free.

CSS Grid will help a lot as it gains adoption, but if FlexBox's adoption is
any indication, that won't be for some time.

