How to Get or Create in PostgreSQL

rst · on Aug 5, 2024

Skip down to the bottom for a simple approach that just works, in sufficiently recent PG releases (MERGE). Back up just a bit for something that's nearly as good, and works with everything since Postgresql 10 (INSERT ... ON CONFLICT). Most of the post describes more complicated approaches that have subtle race conditions, and are worth looking at mainly to understand what the problems are.

geekodour · on Aug 5, 2024

at one point i was using upserts for everything but then had a case where it was impossible to get it working with generated id.

wrote about it very roughly here https://blog.geekodour.org/posts/upserts/

can3p · on Aug 5, 2024

Small nitpick - the author mentions upsert in the beginning of the article only to "forget" about it and use it in the end (insert .. on conflict ignore) and that's the obvious solution there.

The final query is very neat though! And special thanks for mentioning "MERGE ... RETURNING" in PG17, that's really cool

atemerev · on Aug 5, 2024

I wonder why this really common use case requires so much esoteric SQL/Postgres knowledge. While everything is well explained, the solution is not simple, by all means. I don’t like complex solutions to common problems.

robertlagrant · on Aug 5, 2024

SQL is partitioned along the major axes of GET, INSERT, UPDATE, DELETE. Combining any of them just isn't so natural for SQL. Upsert is another one.

da_chicken · on Aug 5, 2024

It's not that difficult to do in the general case. Usually, you're just doing it through the program.

- Open connection

- Begin transaction

- Do INSERT

- If you get a PK violation, do an UPDATE instead

- Commit

The thing is, while it's a common pattern, it also has many equally common similar patterns. Sometimes you want to try the UPDATE first. Sometimes you want the program to throw an exception if the INSERT fails. Sometimes you're doing something with row versioning or pessimistic locking.

It's same reason why your file IO library might have a ReadAllText() method, but only after it has Open() and Read() and ReadLine().

zokier · on Aug 5, 2024

- Begin transaction

- Do INSERT

- If you get a PK violation, do an UPDATE instead

- Commit

This is not safe in the default transaction mode, other transactions can delete the row between insert and update/select.

Also, as explained by the article, this has the table bloating problem

da_chicken · on Aug 5, 2024

If that's not safe, then Postgres has not implemented transactions correctly. That seems unlikely, so my guess is that what you think I'm suggesting is not what I'm actually suggesting.

zokier · on Aug 6, 2024

there is nothing wrong postgres transactions, they work exactly as intended, but they aren't magic. this is pretty easy to test:

Session 1:

    psql (16.3 (Debian 16.3-1.pgdg120+1))
    Type "help" for help.

    postgres=# CREATE TABLE tags (
      id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
      name VARCHAR(50) NOT NULL
    );
    CREATE TABLE
    postgres=# ALTER TABLE tags ADD CONSTRAINT tags_name_unique UNIQUE(name);
    ALTER TABLE
    postgres=# INSERT INTO tags (name) VALUES ('A'), ('B') RETURNING *;
     id | name
    ----+------
      1 | A
      2 | B
    (2 rows)

    INSERT 0 2
    postgres=# BEGIN;
    BEGIN
    postgres=*# INSERT INTO tags (name) VALUES ('B') ON CONFLICT DO NOTHING RETURNING *;
     id | name
    ----+------
    (0 rows)

    INSERT 0 0

Session 2:

    psql (16.3 (Debian 16.3-1.pgdg120+1))
    Type "help" for help.

    postgres=# DELETE FROM tags WHERE name = 'B';
    DELETE 1

Session 1 continues:

    postgres=*# SELECT * FROM tags WHERE name = 'B';
     id | name
    ----+------
    (0 rows)

    postgres=*# END;
    COMMIT

(yes, I added `ON CONFLICT DO NOTHING` to avoid aborting the transaction prematurely)

da_chicken · on Aug 6, 2024

I did some more research. Postgres doesn't handle transactions correctly. Once an error on a constraint occurs, subsequent commands are ignored. You can't handle errors in a transaction. Most RDBMSs don't require a rollback in that situation. It's weird.

anarazel · on Aug 6, 2024

You can, you just need to use savepoints.

zokier · on Aug 6, 2024

Yeah the error handling is a side-note here. The main thing to understand is that postgresql transactions have "read committed" isolation level by default, which means that reads will see the writes from any committed other transactions, which means that there can be arbitrary changes to data between operations, e.g. between insert and select, even if no errors or conflicts are involved. This is pretty much what the sample I posted in parent comment demonstrates.

BeefySwain · on Aug 5, 2024

What transaction mode would you need to make this safe? Can you explain further?

VWWHFSfQ · on Aug 5, 2024

I believe PG will abort the transaction on an exception (PK violation) and the subsequent update will not run in the same context that had original violation. So it could result in a data race. I don't know what isolation level would fix that, if any.

My understanding is that in general, if you hit an exception in postgres at all then you can't trust the isolation of the current transaction anymore.

That's what MERGE and speculative insertion (on conflict do update) addresses.

zokier · on Aug 6, 2024

well, yes, you are right that exception does abort the transaction. but even if you use something like 'on conflict do nothing' to avoid the exception, you still can get problems with concurrent writes (see my sibling comment https://news.ycombinator.com/item?id=41169638)

REPEATABLE READ or SERIALIZABLE isolation levels would help with that, as the name suggest repeatable read ensures that the read made by inserts constraint check is repeatable in successive select (or update) statements.

https://www.postgresql.org/docs/current/transaction-iso.html is pretty comprehensive.

da_chicken · on Aug 6, 2024

Yeah this is correct. I didn't realize it.

Most RDBMSs don't behave this way. They allow you to correct exceptions that occur for anything less than a deadlock without a total rollback, but not Postgres.

eknkc · on Aug 5, 2024

Oh I've been doing the `on conflict do update set id = EXCLUDED.id` thing for ages now. Thought the update would be no-op because nothing changed. Interesting.

arkh · on Aug 5, 2024

> Thought the update would be no-op because nothing changed.

You can use a specific trigger function just to prevent this: suppress_redundant_updates_trigger

https://www.postgresql.org/docs/current/functions-trigger.ht...

> The suppress_redundant_updates_trigger function, when applied as a row-level BEFORE UPDATE trigger, will prevent any update that does not actually change the data in the row from taking place. This overrides the normal behavior which always performs a physical row update regardless of whether or not the data has changed.

agwa · on Aug 5, 2024

I just tested that and the trigger causes the INSERT to return no rows if the row already exists.

It basically turns the ON CONFLICT DO UPDATE into ON CONFLICT DO NOTHING.

teget · on Aug 5, 2024

If you update a column that is not indexed you might get away with it without stressing the cleanup processes too much (vacuum and friends):

https://www.postgresql.org/docs/current/storage-hot.html

jillesvangurp · on Aug 5, 2024

This is super informative, thank you. I did a small OSS project around postgresql a while ago to implement a simple document store with Kotlin and a low level co-routine friendly postgresql driver called jasync-postgresql that doesn't do blocking IO. I ended up using a few of the other solutions mentinoed in the article. But based on this, I might want to upgrade it to use MERGE.

My intention with this project is to use it for storage and transactions. Most of our query heavy lifting is done via external search engines like Elasticsearch. And my attitude with databases is that if you are not querying on it, it doesn't need separate tables, columns, and indices. The point with a document store is that you only use limited querying strategies. I've used documents stores for various projects for over ten years and that style of persistence just fits a lot of things I do.

I'm currently planning to roll this out for one of our products. So, this is not just a hobby project. If somebody has time; I wouldn't mind some eyeballs on this project: https://github.com/formation-res/pg-docstore

shiandow · on Aug 5, 2024

Interesting and very educational article.

I do wonder why in the first example of inserting multiple tags with get_or_create_tag the id of 'C' suddenly changes to 4. Were some queries perhaps not run in the order they appear?

CRConrad · on Aug 5, 2024

I got the same impresssion, that the demo wasn't implemented as a single "clean" run. Another indication of this is that the next passage, about eliminating "bloat" from failed inserts, shows the table containing two live tuples again, not three or four as it should after the preceding example.

haki · on Aug 5, 2024

Since at least two people noticed, I would just say that the insert-select-insert approach was added later, after I took some inspiration from Django's implementation of get_or_create.