This post reads like it's vaguely implementing what should be in a queue backend. There's a locked_at field in the schema; shouldn't this be done via SELECT ... FOR UPDATE instead?
Of note: we're actually using the UUID datatype, not just generating a UUID and throwing it into a text field.
I was thinking that a UUID datatype implemented as a series of ints could have fairly good join performance, since you can effectively treat it as a series of smaller int indices that you join across, and I imagine that's been a well-understood and well-optimized problem for years now. A text UUID field, though, ugh: that just seems wasteful even before you get to optimization techniques.
For comparison in storage:

  select pg_column_size('DDB5E9ED-60B6-4405-A6A8-18E339C7B172'::uuid) AS uuid_bytes,
         pg_column_size('DDB5E9ED-60B6-4405-A6A8-18E339C7B172'::text) AS text_bytes;

   uuid_bytes | text_bytes
  ------------+------------
           16 |         40
If you're using the UUID type to encode information that you may want to access subfields of, you could create function indexes to do so. Otherwise, I think indexing on the native UUID type is going to be better than a collection of narrower fields and indexes.
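Those subfields are also easy to get at from application code, which is what you'd mirror in a functional index on the database side. A quick illustration with Python's uuid module (the value is the one from the storage comparison above; this is my example, not from the article):

```python
import uuid

# Parse the UUID used in the pg_column_size comparison.
u = uuid.UUID('DDB5E9ED-60B6-4405-A6A8-18E339C7B172')

# The version lives in the top bits of the third group ("4405"),
# so this is a version-4 (random) UUID.
print(u.version)           # 4

# The raw 16-bit field that contains the version bits; a functional
# index could expose a field like this for querying.
print(hex(u.time_hi_version))
```
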
Edit: Spelled 'craigkerstiens correctly.
I was speaking to the idea that some people throw UUIDs into text fields, as implied by the parent's clarification that they were not doing that. For example, in years past, when that type may not have existed yet. People have been saying "just use a UUID for the primary key" for at least a decade now.
> As 'craigkerstiens points out, the index performance of the native UUID datatype is very good: it's not a string comparison.
I would assume so! Any database that implemented an actual 128-bit numeric type as a text field (no matter that it's converted to hexadecimal notation for display; ints aren't stored in decimal either) would deserve quite a bit of ridicule, in my opinion.
> If you're using the UUID type to encode information that you may want to access subfields of
And that answers a question I raised somewhat implicitly in my other comment, which is: why use a UUID in the first place? Because you can encode specific subtype information into the parts. Thanks!
Edit: Forgot some words in my paragraph that mentioned ridicule, so I added them to make sure it was obvious what and why I thought they would deserve some ridicule (which is actually a bit stronger than how I generally like to come across).
If we're storing UUIDs in canonical form, that's 36 bytes of information (32 hex characters plus 4 separators; and if you're stripping separators, why not convert to a numeric form anyway...). Even if we assume we're only storing 32 characters, that's 256 bits, so we're doubling the storage and index memory requirements versus the 128-bit native type. Assuming you're using UUIDs because you have a lot of records and want to keep them unique, that space presumably matters (especially since the key gets stored in multiple places once there are relations). If you don't have a huge number of records, a sufficiently large auto-incrementing int field that increments by a random amount between 1 and 255, plus a randomized bit shuffle codified into a reversible function (extend the int to 128 bits, then use a previously decided bit shuffle to rearrange the bits; fully reversible), would solve that need and make life easier for everyone who has to deal with the DB. Just convert the ints to and from UUID representations between your view and controller.
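To make that shuffle idea concrete, here's a rough Python sketch. The permutation (and its seed) is purely illustrative; a real deployment would fix one permutation at design time and never change it:

```python
import random
import uuid

# Fix a permutation of the 128 bit positions once, at design time.
rng = random.Random(42)       # seed chosen for this example only
PERM = list(range(128))
rng.shuffle(PERM)

# Precompute the inverse permutation so the mapping is reversible.
INV = [0] * 128
for dst, src in enumerate(PERM):
    INV[src] = dst

def shuffle_bits(n: int, perm) -> int:
    # Output bit `dst` is taken from input bit `perm[dst]`.
    out = 0
    for dst, src in enumerate(perm):
        out |= ((n >> src) & 1) << dst
    return out

def id_to_uuid(seq_id: int) -> uuid.UUID:
    # Extend the auto-increment int to 128 bits, then shuffle.
    return uuid.UUID(int=shuffle_bits(seq_id, PERM))

def uuid_to_id(u: uuid.UUID) -> int:
    # Fully reversible: apply the inverse permutation.
    return shuffle_bits(u.int, INV)

assert uuid_to_id(id_to_uuid(123456789)) == 123456789
```

Note the output of such a scheme won't carry valid UUID version/variant bits unless you reserve them, which is one of the wrinkles the commenter alludes to.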
I've never had to actually deal with UUIDs at the DB level, so there could be some major downsides to these ideas that I'm not thinking of. At this point, I'm just not sure what storing a UUID in a database gets you.
We do plan to eventually convert the text versions into real UUIDs, but our performance/size isn't an issue at the moment and it's a lot of risky engineering work to do.
That's a good point, and makes sense. Also, as gzrm noted in another comment, you can make use of the different parts of the UUID to encode specific information, which may be helpful (but at the loss of uniqueness bits).
Edit: Update to correct snowflake to Instagram's strategy. I make this mistake often enough that it's now self-reinforcing :/
I'll wait a bit to see if Brandur (the author) chimes in first; otherwise I'm happy to pile in some thoughts.
Yeah, I'm relying on transactional semantics to guarantee no contention during the lock, but the part that I'd intended as important here is really just the idea of locking the key — not the particular implementation. I think alternatively using a `FOR UPDATE` is totally plausible and likely even to be more performant than what I've written here.
Try to imagine this as more of an "ideas guide" than a prescription of a precise implementation.
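For anyone who wants to play with the lock-the-key idea without a Postgres instance, here's a toy Python/SQLite sketch (BEGIN IMMEDIATE standing in for a Postgres row lock such as FOR UPDATE; the table and column names are made up for illustration, not taken from the article):

```python
import sqlite3

db = sqlite3.connect(':memory:', isolation_level=None)  # manual transactions
db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, result TEXT)")

def run_once(key: str, operation) -> str:
    # Hold a write lock for the duration of the check-and-execute,
    # so a concurrent retry can't run the operation a second time.
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT result FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]          # a retry: return the stored result
        result = operation()
        db.execute(
            "INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
            (key, result),
        )
        return result
    finally:
        db.execute("COMMIT")

calls = []
def book_ride():
    calls.append(1)
    return "ride-42"

assert run_once("req-1", book_ride) == "ride-42"
assert run_once("req-1", book_ride) == "ride-42"  # retried, not re-executed
assert len(calls) == 1
```

This only sketches the happy path; error handling (rolling back on a failed operation) is deliberately omitted.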
I started wondering at the beginning of the article how many people were going to read it, try to implement it in mongo, and faceplant.
The only thing that strikes me as odd about the hypothetical example is that I can’t imagine a universe where you would want a request for a ride to complete 72 hours after it initially failed.
I get the point of keeping the information around for 3 days in case of errors. And perhaps my imagination is lacking, but most of the cases where I would want to implement this model are time-sensitive. I need a ride now. I need a message guaranteed to deliver exactly once.
Actually, the message queue aspect is how I think of this in general. It’s an application/database layer that takes an at least once mq system and forces it to be at most once. The combination then seems to imply exactly once.
Actually, after writing that out, I can now think of a really great use case for this that isn't time sensitive: ETL systems.
Let’s say you are streaming live updates from some third party system to some staging area and triggering processes from there through a broker to transform and load data into a warehouse.
Your team pushes an update that breaks something on Friday night. Nobody works that much on the weekend, so no one cares. The dev lead gets notifications, but gets to take it easy until Monday morning because they (trigger warning: intentional singular they) know that everything can be replayed and resolved after the bug is fixed. Even in the case of frequently changing API data, you have the entire mutation history stored in your table and can accurately replay it.
I can get on board with this.
And yeah. Sorry for thinking out loud. Now I'm thinking of all kinds of places in my web apps where this can be applied, as the article suggests. I think I need to write a Python wrapper around this concept and make it dead simple to include in projects.
Thank you again to the author for this very clear and excellent article.
Can this be used with MemSQL's binary type, and will I get the same benefits?
It looks like an atomic check-and-update, but more sophisticated.
In case of network partition where the client doesn't receive a reply, it can safely and simply retry the operation on the promise, which returns the stored result.
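A minimal in-memory sketch of those semantics, ignoring durability and concurrency (names are illustrative only):

```python
results = {}  # stands in for a durable store keyed by idempotency key

def settle(key, operation):
    """Run operation at most once per key; retries get the stored result."""
    if key not in results:
        results[key] = operation()
    return results[key]

# A client that never saw the first reply can safely retry:
first = settle("op-7", lambda: "created")
retry = settle("op-7", lambda: "created")
assert first == retry == "created"
```
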
The E language popularized futures/promises, and it's still the gold standard.
The concept of idempotence in mathematics predates futures/promises by about a century and is part of the general background understood by most people exposed to any CS concepts. It has been prevalent across a wide range of communities discussing networking protocol and message-passing designs since the inception of the Internet. It appears in countless RFCs and other standards documents which are read by people building web-based systems.
The concept of futures/promises you are promoting is honestly a niche part of computer science from the 1970s which persists in an intersection of distributed systems research and programming language research. It was never the prevalent formalism shared by general CS or network engineering communities. Meanwhile, a different aspect of the same 1970s concept has had a rebirth as an asynchronous programming syntax and appears in widely consumed languages. This different camp of PL researchers has essentially won the popularity contest which defines the new "standard" meaning of futures and promises for the general technical audience.
It's stupid and unnecessarily obfuscated. This abstraction has a well-known technical name, so use it, don't invent a new term.
And I frankly find it bizarre that you call futures/promises "niche" when they're now part of every web browser in the world and millions of programmers use them every day. The fact that these futures aren't durable isn't relevant since the browser environment itself is transient.
There is a vast literature using many different forms of explicit identifier tied to some form of idempotence. Do you run around the IETF and W3C telling them to stop saying "Message-ID" when they really mean futures/promises? Or security researchers with their nonces? How about those pesky IPv4 and IPv6 designers with their silly ports, SYN cookies, and sequence numbers?
Oh, was I asleep when Petri nets and denotational semantics became tools used by millions of programmers around the world?
> Those futures/promises in every web browser in the world are a purely local asynchronous programming idiom, or what I like to think of as CPS unconversion. I didn't suggest that browser programming is a niche concept, but it also has nothing to do with organizing idempotent remote state changes over a lossy message channel.
Of course it does, and I already made this point so I don't know why you're ignoring it. Add durability to browser promises, and there you go. The durability is irrelevant to the semantics.
> Do you run around the IETF and W3C telling them to stop saying "Message-ID" when they really mean futures/promises? Or security researchers with their nonces?
Nonces and Message-IDs have different semantics than promises. It's like you're not even listening.
Idempotency keys have the same semantics as promises. Ergo, they are promises. Promises are idempotent, but not every idempotent abstraction is a promise. It's simple; I'm not sure where the confusion lies.
For more complete examples, consider SMTP, NNTP, and their UUCP predecessors, which provided idempotent message transfer and eventually-consistent replication using client-generated, globally distinct IDs. I absolutely fail to see how these are different from a tweet ID or any similar solution for the web. It is such standard practice, we take it for granted like the use of 2's complement arithmetic.
The concept's reappearance in Twitter, Instagram, countless bespoke CRUD applications, or the PostgreSQL example in this article are not "actually promises" just because they're on the web. They are just the umpteenth instance of source-labeled resources being tracked in a distributed system. That is the web abstraction: a resource is created and we do not care whether it is an event, object, message, computational job, bank transaction, or packet of curry being delivered to my doorstep next Tuesday.
Message formats have nothing to do with programming languages. This is how we understand Internet architecture. The local programming language and programming paradigm of each node in the distributed system is immaterial to the messaging semantics and interop standards and their concepts.
But I always thought an idempotency key was just sending in a time stamp or something else unique.
And you'd be better served learning about the more general abstraction rather than learning a single domain-specific trick. This solution is obvious once you know about promises.