Also, instead of updating the `users` table with the LSN of the commit - which creates extra write load - why not store it in the session cookie, then you can route based on that.
Another option is to enable synchronous replication for transactions that need to be visible on all replicas: https://www.postgresql.org/docs/10/static/warm-standby.html#...
Since this can be enabled/disabled for each transaction it's really powerful.
Note that that'll lag behind reality a bit, we don't continually send the feedback messages that contain the replay progress.
> Another option is to enable synchronous replication for transactions that need to be visible on all replicas: https://www.postgresql.org/docs/10/static/warm-standby.html#....
Note that's only behaving correctly if synchronous_commit is set to remote_apply, which has been added in 9.6. Before that syncrep could only guarantee that the remote side has safely received the necessary WAL, not that it has been applied.
There is also a proposal to add a 'wait for LSN' command, which the author of this article should check out. See https://commitfest.postgresql.org/15/772/ .
The two proposals achieve some form of causal consistency from different ends: one makes writers wait but doesn't require extra communication of LSN inside (and possibly between) client apps, and the other makes readers wait but requires more work from clients. I hope we can get both of these things in (in some form)!
What's your recommendation how to solve the problem described in the article? Would you compare LSNs of replicas in application logic at all?
I think it depends a lot on what you're trying to achieve. In a lot of cases all the guarantee you need is read-your-own-write, and small delays for an individual connection aren't that bad. In that case it can be a reasonable approach to inquire the LSN of the commit you just made, and just have your reads, to whichever replica they go, wait for that LSN to be applied.
For a system that's very latency sensitive that'd not work, and you'd fall back to querying the master pretty much immediately. Or you'd use something like described in the article, although I'd probably implement it somewhat differently.
> Also, instead of updating the `users` table with the LSN of the commit - which creates extra write load - why not store it in the session cookie, then you can route based on that.
The article's a little long, but at one point I do mention that we're putting this information in Postgres in the demo for convenience, but that in a real implementation it might be worthwhile moving it elsewhere as an optimization.
This addresses the first point as well: you're right in that this information can be procured from Postgres, but the point of putting in the observer is to demonstrate a system that could be implemented in a way that's agnostic of storage. It's very plausible that you might want to put `min_lsn` and replication statuses in something very fast like Redis, and even have each `api` worker caching its own version of the latter so that a replica can be selected without even making a network call.
(I've asked Brandur this in the past, so assuming he hasn't changed it this should still be what he uses)
My internal ikiwiki pages are filled with ASCII art diagrams. This tool looks awesome; I’ll have to try it out!
At reddit we would store a field on the user recording the last time they voted on something. If they loaded a page that was newer than that timestamp, then we could skip reading any votes for that page since we knew they couldn't possibly have voted on anything on that page.
This basically takes the same technique and puts it a level down and reverses it, saying "this database has any update this user could have made".
I like it.
FTA "When mutating a resource in the system we’ll store the last committed log-sequence-number for the entity making the request"
This is what Datomic does by design, if you need infinity read scaling and perfect consistency just use that, and you get a host of other benefits (the replica is inside your application process, like a data structure, which has profound architectural consequences, for example you can query in a loop so N+1 problem is solved).
I'm storing `min_lsn` directly to the database, so anything connecting to it and loading a user record should be able to take advantage of the information.
I have a note in there about this as well, but although I'm showing how this information can be persisted to Postgres so that I'm only using one data store, it could really live in any kind of shared persistence layer.
For example, replication statuses and a user's `min_lsn` might be put in Redis instead, and be treated as ephemeral data such that we don't worry about losing them because they can be repopulated on every restart. It's also probably a good idea to have individual API workers caching replication statuses directly in memory so they can select a read candidate without even having to go to a database.
"By routing read operations only to replicas that are caught up enough to run them accurately"
"To save every api process from having to reach out and check on the replication status of every replica for itself, we’ll have a process called an observer that periodically refreshes the state of every replica and stores it to a Postgres table. ... Keep in mind that this status information could really go anywhere. If we have Redis available, we could put it in there for fast access"
"The observer runs in a loop, and executes something like this on every iteration"
Also, "The technique is inspired by GitLab’s article on scaling their database, where they refer to it as 'sticky connections'"