
Scaling Postgres with Read Replicas and Using WAL to Counter Stale Reads - craigkerstiens
https://brandur.org/postgres-reads
======
MartinMond
Instead of having an "observer" process that updates a table with the LSN you
can just ask Postgres directly: The `pg_stat_replication` view has the last
LSN replayed on each replica available:
[https://www.postgresql.org/docs/10/static/monitoring-stats.h...](https://www.postgresql.org/docs/10/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW)
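
For reference, a query against that view looks roughly like this (column names as of Postgres 10; before 10 the column was `replay_location`):

```sql
-- Run on the primary: last WAL position replayed by each connected standby.
SELECT application_name, client_addr, replay_lsn
FROM pg_stat_replication;
```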

Also, instead of updating the `users` table with the LSN of the commit - which
creates extra write load - why not store it in the session cookie, then you
can route based on that.
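
A rough sketch of the cookie idea (Python; the helper and all names are mine, not from the article): parse the textual `pg_lsn` values into 64-bit integers, then route the read to any replica whose replay position has passed the LSN stored in the cookie, falling back to the primary.

```python
# Sketch: route a read based on an LSN carried in the session cookie.
# pg_lsn is rendered as two hex numbers "hi/lo"; convert to a 64-bit
# integer so positions can be compared in application code.

def lsn_to_int(lsn: str) -> int:
    """Convert a textual pg_lsn like '16/B374D848' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def pick_backend(cookie_lsn: str, replica_replay_lsns: dict[str, str],
                 primary: str = "primary") -> str:
    """Return a replica that has replayed past the cookie's LSN, else the primary."""
    needed = lsn_to_int(cookie_lsn)
    for name, replayed in replica_replay_lsns.items():
        if lsn_to_int(replayed) >= needed:
            return name
    return primary

backends = {"replica-a": "0/5000", "replica-b": "0/9000"}
print(pick_backend("0/8000", backends))  # replica-b
```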

Another option is to enable synchronous replication for transactions that need
to be visible on all replicas:
[https://www.postgresql.org/docs/10/static/warm-standby.html#...](https://www.postgresql.org/docs/10/static/warm-standby.html#SYNCHRONOUS-REPLICATION)

Since this can be enabled/disabled for each transaction it's really powerful.
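
For example (illustrative table and values are mine), a transaction can opt in with `SET LOCAL`, which affects only the current transaction; `remote_apply` waits until the standbys have applied the WAL, so the change is visible to reads there. This assumes `synchronous_standby_names` is configured.

```sql
BEGIN;
-- Only this transaction waits for the synchronous standbys to apply the WAL.
SET LOCAL synchronous_commit = remote_apply;
UPDATE users SET email = 'new@example.com' WHERE id = 42;
COMMIT;
```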

~~~
anarazel
> Instead of having an "observer" process that updates a table with the LSN
> you can just ask Postgres directly: The `pg_stat_replication` view has the
> last LSN replayed on each replica available:

Note that that'll lag behind reality a bit; we don't continually send the
feedback messages that contain the replay progress.

> Another option is to enable synchronous replication for transactions that
> need to be visible on all replicas:
> [https://www.postgresql.org/docs/10/static/warm-standby.html#...](https://www.postgresql.org/docs/10/static/warm-standby.html#..).

Note that's only behaving correctly if synchronous_commit is set to
remote_apply, which was added in 9.6. Before that, syncrep could only
guarantee that the remote side had safely received the necessary WAL, not that
it had been applied.

~~~
MartinMond
Thanks!

What's your recommendation for solving the problem described in the article?
Would you compare LSNs of replicas in application logic at all?

~~~
anarazel
I've only skimmed it so far, so I might be missing parts.

I think it depends a lot on what you're trying to achieve. In a lot of cases
the only guarantee you need is read-your-own-writes, and small delays for an
individual connection aren't that bad. In that case it can be a reasonable
approach to look up the LSN of the commit you just made and have your reads,
whichever replica they go to, wait for that LSN to be applied.
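
That wait can be sketched like this (Python; names are mine). On a real replica the current position would come from `SELECT pg_last_wal_replay_lsn()`; here the probe is injected as a callable so the waiting logic stands alone.

```python
# Sketch: block until a replica's replay position reaches a target LSN.
import time

def lsn_to_int(lsn: str) -> int:
    """Convert a textual pg_lsn like '16/B374D848' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wait_for_lsn(get_replay_lsn, target_lsn: str,
                 timeout: float = 1.0, poll: float = 0.01) -> bool:
    """Poll the replica's replay position until it reaches target_lsn."""
    needed = lsn_to_int(target_lsn)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if lsn_to_int(get_replay_lsn()) >= needed:
            return True
        time.sleep(poll)
    return False

# Simulated replica that catches up after a few polls.
positions = iter(["0/1000", "0/2000", "0/3000"])
print(wait_for_lsn(lambda: next(positions), "0/3000"))  # True
```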

For a system that's very latency sensitive that wouldn't work, and you'd fall
back to querying the master pretty much immediately. Or you'd use something
like what's described in the article, although I'd probably implement it
somewhat differently.

------
jedberg
Pretty clever take on an old technique.

At reddit we would store a field on the user recording the last time they
voted on something. If they loaded a page that was newer than that timestamp,
then we could skip reading any votes for that page since we knew they couldn't
possibly have voted on anything on that page.

This basically takes the same technique and puts it a level down and reverses
it, saying "this database has any update this user could have made".

I like it.
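
The check itself is tiny; a sketch under the obvious assumption that both values are comparable timestamps (names are mine):

```python
# If everything on the page is newer than the user's last vote, the user
# cannot have voted on anything shown, so skip the vote lookup entirely.

def can_skip_vote_lookup(oldest_item_ts: float, last_vote_ts: float) -> bool:
    """True if no item on the page existed when the user last voted."""
    return oldest_item_ts > last_vote_ts

print(can_skip_vote_lookup(oldest_item_ts=1000.0, last_vote_ts=900.0))  # True
```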

------
dustingetz
If I understand this right, they can guarantee fresh point reads. But an
actual query needs a consistent snapshot of all entities (rows) under
consideration for the query. ... for all of the "hundreds of queries a mature
application might execute to fulfill a single request". You need a snapshot of
the entire database (like git).

FTA "When mutating a resource in the system we’ll store the last committed
log-sequence-number for the entity making the request"

This is what Datomic does by design, if you need infinity read scaling and
perfect consistency just use that, and you get a host of other benefits (the
replica is _inside your application process_ , like a data structure, which
has profound architectural consequences, for example you can query in a loop
so N+1 problem is solved).

------
foota
Wouldn't this only work when your client is the one that last updated the
record? Because how else would you know the correct min lsn? Seems like you
would need to store the map from user to lsn in something other than pg to get
any benefit here.

~~~
brandur
(I'm the author.)

I'm storing `min_lsn` directly to the database, so anything connecting to it
and loading a user record should be able to take advantage of the information.

I have a note in there about this as well, but although I'm showing how this
information can be persisted to Postgres so that I'm only using one data
store, it could really live in any kind of shared persistence layer.

For example, replication statuses and a user's `min_lsn` might be put in Redis
instead, and be treated as ephemeral data such that we don't worry about
losing them because they can be repopulated on every restart. It's also
probably a good idea to have individual API workers caching replication
statuses directly in memory so they can select a read candidate without even
having to go to a database.

~~~
foota
Thanks for writing the article, it was interesting. To check whether I'm
missing something, though: for this to offer a benefit as it stands, I would
need to already have a correct min_lsn for a user? But if I got this from a
previous request it could be out of date, isn't that accurate?

~~~
asdfaoeu
I assume they are fetching the min_lsn for a user from the master so it would
be up to date.

~~~
foota
Doesn't that defeat the point of moving reads off the master? I guess they're
lower-cost reads, though...

------
mbell
I'm having a bit of a hard time figuring out who would use this. How big is
the intersection of folks that can't figure out how to set up Citus but need a
number of read replicas?

~~~
Manozco
I cannot go into much detail about what I do, but basically we have some kind
of "network operators" that are the only ones using the write DB (and not a
lot), and then we have a massive number of computers doing reads on the
system. We don't need Citus for that; PG replicas do the job very well.

