That was my initial thought as well. I'm hoping the original author reads this a...

aleksiy123 · on May 16, 2024

I think because ID is random and not monotonically increasing.

From article one of them is: 'tpomh9yu1tffmdp6dopobwuc9h'

But the query is trying to page through the most recent posts first.

So a newer post doesn't necessarily have a higher ID or lower ID.

The ID is just there to have a stable tie break between the same timestamps.

paholg · on May 16, 2024

One branch of the OR is comparing IDs as though they're monotonically increasing, so either they are or the query is broken.

throwaway28453 · on May 16, 2024

I think it's a reasonable implementation. It is only used to separate rows with the same created timestamp. This query is run some time after the insert has happened, so it is unlikely that there have been more rows with the same timestamp at query time.

If this assumption is correct, then first sorting by the timestamp and then sort the id alphabetically will ensure that the pagination is deterministic.

I guess we could consider some edge cases where we have lots of rows with the same timestamp that is inserted after the ingestion query is run due to latency, but it might be acceptable for this use case.

Delmololo · on May 16, 2024

In this case they should switch to uuidv7 or similar to have this guarantee

aleksiy123 · on May 16, 2024

Switching IDs is going to be a breaking change across their whole product and customer base.

Thats a way bigger undertaking and decision then just optimizing a single query.

Like killing an ant with a Tsar Bomb.

throwaway28453 · on May 16, 2024

One alternative could be to add a separate column that can be used to keep track of insert order. But they would need to consider the costs before they do this, as it could impact the insert performance.

The true "modern" cool kids solution would of course be to create a service that listens to WAL, inserts it into a Kafka cluster that is connected to a pipe for inserting into Elastic search. Much more fun and resumé friendly than optimizing a query. I bet the author of this blog would get a much larger audience.

Bonus points if the ingestion service was written in Rust and run on a serverless platform

Delmololo · on May 16, 2024

Actually it's not a breaking change.

You can switch the uuid generation, wait a little bit and add an.'archive' switch which will use the old and slower query when the date is old.

Should definitely be helpful resource wise for a lot of people

Delmololo · on May 16, 2024

Yes and sooner this gets started the better.

Otherwise it will cost more and more.

Especially for a chat app!