I think the problem is the service that feeds the cache durations is something t...

cestith · on July 28, 2021

If you don't mind serving stale content, you can do exactly the opposite. Instead of a relatively short TTL that gets extended on failure, you can set an infinite TTL and actively eject data that has been updated. In either case, a separate service that checks the query and times out the cache content, one that is part of neither the frontend nor the backend, is another thing that can break. But it can be a fairly simple, isolated component that doesn't fail in lockstep with either the frontend or the backend.
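
To make the two approaches concrete, here is a minimal in-memory sketch. The class names `EjectingCache` and `ExtendOnFailureCache`, the `fetch` callback, and the injectable `clock` are all hypothetical names chosen for illustration; a real deployment would sit in front of whatever cache and backend are actually in use.

```python
import time
from typing import Callable, Dict, Tuple


class EjectingCache:
    """Infinite TTL: entries stay until something explicitly ejects them."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical backend lookup
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> str:
        # Serve the cached value if present, however old it is.
        if key in self._store:
            return self._store[key]
        value = self._fetch(key)
        self._store[key] = value
        return value

    def eject(self, key: str) -> None:
        # Called by the separate invalidation service when data is updated.
        self._store.pop(key, None)


class ExtendOnFailureCache:
    """Short TTL, extended whenever a refresh from the backend fails."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]
        try:
            value = self._fetch(key)
        except Exception:
            if entry is None:
                raise  # nothing stale to fall back on
            # Backend is failing: keep serving the stale value for another TTL.
            self._store[key] = (entry[0], now + self._ttl)
            return entry[0]
        self._store[key] = (value, now + self._ttl)
        return value
```

In the first class, freshness depends entirely on `eject` being called; in the second, it depends on the backend being reachable when the TTL runs out.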

thehappypm · on July 28, 2021

This seems reasonable for a read-heavy system (like Reddit). In a failure, the content on the site would just get staler and staler, but there wouldn't be a heavy outage.

jtsiskin · on July 29, 2021

This is like the “fail open vs fail closed” question. It depends on which is worse. For legal reasons, they might need to be able to say “yes, we can guarantee we will no longer be serving this content within 1 hour”.
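
One way to get that kind of guarantee while still tolerating short outages is to cap how stale a served entry may be. A rough sketch, where `Entry`, `serve_stale`, and the one-hour default are illustrative assumptions rather than anything from a real system:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    value: str
    fetched_at: float  # seconds, time of the last successful backend fetch


def serve_stale(
    entry: Entry, now: float, max_stale_seconds: float = 3600.0
) -> Optional[str]:
    """Fail open up to the cap, then fail closed by returning None."""
    if now - entry.fetched_at >= max_stale_seconds:
        return None  # too old to serve, even if the backend is down
    return entry.value
```

Past the cap the caller has to show an error instead of the content, which is the "fail closed" side of the trade-off.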