
I think the problem is that the service feeding the cache durations can itself have an outage, which could in turn trigger a cache stampede. Usually, to solve these kinds of problems, you want fewer new blocks in the block diagram, not more.



If you don't mind serving stale content, you can do exactly the opposite. Instead of a relatively short TTL that gets extended on failure, you can set an infinite TTL and actively eject data that has been updated. In either case, a separate service that checks the query and times out cache content, belonging to neither the frontend nor the backend, is one more thing that can break. But it can be a fairly simple, isolated thing that doesn't consistently break along with either the frontend or the backend.
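
A rough sketch of the infinite-TTL idea, in case it helps. Everything here is made up for illustration (the `EvictOnUpdateCache` name, the `load` callback, the in-memory dict); a real setup would more likely sit on memcached or Redis, with whatever notices the update calling `invalidate`.

```python
from typing import Any, Callable, Dict, Hashable


class EvictOnUpdateCache:
    """Entries never expire on their own; they are only ejected explicitly."""

    def __init__(self, load: Callable[[Hashable], Any]) -> None:
        # `load` is a hypothetical callback that fetches from the backend.
        self._load = load
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Any:
        # Only a miss reaches the backend, so a backend outage cannot
        # take down keys that are already cached.
        if key not in self._data:
            self._data[key] = self._load(key)
        return self._data[key]

    def invalidate(self, key: Hashable) -> None:
        # Called by whatever component learns that the data changed.
        self._data.pop(key, None)
```

If the thing calling `invalidate` goes down, reads keep working and the content just gets older, which is the trade-off being described.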


This seems reasonable for a read-heavy system (like Reddit). In a failure, the content on the site would just get staler and staler, but there wouldn't be a heavy outage.


This is like the “fail open vs. fail closed” question. It depends on which is worse. For legal reasons, they might need to be able to say “yes, we can guarantee we will no longer be serving this content within 1 hour”.
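
One way to get both is to fail open only up to a hard limit. This is just a sketch under my own assumptions: `BoundedStaleCache`, `ttl`, and `max_stale` are invented names, and the injectable `clock` is there only to make it testable. With `max_stale=3600` you would get the 1-hour guarantee.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class BoundedStaleCache:
    """Serve stale data when the backend fails, but never past max_stale."""

    def __init__(
        self,
        load: Callable[[Hashable], Any],
        ttl: float,
        max_stale: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load          # hypothetical backend fetch
        self._ttl = ttl            # normal freshness window, in seconds
        self._max_stale = max_stale  # hard cap on the age of served data
        self._clock = clock
        self._data: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still fresh
        try:
            value = self._load(key)
        except Exception:
            # Fail open: reuse the old value while it is under the cap.
            if entry is not None and now - entry[1] < self._max_stale:
                return entry[0]
            # Fail closed: past the cap, drop the entry and surface the error.
            self._data.pop(key, None)
            raise
        self._data[key] = (value, now)
        return value
```

The timestamp is only updated on a successful load, so serving a stale value does not extend how long it can live.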



