
Monitoring and Architecting for Failure - mmcclure
https://mux.com/blog/monitoring-and-architecting-for-failure-at-mux/
======
therealwardo
I'm the author of this post.

Here are the things I highlight in the post as worth considering when
architecting for failure: Retry, Backoff and Rate Limit. Use a Cache. Add
Redundancy. Build a Buffer. Reconsider Dependencies. Introduce Isolation.
Improve Test and Release Practices.
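
To make the first item concrete, here is a minimal Python sketch of retrying
with capped exponential backoff and jitter. It is an illustration only, not
code from the post: the function name retry_with_backoff and all of its
parameters and defaults are hypothetical.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn() until it succeeds or max_attempts calls have failed.

    All names and defaults here are illustrative assumptions.
    `sleep` and `rng` are injectable so the behavior can be tested
    without real waiting or real randomness.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: sleep a random fraction of the delay so many
            # clients do not all retry at the same instant.
            sleep(delay * rng())
```

In real use you would probably catch only the errors you know to be
transient (timeouts, connection resets) instead of every Exception, and pair
this with a rate limit so that retries cannot pile onto a dependency that is
already struggling.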

Click through to the post for more on how I think about each of these. I think
that weighing the cost tradeoffs when evaluating each of these approaches is
what makes architecting systems so challenging (and interesting).

What else do you do in your systems to handle failures gracefully? Any
questions about what we are doing or how we are doing it?

