
Ask HN: Good resources for systems resiliency / reliability? - Spooky23
My employer has gone through a lot of M&A activity and has recently suffered some significant outages, so we're starting to put an increased focus on improving the reliability of our systems, their resiliency to failures, and our ability to provide support.

It's a big enterprise environment, but I'm in a somewhat unique business unit that is responsible for a broad variety of critical systems that would typically be in silos. Intuitively, we're pretty good at this stuff where I sit, but we want to level up our capability and lead by example, showing where we want the rest of the organization to go.

Where would you recommend starting? I'm looking for recommendations for the best books, resources, and types of people to talk to.
======
aayala
I have a few links:

      * http://book.mixu.net/distsys/single-page.html
      * http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/
      * http://www.hpcs.cs.tsukuba.ac.jp/~tatebe/lecture/h23/dsys/dsd-tutorial.html
      * https://henryr.github.io/distributed-systems-readings/

~~~
Spooky23
Thank you!

