

CircleCI: Our First Postmortem - dlowe
http://blog.circleci.com/our-first-postmortem/

======
carimura
Happens to the best of them... thanks for the detailed writeup, guys. Love the
service.

~~~
dlowe
Thank you!

------
dqminh
Any reason why you guys chose to pipe the compressed content into a `tar xzf`
process inside the container instead of extracting it outside and overlaying
the extracted content onto the container via overlayfs or something similar?

~~~
dlowe
Piping it in allows the build driver to be agnostic about the physical
location of the container.
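
A rough sketch of the idea, not the actual build driver: the driver only writes
archive bytes to the stdin of some extraction command, so it never needs to
know where the target filesystem lives. The names `make_archive` and
`pipe_archive` are made up for illustration.

```python
import io
import subprocess
import tarfile


def make_archive(files):
    """Build a gzip-compressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def pipe_archive(archive, command):
    """Feed the archive to `command` on stdin and return its exit code.

    The caller decides where `command` runs (locally, over ssh, inside a
    container); this function only streams bytes to it.
    """
    result = subprocess.run(command, input=archive, check=False)
    return result.returncode
```

With this shape, a local extraction might use a command such as
`["tar", "xzf", "-", "-C", "/tmp/build"]`, while a remote one would only
prefix it with something like `["ssh", "build-host", ...]` (a hypothetical
host name); the calling code stays the same either way.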

------
chris_wot
Troubleshooting can be a bitch.

Could you add a tl;dr though?

~~~
dlowe
I'm not sure if I can do any better than "troubleshooting can be hard",
frankly. The actual details are all tangled together in a way that resists
summary.

~~~
chris_wot
Just tell customers that there were queue backlogs caused by slow git clones
that were exacerbated by server failures that occurred due to kernel panics
and LVM snapshot problems. These were resolved, but due to MTU configuration
changes made during troubleshooting there were further outages; later on an
unrelated bug in schejulure caused another outage.

However, all these issues are now resolved and your service is far more robust
because of it.

