
How Ansible Took Down Our Servers - heartsucker
https://www.heartsucker.com/blog/how-ansible-took-down-our-servers
======
nasalgoat
In the olden days, before AWS, you'd just log into your console as root and
fix it.

Now there's no console.

------
flippyfloppy
I love a good ansible horror story. That said, why delete the directory
recursively? Why not just push the intended state and be done with it? This
could happen with any configuration tool if you do a recursive delete. If you
have some files you don't want, just have a task that says the file shouldn't
be there. Full disclosure: I love ansible and use it all the time.

~~~
heartsucker
In a previous version of ansible, the user module would error out if it
couldn't delete the home directory, so this was added as a pre-remove-user
step to prevent errors. In theory we could have pushed an empty
authorized_keys, but it's so simple (albeit erroneous) to say "If the user is
gone, remove their home."

------
devn0ll
I'm thinking this would have solved it:

path: "/home/{{ item|default('doesnotexist/') }}"

But yeah, one would do that on test machines first, say: Vagrant on your local
machine at least.

It may also make sense to create a user with its home in a different place,
for "just in case" purposes. Or even just use something like freeipa for all
accounts.

~~~
heartsucker
That would have solved it. For me the problem was that somehow Ansible took a
non-existent value, looped over that, and then applied that value as the
variable inside the loop.
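
To make the failure mode concrete, here is a rough sketch in plain Python (not
Ansible) of what happens when an empty loop value is substituted into a
"/home/<user>" path, next to a guard that refuses to build the path at all.
The function names and the HOME_ROOT constant are made up for illustration;
this is not the playbook from the post.

```python
import posixpath

# Hypothetical constant for illustration; the thread only mentions "/home/".
HOME_ROOT = "/home"


def naive_home_dir_for(user):
    """Mimic plain string templating: a missing or empty value becomes ''."""
    return "/home/" + (user or "")


def home_dir_for(user):
    """Build the home path, refusing values that would point at /home itself."""
    if not isinstance(user, str) or not user.strip():
        raise ValueError("refusing to build a home path from an empty user name")
    if "/" in user or user in (".", ".."):
        raise ValueError(f"refusing suspicious user name: {user!r}")
    return posixpath.join(HOME_ROOT, user)


if __name__ == "__main__":
    # The naive version silently yields the parent directory of every home.
    print(naive_home_dir_for(""))
    # The guarded version returns a path only for a real-looking user name.
    print(home_dir_for("alice"))
```

With the naive version, a recursive delete on the result would hit all of
/home. The guarded version fails loudly before any path exists to delete.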

------
joobus
Why didn't he test his script on a dev server before blasting every server in
production?!

~~~
heartsucker
#moveFastAndBreakThings ?

