
Do we really need network automation? - mirceaulinic
https://mirceaulinic.net/2019-01-09-do-we-need-network-automation/
======
kelp
Back when I led Network Engineering at Square, we had a global network of
production datacenters and offices. Nothing on the scale of Cloudflare, but we
still had many hundreds of devices. They had all been built manually with
copied-and-pasted configs. That made changes risky and terrifying, because
there could be subtle inconsistencies between sites, or huge differences. So
it was always very challenging to reason about the impact of a given change.

Thanks to a ton of grit by the team, and the insistence of one engineer in
particular, we built a config management system and started tracking the
percentage of our global network config that it managed.
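
For readers wondering how a coverage number like this could be computed, here is a minimal sketch. It is only an illustration, not the tool described above: it assumes that "managed" means a line of the running config also appears in the output rendered by the config management system, and the names `count_managed` and `fleet_coverage_percent` are hypothetical.

```python
def count_managed(running_config, rendered_config):
    """Return (managed, total) counts of non-empty lines for one device.

    A running-config line counts as managed when the rendered (templated)
    config contains the same line. This is an assumed definition.
    """
    rendered = {line.strip() for line in rendered_config.splitlines() if line.strip()}
    running = [line.strip() for line in running_config.splitlines() if line.strip()]
    managed = sum(1 for line in running if line in rendered)
    return managed, len(running)


def fleet_coverage_percent(devices):
    """Line-weighted coverage across all devices.

    `devices` maps a device name to a (running_config, rendered_config) pair.
    """
    managed_total = 0
    line_total = 0
    for running_config, rendered_config in devices.values():
        managed, total = count_managed(running_config, rendered_config)
        managed_total += managed
        line_total += total
    if line_total == 0:
        return 100.0  # nothing to manage, so nothing is unmanaged
    return 100.0 * managed_total / line_total
```

Weighting by line count rather than averaging per device means one large, hand-built router pulls the number down more than a small, fully templated switch, which is usually what such a metric is meant to expose.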

That metric was regularly presented at the VP level to hold us accountable for
getting the percentage to 100.

It was months and months of boring work to remove inconsistencies and
templatize configs. But in the end, I believe it resulted in a much more
reliable and ultimately safer network to operate. I'm also happy that my
management chain saw the value in this work.
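
As a rough illustration of what "templatizing" and removing inconsistencies can look like, the sketch below renders one shared template per site and diffs it against the running config. The template contents, variable names, and the `render` and `drift` helpers are all hypothetical, and it uses only the Python standard library rather than any particular config management tool.

```python
import difflib
from string import Template

# Hypothetical template shared by every site; only the variables differ.
SITE_TEMPLATE = Template(
    "hostname $hostname\n"
    "ntp server $ntp_server\n"
    "logging host $syslog_host\n"
)


def render(site_vars):
    """Render the shared template for one site.

    Raises KeyError if a variable is missing, so gaps surface early.
    """
    return SITE_TEMPLATE.substitute(site_vars)


def drift(running_config, site_vars):
    """Return unified-diff lines between the running and rendered configs.

    An empty list means the device already matches the template.
    """
    rendered = render(site_vars)
    return list(
        difflib.unified_diff(
            running_config.splitlines(),
            rendered.splitlines(),
            fromfile="running",
            tofile="rendered",
            lineterm="",
        )
    )
```

Running `drift` across every site before pushing anything gives a list of exactly where the hand-built configs disagree with the template, which is the kind of inconsistency that makes changes hard to reason about.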

I'm quite proud of the work the team did.

Some side benefits were that once we started going through audits like SOC2,
we had a really good story to tell about how we reviewed and pushed changes to
production.

~~~
zackify
What did you use to automate it? Ansible, or something else? Thanks!

~~~
kelp
We wrote our own thing, because at the time using Ansible for that was not yet
common.

------
nerdbaggy
I really like the way Avaya and Cisco DNA do automation. It’s all centrally
managed and it uses VXLAN, so no VLANs have to be present. You just say you
want VLAN 20 here and here, and it takes care of the rest.

~~~
zamadatix
Avaya's previous network architecture (their networking business is now owned
by Extreme) wasn't at all centrally managed. I'd go as far as to say SPBM is
the antithesis of centralized network management. It didn't use VXLAN either.

------
faebi
What open source tools scale easily to manage 100,000 devices or more via
CLI/SSH?

~~~
mirceaulinic
From what I've heard (I've never had that many devices to manage myself), Salt
can scale nicely when managing a very large number of devices. See, for
example, this story from LinkedIn:
[https://s.saltstack.com/saltstack-at-web-scale-better-stronger-faster/](https://s.saltstack.com/saltstack-at-web-scale-better-stronger-faster/).
However, they're using it in the "classical" topology, which is agent-based.
For an agentless setup, you may want to look into Salt-SSH:
[https://docs.saltstack.com/en/latest/topics/ssh/](https://docs.saltstack.com/en/latest/topics/ssh/).
Hope this helps!
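
To give a feel for what agentless fan-out over SSH involves, here is a minimal sketch using only the Python standard library. It is not how Salt-SSH works internally and is not tuned for 100,000 devices; the function names `run_over_ssh` and `fan_out`, the worker count, and the timeout are all assumptions made for illustration. It shells out to the system `ssh` client with `BatchMode=yes` so a missing key fails instead of prompting.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_over_ssh(host, command, timeout=30):
    """Run one command on one host with the system `ssh` client."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def fan_out(hosts, command, runner=run_over_ssh, max_workers=50):
    """Run `command` on every host concurrently.

    Returns {host: runner result, or the exception the runner raised}.
    `runner` is injectable so the fan-out logic can be tested without SSH.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:  # one unreachable device should not stop the rest
                results[host] = exc
    return results
```

At very large device counts a single process like this would likely need sharding, retries, and rate limiting, which is part of what dedicated tools provide.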

