
Network Monitoring, Moral Hazards and Crumple Zones - ohjeez
http://etherealmind.com/network-monitoring-moral-hazards-and-crumple-zones/
======
neffy
This is the old, epic circuit-switching vs. packet-switching religious divide
dressed up in new clothing.

The reason data networking is done in "declarative" mode rather than
"imperative", aka centralised, mode is that centralised systems can't scale.
Or rather, they can scale, but not to the degree required for planetary
communication of seemingly infinite quantities of cat videos. Mathematically,
it's not just that the difference in available real-time information capacity
is orders of magnitude; it's that very large distributed systems have a
capacity they can actually reach, while every centralised system is
necessarily limited by its centre.

The tools are light years ahead of where they used to be, but you still have
to be able to use them, and to deal with users who never want to acknowledge
that their application has to be a good network citizen. Just like real life,
really.

------
devonkim
This article resonates a lot with me. I was supposed to be in devops and wound
up as a human network monitor, covering the blind spots of the network
infrastructure team's monitoring while teaching some engineers how to use
tools like mtr, how to prove or disprove that the network is responsible, and
how to use Chrome Developer Tools to separate network latency from
infrastructure latency.

Fundamentally, the network infrastructure team had plenty of monitoring; it
was just scoped too narrowly for anything anyone _but_ the network team cared
about. The monitoring was not granular enough to catch a wave of handshake
failures with third parties, nor broad enough to detect when everyone across
the enterprise was experiencing so much packet loss from asymmetric routing
that throughput to third-party services dropped by an order of magnitude. Yet
somehow every rack in the DCs had a perfect layout, and each cable was tracked
so precisely that if you unplugged one at random, someone would show up at
that exact rack and switch within 3 minutes... after going through the
biometric scanners and everything.

The fact that I became a part-time network engineer, spending half my time on
network monitoring and troubleshooting when I was supposed to be leading cloud
operations and building a devops practice in such an immature environment
with no urgency to improve, is a large part of why I quit.

------
padiyar83
Network monitoring sucks because it's usually an afterthought from a protocol
design standpoint. As a result, most monitoring tools end up working at a very
high abstraction layer, like SNMP or ping requests, and deeper visibility
suffers. We need to make network monitoring a first-class citizen and build it
into the networking protocol stack. Say, define a unique TCP packet type for
'monitor' whose sole job is carrying details on how many packets were dropped,
who dropped them and when. Do the same with control plane protocols like BGP
and OSPF, so that any change, such as learning a new route or withdrawing an
existing one, is relayed to a monitoring tool directly from the protocol stack
itself.
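To make the proposal a bit more concrete, here is a minimal sketch of what such 'monitor' messages might look like on the wire. Everything about it is an assumption: TCP defines no such message type today, and the type codes, field layout and the `DropReport`/`RouteEvent` names are purely illustrative. It packs who dropped packets, how many and when, plus a route learned/withdrawn event, using Python's standard `struct` module in network byte order.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Hypothetical message type codes; not defined by TCP, BGP or OSPF.
DROP_REPORT = 0x01
ROUTE_EVENT = 0x02

# "!" = network byte order, no padding.
_DROP_FMT = "!B4sIQ"    # type, reporter IPv4, packets dropped, timestamp (ms since epoch)
_ROUTE_FMT = "!B?4sBQ"  # type, learned?, prefix address, prefix length, timestamp (ms)


@dataclass(frozen=True)
class DropReport:
    dropped_by: str       # IPv4 address of the device that dropped the packets
    packets_dropped: int
    timestamp_ms: int

    def encode(self) -> bytes:
        addr = ipaddress.IPv4Address(self.dropped_by).packed
        return struct.pack(_DROP_FMT, DROP_REPORT, addr,
                           self.packets_dropped, self.timestamp_ms)

    @classmethod
    def decode(cls, data: bytes) -> "DropReport":
        kind, addr, count, ts = struct.unpack(_DROP_FMT, data)
        if kind != DROP_REPORT:
            raise ValueError(f"not a drop report: type {kind:#04x}")
        return cls(str(ipaddress.IPv4Address(addr)), count, ts)


@dataclass(frozen=True)
class RouteEvent:
    learned: bool         # True = route learned, False = route withdrawn
    prefix: str           # e.g. "198.51.100.0/24"
    timestamp_ms: int

    def encode(self) -> bytes:
        net = ipaddress.IPv4Network(self.prefix)
        return struct.pack(_ROUTE_FMT, ROUTE_EVENT, self.learned,
                           net.network_address.packed, net.prefixlen,
                           self.timestamp_ms)

    @classmethod
    def decode(cls, data: bytes) -> "RouteEvent":
        kind, learned, addr, plen, ts = struct.unpack(_ROUTE_FMT, data)
        if kind != ROUTE_EVENT:
            raise ValueError(f"not a route event: type {kind:#04x}")
        prefix = ipaddress.IPv4Network(f"{ipaddress.IPv4Address(addr)}/{plen}")
        return cls(learned, str(prefix), ts)


if __name__ == "__main__":
    # Round-trip both message kinds through their wire encoding.
    report = DropReport("192.0.2.1", 42, 1_700_000_000_000)
    assert DropReport.decode(report.encode()) == report
    event = RouteEvent(False, "198.51.100.0/24", 1_700_000_000_500)
    assert RouteEvent.decode(event.encode()) == event
```

A fixed binary layout keeps each report small (17 bytes for a drop report, 15 for a route event here), which matters if devices emit them at line rate; a real design would also need versioning, authentication and rate limiting, none of which this sketch attempts.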

------
solotronics
I agree that a big part of the problem is SNMP and logs/screen scraping.
NETCONF is a good starting point, but there should be an equivalent for
getting monitoring data and metrics as XML in a streaming manner. There is an
infinite number of potential issues, and the standard high-level, SNMP-driven
overviews of networks always fall short. It would be really cool if there were
an open source network measurement and instrumentation movement. Facebook
actually publishes things along these lines, such as their NANOG talks on
automation.
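As an illustration of the consumer side of streamed XML metrics, here is a minimal sketch assuming a hypothetical schema in which a device pushes `<metric>` elements inside one long-lived `<stream>` document (this is not NETCONF's actual notification format, and `stream_metrics` is an illustrative name). It uses the standard library's `xml.etree.ElementTree.XMLPullParser` so each record is handled as soon as its closing tag arrives, instead of polling or waiting for a complete document.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple, Union

Metric = Tuple[str, str, int]  # (interface, counter name, value)


def stream_metrics(chunks: Iterable[Union[bytes, str]]) -> Iterator[Metric]:
    """Yield one tuple per <metric> element as soon as it is fully parsed.

    `chunks` stands in for data read incrementally from a socket or
    subscription; the <stream>/<metric> schema is hypothetical.
    """
    parser = ET.XMLPullParser(events=("end",))

    def drain() -> Iterator[Metric]:
        # Emit every <metric> whose closing tag has been parsed so far.
        for _event, elem in parser.read_events():
            if elem.tag == "metric":
                yield (elem.get("interface"), elem.get("name"), int(elem.text))
                elem.clear()  # drop the consumed element's text and attributes

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()  # stream ended: parse anything still buffered
    yield from drain()


if __name__ == "__main__":
    # A record split across chunk boundaries is still reassembled correctly.
    chunks = [
        '<stream><metric interface="eth0" name="rx_dropped">3</met',
        'ric><metric interface="eth1" name="rx_dropped">0</metric>',
        "</stream>",
    ]
    for metric in stream_metrics(chunks):
        print(metric)
```

Because parsing is incremental, a collector built this way reacts to each counter update as it is pushed, rather than at the next SNMP polling interval. Newer Expat releases may buffer small chunks before reporting events, which is why the sketch drains events once more after `close()`.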

------
starving_coder
Legacy networking gear may have faced, back in the day, a chronic lack of
tools for dealing with performance and/or stability issues. Most of the tools
and solutions that were developed came as a direct result of industry pressure
for concrete solutions that instrument the problem methodically. Maybe the
challenges faced by SDN/NFV can similarly be classified as growing pains, and
in time a tangible solution to the problems you've discussed may emerge.

