
Troubleshooting IPv6 badness to certain hosts in a rack - tomsmeding
http://rachelbythebay.com/w/2018/03/16/slowroad/
======
nikanj
I find it sad and amusing that turning every piece of equipment off and on
again would have fixed the symptoms, i.e. flushed the caches.

------
jwbensley
Hello author, NetEng department here, good article, thanks!

I'm sad to say this problem affects IPv4 too, and it's not that uncommon when
using layer 3 switches. The TCAM allocation for IPv4/6 is usually a fraction
of the size allocated for MAC addresses.
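A toy model of why only *certain* hosts suffer: once the hardware table is full, later entries never get programmed, and traffic to those hosts takes the slow path through the switch CPU. The `L3Switch` class, the capacity of 3 and the punt-to-CPU behaviour are hypothetical simplifications. Real switches differ by vendor and by how TCAM is carved up between IPv4, IPv6 and MAC entries.

```python
# Illustrative model of a layer 3 switch with a fixed-size hardware (TCAM) table.
# Capacity and overflow behaviour are assumptions, not any vendor's implementation.


class L3Switch:
    def __init__(self, tcam_capacity: int) -> None:
        self.tcam_capacity = tcam_capacity
        self.tcam: set[str] = set()  # host entries programmed in hardware

    def learn(self, host: str) -> None:
        # Program a host entry into hardware only while there is room left.
        if host not in self.tcam and len(self.tcam) < self.tcam_capacity:
            self.tcam.add(host)

    def forwarding_path(self, host: str) -> str:
        # Hosts that did not make it into the hardware table are punted to the CPU.
        return "hardware" if host in self.tcam else "cpu (slow path)"


switch = L3Switch(tcam_capacity=3)
hosts = [f"2001:db8::{i}" for i in range(1, 6)]  # documentation prefix
for h in hosts:
    switch.learn(h)
for h in hosts:
    print(h, switch.forwarding_path(h))
```

The first three hosts are forwarded in hardware and the last two fall back to the CPU. Every host is still reachable, so the problem shows up as "slow to some hosts" rather than as an outage.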

Also a big annoyance of mine is that TCAM space is something few people graph
by default on their NMS system. Case in point: we have Cacti as a backup NMS
because it can graph >anything< that is a number. Some L3 switches have a MIB
from the vendor that reports TCAM usage, others from the same vendor use a
different MIB, and others from the same vendor don't expose the stats via SNMP
at all and we have to scrape them from the CLI! So the vendors don't make it
easy for an operator to track a limited and critical resource of a layer 3
switch. I urge everyone to graph their switches' TCAM usage however they can.
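When the data arrives in different shapes (one MIB here, another MIB there, CLI scraping elsewhere), one way to follow that advice is to normalize every source into a single used/total record before graphing or alerting. This sketch assumes the SNMP values have already been fetched and mapped to `used`/`total` keys. The CLI line format, device names and 80% threshold are invented for illustration, and no real vendor MIB or command output is implied.

```python
from dataclasses import dataclass


@dataclass
class TcamSample:
    device: str
    used: int
    total: int

    @property
    def utilization(self) -> float:
        # Fraction of TCAM entries in use, from 0.0 to 1.0.
        return self.used / self.total if self.total else 0.0


def from_snmp(device: str, values: dict[str, int]) -> TcamSample:
    # Hypothetical: a vendor MIB was already walked and mapped to these keys.
    return TcamSample(device, values["used"], values["total"])


def from_cli(device: str, output: str) -> TcamSample:
    # Hypothetical CLI format: a line such as "IPv6 host entries: 7800/8192".
    for line in output.splitlines():
        if line.strip().startswith("IPv6 host entries:"):
            used, total = line.split(":", 1)[1].strip().split("/")
            return TcamSample(device, int(used), int(total))
    raise ValueError(f"no TCAM line found for {device}")


def over_threshold(samples: list[TcamSample], threshold: float = 0.8) -> list[str]:
    # Devices whose TCAM utilization is at or above the alert threshold.
    return [s.device for s in samples if s.utilization >= threshold]


samples = [
    from_snmp("sw1", {"used": 1200, "total": 8192}),
    from_cli("sw2", "Uptime: 400 days\nIPv6 host entries: 7800/8192\n"),
]
for s in samples:
    print(f"{s.device}: {s.utilization:.1%}")
print("needs attention:", over_threshold(samples))
```

Keeping the parsing at the edges means the graphing and alerting code only ever sees one record type, whichever way a given switch exposes its counters.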

~~~
jauer
> Also a big annoyance of mine is that TCAM space is something few people
> graph by default on their NMS system. [...]

This (and really everything else you mention) is one thing I like about doing
NetEng at FB. We collect all this info (and anything else you can think of)
from every network device, try to normalize it, and then build tooling that
watches for and reacts to any anomalies.

(As far as I can tell, the events in the article were from 2015 and the vendor
involved was thrown out in the aftermath; this was one of many bugs.)
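The simplest form of "watch for anomalies" is to compare each new sample against a rolling baseline of recent samples. The sketch below flags a point that sits more than `k` standard deviations from the mean of the previous `window` samples. It is a generic illustration with made-up data and parameters, not a description of the tooling mentioned above.

```python
from statistics import mean, pstdev


def anomalies(series: list[float], window: int = 5, k: float = 3.0) -> list[int]:
    """Return indices of samples far from the rolling baseline before them."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        # With a perfectly flat baseline (sigma == 0), any change counts.
        if sigma == 0:
            if series[i] != mu:
                flagged.append(i)
        elif abs(series[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Hypothetical TCAM entries-in-use samples for one switch.
tcam_used = [1000, 1010, 1005, 1008, 1012, 1009, 1011, 7900, 7950]
print(anomalies(tcam_used))
```

Only the jump at index 7 is flagged. The next sample is not, because the jump itself has already widened the baseline. That is a known weakness of naive rolling statistics, and one reason real tooling usually adds hard capacity thresholds as well.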

------
bluedino
>> Pay attention to ssh lag, even if it seems minor

I noticed this once while investigating a machine with erratic upload speeds;
it was the only one in the server room with the problem.

It turned out the switch port was set to autonegotiate, and there was some
issue with the firmware version on some of our NICs.

------
js2
I was sure this was going to be an MTU issue. I'm glad it was a bit more
exciting than that.

~~~
op00to
Same! At least 10 times in my career, weird latency or packet loss has turned
out to be an MTU issue.
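The classic MTU trap works like this. Small packets (pings, interactive SSH keystrokes) get through, but full-size segments exceed the MTU of some smaller hop. If the ICMP "Packet Too Big" / "Fragmentation Needed" messages that drive path MTU discovery are filtered, those packets are silently dropped. IPv6 routers never fragment in transit, so this bites IPv6 hard. The arithmetic below assumes option-free IPv4 and TCP headers, and the list of link MTUs, including one 1480-byte hop such as a tunnel adding a 20-byte outer IPv4 header, is hypothetical.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed base header
TCP_HEADER = 20   # bytes, without options


def tcp_mss(mtu: int, ipv6: bool) -> int:
    # Largest TCP payload that fits in one packet of the given MTU.
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def path_mtu(link_mtus: list[int]) -> int:
    # The usable MTU end to end is the smallest MTU of any hop.
    return min(link_mtus)


links = [1500, 1500, 1480, 1500]  # hypothetical path with one smaller hop
pmtu = path_mtu(links)
print("MSS on 1500 MTU, IPv4:", tcp_mss(1500, ipv6=False))
print("MSS on 1500 MTU, IPv6:", tcp_mss(1500, ipv6=True))
print("path MTU:", pmtu, "-> IPv6 MSS:", tcp_mss(pmtu, ipv6=True))
```

A host that assumes a 1440-byte IPv6 MSS on this path sends segments 20 bytes too large for the 1480-byte hop. If the ICMP feedback never arrives, the connection stalls while small probes keep succeeding.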

~~~
VectorLock
MTU or Spanning Tree, so frequently the bugbear of weird network behavior.

