Troubleshooting IPv6 badness to certain hosts in a rack (rachelbythebay.com)
79 points by tomsmeding on March 17, 2018 | 9 comments



I find it sad and amusing that turning every piece of equipment off and on again would have fixed the symptoms, i.e. flushed the caches.


Hello author, NetEng department here, good article, thanks!

I'm sad to say this problem affects IPv4 too, and it's not that uncommon when using layer 3 switches. The TCAM allocation for IPv4/6 is usually a fraction of the size allocated for MAC addresses.

Also a big annoyance of mine is that TCAM space is something few people graph by default on their NMS system. Case in point: we have Cacti as a backup NMS because it can graph >anything< that is a number. Some L3 switches have a MIB from the vendor that reports TCAM usage, others from the same vendor use a different MIB, and still others from the same vendor don't expose the stats via SNMP at all, so we have to scrape them from the CLI! So the vendors don't make it easy for an operator to track a limited and critical resource of a layer 3 switch. I urge everyone to graph their switches' TCAM usage however they can.
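
For the CLI-scraping case, here is a minimal sketch of turning scraped text into numbers that an NMS could graph or alert on. The output format, the table names, and the 90% threshold are all assumptions made up for illustration; real switches format this differently per vendor and model, so the regex would need adapting.

```python
import re

# Hypothetical CLI output; not taken from any real switch.
SAMPLE_OUTPUT = """\
Table          Used   Total
ipv4-route     7900   8192
ipv6-route     1000   2048
mac            12000  65536
"""

# Matches rows shaped like "<name> <used> <total>"; the header row does not match.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_tcam(text):
    """Return {table_name: (used, total)} for every row that matches ROW."""
    usage = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            usage[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return usage


def utilization(usage):
    """Return {table_name: percent_used}, skipping tables with zero capacity."""
    return {
        name: 100.0 * used / total
        for name, (used, total) in usage.items()
        if total > 0
    }


def over_threshold(usage, percent=90.0):
    """Return the sorted names of tables at or above the given utilization."""
    return sorted(
        name for name, pct in utilization(usage).items() if pct >= percent
    )


if __name__ == "__main__":
    usage = parse_tcam(SAMPLE_OUTPUT)
    for name, pct in sorted(utilization(usage).items()):
        print(f"{name}: {pct:.1f}% used")
    print("over threshold:", over_threshold(usage))
```

The percentages are plain numbers, so they can be fed to whatever graphing tool is already in place.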


> Also a big annoyance of mine is that TCAM space is something few people graph by default on their NMS system. [...]

This (and really everything else you mention) is one thing I like about doing NetEng at FB. We collect all this info (and anything else you can think of) from every network device, try to normalize it, and then build tooling that watches for and reacts to any anomalies.

(as far as I can tell the events in the article were from 2015 and the vendor involved was thrown out in the aftermath--this was one of many bugs)


>> Pay attention to ssh lag, even if it seems minor

I noticed this one time while investigating a machine that was having varied upload speeds - the only one in the server room.

It ended up being that the port on the switch was set to autonegotiate, and there was some issue with the firmware version on some of our NICs.


I was sure this was going to be an MTU issue. I'm glad it was a bit more exciting than that.


Same! At least 10 times in my career it was an MTU issue when presented with weird latency or packet loss.


MTU or Spanning Tree, so frequently the bugbear of weird network behavior.


[flagged]


You don't like conversational narrative?


Nope, it's just that many of her pieces imply that the people she works with (e.g. coworkers) are idiots. She has a condescending tone, which I don't care for. That is all.



