> Drop packets when demuxing. From y to o, it's best to drop packets at o rather than use backpressure from o to y. Otherwise, a single full output queue could starve the others by stopping y. (Example: imagine if a full queue to a 1 Mbps wifi station could stop traffic to a 100 Mbps wifi station. Some Linux wifi drivers suffer from this.)
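
A minimal sketch of that drop-at-demux idea in Python (the queue layout and the key_fn selector are made up for illustration, not taken from any real driver): the demuxer never blocks on a full output queue; it drops the packet for that output and keeps serving the others.

    import queue

    def demux_packet(pkt, out_queues, key_fn):
        """Route pkt to one output queue; drop instead of applying backpressure."""
        out_q = out_queues[key_fn(pkt)]   # pick the output (e.g. per-station) queue
        try:
            out_q.put_nowait(pkt)         # non-blocking enqueue
            return True
        except queue.Full:
            return False                  # that queue is full: drop, keep the demuxer moving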

Aaah, memories. Combined with point 11, pause frames. I was debugging a weird issue with a gbit switch about 15 years ago.

Port A is a server sending to ports B and C. C is only capable of 100 Mbit/s. I could send from A to B at 950 Mbit/s and from A to C at 50, all good. But as soon as I stopped artificially throttling the rate to C at A, A -> C would eventually hit 100 Mbit/s, which dragged the rate from A to B down to 100 Mbit/s as well, for a total output of 200 at A. After a lot of trying and poking I saw these mysterious pause frames in Wireshark, which I had glossed over before because who'd want to look at anything below IP... Once I looked them up and disabled pause frames on all the machines, I got the expected result of 900 to B and 100 to C. And once I'd figured that out, it was trivial to formulate a Google query that turned up exactly this problem and its solution, something I had failed at before.

So ever since then disabling pause frames is one of the first things I do when networking is acting weird.

Bonus: Back then, when I told an older colleague about my findings, he basically confirmed "pause frames are evil" with another story: in the late 90s, another department started having a problem where entire network segments sometimes became completely unreachable, and the machines in that segment couldn't even communicate with each other. Randomly power-cycling switches and replugging machines solved the problem. After quite some time they tracked it down: some folks in said department had gotten shiny new laptops, and whenever those entered standby, the NIC "didn't get the message". Its buffer would eventually fill up (as there was no OS running to handle any packets), and from then on the network segment would get spammed into oblivion with pause frames.




The modern AQMs are based on time in queue. Both pie and codel work brilliantly with pause frames, so long as BQL is also in the driver. fq+aqm (be it codel, pie, or cobalt) works even better.

See:

https://datatracker.ietf.org/doc/html/rfc8290

https://datatracker.ietf.org/group/aqm/documents/

https://arxiv.org/abs/1804.07617

Adding AQM and FQ to wifi was way, way harder (apenwarr drove the group at Google that did some of it), but there is now full support for fq_codel in the mt76, ath9k, ath10k, and iwl drivers, plus one new realtek chipset, in the Linux kernel. https://lwn.net/Articles/705884/
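
For a feel of the "time in queue" idea, here's a toy Python sketch of a CoDel-style queue (heavily simplified relative to the RFCs above; the class name and structure are illustrative, not the real qdisc, though the 5 ms / 100 ms constants match codel's defaults). Packets are timestamped on enqueue, and drops happen at dequeue once the sojourn time has stayed above the target for a whole interval.

    import time
    from collections import deque
    from math import sqrt

    TARGET = 0.005      # 5 ms acceptable standing-queue delay
    INTERVAL = 0.100    # 100 ms of excess delay tolerated before dropping

    class CoDelishQueue:
        """Toy CoDel-style queue: drop based on how long packets sat in the queue."""

        def __init__(self):
            self.q = deque()
            self.drop_next = None   # when we may drop, if the delay stays high
            self.drop_count = 0

        def enqueue(self, pkt):
            self.q.append((time.monotonic(), pkt))   # timestamp at entry

        def dequeue(self):
            while self.q:
                enq_time, pkt = self.q.popleft()
                sojourn = time.monotonic() - enq_time
                if sojourn < TARGET:
                    # Queue is draining fine: leave the dropping state.
                    self.drop_next = None
                    self.drop_count = 0
                    return pkt
                if self.drop_next is None:
                    # First time above target: give the queue one INTERVAL to recover.
                    self.drop_next = time.monotonic() + INTERVAL
                    return pkt
                if time.monotonic() >= self.drop_next:
                    # Still too slow after the grace period: drop and try the next packet,
                    # dropping a bit more often each time (the control law, roughly).
                    self.drop_count += 1
                    self.drop_next = time.monotonic() + INTERVAL / sqrt(self.drop_count)
                    continue
                return pkt
            return None

fq_codel then adds the FQ part: flows are hashed into their own sub-queues, each managed this way, so a single bulk flow can't build delay for everyone else.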


Funnily enough, there's another excellent article by apenwarr that touches on this a bit. Ethernet is designed to run at line rate and assumes everything on the segment is the same speed. When you disabled pause frames, it likely stopped the switch from interfering and punted the issue back to the TCP stacks of the server and clients, which can back off per flow.

https://apenwarr.ca/log/20170810



