The current approach is fundamentally not going to work in the long term. 100Gbps at line rate means single digits[1] of nanoseconds between frames. At that frequency, a cache miss is pretty bad.
This is all not to mention locks, or the competing housekeeping most distros run (turn off irqbalance completely and watch your forwarding rate increase).
The low-hanging fruit seems to have been picked as well - NAPI polling, interrupt coalescing, RSS + multiqueue NICs + SMP, etc. are already out there, and we're still struggling to do 10G line rate in the kernel... and data centers are moving quickly to 25/100G.
[1] Edited for terrible math - 10Gbps at line rate is 67ns per packet, 100Gbps is 6.7ns
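To make the footnote's arithmetic concrete: the budget comes from the on-wire size of a minimum Ethernet frame (64B frame + 8B preamble/SFD + 12B interframe gap = 84B). A quick sanity check:

```python
# On-wire size of a minimum Ethernet frame:
# 64B frame + 8B preamble/SFD + 12B interframe gap = 84B = 672 bits.
WIRE_BITS = (64 + 8 + 12) * 8

def ns_per_frame(rate_bps):
    """Time budget per minimum-size frame at line rate, in nanoseconds."""
    return WIRE_BITS / rate_bps * 1e9

print(round(ns_per_frame(10e9), 1))   # 10GbE  -> 67.2 ns per frame
print(round(ns_per_frame(100e9), 2))  # 100GbE -> 6.72 ns per frame
```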
We are not struggling to do line-rate 10G in the kernel. Modern 100GbE NICs (Mellanox, Solarflare) have happily done line rate with a stock upstream kernel for a while now (definitely since 4.x). You only need to tune your IRQ balancing, and you can probably get away with not even doing that. If you're buying 100GbE NICs, you're also buying server-class (Xeon, Rome) processors that can keep up.
Source: I operate a CDN with thousands of 100GbE NICs on a stock upstream LTS kernel, with minimal kernel tuning.
Wrong - you're off by an order of magnitude. There is no way the stock Linux kernel will even do 40Mpps at 64-byte packets; it chokes way before that. This is partly why things like DPDK exist.
16 iperf threads... sending at what packet size? Do you understand the notion of line rate? 85Gbps at 1500B is only ~7MPPS, less than half of what 10Gbps at line rate requires (14.88MPPS at 64B).
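A back-of-the-envelope check of those numbers (counting only the frame payload for the 1500B case, as the comment does, and the full 84B on-wire size for the line-rate case):

```python
def mpps(rate_bps, frame_bytes):
    """Packets per second (in millions) at a given rate and frame size."""
    return rate_bps / (frame_bytes * 8) / 1e6

# 85Gbps of 1500B frames, counting only the frame itself:
print(round(mpps(85e9, 1500), 1))  # ~7.1 Mpps

# 10Gbps line rate at minimum frame size (84B on the wire):
print(round(mpps(10e9, 84), 2))    # ~14.88 Mpps
```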
You're both right. It's an older term from the early '90s, when a router's selling point was being able to hit "line rate" with the smallest possible packet size - for example, how many tiny datagrams it could forward to fill the link. Back then, people were still doing lots of routing on general-purpose machines, and Cisco/Juniper were just starting to get into the high-performance game.
These days line-rate just means sending enough traffic to fill the link at whatever rate you want. That's generally good enough for server folk since they just want to get you the cat pics ASAP.
That's not good enough for people running transit networks, since they care more about packets per second performance. Sending huge amounts of data is easy for them; what they really care about is PPS.
Aside: the next generation of router NPUs is trash in terms of PPS performance. I take that back, they're not trash. They're the trash in the dumpster fire. That's how bad they are. We're fairly screwed there.
My guess is GoDaddy was looking at increased PPS performance either for DNS or maybe building their own DDoS mitigation framework (Arbor gear is pricey).
I've never heard this weird qualification for the definition of "line rate" that it somehow requires minimum packet size, so I looked it up. The first three sources for a quoted big-g search all imply or directly state that it's the same as bandwidth:
Line rate does imply PPS at the smallest frame size in the context of networking-equipment performance. Vendors use it extensively in their docs.
64B is the minimum frame size in Ethernet; including the interframe gap and preamble, it's 84B on the wire. That's the same for Ethernet, Gigabit Ethernet, and even 100Gbit Ethernet, so that source is not correct.
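Concretely, the 84B figure is the same at every Ethernet speed; only the line-rate PPS scales with the link (a quick illustration, not from any of the cited sources):

```python
# Minimum on-wire footprint is constant across Ethernet generations:
MIN_FRAME, PREAMBLE_SFD, IFG = 64, 8, 12
WIRE_BYTES = MIN_FRAME + PREAMBLE_SFD + IFG  # 84B at any speed

for name, rate in [("1GbE", 1e9), ("10GbE", 10e9), ("100GbE", 100e9)]:
    pps = rate / (WIRE_BYTES * 8)
    print(f"{name}: {pps / 1e6:.2f} Mpps at line rate")
```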
No, line rate does not imply PPS at the smallest frame size.
Network hardware vendors always quote PPS using the smallest frame sizes, and that makes sense for things like route and switch processors. Perhaps that's what you're confusing it with.
You should reread your link a little more carefully. From your link:
"However it is also important to make sure that the device has the capacity or the ability to switch/route as many packets as required to achieve wire rate performance."
The key phrase there is "as required." Almost nobody needs to sustain forwarding Ethernet frames carrying empty TCP segments or empty UDP datagrams. In fact, many vendors will spec for an average packet size. Since packet size x PPS gives you throughput, a larger average packet size means you need far less PPS to achieve line rate.
Line rate doesn't imply small packets. But most userspace benchmarking uses 64B packets. That being said, the "imix" packet size, which is supposed to represent internet traffic, is around 500B.
There are numerous imix definitions floating around now, it really depends on who you ask. The more realistic ones define a pattern of different sized packets. And performance varies greatly depending on which imix flavour you use, which probably goes back to the earlier poster's 'dumpster fire' comment (although I don't know exactly what NPU generation they are referring to).
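For a sense of scale, one commonly cited "simple IMIX" is a 7:4:1 mix of 40B, 576B, and 1500B packets; the sizes and weights below are that classic flavour, not a definitive spec, and other mixes land on noticeably different averages:

```python
# Classic "simple IMIX": (packet size in bytes, weight in the mix)
simple_imix = [(40, 7), (576, 4), (1500, 1)]

total_weight = sum(w for _, w in simple_imix)
avg_size = sum(size * w for size, w in simple_imix) / total_weight
print(round(avg_size, 1))  # ~340.3B average packet size for this flavour
```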
No, PPS and bandwidth are two distinct metrics. Although there is a linear relationship between them, line rate does not always imply a payload at or near the interface MTU. You see this with network vendors and some of the higher-end gear: vendors always give specs for their gear by quoting both metrics. And some gear is capable of line rate even with small packets - for example, the Cisco ASR 1000:
"For example, because one of the newest Cisco routers, the Cisco ASR 1000 Series Router, is capable of forwarding packets at up to 16 Mp/s with services enabled, it can support the processing of the equivalent of 10 Gb/s of traffic at line rate, with services, even for small packets."[1]
The point of discussion here is that the Linux kernel struggles to do line rate 10Gbps. This was misinterpreted as "the Linux kernel struggles to do 10Gbps".
"""
While RPS steers packets solely based on hash, and thus generally
provides good load distribution, it does not take into account
application locality. This is accomplished by Receive Flow Steering
(RFS). The goal of RFS is to increase datacache hitrate by steering
kernel processing of packets to the CPU where the application thread
consuming the packet is running. RFS relies on the same RPS mechanisms
to enqueue packets onto the backlog of another CPU and to wake up that
CPU.
"""
I have zero problem doing 25G and 40G Ethernet in the Linux kernel on RHEL7 (I run Kubernetes clusters on them, in fact). Non-Ethernet (InfiniBand) 100G+ line rate is also totally doable, but IB is an entirely different beast. I agree with you that it isn't going to work in the long term, but it should be fine in the medium term. The long-term Linux change is likely to be rewriting more and more of the existing filtering code on top of the eBPF VM, which is fast as holy hell and is replacing large swaths of existing filtering mechanisms for a reason.
What if you have no application and you're purely forwarding traffic? That's what I'm measuring here (and that's actually typically faster than passing packets up to user space apps to consume through syscalls).
25G line rate is 37MPPS - are you saying you have zero problem forwarding at that line rate? I'd be very surprised if that's being done in the kernel. I'd be more surprised if you said that you're consuming bytes from the network at that speed with user space apps and no kernel bypass.
For routing, most people prefer to use... routers. Linux isn't great for forwarding and IMO that's fine. As the article notes, there's always VPP when you need to do software routing for some reason.
Yes, and the current user space approaches are spinloop based so the machine isn't really general purpose at that point. You might as well just call it a router if all your cores are spinning on RSS/etc queues.
With one thread, maybe. But using multiple threads it's not even that hard. I've hit 100Gbps using the stock TCP stack and ~10 threads in Rust's hyper without much trouble.
Another example: you can saturate 100Gbps with just 4 iperf3 processes.