
The current approach is fundamentally not going to work in the long term. 100Gbps at line rate means single digits[1] of nanoseconds between frames. At that frequency, a cache miss is pretty bad.

This is all not to mention locks, or that there are competing functions running in most distros (turn off irqbalance completely and watch your forwarding rate increase).

The low hanging fruit seems to have been picked as well - NAPI polling, interrupt coalescing, RSS + multique NICs + SMP, etc, are already out there, and we're still struggling to do 10G line rate in the Kernel...and data centers are moving quickly to 25/100G.

[1] Edited for terrible math - 10Gbps at line rate is 67ns per packet, 100Gbps is 6.7ns
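The footnote's numbers can be reproduced with a quick back-of-the-envelope script, assuming minimum-size frames (84B on the wire, i.e. a 64B frame plus 8B preamble and 12B inter-frame gap):

```python
# Time budget per frame at line rate, for minimum-size Ethernet frames.
# On the wire a minimum frame occupies 84 bytes:
# 64B frame + 8B preamble + 12B inter-frame gap.
WIRE_BYTES = 84

def ns_per_frame(link_bps):
    """Nanoseconds between back-to-back minimum-size frames."""
    return WIRE_BYTES * 8 / link_bps * 1e9

print(f"10G:  {ns_per_frame(10e9):.1f} ns per frame")   # ~67.2 ns
print(f"100G: {ns_per_frame(100e9):.2f} ns per frame")  # ~6.72 ns
```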




We are not struggling to do line-rate 10G in the kernel. Modern 100GbE NICs (Mellanox, Solarflare) have happily done line rate with a stock upstream kernel for a while now (definitely since 4.x); you only need to tune your IRQ balancing, and you can probably get away with not even doing that. If you are buying 100GbE NICs, you are also buying server-class (Xeon, Rome) processors that can keep up.

Source: I operate a CDN with thousands of 100GbE NICs on a stock upstream LTS kernel, with minimal kernel tuning.


You're saying you can forward 100Gbps at line rate (148MPPS) through a stock kernel?


You can get within a few percentage points, yes

I just tested this with two hosts running the 4.14.127 upstream kernel, the upstream mlx5 driver, and a Mellanox ConnectX-5 card, using 16 iperf threads:

[SUM] 0.0-10.0 sec 85.1 Gbits/sec

That's pretty close with no tuning, and well beyond the 10Gb/s we mentioned earlier.


Wrong - you're like an order of magnitude off. There is no way the stock Linux kernel will even do 40Mpps at 64-byte packets; it chokes way before that. This is partly why things like DPDK exist.


16 iperf threads...sending at what packet size? Do you understand the notion of line rate? 85Gbps at 1500B is only 7MPPS, which is half of 10Gbps at line rate.
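The conversion being done here can be sketched in Python (assuming 1538B on the wire for a full 1500B-MTU frame, and 84B on the wire for a minimum-size frame):

```python
def mpps(throughput_bps, wire_bytes):
    """Packets per second, in millions, for a given throughput and on-wire frame size."""
    return throughput_bps / (wire_bytes * 8) / 1e6

# 85 Gbit/s of full-size frames (1538B on the wire):
print(f"{mpps(85e9, 1538):.1f} Mpps")   # ~6.9 Mpps
# 10 Gbit/s line rate at minimum-size frames (84B on the wire):
print(f"{mpps(10e9, 84):.1f} Mpps")     # ~14.9 Mpps
```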


Where are you getting your definitions? I have never seen "line rate" used to refer to packets per second.



"How do you fill a 100GBps pipe with small packets?"

"achieve 10 Gbps line rate at 60B frames"

"reaching line rate on all packet sizes"

Line rate is just bits per second. You have to add in a qualifier about packet size before you're talking about packets per second.


You're both right. It's an older term from the early 90s, when a router's selling point was being able to hit "line rate" with the smallest possible packet size - for example, how many tiny datagrams it could forward to fill that link. Back then, people were still doing lots of routing on general-purpose machines, and Cisco/Juniper were just starting to get into the high-performance game.

These days line-rate just means sending enough traffic to fill the link at whatever rate you want. That's generally good enough for server folk since they just want to get you the cat pics ASAP.

That's not good enough for people running transit networks, since they care more about packets per second performance. Sending huge amounts of data is easy for them; what they really care about is PPS.

Aside: the next generation of router NPUs is trash in terms of PPS performance. I take that back, they're not trash - they're the trash in the dumpster fire. That's how bad they are. We're fairly screwed there.

My guess is GoDaddy was looking at increased PPS performance either for DNS or maybe building their own DDoS mitigation framework (Arbor gear is pricey).


Nope, I'm sorry you're not quite getting it here. Minimum Ethernet frame is 84B on the wire - it's simple enough from there.


I've never heard this weird qualification for the definition of "line rate" that it somehow requires minimum packet size, so I looked it up. The first three sources for a quoted big-g search all imply or directly state that it's the same as bandwidth:

https://blog.ipspace.net/2009/03/line-rate-and-bit-rate.html

https://www.reddit.com/r/networking/comments/4tk2to/bandwidt...

https://www.fmad.io/blog-what-is-10g-line-rate.html

Also, for gigabit networks, ethernet packets are padded to at least 512 bytes because of a bigger slot size: https://www.cse.wustl.edu/~jain/cis788-97/ftp/gigabit_ethern...


Line rate does imply pps at the smallest sized frames in the context of networking equipment performance. Vendors use it extensively in their docs.

64B is the minimum frame size in Ethernet; including the interframe gap and preamble, it's 84B on the wire. It is the same for Ethernet, Gigabit Ethernet, and even 100Gbit Ethernet, so that source is not correct.

https://kb.juniper.net/InfoCenter/index?page=content&id=KB14...


No line rate does not "imply pps at the smallest sized frames."

Network hardware vendors always quote PPS using the smallest sizes, and this makes sense for things like route and switch processors. Perhaps that is what you are confusing.

You should reread your link a little more carefully. From your link:

"However it is also important to make sure that the device has the capacity or the ability to switch/route as many packets as required to achieve wire rate performance."

The key phrase there is "as required." Almost nobody needs to sustain forwarding Ethernet frames with empty TCP segments or empty UDP datagrams in them. In fact many vendors will spec for an average size. Since packet size x PPS will give you your throughput, if the average packet size is larger you need much less PPS to achieve line rate.
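The packet size x PPS relationship can be sketched quickly; the sizes below (84B minimum-on-wire, 512B, and 1538B for a full 1500B-MTU frame) are just illustrative:

```python
def pps_for_line_rate(link_bps, avg_wire_bytes):
    """PPS needed to fill a link at a given average on-wire packet size."""
    return link_bps / (avg_wire_bytes * 8)

# Required PPS to fill a 10 Gbit/s link at various average packet sizes:
for size in (84, 512, 1538):
    print(f"{size:>5}B: {pps_for_line_rate(10e9, size):,.0f} pps")
# The larger the average packet, the fewer packets per second are needed.
```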


Line rate doesn't imply small packets. But most userspace benchmarking uses 64B packets. That being said, the "imix" packet size, which is supposed to represent internet traffic, is around 500B.


There are numerous imix definitions floating around now, it really depends on who you ask. The more realistic ones define a pattern of different sized packets. And performance varies greatly depending on which imix flavour you use, which probably goes back to the earlier poster's 'dumpster fire' comment (although I don't know exactly what NPU generation they are referring to).


Just like the sibling, I admit I’ve never heard this definition of line rate...


No, PPS and bandwidth are two distinct metrics. Although there can be a linear relationship between them, line rate does not always imply a payload equal or close to the interface MTU. You see this with network vendors and some of the higher-end gear. Network vendors always give specs for their gear by quoting both metrics, and there is some gear that is capable of doing line rate even with small packets, for example the Cisco ASR 1000:

"For example, because one of the newest Cisco routers, the Cisco ASR 1000 Series Router, is capable of forwarding packets at up to 16 Mp/s with services enabled, it can support the processing of the equivalent of 10 Gb/s of traffic at line rate, with services, even for small packets."[1]

[1] https://tools.cisco.com/security/center/resources/network_pe...
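A quick sanity check of the quoted figure, assuming minimum-size frames (84B on the wire) for the 10 Gb/s case:

```python
# 10 Gbit/s line rate at minimum-size frames (84B on the wire) needs ~14.88 Mpps.
needed = 10e9 / (84 * 8)
asr1000 = 16e6   # quoted "up to 16 Mp/s" forwarding capacity
print(f"needed: {needed/1e6:.2f} Mpps, ASR 1000: {asr1000/1e6:.0f} Mpps")
print("covers line rate:", asr1000 >= needed)  # True
```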


Does line rate imply smallest packets? For CDN style use cases, you want to use the whole pipe, and it's going to be mostly larger packets.


Yes.

The point of discussion here is that the Linux kernel struggles to do line rate 10Gbps. This was misinterpreted as "the Linux kernel struggles to do 10Gbps".


You’re sending 1500 byte MTU (1538 bytes on the wire) or maybe larger (9000 byte MTU) packets.

1538 bytes is 12,304 bits. 10,000,000,000 bits/sec / 12,304 bits/packet is 812,744 packets per second.

Now try it with 64 byte packets, which are 84 bytes on the wire.

14,880,952 packets per second.

And this is just 10Gbps.
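The arithmetic above can be written out, making the on-wire overhead explicit (a 1500B-MTU frame is 1518B including the 14B header and 4B FCS; preamble and inter-frame gap add 20B more for 1538B on the wire):

```python
# Per-frame on-wire overhead beyond the Ethernet frame itself.
PREAMBLE, IFG = 8, 12

def pps(link_bps, frame_bytes):
    """Frames per second on the wire, including preamble and inter-frame gap."""
    return link_bps / ((frame_bytes + PREAMBLE + IFG) * 8)

print(f"{pps(10e9, 1518):,.0f} pps at 1500B MTU")  # 812,744
print(f"{pps(10e9, 64):,.0f} pps at 64B frames")   # 14,880,952
```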


You briefly mentioned Receive Side Scaling (RSS) with multiqueue network cards, but didn't mention Receive Flow Steering (RFS).

From the kernel docs: https://www.kernel.org/doc/Documentation/networking/scaling....

""" While RPS steers packets solely based on hash, and thus generally provides good load distribution, it does not take into account application locality. This is accomplished by Receive Flow Steering (RFS). The goal of RFS is to increase datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running. RFS relies on the same RPS mechanisms to enqueue packets onto the backlog of another CPU and to wake up that CPU. """

I have zero problem doing 25G and 40G Ethernet in the Linux kernel on RHEL7 (I run Kubernetes clusters on them, in fact). Non-Ethernet (InfiniBand) 100G+ line rate is also totally doable, but IB is an entirely different beast. I agree with you it isn't going to work in the long term, but it should be fine in the medium term. The long-term Linux change is likely to be rewriting more and more of the existing filtering code on top of the eBPF VM, which is fast as holy hell and is replacing large swaths of existing filtering mechanisms for a reason.


What if you have no application and you're purely forwarding traffic? That's what I'm measuring here (and that's actually typically faster than passing packets up to user space apps to consume through syscalls).

25G line rate is 37MPPS - are you saying you have zero problem forwarding at that line rate? I'd be very surprised if that's being done in the kernel. I'd be more surprised if you said that you're consuming bytes from the network at that speed with user space apps and no kernel bypass.

XDP (the forwarding approach built on top of eBPF) is limited to ~20MPPS, as well: https://www.netronome.com/blog/bpf-ebpf-xdp-and-bpfilter-wha...


For routing, most people prefer to use... routers. Linux isn't great for forwarding and IMO that's fine. As the article notes, there's always VPP when you need to do software routing for some reason.


Yes, and the current user space approaches are spinloop based so the machine isn't really general purpose at that point. You might as well just call it a router if all your cores are spinning on RSS/etc queues.


With one thread, maybe. But using multiple threads it's not even that hard. I've hit 100Gbps using the stock TCP stack and ~10 threads in Rust's Hyper without much trouble.

Another example: you can saturate 100Gbps with just 4 iPerf3 processes.



