
The CPU Cost of Networking on a Linux Host - sohkamyung
https://people.kernel.org/dsahern/the-cpu-cost-of-networking-on-a-host
======
ra1n85
For many years, the most popular routing platforms (i.e., boxes built by
Cisco) performed IP packet forwarding and management functions on the same
processor (often a RISC architecture). In cases where the packet rate was
high, it was possible for devices to become unresponsive or to drop the
critical protocols responsible for exchanging routing information.

In the last 15 years there has been a hard move away from these architectures.
Almost no packets are forwarded by the same processors running management and
control plane functions anymore. This is mainly because today's required
traffic rates need dedicated silicon purpose-built for the task (the Broadcom
Tomahawk 3 can forward 12.8 Terabits/sec for anything above a relatively small
packet size).

I don't know how things will shake out for the Linux world and x86 packet
forwarding, given this trend and the lack of real forwarding performance in
the kernel. Right now, your best bet for high network throughput/packet
processing requirements on Linux is to bypass the kernel stack with DPDK, a
"smart" NIC, or XDP.

~~~
EvanAnderson
I wonder if there's an economic argument, at useful scales, to using FPGAs in
general purpose servers to accelerate network performance. The purpose-built
ASICs would win on cost-per-unit every time, but the FPGA would have some
adaptability to new protocols or algorithms that the ASIC wouldn't.

~~~
karatekidd32v
There's some cool tech going on at Barefoot Networks
([https://barefootnetworks.com/](https://barefootnetworks.com/))

I don't know much about the technical details, but the pitch I've heard is
that it gives you ASIC level performance with more flexibility to reprogram
the chip (not full FPGA).

~~~
ra1n85
Yeah, it's an interesting approach. They're basically allowing you to define
packet processing with P4 on their Tofino family of chipsets:
[https://p4.org/](https://p4.org/)

That said, there's only so much you can do in a chip before considerable
tradeoffs have to be made. They're not going to offer the same level of
flexibility you get out of a general-purpose CPU, but they may not have the
same restrictions as most fixed-pipeline chips - their product sits somewhere
in the middle. Also, P4 seems to sit in a space complex enough to make it
unreasonable for most network shops - it's not for your average enterprise or
service provider network.

------
dahfizz
This is one of the reasons I have a strong aversion to "cloud" technologies
like docker and kubernetes. You take networking, something with decades of
development and hardware support, and you put it all in the CPU.

To be clear, Linux has a very robust networking stack. But it will never come
close to the NAT and routing performance of an actual router.

And so we develop things like DPDK to spend _even more_ CPU just to keep
things usable, but it still feels like a big step backwards.

A typical k8s deployment runs in containers that are in VMs. So each packet
you want to send to or from a container needs to touch a cpu and traverse a
networking stack _six times_. That's dumb.

~~~
dan_quixote
> That's dumb.

Is it? It's certainly inefficient compared to dedicated hardware. But so is
anything relying on a CPU - we could just use ASICs for everything. But then
every logical change requires weeks/months/years of development and
manufacturing.

The goal of k8s, VMs, etc is flexibility. I can set up a 100-node k8s cluster
with a less-than-perfectly-efficient networking stack in mere minutes. Good
luck matching that with dedicated hardware.

~~~
dahfizz
> But so is anything relying on a CPU - we could just use ASICs for
> everything. But then every logical change requires weeks/months/years of
> development and manufacturing.

Right, except the ASICs already exist and you're actively choosing the less
efficient, more expensive option.

> The goal of k8s, VMs, etc is flexibility.

I don't think this is necessarily bad, as long as you understand the tradeoff.
You're choosing a fundamentally slower architecture to make management easier.
It's a choice of prioritizing the developer experience over the user
experience.

~~~
oneplane
In this case it's not even user experience vs. developer experience, since you
can have both; it's just that the cost of performance rises as efficiency
decreases.

On the other hand, the cost of development goes down if you don't need to pay
for extra steps taken by an extra person. If you take a 10-step process run by
5 people and reduce it to a 5-step process run by 3, you have 2 more people
free for other work, or roles that you never have to create and fill in the
first place.

------
marsdepinski
Good short article. CPU power saving has an effect on interrupt handling and
becomes a factor at high packet rates. Turning off p-states will improve
performance.
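For concreteness, the knobs people usually reach for look something like the following (illustrative only, assuming an x86 Linux host with the cpupower utility and root access; exact state names and the actual effect vary by platform and should be measured):

```shell
# Pin all cores to the performance frequency governor
cpupower frequency-set -g performance

# Disable idle states with wakeup latency above ~10us, so cores
# service interrupts from shallow C-states at high packet rates
cpupower idle-set -D 10

# Alternatively, limit C-states at boot with kernel parameters, e.g.
#   intel_idle.max_cstate=1 processor.max_cstate=1
```

The tradeoff is higher idle power draw, so these settings are usually reserved for hosts whose job is sustained packet processing.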

------
z3t4
What happens when you "netperf TCP_STREAM from 2 sources"?

