
Snabb Switch – A toolkit for solving novel problems in networking - justincormack
https://github.com/SnabbCo/snabbswitch/wiki
======
fasteo
The title is a bit misleading. This nice product is basically a programmatic
Wireshark in Lua, that is, a packet processor, so you are getting "40 million
packets per second".

Once you do some meaningful work (say, HTTP protocol decoding), this figure
will be a lot lower.

~~~
jvermillard
For HTTP/TCP it's not helping much, but it could be very interesting for IoT
protocols like CoAP (RFC 7252), which is UDP-based.

Anyway I suppose the main target of this project is to help developers of
packet switching and load balancing software.

~~~
chton
If it's easy enough to use, it might be interesting for others too. To give an
example: my own project is a message processing system as a service, intended
for (among others) the massive amounts of data that come from IoT gateways or
devices. While we're not quite there yet, we do intend to be able to handle
loads that would require this kind of performance. In some applications,
millions of messages per second aren't out of the ordinary.

If this kind of library can help us do that, that could save us a lot of time
and effort. Time we could use to work on the next bottleneck :)

~~~
justincormack
It is pretty easy to use, as it is largely scripted in Lua, and the code is
small and easy to follow. It does depend on what you want to do with the data
once you receive it, though.

~~~
chton
Most likely, split it up and feed it to a distributed internal system that can
handle the data with slower processing than the single machine can.

~~~
justincormack
That sounds like a pretty easy application then: just rewrite some addresses
and put the packets back out on the wire again (the hardware should do
checksum offload). Probably a bit more to it, but definitely worth giving
Snabb a go for that.
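A minimal sketch of the address-rewrite step being described, in Python rather
than Snabb's Lua, with a made-up helper name; it rewrites the destination
address of a raw IPv4 packet and zeroes the checksum on the assumption that
the NIC recomputes it (checksum offload), as suggested above:

```python
def rewrite_dest_ip(packet: bytes, new_dst: str) -> bytes:
    """Rewrite the destination address of a raw IPv4 packet (no options,
    no Ethernet header), leaving the checksum to hardware offload."""
    header = bytearray(packet[:20])
    # IPv4 header layout: checksum at bytes 10-11, destination at bytes 16-19.
    header[16:20] = bytes(int(octet) for octet in new_dst.split("."))
    header[10:12] = b"\x00\x00"  # zeroed; the NIC recomputes it on transmit
    return bytes(header) + packet[20:]

# Toy usage: a zeroed 20-byte "header" plus payload.
pkt = bytes(20) + b"payload"
out = rewrite_dest_ip(pkt, "10.0.0.1")
print(out[16:20])  # b'\n\x00\x00\x01'
```

A real forwarder would of course also rewrite the Ethernet addresses and deal
with IP options and fragments; this only shows the header-poking part.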

~~~
chton
Quite a bit more to it, but it does sound like it could be a fit. We'll give
it a try when we get to the point where our regular intake methods aren't
enough :)

------
wingerlang
As usual when there is a word I recognise, I go looking for a Swedish
reference in the docs/founders/etc. And sure enough the guy is currently a
Swedish citizen.

Snabb means fast/quick in Swedish.

~~~
sanityinc
The author, Luke Gorrie, is a true hacker's hacker who built a good chunk of
Emacs' SLIME environment for Common Lisp. He ended up in Sweden to be part of
the first successful Erlang-based business, Bluetail, which got sold to Nortel
IIRC.

(I imagine "Snabb" is closely related to "snappy" in English.)

~~~
signa11
and for Erlang folks on Emacs, Distel is a godsend...

------
dclusin
This is pretty common in the financial trading space. It's usually referred to
as kernel bypass. It's how HFTs are able to achieve < 100 microsecond
tick-to-trade response times.

~~~
mikemoka
This is a very interesting remark; it would be nice to see some more
explorations of this concept applied to current web servers.

~~~
chton
Indeed. I'd also like to know if there are any alternatives that offer similar
performance in an OSS wrapper.

~~~
rasur
Not sure about OSS offerings, but the larger Arista switches come with FPGAs,
which I hear is used by some HFT firms to achieve blinding speeds.

~~~
chton
This is usually the problem, almost every single option I've seen is based on
dedicated hardware. This is fine for many, but not for those of us running in
cloud hosts and other general-purpose locations.

~~~
pjc50
You can't possibly have a de-layered system on virtualised cloud hardware!

Besides, dedicated hosting for a 1U or 2U rack machine is not expensive. If
you have enough traffic that it's worth building your own TCP solution, you
already have enough servers that you're spending a considerable amount of
money on.

~~~
chton
I'm aware of that, but if we could reduce dedicated hosting to a single
machine through simpler code, we can feed the information from there into a
cloud system. It helps a lot if the machine can be a general purpose one,
since that reduces hosting costs, maintenance overhead and risk.

~~~
jamiesonbecker
Definitely -- +1 to a general purpose machine and doing this on normal
hardware without special NICs. However, as soon as you dump the OS, the
machine would no longer be suitable as a shared tenancy/cloud host. It might
be useful for some sort of dedicated service offering (that would be a cool
AWS feature and would allow things like Vyatta that are tied right into the
kernel and SDN), but not for general purpose cloud hosting. The
hypervisor/container/OS are still needed to enforce roles, manage resources,
etc.

~~~
chton
I agree, though as bcoates indicated, cloud hosts are adapting to this demand.
While bare-metal access is indeed very unlikely to happen in a shared tenancy
environment, there will be at least some efforts towards lower-level access.

------
jeffreyrogers
Shouldn't this link directly to GitHub instead?
([https://github.com/SnabbCo/snabbswitch](https://github.com/SnabbCo/snabbswitch))

Also, just in case the author of Snabb Switch is reading this, it would be
really helpful to provide a short example program of how to use this. A brief
look over the documentation didn't show anything. (It's possible I just missed
it, in which case it would be great if you linked to it in the README or
otherwise made it prominent.)

~~~
lukego
loadgen is a good first example program. That is a load generator that can
transmit an arbitrarily large amount of traffic (hundreds of Gbps) from a
trace file using a negligible amount of CPU. This is practical and being used
every day.

[https://github.com/SnabbCo/snabbswitch/tree/master/src/desig...](https://github.com/SnabbCo/snabbswitch/tree/master/src/designs/loadgen)

Here is the driver hack that makes it possible:
[https://github.com/SnabbCo/snabbswitch/blob/master/src/apps/...](https://github.com/SnabbCo/snabbswitch/blob/master/src/apps/intel/loadgen.lua)

------
jsnell
This is a very strange article, since the quoted text appears to have nothing
at all to do with Snabb except for linking to the website. Instead it's
describing some other closed source system that's using a similar networking
layer, and that's also written in Lua.

~~~
dang
Yes, and the quote was from HN itself, which doesn't make for a new HN post.
We changed the url to that of the project.

------
lukego
Snabb NFV is the main software being built on the Snabb Switch base right now:
[http://snabb.co/nfv.html](http://snabb.co/nfv.html)

------
nnx
How does this compare to netmap?

[http://info.iet.unipi.it/~luigi/netmap/](http://info.iet.unipi.it/~luigi/netmap/)

netmap / VALE is a framework for high speed packet I/O.

Implemented as a kernel module for FreeBSD and Linux enabling memory mapped
access to network devices.

~~~
deadgrey19
One of the serious drawbacks of netmap is that it amortizes, but does not
eliminate, system call latency. This means that although it can handle high
packet rates, processing latency is dominated by the system call overhead
(typically about 5 us). This may not sound like much, but on a 10 Gb/s network
(1 bit = 0.1 ns), 5 us = 50,000 bits, or roughly 6 KB. For small packets, this
means that you have to batch very aggressively (about 150 packets at a time)
in order to keep up.

------
anarcticpuffin
I'd like to know how the 26 ns wire-to-wire latency was measured. As far as I
know, just handling the PHY layer on the NIC takes at least ~300 ns. Likely
the authors are inferring latency from 1/<throughput>, which is mistaken at
these levels.
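If the figure was indeed derived by inverting throughput (my guess, following
the comment above), the arithmetic is easy to reproduce: the headline rate of
40 Mpps gives roughly the quoted number, but it is a per-packet budget with
many packets in flight at once, not a wire-to-wire latency:

```python
PPS = 40e6                 # the headline 40 million packets per second
per_packet_ns = 1e9 / PPS  # time budget per packet at that sustained rate
print(per_packet_ns)       # 25.0 -- ns per packet, close to the quoted 26 ns
```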

~~~
deadgrey19
Typical capture cards have a resolution of 6-7 ns, and 300 ns for the PHY is
an out-of-date value. These guys can do ~425 ns wire to application (or 850 ns
round trip to software): [http://exablaze.com/exanic-x4](http://exablaze.com/exanic-x4)

------
bajsejohannes
I watched the short introduction video [1], where they say that to get the
fastest throughput, you need to pull everything up out of the kernel and into
user space, which is what they are doing with the network drivers. Won't you
still need to talk to the kernel when you actually want to read/write from the
network? Or is part of this that you take complete control of the NIC?

[1] Look for snabb switch here:
[https://cast.switch.ch/vod/channels/2i5k459xe3](https://cast.switch.ch/vod/channels/2i5k459xe3)

~~~
corysama
(Disclaimer: All I know of Snabb is from reading the web site and a few
discussion group posts)

Because Snabb is specialized around a few specific NICs and virtual IO
interfaces with specific features, it is able to do things like set up the
network hardware to be memory mapped. As in, after some setup, Snabb can bang
on the bits of a specific memory range and that will change bits in the
network card state without the OS or even the driver being involved. This
means you are not calling send() or select(), you are dealing directly with
the Intel NIC hardware interface.

It sounds a lot more like game console engine programming than node.js
programming. As a game console engine programmer, that's pretty interesting to
me :)
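The pattern described above -- map the device's register space once, then talk
to it with plain loads and stores -- can be sketched with Python's mmap. This
is a speculative stand-in, not Snabb's actual code: an ordinary file plays the
role of the PCI BAR, and the register offset and value are invented.

```python
import mmap
import os
import tempfile

# Hypothetical "device control" register offset; a real driver would take
# offsets like this from the NIC datasheet.
REG_CTRL = 0x0000

# Stand-in for the mapped PCI BAR. A userspace driver would instead mmap
# something like /sys/bus/pci/devices/0000:01:00.0/resource0.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

with mmap.mmap(fd, 4096) as bar:
    # Once the mapping exists, "talking to the device" is a plain memory
    # store/load -- no send(), no select(), no per-operation kernel call.
    bar[REG_CTRL:REG_CTRL + 4] = (0x04000000).to_bytes(4, "little")
    value = int.from_bytes(bar[REG_CTRL:REG_CTRL + 4], "little")

os.close(fd)
os.unlink(path)
print(hex(value))  # 0x4000000
```

With real hardware the stores also have side effects (ringing doorbells,
kicking DMA), which is why drivers care about write ordering and volatile
access; none of that shows up in the file-backed stand-in.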

------
Ecio78
Relevant comment from the author (lukego) in this thread from a few months ago:
[https://news.ycombinator.com/item?id=7250505#up_7251722](https://news.ycombinator.com/item?id=7250505#up_7251722)

"I'm the Snabb Switch originator.

The project is new: I and other open source contributors are currently under
contract to build a Network Functions Virtualization platform for Deutsche
Telekom's TeraStream project [1] [2]. This is called Snabb NFV [3] and it's
going to be totally open source and integrated with OpenStack.

Currently we are doing a lot of virtualization work from the "outside" of the
VM: implementing Intel VMDq hardware acceleration and providing zero-copy
Virtio-net to the VMs. So the virtual machine will see normal Virtio-net but
we will make that operate really fast.

Inside the VMs we can either access a hardware NIC directly (via IOMMU "PCI
passthrough") or we can write a device driver for the Virtio-net device.

So, early days, first major product being built, and lots of potential both
inside and outside VMs, lots of fantastic products to build with nobody yet
building them :-)"

------
bcoates
How does Snabb Switch compare to the Linux kernel's openvswitch/openflow
combination? Are there any advantages to Snabb for forwarding/counting
appliances or is it just for packet-consuming applications?

------
haddr
Does it wrap around some sort of userspace TCP stack? I see that the
performance figures really match these kinds of systems...

~~~
justincormack
There was some work done, see
[https://groups.google.com/forum/#!msg/snabb-devel/2yF5LZ-VS1...](https://groups.google.com/forum/#!msg/snabb-devel/2yF5LZ-VS10/DxEifZXuIpkJ);
it's not the primary focus, but I am likely to do some work on it later.

------
personZ
_The disparity between what the OS can do and what the hardware is capable of
delivering is off by a few orders of magnitude right now. It's downright
ridiculous how much performance we're giving up for supposed "convenience"
today._

What are they talking about?

~~~
justincormack
The operating system APIs (sockets) basically cannot deliver to/from userspace
anything like what the hardware can do. The classic example is 10Gb Ethernet,
where if you want to do something like packet switching, you cannot get
anywhere near line rate, due to the cost of context switches, memory copies,
scheduling, cache misses in the stack layers, and so on.

Given that servers are getting 10Gb, 40Gb or more, this is becoming more and
more of an issue. The hardware is capable of it, but the abstractions are
getting in the way.
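A rough way to put numbers on the per-call cost being described, assuming a
POSIX system (Python's own call overhead inflates the figure, but the order of
magnitude is the point):

```python
import os
import time

# Time many tiny write() calls to /dev/null to get a rough feel for
# per-syscall overhead. Absolute numbers vary by machine and kernel.
fd = os.open("/dev/null", os.O_WRONLY)
N = 100_000
start = time.perf_counter()
for _ in range(N):
    os.write(fd, b"x")
elapsed = time.perf_counter() - start
os.close(fd)

per_call_ns = elapsed / N * 1e9
# A 10GbE link delivers a minimum-size frame every ~67 ns
# (84 wire bytes * 8 bits * 0.1 ns/bit), so even one cheap syscall per
# packet cannot keep up, before any real work is done on the packet.
print(f"~{per_call_ns:.0f} ns per write() call")
```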

~~~
personZ
There is an OS overhead for sure (although you can mitigate that with things
like zero-copy solutions), but it generally imposes a couple of percentage
points of overhead. I was thrown off by the several-orders-of-magnitude claim.

Reading it again, it sounds more holistic -- that a simplified, monolithic
application with a local, lightweight database is much higher performance
than, for instance, an n-tier, distributed database type solution.

~~~
justincormack
It is not just a few percent for applications where userspace needs to look at
each packet. There are some performance comparisons on the netmap page for
example
[http://info.iet.unipi.it/~luigi/netmap/](http://info.iet.unipi.it/~luigi/netmap/)

~~~
personZ
That comparison is of questionable merit. It compares bulk, unmetered packet
generation (in the OS, worth noting, just as a kernel module) against netsend,
which is intentionally a rate-limited generator (using busy waits, as an
aside, which means the CPU was probably at 100%... doing nothing), with the
significant overhead that entails.

 _where userspace needs to look at each packet_

That benchmark is like those naive "look how fast my web server is when it
returns just status code 200" comments. Even if we accepted that the overhead
was anywhere close to the linked, which it isn't, the moment you actually do
something with the packets those savings disappear into rounding errors.

~~~
lrizzo
netsend and the other apps used in that comparison were not rate limited.
Surely that test only measures the system's overheads, but I put that
disclaimer very clearly in all papers and talks (btw, "which it isn't"
suggests that you have different numbers, so please let us know). Surely,
sometimes these per-packet savings are irrelevant, but there are a number of
use cases where this kind of saving matters a lot. This is true for netmap,
DPDK, DNA and all the other network-stack-bypass frameworks.

In passing: people typically use the term 'OS bypass', but netmap relies on
the OS for protection, synchronization, memory management, etc. -- all things
that the OS does well and I find no reason to reinvent.

~~~
personZ
Unless you used a specialized, customized version of netsend, yes, it was rate
limited -- by default it waits on the clock interval, and warns you if you try
to call it in defiance of that. Further, it is endlessly calling system
functions to get the time.

As a test of max throughput, it is a horrible one. I don't have the motivation
to prove it, but I would be surprised if more than 5% of the CPU load actually
went towards networking, with the rest going to time calculations and interval
tests.

------
dang
Url changed from
[http://highscalability.com/blog/2014/2/13/snabb-switch-skip-...](http://highscalability.com/blog/2014/2/13/snabb-switch-skip-the-os-and-get-40-million-requests-per-sec.html),
which points to this.

