
Streaming video on 10 Gigabit Ethernet and beyond - howsilly
http://www.bbc.co.uk/rd/blog/2015/10/streaming-video-on-10-gigabit-ethernet-and-beyond
======
mixmastamyk
Interesting, reminds me of a related question. I've looked recently for 10 gig
ethernet on a new laptop and haven't been able to find it.

I know it is overkill, but it has been about ten years already. Isn't it
cheap enough yet? Can't a modern SSD keep up with it?

~~~
_wmd
Probably a combination of lack of need until recently (125MB/sec is more than
fine to saturate a classic disk), parts cost and power consumption.

I can't think of anything except connecting to a _fast_ SAN that would require
a 10GBE port in a laptop. Maybe something specialized for a network engineer,
but even then it's probably easier to buy dedicated equipment for line-rate
port monitoring.

~~~
jamesblonde
A very cheap RAID5 setup with 4 spinning disks and a filesystem like ZFS or
btrfs should get you about 300-500 MB/s, so 10 GbE is good for that setup on
nodes.

~~~
trentnelson
The only way you're getting 300-500 MB/s on a 4-disk ZFS RAID5/raidz is with
very, very fast SSDs and a very, very fast CPU. Not exactly "very cheap". (The
compute and I/O overhead for raidz is significant.)

~~~
illamint
This is... not my experience at all. I have a RAIDZ2 comprised of 6 4TB
Seagate drives (the 5900RPM variety) and it can do about 600MBps read/write
with moderate CPU usage on an Intel i3. A mirrored zpool of two Samsung 850
EVO SSDs can do nearly 1GBps read/write. That's not a particularly expensive
setup.

~~~
trentnelson
600MB/s write over 6 drives in RAID-Z2 is very good on an i3.
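
For reference, the throughput figures quoted in this subthread can be put next to the link rates with a small conversion. This is only a rough sketch: it assumes decimal units (1 MB = 10^6 bytes), ignores protocol overhead, and the helper names are made up for illustration.

```python
# Rough unit conversion between storage throughput and Ethernet link rate.
# Assumes decimal units (1 MB = 10**6 bytes) and ignores protocol overhead.

def link_capacity_mb_per_s(gbit_per_s: float) -> float:
    """Raw capacity of a link in MB/s."""
    return gbit_per_s * 1000 / 8


def mb_per_s_to_gbit(mb_per_s: float) -> float:
    """Throughput in MB/s expressed as Gbit/s."""
    return mb_per_s * 8 / 1000


# Figures quoted by the commenters above.
quoted = {
    "4-disk RAID5, low estimate": 300,
    "4-disk RAID5, high estimate": 500,
    "6-disk RAIDZ2": 600,
    "mirrored SSD pair": 1000,
}

if __name__ == "__main__":
    print(f"1 GbE carries at most {link_capacity_mb_per_s(1):.0f} MB/s")
    print(f"10 GbE carries at most {link_capacity_mb_per_s(10):.0f} MB/s")
    for label, mb in quoted.items():
        print(f"{label}: {mb} MB/s = {mb_per_s_to_gbit(mb):.1f} Gbit/s")
```

All of the quoted array speeds exceed what 1 GbE can carry and fit within 10 GbE.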

------
maxhou
from the article:

> Each core needs to generate a few thousand data packets per second, because
> Ethernet packets typically contain up to 1500 bytes. This gives the CPU
> around 100 microseconds to process each packet.

No it doesn't, not when using TCP Segmentation Offload (TSO).

This only works for a particular use-case: sending static data using TCP, but
this is the most common use-case since a typical "video streaming server" is
actually a simple HTTP server that serves static MP4/MPEG-TS data.
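
A back-of-the-envelope sketch of why TSO changes the budget: with segmentation done by the NIC, the host hands over large buffers instead of individual 1500-byte packets, so it handles far fewer units per second. The 64 KiB buffer size below is an assumption chosen for illustration (the comment only says "> 10kB"), and headers, preamble and inter-frame gaps are ignored.

```python
# How many units per second must be handled to fill a 10 Gbit/s link?
# Ignores Ethernet/IP/TCP header overhead, preamble and inter-frame gap.

LINK_BITS_PER_S = 10e9      # 10 Gigabit Ethernet
PACKET_BYTES = 1500         # typical Ethernet payload size, as in the quote
TSO_BUFFER_BYTES = 64 * 1024  # assumed size of one buffer handed to the NIC


def units_per_second(link_bits_per_s: float, bytes_per_unit: int) -> float:
    """Units of the given size needed per second to saturate the link."""
    return link_bits_per_s / (bytes_per_unit * 8)


wire_packets = units_per_second(LINK_BITS_PER_S, PACKET_BYTES)
host_buffers = units_per_second(LINK_BITS_PER_S, TSO_BUFFER_BYTES)

if __name__ == "__main__":
    print(f"packets on the wire per second: {wire_packets:,.0f}")
    print(f"buffers handed to the NIC per second: {host_buffers:,.0f}")
    print(f"reduction in per-unit host work: {wire_packets / host_buffers:.1f}x")
```

Under these assumptions the wire still carries roughly 830,000 packets per second, but the CPU only touches one buffer for every few dozen of them.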

For each connected client, this is what happens:

- nginx/apache calls sendfile(file, sock, off, <large_number>)
- the kernel issues a large (> 10kB) DMA read to the file storage backend
  into a set of memory pages and waits for completion
- the kernel allocates/clones a small IP/TCP header (40 bytes)
- the kernel gives that small header + set of memory pages to the network
  card, which segments them into 1500-byte packets and sends them on the wire

If you have a lot of RAM, the read from storage can even be skipped, because
previously read data pages are kept in the page cache with an LRU approach.
(This helps if clients are requesting the same file.)

you can easily saturate a 10G link with spare CPU cycles on cheap hardware
with that approach, no need to bypass anything.
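
The sendfile path described above can be sketched from user space like this. It is a minimal illustration, not what nginx or apache actually do: `serve_file` and `demo` are hypothetical names, and the demo uses a local socket pair instead of a real NIC, so no TSO is involved. Python's `socket.sendfile()` uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise.

```python
import os
import socket
import tempfile
import threading


def serve_file(path: str, conn: socket.socket) -> int:
    """Send the whole file over a blocking stream socket.

    The file contents are not read into a Python buffer when the
    kernel-level sendfile path is available.
    """
    with open(path, "rb") as f:
        return conn.sendfile(f)


def demo(size: int = 1 << 20) -> bool:
    """Round-trip a temporary file through a local socket pair."""
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    server, client = socket.socketpair()
    received = bytearray()

    def drain() -> None:
        # Read until the sending side closes its end.
        while chunk := client.recv(65536):
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sent = serve_file(path, server)
    finally:
        server.close()   # signals end of stream to the reader
        reader.join()
        client.close()
        os.unlink(path)
    return sent == size and bytes(received) == payload


if __name__ == "__main__":
    print("round trip ok:", demo())
```

The server side makes a single call per file and never touches the payload bytes itself, which is the property the comment relies on.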

------
nly
Kind of arcs back to the days when people were putting HTTP servers in kernel
space. Slightly different tack though.

~~~
gonzo
but the same result, reduce copies.

~~~
nimish
Reducing context switches is probably just as important.

~~~
nly
Context switches into kernel land are a lot cheaper on x86 than they used to
be.

------
jand
My state of knowledge leads me to think that bypassing the kernel requires
some non-blob network drivers with which you can tinker around. Am I mistaken?

So right now, I am missing the information on what kind of NIC they were
using. Any thoughts or comments on that, HN community?

What vendor and product model would be a reasonable entry point for such
endeavours? Answers very much appreciated.

~~~
xtacy
In the past, some of my colleagues have used Intel's 82599 NICs for kernel
bypass. Their Linux driver is quite good, they have a DPDK platform for
developing user-space apps to directly access ring buffers on the NIC, and if
you do a quick search, you should be able to find examples online.

Cloudflare wrote a blog post recently about accelerated packet IO and their
post mentions the 82599 NIC:
[https://blog.cloudflare.com/kernel-bypass/](https://blog.cloudflare.com/kernel-bypass/).

~~~
gonzo
via netmap, yes.

------
p1esk
Is a single CPU core able to process a 4k/50fps video stream? Or is there no
need for any processing, other than encapsulating it into data packets for
sending to the network card?

~~~
revelation
Clearly not, which is exactly why kernel bypass is such a huge red flag: it's
only possible (rather, useful) if you aren't doing anything sensible with the
data anyway, or the kernel overhead would be tiny compared to the processing.

Use the right tool for the job and don't funnel network data through your
instruction pipeline. When they realized that for memory, they called it
"DMA", and when graphics was scaling up, we created the GPU.

~~~
greglindahl
The networks used by supercomputers have had kernel bypass with DMA for more
than a decade... and they do a lot of processing on the data, too. Check out
Infiniband and Intel's Omni-Path for modern examples. Or the Cray T3E (1995),
which had an excellent network with a user-level DMA engine that only did 8-
or 64-byte transfers.

------
pcunite
Linus Tech Tips talked about the difficulties in getting 10 Gigabit working.

[https://youtu.be/D03t890dKTU](https://youtu.be/D03t890dKTU)

------
kierank
All this talk about TCP and HTTP streaming when they are actually trying to do
low-latency UDP for broadcast production.

(When you have a hammer (web technologies) everything looks like a nail I
guess)

