Introduction to Linux interfaces for virtual networking (2018) (redhat.com)
195 points by teleforce on Sept 21, 2023 | 13 comments



I think it's worth also mentioning SR-IOV.

Basically, a single hardware device like a network card pretends to be a whole bunch (say 16) of virtual devices. Each virtual device can be passed through to a guest VM as a PCIe device, and the guest handles it as real hardware. So your Windows VM will need the Broadcom driver or whatnot, rather than the VirtIO one.

Why do this? Partly because it turns out that putting your VM host's hardware interface into a Linux software bridge disables part of the hardware acceleration. This can actually make it so that you can't reach the full bandwidth of the device. On slower CPUs this may mean you can't get to 10 Gbps.

Partly because there's overhead in the VM transition, and SR-IOV greatly reduces that as well.

I also like that it doesn't need you to fiddle with the network configuration on the host.

It's well supported, including on some consumer motherboards, but you have to do a bunch of fiddling in the BIOS config to enable it.
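
For anyone wondering what creating the virtual functions looks like on a Linux host: the kernel exposes the VF count through sysfs files named sriov_totalvfs and sriov_numvfs under the PCI device. Here is a minimal Python sketch, assuming a NIC and driver that support SR-IOV. The function name set_vf_count and the sysfs_root parameter are my own, added so the logic can be tried against a fake directory tree without touching real hardware.

```python
from pathlib import Path


def set_vf_count(iface: str, count: int, sysfs_root: str = "/sys") -> int:
    """Ask the driver for `count` virtual functions on `iface`; return that count."""
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"

    # sriov_totalvfs is the maximum number of VFs the device advertises.
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= count <= total:
        raise ValueError(f"{iface} supports at most {total} VFs, asked for {count}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == count:
        return count
    if current != 0:
        # The kernel rejects changing one non-zero VF count to another,
        # so reset to 0 before requesting the new count.
        numvfs.write_text("0")
    if count != 0:
        numvfs.write_text(str(count))
    return count
```

On a real host this needs root, e.g. set_vf_count("eth0", 4), where "eth0" is a placeholder for your physical interface. The VFs then show up as extra PCIe devices that can be handed to guests.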


Christophe Massiot had a great talk at FOSDEM last year about the pros and cons of different network options including SR-IOV in a VM environment, especially the challenges of multicast

https://archive.fosdem.org/2023/schedule/event/om_virt/


Oh my god, thank you. I've been trying to figure out why my VM to VM bandwidth is capped at 30Gbit. I'm using multi-threaded iperf to benchmark, so it doesn't seem to be a data generation or consumption bottleneck. I'm going to have to do a bit more experimenting.

If both VMs are on the same host, is there any way to essentially achieve RDMA? VM1 says to VM2, "It's in memory at this location", and VM2 just reads directly from that memory location without a copy by the CPU?

I'm no expert, obviously, but I fail to see why VM to VM memory operations should be slower than RAM sans some latency increase due to setting up the operation.


There is something like this for QEMU on Linux hosts. It's called "Inter VM SHared MEMory" (IVSHMEM) [1].

Hyper-V has something akin to this that they call "Hyper-V sockets" [2]. But it seems it only works between guest and host.

[1] https://www.qemu.org/docs/master/system/devices/ivshmem.html

[2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-...
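
To make the "it's in memory at this location" idea concrete, here is a small Python analogy. It is not the IVSHMEM API and involves no VMs. Two independent mappings of the same file stand in for two guests sharing one host memory region, and the helper names open_shared and demo are made up for this sketch.

```python
import mmap
import os
import tempfile


def open_shared(path: str, size: int) -> mmap.mmap:
    """Map `size` bytes of `path` read/write; on Unix, mmap defaults to MAP_SHARED."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)  # make sure the backing file is large enough
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def demo() -> bytes:
    """Write through one mapping and read the same bytes back through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "region")
        writer = open_shared(path, 4096)  # stands in for VM1
        reader = open_shared(path, 4096)  # stands in for VM2
        writer[:5] = b"hello"             # VM1 puts data in the shared region
        data = bytes(reader[:5])          # VM2 sees it without a send/receive step
        writer.close()
        reader.close()
        return data
```

The part this leaves out is the hard part in practice: the two sides still need some way to signal each other that data is ready.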


I think it should be doable, because my BIOS has an option to enable/disable RDMA under SR-IOV. I've not tried messing with it yet though.


Where can one learn more about this (using books)? For instance, if I would like to learn more about processes, TLBs, and context switching, I know I can learn about them from Tanenbaum's book, from the OSTEP book, from the "dinosaur book", etc. But I have no idea which book provides the fundamentals of Linux interfaces for virtual networking.


This reminds me of reading "The Linux Programming Interface" (TLPI) [1]. It's very informative, like "The Unix Programming Environment" (TUPE), but it focuses only on Linux, without a lot of caveats about other Unixes. The latest edition is from 2010, though, so it now seems a bit old.

[1]: https://man7.org/tlpi/


bridge: I wish there was decent documentation on how to configure & use Linux's vlan-aware bridge functionality ('bridge vlan' command). I understand it means that you don't have to create separate vlan interfaces but I've never found decent documentation on how to configure it.

macvlan: is VEPA mode still a thing that people use, or did it not take off in terms of switches that support it? Last time I looked I didn't find anything newer than about 10 years old that talked about VEPA, but maybe I suck at searching.

macvlan: 'bridge' mode sounds really convenient, but if you try it you'll find that the host can't communicate with macvtap interfaces.

macvtap: does this suffer from the same limitation as macvlan in 'bridge' mode (host can't communicate with macvtap interfaces)?


I found networking/switchdev.txt very approachable. It documents all those different bridge+vlan (aware/unaware) configs you can do: https://docs.kernel.org/networking/switchdev.html

In particular I've learned from that doc that there's special handling for putting a vlan device on top of a bridge (br0.123) even if the bridge is vlan unaware.

DSA might also be relevant if you're working with hardware that supports it: https://docs.kernel.org/networking/dsa/dsa.html
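
For the 'bridge vlan' question above, here is a rough sketch of the iproute2 commands for a vlan-aware bridge, written as a small Python helper that only builds the command strings and runs nothing. The helper name vlan_bridge_commands and the interface names are placeholders I made up. The ip and bridge invocations are the usual iproute2 ones, but treat the overall layout as one possible setup rather than the recommended one.

```python
from typing import Dict, List


def vlan_bridge_commands(
    bridge: str,
    access_ports: Dict[str, int],
    trunk_ports: Dict[str, List[int]],
) -> List[str]:
    """Build iproute2 commands for one vlan-aware bridge (nothing is executed)."""
    cmds = [
        # vlan_filtering 1 is what makes the bridge vlan-aware.
        f"ip link add name {bridge} type bridge vlan_filtering 1",
        f"ip link set dev {bridge} up",
    ]
    for port in sorted(set(access_ports) | set(trunk_ports)):
        cmds.append(f"ip link set dev {port} master {bridge}")

    for port, vid in sorted(access_ports.items()):
        # New bridge ports start out in VLAN 1 (pvid, untagged); drop that
        # membership so the access port carries only its own VLAN.
        cmds.append(f"bridge vlan del dev {port} vid 1")
        # pvid: untagged ingress frames land in this VLAN.
        # untagged: frames leave this port without a tag.
        cmds.append(f"bridge vlan add dev {port} vid {vid} pvid untagged")

    for port, vids in sorted(trunk_ports.items()):
        for vid in vids:
            # Without pvid/untagged the VLAN is carried tagged on this port.
            # The default VLAN 1 membership is left as-is on trunk ports here.
            cmds.append(f"bridge vlan add dev {port} vid {vid}")
    return cmds
```

For example, vlan_bridge_commands("br0", {"eth0": 10}, {"eth1": [10, 20]}) gives the commands for an access port in VLAN 10 and a trunk carrying VLANs 10 and 20 tagged. Running "bridge vlan show" afterwards lists the resulting membership per port.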


It looks like it never took off[1]; I was working on this exact topic this week and just went with multiple vlan interfaces on the host and bridged in container interfaces via multus. Would love to know if there's a better practice floating around these days.

[1]: https://blog.ipspace.net/2012/02/edge-virtual-bridging-8021q...


Never heard of netdevsim before, and not much documentation shows up in a quick search. It seems like a tool for developing control plane solutions in the absence of a NIC / driver that supports hardware offloading?


Very helpful and interesting guide.

OpenWRT (what I use) also allows such interfacing using config files. It’s Linux anyway though.


Nice. I feel like I need to read something foundational and unshifting like this to better grok cloud networks.



