XDP for game programmers (mas-bandwidth.com)
109 points by gafferongames 11 months ago | 88 comments



I'm calling eBPF "kernel shaders" to confuse graphics, OS, and GPGPU people all at the same time.


A decade and a half ago there was PacketShader, which used usermode networking plus GPUs to do packet routing. It was a thrilling time. I thought this kind of effort to integrate off-the-shelf hardware and open source software-defined networking (SDN) was only going to build and amplify. We do have some great SDN, but it remains a fairly niche world that stays largely behind the scenes. https://shader.kaist.edu/packetshader/index.html

I wish someone would take up this effort again. It'd be awesome to see VPP or someone target offload to GPUs again. It feels like there's a ton of optimization we could do today based around PCIe P2P, where the network card could DMA directly to the GPU and back out without having to transit main memory or the CPU at all; lower latency and very efficient. It's a long leap and a long hope, but I very much dream that CXL eventually brings us closer to that "disaggregated rack" model where a less host-based fabric starts disrupting architecture, creating more deeply connected systems.

That said, just dropping an FPGA right on the NIC is probably/definitely a smarter move. Seems like a bunch of hyperscalers do this. Unclear how much traction Marvell/Nvidia get from BlueField being on their boxes, but it's there. Actually using the FPGA is hard, of course. Xilinx/AMD have a track record of kicking out some open source projects that seem interesting but don't seem to have any follow-through. Nanotube, an XDP offload engine, seemed brilliant, like a sure win. https://github.com/Xilinx/nanotube and https://github.com/Xilinx/open-nic .


What about Nvidia / Mellanox / Bluefield?

It looks like they have some demo code doing something like that. https://docs.nvidia.com/doca/archive/doca-v2.2.1/gpu-packet-...

What kind of workloads do you think would benefit from GPU processing?


I've been thinking about it exactly this way for a long time. Actually, once we're able to push computation down into our disk drives, I wouldn't be surprised if these "disk shaders" are written in eBPF.


It's already a thing on mainframes; disk shaders are called channel programs: https://en.m.wikipedia.org/wiki/Channel_I/O#Channel_program


Thank you for this beautiful rabbit hole to chase.


Totally, the sibling comment confirms it already exists! I hope that the 'shader' name sticks too! I find the idea of a shader has a very appropriate shape for tiny-program-embedded-in-specific-context, so it seems perfect from a hacker POV!

I have a VFX (Houdini now, RSL shaders etc. earlier), OpenCL-dabbling and demoscene-lurking background, based on which I think I prefer 'shader' to 'kernel', which is what OpenCL calls them... but 'kernel' conflicts with the name of, like, 'the OS kernel', at least somewhat.


I read that as eBNF and was very confused


This analogy works well when trying to describe how eBPF is used for network applications. The eBPF scripts are "packet shaders" - like pixel shaders, they are executed for every packet independently and can modify attributes and/or payload according to a certain algorithm.
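To make the analogy concrete, here's a minimal sketch of what such a "packet shader" can look like in restricted C, compiled with clang to BPF bytecode and attached at the XDP hook. The port number (40000) and the 32-byte minimum payload are made-up values for illustration only:

  // Minimal XDP "packet shader" sketch: runs once per incoming packet, in the kernel,
  // before the normal network stack sees it. Port 40000 and the 32-byte minimum
  // payload length are hypothetical values chosen just for this example.
  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/in.h>
  #include <linux/ip.h>
  #include <linux/udp.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("xdp")
  int packet_shader(struct xdp_md *ctx)
  {
      void *data     = (void *)(long)ctx->data;
      void *data_end = (void *)(long)ctx->data_end;

      struct ethhdr *eth = data;
      if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
          return XDP_PASS;

      struct iphdr *ip = (void *)(eth + 1);
      if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP || ip->ihl != 5)
          return XDP_PASS;

      struct udphdr *udp = (void *)(ip + 1);
      if ((void *)(udp + 1) > data_end)
          return XDP_PASS;

      /* Drop runt packets aimed at the (hypothetical) game port; everything
         else continues on the regular kernel path. */
      if (udp->dest == bpf_htons(40000) &&
          bpf_ntohs(udp->len) < sizeof(struct udphdr) + 32)
          return XDP_DROP;

      return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";

Like a pixel shader, it has no view of other packets or global state unless you explicitly add one (via BPF maps).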


The name “shader” screwed me up for so long. But once I better understood what they really are, I think they're incredibly powerful. “Kernel shader” is amazing.


I love this name, I hope it catches on


Seems to have worked on me! Well played! :)


not "koroutines"? I like "kernel shaders" though.


I'm stealing this.


I've actually written a high-performance metaverse client, one that can usefully pull half a gigabit per second and more from the Internet. So I get to see this happening. I'm looking at a highly detailed area right now, from the air, and traffic peaked around 200Mb/s. This XDP thing seems to address the wrong problem.

Actual traffic for a metaverse is mostly bulk content download. Highly interactive traffic over UDP is maybe 1MB/second, including voice. You're mostly sending positions and orientations for moving objects. Latency matters for that, but an extra few hundred microseconds won't hurt. The rest is large file transfers. Those may be from totally different servers than the ones that talk interactive UDP. There's probably a CDN involved, and you're talking to caches. Latency doesn't matter that much, but big-block bandwidth does.

Practical problems include data caps. If you go driving around a big metaverse, you can easily pull 200GB/hour from the asset servers. Don't try this on "AT&T Unlimited Extra® EL". Check your data plan.

The last thing you want is game-specific code in the kernel. That creates a whole new attack surface.


I don't know about how well this solves any game programmer's problem, but the attack surface thing --- modulo the kfunc trick --- doesn't seem real: eBPF programs are ruthlessly verified, and most valid, safe C programs aren't accepted (because the verifier can't prove every loop in them is bounded and every memory access is provably bounded). It's kind of an unlikely place to expect a vulnerability, just because the programming model is so simplistic.
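For a sense of how strict that is: the verifier only accepts loops it can prove terminate, so even trivial packet-scanning code has to be written with an explicit bound. A hedged sketch (the 64-byte cap is arbitrary, just to make the point):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int count_ff_bytes(struct xdp_md *ctx)
  {
      void *data     = (void *)(long)ctx->data;
      void *data_end = (void *)(long)ctx->data_end;
      __u8 *p = data;
      int count = 0;

      /* A plain `while (p < data_end) { ... p++; }` with no iteration cap is
         rejected: the verifier can't prove it terminates. */

      #pragma unroll                 /* not needed on kernels >= 5.3, which can
                                        verify bounded loops directly */
      for (int i = 0; i < 64; i++) {
          if ((void *)(p + 1) > data_end)   /* every access must be provably in bounds */
              break;
          if (*p == 0xFF)
              count++;
          p++;
      }

      bpf_printk("ff bytes in first 64: %d", count);
      return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";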


> Actual traffic for a metaverse is mostly bulk content download. Highly interactive traffic over UDP is maybe 1MB/second, including voice.

Typical bandwidth for multiplayer games like FPS (Counter-Strike, Apex Legends) is around 512kbps-1mbit per second down per client, and this is old information; newer games almost certainly use more.

It's easy to see a higher fidelity gaming experience taking 10mbit - 100mbit of traffic from server to client; just increase the size and fidelity of the world. Next, increase player counts and you can easily fill 10gbit/sec for a future FPS/MMO hybrid.

God save us from the egress BW costs though :)


There are only so many pixels on the screen, as Epic points out. The need for bandwidth is finite.

It will be interesting to see if Epic makes a streamed version of the Matrix Awakens demo. You can download that and build it. It's about 1TB after decompression. If they can make that work with their asset streaming system, that will settle the question of what you really need from the network.


But pixels can be arbitrarily "deep". That is to say, the amount of context that is needed to figure out the color of a pixel can grow arbitrarily large.


Yes, but if it requires more data than the screen's worth of pixels, you can just send the pixels. Pretty sure this is the "cap" that was described.


And yet if you just send the pixels, you cannot client side predict (hide latency in multiplayer games) because the pixels fix you to a specific point of view. Game streaming is not really the solution here.


Even with a zero size game client, there is only so much to stream in textures and geometry. And when you're done with that, the game world itself is trivial.

The only thing in games that is bandwidth heavy is on-demand game asset delivery, which is highly cacheable and shardable. It will need no XDP-like networking tricks in either servers or clients.


> the game world itself is trivial.

This is just absolutely not true.


Adding a possible example here to back Glenn's comment. On Minecraft servers with larger render distances, bandwidth cost is unpredictable (driven by player movement, e.g. Elytra flight, portals, players joining) and scales with the square of the render distance.


That's what impostors are for.

In GTA V, each region has a custom-built low-rez model, and that's what you're seeing when you're more than about 200-300m away from it. Watch closely and see where the cars appear and disappear in the distance. That's the edge of the real rendering area.

I'm looking at doing this for a metaverse. In the GTA V era, those impostors were a manual job done by game devs. That needs to be automated. Rather than doing mesh reduction on large areas, I want to take pictures of each area at high resolution from multiple angles, and feed the pictures through Open Drone Map to get a 3D mesh. The result looks like this.[1] For even more distant areas, those meshes can be consolidated into larger and lower-rez mesh tiles. It's the 3D equivalent of a slippy map. The amount of data you need to send is finite regardless of the world size, because the far-away stuff has lower resolution. The sum of that series is finite. This is similar to how Google Earth works when you get close enough to see 3D.

Handling a metaverse with user-created content is a big data-wrangling problem, but the compute and network loads are finite.

[1] https://content.invisioncic.com/Mseclife/monthly_2023_12/bas...


You can't client side predict pixels.


You'll hit game engine CPU usage limits way before getting anywhere near Gbit/s in outbound game traffic on a game server, and the cost of network I/O is going to be negligible.

XDP is useful for applications that are network I/O bound. Gaming is not one of those.


I venture a guess that there are still no online games that use more than one mbps for interactive traffic, and seeing fully remote gaming use only tens of mbps, I don't see any justification for complexities like asset streaming.


What is a metaverse client? Do you just mean a cross platform VR app?


It's a client for https://secondlife.com/


In games the problem isn't about bandwidth but latency.


That's really not true for the type of game the OP is talking about. Think Second Life et al, where most of the content is dynamically streamed and rendered in real time


We're seeing more real metaverse-type systems. With the NFT clown car out of the way, and Meta's Horizon becoming a niche, the development efforts that were quietly underway to build real, working metaverses are starting to show results. There's M2, from Improbable, which now has a shared developer metaverse in test, for which one can sign up. There are others who have reached the demo video level, such as Readyverse, which is probably going to be an MMO rather than a real user-created metaverse. Disney and Epic are jointly making metaverse noises.

The big-world high-detail user-created metaverse problem is being worked on.


This article is plain ridiculous.

> Why? Because otherwise, the overhead of processing each packet in the kernel and passing it down to user space and back up to the kernel and out to the NIC limits the throughput you can achieve. We're talking 10gbps and above here.

_Throughput_ is not problematic at all for the Linux network stack, even at 100gbps. What is problematic is >10gbps line rate. In other words, unless you're receiving 10gbps of unshaped UDP datagrams with no payloads at line rate, the problem is nonexistent. Considering the internet is 99% fat TCP packets, this sentence is completely absurd.

> With other kernel bypass technologies like DPDK you needed to install a second NIC to run your program or basically implement (or license) an entire TCP/IP network stack to make sure that everything works correctly under the hood

That is just wrong on so many levels.

First, DPDK allows reinjecting packets into the Linux network stack. That is called queue splitting, is done by the NIC, and can be trivially achieved using e.g. the bifurcated driver.

Second, there are plenty of available performant network stacks out there, especially considering high end NICs implement 80% of the performance sensitive parts of the stack on chip.

Last, kernel bypassing is done on _trusted private networks_; you would have to be crazy or damn well know what you're doing to bypass on publicly addressable networks, otherwise you will have a bad reality check. There are decades of security checks and countermeasures baked into the Linux network stack that it would be irresponsible for a game to ask its players to skip.

I'm not even mentioning the ridiculous latency gains to be achieved here. Wire-tapping a packet from "NIC in" to a userspace buffer should be in the ballpark of 3us. If you think you can do better and this latency is too much for your application, you're either daydreaming or you're not working in the video game industry.


> _Throughput_ is not problematic at all for the Linux network stack, even at 100gbps. What is problematic is >10gbps line rate. In other words, unless you're receiving 10gbps of unshaped small UDP datagrams at line rate, the problem is nonexistent. Considering the internet is 99% fat TCP packets, this sentence is completely absurd.

But games are not 99% fat TCP packets.

Games are typically networked with small UDP datagrams sent at high rates carrying the most recent state or inputs, with custom protocols built on top of UDP to avoid TCP head-of-line blocking. Packet send rates per client can often exceed 60Hz, especially when games tie the client packet send rate to the display frequency, e.g. the Valve and Apex Legends network models.

Now imagine you have many thousands of players and you can see that the problem does indeed exist. If not for current games, then for future games and metaverse applications, when we start to scale up the player counts from the typical 16, 32 or 64 players per server instance and try to merge something like FPS techniques with the scale of MMOs, which is something I'm actively working on.

XDP/eBPF is a useful set of technologies for people who develop multiplayer games with custom UDP-based protocols. You'll see a lot more usage of this moving forward, as player counts increase for multiplayer games and other metaverse-like experiences.

Best wishes

- Glenn


A modern server class machine can push "100Gbps" through the entire Linux stack just fine. TCP or UDP. With standard packet sizes (e.g. 1500 bytes). We do this where I work. Yes, a long time ago 1Gbps was hard, you needed jumbo frames, then 10Gbps was hard, then 100Gbps was hard. Right now, where we seem to be is that you don't need to do a kernel bypass unless you're running multiple 100Gbps NICs - that's an application where I've seen DPDK used in the wild.

EDIT: Might be some benefits in terms of latency through the stack and resource usage on the machine... But I don't think 10Gbps is where the pain is for these either.


> A modern server class machine can push "100Gbps" through the entire Linux stack just fine.

Maybe we'd like to use some of that CPU to run the game instead of pushing packets.

Maybe if we can push the packets more efficiently, we save $$$.

Maybe game servers get DDoS'd and it's great to be able to quickly drop packets without any linux kernel overhead.


Maybe I would like my games to not access my kernel?

Maybe I would like my games to not bypass my firewall and VPN?

Also, eBPF is not a way to shave CPU cycles (you would use DPDK/netmap/BSD BPF for that); it's a way to lower latency. And we're talking single-digit microsecond latency, where a typical internet link has millisecond jitter.


The use case is for dedicated game servers controlled by the developer running in data centers, not clients running on player's home computers.


If we're talking server tech, then why does the article claim that 10gbps is becoming retail commodity? 10, 40 and 100gbps have already been the baseline for servers for a decade; I have not seen a server with 1gbps NICs used for anything other than remote management or out-of-band communication in a decade.

If it's for servers then the article still doesn't make any sense. See my other comment https://news.ycombinator.com/item?id=39939876


CableLabs, the creators of the DOCSIS standard for cable modems, are part of an industry-wide push towards 10G internet for ISPs. According to Nielsen's law, we should see 10G internet become more common than 1G internet is today within 10 years.

https://www.cablelabs.com/10g https://www.nngroup.com/articles/law-of-bandwidth/


You're mixing server and client use cases. And you're looking at a gaussian distribution. And DOCSIS is a shared medium.

Most of the world doesn't even have 100M internet: https://www.speedtest.net/global-index

By the time your game can use even 1G per client, XDP won't even be helpful for 100G. (Which, depending on use case, it already isn't.)


I repeat:

According to Nielsen's law, we should see 10G internet become more common than 1G internet is today within 10 years.

That's a cool thing. Let's look forward to it, and the rising tide that lifts all boats. The world will be a much more connected place 10 years from now, and the 10G target from CableLabs.com as well as DSCP packet tagging, L4S and other technologies will make the internet a lot better than it is today.

Why the negativity? Embrace the future :)


What do you mean by wanting games not to access the kernel? You don't want them making system calls full stop?


> First, DPDK allows reinjecting packets in the Linux network stack. That is called queue splitting,is done by the NIC, and can be trivially achieved using e.g. the bifurcated driver.

My apologies, the last time I looked at DPDK was in 2016, and I don't believe this was true back then. Either way, it seems that XDP/eBPF is much easier to use. YMMV!


No harm taken, kernel bypass technologies have moved fast in the last decade.

As a side note though, I would encourage you to revisit DPDK. I've worked in the low latency space for a long time, and used pretty much every solution out there, open source or proprietary, and DPDK is one of my favorite.

This is because DPDK is not _just_ poll mode drivers, it's a full-featured SDK for low latency packet processing. You get top notch thread-safe memory pools to store packets, thread-safe queues, hand-written intrinsics-optimized hashing and CRC algos, a cooperative scheduler in case you isolate your CPUs, etc, etc.


That sounds really good. To be honest, just not having to deal with the BPF verifier would be a plus.


> Packet send rates per-client can often exceed 60HZ

You do realize the most random Python program, written by the most random programmer, running on the most random computer, could easily read 10k TCP packets per second per core?


I absolutely do. Now let's see that random Python program handle 1M connected players each sending and receiving 1384 byte UDP packets at 244Hz, while also running game logic and simulation in the background.
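(Back of the envelope, using just those figures: 1384 bytes at 244 packets per second is roughly 2.7 Mbps per player per direction, so a million players works out to on the order of 2.7 Tbps and ~244 million packets per second each way, before counting IP/UDP overhead.)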


I feel like I'm reasonably well read into low-level TCP/IP security (also: eBPF) and I'm not sure what you mean by "decades of security checks and countermeasures baked into the Linux network stack" that an XDP kernel bypass would skip. Can you say more?


> I'm reasonably well read into low-level TCP/IP security

> I'm not sure what you mean by "decades of security checks and countermeasures baked into the Linux network stack" that an XDP kernel bypass would skip. Can you say more?

If you bypass the kernel stack on a publicly addressable network, then it is your responsibility to implement and calibrate backlogging, handshake recycling, SYN cookies, and so on.


Most games use UDP. The article describes techniques for UDP. We already do all the stuff you talk about in custom UDP protocols.


Are you under the impression that he was talking about game clients doing this? That would be absurd, since no gamers (within epsilon of zero) use Linux, and you'd need a completely different bag of tricks on Windows.

He's fully focused on the backend.


> In other words, unless you're receiving 10gbps of unshaped UDP datagrams with no payloads at line rate, the problem is nonexistent. Considering the internet is 99% fat TCP packets, this sentence is completely absurd.

Uh, "the internet" traffic shape is a terrible model for low-latency multiplayer games traffic shape. Surely you don't think that 99% of the packets being exchanged during a session of Counter-Strike 2 are fat TCP packets, right?


Surely not, but I don't expect video game datagram sizes and rates to be anywhere near line rate. To put things in perspective, we are talking ~1M datagrams per second on a 1gbps link, 10M per second on a 10gbps link.

Hell, I don't even think a commodity gaming computer has enough cores to process line rate datagrams on a 1gbps link.


Think of a game server with many thousands or millions of players connected at the same time, and these players are all sending 10-100mbps of UDP packets at very high rates (60pps bidirectionally, or higher).

Kernel bypass is not going to be something that is necessary on the client (even though there is a trend towards symmetric 10G internet becoming available in the US today, and according to Nielsen's law, it should be widely available within 10 years).


Right, I have to say I misunderstood the article by thinking it was a client side proposal instead of a server one.

Still, I think bandwidth and latency should not be confused as it is on this thread.

From my experience, the Linux network stack is not bandwidth bound _at all_. I've used it extensively on 94gbps InfiniBand networks without any issues. Problems arise only when you get close to line rate, meaning very small payloads and low packet latency, which I think (though I'm not a game developer by any means) is unlikely to happen in game use cases.

As for the technological side of things, I have my reservations about whether open source solutions like XDP, netmap, or BSD BPF are able to compete with proprietary solutions like DPDK (which IMHO is more complete) or SF OnLoad - which is just an LD_PRELOAD on top of your existing BSD socket server.

> according to Nielsen's law

If we're being honest, this is more of a fun fact than a law... We're talking about a single guy fitting a linear regression to a dozen points of his internet connection speed over time.


XDP (with AF_XDP) is effectively a superset of what DPDK does. But it's a weird argument regardless --- both XDP and DPDK are mechanisms to bypass the Linux kernel skb stack. If you're right about that stack, DPDK is poorly motivated too.


60kpps is a joke to the network stack, and by the time you'd reach 60Mpps, your problem is doing something useful with it, rather than getting it pulled off the NIC. If your game engine can handle even 10k players on one system, it'd be the gold plated unicorn with wings of game engines. Ask CCP (EVE online), they know a thing or two about this.


I'm a professional multiplayer game programmer, but thanks for lecturing me about how games are networked.


shrug most people on HN are professional engineers of some type or another. If you want to make this about "creds", you're free to look up mine.


If I were to summarize most engineer comment threads on HN that go off the rails, I would characterize them as experts in one field falsely believing that their expertise in one field qualifies them to make statements about another, perhaps adjacent or even unrelated field.

To the experts in the field in question these comments are usually incorrect and often highly amusing, but they are stated with such gravitas and certainty by the poster, that a casual reader might mistake them for the truth.


Glad we can agree on something. Cheers!


I didn't find anything useful related to networked games when looking for "David Lamparter". Now, try a quick look for "Glenn Fiedler" yourself.

Also: https://www.mobygames.com/person/68015/glenn-fiedler/

Note that MobyGames is wildly incomplete.


It's not ridiculous, but it is quite dated - there are better approaches for packet processing at scale nowadays.


> Within 10 years, everybody is going to have 10gbps internet. How will this change how games are made? How are we going to use all this bandwidth? 5v5 shooters just aren't going to cut it anymore.

I have no doubt that more bandwidth will change how games are made, but 5v5 shooters (and pretty much all existing multiplayer styles) are here to stay for a lot longer than that, in some form or another.


> Within 10 years, everybody is going to have 10gbps internet

LOL, no way in America is that going to be true.


Meanwhile in the rest of the world it is ALSO not true.

That opening blanket statement turned me off so hard I couldn't get past that paragraph to read the rest of the article.

I am struggling to justify 10Gb LAN in my house. Between purchase costs and energy requirements (some 10Gb arrangements seem to be crazy inefficient). And I like "more, more, faster, faster" in my tech.

How does this scale (and cost) across even a significant section of human society?!? According to the article's prediction things must get crazy soon.



An interesting article.

Given it applies to "A high-end user's connection speed" I am dubious as to the applicability to everyone else - say the bottom 20% of society for example - and so I am most doubtful of the article's "everyone will have 10Gbit" claim.

Maybe "everyone" has a stricter definition than I think.

But as someone who has literally grown up alongside PC's - your link gives confidence to look forward to the crazy ahead. :-)



Is this supposed to be a joke? I can't even reliably get 1G in most places.


I can't get 0.05G reliably in most suburbs, even when I do the worst-case ping in a typical 10-minute window is 500+ms, and there are still many places in the US where the best internet I can find is under 0.001G.


I live on the outskirts of a city in the middle of England and I /very/ much doubt I'll see 10gig in 10 years


What if that kind of bandwidth becomes available over cell networks?


I've restricted my phone to only using 4G because I've had a lot of problems with 5G feeling a lot slower than 4G. I've not done any quantitative testing on it, but it just generally feels a lot slower, even when I'm looking right at the 5G mast and the 4G mast is behind a few buildings. If whatever problem it's having can be fixed, and it can be priced similarly to broadband, then maybe.


You don't need to create your own kernel modules to define custom packet processing logic. You can map an RX/TX ring into userspace memory by invoking setsockopt on your XDP socket with the XDP_RX_RING, XDP_TX_RING and XDP_UMEM_REG options. Your XDP BPF program can then choose to redirect an incoming packet to the RX ring (see BPF_MAP_TYPE_XSKMAP).
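For reference, the XDP-program half of that AF_XDP setup is tiny. The sketch below (map size and program name are arbitrary) just steers packets arriving on a given RX queue into whatever AF_XDP socket userspace has registered for that queue, and lets everything else continue into the kernel stack:

  // Sketch of the BPF side of AF_XDP: an XSKMAP maps RX queue index -> AF_XDP socket.
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
      __uint(type, BPF_MAP_TYPE_XSKMAP);
      __uint(max_entries, 64);         /* arbitrary; one slot per RX queue */
      __type(key, __u32);
      __type(value, __u32);
  } xsks_map SEC(".maps");

  SEC("xdp")
  int xdp_redirect_xsk(struct xdp_md *ctx)
  {
      /* If userspace registered an AF_XDP socket for this RX queue, hand the
         packet straight to it; otherwise fall back to the normal kernel path. */
      return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
  }

  char _license[] SEC("license") = "GPL";

Userspace then consumes packets directly from the mmap'd UMEM and the RX/TX rings it set up via those setsockopt calls.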


What I think you're describing here is AF_XDP (just for a simpler search term).


I was hoping for a few code snippets.

So these mini kernel programs are written in a subset of C?



The real way to do this is to get user space networking.

Also event-based protocols with deterministic physics.

Last but not least, you need to use a language that can atomically share memory between threads; C (with arrays of 64-byte structs) or Java.


Hello from Canberra Glenn!


[flagged]


The eBPF code runs on the game server not the client. Cool story tho


Doesn't mean you should risk a takeover of all your game databases because you decided to bypass every single security measure Linux has in place already.

I'd argue it's even worse; network-programming-wise, I'd argue that every custom implementation of TCP/UDP was broken in the past, led to a lot of remote exploits, and was eventually abandoned due to maintenance costs.

Just think about the shitshow of the Source engine, for example. RCEs in the wild for what, 14+ years now? It took a long while and a lot of Steam runtime sandboxing code to fix the decision to use custom unsafe networking code.


What takeover risk? What security measurements is eBPF bypassing? How are you imagining getting RCE from a compiled eBPF packet processor?


If you want to use kernel bypass on the server, not the client, then XDP is even more useless, because in the server world you have access to professional NICs that you cannot expect clients to have.

It's been decades now that people routinely do kernel bypass on servers using e.g. Solarflare NICs with OnLoad. You just write your networking code normally using the BSD sockets API, and LD_PRELOAD the OnLoad SDK, which will intercept the socket API and perform a bypass. You get 90% of the latency benefits of kernel bypass without modifying your code, which still runs everywhere.


Both Katran[1] by Facebook and Unimog[2] by Cloudflare are L4 load balancers written using XDP. It's factually incorrect to claim that XDP is useless on servers as many companies intentionally choose to design their systems around commodity hardware.

  [1]: https://engineering.fb.com/2018/05/22/open-source/open-sourcing-katran-a-scalable-network-load-balancer/
  [2]: https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer


I think you're a little bit on tilt here. "The server world" was what XDP was invented for, and it's used all over cloud providers. Watch the XDP mailing list.



