The whole point of BPF is that you can't run this code in a separate context _and_ have the code work. The original use case was packet filtering in interrupts because waiting for user space took too long.
The whole point of DPDK is to avoid the context switches by dedicating the adapter to a single application and letting that application take over management. Once you reintroduce multiplexing, you're right back where you started. There has been a secure multiplexing scheme based on packet filters... the Berkeley Packet Filter (BPF), i.e. exactly what this paper is discussing as prior art.
No, you can simply have memory-mapped ring buffers between processes, with everything not required for the specific application cut away. You don't need any extra context switches that way.
No need for a traditional socket API, while still being able to access the adapter from multiple applications.
I have no interest in creating such a beast, but I'd be truly shocked if it couldn't at least beat a generic kernel-based stack.
It'll lose some performance compared to the single-application approach, but there might still be a niche for this kind of design.
Sure, you'll need to copy memory, but the data should almost always be in L3 cache anyway.
My intention was, however, to have only one copy (other than the initial DMA, hopefully directly into L3 cache).
Yeah, it hurts if it's 400 Gbps Ethernet: that's 50 GB/s of payload, and L3 bandwidth is only something like 50-90 GB/s. x86 just doesn't have enough bandwidth even to the caches! Better have a pretty high CPU frequency, as I think L3 bandwidth scales roughly in proportion to clock speed (not completely sure). 200 Gbps should be somewhat fine.
Also pretty bad if the data travels over a QPI link... better have both processes in the same NUMA node. And the Ethernet PCIe adapter too... :-)
Regardless, I do think it'd still be way faster than anything a reasonably general kernel stack could manage. It might be a sensible compromise when process and permission isolation is required.
BPF can handle 60 million packets per second; adding a user/kernel context switch would kill that by a factor of 1000x. So while your code may not care about it, there are definitely low-latency applications where milliseconds equate to dollars.
Tell that to the various libc maintainers. They have been blocking C11 Annex K (the bounds-checked interfaces) for over a decade now.
And talking about performance: with good compilers my secure memcpy_s is actually faster than the glibc or BSD libc memcpy, thanks to compile-time constant checks.
Then you are delusional. None of their safety guarantees are real: memory, type, concurrency, none. But talking to them falls on deaf ears. It's called hype-driven development and it's very popular among HN folks. There exist plenty of genuinely safe languages, though.