I have looked at Dragonfly with interest, due to how they manage VM. I really like how they shard the page queues to be per-cpu. With that said, a lot of work has been done by jeffr in the last 6 months to a year to improve NUMA performance (which we partially sponsored) which has dramatically improved VM scalability even on single socket boxes, and which has allowed us to remove a lot of our local hacks.
I considered giving Dragonfly a try in the past, but just never had the time. There would be a depressing amount of work before we could even consider Dragonfly, mostly centered around our async sendfile (which is now upstream in FreeBSD), our TCP changes: BBR (not yet upstream), RACK (now upstream) and TCP pacing (now upstream). Not to mention unmapped mbufs (not yet upstream) and kernel TLS (also not yet upstream). Also, the last time I looked, Dragonfly did not have drivers for Chelsio or Mellanox 10/25/40/100GbE NICs. Without even just one of these things, performance will be so bad that any comparison testing would be meaningless.
Just a thought, the Linux/BSD community might hugely benefit if someone with your deep knowledge released a set of perf test scripts that anyone can run locally to regression test network perf. That way, the OSS community can integrate those perf test scripts into their commit/regression test pipeline.
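To make the idea concrete, here is a minimal sketch of what the smallest building block of such a script might look like: a loopback TCP throughput measurement anyone can run with no special hardware. This is purely illustrative (the function name and parameters are made up here); real regression scripts would drive tools like iperf3 or netperf against actual NICs.

```python
import socket
import threading
import time

def measure_loopback_throughput(total_bytes=64 * 1024 * 1024, chunk=64 * 1024):
    """Stream total_bytes over a loopback TCP connection; return Gbit/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # pick any free port
    server.listen(1)
    port = server.getsockname()[1]

    def sender():
        # Accept one connection and push the payload as fast as possible.
        conn, _ = server.accept()
        payload = b"\x00" * chunk
        sent = 0
        while sent < total_bytes:
            conn.sendall(payload)
            sent += chunk
        conn.close()

    t = threading.Thread(target=sender)
    t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    received = 0
    start = time.monotonic()
    while received < total_bytes:
        data = client.recv(chunk)
        if not data:
            break
        received += len(data)
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against zero
    client.close()
    t.join()
    server.close()
    return (received * 8) / elapsed / 1e9  # bits per second -> Gbit/s

if __name__ == "__main__":
    print(f"loopback throughput: {measure_loopback_throughput():.2f} Gbit/s")
```

A real suite would of course pin the numbers against a baseline per commit, which is where the CI integration you mention comes in.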
It would be great to have this kind of test, but often the real regression test is running the system under production load. Benchmarks have a way of testing something, but not quite what you need to test. The interactions across the whole system are what matter.
For the Netflix boxes, they're pushing 40Gbps+; not a lot of the community is going to be able to test that unless they have fairly expensive network gear lying around.
Oh my! I recall there was an article about pushing 100Gb/s not long ago and hitting the memory bandwidth limit. Now it's 150Gb/s already? Are you guys going to try something crazy like 400Gb/s? At this rate Netflix could start selling their appliance as another business.
> BTW, 40Gb/s was so 2015. I have a box serving at 156Gb/s :)
That's right, I wanted to put 100+, but I wasn't totally sure. I stopped counting when you got way beyond the 20G connectivity on the servers I manage.
For what it's worth: Sepherosa Ziehau spent some years serving out huge streams of video in a company similar to Netflix, in China. He used DragonFly. So, it may not have the same changes you are listing, but it may be closer than you think.