Adrian Chadd did most of the FreeBSD RSS work, and gave a good talk about it at BAFUG: https://www.youtube.com/watch?v=7CvIztTz-RQ
The RSS in Linux was just used for load spreading (the last I checked, I haven't used Linux much since I left Google 1.5 years ago). If this has improved, I'd love to hear about it.
Linux RFS depends on the packets being dispatched to the correct CPU for the connection by the interrupt handler running wherever the packet happened to land. This has cache & memory locality implications, especially on NUMA.
Linux aRFS lets the NIC do the steering. Unfortunately, each connection requires an interaction with the NIC to poke it into the steering table, and most NICs can't steer 100,000 connections.
So, to sum up, Linux has a lot of cool tech for steering individual connections, but support for that varies greatly by NIC. Windows and FreeBSD use standard RSS to predictably steer an unlimited number of connections. For a large CDN server, the latter is more useful. However, for low-latency / high bandwidth applications, I can see the advantage to aRFS.
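To make the "predictably steer an unlimited number of connections" point concrete, here is a rough sketch of how standard RSS picks a queue: a Toeplitz hash over the connection tuple indexes a small indirection table, so nothing per-connection has to be programmed into the NIC. The key, the table size, and the function names below are made up for illustration; a real NIC uses whatever key and table the driver programs.

```python
import socket
import struct

# Hypothetical 40-byte RSS key, for illustration only.
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key, data):
    """Return the 32-bit Toeplitz hash of `data` under `key`."""
    if len(key) < len(data) + 4:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # XOR in the 32-bit window of the key that starts
                # at this input bit position.
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_cpu(src_ip, src_port, dst_ip, dst_port, table, key=EXAMPLE_KEY):
    """Map a TCP/IPv4 4-tuple to an entry of the indirection table."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    # With a power-of-two table this is the same as masking the low bits.
    return table[toeplitz_hash(key, data) % len(table)]


# Assumed layout: 128-entry indirection table spread over 4 queues/CPUs.
table = [i % 4 for i in range(128)]
print(rss_cpu("10.0.0.1", 40000, "10.0.0.2", 443, table))
```

The same tuple always lands on the same entry, and the cost is the same whether there are ten connections or a million, which is the property that matters for a CDN box.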
Linux is the platform of choice for bufferbloat research, although FreeBSD isn't far behind in adopting the results of it:
Netflix gets nearly 100Gbps from storage out the network on their FreeBSD+NGINX OCA appliances. Some details in the "Mellanox CDN Reference Architecture" whitepaper at http://www.mellanox.com/related-docs/solutions/cdn_ref_arch..... The closest equivalent I've found on Linux was a blog post on BBC streaming getting about 1/4 of the performance.
Chelsio has a demo video (with terrible music) using TCP zero copy of 100Gbps on a single TCP session, with <1% CPU usage https://www.youtube.com/watch?v=NKTApBf8Oko.
At SC16 NASA had a "Building Cost-Effective 100-Gbps Firewalls for HPC" demo, using FreeBSD and netmap: https://www.nas.nasa.gov/SC16/demos/demo9.html
Another interesting optimization we've done (and which needs to be upstreamed) is TLS sendfile. There is a tech blog about this at http://techblog.netflix.com/2016/08/protecting-netflix-viewi....
We don't have a paper yet about the latest work, but we're doing more than 80Gb/s of 100% TLS encrypted traffic from a single socket Xeon with no hardware encryption offloads.
I was very sad when alpha got axed, but I agreed with killing it. FreeBSD is about current hardware.
I work directly with both of the gents who gave this talk about 100G networking (on Linux) and still find that much of the actual cutting edge research is done on Linux. Perhaps I'm biased! I've also been to one of Mellanox's engineering offices (Tel Aviv) to speak with their engineers at my previous employer 7-8 years ago. They told me they do most all of their prototyping and initial development on Linux, and RHEL to be specific. Then they port to other platforms.
Maybe I was wrong on some of this, but my use case (due to my employer's industry being finance) is lower latency, where Linux absolutely and positively crushes anything else.
Actually, while we're on the subject, SmartOS with CPU bursting from illumos is the leader in low latency trading:
Additionally, I don't believe (Experts please correct me if this is wrong) SmartOS has an equivalent to Linux's isolcpus boot command line flag (or cpu_exclusive=1 if you're in a cpuset) to remove a cpu core entirely from the global scheduler domain. This prevents any tasks from running on that CPU, including kernel threads. Kernel threads will still occasionally interrupt applications if you simply set the affinity on pid 1, so that doesn't count.
These two features, along with hardware that is configured to not throw SMIs, allow Linux to get out of the way of applications for truly low latency. As far as I'm aware, this is impossible to do in Solaris/SmartOS. I'm not even getting into the SLUB memory allocator being better or the lazy TLB in Linux massively lowering TLB shootdowns, etc, etc. There is a reason why virtually every single major financial exchange in the world runs Linux (CME in Chicago, NYSE/NYMEX in New York, LSE in London, and Xetra in Frankfurt): it is better for the low latency use case.
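For anyone who hasn't done this: isolcpus only keeps everything else off the core, the application still has to put itself there. A minimal sketch of that userspace half, assuming a platform where Python exposes os.sched_setaffinity (Linux does); the helper name pin_to_cpu is mine, not a standard API.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU; return the new affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(0, {cpu})   # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    # Pick a CPU we are already allowed to run on, so the call cannot
    # fail because of a restrictive cpuset.
    target = min(os.sched_getaffinity(0))
    print(pin_to_cpu(target))
```

On a box booted with, say, isolcpus=3 (the CPU number is just an example), the application would call pin_to_cpu(3) at startup instead of picking from its current set.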
On timers: we (I) added arbitrary resolution interval timers to the operating system in 1999 -- predating Linux by years. (We have had CPU binding and processor sets for even longer.) The operating system was and is being used in many real-time capacities (in both the financial and defense sectors in particular) -- and before "every single major financial exchange" was running Linux, many of them were running Solaris.
One final question while I've got you that your response didn't seemingly address. Does the cyclic subsystem allow turning off the cpu timer entirely ala Linux's nohz_full? If so, I stand corrected.
I've done a great deal of reading and research on OS ethos. IMO a thriving and production worthy operating system can be maintained with as few as 40 people in total. The superiority of Linux feels exaggerated, and systems innovation has chilled because of it.
I'm not sure what you mean. Linux has led TCP implementations for a decade now.
The Linux network stack is great. It's the system of choice for nearly every researcher in the networking field. I don't know what Facebook meant in their case.
The main remark seems to be:
> The predominant difference is that the FreeBSD network stack was much more carefully designed. The Linux stack was less careful and thus is much more haphazard. Also, more work has been put into optimizing the FreeBSD stack.
It is not my area of expertise, but the Linux skbuf seems to fit the description of haphazard while the FreeBSD mbuf seems to fit the description of more carefully designed. The same could be said about epoll versus kqueue.
The remark about more work in optimizing the FreeBSD stack also seems to be true. While I cannot speak for everything in FreeBSD's network stack, I do know that FreeBSD's netmap far exceeded anything Linux could do at the time and while it is available on Linux, I never hear of it being used anywhere but on FreeBSD:
Development of FreeBSD's network stack had plenty of innovative things in development at the time Facebook's post was made:
That included additional contributions from a major network equipment vendor that had made many contributions throughout the years. If I checked the commit history, I imagine I would find performance work done by said vendor. From what I can tell, FreeBSD's network stack is improving regardless of whether the rest of us hear about it.
Lastly, there have been multiple things discovered to be wrong in the Linux network stack since that facebook job listing. Two prominent ones that I recall offhand are:
They both could fall into the category of stability problems to which facebook had alluded. The second one more so though:
> The end result is that applications that oscillate between transmitting lots of data and then laying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state. This consequence of this is self induced packet loss along with retransmissions, wasted bandwidth, out of order packet delivery, and application level stalls.
This is covered by my previous team's page:
Note: "On newer Linux OSes this is no longer needed." (IE, it's already set properly).
For the second one, they fixed a bug in Linux's TCP cubic implementation. FreeBSD didn't get cubic until 8.2, which was around 2009. So, you're criticizing Linux for having a bug in a feature that FreeBSD didn't even add until 7 years ago.
Again, I will repeat: I worked on a team that did multi-OS TCP/IP optimization. What you're describing in terms of oscillation is a well-known problem in many implementations. All of the people doing research on this are now using Linux as their platform for research and development.
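For readers who haven't looked at cubic, a rough illustration of why idle periods matter. This is the window curve from RFC 8312 with its default constants, not the kernel code, and it assumes the problem was that idle time got counted in the elapsed-time term t; the 30 second idle period and the starting window are made-up numbers.

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC target window (in segments) t seconds after the last loss.

    w_max is the window just before that loss; c and beta are the
    RFC 8312 defaults.
    """
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)  # time to climb back to w_max
    return c * (t - k) ** 3 + w_max


w_max = 100.0
# One second of actual sending since the last loss.
busy = cubic_window(1.0, w_max)
# Same second of sending, but 30 idle seconds are wrongly counted too.
after_idle = cubic_window(1.0 + 30.0, w_max)
print(round(busy), round(after_idle))
```

Because the curve is cubic in t, a few tens of seconds of quiet time is enough to push the target window far past anything the path was ever shown to support, which matches the burst-then-loss pattern in the quote.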
Not implementing cubic in FreeBSD when there was a bug in the only implementation of it in the world could have been an advantage in certain situations, including Facebook's.
There seems to be a hubris among many Linux users that Linux is the best solution in the world for everything and it is not. There is always someone who does something better. Maybe not in everything, but the same applies to Linux. No matter how good it becomes, it is not the best in everything. Networking is a broad topic. I don't think Linux is the best in every area of networking. I am not even sure if it is the best in many of them, given that many platforms do things very well and at some point, it is hard to be better.
Subsystems are now done with up front design and some degree of consensus in the BSDs, closer to the cathedral and commercial development than the bazaar of Linux. This necessarily means we are not usually at the forefront of cutting edge features. It doesn't necessarily mean we don't have features before Linux; if the idea exists in academia or other OSes enough to reason about, it's reasonable to propose, design, and build. Netmap is a good example. The new FreeBSD selectable TCP stacks are another, where we avoid incremental growing pains and baggage. When these designed features hit, they tend to be coherent, usable, obvious, and lasting.
My opinion of Linux features is that little due diligence was done, especially public acknowledgement of inspiration and why one route was taken over another. For instance, the Linux KPIs are littered with questionable decisions made in isolation. epoll and the various file notification calls are examples. That attitude manifested strangely up to userland through IPC/DBus with the continued systemd drama.
A little bit of logical inference: there are financial reasons vendors are fleeing the Linux kernel in favor of userspace (e.g. Intel's DPDK and SPDK). One is licensing, which is not an issue with BSD nor userland. The other is the rate and quality of KPI churn. Linux KPIs break all the time, switch licenses all the time, and it is a general nuisance to maintain a vendor tree whether it is open or closed source. The good side is that hopefully drivers and products end up open source. The bad side is, in many modern usages, that does not happen because GPL is not relevant to hosted services, as well as low motivation/quality/incentive/license violation for IoT type things. The BSDs start with no pretense of GPL nor flippant APIs, so it is a lot more comfortable to consume and build great products.
This remark seems more to me like a statement of belief that no one else can do good things other than Linux. That is far from true.
"In linux, buffers in the tx queue hold a reference to the socket so completions can be used to notify sockets. Implementing the same mechanism in FreeBSD should be relatively straightforward. "
"We don’t have software TCP segmentation, we have to carry information in the mbufs.
Performance was doubled, without hardware support, by doing segmentation very low in the stack, right before input into driver. (Student project.) Linux calls this approach GSO, pushing large segments through the stack; the hardware can do segmentation if supported, otherwise we do it at the bottom layer. Simplifies TCP code since you can send arbitrarily large segments. "
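A toy sketch of the "segment at the bottom" idea from the quoted notes: the upper layers hand down one large chunk, and it is only cut into MSS-sized pieces right before the driver. This is just the shape of the technique, not code from either stack; the function name and the MSS value are placeholders.

```python
def segment_at_driver(payload, mss, start_seq=0):
    """Split one large send into (sequence number, bytes) segments."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        # TCP sequence numbers wrap at 2**32.
        ((start_seq + off) % 2**32, payload[off:off + mss])
        for off in range(0, len(payload), mss)
    ]


# One 3000-byte "super segment" travels down the stack as a single unit
# and is cut up only here.
segments = segment_at_driver(b"x" * 3000, mss=1448, start_seq=1000)
print([(seq, len(chunk)) for seq, chunk in segments])
```

Everything above this function sees one buffer instead of three, which is where the saved per-packet work comes from; a NIC with hardware segmentation would just take the large buffer and do the same cut itself.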
"Linux has their standard ifnet interface, with a single pointer to the extensions; if the interface does not support them, the system still runs. If it does, have interfaces to configure numbers of queues, numbers of buffers, etc.
All of this is slow-path (configuration) code.
Think we should go for a similar route — ease configuration of 10gig interfaces"
The rest of the stuff in there is just low level optimizations to update the design that was written out in the original FreeBSD book.
I never said that people can't do good things in OSes other than Linux. I said that Linux's networking stack has been better than BSD's for ten years. I can cite numerous factual arguments and research papers to support this, along with my extensive experience with Linux (my experience with BSD is less, but enough to know its stack isn't magically better).
Linux does have plenty of nice things and plenty of nice work, but I am not going to dismiss everything being done elsewhere by declaring Linux to be "better". At best, I would say that it is ahead in some areas, behind in other areas and the same in many areas. As for what some of those "other areas" are, I recall Adrian Chadd implementing time division multiplexed atheros wifi support in FreeBSD that Linux does not have. Netflix also contributed a rather nice thing to FreeBSD that Linux did not have:
There are plenty of nice things in both platforms. Labelling one as "better" just doesn't do justice to either of them. It ignores opportunities for the "better" one to improve by denying that opportunities for improvement have been demonstrated to exist. It also denies the "lesser" one the acknowledgement of having done something worthwhile.
When I say something is "better", I mean "I've looked at the data, and integrated over a wide range of parameters".
I'm still waiting to hear about a magical BSD feature that is better. That hasn't happened in about 10 years, hence my statement.
If you are as experienced in networking as you claim, you should stop waiting to hear about magical features that are better. Nothing will ever impress you as being magical. That is a downside of having experience.
Maybe you would find talking to an actual expert on FreeBSD's network stack more interesting. I am not one, and while I could list several other things I know, I am clearly not doing it justice.