"I worked on Solaris for over a decade, and for a while it was usually a better choice than Linux, especially due to price/performance (which includes how many instances it takes to run a given workload). It was worth fighting for, and I fought hard. But Linux has now become technically better in just about every way. Out-of-box performance, tuned performance, observability tools, reliability (on patched LTS), scheduling, networking (including TCP feature support), driver support, application support, processor support, debuggers, syscall features, etc. Last I checked, ZFS worked better on Solaris than Linux, but it's an area where Linux has been catching up. I have little hope that Solaris will ever catch up to Linux, and I have even less hope for illumos: Linux now has around 1,000 monthly contributors, whereas illumos has about 15.
In addition to technology advantages, Linux has a community and workforce that's orders of magnitude larger, staff with invested skills (re-education is part of a TCO calculation), companies with invested infrastructure (rewriting automation scripts is also part of TCO), and also much better future employment prospects (a factor that can influence people wanting to work at your company on that OS). Even with my considerable and well-known Solaris expertise, the employment prospects with Solaris are bleak and getting worse every year. With my Linux skills, I can work at awesome companies like Netflix (which I highly recommend), Facebook, Google, SpaceX, etc.
Large technology-focused companies, like Netflix, Facebook, and Google, have the expertise and appetite to make a technology-based OS decision. We have dedicated teams for the OS and kernel with deep expertise. On Netflix's OS team, there are three staff who previously worked at Sun Microsystems and have more Solaris expertise than they do Linux expertise, and I believe you'll find similar people at Facebook and Google as well. And we are choosing Linux.
The choice of an OS includes many factors. If an OS came along that was better, we'd start with a thorough internal investigation, involving microbenchmarks (including an automated suite I wrote), macrobenchmarks (depending on the expected gains), and production testing using canaries. We'd be able to come up with a rough estimate of the cost savings based on price/performance. Most microservices we have run hot in user-level applications (think 99% user time), not the kernel, so it's difficult to find large gains from the OS or kernel. Gains are more likely to come from off-CPU activities, like task scheduling and TCP congestion, and indirect, like NUMA memory placement: all areas where Linux is leading. It would be very difficult to find a large gain by changing the kernel from Linux to something else. Just based on CPU cycles, the target that should have the most attention is Java, not the OS. But let's say that somehow we did find an OS with a significant enough gain: we'd then look at the cost to switch, including retraining staff, rewriting automation software, and how quickly we could find help to resolve issues as they came up. Linux is so widely used that there's a good chance someone else has found an issue, had it fixed in a certain version or documented a workaround.
What's left where Solaris/SmartOS/illumos is better? 1. There's more marketing of the features and people. Linux develops great technologies and has some highly skilled kernel engineers, but I haven't seen any serious effort to market these. Why does Linux need to? And 2. Enterprise support. Large enterprise companies where technology is not their focus (eg, a breakfast cereal company) and who want to outsource these decisions to companies like Oracle and IBM. Oracle still has Solaris enterprise support that I believe is very competitive compared to Linux offerings.
So you've chosen to deploy on Solaris or SmartOS? I don't know why you would, but this is also why I wouldn't rush to criticize your choice: I don't know the process whereby you arrived at that decision, and for all I know it may be the best business decision for your set of requirements.
I'd suggest you give other tech companies the benefit of the doubt for times when you don't actually know why they have decided something. You never know, one day you might want to work at one."
I feel sorry for the Solaris engineers (and likely ex-colleagues) who are about to lose their jobs. My advice would be to take a good look at Linux or FreeBSD, both of which we use at Netflix. Linux has been getting much better in recent years, including reaching DTrace capabilities in the kernel. It's not as bad as it used to be, although to really evaluate where it's at you need to be on a very new kernel (4.9 is currently in development), as features have been pouring in.
Also, since I was one of the top Solaris performance experts, I've been creating new Linux performance content on a website that should also be useful (I've already been thanked for this by a few Solaris engineers who have switched.) I've been meaning to create a FreeBSD page too (better, a similar page on the FreeBSD wiki so others can contribute).
FreeBSD feels to me to be the closest environment to Solaris, and would be a bit easier to switch to than Linux. And it already has ZFS and DTrace.
BTRFS/ZoL doesn't beat Illumos ZFS. FreeBSD ZFS is pretty standalone in the OS; it has only recently learned to deal with a hot spare drive, and that's it. IO scheduling on FreeBSD is spartan; it will always favor large IO and starve small reads/writes.
LXC doesn't beat FreeBSD jails or Solaris zones since LXC is not considered a security boundary.
Openvswitch can perhaps measure itself with illumos crossbow.
Systemd doesn't beat SMF on Illumos. I think SMF really nailed it (Systemd is overkill and plain RC scripts in FreeBSD are a pain).
So IMHO Solaris/Illumos/SmartOS sits nicely between Linux and FreeBSD.
Just a reminder, we also run FreeBSD on our CDN servers at Netflix.
I have no information, but there aren't very many dots to connect here.
Adrian Chadd did most of the FreeBSD RSS work, and gave a good talk about it at BAFUG: https://www.youtube.com/watch?v=7CvIztTz-RQ
The RSS in Linux was just used for load spreading (the last I checked, I haven't used Linux much since I left Google 1.5 years ago). If this has improved, I'd love to hear about it.
Linux RFS depends on the packets being dispatched to the correct CPU for the connection by the interrupt handler running wherever the packet happened to land. This has cache & memory locality implications, especially on NUMA.
Linux aRFS lets the NIC do the steering. Unfortunately, each connection requires an interaction with the NIC to poke it into the steering table, and most NICs can't steer 100,000 connections.
So, to sum up, Linux has a lot of cool tech for steering individual connections and support for that varies greatly by NIC. Windows and FreeBSD use standard RSS to predictably steer an unlimited number of connections. For a large CDN server, the latter is more useful. However, for low-latency / high bandwidth applications, I can see the advantage to aRFS.
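To make the contrast concrete, here is a minimal sketch of standard RSS-style steering: a Toeplitz hash over the connection 4-tuple, then a lookup in a small indirection table. The key, the table size, and the queue count below are arbitrary values chosen for illustration, not what any particular NIC or OS uses. The point is that steering is stateless: no per-connection entry has to be programmed into the NIC, so the number of connections is not limited by a steering table.

```python
import socket
import struct

# Arbitrary 40-byte key, for illustration only (assumption). Real NICs are
# programmed with a key chosen by the driver or the administrator.
RSS_KEY = bytes(range(1, 41))

# Indirection table: hash bucket -> receive queue. 128 buckets spread
# round-robin over 8 queues (both sizes are assumptions).
NUM_QUEUES = 8
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def toeplitz_hash(key, data):
    """Toeplitz hash: for every set bit of the input, XOR in the 32-bit
    window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_queue(src_ip, dst_ip, src_port, dst_port):
    """Map a TCP/IPv4 4-tuple to a receive queue without any per-flow state."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    bucket = toeplitz_hash(RSS_KEY, data) % len(INDIRECTION_TABLE)
    return INDIRECTION_TABLE[bucket]


if __name__ == "__main__":
    # Every packet of this connection lands on the same queue.
    print(rss_queue("10.0.0.1", "10.0.0.2", 40000, 443))
```

aRFS, by contrast, would need one steering-table entry per connection, which is where the NIC limits mentioned above come in.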
Linux is the platform of choice for bufferbloat research, although FreeBSD isn't far behind in adopting the results of it:
Netflix gets nearly 100Gbps from storage out the network on their FreeBSD+NGINX OCA appliances. Some details in the "Mellanox CDN Reference Architecture" whitepaper at http://www.mellanox.com/related-docs/solutions/cdn_ref_arch..... The closest equivalent I've found on Linux was a blog post on BBC streaming getting about 1/4 of the performance.
Chelsio has a demo video (with terrible music) using TCP zero copy of 100Gbps on a single TCP session, with <1% CPU usage https://www.youtube.com/watch?v=NKTApBf8Oko.
At SC16 NASA had a "Building Cost-Effective 100-Gbps Firewalls for HPC" demo, using FreeBSD and netmap: https://www.nas.nasa.gov/SC16/demos/demo9.html
Another interesting optimization we've done (and which needs to be upstreamed) is TLS sendfile. There is a tech blog about this at http://techblog.netflix.com/2016/08/protecting-netflix-viewi....
We don't have a paper yet about the latest work, but we're doing more than 80Gb/s of 100% TLS encrypted traffic from a single socket Xeon with no hardware encryption offloads.
I was very sad when alpha got axed, but I agreed with killing it. FreeBSD is about current hardware.
I work directly with both of the gents who gave this talk about 100G networking (on Linux) and still find that much of the actual cutting edge research is done on Linux. Perhaps I'm biased! I've also been to one of Mellanox's engineering offices (Tel Aviv) to speak with their engineers at my previous employer 7-8 years ago. They told me they do most all of their prototyping and initial development on Linux, and RHEL to be specific. They then port to other platforms.
Maybe I was wrong on some of this, but my use case (due to my employer's industry being finance) is lower latency, where Linux absolutely and positively crushes anything else.
Actually, while we're on the subject, SmartOS with CPU bursting from illumos is the leader in low latency trading:
Additionally, I don't believe (Experts please correct me if this is wrong) SmartOS has an equivalent to Linux's isolcpus boot command line flag (or cpu_exclusive=1 if you're in a cpuset) to remove a cpu core entirely from the global scheduler domain. This prevents any tasks from running on that CPU, including kernel threads. Kernel threads will still occasionally interrupt applications if you simply set the affinity on pid 1, so that doesn't count.
These two features, along with hardware that is configured to not throw SMIs, allow Linux to get out of the way of applications for truly low latency. As far as I'm aware, this is impossible to do in Solaris/SmartOS. I'm not even getting into the SLUB memory allocator being better or the lazy TLB in Linux massively lowering TLB shootdowns, etc, etc. There is a reason why virtually every single major financial exchange in the world runs Linux (CME in Chicago, NYSE/NYMEX in New York, LSE in London, and Xetra in Frankfurt): it is better for the low latency use case.
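For readers who haven't used it, a rough sketch of how an application ends up on an isolated core: the kernel is booted with something like isolcpus=2-3, and the process then pins itself to one of those CPUs. The helper names below are made up for illustration, the parser assumes a plain CPU list with no extra isolcpus flags, and os.sched_setaffinity is Linux-only.

```python
import os


def parse_cpu_list(spec):
    """Parse a kernel-style CPU list such as "2-5,8" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(cmdline):
    """Return the CPUs named by isolcpus= on a kernel command line string.
    Assumes a plain CPU list with no flag prefix."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return set()


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        isolated = isolated_cpus(f.read())
    if isolated:
        # The scheduler will not place other tasks on these CPUs by itself.
        print(pin_current_process({min(isolated)}))
    else:
        print("no isolcpus= on the kernel command line")
```

Pinning alone only controls where this process runs; it is the boot-time isolation that keeps everything else off the core.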
On timers: we (I) added arbitrary resolution interval timers to the operating system in 1999 -- predating Linux by years. (We have had CPU binding and processor sets for even longer.) The operating system was and is being used in many real-time capacities (in both the financial and defense sectors in particular) -- and before "every single major financial exchange" was running Linux, many of them were running Solaris.
One final question while I've got you that your response didn't seemingly address. Does the cyclic subsystem allow turning off the cpu timer entirely ala Linux's nohz_full? If so, I stand corrected.
I've done a great deal of reading and research on OS ethos. IMO, a thriving and production-worthy operating system can be maintained with as few as 40 people in total. The superiority of Linux feels exaggerated, and systems innovation has chilled because of it.
I'm not sure what you mean. Linux has led TCP implementations for a decade now.
The Linux network stack is great. It's the preferred system of choice for nearly every researcher in the networking field. I don't know what Facebook meant in their case.
The main remark seems to be:
> The predominant difference is that the FreeBSD network stack was much more carefully designed. The Linux stack was less careful and thus is much more haphazard. Also, more work has been put into optimizing the FreeBSD stack.
It is not my area of expertise, but the Linux skbuf seems to fit the description of haphazard while the FreeBSD mbuf seems to fit the description of more carefully designed. The same could be said about epoll versus kqueue.
The remark about more work in optimizing the FreeBSD stack also seems to be true. While I cannot speak for everything in FreeBSD's network stack, I do know that FreeBSD's netmap far exceeded anything Linux could do at the time and while it is available on Linux, I never hear of it being used anywhere but on FreeBSD:
Development of FreeBSD's network stack had plenty of innovative things in development at the time Facebook's post was made:
That included additional contributions from a major network equipment vendor that had made many contributions throughout the years. If I checked the commit history, I imagine I would find performance work done by said vendor. From what I can tell, FreeBSD's network stack is improving regardless of whether the rest of us hear about it.
Lastly, there have been multiple things discovered to be wrong in the Linux network stack since that facebook job listing. Two prominent ones that I recall offhand are:
They both could fall into the category of stability problems to which facebook had alluded. The second one more so though:
> The end result is that applications that oscillate between transmitting lots of data and then laying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state. This consequence of this is self induced packet loss along with retransmissions, wasted bandwidth, out of order packet delivery, and application level stalls.
This is covered by my previous team's page:
Note: "On newer Linux OSes this is no longer needed." (IE, it's already set properly).
For the second one, they fixed a bug in Linux's TCP cubic implementation. FreeBSD didn't get cubic until 8.2, which was around 2009. So, you're criticizing Linux for having a bug in a feature that FreeBSD didn't even add until 7 years ago.
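A rough illustration of why idle periods matter for cubic, using the window function and constants from the published CUBIC specification (C = 0.4, beta = 0.7). This is a sketch of the general failure mode, assuming the idle time gets counted as elapsed time since the last congestion event; it is not a reconstruction of the actual Linux code or patch.

```python
# CUBIC window growth: W(t) = C * (t - K)^3 + W_max, where t is the time in
# seconds since the last congestion event and K is the time needed to climb
# back to W_max. Constants are the ones given in the CUBIC specification.
C = 0.4
BETA = 0.7


def cubic_k(w_max):
    """Seconds needed to grow from BETA * w_max back up to w_max."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t, w_max):
    """Target congestion window (in segments) t seconds into the epoch."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


if __name__ == "__main__":
    w_max = 100.0
    active = 1.0   # seconds actually spent sending since the last loss
    idle = 10.0    # seconds the application sat quiescent (assumed value)
    print("window if idle time is excluded:", cubic_window(active, w_max))
    print("window if idle time is counted: ", cubic_window(active + idle, w_max))
```

With the idle time counted, the target window comes out well above W_max even though the connection learned nothing about the path while it was quiet, which matches the "transmit way too fast when returning to the sending state" description quoted earlier.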
Again, I will repeat: I worked on a team that did multi-OS TCP/IP optimization. What you're describing in terms of oscillation is a well-known problem in many implementations. All of the people doing research on this are now using linux as their platform for research and development.
Not implementing cubic in FreeBSD when there was a bug in the only implementation of it in the world could have been an advantage in certain situations, including Facebook's.
There seems to be a hubris by many Linux users that Linux is the best solution in the world for everything and it is not. There is always someone who does something better. Maybe not in everything, but the same applies to Linux. No matter how good it becomes, it is not the best in everything. Networking is a broad topic. I don't think Linux is the best in every area of networking. I am not even sure if it is the best in many of them, given that many platforms do things very well and at some point, it is hard to be better.
Subsystems are now done with up front design and some degree of consensus in the BSDs, closer to the cathedral and commercial development than the bazaar of Linux. This necessarily means we are not usually at the forefront of cutting edge features. It doesn't necessarily mean we don't have features before Linux; if the idea exists in academia or other OSes enough to reason about, it's reasonable to propose, design, and build. Netmap is a good example. The new FreeBSD selectable TCP stacks are another, where we avoid incremental growing pains and baggage. When these designed features hit, they tend to be coherent, usable, obvious, and lasting.
My opinion of Linux features is that little due diligence was done, especially public acknowledgement of inspiration and why one route was taken over another. For instance, the Linux KPIs are littered with questionable decisions made in isolation. epoll and the various file notification calls are examples. That attitude manifested strangely up to userland through IPC/DBus with the continued systemd drama.
A little bit of logical inference: there are financial drivers behind vendors fleeing the Linux kernel in favor of userspace (e.g. Intel's DPDK and SPDK). One is licensing, which is not an issue with BSD nor userland. The other is the rate and quality of KPI churn. Linux KPIs break all the time, switch licenses all the time, and it is a general nuisance to maintain a vendor tree whether it is open or closed source. The good side is that hopefully drivers and products end up open source. The bad side is, in many modern usages, that does not happen because GPL is not relevant to hosted services, as well as low motivation/quality/incentive/license violation for IoT type things. The BSDs start with no pretense of GPL nor flippant APIs, so it is a lot more comfortable to consume and build great products.
This remark seems more to me like a statement of belief that no one else can do good things other than Linux. That is far from true.
"In linux, buffers in the tx queue hold a reference to the socket so completions can be used to notify sockets. Implementing the same mechanism in FreeBSD should be relatively straightforward. "
"We don’t have software TCP segmentation, we have to carry information in the mbufs.
Performance was doubled, without hardware support, by doing segmentation very low in the stack, right before input into driver. (Student project.) Linux calls this approach GSO, pushing large segments through the stack; the hardware can do segmentation if supported, otherwise we do it at the bottom layer. Simplifies TCP code since you can send arbitrarily large segments. "
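The idea in that quote can be sketched in a few lines: the upper layers hand down one arbitrarily large send, and it is cut into MSS-sized segments only at the bottom, right before the driver. The function name and the MSS value are illustrative assumptions, not taken from either stack.

```python
def segment(payload, mss, start_seq=0):
    """Split one large send into (sequence_number, chunk) pairs of at most
    mss bytes each, as the last step before handing data to the driver."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [(start_seq + off, payload[off:off + mss])
            for off in range(0, len(payload), mss)]


if __name__ == "__main__":
    # One 10000-byte send from TCP becomes a handful of wire-sized segments.
    for seq, chunk in segment(b"x" * 10000, mss=1448, start_seq=1000):
        print(seq, len(chunk))
```

Everything above this step handles a single large buffer, so the per-packet cost of traversing the stack is paid once per send rather than once per segment.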
"Linux has their standard ifnet interface, with a single pointer to the extensions; if the interface does not support them, the system still runs. If it does, have interfaces to configure numbers of queues, numbers of buffers, etc.
All of this is slow-path (configuration) code.
Think we should go for a similar route — ease configuration of 10gig interfaces"
the rest of the stuff in there is just low level optimizations to update the design that was written out in the original FreeBSD book.
I never said that people can't do good things in OSes other than linux. I said that Linux's networking stack has been better than BSD's for ten years. I can cite numerous factual arguments and research papers to support this, along with my extensive experience with linux (my experience with BSD is less, but enough to know its stack isn't magically better).
Linux does have plenty of nice things and plenty of nice work, but I am not going to dismiss everything being done elsewhere by declaring Linux to be "better". At best, I would say that it is ahead in some areas, behind in other areas and the same in many areas. As for what some of those "other areas" are, I recall Adrian Chadd implementing time division multiplexed atheros wifi support in FreeBSD that Linux does not have. Netflix also contributed a rather nice thing to FreeBSD that Linux did not have:
There are plenty of nice things in both platforms. Labelling one as "better" just doesn't do justice to either of them. It ignores opportunities for the "better" one to improve by denying that opportunities for improvement have been demonstrated to exist. It also denies the "lesser" one the acknowledgement of having done something worth while.
When I say something is "better", I mean "I've looked at the data, and integrated over a wide range of parameters".
I'm still waiting to hear about a magical BSD feature that is better. That hasn't happened in about 10 years, hence my statement.
If you are as experienced in networking as you claim, you should stop waiting to hear about magical features that are better. Nothing will ever impress you as being magical. That is a downside of having experience.
Maybe you would find talking to an actual expert on FreeBSD's network stack more interesting. I am not one and while I could list several other things I know, I am clearly not doing it justice.
(I keep the DTrace book within reach when I sit at the keyboard. This is fan mail. Many thanks, for your work has helped me become a better computer person.)
Why not OpenBSD? I'm not an advocate of either; I'm trying to learn more about their usefulness in real world applications.
SMP scalability in general is far ahead of OpenBSD's the last time I looked, as is device support for 100G NICs, NVMe storage, etc.
The performance monitoring is also far ahead on FreeBSD, with tools like Dtrace, Intel's PCM tools, and Intel's VTune available for FreeBSD.
OTOH, enterprise business workloads (SAP, OLTP databases, etc) typically serve thousands of users simultaneously. They do payroll, accounting, etc etc. Such workloads can not be cached in the cpu cache, so you need to go out to RAM all the time. RAM is typically 100ns, which corresponds to a 10 MHz cpu. Do you remember 10 MHz cpus? This means business workloads have huge scalability problems because you need to place all cpus on the same bus, in one single large scale-up server. If you try to run business workloads on a scale-out server, performance will drop drastically as data is shuffled among nodes on a network, instead of on a fast bus.
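The 10 MHz figure is just the reciprocal of the latency. A quick check, with the 3 GHz clock below being an assumed example value rather than anything from the comment:

```python
def equivalent_frequency_hz(latency_s):
    """Rate at which a CPU could retire work if every operation had to wait
    one full memory access of the given latency."""
    return 1.0 / latency_s


def stall_cycles(latency_s, clock_hz):
    """Core clock cycles that elapse during one memory access."""
    return latency_s * clock_hz


if __name__ == "__main__":
    ram_latency = 100e-9  # 100 ns, the figure used above
    print(equivalent_frequency_hz(ram_latency) / 1e6, "MHz")
    print(stall_cycles(ram_latency, 3e9), "cycles at an assumed 3 GHz")
```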
Thus, business workloads use a single large scale-up server, with max 16 or 32 sockets. This domain belongs to Unix/RISC and Mainframes. HPC number crunching uses large clusters such as the SGI UV3000, which has 10,000s of cores.
The largest Linux scale-up server is the new HP Kraken. It is a redesigned old Integrity Unix server with 64-sockets. The x86 version of the Integrity maxes out at 16-sockets only. Other than that, the largest x86 server is vanilla 8-socket servers by IBM, HP, Oracle, etc.
Linux devs only have access to 1-2 socket PCs so Linux can not be optimized nor tested on large 8-16 socket servers. Which Linux dev has access to anything larger than 4-sockets? No one. Linus Torvalds? No, he does not work on scalability on 16-socket servers. There is no Linux dev working on scalability on 16-socket servers. Why? Because, until last year, 16-socket x86 servers hardly even existed! Google this if you want, try to find a 16-socket x86 server other than the brand new HP Kraken and SGI UV300H. OTOH, Unix/RISC and Mainframes have scaled to 64 sockets for decades.
Look at the SAP benchmarks. The top scores all belong to 32-socket UNIX/RISC doing large SAP workloads. Linux on x86 has the bottom part, doing small SAP workloads. The HP Kraken has bad SAP scores, considering it has 16-sockets. It is almost the same as the 8-socket x86 SAP scores. Bad scalability.
Thus, if you want to run workloads larger than 2-4 sockets, you need to go to Unix/RISC. Linux maxes out at 2-4 sockets or so. The new Oracle Exadata server sporting SPARC T7 (same as the M7 cpu) runs Linux and it maxes out at 2-sockets. If you want 16-socket workloads, you must go to Solaris and SPARC. All large business servers, use Unix or Mainframes. No Linux nowhere.
Linux = small business workloads. Solaris = large business workloads. And the big money is in large business servers. If Oracle kills off Solaris, then Oracle is stuck at 2-4 sockets (small revenue). Only Solaris can drive large business servers (big revenue).
It does not make sense to kill off Solaris, because then Oracle can not offer (expensive) large business servers. Then Oracle will be stuck at small cheap business servers with Linux and Windows.
Regarding Linux vs Solaris code quality: