BSD doesn't have drivers for InfiniBand and other HPC interconnects. Nor does it have client drivers (let alone a server implementation) for Lustre, the distributed filesystem used by most supercomputers.
I imagine MPI support on BSD is also likely non-existent.
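For context, MPI is the message-passing API that essentially all distributed HPC codes are written against. Here's a minimal hello-world sketch in C, assuming an MPI implementation such as Open MPI or MPICH is installed:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                 /* start the MPI runtime */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many are there?  */
        printf("hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut the runtime down */
        return 0;
    }

Build with mpicc hello.c -o hello and launch with something like mpirun -np 4 ./hello; every rank runs the same binary and they coordinate by passing messages.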
Then there is the matter of accelerator support, i.e., NVIDIA GPUs and Intel Xeon Phi.
That's not to say that some vendor couldn't reasonably build a BSD-based supercomputer; it's just highly unlikely given how much stuff is missing.
This page mentions that Mellanox has contributed work on this. Also, storage vendors have used external InfiniBand stacks on FreeBSD for many years (I got my first Isilon cluster something like 8 years ago, and it was using an IB backend with a forked FreeBSD 7 kernel, IIRC), and those stacks are stable and widely deployed.
That said, the primary platform is Linux and HPC is a very demanding workload. Unless I had a lot of time to invest in BSD kernel development I would stick with Linux.
That is putting aside Lustre too, which is usually a non-negotiable requirement for HPC.
Also NUMA is very important on supercomputers, and it works well on Linux.
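To make "works well" a bit more concrete: Linux exposes explicit memory placement via libnuma. A minimal sketch using the libnuma API (the node number and allocation size are just illustrative):

    #include <numa.h>      /* libnuma; link with -lnuma */
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "kernel has no NUMA support\n");
            return 1;
        }
        /* Allocate 1 GiB directly on node 0 instead of relying on
           first-touch placement, keeping accesses node-local. */
        size_t len = 1UL << 30;
        double *a = numa_alloc_onnode(len, 0);
        if (a == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }
        /* ... compute against 'a' from threads pinned to node 0 ... */
        numa_free(a, len);
        return 0;
    }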
The other thing worth noting is the much better support IBM has for Linux on PowerPC (2 in the top 10). I think Sunway (the most powerful in the world) is a Linux shop too.
As others have mentioned, there was a Mellanox stack in FreeBSD circa 2005 that I worked with. It was used at Isilon (BSD-based) in production.
There really isn't a technical discussion here at all: when an overwhelmingly large part of your user base uses X, it would be pretty stupid to only support Y, and probably not defensible to support both X and Y.
That's maybe good for a router but simply not HPC material.
And it's not like Linux is somehow famous for poor scalability, unless you're talking about the 1990s. Yes, back in the 1990s it was certainly much worse than Solaris. But for the 2.6 and subsequent releases, SGI and others put a lot of work into improving it. SGI at some point sold 4096-way (might even have been 4096 cores and 8192 hardware threads?) single-image supercomputers running Linux, which AFAIK is bigger than anything Solaris has been deployed on.
That being said, most HPC systems consist of 1- or 2-socket nodes connected via a network, so the kernel scaling to such extreme systems isn't that relevant in the vast majority of deployments.
This gave Linux a head start, and the self-reinforcing effects of such a head start did the rest; it made answering the question 'for which OS should we start writing drivers?' for specialty HPC hardware a no-brainer.
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
On the other hand, people are used to Linux; in my environment literally everybody has Ubuntu on their notebook and workstation. They know how to run their Python analysis scripts there, and the only thing they have to change when moving to the cluster is adopting an environment management system (such as http://modules.sourceforge.net/).
(However, I have to admit I've never used BSD and don't know the differences in user space.)
This can substantially reduce the time to deploy new software and cut down on overhead related to managing multiple modules.
Singularity, unlike Docker, is designed to require only minimal privilege escalation, and as such it's an easy sell to HPC admins, who can (at least somewhat) get out of the business of helping users figure out what the heck is weird about their environment when trying to get something running on a cluster for the first time. You can also take these containers with you and be reasonably certain they'll work on another system.
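The workflow is roughly like this (the image tag is illustrative, and the exact filename pull writes out varies by Singularity version):

    singularity pull docker://python:3.6                  # convert a Docker image into a local image file
    singularity exec python_3.6.sif python analysis.py    # run inside it, as your own uid

The second command is the point: it runs under your own uid, with no root daemon involved, which is exactly why admins are comfortable with it.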
It also helps with versioning. It is not uncommon to see various versions of GCC and the Intel compilers side by side. In essence, the user should be able to set up their environment with a few module loads.
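Something like this; the module names and versions here are illustrative and vary per site:

    module avail                 # see what software the site provides
    module load gcc/7.3.0        # pick a specific compiler version
    module load openmpi/3.0.0    # and an MPI stack built against it
    module list                  # confirm the loaded environment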
Here's some more info, if you're interested
They are really different tools for different jobs.
"Originally, the top 500 list was populated entirely by proprietary Unix systems from vendors like Cray research, SGI, etc.
In June 1998, the first Linux system entered the top 500 list. By June 2003, Linux systems passed the 25% mark, accounting for 139 of the top 500. By November of 2003, Linux systems comprised over 56% of the top 500. By November 2006, Linux made up more than 75% of the top 500. You get the idea. Over the years, there were a few attempts by microsoft to get into supercomputing, and there were BSD and Mac systems."
Since time is sold on these supercomputers, they probably want to run all the same/similar OS so they can compete selling time on them. Also, if one person has success, everyone else will copy them.
Slashdot has a ton of comments discussing BSD vs. Linux on this subject matter, but I didn't see anything too helpful.
My only thought is that large companies like Netflix use BSD more for CDNs because, from what I've been told, BSD has the best I/O handling. Why don't they use it for the rest of their infrastructure? Maybe Linux is better at crunching numbers and BSD is better for networking and security? No idea, that's my best guess.
Linux has two things that are extremely useful compared to BSD:
1) commercial backing (should one choose it)
2) first-class support for InfiniBand, top-end Ethernet (should they use it), and storage controllers
On Linux you basically have to use blocking threads to emulate async disk I/O, which means tons of threads and overhead when you’re handling 10k-100k concurrent connections per box.
A video CDN needs lots of concurrent access to a file system.
From the current aio man page:
“Work has been in progress for some time on a kernel state-machine-based implementation of asynchronous I/O (see io_submit(2), io_setup(2), io_cancel(2), io_destroy(2), io_getevents(2)), but this implementation hasn't yet matured to the point where the POSIX AIO implementation can be completely reimplemented using the kernel system calls.”
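For the curious, those io_*() calls are raw syscalls (glibc provides no wrappers), and kernel AIO is only truly asynchronous for O_DIRECT I/O. A minimal single-read sketch, with error handling mostly elided and a made-up filename:

    #define _GNU_SOURCE          /* for O_DIRECT */
    #include <linux/aio_abi.h>   /* struct iocb, struct io_event, IOCB_CMD_PREAD */
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void) {
        aio_context_t ctx = 0;
        if (syscall(SYS_io_setup, 8, &ctx) < 0) { perror("io_setup"); return 1; }

        /* O_DIRECT bypasses the page cache and requires an aligned buffer. */
        int fd = open("data.bin", O_RDONLY | O_DIRECT);
        void *buf = NULL;
        posix_memalign(&buf, 4096, 4096);

        struct iocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes     = fd;
        cb.aio_lio_opcode = IOCB_CMD_PREAD;
        cb.aio_buf        = (unsigned long)buf;
        cb.aio_nbytes     = 4096;
        cb.aio_offset     = 0;

        struct iocb *list[1] = { &cb };
        syscall(SYS_io_submit, ctx, 1, list);             /* queue the read; returns immediately */

        /* ... do other useful work here ... */

        struct io_event ev;
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);  /* reap one completion */
        printf("read returned %lld\n", (long long)ev.res);

        syscall(SYS_io_destroy, ctx);
        return 0;
    }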
On high-performance async I/O: this works quite well on Linux, and there are no blocking threads that I am aware of in that stack. The kernel uses bio dispatches to perform the actual block I/O. If you are complaining about using bio to perform the actual I/O, and that Linux includes this in its load calculation, sure, that is a conscious decision, as I understand it, on the part of the block layer folks. Is it wrong or bad? I don't think so, though others have different opinions.
FWIW ... I work at a place now using SmartOS as its primary OS. There are many people I know who prefer BSD, and many who prefer Linux. I have a different view, one that is not as popular as I'd hope it would be.
Specifically, I now look at operating systems largely as an implementation detail of your stack. Unless you are an OS developer, you have a mission, and you consume the OS service layers to help you perform that mission. In many cases the specifics of the OS don't matter, as long as they don't get in your way. Sometimes the specifics of the OS help you.
From my view as an HPC guy, a hardware guy, a storage/compute/ML/GPU guy, I can generally work in Linux and BSD without pain. Minor config differences, but I am comfortable in both.
I am not, and have not been, comfortable in AIX, HP-UX, and UNICOS. I used to enjoy IRIX until I started playing with Linux. I used Solaris and SunOS in the past, and SmartOS/illumos today.
As long as the OS has the tools I need, the libraries I need, or a way for me to build them, and doesn't constrain me or force me to contort to vagaries of the OS itself, I am fine with it.
A problem arises when people get caught up in "my OS > your OS", which this overall question at least brings in under the covers. This usually comes around from various esoteric aspects of little relevance to the vast unwashed masses of users (like me). On the OS dev side, when this happens, it is usually defensive, because something needed is missing, or some OS dev/manager (mis)believes that users don't actually need the features they are requesting.
That is actually a major problem, and it tends to drive people from your platform. Users aren't dumb, and there are many sophisticated people who have a deeper appreciation for the issues than "my OS > your OS".
Why *BSD isn't used might be for historical reasons, momentum, etc. It is perfectly fine as an OS, and quite usable for HPC. Similar for illumos/SmartOS (and I'm not saying that simply because I work for a company using SmartOS). There are things missing in both of these, and I am working (on the side) to try to help SmartOS get some of them (user-space stuff). FreeBSD in particular has most of what is needed.
Basically pick the system that works for you and your users. The OS, as I noted, can be viewed as a detail of the implementation. Or not.
But it's not a reason to create friction/tension between groups by claiming OS1 > OS2 ...
The VI/Emacs wars are so 80s/90s ...
That leads to people implementing “async I/O” thread pools in userland. Those threads then do “regular” blocking I/O, which is able to use the page cache etc. Having hundreds or thousands of blocking I/O threads then causes lots of other perf/scheduling issues.
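The core of that pattern looks roughly like the sketch below (names here are hypothetical; dequeue_request() stands in for whatever blocking queue the pool uses):

    #include <pthread.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct io_req {
        int    fd;
        void  *buf;
        size_t len;
        off_t  off;
        void (*done)(struct io_req *req, ssize_t result);  /* completion callback */
    };

    /* Hypothetical: whatever blocking queue the pool uses. */
    extern struct io_req *dequeue_request(void);

    static void *io_worker(void *arg) {
        (void)arg;
        for (;;) {
            struct io_req *r = dequeue_request();  /* sleep until work arrives */
            /* An ordinary blocking read: it gets the page cache for free,
               but it also parks an entire kernel thread for the duration. */
            ssize_t n = pread(r->fd, r->buf, r->len, r->off);
            r->done(r, n);                         /* report completion to the caller */
        }
        return NULL;
    }

Spin up a few dozen (or a few thousand) of these workers via pthread_create and you have the "async" illusion, at the cost of one parked thread per in-flight request.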
An OS is a priori better if I get to sleep through the night without an incident.
FreeBSD (and perhaps some other BSDs) supports “top-end” Ethernet as well. There was a great post on the Netflix blog a couple of months ago (discussed on HN) about how Netflix optimized their systems to serve video at 100 Gbps.
FreeBSD does not support InfiniBand, AFAIK.
But all the supercomputers use their own custom Linux. So no commercial backing. Also, these computers are not your standard data center. They cut networking and storage to a minimum because those are bottlenecks. These things are just massive RAM/CPU/GPU boxes connected properly through PCI.
Edit: I was looking at the hardware specs of Sunway, the number one supercomputer; they use a PCI-E 3.0 connection for all their nodes. Communication between the nodes is 12 GB/second with a latency of 1 us. Their total RAM is 1.31 PB.
This is just wrong. Yes, they use custom Linux, but it is highly, highly supported. You buy a Cray or a BlueGene and you get dedicated kernel engineers as well as on-site support, etc.
"They cut networking and storage to a minimum because those are bottlenecks. These things are just massive RAM/CPU/GPU boxes connected properly through PCI."
This is just wrong. Networking is extremely important in supercomputers, but it isn't like setting up a LAN. They use custom networking: InfiniBand, Aries, OmniPath, etc. There isn't much information about the "PCIe network" on the Sunway, but the fact that it is PCIe isn't very interesting; everyone has fast optical networking. It's the topology and protocol that make things interesting.
Sure, with how cheap InfiniBand is (especially compared to 40/100-gig Ethernet), one _could_ cobble together a system oneself.
Where the magic sauce comes in, and where the likes of Cray really make things shine, is the software they provide to allow end users to _easily_ do multi-machine scaling.
Libraries for just-in-time delivery of data directly into RAM? Yup. Location-aware job dispatchers that co-locate jobs near each other logically? Yup.
All of those hard things are solved for you.
Sometimes it is billed to different departments, but in most universities or research groups the "billing" is really quota allocation.
So, in no particular order:
- Linux came on the scene when the BSDs were mired in legal uncertainty. After the legal issues were settled, Linux had already become the default choice for someone wanting a FOSS Unix-style kernel, and the BSDs never caught up.
- The GPL license meant that improvements were shared rather than squirreled away in various proprietary spin-offs and thus lost when whatever company was behind them folded (generally, exceptions going both ways surely exist!).
- Due to Linux gaining the initial momentum, developers flocked (and keep flocking!) to it, leaving the BSDs ever further behind.
- Linux was more welcoming to new contributors, whereas the BSDs were controlled by a small circle of core developers sitting on commit access. And of course, the BSD way of solving disagreements was forking the entire thing, further splitting up the already small developer base.
I've always debated this. You would think that BSD licenses would be more attractive to corporations like Google, Amazon, Facebook, etc., and GPL licenses more attractive to researchers, so one would have thought that the BSD systems (FreeBSD, NetBSD, OpenBSD, etc.) would be the dominant Unix-style OSes. Instead, the GPL'd Linux-based OSes became dominant.
In the research space, peer review and reproducible results are critical, so the GPL does fit in well. The makers of supercomputers have to accommodate their clients' requirements.
I think it's simply down to Linux being where the money is.
The big players (IBM, Dell, etc.) are all actively promoting Linux, and trained personnel are also somewhat easy to find.
So Linux is "the beast you know".
As for FreeBSD, it might be a technically better platform, but it is living in Linux's shadow.
Personally I run FreeBSD for the excellent documentation, stability, and features like ZFS, but nothing I run couldn't just as easily run on Linux.
Linux has wider, not better, hardware support. If OpenBSD supports some hardware, it simply works out of the box and is stable and rock solid.
"Done is better than perfect".
While Linux has better hardware support, I usually find the difference to be in the more exotic direction.
Edit: support as in commercial support with SLAs, etc.
Linux has ZFS.
You have the choice of the FUSE version, which is legally free and clear but has obvious FUSE-related performance limitations, or the kernel version, which has great performance but is questionable at best from a legal standpoint because the CDDL is not compatible with the GPL.
Canonical has decided they're willing to take the risk by bundling it in Ubuntu and so far it hasn't backfired on them, but there's good reason to believe that Oracle's lawyers may have something to say about it if they ever feel that ZFS-on-Linux is threatening any of their products.
Many of the vendors for HPC (I'm looking at you, Mellanox) primarily develop and certify their products on Linux. While they might work OK on BSD, you're not going to get the full performance and all of the features on a BSD system. If you paid for Mellanox EDR 100G InfiniBand switches and all of the fancy VPI network cards, you want to use them to their fullest performance. The vendor tells you to use Linux for that, so you use Linux.
TL;DR: Linux is what the hardware manufacturers overwhelmingly target and work with. HPC users use what vendors support best.
HPC is generally a "softball" workload because the code is going to be more sympathetic to the hardware than many other computer usages. Processes will batch-allocate a lot of RAM and stay pegged in the runnable state for a long time.
SMP.. "it depends", again a parallel vector matrix multiply is just going to sit in the runnable state on all the cores and the kernel is pretty irrelevant. There is a lot of junior job stuff left in FreeBSD to move locks around. The VFS is quite bad. In an HPC type workload these things probably wont matter that much unless you see a lot of "system %". They will show up in profiles and are generally also easy to fix. But it's not hard to construct a microbenchmark showing Linux > $else in those areas.
SLUB.. no. What kind of HPC workload is going to care much about this? The Linux allocators are pretty awful at contiguous kernel memory allocation (see ZFS on Linux). I don't see why UMA would architecturally flop here.
NUMA is a sore point on FreeBSD. It should be usable in 12.0. Isilon and Netflix are paying Jeff Roberson to work on it. Some folks on my team are also doing minor NUMA and locking work, but for commercial CDN workloads.
Mellanox does a pretty stellar job on FreeBSD Ethernet and Infiniband support. Unfair dig at them. I generally prefer Chelsio, but Mellanox has lowest latency which is relevant for HPC.
SLUB was written by Christoph Lameter when he was at Silicon Graphics, for their monster Altix machines. It took Linux hours to boot (with SLAB) on those machines. He wrote SLUB in a fit of brilliance to make Linux suck less on them, and HPC workloads can most certainly be run on them. Just like some of the crazy Cray computers, SGI machines used to own HPC. Note that I work with Christoph in the same office and have discussed this with him in person. Regarding contiguous memory allocation, a lot of serious HPC workloads use huge pages set at boot to defeat this, so that part of Linux's fail is a non-issue (you're entirely right, btw). Really awesome to hear about the NUMA bits in FreeBSD being improved; I sufficiently feel hit with a cluebat on it.
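To spell out the boot-time huge-page trick mentioned above: pages are reserved on the kernel command line (e.g. default_hugepagesz=1G hugepagesz=1G hugepages=16) and then mapped explicitly. A minimal sketch, assuming a pool of 2 MiB pages has been reserved:

    #define _GNU_SOURCE       /* for MAP_ANONYMOUS / MAP_HUGETLB */
    #include <sys/mman.h>
    #include <stdio.h>

    int main(void) {
        size_t len = 2UL << 20;   /* one 2 MiB huge page */
        /* MAP_HUGETLB draws from the pool reserved at boot, so the kernel
           never has to hunt for contiguous physical memory at runtime. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");   /* fails if no pool was reserved */
            return 1;
        }
        munmap(p, len);
        return 0;
    }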
The bit about Mellanox came from their engineers (in their Haifa, Israel office, before lunch) telling me they build their products for Linux first and then port to everything else. They care deeply that it works on Linux, and it is nice if it works on other systems, but not as important. It wasn't a dig at them; it was what the engineer said to me.
Last I heard from my rep, Intel was discontinuing icc altogether because it didn't make a lot of sense not to put the optimizations in the compilers most people use: gcc, llvm, vc++.
I'd assumed Intel kept their compilers as a competitive advantage even if they weren't profitable by themselves. Could certainly see it happening though.
DragonFly's design shows promise, but it's not anywhere near ready for supercomputers yet.
DragonFly seems to be hard at work on the problems.