Ask HN: Why are supercomputers all running Linux and not BSD? - dbosch
======
jpgvm
HPC may look like COTS gear but it's not.

BSD doesn't have drivers for InfiniBand and other HPC interconnects. Nor does
it have client drivers (let alone a server implementation) for Lustre, the
distributed filesystem used by most supercomputers.

I imagine MPI support on BSD is also likely non-existent. Then there is the
matter of accelerator support, e.g. NVIDIA GPUs and Intel Xeon Phi.

It's not to say that some vendor couldn't reasonably build a BSD based
supercomputer, it's just highly unlikely given how much stuff is missing.

~~~
renchap
FreeBSD does support Infiniband:
[https://wiki.freebsd.org/InfiniBand](https://wiki.freebsd.org/InfiniBand)

This page mentions that Mellanox has contributed work on this. Also, storage
vendors have used external InfiniBand stacks on FreeBSD for many years (I got
my first Isilon cluster something like 8 years ago, and it was using an IB
backend, with a forked FreeBSD 7 kernel IIRC), and these are stable and widely
deployed.

~~~
znpy
No matter what your argument is, someone will always come up with the counter
argument for a particular example, missing the general point.

~~~
oliv__
I need to print this on a T-shirt

------
jacquesm
Because of one man: Donald Becker. At the beginning of the commodity super
computer era Donald did an absolutely amazing job squeezing out every last bit
of performance from commodity networking hardware for 'Beowulf' style
clusters.

This gave Linux a head start and the self-reinforcing effects of such a head
start did the rest, it made answering the question 'for which OS should we
start writing drivers?' for specialty HPC hardware a no-brainer.

[https://en.wikipedia.org/wiki/Donald_Becker](https://en.wikipedia.org/wiki/Donald_Becker)

~~~
pgtan
~ $ dmesg | grep Beck

3c59x: Donald Becker and others. www.scyld.com/network/vortex.html

~~~
kelnage
Sadly that link is long dead, but the Wayback Machine comes to the rescue:
[https://web.archive.org/web/20000619092736/www.scyld.com/net...](https://web.archive.org/web/20000619092736/www.scyld.com/network/vortex.html)

------
ktpsns
Scientific high energy physicist here, with a regional HPC center on the same
floor. My observation is that administrators tend to use enterprise
distributions such as Scientific Linux and SUSE Linux Enterprise Server
(SLES), together with commercial MPI implementations such as IBM MPI and
Intel MPI.

On the other hand, people are used to Linux; in my environment literally
everybody has Ubuntu on their notebook and workstation. They know how to run
their Python analysis scripts there, and the only thing they have to change
when going to the cluster is the adoption of an environment management system
(such as [http://modules.sourceforge.net/](http://modules.sourceforge.net/)).

(However, I have to admit I never got in touch with BSD and don't know the
differences in user space)

~~~
nerdponx
Modules looks really interesting. Makes me wonder why Continuum is out there
trying to reinvent the wheel with Anaconda. Glad to have something I can use
at work to replace Conda environments. Now all it needs is Powershell/CMD
support so I don't have to use it inside Cygwin...

~~~
godelski
Modules are really about environments (including software management).
Anaconda doesn't handle this. For example, Conda installs its own version of
HDF5 and points to its environment path. Say you want to use a different
version of HDF5: an easy way to do that is to provide a module, so the user
just loads it. You are creating an easy way for the user to set up their
environment, where they really don't have to know anything about it.

It also helps with versioning. It is not uncommon to see various versions of
gcc and Intel compilers. In essence the user should be able to load their
environment with a few module loads.
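For the curious, a typical session looks something like this (the module names and versions here are made up for illustration; what's actually available depends entirely on what a site's admins have installed):

```shell
module avail                  # list everything the site provides
module load gcc/7.3.0         # pick a compiler toolchain...
module load openmpi/3.1.0     # ...and an MPI built against it
module load hdf5/1.10.2       # a specific HDF5, instead of Conda's copy
module list                   # show what is currently loaded
```

Under the hood each `module load` just prepends the right directories to PATH, LD_LIBRARY_PATH, etc., which is why it composes with any language or build system.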

Here's some more info, if you're interested

[1] [http://www.admin-magazine.com/HPC/Articles/Managing-the-Buil...](http://www.admin-magazine.com/HPC/Articles/Managing-the-Build-Environment-with-Environment-Modules)

[2] [http://www.admin-magazine.com/HPC/Articles/Managing-Cluster-...](http://www.admin-magazine.com/HPC/Articles/Managing-Cluster-Software-Packages)

[3] [https://uisapp2.iu.edu/confluence-prd/pages/viewpage.action?...](https://uisapp2.iu.edu/confluence-prd/pages/viewpage.action?pageId=115540061)

------
smilesnd
A comment I found while looking at the Linux distributions that run on
supercomputers:

"Originally, the top 500 list was populated entirely by proprietary Unix
systems from vendors like Cray research, SGI, etc.

In June 1998, the first Linux system entered the top 500 list. By June 2003,
Linux systems passed the 25% mark, accounting for 139 of the top 500. By
November of 2003, Linux systems comprised over 56% of the top 500. By November
2006, Linux made up more than 75% of the top 500. You get the idea. Over the
years, there were a few attempts by microsoft to get into supercomputing, and
there were BSD and Mac systems."

Since time is sold on these supercomputers they probably want to run all the
same/similar OS so they can compete selling time on them. Also if one person
has success everyone else will copy them.

[https://linux.slashdot.org/story/17/11/14/2223227/all-500-of...](https://linux.slashdot.org/story/17/11/14/2223227/all-500-of-the-worlds-top-500-supercomputers-are-running-linux)

Slashdot has a ton of comments discussing BSD vs Linux on this subject, but I
didn't see anything too helpful.

My only thought is that large companies like Netflix use BSD more for CDNs
because, from what I've been told, BSD has the best I/O handling. Why don't
they use it for the rest of their infrastructure? Maybe Linux is better at
crunching numbers and BSD is better for networking and security? No idea,
that's my best guess.

~~~
KaiserPro
I think Netflix use BSD because they wanted to use BSD. Sure, some flavours of
BSD have ZFS built in, but that's a pretty rare corner case.

Linux has two things that are extremely useful compared to BSD:

1) commercial backing (should one choose it)

2) first class support for InfiniBand, top-end Ethernet (should they use it)
and storage controllers

~~~
tatersolid
Netflix uses BSD for OpenConnect because asynchronous disk I/O, which is
critical for a CDN, remains a tire fire on Linux after more than 20 years.

On Linux you basically have to use blocking threads to emulate async disk I/O,
which means tons of threads and overhead when you’re handling 10k-100k
concurrent connections per box.
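To make that concrete, here's a minimal sketch of the thread-pool emulation in question: "async" disk reads faked by handing blocking read() calls to a pool of worker threads. This is illustrative only (a real CDN server, or glibc's POSIX AIO, does the same thing with far more machinery); every name in it is made up for the example:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_chunk(path, offset, length):
    # Each call blocks an entire worker thread for the duration of the I/O.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Set up a file to "serve".
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789" * 1000)
os.close(fd)

# One thread per in-flight request: fine for 16 workers,
# painful when you want 10k-100k concurrent requests.
pool = ThreadPoolExecutor(max_workers=16)
futures = [pool.submit(read_chunk, path, i * 10, 10) for i in range(100)]
chunks = [f.result() for f in futures]
pool.shutdown()
os.remove(path)

print(all(c == b"0123456789" for c in chunks))  # True
```

The scaling problem is the thread-per-request model itself: each blocked thread costs stack memory and scheduler work, which is exactly the overhead the parent comment is describing.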

~~~
hpcjoe
Not true. I've regularly demonstrated very high throughput/connectivity with
lots of little connections. The problem I have seen (not only with linux) has
been over-aggressive congestion controls, usually configured/set wrong.

On high performance async IO, this works quite well in Linux, and there are no
blocking threads that I am aware of in that stack. The kernel uses bio
dispatches to perform the actual block io. If you are complaining about using
bio to perform the actual IO, and that linux includes this in its load
calculation, sure, that is a conscious decision as I understand it on the part
of the block layer folks. Is it wrong or bad? I don't think so, though others
have different opinions.

FWIW ... I work at a place now using SmartOS as its primary OS. There are many
people I know who prefer *BSD. Many who prefer Linux. I have a different view,
one that is not as popular as I wish it were.

Specifically, I look at operating systems now, largely, as an implementation
detail for your stack. You have a mission in many cases, unless you are an OS
developer, that consumes the OS services layers to help you perform your
mission. In many cases, specifics of the OS don't matter, as long as they
don't get in your way. Sometimes the specifics of the OS help you.

From my view as an HPC guy, a hardware guy, a storage/compute/ML/GPU guy, I
can generally work in Linux and *BSD without pain. Minor config differences
aside, I am comfortable in both.

I am not, and have not been, comfortable in AIX, HP-UX, and UNICOS. I used to
enjoy IRIX until I started playing with Linux. I used Solaris and SunOS in the
past, and SmartOS/illumos today.

As long as the OS has the tools I need, the libraries I need, or a way for me
to build them, and doesn't constrain me or force me to contort to vagaries of
the OS itself, I am fine with it.

A problem arises when people get caught up in "my OS > your OS", which this
overall question at least brings in under the covers. This usually comes about
through various esoteric aspects of little relevance to the vast unwashed
masses of users (like me). On the OS dev side, when this happens, it is
usually defensive: something needed is missing, or some OS dev/manager
(mis)believes that users don't actually need the features they are requesting.

That is actually a major problem, and it tends to drive people from your
platform. Users aren't dumb, and there are many sophisticated people who have
a deeper appreciation for the issues than "my OS > your OS".

Why *BSD isn't used might be for historical reasons, momentum, etc. It is
perfectly fine as an OS, and quite usable for HPC. Similar for illumos/SmartOS
(not simply saying that as I work for a company using SmartOS). There are
missing things in both of these, and I am working (on the side) to try to help
SmartOS get some of these things (user space stuff). FreeBSD in particular has
most of what is needed.

Basically pick the system that works for you and your users. The OS, as I
noted, can be viewed as a detail of the implementation. Or not.

But it's not a reason to create friction/tension between groups claiming OS1 >
OS2 ...

The VI/Emacs wars are so 80s/90s ...

~~~
donavanm
I think the issue tatersolid has with Linux AIO is implicit DIO. That's really
painful if you're working with HDDs or highly concurrent read scenarios. See
my sibling comment for why.

That leads to people implementing “async IO” thread pools in userland. Those
threads then do “regular” blocking IO, which is able to use the page cache
etc. Having hundreds or thousands of blocking IO threads then causes lots of
other perf/scheduling issues.

------
jabl
I think, largely, the same reasons apply to Linux vs. BSD in supercomputers as
to Linux vs. BSD generally. You might as well ask why Linux and not *BSD is
used in Android, on servers generally, or by large, technically knowledgeable
organizations such as Google, Amazon, Facebook, etc.

So, in no particular order:

\- Linux came on the scene when the BSDs were mired in legal uncertainty.
After the legal issues were settled, Linux had already become the default
choice for anyone wanting a FOSS Unix-style kernel, and the BSDs never caught
up.

\- The GPL license meant that improvements were shared rather than squirreled
away in various proprietary spin-offs and thus lost when whatever company was
behind them folded (generally, exceptions going both ways surely exist!).

\- Due to Linux gaining the initial momentum, developers flocked (and keep
flocking!) to it, leaving the BSDs ever further behind.

\- Linux was more welcoming to new contributors, whereas the BSDs were
controlled by a small circle of core developers sitting on commit access. And
of course, the BSD way of solving disagreements was forking the entire thing,
further splitting up the already small developer base.

~~~
eighthnate
> The GPL license meant that improvements were shared rather than squirreled
> away in various proprietary spin-offs and thus lost when whatever company
> was behind them folded (generally, exceptions going both ways surely
> exist!).

I've always debated this. You would think that BSD licenses would be more
attractive to corporations like Google, Amazon, Facebook, etc., and GPL
licenses more attractive to researchers, so one would have thought that the
BSD systems (FreeBSD, NetBSD, OpenBSD, etc.) would be the dominant Unix-style
OSes. Instead, the GPL-licensed Linux-based OSes became dominant.

~~~
CyberFonic
The question is about supercomputers specifically, which are mostly used by
researchers and some applications like weather forecasting, aerodynamic
simulations, etc. The infrastructures used by Google, Facebook, Amazon are
massive clusters of computers, but they are not supercomputers.

In the research space peer-review and reproducible results are critical. So
GPL does fit in well. The makers of supercomputers have to accommodate their
clients' requirements.

------
8fingerlouie
Some will say better hardware support. While Linux has better hardware
support, I usually find this to be in the more exotic direction.

I think it's simply down to Linux being where the money is. The big players
(IBM, Dell, etc.) are all actively promoting Linux, and trained personnel are
also somewhat easy to find. So Linux is "the beast you know".

As for FreeBSD, it might be a technically better platform, but it is living in
Linux's shadow.

Personally I run FreeBSD for the excellent documentation, stability, and
features like ZFS, but nothing I run couldn't just as easily run on Linux.

~~~
alvil
> While Linux has better hardware support

Linux has wider, not better hardware support. If OpenBSD supports some
hardware it simply works out of the box and is stable and rock solid.

~~~
scardine
From the famous poster hanging on Facebook's office:

"Done is better than perfect".

------
SEJeff
Because BSD's SMP support has traditionally been pretty terrible compared to
Linux's. They still have a SLAB memory allocator (compared with Linux's
default of SLUB, which is much better for heavily SMP systems).

Many of the vendors for HPC (I'm looking at you, Mellanox) primarily develop
and certify their products on Linux. While they might work OK on BSD, you're
not going to get the full performance and all of the features on a BSD system.
If you paid for Mellanox EDR 100G InfiniBand switches and all of the fancy VPI
network cards, you want to use them to their fullest capability. The vendor
tells you to use Linux for that, so you use Linux.

TL;DR: Linux is what the hardware manufacturers overwhelmingly target and
work with. HPC users use what vendors support best.

~~~
kev009
Your final line is 100% correct but all your supporting details are not.

HPC is generally a "softball" workload because the code is going to be more
sympathetic to the hardware than many other computer usages. Processes will
batch allocate a lot of RAM and peg runnable state for a long time.

SMP.. "it depends". Again, a parallel vector matrix multiply is just going to
sit in the runnable state on all the cores and the kernel is pretty
irrelevant. There is a lot of junior job stuff left in FreeBSD to move locks
around. The VFS is quite bad. In an HPC-type workload these things probably
won't matter that much unless you see a lot of "system %". They will show up
in profiles and are generally also easy to fix. But it's not hard to construct
a microbenchmark showing Linux > $else in those areas.

SLUB.. no. What kind of HPC workload is going to care much about this? The
Linux allocators are pretty awful at contig kernel memory allocation (see ZFS
on Linux). I don't see why UMA would architecturally flop here.

NUMA is a sore point on FreeBSD. It should be usable in 12.0. Isilon and
Netflix are paying Jeff Roberson to work on it. Some folks on my team are also
doing minor NUMA and locking work, but for commercial CDN workloads.

Mellanox does a pretty stellar job on FreeBSD Ethernet and Infiniband support.
Unfair dig at them. I generally prefer Chelsio, but Mellanox has lowest
latency which is relevant for HPC.

~~~
SEJeff
Awesome response, thanks for taking the time to write it.

SLUB was written by Christoph Lameter when he was at Silicon Graphics for
their monster Altix machines. It took Linux hours to boot (with SLAB) on that
machine. He wrote SLUB in a fit of brilliance to make Linux suck less on
those, on which HPC workloads can most certainly be run. Just like some of the
crazy Cray computers, SGI machines used to own HPC. Note that I work with
Christoph in the same office and have discussed this with him in person.
Regarding contiguous memory allocation, a lot of serious HPC workloads use
huge pages set at boot to defeat this, so that part of Linux's fail is a non-
issue (you're entirely right, btw). Really awesome to hear about the NUMA bits
in FreeBSD being improved; I feel sufficiently hit with a cluebat on it.
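For anyone who hasn't seen it, the boot-time huge page reservation looks
roughly like this (the sizes and counts below are illustrative; the right
values depend on the machine):

```shell
# Kernel command line fragment reserving 1 GiB huge pages at boot,
# so large allocations are physically contiguous from the start:
#   default_hugepagesz=1G hugepagesz=1G hugepages=64

# After boot, verify the reservation:
grep Huge /proc/meminfo
```

Reserving at boot sidesteps fragmentation entirely, since the pages are carved
out before the allocator has had a chance to scatter memory.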

The bit from Mellanox was from their engineers (in their Haifa, Israel office
before lunch) telling me they build their products for Linux first, and then
port to everything else. They care deeply that it works on Linux, and it is
nice if it works on other systems but not as important. It wasn't a dig at
them, it was what the engineer said to me.

------
mkj
Intel compilers aren't available for bsd. The improved optimisation versus gcc
is worthwhile.

~~~
kev009
They actually are, as are VTune and some other commercial stuff from Intel;
the open source libraries like ISA-L and IPP and frameworks like DPDK, SPDK
and the NV-DIMM stuff all work on FreeBSD.

Last I heard from my rep, Intel was discontinuing icc altogether because it
didn't make a lot of sense to not put the optimizations in the compilers most
people use: gcc, llvm, vcpp.

~~~
mkj
Ah released in 2015, cool.

I'd assumed Intel kept their compilers as a competitive advantage even if they
weren't profitable by themselves. Could certainly see it happening though.

------
RantyDave
Because the people who use supercomputers just want to crunch numbers - the
operating system is a distraction at best, and Linux is the path of least
resistance.

------
snvzz
Because BSDs don't scale in that direction.

Dragonfly's design shows promise, but it's not anywhere near ready for
supercomputers yet.

~~~
frankharv
Yes, I agree with your comments most. The reason is NUMA. FreeBSD performance
on NUMA is poor; it was only implemented in 2015 and still needs tuning.

Dragonfly seems to be hard at work on the problems.

[http://lists.dragonflybsd.org/pipermail/users/2017-February/...](http://lists.dragonflybsd.org/pipermail/users/2017-February/313242.html)

------
VSpike
I wonder if this is another validation of the "worse is better" philosophy as
described recently in an HN article
[http://minnie.tuhs.org/pipermail/tuhs/2017-May/009935.html](http://minnie.tuhs.org/pipermail/tuhs/2017-May/009935.html)
and also discussed at [https://www.jwz.org/doc/worse-is-better.html](https://www.jwz.org/doc/worse-is-better.html)

~~~
erikj
JWZ seems to dislike Hacker News for some reason, judging by the redirect.

~~~
VSpike
I forgot about that. Try
[https://web.archive.org/web/20171114181219/https://www.jwz.o...](https://web.archive.org/web/20171114181219/https://www.jwz.org/doc/worse-is-better.html)

------
loop0
I would guess it is because Linux has wider hardware support than the BSDs. As
you're building a supercomputer it makes sense to use the fastest hardware,
which implies new technology.

------
vectorEQ
This might shed some light on your question:
[https://en.wikipedia.org/wiki/Comparison_of_operating_system...](https://en.wikipedia.org/wiki/Comparison_of_operating_system_kernels)

------
fdik
SMP + NUMA performance

