
Solaris to Linux Migration 2017 - hs86
http://www.brendangregg.com/blog/2017-09-05/solaris-to-linux-2017.html
======
luckydude
I'm ex-Sun, kernel group. I'm retired (early, at 55), but after reading
Brendan's write-up, I'd work for that guy. Holy smokes, he is all over it.
Reminds me of me when I was in the groove. Brendan, if you read this: I'm old
school, not good at all the Ruby on Rails etc, but systems, yeah, pretty good.
Not looking for money, looking for working with smart people, I can be paid in
stock. If it works out, great, if it doesn't still great because I like smart
people.

Sorry for making it about me, you all should read his post, it's someone who
is completely in the groove, has breadth and depth, knows about systems. These
people are rare, but hugely valuable when you need to scale stuff up.

~~~
jamesmishra
If you are indeed interested in coming out of retirement, Brendan Gregg does
performance engineering at Netflix. It might be worth reaching out.

http://www.brendangregg.com/blog/2017-05-16/working-at-netflix-2017.html

~~~
luckydude
I'm kinda old and burned out but thanks for the link. I'll reach out. It could
be a boatload of fun.

~~~
rurban
It's even much more fun than described in this article, because Netflix mostly
works with FreeBSD, not Linux. So no NIH syndrome, proper DTrace (no eBPF
hacks), proper ZFS, proper tooling, easy kernel maintenance.

~~~
gribbly
As I've seen it described here before (by Brendan Gregg?), Netflix uses
FreeBSD for the CDN servers (streaming the video) and Linux for everything
else: browsing Netflix, encoding, etc.

~~~
filomeno
That's true, although he was right about the Linux counterparts. Reading this
migration guide, at least to me, it seems that all the Linux replacements for
things like DTrace, ZFS, SMF, zones... are just subpar.

~~~
luckydude
So I got a different vibe. I can't speak to how good all that stuff is in
Linux vs Solaris (or BSD), so it's just a vibe. The feeling I got was that this
was someone extracting the best value out of the system he was using.

Which is refreshing; the constant whining about Linux vs $SOMEONES_FAVORITE_OS
gets old. Very pleasant to have a "just the facts" pile of info. And I learned
a few tools that I didn't know about.

~~~
filomeno
Of course it is valuable, and being a Linux user myself, I'm glad somebody
made the effort to get all those tools available for us.

However, I feel sad seeing the best tech from Sun going down the drain despite
being open source.

~~~
luckydude
I've been here before. I fought like crazy to prevent SunOS 4.x from being
tossed on the trash pile. If you think Solaris going away sucks, it sucked
harder (for me) to see SunOS go away. It was a far more pleasant environment
than Solaris ever was.

As an example of that (because it's fair to go "what??!"): I installed an open
source Solaris in the last few years to play with it. The default install was
miserable, the tools were crap. I was having lunch with Bryan Cantrill and
some of his team and I mentioned my negative reaction. They laughed and said
you have to install the GNU tools and put them in your path first.

Say what? SunOS 4.x came with good tools, in /usr/bin. The default install was
useful. Solaris was "standard" in that it came with all the System V stuff in
/usr/bin. And Shannon and crew "protected" those standard tools and refused to
evolve them to make them useful. They started sucky and stayed sucky, all in
the name of being "standard". Except nobody else used those tools: there was
no other System V-based Unix with any significant volume. *BSD certainly
didn't move to System V, and Linux wasn't System V; the only System V Unix
with any volume was Solaris. So they were standard for no good reason, and all
these years later Solaris still has crappy stuff in /usr/bin; you want
/opt/GNU/bin or something.

Sorry for the rant, and it's off topic. Well maybe it's off topic, maybe not.
I sort of wonder if Sun had shipped the GNU stuff and installed it by default
as the real tools, would it have made any difference? Probably not but boy, do
the default tools make a bad first impression.

~~~
filomeno
> Sorry for the rant, and it's off topic

No, I always find it interesting to hear stories about people who worked for
companies like Sun in the past.

Regards

------
jsiepkes
Nice article! Though I do think it could have noted more clearly that Linux
containers are not meant as security boundaries. It doesn't say so explicitly,
but it is a very important distinction.

Unlike with FreeBSD jails and Solaris Zones, you can't safely run multiple
Docker tenants on the same hardware. Docker is basically the equivalent of a
sign that says "don't walk on the grass", as opposed to the actual wall that
FreeBSD jails and Solaris zones give you. Now, if you have a very homogeneous
environment (say you are deploying hundreds of instances of the exact same
app), then this is probably fine; Docker is primarily a deployment tool. If
you're an organization which runs all kinds of applications (with varying
levels of security quality), that's an entirely different story.

~~~
zlynx
There are dangerous things that you can allow Docker to do. But if you don't
do those things, it is pretty difficult to break out of a container.

Red Hat has been especially good here, not allowing anyone but host-root to
connect to Docker, and using SELinux and seccomp filtering. With those
working, it doesn't matter if your container mounts a host filesystem since it
won't have the correct SELinux roles and types anyway.

Many people claim that ruins Docker, since now you can't use Docker from
within Docker. But that's the price you pay for security.

I believe that with the correct precautions, a Linux container is just as safe
as a jail or zone. Perhaps the problem is just how easy it is for a sysadmin
to put holes into the containers that ruin the security.
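
For a sense of what that seccomp filtering looks like in practice, here's a
minimal sketch using libseccomp (my illustration of the mechanism, not
Docker's actual default profile; build with -lseccomp):

    /* Illustrative only: allow everything, then deny a couple of
       syscalls that Docker's default profile also blocks. */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/mount.h>
    #include <seccomp.h>

    int main(void)
    {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
        if (!ctx)
            return 1;
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(mount), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(kexec_load), 0);
        if (seccomp_load(ctx) < 0)
            return 1;

        /* From here on, mount(2) fails with EPERM, even for root. */
        if (mount("none", "/mnt", "tmpfs", 0, NULL) < 0)
            perror("mount");
        seccomp_release(ctx);
        return 0;
    }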

~~~
mirashii
> Perhaps the problem is just how easy it is for a sysadmin to put holes into
> the containers that ruin the security.

I think there's a bit more to it than that. For some examples of other reasons
people might be wary vs zones:

Docker itself is still a daemon that runs as root, combining a large number of
different functionalities that require root access into a single binary with a
large attack surface and a lot of code that doesn't need to be privileged.
While isolation of Docker's responsibilities has begun, even their own
security page [1] admits that there's a long way to go here.

Zones are, as many of the articles in this thread point out, a first-class
feature, designed and implemented as such. What docker/"containers" allow you
to do is the culmination of many building blocks which have been incrementally
added to the Linux kernel. Some of those were added pretty recently, and
without an overall design, their interactions with other portions of the Linux
kernel or other components of the system have often been surprising and have
led to a number of security issues over time. In comparison, both the code and
the design of the system are relatively young. A good example of this can be
found at [2], which ends with the following very apt quote:

> Why is it that several security vulnerabilities have sprung from the user
> namespaces implementation? The fundamental problem seems to be that user
> namespaces and their interactions with other parts of the kernel are rather
> complex—probably too complex for the few kernel developers with a close
> interest to consider all of the possible security implications. In addition,
> by making new functionality available to unprivileged users, user namespaces
> expand the attack surface of the kernel. Thus, it seems that as user
> namespaces come to be more widely deployed, other security bugs such as
> these are likely to be found.

It might also be interesting to read [3], which is already showing that 3.5
years later, user namespaces are still a breeding ground for security issues
that lead to privilege escalation.

[1] https://docs.docker.com/engine/security/security/#related-information
[2] https://lwn.net/Articles/543273/
[3] https://utcc.utoronto.ca/~cks/space/blog/linux/UserNamespacesWhySecurityProblems
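
To make the "new functionality available to unprivileged users" point
concrete, here's a minimal sketch (mine, error handling trimmed) of what any
unprivileged process has been able to do since user namespaces were enabled:

    /* Any unprivileged user can become "root" inside a new user
       namespace; every root-only kernel code path reachable from
       there is new attack surface. Sketch only. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int outer = (int)getuid();       /* e.g. 1000 */
        char map[64];

        unshare(CLONE_NEWUSER);          /* no privileges required */

        /* Map our outer uid to uid 0 inside the namespace. */
        snprintf(map, sizeof(map), "0 %d 1", outer);
        int fd = open("/proc/self/uid_map", O_WRONLY);
        write(fd, map, strlen(map));
        close(fd);

        /* We now hold full capabilities inside this namespace. */
        printf("uid inside namespace: %d\n", (int)getuid()); /* 0 */
        return 0;
    }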

~~~
lobster_johnson
I can appreciate that the features needed to be developed and matured over
time, but I don't understand why Linux didn't invent an umbrella concept to
tie everything together.

It seems to me to be a grab bag of things that Linux allows to be
independently namespaced/isolated: cgroups, networking, PIDs, VFS, etc. From a
kernel point of view, this would be the perfect use case for an
"object-oriented" design with some kind of abstract container concept
reflecting the nesting of each container, but instead it seems very scattered
and ad hoc.

In particular, each mechanism is opt-in and must be configured separately,
very carefully; to approximate Zones you have to combine all of the mechanisms
together and hope you didn't forget something, and also hope that the kernel's
separation is perfect (which, given the vast number of "objects" it can
address, is doubtful). To my untrained (in terms of kernel development) eye,
this seems the opposite of future-proof: if the kernel invents some new
namespacing feature, an application that uses all of the existing mechanisms
won't automatically receive it, because there's no concept of a "container"
as such.

Opt-in seems like the wrong approach. The safer alternative would be for a new
container process to be completely isolated by default, with whoever forked
the process explicitly specifying the child's access (e.g. allowing sharing of
the file system). This is, I believe, how BSD jails work.
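
To illustrate that opt-in flavor, here's a sketch (mine) of a hand-rolled
"container": every isolation mechanism has to be named explicitly, and a flag
added in a future kernel would not be picked up automatically:

    /* A hand-rolled "container": each namespace must be requested
       by name. If the kernel grows a new namespace type, this list
       silently stops being complete. Sketch only. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int flags = CLONE_NEWUSER    /* uids/gids */
                  | CLONE_NEWNS      /* mounts */
                  | CLONE_NEWPID     /* pids (takes effect in children) */
                  | CLONE_NEWNET     /* network stack */
                  | CLONE_NEWUTS     /* hostname */
                  | CLONE_NEWIPC     /* SysV IPC, POSIX mqueues */
                  | CLONE_NEWCGROUP; /* cgroup root (Linux >= 4.6) */

        if (unshare(flags) < 0) {
            perror("unshare");
            return 1;
        }
        /* cgroup limits, seccomp filters, capability bounding, etc.
           are all separate opt-in steps on top of this. */
        execlp("/bin/sh", "sh", (char *)NULL);
        perror("execlp");
        return 1;
    }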

~~~
CrystalGamma
That "object-oriented" design is called capabilities.

------
espadrine
> _It's a bit early for me to say which is better nowadays on Linux, ZFS or
> btrfs, but my company is certainly learning the answer by running the same
> production workload on both. I suspect we'll share findings in a later blog
> post._

I am eager to read this piece!

Even though I am afraid to see it confirm that btrfs still struggles to catch
up…

The 2016 bcachefs benchmarks[0] are a mixed bag.

[0]: https://evilpiepirate.org/~kent/benchmark-full-results-2016-04-19/terse

~~~
jsiepkes
Is BTRFS really an option now that Red Hat has decided to pull the plug on
their BTRFS development? That basically leaves Oracle and SUSE, I think? As
far as I can tell, the future of BTRFS doesn't look good.

Facebook using it doesn't mean anything, since they are probably using it for
distributed applications, meaning the entire box (including BTRFS) can just
die and the cluster won't be impacted. I really can't imagine they are using
BTRFS on every node in their cluster.

~~~
geofft
> _Facebook using it doesn't mean anything, since they are probably using it
> for distributed applications, meaning the entire box (including BTRFS) can
> just die and the cluster won't be impacted._

I don't think that follows for lots of reasons:

- If enough of your boxes die that you lose quorum (whether from filesystem
instability or from unrelated causes like hardware glitches), your cluster is
impacted. So, at the least, if you expect your boxes to die at an abnormally
high rate, you have to have an abnormally high number of them to maintain
service (see the back-of-the-envelope sketch at the end of this comment).

- Filesystem instability is (I think) much less random than hardware
glitches. If a workload causes your filesystem to crash on one machine,
recovering and retrying it on the next machine will probably also make it
crash. So you may not even be able to save your service by throwing more nodes
at the problem. A bad filesystem will probably actually break your service.

- Crashes cause a performance impact, because you have to replay the request
and you have fewer machines in the cluster until your crashed node reboots. It
would take an extraordinarily fast filesystem to be a net performance win if
it's even somewhat crashy.

- Most importantly, distributed systems generally only help you if you get
_clean_ crashes, as in power failure, network disconnects, etc. If you have
silent data corruption, or some amount of data corruption leading up to a
crash later, or a filesystem that can't fsck properly, your average
distributed system is going to deal with it very poorly. See Ganesan et al.,
"Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage
Reactions to Single Errors and Corruptions",
https://www.usenix.org/system/files/conference/fast17/fast17-ganesan.pdf

So it's very doubtful that Facebook has decided that it's okay that btrfs is
crashy because they're running it in distributed systems only.
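
On the quorum point, a back-of-the-envelope sketch (mine; illustrative numbers
only) of how fast quorum loss grows with per-node failure probability:

    /* P(quorum lost) for a 2f+1-node cluster: more than f of n nodes
       down, each independently down with probability p. Build with -lm. */
    #include <math.h>
    #include <stdio.h>

    static double choose(int n, int k)
    {
        double r = 1.0;
        for (int i = 1; i <= k; i++)
            r *= (double)(n - k + i) / i;
        return r;
    }

    int main(void)
    {
        const int n = 5, f = 2;  /* 5 nodes tolerate 2 failures */
        const double ps[] = { 0.01, 0.05, 0.10, 0.20 };

        for (int i = 0; i < 4; i++) {
            double loss = 0.0;
            for (int k = f + 1; k <= n; k++)  /* > f nodes down */
                loss += choose(n, k) * pow(ps[i], k)
                                     * pow(1 - ps[i], n - k);
            printf("p=%.2f -> P(quorum lost)=%.6f\n", ps[i], loss);
        }
        return 0;
    }

Even modest per-node failure rates compound quickly once more than f nodes can
be down at the same time, which is why an abnormally crashy filesystem needs
abnormally many nodes to compensate.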

~~~
jsiepkes
This article explains somewhat what Facebook does with BTRFS:
https://www.linux.com/news/learn/intro-to-linux/how-facebook-uses-linux-and-btrfs-interview-chris-mason

"Mason: The easiest way to describe the infrastructure at Facebook is that
it's pretty much all Linux. The places we're targeting for Btrfs are really
management tasks around distributing the operating system, distributing
updates quickly using the snapshotting features of Btrfs, using the
checksumming features of Btrfs and so on.

We also have a number of machines running Gluster, using both XFS and Btrfs.
The target there is primary data storage. One of the reasons why they like
Btrfs for the Gluster use case is because the data CRCs (cyclic redundancy
checks) and the metadata CRCs give us the ability to detect problems in the
hardware such as silent data corruption in the hardware. We have actually
found a few major hardware bugs with Btrfs so it’s been very beneficial to
Btrfs."

The sentence: "We also have a number of machines running Gluster, using both
XFS and Btrfs." seems to imply Facebook is not using it heavily for actual
data storage. What I distill from this (which is obviously my personal
interpretation) is that Facebook mostly uses it for the OS and not for actual
precious data.

~~~
geofft
I'm reading that as quite the opposite: they're saying that Gluster, a
networked file storage system, is being backed with btrfs as the local
filesystem, so all data stored in Gluster is ultimately stored on btrfs
volumes. (They're also using it for OS snapshotting, yes, but insofar as the
data stored in Gluster is important, they're storing important data on btrfs.)

See also https://code.facebook.com/posts/938078729581886/improving-the-linux-kernel-with-upstream-contributions/

 _" We have been working toward deploying Btrfs slowly throughout the fleet,
and we have been using large gluster storage clusters to help stabilize Btrfs.
The gluster workloads are extremely demanding, and this half we gained a lot
more confidence running Btrfs in production. More than 50 changes went into
the stabilization effort, and Btrfs was able to protect production data from
hardware bugs other filesystems would have missed."_

------
Veratyr
> Linux has also been developing its own ZFS-like filesystem, btrfs. Since
> it's been developed in the open (unlike early ZFS), people tried earlier
> ("IS EXPERIMENTAL") versions that had serious issues, which gave it
> something of a bad reputation. It's much better nowadays, and has been
> integrated in the Linux kernel tree (fs/btrfs), where it is maintained and
> improved along with the kernel code. Since ZFS is an add-on developed out-
> of-tree, it will always be harder to get the same level of attention.

https://btrfs.wiki.kernel.org/index.php/Status

So long as there exists code in BTRFS marked "Unstable" (RAID56), I refuse to
treat BTRFS as production ready. If it's not ready, fix it or remove it. I
consistently run into issues even when using BTRFS in the "mostly OK" RAID1
mode.

I don't buy the implication that "it will always be harder to get the same
level of attention" will lead to BTRFS being better maintained, either. ZFS
has most of the same features plus a few extra, and unlike BTRFS's, they're
actually stable and don't break.

I'm no ZFS fanboy (my hopes are pinned solidly on bcachefs), but BTRFS just
doesn't seem ready for any real use in my experience with it so far, and that
confuses me. Are BTRFS proponents living in a different reality to me, where
it doesn't constantly break?

EDIT: I realize on writing this that I might sound more critical of the actual
article than I really am. I think his points are mostly fair, but I feel this
particular line paints BTRFS as having a brighter, more production-ready
future than I believe is likely given my experiences with it. BTRFS proponents
also rarely point out the issues I have with it, so I worry they're not aware
of them.

~~~
brendangregg
We're using both btrfs and zfsonlinux right now, in production, and
fortunately we're not consistently running into issues (I'd be hearing about
it if we were!).

I should note that we do have a higher risk tolerance than many other
companies, due to the way the cloud is architected to be fault tolerant. Chaos
monkey can just kill instances anytime, and it's designed to handle that.

Anyway, getting into specific differences is something that we should blog
about at some point (the Titus team).

~~~
Veratyr
Do you use replication at all? I have a feeling the reason why nobody else
sees my problems is that most folks will be using a system like Ceph for
distributed replication rather than BTRFS for local replication.

------
unethical_ban
This is a great set of information comparing features and tools between the
two ecosystems. I like it, and wish more were available for Linux -> BSD, and
even lower-level command tool comparisons like apt-get vs. yum/dnf.

In fact, this works as a general-purpose intro to several important OS
concepts from an ops and kernel hacker perspective.

My only surprise is that this is written as a specific response to Oracle
Solaris' demise. From that specific perspective, how many target viewers are
there? 10? Illumos isn't losing contributors, and there are still several
active Illumos distros. Nevertheless, interesting.

~~~
brendangregg
Yes, I hope to write one for Solaris -> BSD.

This post was written for the illumos community as well.

~~~
dijit
Please do. I often advocate for BSD when it fits the need, but I get pushback
due to its lack of popularity. If all Solaris users migrate solely to Linux,
my position becomes weaker. :( And BSD is very good at quite a number of
things.

------
holydude
It is actually sad to see that you had to write this.

Funny how the most used/popular technology, plus mismanagement at a single
company, can crush other competing tech.

It is frightening how much of what was invested in Solaris is now lost because
of it.

------
cmurf
SmartOS might be easier for those familiar with Solaris who are looking to
deploy Linux containers rather than go fully Linux.

~~~
yellowapple
Does SmartOS actually support Linux containers (aside from the obvious
approach of running containers in a Linux VM)? Last I checked, SmartOS just
used the word "container" to refer to Solaris Zones.

~~~
notnarb
SmartOS has code specifically in place for using 'lx'-branded zones as
wrappers for Docker containers (which is distinct from its support of KVM).

https://github.com/joyent/smartos-live/blob/master/src/dockerinit/README.md

https://www.cruwe.de/2016/01/27/using-docker-on-smartos-hypervisors.html

It's been a while since I've looked into it, but if memory serves, they are
using Docker filesystem snapshots without modification and running them on a
thin layer that translates Linux system calls to Solaris system calls. It's
hard to find anything backing this up, so I could be way off the mark as to
how it's implemented.

EDIT: I forgot that's what 'lx' zones are: zones which allow the execution of
Linux binaries.

------
dwheeler
If you can't access it directly, here's a cached version:
https://web.archive.org/web/20170905181357/http://www.brendangregg.com/blog/2017-09-05/solaris-to-linux-2017.html

------
SrslyJosh
> If you absolutely can't stand systemd or SMF, there is BSD, which doesn't
> use them. You should probably talk to someone who knows systemd very well
> first, because they can explain in detail why you should like it.

I can't imagine trying to sell _anything_ with the phrase "why you should like
it". SMF certainly doesn't need that kind of condescending pitch--it just
fucking works and doesn't get in your way.

~~~
scott_karana
It helps that SMF doesn't include reimplementations of DNS resolvers and NTP
clients, both of which have caused security flaws in systemd. :/

~~~
bonzini
You certainly can use just systemd's pid1, together with ntpd (systemd only
does SNTP, not full-blown NTP) and no caching resolver.

In fact it's the default in most Linux distributions. The only mandatory
pieces if you use systemd as pid1 are udevd and journald.

Misinformation like this is exactly why Brendan said you should ask an actual
user of systemd.

~~~
scott_karana
I actually _like_ systemd as an init system, and use it that way on all my
current machines. (I'm a fan of journald so far too.)

However, the mere fact that (s)NTP and DNS are re-implemented in the same
codebase is still unsettling to me.

Nor is it misinformation to mention their existence, or the bugs/limitations
caused by the duplication of effort.

http://people.canonical.com/~ubuntu-security/cve/2017/CVE-2017-9445.html

https://github.com/coreos/bugs/issues/391

(Looks like the sNTP client hasn't caused security flaws, though. I was wrong
there.)

~~~
bonzini
They are independent; you can use one without the other. They just reside in
the same repository and build system.

------
cbhl
> _" Xen is a type 1 hypervisor that runs on bare metal, and KVM is type 2
> that runs as processes in a host OS."_

Is it not the other way around, that KVM runs on bare metal (and needs
processor support) while Xen runs as processes (and needs special kernel
binaries)?

~~~
detaro
No, it's not.

It's true that KVM needs processor support: it kind of adds a special process
type that the kernel runs in a virtualized environment, through the hardware
virtualization features. The Linux kernel of the host schedules the execution
of the VMs.

Xen has a small hypervisor running on bare metal. It can run both unmodified
guests using hardware support and modified guests where hardware access is
replaced with direct calls into the hypervisor (paravirtualization). The small
hypervisor schedules the execution of the VMs. For access to devices, it
cooperates with a special virtual machine (_dom0_), which has full access to
the hardware, runs the drivers, and multiplexes access for the other VMs; the
hypervisor is really primarily scheduling and passing data between domains,
very micro-kernel-like. Dom0 needs kernel features to fulfill that role.
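
To see the "special process type" bit concretely, here's a minimal sketch
(mine) of the KVM API: a VM is created by an ordinary host process issuing
ioctls on /dev/kvm, and the host kernel then schedules it like any other
process. A real VM would still need guest memory, register setup and a run
loop:

    /* Minimal taste of the KVM API; VMs and vCPUs are just fds. */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0) {
            perror("open /dev/kvm");
            return 1;
        }
        printf("KVM API version: %d\n",
               ioctl(kvm, KVM_GET_API_VERSION, 0));

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);     /* one fd per VM */
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);  /* one fd per vCPU */
        printf("vm fd=%d, vcpu fd=%d\n", vm, vcpu);

        close(vcpu);
        close(vm);
        close(kvm);
        return 0;
    }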

~~~
cbhl
Today I learned something. Thanks!

------
severino
Solaris support ends in November 2034. Yeah, 17 years from now. No need to
hurry ;-)

~~~
pjmlp
Given that they fired everyone, who do you think is going to give that support
and fix bugs?

~~~
mrpippy
They didn't quite fire everyone: Alan Coopersmith
(https://twitter.com/alanc/status/904366563976896512) is still present. I'm
sure lots of support and bug fixes will be neglected and probably moved
offshore, though.

Also, it certainly seems like Oracle Solaris 11.3 (released in fall 2015) will
be the last publicly available version. Between-release updates (SRUs) have
always been for paying customers only, but now it seems like there will never
be another release.

------
sandGorgon
I think two things need to be mentioned:

1. ZFS is officially supported by Canonical on Ubuntu as part of their support
plans.

2. Docker over raw containers or zones.

------
Ologn
> Crash Dump Analysis...In an environment like ours (patched LTS kernels
> running in VMs), panics are rare.

As the order of magnitude of systems administered increases, rare changes to
occasional, and occasional changes to frequent. Especially when it is not
running in a VM.

Also, from time to time you just get a really bad version of a distro kernel,
or some off piece of hardware that is ubiquitous in your setup, and these
crashes become more frequent and serious.

(Recent example of a distro kernel bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838 . I foolishly
upgraded to Ubuntu 17.04 on its release instead of letting it get banged
around for a few weeks. For the next five weeks it crashed my desktop about
once a day, until a fix was rolled out in Ubuntu proposed.)

Most companies I've worked at want to have some official support channels, so
usually we'd be running RHEL, and if I was seeing the same crash more than
once I'd probably send the crash to Red Hat; and if the crash pointed to the
system, then to the server maker (HP, Dell...) or hardware driver maker
(QLogic, Avago/Broadcom).

Solaris crash dumps worked really well, though; they worked smoothly for years
before kdump was merged into the Linux kernel. It is one of those cases where
you benefited from the hardware and software both being made by the same
company.

~~~
rodgerd
> As the order of magnitude of systems administered increases, rare changes to
> occasional, and occasional changes to frequent. Especially when it is not
> running in a VM.

I think the perf engineer for Netflix is quite aware of this.

~~~
jsiepkes
While I really respect Brendan's opinion (I've got most of his books and he is
one of my IT heroes), I do think he is very Netflix-IT-scale minded. When
you're Netflix, you can maintain your own kernel with ZFS, DTrace, etc., and
have a good QA setup for your own kernel/userland; basically, maintain your
own distro. However, when you're in a more "enterprisy" environment, you don't
have the luxury of making Ubuntu with ZoL stable yourself. I know from
first-hand experience that ZoL is definitely not as stable as FreeBSD ZFS or
Solaris ZFS.

~~~
Annatar
And there it is: the elephant in the room no one mentioned. People in 99% of
IT shops get an existential crisis if you mention during the interview that
you want to do kernel engineering. Thank you!

------
agentile
OmniOS anyone? https://omnios.omniti.com/

~~~
ssvss
Article from 4 months back:

> OmniTI will be suspending active development of OmniOS

https://news.ycombinator.com/item?id=14179006

~~~
ptrott2017
Since OmniTI pulled back from the project, there has been a reboot by the
community. Regular releases started in early July, with regular updates
following, and there is support for USB, ISO and PXE versions. The most recent
release was August 28, 2017. See http://www.omniosce.org/ for more info.

