Sorry for making it about me, you all should read his post, it's someone who is completely in the groove, has breadth and depth, knows about systems. These people are rare, but hugely valuable when you need to scale stuff up.
Which is refreshing, the constant whining about Linux vs $SOMEONES_FAVORITE_OS gets old. Very pleasant to have a "just the facts" pile of info. And I learned a few tools that I didn't know about.
However, I feel sad seeing the best tech from Sun going down the drain despite being open source.
As an example of that, because it's fair to go "what??!", I installed some open source Solaris in the last few years to play with it. The default install was miserable, the tools were crap. I was having lunch with Bryan Cantrill and some of his team and I mentioned my negative reaction. They laughed and said you have to install the GNU tools and put that in your path first.
Say what? SunOS 4.x came with good tools. In /usr/bin. The default install was useful. Solaris was "standard" in that it came with all the System V stuff in /usr/bin. And Shannon and crew "protected" those standard tools and refused to evolve them to make them useful. They started sucky and stayed sucky all in the name of being "standard". Except nobody else used those tools. There was no other System V based Unix that had any significant volume. *BSD certainly didn't move to System V, Linux wasn't System V, the only System V Unix with any volume was Solaris. So they were standard for no good reason and all these years later Solaris still has crappy stuff in /usr/bin, you want /opt/GNU/bin or something.
Sorry for the rant, and it's off topic. Well maybe it's off topic, maybe not. I sort of wonder if Sun had shipped the GNU stuff and installed it by default as the real tools, would it have made any difference? Probably not but boy, do the default tools make a bad first impression.
No, I always find it interesting to hear stories about people who worked for companies like Sun in the past.
Since I retired I've been doing some for-hire tractor/excavator work (I live above you in the Santa Cruz mountains, have 3 Kubotas) and a lot of wrenching with my mechanic. Who is a decent mechanic, but because he's in so much pain he frequently self-medicates, which makes him not a member of the sharp people team.
So I'm missing the conversations that engineers have. Normal people are fine and all but not as fun as poking at a hard problem with someone smarter than me.
I kinda think the job that might be good for me is helping out a VC firm that funds sort of system stuff, like cloud stuff, I/O stuff, etc. I'm a dinosaur, I like C, I like kernels, I like thinking about I/O and how to scale it. Systems stuff is where I like to be and there don't seem to be too many places that want that anymore. Or I'm just not aware of them.
Brendan if you ever want to go to lunch and yap about lmbench or tell me what you are working on, hey, lunch is on me. We can go to crappy Chinese (I used to have offices in the water tower plaza and that's what we called the Chinese place that's just up the street from you. It's the best Chinese we could find but it's nowhere near as good as San Francisco Chinese food, hence the name) or wherever is good, I'd love to find a new good place for food down there.
Happiness from work comes from the work you are doing and whom you are working with.
If offered the chance to work with someone like Brendan I would jump at the opportunity.
I'd want to have some idea that it would be a good fit, so if you are really interested please contact me via email. Look at my profile, I think I stuck my email there, if not you can find it easily enough.
Yeah, that's mine and I think I was the first to do it that way though I saw the idea described in Hennessy & Patterson and they didn't give me credit so they must have dreamed it up too. It's clever but pretty obvious once you think about it.
Very cool that you are still using it. lmbench has aged pretty well, mhz.c still works. BTW, lat_mem_rd has been affected by processor changes all over the place: they all have code to detect sequential strides and prefetch. I think Carl and I switched it to go backwards to try and defeat their prefetch. Dunno if they prefetch that way as well. The prefetch is sort of cool and sort of annoying because it hides how the hardware would perform if the access was random. Without the prefetch you can determine cache line size, L1, L2, and L3 size and latency, TLB size, and main memory latency. With the prefetch you think you are getting all that stuff but it looks different. If you have ideas on how to defeat the prefetch and still get the info I'd love to hear them.
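One standard trick is a fully dependent pointer chase through a randomly permuted buffer, so there's no stride for the prefetcher to latch onto and every load waits on the previous one. Rough sketch below; this is not lat_mem_rd's actual code, just the idea, with the buffer size and loop count picked arbitrarily:

```c
/* Hypothetical sketch (not lmbench code): measure memory latency with a
 * randomly permuted pointer chain so stride prefetchers have nothing to
 * lock onto. Every load depends on the previous one, so you time latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    size_t bytes = argc > 1 ? strtoul(argv[1], 0, 0) : (64 << 20);
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));
    size_t i, j, tmp;

    for (i = 0; i < n; i++) idx[i] = i;
    srandom(1);                          /* fixed seed: repeatable runs */
    for (i = n - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        j = random() % (i + 1);
        tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }
    for (i = 0; i < n; i++)              /* link the cells into one big cycle */
        buf[idx[i]] = &buf[idx[(i + 1) % n]];

    struct timespec t0, t1;
    void **p = buf;
    size_t loads = n;                    /* one full pass over the buffer */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < loads; i++)
        p = (void **)*p;                 /* serialized, dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%zu bytes: %.1f ns/load (%p)\n", bytes, ns / loads, (void *)p);
    return 0;
}
```

Run it with increasing buffer sizes and the ns/load steps up as you fall out of each cache level; whether it fully defeats a given chip's prefetcher is something you'd have to verify per CPU.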
Another funny thing about lmbench: I was recently approached by a company that's got it working on phones. They want to feed back their stuff and have me do another release that makes it easy to run it on phones. I might do that, still waiting on their dump of code. Kind of a neat idea.
I still tinker with lmbench, so if you have something you want measured and it fits with the idea of measuring bandwidth and latency of everything (and fits with what Linus and I wanted out of lmbench: if you tune for those metrics you are making the hardware/OS better, so no silly show-off-your-whatever stuff, it has to be generically useful), let me know what it is and I'll see if I can code up a benchmark. Making new measurements using the lmbench framework is pretty easy for me, I know that code.
BTW, I have an internal version of lmbench that adds stdio support to lmdd (if you haven't played with lmdd, man you've missed out. It can simulate many I/O benchmarks and give you results really quickly).
So why would I want stdio support? Because BitKeeper has an enhanced stdio library that can stack filters on a FILE*. And it has some interesting filters, like gzip, lz4. And some much more interesting filters that do CRCs on blocks and an XOR block at the end. And another one that is so complicated I would want to describe it in person.
You can combine lz4 with the CRC stuff. Once I had stdio support, I linked with BK's stdio instead and added options to push all those filters, and I could see what the overhead was. I did all this when Wayne Scott was putting that crud in BK so I could make sure we were not slowing BK down (we weren't, I think all that crud runs around 1 GB/sec which was fast enough for me). Oh, and I also added support to do I/O backwards because we did that in BK as well, so we could have an append-only file format for the ChangeSet file. Lots of fun stuff in there, I need to package it up and release it. If that I/O stuff is of any interest to you, it's open source under the Apache v2 license, and if the license bugs you tell me which one you want and I'll rerelease it under that one. We picked that license because we thought it was the one that was easiest for everyone to use; if we're wrong we'll change it.
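If you've never seen stacked stdio filters, the shape of the thing is roughly like this. This is a toy sketch using glibc's fopencookie(3), not BK's stdio (theirs has its own stacking API and filters); it just shows how a CRC-accumulating filter can sit on top of an existing FILE*:

```c
/* Hypothetical sketch, not BitKeeper code: layer a CRC-accumulating write
 * filter on top of an existing FILE* using glibc's fopencookie(3).
 * Build with -lz. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>          /* crc32() */

struct crcfilter {
    FILE *under;           /* the stream we forward writes to */
    unsigned long crc;     /* running CRC of everything written */
};

static ssize_t crc_write(void *cookie, const char *buf, size_t size)
{
    struct crcfilter *f = cookie;
    f->crc = crc32(f->crc, (const unsigned char *)buf, size);
    return fwrite(buf, 1, size, f->under);   /* pass data down the stack */
}

static int crc_close(void *cookie)
{
    struct crcfilter *f = cookie;
    fprintf(stderr, "crc32 = %08lx\n", f->crc);
    free(f);
    return 0;
}

FILE *crc_push(FILE *under)
{
    struct crcfilter *f = calloc(1, sizeof(*f));
    f->under = under;
    f->crc = crc32(0L, NULL, 0);             /* initial CRC value */
    cookie_io_functions_t io = { .write = crc_write, .close = crc_close };
    return fopencookie(f, "w", io);
}

int main(void)
{
    FILE *out = crc_push(stdout);            /* stack the filter on stdout */
    fprintf(out, "hello, filters\n");
    fclose(out);
    return 0;
}
```

A gzip or lz4 filter is the same idea with a compressor in crc_write, and filters compose by pushing one on top of another.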
Nice writeup on the migration stuff, sounds like you are having fun. Getting paid to have fun is sweet, that's what it was like for me at Sun in the SunOS 4.x days. I got there and spent the first 3 years telling people "man, I love this job so much. I'd work here for free if I had the money." Someone took pity on me and said "Do you never want a raise? Because that's how you never get a raise" :) I shut up and started getting raises.
Enjoy that job. Not every job is that fun, you are at a special part of your career, enjoy the heck out of it.
Pretty sure you just mean a random sun machine, there weren't a lot of sunboxen sold.
Unlike FreeBSD jails and Solaris Zones, you can't safely run multiple Docker tenants on the same hardware. Docker is basically the equivalent of a sign that says "don't walk on the grass", as opposed to the actual wall that FreeBSD jails and Solaris Zones give you. Now if you have a very homogeneous environment (say you are deploying hundreds of instances of the exact same app) then this is probably fine; Docker is primarily a deployment tool. If you're an organization that runs all kinds of applications (with varying levels of security quality), that's an entirely different story.
Red Hat has been especially good here, not allowing anyone but host root to connect to Docker and using SELinux and seccomp filtering. With those working, it doesn't matter if your container mounts a host filesystem, since it won't have the correct SELinux roles and types anyway.
Many people claim that ruins Docker, since now you can't use Docker from within Docker. But that's the price you pay for security.
I believe that with the correct precautions, a Linux container is just as safe as a jail or zone. Perhaps the problem is just how easy it is for a sysadmin to put holes into the containers that ruin the security.
I think there's a bit more to it than that. For some examples of other reasons people might be wary vs zones:
docker itself is still a daemon that runs as root, combining a large number of different functionalities that require root access into a single binary with a large attack surface and a lot of code that doesn't need to be privileged. While separation of responsibilities within docker has begun, even their own security page admits that there's a long way to go here.
Zones are, as many of the articles in this thread point out, a first-class feature, designed and implemented as such. What docker/"containers" allow you to do is the culmination of many building blocks that have been incrementally added to the Linux kernel. Some of those have been added pretty recently, and without an overall design their interactions with other portions of the Linux kernel or other components of the system have often been surprising and have led to a number of security issues over time. In comparison, both the code and the design of the system are relatively young. A good example of this can be found at , which ends with the following very apt quote:
> Why is it that several security vulnerabilities have sprung from the user namespaces implementation? The fundamental problem seems to be that user namespaces and their interactions with other parts of the kernel are rather complex—probably too complex for the few kernel developers with a close interest to consider all of the possible security implications. In addition, by making new functionality available to unprivileged users, user namespaces expand the attack surface of the kernel. Thus, it seems that as user namespaces come to be more widely deployed, other security bugs such as these are likely to be found.
It might also be interesting to read , which is already showing that 3.5 years later, user namespaces are still a breeding ground for security issues that lead to privilege escalation.
It seems to me to be a grab bag of things that Linux allows to be independently namespaced/isolated: cgroups, networking, PIDs, VFS, etc. From a kernel point of view, this would be the perfect use case for an "object-oriented" design with some kind of abstract container concept that reflected the nesting of each container, but instead it seems very scattered and ad hoc.
In particular, each mechanism is opt-in and must be configured separately, very carefully; to approximate Zones you have to combine all of the mechanisms together and hope you didn't forget something, and also hope that the kernel's separation is perfect (which, given the vast number of "objects" it can address, is doubtful). To my untrained (in terms of kernel development) eye, this seems the opposite of future-proof, because if the kernel invents some new namespacing feature, an application that uses all of the existing mechanisms won't automatically receive it, because there's no concept of a "container" as such.
Opt-in seems like the wrong approach. The safer alternative would be for a new container process to be completely isolated by default, and for whoever forked the process to explicitly specify the child's access (e.g. allow sharing the file system). This is, I believe, how BSD jails work.
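For the curious, here's roughly what that opt-in looks like at the system call level. This is a minimal sketch, not Docker's code; the clone(2) flags are the real kernel interface, but everything else is illustrative:

```c
/* Sketch of the "opt-in" point above: every kind of isolation is a separate
 * clone(2) flag you must remember to ask for. Forget one and the child
 * simply shares that resource with the host. (Needs root, or a user
 * namespace, to create most of these.) */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int child(void *arg)
{
    /* Inside the new namespaces: PID 1 in its own PID namespace, its own
     * mount table, hostname, and network stack -- but only for the
     * namespaces that were explicitly listed below. */
    printf("child sees itself as pid %d\n", (int)getpid());
    return 0;
}

int main(void)
{
    int flags = CLONE_NEWNS      /* mounts   */
              | CLONE_NEWPID     /* pids     */
              | CLONE_NEWUTS     /* hostname */
              | CLONE_NEWNET     /* network  */
              | CLONE_NEWIPC;    /* sysv ipc */
              /* ...and CLONE_NEWUSER, CLONE_NEWCGROUP, cgroup limits,
               * seccomp, and capability drops are all separate steps again. */

    pid_t pid = clone(child, child_stack + sizeof(child_stack),
                      flags | SIGCHLD, NULL);
    if (pid < 0) { perror("clone"); exit(1); }
    waitpid(pid, NULL, 0);
    return 0;
}
```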
Docker-in-docker is a trashfire that barely works anyway, it's no real loss.
Some discussion about this article here on HN:
https://news.ycombinator.com/item?id=13982620 (160 days ago, 235 comments)
A “container” is just a term people use to describe a combination of Linux namespaces and cgroups. Linux namespaces and cgroups ARE first class objects. NOT containers.
VMs, Jails, and Zones are as if you bought the Legos already put together AND glued. So it's basically the Death Star: you don't have to do any work, you get it pre-assembled out of the box. You can't even take it apart.
Containers come with just the pieces so while the box says to build the Death Star, you are not tied to that. You can build two boats connected by a flipping ocean and no one is going to stop you.
Docker is a bunch of boxes floating on a flipping ocean. They could have made a Death Star, but they chose not to.
1) There are known ways to perform Docker escapes on any, or at least some common, Docker setups. You could write a Docker escape binary or script today and it would not be a zero day. That's just the way Docker is.
2) You simply have less faith in the ways Docker performs isolation. One could write a Docker escape exploit and it would be a zero day, but you expect there to be more of such zero days in Docker than in Jails/Zones.
If 1), I'd be really interested in seeing it, and if 2), I'd like to know more about what additional levels of isolation jails and zones (and LXC?) provide.
I think this is dramatically overstating the risks. It is possible to run containers securely, it is just much more difficult to secure containers on Linux than on BSD or Solaris. It is quite difficult to break out of a properly configured container (using user namespaces, seccomp, and SELinux/AppArmor), and I know of no cases where it has been done successfully.
I still separate tenants onto VMs because I don't want to be the first example of a breakout, but I don't think people who isolate with containers are crazy, just a little less risk-averse.
The more difficult it is for sysadmins to harden their Docker setup, the less credence can be given to the claim that Docker is designed for security isolation.
In the real world, security compromises happen far more often because of misconfigured setups, not because of zero-day exploits. This is why it's important for software in general to be shipped with secure defaults.
I can't remember the source, so I'll paraphrase, but a rather wise guy once said: "I don't care how hard it is to break a hardened setup - what I'm concerned about is how easy it is to harden."
Out of the box, Docker is designed to solve deployment problems, not security problems. And that's a crucial distinction.
I agree, but it's a strange way to frame it - the technology is lxc, and lxc (along with some enabling technologies and tools) does have, and has a history of having, a focus on being a real boundary.
docker has never claimed (does it now?), or implemented, or been about security boundaries.
Stretching the metaphor, docker is using tar to package dependencies in a snapshot, and using plain chroot to run "containers" - even when jails are available.
In terms of marketing, "Linux containers" might mean "docker" - but in technology (as contrasted with zones, jails) that's not quite right.
So yes, "docker is not about security", lxc&friends maybe not quite equivalent to modern jails - but new ways of running "docker containers", like in actual vms - certainly can blend some convenience/popularity and security.
I'm sure someone will chime in with an updated family tree of Linux chroots, capability frameworks and process isolation features.
[ed: I believe one source of confusion is that docker started with the (userspace part of) lxc as its only driver, and now docker-the-binary makes system calls directly and avoids lxc-the-userspace-toolset - but they employ a mish-mash of kernel features, many of which came from the lxc project (on the kernel side)?]
So, yes, you can completely sandbox a "container", just be prepared to put in some work (SELinux, AppArmor, user namespaces, and reviewing the default capabilities quite carefully).
Or maybe you can break out of the Google App Engine linux container and let me know how it looks (linux containers, quite well-worn at this point)?
Or perhaps you can check for me whether AWS Lambda actually colocates tenants or not (unknown linux containers)?
Or you can launch a heroku dyno and break into another dyno and steal some keys (lxc linux containers)?
In reality, many services do colocate multiple different users together in linux containers. If you use seccomp and ensure the user is unprivileged in the container, it's fairly safe.
Heroku has been doing it for years upon years now.
The other services I named above likely do.
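To make the seccomp point above concrete, here's a hypothetical sketch using libseccomp. The calls (seccomp_init, seccomp_rule_add, seccomp_load) are the real library API, but the tiny syscall list is purely illustrative; it only demonstrates the mechanism and is nothing like a real container profile, which is far more extensive:

```c
/* Hypothetical illustration using libseccomp (not any real container
 * runtime's profile): leave most syscalls alone but refuse a few that
 * containerized code has no business making. Build with -lseccomp. */
#include <errno.h>
#include <seccomp.h>
#include <stdio.h>

int main(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);   /* default: allow */
    if (!ctx) return 1;

    /* Deny a few classic escape/escalation helpers with EPERM. */
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(kexec_load), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(open_by_handle_at), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(init_module), 0);

    if (seccomp_load(ctx) < 0) { perror("seccomp_load"); return 1; }
    seccomp_release(ctx);

    puts("filter installed; kexec_load() etc. now fail with EPERM");
    return 0;
}
```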
Linux containers absolutely are intended as security boundaries. Kernel bugs which allow escaping a properly setup mount namespace or peeking out of a pid namespace or going from root in a userns to root on the host are all treated as vulnerabilities and patched.
That clearly expresses the intent.
Yes, I agree that in reality they're likely not yet as mature / secure as jails or zones, but I think it's disingenuous to say that Linux containers aren't meant to be security boundaries.
This is because these vulns can be exploited locally without containers.
With user namespacing in use (available in Docker since 1.12) I'm not currently aware of any trivial container-->host breakouts or container --> container breakouts.
There are information leaks from /proc but they don't generally allow for breakout, and in general Docker's defaults aren't too bad from a security standpoint.
The only exception for the general case is, I'd say, the decision to allow CAP_NET_RAW by default, which is a bit risky.
All code has bugs, and some of those are security bugs. There's a big difference between "if you haven't patched or I have a 0-day I can compromise you" and "no matter how well patched you are, this isn't a security boundary so it can be bypassed".
My reading of the top comment was it was suggesting the latter with regard to Linux containers, and I'm not sure that's true.
It's not that Docker is a gaping security hole, it's just not something I trust as much as the Linux Kernel or Xen. I probably trust it about as much as I trust a well-updated web browser. It's suitable for everyday use, but I don't click the link on the phishing or spam email just to see what happens.
The point I didn't agree with was the top comment, which basically, to me, seemed to be saying "Docker is not a security boundary", because that's not (in my experience) true.
There are a load of companies running multi-tenant systems using Linux containers, so if they're not a security boundary, a lot of people are going to be having a bad time :)
I wonder if adding a proper wifi stack and commodity hardware support would have helped. Maybe it's just wishful thinking, but I thought it would have been nice for cheap routers and home NAS boxes.
The fact that there is so little documentation also probably didn't help it
Though I admit this would not be core docker stuff; more like selinux and other controls.
I doubt you would choose between the two technologies based on security.
* Not everyone needs containers or is an expert on them, just as not everyone is knowledgeable about the TCP stack, dynamic routing, assembly optimization, or name your topic.
* It's a true and well-stated comment in itself and deserves to be recognized, even if many already know it.
I am eager to read this piece!
Even though I am afraid to see it confirm that btrfs still struggles to catch up…
The 2016 bcachefs benchmarks are a mixed bag.
Facebook using it doesn't mean anything since they are probably using it for distributed applications. Meaning the entire box (including BTRFS) can just die and the cluster won't be impacted. I really can't imagine they are using BTRFS on every node in their cluster.
Just because Facebook is fault tolerant doesn't mean we don't care about failures. We actively run down any issues we hit, so while it doesn't have much of impact in the short term, we don't just ignore issues.
And also Red Hat hasn't contributed much to btrfs in years, and Oracle has one developer working on it. We are constantly working on it, and now that my priorities have shifted back to btrfs I hope that we will start to close out some of these long term projects.
Just because some code is added to the main kernel, assuming it is, doesn't mean it will be propagated to the main LTS distributions anytime soon.
I don't think that follows for lots of reasons:
- If enough of your boxes die that you lose quorum (whether from filesystem instability or from unrelated causes like hardware glitches), your cluster is impacted. So, at the least, if you expect your boxes to die at an abnormally high rate, you have to have an abnormally high number of them to maintain service.
- Filesystem instability is (I think) much less random than hardware glitches. If a workload causes your filesystem to crash on one machine, recovering and retrying it on the next machine will probably also make it crash. So you may not even be able to save your service by throwing more nodes at the problem. A bad filesystem will probably actually break your service.
- Crashes cause a performance impact, because you have to replay the request and you have fewer machines in the cluster until your crashed node reboots. It would take an extraordinarily fast filesystem to be a net performance win if it's even somewhat crashy.
- Most importantly, distributed systems generally only help you if you get clean crashes, as in power failure, network disconnects, etc. If you have silent data corruption, or some amount of data corruption leading up to a crash later, or a filesystem that can't fsck properly, your average distributed system is going to deal very poorly. See Ganesan et al., "Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions", https://www.usenix.org/system/files/conference/fast17/fast17...
So it's very doubtful that Facebook has decided that it's okay that btrfs is crashy because they're running it in distributed systems only.
"Mason: The easiest way to describe the infrastructure at Facebook is that it's pretty much all Linux. The places we're targeting for Btrfs are really management tasks around distributing the operating system, distributing updates quickly using the snapshotting features of Btrfs, using the checksumming features of Btrfs and so on.
We also have a number of machines running Gluster, using both XFS and Btrfs. The target there is primary data storage. One of the reasons why they like Btrfs for the Gluster use case is because the data CRCs (cyclic redundancy checks) and the metadata CRCs give us the ability to detect problems in the hardware such as silent data corruption in the hardware. We have actually found a few major hardware bugs with Btrfs so it’s been very beneficial to Btrfs."
The sentence: "We also have a number of machines running Gluster, using both XFS and Btrfs." seems to imply Facebook is not using it heavily for actual data storage. What I distill from this (which is obviously my personal interpretation) is that Facebook mostly uses it for the OS and not for actual precious data.
See also https://code.facebook.com/posts/938078729581886/improving-th...
"We have been working toward deploying Btrfs slowly throughout the fleet, and we have been using large gluster storage clusters to help stabilize Btrfs. The gluster workloads are extremely demanding, and this half we gained a lot more confidence running Btrfs in production. More than 50 changes went into the stabilization effort, and Btrfs was able to protect production data from hardware bugs other filesystems would have missed."
But I think the inference to make is that Facebook trusts btrfs to increase reliability, not that Facebook trusts their distributed systems to cover for btrfs decreasing reliability to gain performance (or features).
So long as there exists code in BTRFS marked "Unstable" (RAID56), I refuse to treat BTRFS as production ready. If it's not ready, fix it or remove it. I consistently run into issues even when using BTRFS in the "mostly OK" RAID1 mode.
I don't buy the implication that "it will always be harder to get the same level of attention" will lead to BTRFS being better maintained either. ZFS has most of the same features plus a few extra and unlike BTRFS, they're actually stable and don't break.
I'm no ZFS fanboy (my hopes are pinned solidly on bcachefs) but BTRFS just doesn't seem ready for any real use from my experience with it so far and it confuses me. Are BTRFS proponents living in a different reality to me where it doesn't constantly break?
EDIT: I realize on writing this that I might sound more critical of the actual article than I really am. I think his points are mostly fair, but I feel this particular line paints BTRFS as having a brighter, more production-ready future than I believe is likely given my experiences with it. BTRFS proponents also rarely point out the issues I have with it, so I worry they're not aware of them.
I should note that we do have a higher risk tolerance than many other companies, due to the way the cloud is architected to be fault tolerant. Chaos monkey can just kill instances anytime, and it's designed to handle that.
Anyway, getting into specific differences is something that we should blog about at some point (the Titus team).
Bcachefs is developed by a single person (however brilliant he may be) on a part time basis (I see only 4 commits since May this year).
Unless I'm missing something (please correct me) bcachefs will not be replacing BTRFS, or any other filesystem, anytime soon.
I'm not sure how true this is of filesystems, but a lot of the best software I know is (or at least, was) developed by a single person. For example, Redis, Sqlite, Lmdb, most malloc implementations, varnishcache, h2o, (nginx?), chipmunk-2d, most of dragonflybsd (including HAMMER).
Many projects get more contributors after the code is working well enough to see popular production use. But programmers working alone can accomplish some pretty impressive stuff.
The other products you mentioned BEGAN with a single person. They had more resources later.
The ZFS feature set is not a strict superset of what btrfs offers. The ability to online-restripe between almost any layout combination is quite useful for example. So is on-demand deduplication, which is also far less resource-intensive than ZFS dedup.
This is true but since the only stable replication options on BTRFS are RAID1 and single, this online restripe is of very limited usefulness.
I'd _really_ like BTRFS to fix its issues so I could use this reshaping (it's the main feature I'm missing from current filesystems) but it's been years and replication is still unstable.
In fact, this works as a general-purpose intro to several important OS concepts from an ops and kernel-hacker perspective.
My only surprise is that this is written as a specific response to Oracle Solaris' demise. From that specific perspective, how many target viewers are there? 10? Illumos isn't losing contributors, and there are still several active Illumos distros. Nevertheless, interesting.
This post was written for the illumos community as well.
Funny how the most used/popular technology plus mismanagement by a single company can crush other competing tech.
It is frightening how much of what was invested in Solaris is now lost because of it.
It's been a while since I've looked in to it, but if memory serves they are using docker filesystem snapshots without modification and running them on a thin translation layer of Linux system calls to Solaris system calls. Hard to find anything backing this up, so I could be way off the mark as to how it's implemented.
EDIT: forgot that's what 'lx' zones are: zones which allow the execution of Linux binaries
I can't imagine trying to sell anything with the phrase "why you should like it". SMF certainly doesn't need that kind of condescending pitch--it just fucking works and doesn't get in your way.
In fact it's the default in most Linux distributions. The only mandatory pieces if you use systemd as pid1 are udevd and journald.
Misinformation like this is exactly why Brendan said you should ask an actual user of systemd.
However, the mere fact that (s)NTP and DNS are re-implemented in the same codebase is still unsettling to me.
Nor is it misinformation to mention their existence, or the bugs/limitations caused by the duplication of effort.
(Looks like the sNTP client hasn't caused security flaws, though. I was wrong there.)
Is it not the other way around, that KVM runs on bare metal (and needs processor support) while Xen runs as processes (and needs special kernel binaries)?
It's true that KVM needs processor support: it kind of adds a special process type that the kernel runs in a virtualized environment, through the hardware virtualization features. The Linux kernel of the host schedules the execution of the VMs.
Xen has a small hypervisor running on bare metal. It can run both unmodified guests using hardware support or modified guests where hardware access is replaced with direct calls into the hypervisor (paravirtualization). The small hypervisor schedules the execution of the VMs. For access to devices it cooperates with a special virtual machine (dom0), which has full access to the hardware, runs the drivers and multiplexes access for the other VMs - the hypervisor is really primarily scheduling and passing data between domains, very micro-kernel like. Dom0 needs kernel features to fulfill that role.
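To make the KVM half of that concrete: to the host, a guest really is just a process holding a few file descriptors handed out by /dev/kvm. A hypothetical minimal sketch follows; the ioctls are the real KVM API, but no guest memory or code is set up, so this only demonstrates the object model:

```c
/* Sketch of the point above: to the host kernel, a KVM guest is just a
 * process holding some file descriptors. /dev/kvm hands out a VM fd and a
 * vCPU fd; ioctl(KVM_RUN) is what the host scheduler schedules like any
 * other thread. (No guest memory or code is loaded here, so KVM_RUN would
 * fail immediately if called.) */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) { perror("/dev/kvm"); return 1; }

    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    int vm = ioctl(kvm, KVM_CREATE_VM, 0);      /* a whole VM is just an fd */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);   /* so is each virtual CPU   */
    printf("vm fd = %d, vcpu fd = %d\n", vm, vcpu);

    /* A real monitor would now mmap guest memory, register it with
     * KVM_SET_USER_MEMORY_REGION, load guest code, and loop on
     * ioctl(vcpu, KVM_RUN, 0), handling the exit reasons. */
    close(vcpu); close(vm); close(kvm);
    return 0;
}
```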
Though, really, I'm just quoting Anthony Liguori from 6 years ago, so credit where it's due:
Also, it certainly seems like Oracle Solaris 11.3 (released in fall 2015) will be the last publicly available version. Between-release updates (SRUs) have always been for paying customers only, but now it seems like there will never be another release.
An example: the "old" Sun gave me a loaded T1000 system to run SUNHELP.ORG on.
The "New" Sun wouldn't even give me Solaris patches / security updates (which used to be free) without a support contract.
I had to eventually move the site to being hosted on a Debian box because I couldn't afford the hundreds of dollars they wanted every year for patch access.
It really chapped my hide. I'd even been part of the external OpenSolaris release team.
"New" Sun after the Oracle acquisition: "Unless you're a business paying us money for support, we don't care, and you get NOTHING."
So, people stopped using and playing with Solaris/SPARC at home, and eventually stopped using it at work too.
1. ZFS is officially supported by Canonical on Ubuntu as part of their support plans.
2. Docker over raw containers or zones.
As the order of magnitude of systems administered increases, rare becomes occasional and occasional becomes frequent. Especially when it is not running in a VM.
Also, from time to time you just get a really bad version of a distro kernel, or some odd piece of hardware that is ubiquitous in your setup, and these crashes become more frequent and serious.
(Recent example of a distro kernel bug - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838 . I foolishly upgraded to Ubuntu 17.04 on its release instead of letting it get banged around for a few weeks. For the next five weeks it crashed my desktop about once a day, until a fix was rolled out in Ubuntu proposed.)
Most companies I've worked at want to have some official support channels, so usually we'd be running RHEL, and if I was seeing the same crash more than once I'd probably send the crash to Red Hat, and if the crash pointed to the system, then the server maker (HP, Dell...) or hardware driver maker (QLogic, Avago/Broadcom).
Solaris crash dumps worked really well though - they worked smoothly for years before kdump was merged into the Linux kernel. It is one of those cases where you benefited from the hardware and software both being made by the same company.
Kernel developers still have to use crash dumps to root-cause an individual crash, but crash dumps are most useful for extremely hard-to-reproduce crashes that are rare (though if you are using the "Pet" model as opposed to the "Cattle" model, even a single failure of a critical DB instance can't be tolerated). For crashes that are easy to trigger, crash dumps are useful, but they are much less critical for figuring out what's going on. If your distributed architecture can tolerate rare crashes, then you might not even consider it worth the support contract cost to root-cause and fix every last kernel crash.
Yes, it's ugly. But if you are administrating a very large number of systems, this can be a very useful way of looking at the world.
I think the perf engineer for Netflix is quite aware of this.
> OmniTI will be suspending active development of OmniOS