
DTrace for Linux 2016 - okket
http://www.brendangregg.com/blog/2016-10-27/dtrace-for-linux-2016.html
======
jdesfossez
It would be worthwhile to clarify the term "tracing" to distinguish between
live aggregation and post-processing approaches. The general confusion around
the "tracing" terminology seems to imply a competition between these two,
while they should rather be seen as complementary.

DTrace, SystemTap and eBPF/BCC are designed to aggregate data in the critical
path and compute a summary of the activity. Ftrace and LTTng are designed to
extract traces of execution for high resolution post-processing with as small
overhead as possible.

Aggregation is very powerful and gives a quick overview of the current
activity of the system. Tracing extracts the detailed activity at various
levels and allows in-depth understanding of a particular behaviour after the
fact, by making it possible to run as many analyses as necessary on the
captured trace.

In terms of impact on the traced system, trace buffering scales better with
the number of cores than aggregation approaches, due to its ability to
partition the trace data into per-core buffers.

Both approaches have upsides and downsides and should not be seen as being in
competition: they address different use-cases and can even complement each
other.
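The two models can be sketched in a few lines of Python (a toy illustration; the events, names, and numbers are invented, not real tracer output):

```python
# Toy illustration of the two tracing models described above.
from collections import Counter, defaultdict

events = [  # (cpu, syscall, latency_us) -- invented sample data
    (0, "read", 120), (1, "write", 40), (0, "read", 95),
    (1, "read", 300), (0, "write", 60), (1, "read", 110),
]

# 1. In-flight aggregation (DTrace/SystemTap/BPF style): only the summary
#    survives; individual events are discarded as they happen.
summary = Counter(name for _cpu, name, _lat in events)
assert summary["read"] == 4

# 2. Trace buffering (ftrace/LTTng style): raw events land in per-CPU
#    buffers with minimal work in the hot path; any number of analyses
#    can be run over the capture afterwards.
buffers = defaultdict(list)
for cpu, name, lat in events:
    buffers[cpu].append((name, lat))          # cheap append, no shared lock

trace = [ev for cpu in sorted(buffers) for ev in buffers[cpu]]
slowest = max(trace, key=lambda ev: ev[1])    # analysis #1, after the fact
per_call = Counter(name for name, _ in trace) # analysis #2, same capture
```

The aggregation path does more work per event but ships only the summary; the buffered path does minimal work per event and defers all analysis to the capture.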

~~~
brendangregg
You're right that a key feature and differentiator of DTrace/stap/BPF is
kernel aggregations, but they can do per-event output as well. But I think I
know what you mean, especially as I was at the sysdig summit yesterday and
could see a major difference.

I think the two models for tracers, playing on their strengths, are: 1. real-
time analysis tracers (DTrace/stap/BPF), and 2. offline analysis tracers
(LTTng, sysdig). Both can do the other as well, but I'm just pointing out
strengths.

sysdig (and I believe LTTng) has done great work at creating capture files
that can then be analyzed offline in many many different ways, and they've
optimized the way full-event dumps can be captured and saved (which I know
LTTng has done as well). DTrace/stap/BPF don't have any offline capture file
capabilities -- they could do it, but it's not been their focus.

------
AceJohnny2
I've only recently tried out DTrace on OS X, and I'll admit to being kinda
floored at what it can do. To think I used to be satisfied with strace on
Linux!

Seeing the tracing capabilities of Linux expand is exciting indeed.

Edit: the couple of tutorials that finally unlocked DTrace (on OS X) for me
are:

[https://www.objc.io/issues/19-debugging/dtrace/](https://www.objc.io/issues/19-debugging/dtrace/)

[https://www.bignerdranch.com/blog/hooked-on-dtrace-part-1/](https://www.bignerdranch.com/blog/hooked-on-dtrace-part-1/)

~~~
tkinom
Agreed, DTrace on OS X is super powerful.

I once tried to debug an open source libusb app on Mac OS; with DTrace I could
trace the app, kernel USB API calls, libusb's internal threads in user space,
etc.

Much better visibility into system activity compared to simple strace.

Absolutely love the power of what it can do.

BTW, can a DTrace script be used to monitor a system for potential "Dirty COW"
type privilege escalation issues?

------
helper
The most challenging thing for us is running a new enough kernel to get these
features. While upgrading to a newer kernel isn't particularly hard, small
companies don't have a lot of engineering resources to run kernels that aren't
maintained by their distro of choice (usually on the LTS release).

The good thing is this is solved simply by waiting long enough. The bad thing
is most developers can't just pick this up today without a bunch of extra
effort.

If you are looking for something you can use with old kernels you should
definitely check out Brendan's perf-tools repo[1]. It takes advantage of older
kernel features and works with things as old as Ubuntu 12.04.

*Edit: Fixed Brendan's name

[1]: [https://github.com/brendangregg/perf-tools](https://github.com/brendangregg/perf-tools)

~~~
technofiend
This is the same problem shared by Red Hat customers. Although RH is great
about backporting features to older kernels, I'm not sure they'll be able to
move this from 4.9 back to 3.x. The price we pay for stability.

~~~
4ad
You mean the price paid for stable kernel ABIs, so proprietary drivers can use
them?

The Linux kernel is the safest operating system component to upgrade,
mostly because the Linux kernel people care deeply about compatibility and not
breaking user space. While it is still not free to upgrade, the cost is
minimal compared to the cost (in real money, risk, security) of backporting
major components to older kernels, like Red Hat is doing.

Red Hat maintains old kernels to provide a stable kernel ABI (among other
reasons), not for general "stability".

Personally I run only CentOS, but always with a recent kernel, albeit usually
an LTS one. Mostly to get exactly the types of features described in this
article.

~~~
technofiend
Well they show the same caution and care around curating their repositories
and releasing security fixes, so in my opinion (you may disagree) they try to
ensure stability beyond the kernel's ABI.

~~~
jlgaddis
Indeed. The kernel is not the only component (of the complete system) that
gets features backported.

I, personally, am quite happy that I don't really have to worry that a routine
"yum update" is going to break any of my installed applications.

------
wyldfire
Congrats, this is good news.

> On Linux, some out-of-tree tracers like SystemTap could serve these needs,
> but brought their own challenges.

I was pretty happy with stap, it had a really rich feature set.

> DTrace has its own concise language, D, similar to awk, whereas bcc uses
> existing languages (C and Python or lua) with libraries.

I think we need more creative names for languages. The short and simple ones
like "go" and "D" keep on having collisions. :)

>BPF adds ... uprobes

uprobes + all the other stuff is really killer, I like the idea of watching
for stuff like "my app has crossed this threshold and then this system
condition occurs". At least when I tried it a couple years ago with stap my
kernel wasn't built with uprobes support and I wasn't inclined to rebuild it.
Hopefully it becomes (or has become) more mainstream.

~~~
brendangregg
> I was pretty happy with stap, it had a really rich feature set.

So were other companies. I mentioned it in the post, as in a way this hurt BPF
development: companies that normally would have contributed resources said
they were satisfied with stap. Exciting times might be ahead for stap, if it
continues its BPF backend.

As for naming, yes, we need better names. Maybe the bcc/Python/BPF combination
can be named something?

------
qwertyuiop924
Will there ever be a way to write probes/tracing scripts without dropping into
C? I don't mind C in general, but I don't want to have to dig out the
documentation for the eBPF C library and start writing hundreds of lines of C
every time I want to run a trace.

DTrace made this really nice, because you would write your tracing scripts in
a high-level, awk-like language, which is the sort of thing well-suited to the
purpose.

~~~
brendangregg
Yes, see the section "A higher-level language", which mentions at least two
projects: SystemTap+BPF and ply.

Think of the current bcc/Python/C interface as a lightweight skin that was
necessary during BPF development to kick the tires on various features,
prototype tools, see what else needed to be done, etc. It may be good enough
to stay around, as lots of tools have been written for it that will get used
and be valuable. But there's room for higher-level languages too.

If Sasha keeps developing his "trace" tool (and its summary counterpart,
argdist), that may serve many such custom needs (as another option). See the
various examples:
[https://github.com/iovisor/bcc/blob/master/tools/trace_example.txt](https://github.com/iovisor/bcc/blob/master/tools/trace_example.txt), like:

    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME     PID    COMM         FUNC             -
    05:18:23 4490   dd           sys_read         read 1048576 bytes
    05:18:23 4490   dd           sys_read         read 1048576 bytes
    05:18:23 4490   dd           sys_read         read 1048576 bytes

~~~
caf
How feasible would it be to compile the dtrace language itself to an eBPF
backend?

~~~
qwertyuiop924
Huh. Is there a formal definition anywhere?

If we have some kind of spec (even if it's not formal), it might be possible,
since they are roughly equivalent, AFAIK. However, since I haven't worked in
depth with either, I'm unsure what work would be involved: it's possible that
only a subset would compile.

Anyway, this is probably a good goal to shoot for. DTrace is the system
tracer on pretty much all of the other big Unixes, so it's a good idea to
support as much of its language as possible. Plus, there are a lot of scripts
already written in DTrace's language: having access to them would be
invaluable.

~~~
brendangregg
Nobody would love a DTrace/BPF front-end more than I. And not just because I'd
sell more copies of my DTrace book (I joke :). It is a really nice language,
although missing a few things that BPF can do that DTrace can't (like saving
and retrieving stacks), so it'd need to be enhanced.

But with the warning that I'm not a lawyer: before beginning work on a
DTrace/BPF front-end, I'd start by talking to a copyright lawyer to see if
permission or a license is needed from Oracle. DTrace is Oracle copyrighted,
and re-implementing a DTrace front-end on Linux sounds a lot like
re-implementing an Oracle-copyrighted API.

~~~
Annatar
What would stop one from enhancing the DTrace in illumos further? That is
licensed under the CDDL, and Adam has made several enhancements to DTrace
lately, and if my memory serves me correctly, Bryan fixed a couple of bugs in
it recently as well.

~~~
qwertyuiop924
Wouldn't really solve our problem, though: we'd need a new compiler for the
language. As I understand it, that would require an almost entirely new
codebase. But I don't work on DTrace internals, so I may be wrong.

~~~
Annatar
What? That doesn't make any sense. How is dtrace(1) built on illumos, then?

~~~
qwertyuiop924
I'm uncertain, but I thought dtrace on illumos was interpreted. Am I wrong?

If I am, then all we'd have to rewrite is the compiler backend, which is much
easier, so that would be nice.

~~~
Annatar
dtrace(1) itself is compiled from C into an ELF binary executable.

      % file `which dtrace`
      /usr/sbin/dtrace: ELF 32-bit LSB executable 80386 Version 1, dynamically linked, stripped

The DTrace language, D, is interpreted. By DTrace.

The problem lies in the fact that neither the GNU/Linux kernel nor the GNU
applications provide DTrace probe points. On Solaris, and thus on illumos and
SmartOS, there are tens of thousands of probe points and numerous probe
providers. Some external applications, like PostgreSQL or PHP, added DTrace
probes, and all is well on Solaris / illumos / SmartOS. Some, like node.js,
had providers and probes added by engineers at Joyent.

[http://dtrace.org/blogs/dap/2012/04/25/profiling-node-js/](http://dtrace.org/blogs/dap/2012/04/25/profiling-node-js/)

[http://dtrace.org/blogs/dap/2013/11/20/understanding-dtrace-ustack-helpers/](http://dtrace.org/blogs/dap/2013/11/20/understanding-dtrace-ustack-helpers/)

[http://dtrace.org/blogs/blog/category/node-js/](http://dtrace.org/blogs/blog/category/node-js/)

GNU/Linux would have to do the same thing. It currently has but a handful of
DTrace probes and providers, which is understandably not very useful.

[http://dtrace.org/blogs/ahl/2011/10/05/dtrace-for-linux-2/](http://dtrace.org/blogs/ahl/2011/10/05/dtrace-for-linux-2/)

~~~
qwertyuiop924
None of that is at all what I meant: allow me to clarify.

I was talking about the DTrace language, which I have been avoiding calling D
up until this point, so as to avoid confusion with the other D. In this post,
when I talk about D, I will be referring to the DTrace language.

Linux now has a tracing system called eBPF, which provides many of the same
advantages as DTrace. This is what Gregg's blog post was about.

However, eBPF requires compiling tracer scripts, or at least parts of the
tracer scripts, into bytecode (IIRC). Currently, the bytecode is usually
compiled from C code. However, it is ungainly and impractical to write a bunch
of C every time you want to run a trace. So I asked if they had plans to
support compiling a higher level language. At this point, somebody suggested
that somebody should work on compiling the DTrace language, D, to eBPF
bytecode. I thought that this would be a good idea, and we were discussing how
viable it would be.
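The pipeline being discussed can be sketched with a toy predicate language (hypothetical syntax and bytecode, invented for illustration; this is not the real DTrace grammar or the eBPF instruction set):

```python
# Toy sketch: one shared front-end (parse), two backends -- a direct
# interpreter and a compile-to-bytecode path, loosely in the spirit of eBPF.
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def parse(src):
    """'arg3 > 20000' -> ('arg3', '>', 20000): the shared front-end."""
    var, op, lit = src.split()
    return (var, op, int(lit))

def interpret(ast, ctx):
    """Backend 1: walk the AST for each event (an interpreter)."""
    var, op, lit = ast
    return OPS[op](ctx[var], lit)

def compile_bc(ast):
    """Backend 2: emit flat bytecode once; run it many times."""
    var, op, lit = ast
    return [("LOAD", var), ("PUSH", lit), ("CMP", op)]

def run_bc(code, ctx):
    """A tiny stack machine standing in for the in-kernel VM."""
    stack = []
    for ins, arg in code:
        if ins == "LOAD":
            stack.append(ctx[arg])
        elif ins == "PUSH":
            stack.append(arg)
        elif ins == "CMP":
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[arg](a, b))
    return stack.pop()

ast = parse("arg3 > 20000")
event = {"arg3": 1048576}
assert interpret(ast, event) == run_bc(compile_bc(ast), event) == True
```

Reusing parse() while swapping the backend is the shape of work a D-to-eBPF port would involve; the open question in the thread is how much of the real front-end is reusable.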

I thought that you had suggested using illumos DTrace as a base for this
compiler. Since, by my impression, illumos DTrace interprets D, I thought that
this would require almost a complete rewrite, and thus it wouldn't be very
helpful.

It seems you meant something else. So what did you mean?

~~~
Annatar
I think we both got lost, didn't we? (:-)) So let's rewind:

_It is a really nice language, although missing a few things that BPF can do
that DTrace can't (like saving and retrieving stacks), so it'd need to be
enhanced. But with the warning that I'm not a lawyer: before beginning work on
a DTrace/BPF front-end, I'd start by talking to a copyright lawyer to see if
permission or a license is needed from Oracle._

I fail to see how Oracle would be relevant, given that the entire DTrace is
released under the CDDL. My question to Brendan was why he couldn't just take
or enhance the existing DTrace codebase to do the front-end for BPF?

(Personally, I think the other guy working on integrating BPF deeper into the
GNU/Linux kernel, rather than _biting the bullet_, taking DTrace and doing
the same thing Solaris engineers did to the SunOS kernel, is terribly
misguided, especially since some probes and providers already exist in Linux.
In the end, it will be a "Linux zoo", like everything else in Linux: "56"
competing solutions for doing one thing, none of them comprehensive, and no
consistency. Linux history is being repeated again. You have one comprehensive
tool which works across several operating systems, DTrace, and Linux is yet
again different from everyone else. Reminds me a lot of Microsoft Windows.)

~~~
qwertyuiop924
It's not ideal, but eBPF works now, and it works on all the new kernels, which
is more than can be said for any of the DTrace on Linux projects. It looks
like eBPF is becoming that unified solution (mostly: ftrace and a few others
still exist, but eBPF is the most capable, and is picking up steam). As for
being different from what the other Unixes did, that's why I was in favor of
developing a frontend that supported D, so that we could at least have a
shared language with the rest of unixland.

>My question to Brendan was why he couldn't just take or enhance the existing
DTrace codebase to do the front-end for BPF?

Brendan didn't answer, so I don't know, but my guess was that turning the
interpreter into a compiler would require a near-complete rewrite.

~~~
brendangregg
> that's why I was in favor of developing a frontend that supported D, so that
> we could at least have a shared language with the rest of unixland.

Yes, D everywhere would be nice, but what exactly does it mean? We can share
DTrace scripts? Docs? Blog posts? Books? I've been porting them all over to
bcc/BPF. Am I missing something?

People have already begun work developing new languages, e.g., ply
[https://wkz.github.io/ply/](https://wkz.github.io/ply/). What if we develop a
language that's much better than D? We need to make enhancements anyway.

I should reiterate something I covered in the post: most people won't care
about this. Most people didn't write DTrace scripts when they had the option
to (did either of you write DTrace scripts? have some on github?). Most people
used the tools. And today, people can "apt install bcc-tools" and use
DTraceToolkit like tools.

If someone wants to engage lawyers & Oracle and see if or what needs to be
done to use DTrace, then great, it'd make my job easier when developing these
tools (and I'd sell more DTrace books :). But I'd also like to see someone
take a swing at developing a better language as another possibility.

~~~
Annatar
_did either of you write DTrace scripts?_

I did, some, in the beginning (circa 2006): I was stymied mostly by the
realization that _deep, deep_ knowledge of the kernel structures was required
to make use of DTrace (I wasn't working as a kernel engineer per se at the
time.)

I've forgotten a lot of it in the meantime: just yesterday I was trying to get
a simple ustack() using DTrace on a running process, and I even pulled open
"Solaris Performance and Tools", and eventually, when the process finished, I
threw up my hands in frustration. All I wanted to do was see why the running
process (oggenc) was taking so long. (But this BPF thing looks far, far more
complicated and convoluted than D.)

Nevertheless, I think D is ideal, because, in my case, it plays on my
experience in programming AWK: apart from needing to know what in the kernel I
wanted to probe, I could immediately start writing DTrace programs without
having to learn the language. And _that_ is amazing.

~~~
brendangregg
See my other comments -- Annatar, what is your real name?

I'll add: as someone who has written and published countless DTrace and BPF
scripts, I don't know that pursuing a DTrace front end is wise right now, for
reasons I've already covered.

I'm sorry you weren't able to solve that issue. I'd suggest starting with a
profiler (timed sampling) if it was running on-CPU, to see where CPU time is
spent.
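What timed sampling means can be sketched in pure Python (a toy, not a real profiler; real tools like perf or DTrace's profile provider sample in the kernel with far lower overhead, and all names and intervals here are invented):

```python
# Toy sketch of a sampling profiler: a background thread periodically grabs
# the worker thread's current frame and tallies which function is on-CPU.
import collections
import sys
import threading
import time

samples = collections.Counter()

def sampler(target_tid, interval, stop):
    while not stop.is_set():
        frame = sys._current_frames().get(target_tid)
        if frame is not None:
            samples[frame.f_code.co_name] += 1  # innermost function name
        time.sleep(interval)

def busy():
    deadline = time.time() + 0.3
    while time.time() < deadline:        # a deliberately hot loop
        sum(i * i for i in range(1000))

worker = threading.Thread(target=busy)
worker.start()
stop = threading.Event()
prof = threading.Thread(target=sampler, args=(worker.ident, 0.01, stop))
prof.start()
worker.join()
stop.set()
prof.join()

# The hot code dominates the samples, pointing at where CPU time went.
top, _ = samples.most_common(1)[0]
```

With enough samples the counts approximate where CPU time is spent, which is usually the right first question for an on-CPU problem like the oggenc one above.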

------
lallysingh
So we're not getting DTrace proper, it seems. Instead something else will
spring up from the various Linux tracing systems. Maybe this BPF-based one.

It's a shame. One of the nice things about dtrace was that there was a book on
it. Good, in-depth documentation on performance tools is hard to find.

~~~
brendangregg
Thanks, I wrote the DTrace book with Jim Mauro, and there will be a BPF
tracing book as well.

BTW, I wouldn't say "maybe" regarding BPF, as it's integrated in the Linux
kernel (unlike most of the other tracers, which are add-ons). Sooner or later
everyone who runs Linux is getting it.

~~~
cthalupa
I think I bother you about a new tracing book for Linux every time one of your
articles is posted, so I'll give my obligatory: We want a new Brendan Gregg
tracing book! ;)

Things have been moving so fast it's probably a good thing you didn't. It
sounds like 4.9 will slow a lot of that down to a more manageable pace for
writing a book, though.

------
asymmetric
> In 2014 I joined the Netflix cloud performance team. Having spent years as a
> DTrace expert, it might have seemed crazy for me to move to Linux

I thought Netflix was mostly running FreeBSD [1]. Is it only the Open Connect
Appliance?

[1]:
[https://www.freebsdfoundation.org/testimonial/netflix/](https://www.freebsdfoundation.org/testimonial/netflix/)

~~~
brendangregg
When you login to Netflix and browse videos, you're running on the Netflix
cloud, which is massive, AWS/EC2, and mostly Ubuntu Linux. When you hit play,
you're running on the OCA FreeBSD CDN, which is also a large deployment.

~~~
Annatar
So why didn't they just deploy FreeBSD across the entire server fleet? That
would also give you DTrace again...

~~~
X86BSD
I'll put $1 on politics. I mean look, you have an OS that bgregg has had to
pour how much effort into to get the observability that FreeBSD already had?
And that's just the observability part. Then you have the FreeBSD network
stack. To me it's clear based on the work done on Linux it was a political
choice.

~~~
Annatar
Yeah but if that's the case, it's really bad. There is no place for politics
in computer science or information technology.

~~~
empthought
I think it's more likely because Netflix uses the JVM -- probably the Oracle
JDK -- which is supported on Linux but not on FreeBSD.

~~~
Annatar
That makes _even less sense_, and reeks even more of irrationality: if
they're using the JVM, a Solaris-based system like SmartOS would be the best
choice - Solaris is where Java is developed, after all.

It's like buying a NetApp appliance to run NFS servers, when Solaris is _the_
reference NFS server implementation. Humans do not make any sense with their
decisions governed by feelings instead of logic.

~~~
brendangregg
Yes, we're using the JVM. No, Solaris or SmartOS would not be the best choice.
Would it help if I went into detail as to why?

~~~
Annatar
Yes it would.

~~~
brendangregg
I worked on Solaris for over a decade, and for a while it was usually a better
choice than Linux, especially due to price/performance (which includes how
many instances it takes to run a given workload). It was worth fighting for,
and I fought hard. But Linux has now become technically better in just about
every way. Out-of-box performance, tuned performance, observability tools,
reliability (on patched LTS), scheduling, networking (including TCP feature
support), driver support, application support, processor support, debuggers,
syscall features, etc. Last I checked, ZFS worked better on Solaris than
Linux, but it's an area where Linux has been catching up. I have little hope
that Solaris will ever catch up to Linux, and I have even less hope for
illumos: Linux now has around 1,000 monthly contributors, whereas illumos has
about 15.

In addition to technology advantages, Linux has a community and workforce
that's orders of magnitude larger, staff with invested skills (re-education is
part of a TCO calculation), companies with invested infrastructure (rewriting
automation scripts is also part of TCO), and also much better future
employment prospects (a factor that can influence people wanting to work at
your company on that OS). Even with my considerable and well-known Solaris
expertise, the employment prospects with Solaris are bleak and getting worse
every year. With my Linux skills, I can work at awesome companies like Netflix
(which I highly recommend), Facebook, Google, SpaceX, etc.

Large technology-focused companies, like Netflix, Facebook, and Google, have
the expertise and appetite to make a technology-based OS decision. We have
dedicated teams for the OS and kernel with deep expertise. On Netflix's OS
team, there are three staff who previously worked at Sun Microsystems and have
more Solaris expertise than they do Linux expertise, and I believe you'll find
similar people at Facebook and Google as well. And we are choosing Linux.

The choice of an OS includes many factors. If an OS came along that was
better, we'd start with a thorough internal investigation, involving
microbenchmarks (including an automated suite I wrote), macrobenchmarks
(depending on the expected gains), and production testing using canaries. We'd
be able to come up with a rough estimate of the cost savings based on
price/performance. Most microservices we have run hot in user-level
applications (think 99% user time), not the kernel, so it's difficult to find
large gains from the OS or kernel. Gains are more likely to come from off-CPU
activities, like task scheduling and TCP congestion, and indirect, like NUMA
memory placement: all areas where Linux is leading. It would be very difficult
to find a large gain by changing the kernel from Linux to something else. Just
based on CPU cycles, the target that should have the most attention is Java,
not the OS. But let's say that somehow we did find an OS with a significant
enough gain: we'd then look at the cost to switch, including retraining staff,
rewriting automation software, and how quickly we could find help to resolve
issues as they came up. Linux is so widely used that there's a good chance
someone else has found an issue, had it fixed in a certain version or
documented a workaround.

What's left where Solaris/SmartOS/illumos is better? 1. There's more marketing
of the features and people. Linux develops great technologies and has some
highly skilled kernel engineers, but I haven't seen any serious effort to
market these. Why does Linux need to? And 2. Enterprise support. Large
enterprise companies where technology is not their focus (eg, a breakfast
cereal company) and who want to outsource these decisions to companies like
Oracle and IBM. Oracle still has Solaris enterprise support that I believe is
very competitive compared to Linux offerings.

So you've chosen to deploy on Solaris or SmartOS? I don't know why you would,
but this is also why I also wouldn't rush to criticize your choice: I don't
know the process whereby you arrived at that decision, and for all I know it
may be the best business decision for your set of requirements.

I'd suggest you give other tech companies the benefit of the doubt for times
when you don't actually know why they have decided something. You never know,
one day you might want to work at one.

~~~
Annatar
It was Jeff Bonwick's team which proved that the number of engineers or even
developers working on a given problem is completely irrelevant: ZFS was
developed by a team of, what, five people? Meanwhile, how many people are
working on BTRFS? It's nowhere near ZFS.

But, let's chalk that up to an isolated, one-off statistical aberration. From
what I understand, Adam and Bryan wrote DTrace almost single-handedly, with
some help from Mike, and even with all the contributions, you can still count
the people who made DTrace a working production tool on the fingers of one
hand.

However, let's chalk that up to a one-off, statistical aberration as well.
Meanwhile, how many people are working on how many tracing frameworks for
Linux?

Next, we have zones, a complete, working, production proven virtualization
solution, augmented by KVM, lx, TRITON, Consul, et cetera. One coherent
solution. Built upon technology on which _I_ ran production Oracle databases,
way back in 2006, powering a very large institution which was making very
large amounts of money. By the second.
design, architect, and code all that up?

Meanwhile, there are how many competing cloud virtualization solutions based
on Linux? And remarkably, except for SmartOS, _none_ are a complete,
comprehensive solution: they all lack one thing or another. Not one of those
Linux based solutions is paranoid about data integrity or correctness of
operation. Those things are not even an afterthought of Linux.

Should I chalk that up to a one-off, statistical aberration, or would you say
that there is a pattern here?

The Amiga Intuition library, the foundation on which the GUI is built into the
system, was written single-handedly by just one person: RJ Mical. In a couple
of days! For almost two decades, it was _the_ reference on how to build a
library of GUI primitives with almost unlimited flexibility.

Star Control 2, one of the greatest games in history, was developed by just
two guys in the span of three years.

Dave Haynie almost single-handedly developed not one, but an entire series of
Commodore computers: the C16, C116, C Plus/4 (Commodore 264). Those are the
lessons not only of history, but of our contemporaries, people you used to
work with: KVM was ported from the Linux kernel by what, three engineers, and
from what I can tell, it runs faster on illumos than it does on Linux, where
it's developed! Why is that?

You and I apparently drew a completely different set of conclusions: when you
wrote _Linux now has around 1,000 monthly contributors, whereas illumos has
about 15_, you seem to equate the number of people working on a product with
that product's capability and quality, whereas I drew the conclusion that the
number of people is _irrelevant_, but what the individuals or individual can
do makes all the difference in the world.

Where you are absolutely correct is that the job market for illumos based
operating systems is non-existent, at least in the country where I live, and
slim elsewhere (I used to work in Silicon Valley and in other parts of the
States). That's a fact. But I wouldn't rush to the conclusion that it's
because illumos or SmartOS are worse products, because I see no evidence of
that. Furthermore, at the end of the day, _people still need to run a cloud on
something which actually works_ , and Linux is not it. It doesn't work
correctly, when it works at all. Not even after 20 years, billions of dollars
and a world wide army of people working on it. What is the alternative?
SmartOS.

I read the Netflix tech blog from time to time. And over time, one thing
became clear to me: Netflix can do the things it does because they have _one
single_ application to scale, but most of the world _out there, in the
trenches_ , has more than one application. You write of people with deep
knowledge of the kernel and performance: I've been working in this industry
for _decades_ , and I've yet to meet anyone like that (they must all either be
a secret society, or I'm just way too paranoid, but I do know a lot of IT
professionals). So perhaps it's a living in an enclave problem, or perhaps
both you and I work in enclaves, only different ones? I'm the only person I
know _in IT_ that has done or has any interest in kernel, system engineering
or performance; I must either be incredibly bad at picking companies to work
for, or people you mentioned are really _few and far between_ , or a third
possibility is that it's a fluke coincidence?

Let me tell you about my world: I work on and with Linux professionally. Where
Netflix has only one major application (according to their tech blog) to worry
about, I work at a place where we literally have _several thousands_ of
applications, some bought, some developed in-house; for just about every
problem, we have an average of _five_ applications, all different, but
basically doing the same thing; and some of our applications are so exotic, so
complex, and so custom, that it is impossible to find anyone on the market
with any experience in them. _Thousands._

So while you might be picturing this in your head, imagine running Linux, and
suddenly your database keels over: Linux didn't fail over to the other path,
so multipathing doesn't work right. Then imagine having systems with data
corruption, but Linux can't fix it, because ZFS isn't supported by Red Hat,
which we run, so there goes that - another outage (we have regulators and
governments to worry about, so the company is reluctant to start hacking its
own custom kernel and a ZFS-based Linux). Next, Linux suddenly has an outage
because the NFS mount is flapping. Why is it flapping? Because Linux's NFS
implementation doesn't play well with NetApp. Now imagine stuff like this
happening on a scale of 72,000 systems, spread across the planet. I never had
such problems with Solaris. Not once.

But, since that's anecdotal evidence and experience, we have to discount that
as well.

Then, I have hardware (from one of Oracle's competitors), very, very
expensive, intel-based 80-CPU Xeon monsters, with .5 TB of memory per system,
where the serial console _hangs at random_: Red Hat points the finger at the
hardware manufacturer, the hardware manufacturer points the finger at Red
Hat. Result: the console is still hanging at random, with both companies
telling us they have no clue what the problem is. That's Linux for you.

Serial console always worked just fine on illumos. After all, it's basic
functionality.

Then there's the issue of Linux not getting shutdown properly: you'd think
that after 20 years of development and as you correctly noted, a world wide
_army of developers_ and billions of dollars in investments, the shutdown
procedure wouldn't try to write to an already unmounted filesystem; it's basic
functionality, after all; but even that is too much to expect, apparently (I
can dig out the Red Hat bug if you're interested).

That last one, we cannot chalk up to a fluke, and even worse, SGI's XFS was
the only filesystem that actually detected that write and panicked the kernel - ext3
was oblivious to this data corruption. It's mighty difficult for me to
engineer highly reliable services on such a substrate... but let's not dwell
on that too much right now. It's too depressing.

Then there is tracing: you know there are several competing frameworks at
play. Then there is also the lack of proper DWARF 2 support (I researched the
subject, and found out that the "solution" was to replace my run-time
linker!) Can you
imagine something like that being a solution on an illumos based system? I
think everybody would commit collective suicide or quit altogether like Keith
Wesolowski did before _casually suggesting_ such a thing, but let's not dwell
on that either. (At this point, I think it fair to sue for pardon if I
_don't_ want my operating system made by people who think nothing of casually
replacing the run-time linker only to get DWARF 2 debugging support. Do you
agree?)

Then there's this issue of startup: while SMF has been humming along for more
than a decade, Linux is still trying to figure out some sort of a _complete
working solution_: currently that's systemd, and based on how it's
architected, it looks like Windows and Linux are finally converging.
Meanwhile, to build a startup system that is vaguely reminiscent of a working
SMF, systemd has several different configuration states for its services...
and no fault management architecture to speak (or write) of.

One thing's for sure: your experiences and mine are radically different. You
shocked me to the core, but I now better understand your thinking and motives
for leaving illumos behind, and it's the kind of appreciation I'm unable to
put into words. You are also much more flexible: after having seen just how
convoluted, complex, slow, and resource wasting Java is, I would _never_ go
work at another company which used it (at the place where I work now, Java is
_the_ language and _the_ platform). I'd just quit the industry like Keith did.

In spite of all of this, if you let me know how to reach you, I'll send you
enough information to get in touch: I'd still love to have you
over if you're in the country, and cook you dinner.

~~~
brendangregg
You've just discounted quite a lot of what I said as "no evidence", and have
made some incorrect assumptions about both development at Sun and Netflix.
Along with your other comments, at this point it's clear you are bashing on
Linux, Netflix, and me personally, and you still haven't revealed your real
name.

I'd like to know what your real name is. If you really cannot post it here,
then feel free to contact me at bgregg@netflix.com.

~~~
Annatar
I _am_ bashing on Linux, absolutely; that massive bleeding wound is _very raw
and painful_. I have no reason to bash on Netflix; I merely pointed out that,
in my view, Netflix's problem domain is very narrow, and a luxury: most IT
departments don't have only one (however massive) application to worry about.

As for you personally, I have nothing but highest respect for you. You are one
of the reasons why I still haven't quit this industry. In fact, I still cannot
believe I've actually communicated with _Brendan Gregg_. To me personally,
you're a living legend. If I believed in personal heroes, you'd be one of
them.

------
easytiger
Really rather unfortunate that big enterprise platforms such as banks and so
forth are so far behind on their kernel version that it will be approximately
7-8 years before they have this capability, unless RH backports it, of course.

~~~
twblalock
On the other hand, I'm glad the banks who handle my money don't upgrade to the
latest and greatest software without taking very, very stringent precautions
to make sure everything will work.

~~~
obitoo
In my experience that's not the case - it's more like 'It works, no-one touch
it! We're spending our money on more visible things', and then several years
later: "What's that, it's no longer going to be supported? Damn, now we
_have_ to upgrade"

------
4ad
Linux is not my favorite operating system, but it seems like we're stuck with
it. I'm very happy about all these improvements. Once you've gotten used to a
system with a quality, functional tracer, Linux is hard to go back to. But
Linux tracing is getting better and better now. I am very satisfied.

~~~
Annatar
_Linux is not my favorite operating system, but it seems like we're stuck
with it._

It only seems that way. We're never stuck with something as long as we don't
accept it. One other factor is at play which works against Linux, and that is
that people in IT like shiny new things, and therefore something else always
comes along. Hopefully this time around, that something else will be the old
new thing (learning from the past, and re-discovery). One way or the other,
the clock is ticking on Linux, and one of these days, it won't be as popular
any more, because something else will be the new-new thing. It's the nature of
this industry:

change is the only constant.

You don't have to accept anything. Don't bow to peer pressure.

------
honkhonkpants
So how does this relate to uprobes? I've been looking into that lately because
I want frequency counts (or coverage analysis) of user-space programs, but
without the nop-sled overhead of XRay. Does DTrace supplement or replace
uprobes? Or am I really just confused?

~~~
cthalupa
DTrace is a Solaris (and BSD/OS X) tracing tool that never quite made it to
Linux (there are some attempted ports, but none of them really caught on). BPF
(with frontends like BCC on top) gives you the same sort of functionality on
Linux.

BPF can take advantage of uprobes and build instrumentation around them, but
it interacts with uprobes rather than replacing them.
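For the frequency-count use case, here is a minimal sketch of that
relationship using BCC's Python bindings. The probe target (libc's malloc)
and the five-second window are purely illustrative choices, not anything from
the thread, and actually running it requires the bcc package and root:

```python
# A hypothetical sketch of how a BPF frontend rides on uprobes (names follow
# BCC's documented Python API; requires the bcc package and root to run).
from time import sleep

# BPF program: a hash map keyed by PID, incremented on every malloc() entry.
prog = r"""
BPF_HASH(counts, u32);

int count_malloc(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""

def main(duration=5):
    from bcc import BPF  # deferred import so the sketch reads without bcc

    b = BPF(text=prog)
    # attach_uprobe() plants a uprobe at the entry of libc's malloc() and
    # runs count_malloc on every hit: BPF attaches to the uprobe mechanism,
    # it does not replace it.
    b.attach_uprobe(name="c", sym="malloc", fn_name="count_malloc")

    sleep(duration)
    for pid, count in b["counts"].items():
        print("pid %-6d malloc calls: %d" % (pid.value, count.value))

# Run main() as root to collect five seconds of per-process malloc counts.
```

BCC's stock tools wrap the same idea; something like `funccount 'c:malloc'`
does roughly this without writing any code.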

