
Linux tracing systems and how they fit together - ingve
https://jvns.ca/blog/2017/07/05/linux-tracing-systems/
======
616c
I am a huge fan of jvns. I wish I could be her when I grow up.

Does anyone know of someone doing the same style of introspection tools, for
tracing and profiling and networking, like the body of her work, not just this
post, but for Windows?

I know of a few scattered posts here and there, usually by PFEs on Microsoft
blogs, but the landscape of dedicated bloggers seems lacking to a novice
like me.

~~~
hannasm
Vance Morrison is a pretty significant guy in the Windows (.NET) tracing world.

[https://blogs.msdn.microsoft.com/vancem/](https://blogs.msdn.microsoft.com/vancem/)

~~~
windowsworkstoo
Yeah, Vance is great and is largely responsible for getting a sensible wrapper
for ETW in .NET.

------
bitcharmer
I'm surprised Brendan Gregg hasn't been mentioned here yet. He's the Linux
tracing/profiling god.

[http://www.brendangregg.com/linuxperf.html](http://www.brendangregg.com/linuxperf.html)

Don't get me wrong, I respect Julia Evans as a professional, but what she
mostly does is simplify other people's hard work and in-depth analysis of
difficult problems in various layers of the technology stack.

~~~
zenlikethat
How exactly are the next generation of systems programmers supposed to get
there without great minds like Julia documenting their journey along the way?
If no one can apply someone's work for practical purposes then what good does
it do?

Julia mentions Brendan in her post already and she's done _plenty_ of great
work. Don't tear other people down, it's not cool.

~~~
bitcharmer
I already admitted my respect for her as a professional. There is nothing
negative in what she does. I'm just stating a fact - if you want more details
you need to turn to people who she sources from.

What is bad about it?

~~~
bogomipz
I didn't think there was anything bad in what you said. It was your opinion
and I thought you were respectful.

What I have seen though is that saying anything even slightly negative about
the content in this individual's blog posts will draw a strong rebuke as you
have just seen.

I personally find the amount of fawning commentary and aggressive
defensiveness for this particular blog and author a bit cultish.

------
0xcde4c3db
I'm still not seeing the part where they actually fit together; it basically
just looks like an accumulation of ad-hoc tools with no overarching concept.
They look like they could be very useful tools, but there doesn't seem to be
any architecture there.

~~~
cyphar
There definitely isn't any overarching design. You see the same thing all
throughout Linux's interfaces (containers are another great example). If you
contrast this with the Solaris or BSD interfaces you notice that Linux has
always been odd in this respect (DTrace, Jails/Zones, ZFS, pf,
kqueue/eventports and so on).

Depending on your perspective, you can see this as a positive or a negative.
One view is to say that Linux is more collaborative, and only the "common core"
interfaces are actually put into the kernel (with the higher levels being
provided in userspace by vendors). A good example of this is the live patching
code that came from a distillation of Red Hat's kpatch and SUSE's kGraft
systems. You can track down most of Linux's features to this sort of
development model.

illumos and BSD, however, usually work far more like traditional software
products. Some engineer goes off and implements a really interesting system
that then gets put into a release. That's how ZFS, Zones, DTrace and so on
were developed (at Sun) and I'm sure you can come up with other examples from
BSD. The key point here is that this is a far more authoritarian model -- one
set of engineers has decided on a system and developed it mostly in
isolation. Linux doesn't permit that style of development because large
patchsets need to be broken down over releases.

Personally I must say that I don't care for the Linux style of things, mainly
because I feel it hampers OS innovation in some ways. But the upside is that
the facilities are more like libraries than frameworks and so you're forced to
design your abstractions in userspace. Is that good? I don't know.

Note that following along with the above theme, there is an overarching
architecture for Linux's tracing tools (in userspace) in the form of bcc[1].

[1]: [https://github.com/iovisor/bcc](https://github.com/iovisor/bcc)

------
everybodyknows
For those willing to get their hands dirty in low-level kernel code, perhaps
the simplest tracing tool is _dynamic-debug_ :
[https://github.com/torvalds/linux/blob/master/Documentation/...](https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/dynamic-debug-howto.rst)
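
As a rough illustration of how dynamic-debug is driven, here is a minimal Python sketch that builds a query in the `<match-spec> <flags>` form described in the howto and writes it to the control file. The helper names `build_query` and `apply_query` are hypothetical, and the control path assumes debugfs is mounted at `/sys/kernel/debug` on a kernel built with `CONFIG_DYNAMIC_DEBUG`; writing to it normally requires root.

```python
from pathlib import Path

# Assumption: debugfs is mounted at /sys/kernel/debug on this machine.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")

# Match keywords accepted by a dynamic-debug query, per the howto.
ALLOWED_KEYS = ("func", "file", "module", "format", "line")


def build_query(flags="+p", **match):
    """Build a query string such as 'file svcsock.c line 1603 +p'.

    '+p' enables the matching pr_debug() call sites, '-p' disables them.
    """
    unknown = set(match) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unsupported match keywords: {sorted(unknown)}")
    parts = [f"{key} {match[key]}" for key in ALLOWED_KEYS if key in match]
    if not parts:
        raise ValueError("at least one match keyword is required")
    return " ".join(parts + [flags])


def apply_query(query, control=CONTROL):
    """Write the query to the control file (hypothetical helper)."""
    Path(control).write_text(query + "\n")
```

Calling `apply_query(build_query(file="svcsock.c", line=1603))` would be the equivalent of echoing `file svcsock.c line 1603 +p` into the control file; the enabled messages then show up in the kernel log.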

------
relyio
You should mention it is the Ecole Polytechnique... de Montreal. Usually, when
people say "Ecole Polytechnique" they mean the original one.

It's like saying you went to MIT (Minnesota Institute of Technology).

Other than this small nit, great article.

~~~
jvns
fixed! I thought it was really cool that the DORSAL lab
([http://www.dorsal.polymtl.ca/en](http://www.dorsal.polymtl.ca/en)) is doing
such interesting and practical Linux research -- I often think of academics as
living 'up in the clouds', but it seemed like they were contributing a lot to
the Linux kernel and building really useful tracing tools. So a bunch of
people in academia are doing super practical systems work & writing papers
about it and I think that's awesome. It made me want to read more academic
systems papers.

> Research done at the lab concentrates on improving the performance,
> reliability and resilience of distributed, cloud and multi-core systems.

------
bjackman
In my team we use a tool we developed called TRAPpy [1] to parse the output of
ftrace (and systrace) into Pandas DataFrames, which can then be plotted and
used for rich analysis of kernel behaviour.

We also integrate it into a rather large wider toolkit called LISA [2] which
can do things like describe synthetic workloads, run them on remote targets,
collect traces and then parse them with TRAPpy to analyse and visualise the
kernel behavior. We mainly use it for scheduler, cpufreq and thermal governor
development. It also does some automated testing.

[1] [https://github.com/ARM-software/trappy](https://github.com/ARM-software/trappy)

[2] [https://github.com/ARM-software/lisa](https://github.com/ARM-software/lisa)
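
To give a feel for the parsing step described above, here is a minimal sketch that turns ftrace text output into a list of dictionaries. It is not TRAPpy's API: the function name `parse_trace`, the regular expression, and the sample line layout are assumptions based on the usual `TASK-PID [CPU] flags TIMESTAMP: event: key=value ...` format, and real traces may need more cases.

```python
import re

# Assumed line layout: "<task>-<pid> [cpu] flags timestamp: event: details"
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?:(?P<flags>\S+)\s+)?"
    r"(?P<timestamp>\d+\.\d+):\s+"
    r"(?P<event>\w+):\s*(?P<details>.*)$"
)
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_trace(text):
    """Return one dict per trace event; header and unmatched lines are skipped."""
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # ftrace header lines start with '#'
        match = LINE_RE.match(line)
        if match is None:
            continue
        row = {
            "task": match["task"],
            "pid": int(match["pid"]),
            "cpu": int(match["cpu"]),
            "timestamp": float(match["timestamp"]),
            "event": match["event"],
        }
        # key=value pairs in the event payload are kept as strings
        row.update(FIELD_RE.findall(match["details"]))
        rows.append(row)
    return rows
```

If pandas is installed, `pandas.DataFrame(parse_trace(text))` gives a table that can be filtered by `event` and plotted against `timestamp`, which is roughly the workflow the comment describes.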

------
kronos29296
For a guy who knows nothing about tracing, it was a post that made me
understand at least some of it. The doodles really made me smile, unlike a
boring flowchart or diagram made with a drawing tool. Another post going to my
collection of interesting posts.

------
lma21
Great article, sums up the Linux tracing domain in a pretty neat way. I've
used strace / perf / dtrace extensively since these are the only tools our
clients' infrastructures can support, and it's always a bugger when you're on
an older system and your hands are tied. I haven't tried eBPF yet; I should
look into it once the kernel 4.7+ release hits RHEL.

------
emilfihlman
Your pictures are broken?

~~~
jvns
should be fixed now (delightfully, I needed to get some new TLS certs for my
site to fix this issue, and that took literally like 30 seconds to do thanks
to the magic of Let's Encrypt)

------
dangisafascist
I'm confused why BPF exists in the first place. Can't we just compile kernel
modules that hook into the tracing infrastructure?

It seems like a webassembly for the kernel but local software has the benefits
of knowing the platform it is running on. I.e. Why compile C code to eBPF,
when I can just compile to native code directly?

I can potentially see it solving a permissions problem, where you want to give
unprivileged users in a multi-tenant setup the ability to run hooks in the
kernel. Is that actually a common use case? I don't think it is.

~~~
vbernat
Yes, you can just compile kernel modules, but you take the risk of crashing
the kernel. eBPF provides a safe way to interact with the kernel because it is
not Turing-complete and has additional restrictions. SystemTap is another
example of such a language, but it compiles to kernel modules instead.

This is quite important when you want to run this code in production. You
don't want to accidentally crash your kernel.

~~~
dangisafascist
I'm not sure this argument makes sense. Avoiding accidentally crashing the
kernel doesn't require a BPF layer.

For instance, you could just write your kernel module in a sufficiently safe
language, like Rust, and have the same benefits. You could even pre-compile
eBPF for the exact same level of safety. Still no need for the bpf() system
call or the eBPF VM or JIT in the kernel.

~~~
sargun
(e)BPF has the following guarantees:

* Strictly typed -- registers and memory are type-checked at compilation time. If you use something like Rust, you'd have to bring rustc into the kernel

* Guaranteed to terminate -- you cannot jump backwards, and there is an upper bound on the instruction count

* Bounded memory -- The registers and the memory accessible via maps are a fixed size. We don't have a stack per se.

Compiling Rust to this is possible, but it'd require quite a bit of
infrastructure in the kernel to verify that the code is safe, versus the
simplicity of eBPF. Early attempts at a general purpose in-kernel VM included
passing an AST in, and then doing safety checking on the AST, but they proved
too complicated to do safely.
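
The "guaranteed to terminate" rule above can be sketched as a toy checker. This is not the kernel's verifier: the instruction encoding (`(opcode, operand)` tuples with relative jump offsets), the names `verify` and `is_safe`, and the `MAX_INSNS` value are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed instruction limit for this sketch.
MAX_INSNS = 4096

# Hypothetical toy encoding: (opcode, operand). For "jmp" and "jz" the
# operand is an offset relative to the instruction after the jump.
Insn = Tuple[str, int]
JUMPS = ("jmp", "jz")


def verify(program: List[Insn]) -> None:
    """Raise ValueError unless the toy program is certain to terminate."""
    if not program:
        raise ValueError("empty program")
    if len(program) > MAX_INSNS:
        raise ValueError("too many instructions")
    for pc, (op, arg) in enumerate(program):
        if op not in JUMPS:
            continue
        if arg < 0:
            raise ValueError(f"backward jump at instruction {pc}")
        if pc + 1 + arg >= len(program):
            raise ValueError(f"jump out of bounds at instruction {pc}")
    if program[-1][0] != "exit":
        raise ValueError("program must end with exit")


def is_safe(program: List[Insn]) -> bool:
    """Convenience wrapper returning True/False instead of raising."""
    try:
        verify(program)
    except ValueError:
        return False
    return True
```

Because every jump moves strictly forward and the program length is capped, each instruction can run at most once, so the check is a single linear pass with no need to reason about loops.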

~~~
dangisafascist
I'm not arguing against eBPF the language. Its safety guarantees make sense
to me.

I'm arguing against the in-kernel eBPF infrastructure: bpf system call, the
JIT and the VM.

I think it makes more sense to just compile eBPF (or rust or whatever safe
language you want) to a kernel module.

~~~
codyps
The idea with having eBPF in the kernel is that we can limit the amount of
trust given to a particular user-space task.

Accepting compiled stuff in the form of a kernel module requires root
privileges and requires that the kernel essentially have complete trust in the
code being loaded.

Loading eBPF eliminates the need to trust the process/user doing the loading
to that level.

~~~
dangisafascist
The bpf() system call and SOCK_RAW both require root. Is there an example of
using bpf that doesn't require root?

~~~
sargun
The BPF syscalls don't require CAP_SYS_ADMIN; only specific invocations do.
You can set up a socket filter without CAP_SYS_ADMIN, and a device or XDP
filter with CAP_NET_ADMIN.

~~~
dangisafascist
Sure but how common is that case? How common are multi-tenant Linux systems
with untrusted users that give those specific permissions? Do you want
untrusted users sniffing the packets of others?

------
throwme_1980
I am not sure what to think of these pictures. They trivialise the entire
subject and make it look childlike; no engineer worth his salt will be seen
with these doodles on his desk. I stopped reading as soon as I saw them.

~~~
darksim905
no offense, but jvns is kind of known for her drawings & doodles that explain
key concepts of a larger, confusing subject. They are awesome, to the point &
great for digesting information we need to know, quickly.

~~~
throwme_1980
Known is relative ....

------
equalunique
I should take it upon myself to get familiarized with all of these.

