Linux tracing systems and how they fit together (jvns.ca)
256 points by ingve on July 8, 2017 | 59 comments

I am a huge fan of jvns. I wish I could be her when I grow up.

Does anyone know of someone writing in the same style about introspection tools (tracing, profiling, networking), like the body of her work, not just this post, but for Windows?

I know of a few scattered posts here and there, usually from PFEs on Microsoft blogs, but the landscape of dedicated bloggers seems lacking to a novice like me.

Have a look at Bruce Dawson's blog [1] - he has written many articles explaining ETW and its tooling. You may start at the summary page [2] to see the topics covered. The "Ask the perf team" blog [3] also contains a lot of interesting posts on Windows diagnostics, but unfortunately is no longer updated.

[1] https://randomascii.wordpress.com

[2] https://randomascii.wordpress.com/2015/09/24/etw-central/

[3] https://blogs.technet.microsoft.com/askperf/

Vance Morrison is a pretty significant guy in the Windows (.NET) tracing world.


Yeah, Vance is great, and is largely responsible for getting a sensible wrapper for ETW into .NET.

ETW might be what you are looking for.

I'm surprised Brendan Gregg hasn't been mentioned here yet. He's the Linux tracing/profiling god.


Don't get me wrong, I respect Julia Evans as a professional, but what she mostly does is simplify other people's hard work and in-depth analysis of difficult problems in various layers of the technology stack.

How exactly are the next generation of systems programmers supposed to get there without great minds like Julia documenting their journey along the way? If no one can apply someone's work for practical purposes then what good does it do?

Julia mentions Brendan in her post already and she's done _plenty_ of great work. Don't tear other people down, it's not cool.

The OP clearly mentioned their respect for the individual in their comment.

They didn't appear to be "tearing anybody down." Accusing people of malice for expressing their opinion is also "not cool."

It's perfectly acceptable to not be a fan of someone's blog posts. It's also perfectly acceptable to express that in a respectful manner which they did.

Saying you respect someone is meaningless if immediately followed up by a suggestion that they're just simplifying other people's "real work":

> what she mostly does is simplify other people's hard work

OP also called Brendan Gregg a "god" in the same post. So Brendan is a "god" but Julia is just distilling other people's work? Sounds pretty disrespectful to me.

> "Saying you respect someone is meaningless if immediately followed up by a suggestion that they're just simplifying other people's 'real work'"

Why are respectful and critical mutually exclusive?

Do you believe that one negates the other?

It seemed to me he was recognizing her skill as a distiller of information. That's what a lot of her posts are, how is that disrespectful?

> simplify other people's hard work

I don't mean to pull the argument one way or the other. But IMO, simplifying other people's hard work is hard work too. At least I personally find it to be.

Depending on our experience, we can read the OP as being respectful (or not). I would like to give her/him the benefit of the doubt.

I already admitted my respect for her as a professional. There is nothing negative in what she does. I'm just stating a fact: if you want more details, you need to turn to the people she sources from.

What is bad about it?

I didn't think there was anything bad in what you said. It was your opinion and I thought you were respectful.

What I have seen though is that saying anything even slightly negative about the content in this individual's blog posts will draw a strong rebuke as you have just seen.

I personally find the amount of fawning commentary and aggressive defensiveness for this particular blog and author a bit cultish.

Had you left off the final paragraph, I think the comment simply would have been upvoted, and nothing more would have been said.

I don't think that's what she does. I think she's the Bob Ross of Linux: she evangelizes learning about low-level debugging / tracing tools and the like to a wide audience, with a tone of "you can do this! it's ok if you're a novice". If all talented programmers understood and encouraged beginners the way she does, we'd have an embarrassment of great programmers.

Yes - this. I'm pleased that younger people in my team are learning things from her posts. Being a grey-hair *nix guy (not a grey beard!) the posts are interesting, but I know there is a LOT more beneath the surface. She's a valuable addition to the net!

> What she mostly does is simplify other people's hard work

This is the opposite of "respecting" someone as a professional.

Julia mentions and links to Brendan Gregg four times in this article.

Plus crazy experiments. Don't forget the crazy experiments.

This is why we can't have nice things :(

I'm still not seeing the part where they actually fit together; it basically just looks like an accumulation of ad-hoc tools with no overarching concept. They look like they could be very useful tools, but there doesn't seem to be any architecture there.

There definitely isn't any overarching design. You see the same thing all throughout Linux's interfaces (containers are another great example). If you contrast this with the Solaris or BSD interfaces you notice that Linux has always been odd in this respect (DTrace, Jails/Zones, ZFS, pf, kqueue/eventports and so on).

Depending on your perspective, you can view this as a positive or a negative. One view is to say that Linux is more collaborative, and only the "common core" interfaces are actually put into the kernel (with the higher levels being provided in userspace by vendors). A good example of this is the live patching code, which came from distilling Red Hat's kpatch and SUSE's kGraft systems. You can track down most of Linux's features to this sort of development model.

illumos and BSD, however, usually work far more like traditional software products. Some engineer goes off and implements a really interesting system that then gets put into a release. That's how ZFS and DTrace (at Sun) and Jails (at FreeBSD) were developed, and I'm sure you can come up with other examples from BSD. The key point here is that this is a far more authoritarian model: one set of engineers decides on a system and develops it mostly in isolation. Linux doesn't permit that style of development, because large patchsets need to be broken down over releases.

Personally I must say that I don't care for the Linux style of things, mainly because I feel it hampers OS innovation in some ways. But the upside is that the facilities are more like libraries than frameworks and so you're forced to design your abstractions in userspace. Is that good? I don't know.

Note that following along with the above theme, there is an overarching architecture for Linux's tracing tools (in userspace) in the form of bcc[1].

[1]: https://github.com/iovisor/bcc

The answer is that they don't fit well together. `perf-tools` is probably the most complete when it comes to the number of performance counters that can be aggregated. However, it is not very composable or extensible, so you have to use it as it is. I also find the code hard to read. `bcc` and `lttng` try to improve things by creating a more usable API. They partially reuse the kernel part of `perf` (aka `perf_events`) and also add their own kernel modules to provide new functionality.

For those willing to get their hands dirty in low-level kernel code, perhaps the simplest tracing tool is dynamic-debug: https://github.com/torvalds/linux/blob/master/Documentation/...

You should mention it is the Ecole Polytechnique... de Montreal. Usually, when people say "Ecole Polytechnique" they mean the original one.

It's like saying you went to MIT (Minnesota Institute of Technology).

Other than this small nit, great article.

fixed! I thought it was really cool that the DORSAL lab (http://www.dorsal.polymtl.ca/en) is doing such interesting and practical Linux research -- I often think of academics as living 'up in the clouds', but it seemed like they were contributing a lot to the Linux kernel and building really useful tracing tools. So a bunch of people in academia are doing super practical systems work & writing papers about it, and I think that's awesome. It made me want to read more academic systems papers.

> Research done at the lab concentrates on improving the performance, reliability and resilience of distributed, cloud and multi-core systems.

Fwiw, in recent years the school has preferred the more distinctive name Polytechnique Montréal, rather than either École Polytechnique or École Polytechnique de Montréal, although there are plenty of materials not yet using the new branding.

In my team we use a tool we developed called TRAPpy [1] to parse the output of ftrace (and systrace) into Pandas DataFrames, which can then be plotted and used for rich analysis of kernel behaviour.

We also integrate it into a larger toolkit called LISA [2], which can do things like describe synthetic workloads, run them on remote targets, collect traces and then parse them with TRAPpy to analyse and visualise kernel behaviour. We mainly use it for scheduler, cpufreq and thermal governor development. It also does some automated testing.

[1] https://github.com/ARM-software/trappy

[2] https://github.com/ARM-software/lisa
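The ftrace-to-table idea is easy to sketch in plain Python. To be clear, this is not TRAPpy's actual API (TRAPpy produces Pandas DataFrames); it's just a stdlib illustration of the parsing step. The regex follows ftrace's default human-readable line layout, and the sample lines below are made up:

```python
import re

# One line of ftrace's human-readable output looks roughly like:
#   "  bash-2501  [001] d..2  6789.123456: sched_wakeup: comm=sleep pid=2502"
# i.e. task-pid, [cpu], irq/preempt flags, timestamp, event name, payload.
FTRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):\s*(?P<data>.*)$"
)

def parse_ftrace(lines):
    """Turn raw ftrace text lines into a list of dicts (a poor man's DataFrame)."""
    rows = []
    for line in lines:
        m = FTRACE_LINE.match(line)
        if not m:
            continue  # skip header/comment lines that don't match the layout
        row = m.groupdict()
        row["pid"] = int(row["pid"])
        row["cpu"] = int(row["cpu"])
        row["ts"] = float(row["ts"])
        rows.append(row)
    return rows

sample = [
    "# tracer: nop",
    "  bash-2501  [001] d..2  6789.123456: sched_wakeup: comm=sleep pid=2502",
]
events = parse_ftrace(sample)
print(events[0]["event"], events[0]["cpu"])  # -> sched_wakeup 1
```

Once the events are rows in a table, "rich analysis" is mostly filtering, grouping and plotting, which is exactly what Pandas is good at.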

As a guy who knows nothing about tracing, this post made me understand at least some of it. The doodles really made me smile, unlike a boring flowchart or a diagram made with a drawing tool. Another one for my collection of interesting posts.

Great article, sums up the Linux tracing domain in a pretty neat way. I've used strace / perf / dtrace extensively, since these are the only tools our clients' infrastructures can support, and it's always a bugger when you're on an older system and your hands are tied. Never tried eBPF yet; I should look into it once a 4.7+ kernel hits RHEL.

Your pictures are broken?

should be fixed now (delightfully, I needed to get some new TLS certs for my site to fix this issue, and that took literally like 30 seconds to do thanks to the magic of Let's Encrypt)

I'm confused why BPF exists in the first place. Can't we just compile kernel modules that hook into the tracing infrastructure?

It seems like a WebAssembly for the kernel, but local software has the benefit of knowing the platform it is running on. I.e., why compile C code to eBPF when I can just compile to native code directly?

I can potentially see it solving a permissions problem, where you want to give unprivileged users in a multi-tenant setup the ability to run hooks in the kernel. Is that actually a common use case? I don't think it is.

Yes, you can just compile kernel modules, but you take the risk of crashing the kernel. eBPF provides a safe way to interact with the kernel because it is not Turing complete and carries additional restrictions. SystemTap is another example of such a language, but it compiles to kernel modules instead.

This is quite important when you want to run this code in production. You don't want to accidentally crash your kernel.

I'm not sure this argument makes sense. Avoiding accidentally crashing the kernel doesn't require a BPF layer.

For instance, you could just write your kernel module in a sufficiently safe language, like Rust, and have the same benefits. You could even pre-compile eBPF for the exact same level of safety. Still no need for the bpf() system call or the eBPF VM or JIT in the kernel.

(e)BPF has the following guarantees:

* Strictly typed -- registers and memory are type-checked when the program is loaded. If you used something like Rust, you'd have to bring rustc into the kernel

* Guaranteed to terminate -- you cannot jump backwards, and there is an upper bound on the instruction count

* Bounded memory -- the registers and the memory accessible via maps are a fixed size. There is no stack per se.

Compiling Rust to this is possible, but it'd require quite a bit of infrastructure in the kernel to verify that the code is safe, versus the simplicity of eBPF. Early attempts at a general purpose in-kernel VM included passing an AST in, and then doing safety checking on the AST, but they proved too complicated to do safely.
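The termination guarantee above can be illustrated with a toy interpreter. This is emphatically not the real in-kernel verifier (kernel/bpf/verifier.c also checks types, memory bounds, and much more), and the two-opcode instruction set here is invented; it's just a sketch of why "forward-only jumps plus an instruction limit" implies termination:

```python
# Toy model of one eBPF safety rule: jumps may only move forward, so any
# program that passes the check must terminate -- every step strictly
# advances the program counter, and the program has bounded length.

def check_forward_jumps(program, max_insns=4096):
    """program: list of (op, arg) tuples; 'jmp' offsets must be positive."""
    if len(program) > max_insns:
        raise ValueError("program too long")
    for pc, (op, arg) in enumerate(program):
        if op == "jmp" and arg <= 0:
            raise ValueError(f"backward or zero-offset jump at insn {pc}")
    return True

def run(program):
    """Execute; guaranteed to finish in at most len(program) steps."""
    check_forward_jumps(program)
    pc, acc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc += arg  # always moves forward, so the loop must end
        else:
            raise ValueError(f"unknown op {op}")
    return acc

ok = [("add", 2), ("jmp", 2), ("add", 99), ("add", 3)]
print(run(ok))  # -> 5 (the jmp skips the 'add 99')
```

A static check this simple is only possible because the language rules out loops up front; that's the trade-off being discussed in this thread.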

I'm not arguing against eBPF the language. Its safety guarantees make sense to me.

I'm arguing against the in-kernel eBPF infrastructure: bpf system call, the JIT and the VM.

I think it makes more sense to just compile eBPF (or rust or whatever safe language you want) to a kernel module.

The idea with having eBPF in the kernel is that we can limit the amount of trust given to a particular user-space task.

Accepting compiled stuff in the form of a kernel module requires root privileges and requires that the kernel essentially have complete trust in the code being loaded.

Loading eBPF eliminates the need to trust the process/user doing the loading to that level.

The bpf() system call and SOCK_RAW both require root. Is there an example of using bpf that doesn't require root?

The bpf() syscalls don't require CAP_SYS_ADMIN, only specific invocations do. You can set up a socket filter without CAP_SYS_ADMIN, and a device or XDP filter with CAP_NET_ADMIN.

Sure but how common is that case? How common are multi-tenant Linux systems with untrusted users that give those specific permissions? Do you want untrusted users sniffing the packets of others?

I love Rust, but it's not a panacea. It'll prevent memory errors and type errors in a lot of cases, but those aren't the only ways you can crash a kernel. Logic errors and passing the wrong data over interfaces to the kernel have the potential to kill processes, lock up the kernel, or corrupt data. The eBPF interfaces by design don't suffer from these problems, because of their restricted nature: they purposefully say there are things you can't compute here, so they don't have to solve the halting problem, among other things!

Can you give us an example of such a module?

I can but I don't see why that is necessary. It's plain to see that it's possible and performs better in production since it avoids the JIT step.


RESF!! Why have sandboxes when Rust solves every programming error!!

If a kernel module crashes, you panic the kernel (normally).

eBPF probes can't crash and are deterministically safe (they aren't actually Turing complete), so you are unlikely to heavily impact application performance.

If you write your kernel module in eBPF (by pre-compiling to native code) it can't crash either.

BPF was initially added for packet filtering, IIRC. Compiling a kernel module for each filtering rule you'd add would not really work out very well.

Since then, BPF has grown to be used by more subsystems, including tracing, and allows user programs to do advanced (and fast) things. See for example https://github.com/ahupowerdns/secfilter . AFAIK, this doesn't require privileges, which loading a kernel module would.

For experimentation and testing, a kernel module for each rule doesn't seem unworkable. Just hide all the details behind a nice tool.

For production, placing all rules in a single module seems best. If you could avoid the overhead of executing BPF in production, wouldn't you?

I agree with the privilege argument but I don't think normal users can filter packets or add tracing with the current situation either.

See the github link I gave. Also, the chromium sandbox doesn't require privileges elevation and uses seccomp BPF.

Having used eBPF/kprobes for work, the main advantage over a precompiled kernel module is convenience. It's much easier to write a C file which hooks a kernel function and reports back up to a Python script than it is to build and maintain a kernel module and have it talk to some higher-level code.

I am not sure what to think of these pictures. They trivialise the entire subject and make it look childlike; no engineer worth his salt would be seen with these doodles on his desk. I stopped reading as soon as I saw them.

No offense, but jvns is kind of known for her drawings & doodles that explain key concepts of a larger, confusing subject. They are awesome, to the point, & great for quickly digesting information we need to know.

Known is relative ....

You need to rethink.

Many doctors will tell you how useful some colouring-in books were to them.

Here's one example, but there are others: https://www.amazon.co.uk/Human-Brain-Coloring-Book-Concepts/...

Richard Feynman said that if you wanted to quickly learn a subject you'd start with the books for children to get a quick overview.

No True Scotsman...

I like the drawings. Cute, positive and informative.

Superficial way to judge the content, too.

I guess I'm not worth my salt.

I should take it upon myself to get familiarized with all of these.
