
Bcc: Taming Linux 4.3+ Tracing Superpowers - hepha1979
http://www.brendangregg.com/blog/2015-09-22/bcc-linux-4.3-tracing.html
======
aktau
Awesome post, as always. I know you've at one point taken a look at shark
([https://github.com/sharklinux/shark](https://github.com/sharklinux/shark)),
which seems to be similar in a way except one makes probes (either perf_events
or eBPF) in just Lua. Do you have any notes about how it compares vis-a-vis
BCC?

This seems to be an example eBPF back and front-end:
[https://github.com/sharklinux/shark/blob/master/samples/bpf/...](https://github.com/sharklinux/shark/blob/master/samples/bpf/io_sys_write_hist.lua)

From the looks of it, the eBPF program is also specified in some C (dialect?).
I wonder how it's similar/different to BCC. From the dependencies on shark,
LLVM is also involved.

 _EDIT_ : seems like they're encouraging people to download a precompiled
llc-bpf for now:
[https://github.com/sharklinux/llc-bpf](https://github.com/sharklinux/llc-bpf)

 _EDIT2_ : (sorry for all the edits). My last question would be: you say
"What's new in Linux 4.3 is the ability to print strings from Extended
Berkeley Packet Filters (eBPF) programs.", and then that you needed this for
many tools. However, I can't see that in the biolatency source you pasted. It
seems to me like it's aggregating a map in kernel-space, and printing it out
from userspace once it's done. Just like was possible in kernel 4.1+. What am
I missing?
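
For readers unfamiliar with the pattern being asked about (aggregate into a map on the kernel side, print it from user space at the end), here is a minimal pure-Python model. It is only a sketch: `LatencyHist`, `log2_slot` and the sample values are hypothetical names and data made up for illustration, the `Counter` merely stands in for an eBPF map, and the bucket indexing is not claimed to match what BCC does internally.

```python
from collections import Counter


def log2_slot(value):
    """Return a power-of-two bucket index for a non-negative integer.

    0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on.
    """
    return value.bit_length()


class LatencyHist:
    """Toy stand-in for a histogram kept in a kernel-side map."""

    def __init__(self):
        self.slots = Counter()  # plays the role of the eBPF map

    def record(self, latency_us):
        # "Kernel side": only a counter is bumped per event, nothing is printed.
        self.slots[log2_slot(latency_us)] += 1

    def rows(self):
        # "User side": read the whole map once and turn slots into ranges.
        out = []
        for slot in sorted(self.slots):
            low = 0 if slot == 0 else 1 << (slot - 1)
            high = (1 << slot) - 1
            out.append((low, high, self.slots[slot]))
        return out


hist = LatencyHist()
for sample in [1, 3, 5, 6, 100]:  # made-up latencies in microseconds
    hist.record(sample)

for low, high, count in hist.rows():
    print("%6d -> %-6d : %d" % (low, high, count))
```

The point of the design is that per-event work stays tiny and no strings cross the kernel/user boundary, which is why this style did not need the string printing added in 4.3.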

~~~
drzaeus77
Author of BCC here... I hadn't seen shark before, but the mindset does appear
similar.

A couple comparison points I noticed while briefly investigating shark:

C code is passed to clang+llc as external calls, whereas in BCC clang+llvm are
statically linked in.

Both support native (lua/python) bindings to the eBPF maps.

In shark, I don't see an easy way to dereference kprobe'd function
arguments, as there is in `bpf_trace_printk("1 W %s %d %d ?\\n",
req->rq_disk->disk_name, ...)` of <bcc>/tools/biosnoop.

This should also answer your last question, which was where "%s" is used.
tools/opensnoop also uses string printks.

Comparison points aside, I intentionally made sure that the clang legwork that
is being done in BCC is wrapped with a C api, so any language bindings besides
python should be trivial to implement. It would be ideal (in my mind) if shark
could leverage libbcc and make both tools better in the process.

~~~
fche
Can you outline how bcc translates that req->rq_disk->disk_name expression to
bytecode? AIUI, there are no pointer-dereferencing bytecodes. Is it using the
BPF_FUNC_probe_read?

~~~
drzaeus77
Exactly, it is using bpf_probe_read. Internally, BCC uses clang's Rewriter
functionality to mangle valid C (but invalid BPF) into valid C with bpf helper
functions.

The req->rq_disk->disk_name expression would expand into:

    ({ typeof(char [32]) _val; memset(&_val, 0, sizeof(_val));
       bpf_probe_read(&_val, sizeof(_val),
         (u64)({ typeof(struct gendisk *) _val; memset(&_val, 0, sizeof(_val));
                 bpf_probe_read(&_val, sizeof(_val),
                   (u64)req + offsetof(struct request, rq_disk));
                 _val; })
         + offsetof(struct gendisk, disk_name));
       _val; }));

If you are playing with the tools, the BPF() class takes an optional argument
debug=, where bit 2 (0x4) will print the rewritten C output for your
edification.
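
The nesting in that expansion follows a simple rule: one `bpf_probe_read` per hop, with each inner read supplying the base address for the next. The following toy Python string builder illustrates it. This is not BCC's code (the real rewrite is done by clang's Rewriter on the AST); `probe_read_chain` and its argument layout are hypothetical, and the field types are copied from the expansion above rather than looked up from kernel headers.

```python
def probe_read_chain(base, hops):
    """Build a nested bpf_probe_read expression for a pointer chain.

    base: C expression for the starting pointer, e.g. "req".
    hops: list of (struct_name, field_name, field_type) tuples,
          in the order the fields are dereferenced.
    """
    expr = base
    for struct_name, field, field_type in hops:
        # Wrap the address computed so far in one more statement expression
        # that copies the field into a zeroed local and yields it.
        expr = (
            "({ typeof(%s) _val; memset(&_val, 0, sizeof(_val)); "
            "bpf_probe_read(&_val, sizeof(_val), "
            "(u64)%s + offsetof(struct %s, %s)); _val; })"
            % (field_type, expr, struct_name, field)
        )
    return expr


# req->rq_disk->disk_name, with types taken from the expansion quoted above.
expansion = probe_read_chain(
    "req",
    [
        ("request", "rq_disk", "struct gendisk *"),
        ("gendisk", "disk_name", "char [32]"),
    ],
)
print(expansion)
```

To see what BCC actually generates for a given program, the debug= bit mentioned above is the authoritative source; this sketch only shows why a two-hop dereference turns into two nested reads.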

------
blinkingled
Attempts to reinvent Dtrace continue unabated thanks to stupid Sun making up a
new OSS license. Such a shame.

It'd have been great to have GPLed, in-kernel Dtrace and ZFS for Linux.

~~~
brendangregg
Yes, it would have been nice to have DTrace in Linux a while ago. But the
reality is a bit more complex than it might look.

Many people I know haven't been aware that Linux has had in-built dynamic
tracing capabilities for years, with ftrace, kprobes and later uprobes. These
are much more difficult to use (which is why I wrote some front-ends:
[https://github.com/brendangregg/perf-
tools](https://github.com/brendangregg/perf-tools)). But if you really cared
about performance, you could use them. So it's not that Linux has been
completely missing out; it's been missing some specific features (eg,
in-kernel aggregations and variables), and a nice interface.

As for attempts unabated to reinvent DTrace: FWIW, that's not really the
genesis in the case of bcc/eBPF. Extended Berkeley Packet Filtering (eBPF) was
developed to create virtual network infrastructures. Things like this:
[https://www.iovisor.org/sites/cpstandard/files/pages/images/...](https://www.iovisor.org/sites/cpstandard/files/pages/images/building_virtual_network_infrastructure_with_io_visor.jpg)
It provides a kernel virtual machine for executing sandboxed bytecode. As a
bonus, it can be used for system tracing as well, to provide custom
capabilities that ftrace/perf_events were missing.

There are too many tracers for Linux already, so I'm pretty glad that eBPF is
getting integrated into the Linux kernel, since it should discourage anyone
from inventing yet-another-linux-tracer, as the kernel will already have a
powerful tracer built in. We'll no longer need new tracers. We'll want
front-ends, like bcc.

~~~
lobster_johnson
Can you explain, for someone who isn't familiar with the kernel, why there are
so many different and apparently overlapping tracing tools built in, instead
of just one? Are there plans to clean up and unify them under something like
bcc?

Another example of duplication is BPF and iptables, but I suppose iptables
isn't going away soon simply because it's so popular (and, compared to BPF,
simple to use)?

~~~
brendangregg
There are at least 9 different tracers for Linux, but only 3 are built in.

\- perf_events (the "perf" command), which is intended as the official
end-user profiler/tracer. It's great at PMCs and sampling. It can do dynamic
tracing and tracepoints.

\- ftrace (currently being renamed to "trace", to bring its ambiguity on-par
with doing an internet search for "perf"), which is really a collection of
custom lightweight tracing capabilities, developed to aid the real time kernel
work.

\- eBPF, which is an engine that both perf_events and trace could make use of.

I talked about this more here:
[http://www.brendangregg.com/blog/2015-07-08/choosing-a-
linux...](http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-
tracer.html)

So you could say that the plan is for there to be one: perf_events. There's already
work to bring eBPF to perf. perf already has a Python scripting interface, so
it's not inconceivable that one day bcc will become part of perf.

Some (f)trace capabilities could be rewritten/improved in eBPF, which could
mean some cleanup. But the ftrace implementation wasn't bulky to start with.

------
ised
Am I the only one who thought for a moment this might be about Bruce Evans' C
compiler (bcc)? I believe Linux used bcc in the early days. I know they used
the assembler (as86) and I think they still do. Could be wrong.

------
mkesper
Is there a platform-independent way of using this, or do we need specific C
code for every architecture?

