
How does perf work? (in which we read the Linux kernel source) - bartbes
http://jvns.ca/blog/2016/03/12/how-does-perf-work-and-some-questions/
======
cthalupa
I'm not an expert, but I believe I can answer some of the questions at the end
of the article.

>are kprobes and ftrace the same kernel system? I feel like they are but I am
confused.

They are not. kprobes and kretprobes have been around in the kernel for a lot
longer than ftrace, and are exposed to multiple tracing programs. ftrace,
perf, bpf, etc. can all make use of kprobes. (ftrace is pretty heavily
dependent on kprobes for being useful; kprobes are still useful without
ftrace.)

>what is the relationship between perf and kprobes? if I just want to sample
the registers / address of the instruction pointer from ls's execution, does
that have anything to do with kprobes? with ftrace? I think it doesn't, and
that I only need kprobes if I want to instrument a kernel function (like a
system call), but I'm not sure.

A kprobe makes a copy of the instruction you are probing and replaces the
first byte(s) of that instruction with a breakpoint instruction. When the CPU
hits this, a trap occurs, the registers are saved, and they are passed to the
kprobe's handler.
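
To make the breakpoint trick concrete, here is a toy Python model of it. This
is not kernel code and not how kprobes is implemented: the "instruction
memory" is just a list of strings, `BREAKPOINT` stands in for the x86 `int3`
opcode, and `ToyCpu`, `arm_probe`, and the register dict are all names made up
for illustration.

```python
BREAKPOINT = "int3"  # stand-in for the x86 breakpoint opcode (0xCC)


class ToyCpu:
    """Hypothetical, highly simplified model of a CPU running probed code."""

    def __init__(self, program):
        self.program = list(program)  # "instruction memory"
        self.saved = {}               # addr -> copy of the original instruction
        self.handlers = {}            # addr -> probe handler
        self.executed = []            # instructions that actually ran

    def arm_probe(self, addr, handler):
        # Keep a copy of the probed instruction, then overwrite it in place.
        self.saved[addr] = self.program[addr]
        self.handlers[addr] = handler
        self.program[addr] = BREAKPOINT

    def run(self):
        for pc, insn in enumerate(self.program):
            if insn == BREAKPOINT:
                # "Trap": snapshot the registers and hand them to the handler.
                regs = {"ip": pc}
                self.handlers[pc](regs)
                # Then run the saved copy so the program behaves as before.
                insn = self.saved[pc]
            self.executed.append(insn)


hits = []
cpu = ToyCpu(["push", "mov", "call", "ret"])
cpu.arm_probe(2, lambda regs: hits.append(regs["ip"]))
cpu.run()
print(hits, cpu.executed)
```

The handler sees the instruction pointer at the probe site, and the probed
program still executes its original four instructions.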

Perf can collect data from CPU performance counters, tracepoints, kprobes,
and uprobes. Tracepoints are compiled into the code being traced - there is
a definition in a header, and the actual tracepoint statement in the code
itself. uprobes allow dynamic tracing of user-level and library calls. You
echo in a probe name, executable location, and offset, and then you can
start tracing that probe.
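
As a rough sketch of the "echo in a probe name, executable location, and
offset" step: the helper below builds the `p:NAME PATH:OFFSET` line that the
tracing `uprobe_events` file accepts. The function names are made up, the
probe name and offset are example values only (the offset is specific to one
particular binary), and the tracefs path is an assumption that varies between
systems.

```python
def uprobe_definition(name, executable, offset):
    """Build a 'p:NAME PATH:OFFSET' line for the uprobe_events file."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"p:{name} {executable}:{offset:#x}"


def register_uprobe(definition,
                    events_file="/sys/kernel/debug/tracing/uprobe_events"):
    # Assumption: debugfs/tracefs is mounted here and we are running as root.
    # Appending avoids clobbering probes that are already defined.
    with open(events_file, "a") as f:
        f.write(definition + "\n")


# Example values only; this offset is not meaningful for your bash binary.
line = uprobe_definition("my_probe", "/bin/bash", 0x4245C0)
print(line)
```

Only `uprobe_definition` runs here; `register_uprobe` is the part that needs
root and a kernel built with uprobe tracing.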

------
srcmap
Nice!

Here's another way to view/understand the kernel and user space code:

[http://www.srcmap.org/sd_share/14/a2313bc5/Cross_Reference_C...](http://www.srcmap.org/sd_share/14/a2313bc5/Cross_Reference_Code_on_Perf_in_Linux_Kernel_and_User_Space.html)

------
cushychicken
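"rdpmc" is the x86 instruction "read performance-monitoring counters". It's
close to what the author guessed - it reads a hardware counter register, and
that counter counts whatever event it has been programmed for (instructions
executed, cycles, cache misses, and so on).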

------
chris_wot
rr uses perf_event_open - I know because it gives me an error when I run it
on VirtualBox... Initially I thought it was a capabilities issue, so I ran it
as root. Then it segfaulted.

Not sure what's going on. I probably need to file an issue.

------
lttlrck
rdpmc will give a cycle count, so presumably perf stat hooks into the
scheduler to maintain the count per process?
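
A toy sketch of what that kind of per-process accounting could look like: on
every context switch, the outgoing task is credited with however far a
free-running hardware counter advanced while it was on the CPU. This only
illustrates the idea, it is not how the kernel is written, and
`ToyCounterAccounting` and `context_switch` are hypothetical names.

```python
class ToyCounterAccounting:
    """Toy model: charge a free-running hardware counter to tasks."""

    def __init__(self):
        self.per_task = {}   # task name -> accumulated count
        self.current = None  # task currently on the CPU
        self.start = 0       # counter value when that task was switched in

    def context_switch(self, next_task, hw_counter):
        # Switch-out: credit the outgoing task with the counter delta.
        if self.current is not None:
            delta = hw_counter - self.start
            self.per_task[self.current] = (
                self.per_task.get(self.current, 0) + delta
            )
        # Switch-in: remember where the counter stood for the new task.
        self.current = next_task
        self.start = hw_counter


acct = ToyCounterAccounting()
acct.context_switch("ls", 100)
acct.context_switch("bash", 350)  # ls was on the CPU for 250 counts
acct.context_switch("ls", 400)    # bash was on the CPU for 50 counts
acct.context_switch(None, 1000)   # ls was on the CPU for another 600
print(acct.per_task)
```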

------
nkurz
I made a comment directly on the blog post which I'll copy here:

I've also spent a lot of time recently trying to understand how 'perf' works
at a low level. I'm getting closer, but a lot of it is still pretty
impenetrable. There are four main resources that I'd recommend.

The first, which you've almost surely found, is the "Perf Wiki":
[https://perf.wiki.kernel.org/index.php/Main_Page](https://perf.wiki.kernel.org/index.php/Main_Page)
There's not a lot there, but it's a good introduction.

The second, which you might have stumbled across, is the text documentation
scattered throughout the kernel source. Most are in
[https://github.com/torvalds/linux/tree/master/tools/perf/Doc...](https://github.com/torvalds/linux/tree/master/tools/perf/Documentation),
but the most useful one is up one directory at
[https://github.com/torvalds/linux/blob/master/tools/perf/des...](https://github.com/torvalds/linux/blob/master/tools/perf/design.txt).

The third is Andi Kleen's PMU Tools:
[https://github.com/andikleen/pmu-tools](https://github.com/andikleen/pmu-tools).
The 'jevents' library within this illustrates how to use 'perf' to set up the
counters while using 'rdpmc' from userspace to read them.

The fourth is Vince Weaver's Unofficial Linux Perf Events Web-Page
([http://web.eece.maine.edu/~vweaver/projects/perf_events/](http://web.eece.maine.edu/~vweaver/projects/perf_events/))
and his associated Perf Event Testsuite
([https://github.com/deater/perf_event_tests](https://github.com/deater/perf_event_tests)).
Tests make wonderful examples.

The deeper I got into it, the more I realized that 'perf' is still evolving,
and that there is a lot of anger and discontent below the surface. There
were (and are) competing alternatives, but 'perf' is politically in control.
Much of what you read about 'perf' should probably be viewed through the
lens of "history written by the victor", and "the vanquished" may have
different perspectives.

---

Separately, in case anyone is already familiar with the internals, here's an
aspect where I'm currently stuck. There is an "offset" field which one is
supposed to add to the result read from "rdpmc", but when I do so I get
strange problems:
[https://github.com/nkurz/pmu-tools/commit/f2ab49207d4c7b7ddd...](https://github.com/nkurz/pmu-tools/commit/f2ab49207d4c7b7dddf37a00aa293b47c7f7b897)
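
For reference, the read sequence described in the comment on `struct
perf_event_mmap_page` in the kernel's `perf_event.h` sign-extends the raw
`rdpmc` value to `pmc_width` bits before adding `offset`. Below is a sketch of
just that arithmetic in Python. `sign_extend` and `user_read` are made-up
names, the values are invented, and the `rdpmc` instruction itself and the
retry loop around the page's `lock` field are left out. This is not a claim
about what goes wrong in the commit linked above.

```python
MASK64 = (1 << 64) - 1


def sign_extend(raw, width):
    """Interpret the low `width` bits of `raw` as a signed integer."""
    raw &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return (raw ^ sign_bit) - sign_bit


def user_read(offset, raw_pmc, pmc_width):
    """Combine the mmap page's `offset` with a raw rdpmc value."""
    return (offset + sign_extend(raw_pmc, pmc_width)) & MASK64


# A 48-bit counter that reads all ones is -1 once sign-extended.
print(user_read(1000, 0xFFFFFFFFFFFF, 48))
print(user_read(1000, 5, 48))
```

Skipping the sign extension would turn the first case into a huge positive
number instead of a small correction to `offset`.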

------
b3h3moth
perf is simply awesome.

