How does perf work? (in which we read the Linux kernel source) (jvns.ca)
78 points by bartbes on Mar 13, 2016 | 7 comments



I'm not an expert, but I believe I can answer some of the questions at the end of the article.

>are kprobes and ftrace the same kernel system? I feel like they are but I am confused.

They are not. kprobes and kretprobes have been around in the kernel for a lot longer than ftrace, and are exposed to multiple tracing programs: ftrace, perf, bpf, etc. can all make use of kprobes. (ftrace relies on kprobes for its dynamic probe events, though its function tracer works without them; kprobes are still useful without ftrace.)

>what is the relationship between perf and kprobes? if I just want to sample the registers / address of the instruction pointer from ls's execution, does that have anything to do with kprobes? with ftrace? I think it doesn't, and that I only need kprobes if I want to instrument a kernel function (like a system call), but I'm not sure.

A kprobe makes a copy of the instruction you are probing and replaces the first byte(s) of that instruction with a breakpoint instruction. When the CPU hits the breakpoint, a trap occurs, and the registers are saved and passed to the kprobe's handler.
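
The breakpoint mechanism described above can be modeled with a toy sketch. This is not kernel code: `ToyKprobe`, `run`, and the one-byte-per-instruction "text" are hypothetical names and simplifications made up for illustration. The only real detail is 0xCC, the x86 int3 breakpoint opcode.

```python
BREAKPOINT = 0xCC  # x86 int3 opcode; everything else here is a toy model


class ToyKprobe:
    """Hypothetical stand-in for a kprobe on a one-byte 'instruction'."""

    def __init__(self, addr, handler):
        self.addr = addr
        self.handler = handler      # called with a snapshot of the registers
        self.saved_opcode = None    # copy of the probed instruction

    def arm(self, text):
        # Save the original instruction, then overwrite it with a breakpoint.
        self.saved_opcode = text[self.addr]
        text[self.addr] = BREAKPOINT

    def disarm(self, text):
        # Put the original instruction back.
        text[self.addr] = self.saved_opcode
        self.saved_opcode = None


def run(text, probes):
    """Pretend to execute `text`, one byte per instruction.

    Returns the list of opcodes that were actually executed.
    """
    armed = {p.addr: p for p in probes if p.saved_opcode is not None}
    executed = []
    for ip, opcode in enumerate(text):
        probe = armed.get(ip)
        if opcode == BREAKPOINT and probe is not None:
            # "Trap": hand a snapshot of the registers to the handler,
            # then execute the saved copy of the original instruction.
            probe.handler({"ip": ip})
            opcode = probe.saved_opcode
        executed.append(opcode)
    return executed


text = bytearray([0x90, 0x55, 0x90])   # nop, push rbp, nop (illustrative)
hits = []
probe = ToyKprobe(addr=1, handler=lambda regs: hits.append(regs["ip"]))
probe.arm(text)
patched = bytes(text)          # what the "code" looks like while armed
result = run(text, [probe])
probe.disarm(text)
```

While the probe is armed, the byte at address 1 is the breakpoint, yet the executed instruction stream is unchanged because the saved copy runs in its place. Real kprobes have to deal with variable-length instructions and single-stepping the copy, which this sketch ignores.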

Perf has the ability to collect CPU performance counters, tracepoints, kprobes, and uprobes. Tracepoints are static: they are compiled into the code being traced, with a definition in a header and the actual tracepoint statement in the code itself. uprobes allow dynamic tracing of user-level and library calls. You echo in a probe name, executable location, and offset, and then you can start tracing that probe.
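
As a rough sketch of the "echo in a probe name, executable location, and offset" step: the kernel's uprobe tracer accepts lines of the form `p:GROUP/EVENT PATH:OFFSET` written to its `uprobe_events` file. The helper names below (`uprobe_definition`, `register_uprobe`), the probe name, the `/bin/bash` path, and the offset are all illustrative assumptions, and the tracefs location varies between systems.

```python
def uprobe_definition(name, binary_path, offset, group="uprobes"):
    """Build one probe line in the 'p:GROUP/EVENT PATH:OFFSET' form."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"p:{group}/{name} {binary_path}:{offset:#x}"


def register_uprobe(definition,
                    events_file="/sys/kernel/debug/tracing/uprobe_events"):
    """Append the definition to uprobe_events, like `echo ... >>`.

    Needs root. The default path is an assumption; some systems mount
    tracefs at /sys/kernel/tracing instead.
    """
    with open(events_file, "a") as f:
        f.write(definition + "\n")


# Illustrative values only; a real offset would come from the binary's symbols.
line = uprobe_definition("bash_probe", "/bin/bash", 0x4245C0)
print(line)
```

Only the string is built here; `register_uprobe` is not called, since it would modify kernel tracing state.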


Nice!

Here's another way to view/understand the kernel and user space code:

http://www.srcmap.org/sd_share/14/a2313bc5/Cross_Reference_C...


"rdpmc" is shorthand for "read performance-monitoring counters". It's roughly what the author guessed: it reads a counter register that increments each time a configured event occurs, such as an instruction being executed.


rr uses perf_event_open - I know because it gives me an error when I run it on VirtualBox... Initially I thought it was a capabilities issue, so I ran it as root. Then it segfaulted.

Not sure what's going on. I probably need to file an issue.


rdpmc will give a cycle count, so presumably perf stat hooks into the scheduler to maintain the count per process?


I made a comment directly on the blog post which I'll copy here:

I've also spent a lot of time recently trying to understand how 'perf' works at a low level. I'm getting closer, but a lot of it is still pretty impenetrable. There are four main resources that I'd recommend.

The first, which you've almost surely found, is the "Perf Wiki": https://perf.wiki.kernel.org/index.php/Main_Page There's not a lot there, but it's a good introduction.

The second, which you might have stumbled across, is the text documentation scattered throughout the kernel source. Most are in https://github.com/torvalds/linux/tree/master/tools/perf/Doc..., but the most useful one is up one directory at https://github.com/torvalds/linux/blob/master/tools/perf/des....

The third is Andi Kleen's PMU Tools: https://github.com/andikleen/pmu-tools The 'jevents' library within this illustrates how to use 'perf' to set up the counters while using 'rdpmc' from userspace to read them.

The fourth is Vince Weaver's Unofficial Linux Perf Events Web-Page (http://web.eece.maine.edu/~vweaver/projects/perf_events/) and his associated Perf Event Testsuite (https://github.com/deater/perf_event_tests). Tests make wonderful examples.

The deeper I got into it, the more I realized that 'perf' is still evolving, and there is a lot of anger and discontent below the surface. There were (and are) competing alternatives, but 'perf' is politically in control. Much of what you read about 'perf' should probably be viewed through the lens of "history written by the victor", and "the vanquished" may have different perspectives.

---

Separately, in case anyone is already familiar with the internals, here's an aspect where I'm currently stuck. There is an "offset" field which one is supposed to add to the result read from "rdpmc", but when I do so I get strange problems: https://github.com/nkurz/pmu-tools/commit/f2ab49207d4c7b7ddd...
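
One thing that may be worth checking here (a guess, not a diagnosis of the linked commit): the raw value from "rdpmc" is only `pmc_width` bits wide, and as far as I recall the comments in the kernel's perf_event.h suggest sign-extending it to 64 bits before adding "offset". The sketch below shows just that arithmetic. The function names and the 48-bit width are assumptions for illustration; the real width comes from the `pmc_width` field of the mmap'd page.

```python
def sign_extend(raw, width):
    """Interpret the low `width` bits of `raw` as a two's-complement value."""
    raw &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return (raw ^ sign_bit) - sign_bit


def counter_value(offset, raw_pmc, pmc_width):
    """Combine the page's offset with a sign-extended rdpmc reading."""
    return offset + sign_extend(raw_pmc, pmc_width)


# A hypothetical 48-bit reading whose top bit is set, i.e. "-10".
raw = (1 << 48) - 10
naive = 1000 + raw                     # adding the raw bits directly
fixed = counter_value(1000, raw, 48)   # sign-extend first
print(naive, fixed)
```

If the hardware counter currently holds a negative value, the naive sum is off by 2**48, which could look like the kind of strange result described. The real read also has to sit inside the retry loop on the page's "lock" field, which is omitted here.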


perf is simply awesome.



