
This is a good idea for the short term. As of now, frame pointers are the most reliable way to ensure that software can be profiled by tools like perf*. The core issue is that the kernel must be the one to unwind the userspace stack, and it only knows how to unwind stacks with frame pointers**. The .eh_frame data will never be supported by the kernel, because it involves a Turing-complete program that must be executed to compute the necessary unwind info***.
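To make "unwinding with frame pointers" concrete, here is a toy sketch of the pointer chase involved. It is a simulation, not kernel code: the `memory` dict, the addresses, and the function name are made up for illustration, and the layout (saved frame pointer at `[fp]`, return address at `[fp + 8]`) is the usual x86_64 convention, assumed here.

```python
# Toy model of frame-pointer unwinding over a simulated stack.
# Assumed layout (x86_64-style): memory[fp] holds the caller's saved frame
# pointer, and memory[fp + 8] holds the return address into the caller.
WORD = 8


def unwind_frame_pointers(memory, pc, fp, max_depth=64):
    """Return code addresses for the call chain, innermost frame first."""
    trace = [pc]
    for _ in range(max_depth):
        # Stop if the frame pointer is null or points outside known memory.
        if fp == 0 or fp not in memory or fp + WORD not in memory:
            break
        trace.append(memory[fp + WORD])
        next_fp = memory[fp]
        # The stack grows down, so each caller's frame must sit at a higher
        # address. Anything else means a corrupt chain (or the outermost frame).
        if next_fp <= fp:
            break
        fp = next_fp
    return trace


# Hypothetical stack for main -> parse -> read_token.
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,  # read_token's frame, returns into parse
    0x7020: 0x7040, 0x7028: 0x401100,  # parse's frame, returns into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame, returns into startup code
}
trace = unwind_frame_pointers(memory, pc=0x401234, fp=0x7000)
print([hex(address) for address in trace])
```

The point is that each step is two loads and a bounds check, with no tables or debuginfo involved, which is why it is cheap enough to do at sample time.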

For the long term, the more exciting option that's emerging is SFrame[1]. This is a new data section which would be generated by the compiler and contains unwind tables which the kernel will be able to understand. Unlike DWARF/.eh_frame, these tables would remain in the final binary (i.e. not be stripped away), and on exec(), the kernel would store them for use during profiling. Since the format is quite similar to ORC(*), and Steven Rostedt is quite invested in the format, it seems a safe bet that support will land in the kernel.
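For contrast with frame pointers, here is a rough sketch of what table-driven unwinding looks like in general. This is not the SFrame encoding: the `Row` fields, the addresses, and the helper names are invented for illustration. The only idea it is meant to show is "look up the current PC in a sorted table, then compute the caller's state from fixed offsets", with no program to interpret.

```python
import bisect
from typing import NamedTuple


class Row(NamedTuple):
    start_pc: int    # first code address this row covers
    end_pc: int      # one past the last code address this row covers
    cfa_offset: int  # canonical frame address (CFA) = SP + cfa_offset
    ra_offset: int   # return address is stored at CFA + ra_offset


def find_row(rows, pc):
    """Binary-search rows (sorted by start_pc) for the one covering pc."""
    starts = [row.start_pc for row in rows]
    index = bisect.bisect_right(starts, pc) - 1
    if index >= 0 and pc < rows[index].end_pc:
        return rows[index]
    return None


def unwind_with_table(rows, memory, pc, sp, max_depth=64):
    """Return code addresses for the call chain, innermost frame first."""
    trace = [pc]
    for _ in range(max_depth):
        row = find_row(rows, pc)
        if row is None:
            break  # no unwind info for this address
        cfa = sp + row.cfa_offset
        return_address = memory.get(cfa + row.ra_offset)
        if not return_address:
            break
        trace.append(return_address)
        # The caller resumes at return_address with SP equal to the CFA.
        pc, sp = return_address, cfa
    return trace


# Hypothetical table for two functions, each with a prologue row and a body row.
rows = [
    Row(0x1000, 0x1004, 8, -8),   # callee, before it adjusts SP
    Row(0x1004, 0x1040, 24, -8),  # callee, after reserving 16 bytes
    Row(0x2000, 0x2004, 8, -8),   # caller, before it adjusts SP
    Row(0x2004, 0x2080, 40, -8),  # caller, after reserving 32 bytes
]
memory = {0x7010: 0x2020, 0x7038: 0x3000}
trace = unwind_with_table(rows, memory, pc=0x1010, sp=0x7000)
print([hex(address) for address in trace])
```

Note that nothing here relies on a frame pointer register: the offsets come from the table, which is why the compiler is free to use that register for something else.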

My hope isn't necessarily that a distribution completely disables frame pointers once this format becomes available... though it could be an interesting thing to try. Rather, there can be a conscious choice about whether to use frame pointers or SFrame, which would be useful for cases like Python, where it's mentioned that frame pointers may still have a significant performance impact. The kernel should be able to fall back to frame pointers when SFrame is unavailable, which means that either will be acceptable. Ideally, in a few years' time we'll be able to go back to forgetting about frame pointers for most cases :)

---

* Ironically, the kernel itself tends not to use frame pointers! It has its own unwind format called ORC, which gets generated by a build-time tool in the kernel tree called "objtool", which essentially reverse engineers the assembly generated by the compiler. It's x86_64-specific and frequently needs adjustment when the compiler changes code generation. It can't be used for userspace programs.

** it also knows how to unwind kernel stacks with ORC (see above)

*** There is an option to allow perf to unwind with DWARF, but it's a total hack (though a very effective one). By passing --call-graph=dwarf, you can instruct the kernel to copy the userspace stack (by default, 8k bytes!) into the perf event buffer with each sample (this can be as many as 100 or 1000 samples per second, per CPU...). Later, the perf userspace program will use that info, along with information about each process's address space, and the debuginfo for each program, to unwind the stacks. This has huge performance overhead, and it requires that you have easy access to debuginfo, which may not be the case, especially for container workloads.
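To put rough numbers on that overhead, here is a back-of-the-envelope calculation. The 8192-byte dump size and the 1000 samples per second come from the figures above; the CPU count is a made-up example, and the result is an upper bound that assumes every CPU is busy and every sample carries a full stack copy.

```python
def stack_copy_bytes_per_second(stack_bytes, samples_per_second, cpus):
    """Upper bound on stack data copied into the perf buffer each second."""
    return stack_bytes * samples_per_second * cpus


# 8 KiB per sample, 1000 samples per second per CPU, hypothetical 16-CPU machine.
rate = stack_copy_bytes_per_second(stack_bytes=8192, samples_per_second=1000, cpus=16)
mib_per_second = rate / (1024 * 1024)
print(f"{rate} bytes/s = {mib_per_second} MiB/s")
```

That works out to 125 MiB/s of raw stack copies on this hypothetical machine, before perf has even started unwinding anything.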

[1] https://lwn.net/Articles/940686/




We’ve also figured out an alternative format to use from within eBPF to unwind stacks (we happen to only support DWARF at the moment, but theoretically any source of unwind information could work): https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-bas...


Yeah I saw Vaishali & Javier's presentation [1] at LPC last year! Great stuff, & certainly available to use now rather than when SFrame becomes available and supported.

In the same spirit, it seems that the .eh_frame -> BPF unwind table process could be (relatively) easily modified to produce SFrame, which you could attach to the binaries if you have a trustworthy way of doing that (which is... a big if). That way, once SFrame support becomes available in the kernel, you could apply it to applications without rebuilding them.

[1]: https://lpc.events/event/16/contributions/1361/


I would need to double check with the team on this detail, but if I recall correctly, the current architecture is specifically designed to make the BPF verifier happy, and we didn’t think that was going to be possible with existing formats. But happy to reconsider, we’d of course much rather use a standardized format if possible!



