
No. It's DWARF based.

The two main tricks are: it preprocesses all of the DWARF info at startup for faster lookups, and it dynamically patches the return addresses of functions on the stack, injecting the address of its own trampoline. That lets it skip walking the whole stack every time it needs to dump a backtrace. For example, if you're running a function nested 100 stack frames deep and that function calls malloc 100 times, then Bytehound will only go through ~300 stack frames in total (~100 frames for the first call, then only ~2 frames for each successive call, if my math is right), while other similar tools will go through 10000 stack frames (going through all ~100 frames to the very bottom for every call).
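To make the arithmetic concrete, here is a rough toy model in Python. It is not Bytehound's code: the function names, the set of "patched" frame indices, and the assumption that the allocator call is the one fresh frame on top of the stack are all made up for illustration. It only counts how many frames each strategy would visit.

```python
def naive_frames_visited(depth: int, calls: int) -> int:
    """Every backtrace walks the whole stack, top to bottom."""
    return depth * calls


def cached_frames_visited(depth: int, calls: int) -> int:
    """Toy model of return-address patching (hypothetical, not Bytehound's code).

    Frames are numbered 0 (bottom) to depth - 1 (top). The top frame stands
    for the allocator call itself, so it is fresh on every call. A walk stops
    at the first frame whose return address is already patched, since
    everything below it is assumed to be cached.
    """
    patched = set()  # indices of frames whose return address is patched
    visited = 0
    for _ in range(calls):
        for frame in range(depth - 1, -1, -1):
            visited += 1
            if frame in patched:
                break  # reached the cached part of the stack
            patched.add(frame)
        # The allocator frame returns through the trampoline, which drops
        # its cache entry; the next call pushes a fresh, unpatched frame.
        patched.discard(depth - 1)
    return visited


if __name__ == "__main__":
    print("naive: ", naive_frames_visited(100, 100))
    print("cached:", cached_frames_visited(100, 100))
```

Under these assumptions the first backtrace visits all 100 frames and each later one visits 2 (the fresh top frame plus the first patched frame below it), which lines up with the ~300 vs 10000 estimate.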




Dynamic patching of return addresses is a very cool trick. I don't think I've seen this before. Have you run into any situations where this crashes programs or otherwise interferes with their execution?


Turbo Pascal used it for the overlay implementation (for DOS) -- overlays = virtual memory at home.

TP 5.0 from 1988 was the first version that had it.

The idea was to make sure the code the CPU returned to would actually be in memory.

I'm pretty sure Windows 1.0 did something very similar.


If the program's already doing weird stuff with the stack/control flow/etc., yes, but that should be relatively rare and for the majority of the programs it should work fine.


Thanks for the reply. I ended up implementing this idea in Go and wrote a blog post about the results: https://blog.felixge.de/blazingly-fast-shadow-stacks-for-go/

I'm curious if you've done any benchmarking for your implementation as well?


> Thanks for the reply. I ended up implementing this idea in Go and wrote a blog post about the results: https://blog.felixge.de/blazingly-fast-shadow-stacks-for-go/

Nice!

> I'm curious if you've done any benchmarking for your implementation as well?

Not in any detail; I just checked that it's significantly faster than doing it naively and left it at that since it was fast enough for my use case.


It's going to play poorly when C++ exceptions are thrown/caught.


Looking at the code [1] it seems like the library is actively trying to handle this problem.

[1] https://github.com/koute/not-perf/blob/master/nwind/src/loca...


It should support C++ exceptions. The trampolines have exception landing pads included to catch and rethrow any exceptions which are thrown through them.


Any plans to extend this idea into a performance profiler?

Also nice use of Gimli - did something similar to make creating stack traces on crash cheaper to symbolicate.


Not currently.

For performance profiling I find that `perf`-like sampling profiling works well enough to find the hot spots, and then Valgrind's Callgrind is great for micro-optimizing the hot-spot code at the assembly level.

Of course, it would be cool to have a unified memory + performance analysis tool like this, but I don't think I can justify the time investment to write one in my spare time.

Yeah, I'm really happy that Gimli exists, considering the absolute insanity/complexity pit of DWARF.


For what it’s worth, Valgrind completely fails to run on the Glommio runtime (something about it causes some threading code on startup to deadlock), so I’ve been looking for an alternate profiler that can give me better insights than perf. Also a profiler that can give me deeper insights without all the overhead of Valgrind would be sweet.


Any way this can work on arm64 without DWARF info at runtime? Would be very interested.


I'm not sure about this implementation, but the parca implementation only needs the .eh_frame section of the binary (which is part of, but not all of, "DWARF"), which still exists even in stripped binaries.

However you then still need debug symbols of some kind to convert those to names.


Yes, it should also work without any debugging info. You'll still need unwinding tables though (used for handling exceptions in C++/panics in Rust/etc.), which are technically DWARF too (except on 32-bit ARM, which is special).



