On libunwind and dynamically generated code on x86-64 (corsix.org)
96 points by daurnimator on Apr 12, 2016 | 29 comments

I concur with the author, it's an absolute mess. And I would love to see a single unified API for this. It's been a wish of mine for many years.

Here are even more interfaces than the three the author listed (libunwind, __register_frame, GDB):

- oprofile has a JIT interface: http://oprofile.sourceforge.net/doc/devel/jit-interface.html

- perf has a JIT interface: https://github.com/torvalds/linux/blob/master/tools/perf/Doc...

I wrote a blog article about the craziness around unwinding a few years back: http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwi...

One of the questions I answer in my article which isn't addressed as much here is: why do we care about generating backtraces in the first place? It might seem obvious, but I identified four separate reasons which are all important: to support runtime exceptions (a la C++), to offer backtraces in debuggers, to sample the stack for profilers, and to print a backtrace from the program itself. So if you have a JIT and you don't offer proper backtraces, your users are going to be annoyed when they use any of these tools.
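
As a rough illustration of the fourth reason (printing a backtrace from the program itself), here is a minimal sketch using `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>`, which is available on glibc and macOS but not on every libc. The helper name `print_backtrace` and the 64-frame limit are assumptions made for this example, not something from the article.

```c
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc, macOS) */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: capture the current call stack and write one line
 * per frame to stderr. Returns the number of frames captured. */
int print_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    /* Writes directly to the file descriptor and does not call malloc,
     * unlike backtrace_symbols. */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    return count;
}
```

Calling `print_backtrace()` from ordinary compiled code prints the chain of callers. If the stack contains JIT-generated frames with no registered unwind information, the trace will typically stop or go wrong at that point, which is the problem this thread is about.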

You probably knew this, but just to add in case others weren't aware: Those reasons are the reasons you want unwinding on Unix, but on Windows it's even more important. The Windows equivalent of Unix signals (very roughly speaking) is Structured Exception Handling (SEH), in which the kernel actually arranges for your stack to be unwound to deliver you signals such as segfaults [1]. So unless you maintain proper unwinding info, your app isn't in a position to receive any signals sent by the kernel (at least, the way you're intended to).

[1]: http://www.nynaeve.net/?p=201

That blog looks very interesting, thanks.

I found out the hard way that Microsoft has two sets of ARM compilers for WinCE7, and only one of them generates code that's compatible with the SEH unwinder.

I don't think SEH actually calls destructors when unwinding C++, unlike C++ exceptions. This makes it leaky to use for anything other than fatal exceptions.

> I don't think SEH actually calls destructors when unwinding C++, unlike C++ exceptions. This makes it leaky to use for anything other than fatal exceptions.

How do you come to the conclusion that SEH is a leaky abstraction instead of the behavior of C++ destructors?

He means it causes resource leaks in any classes that implement RAII.

SEH was there before C++ was fully standardized (first standardized version was C++98). In this sense one can argue that the definition of RAII is leaky since it "forgot" that SEH was already there and not the other way round. A proper solution would have been that functions where RAII is used must not call functions that can throw an SEH exception.

All kinds of functions can throw SEH exceptions, because hardware faults (divide by zero, access to an invalid page or a null pointer) are delivered as SEH exceptions.

I went and looked this up to refresh my memory: https://msdn.microsoft.com/en-us/library/swezty51.aspx

Microsoft recommend you don't mix the use of SEH exceptions with C++, or that you use the flag which turns SEH into C++ exceptions (and therefore calls destructors). In our case we didn't want to do that, because we're doing our own crash reporting and all we want is to log a backtrace while exiting the program, and do not want to call destructors which may themselves crash because the program is in an invalid state.

To me this all looks like stuff that is undefined behavior in C++, so a C++ program for which such a problem occurs is by definition invalid. In other words: You have a large problem in your code.

Like I said: it's crash logging code, to be used for detecting problems in the code. Undefined behaviour is not invalid: if C++ compilers rejected all UB it would be considerably easier to ship (harder to write!) reliable programs.

No: the C++ standard requires that the programmer not trigger UB. This is a kind of contract between the programmer and the compiler.

The compiler vendor is free to extend the standard and define UB.

C++ exception handling is implemented using SEH (so is CLR's IIRC, and Chakra's). SEH is implemented by the OS and is available to non-C++ code too (including C).

We mainly care about those reasons because many popular languages have exceptions and meaningful backtraces. If you remove exceptions, or add asynchronous/lazy/continuationful code, then the game becomes very different.

>It so happens that the Windows API gets this right,

This is my general impression on Win vs Linux: Windows offers more fundamental services as a core OS component, whereas Linux delegates this to a hodge-podge of competing and mutually incompatible user-space components. (Examples: tracing (ETW), transactional database (BlueJet/ESE), etc).

Dynamic linking is the other big one. Windows had DLLs from the beginning, all the various Unices didn't and it's always looked like a bit of an afterthought to me.

Windows DLLs are not position-independent though. This has its advantages and disadvantages.

Oh hey, somebody else feels the pain! It's not only the crazy number of interfaces you have to support (not to mention that some of the interfaces are not particularly well defined, e.g. GDB's interface), but also what happens after. Most of these interfaces were not designed with very many JIT functions in mind and have O(n^2) behavior. One of the Julia core contributors recently worked with the GDB folks to fix their O(n^2), but libunwind still has it (to the point where one of the Julia tests spends 15% of its total runtime just having the unwinder try to find the JIT frame). Library support for JIT code is surprisingly bad.
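
To make the complexity point concrete: one common way to end up quadratic is to scan every registered function linearly on each lookup or registration, so n functions cost O(n^2) in total. The sketch below is not how GDB or libunwind store their data; it only shows the usual alternative of keeping the ranges sorted and binary-searching by program counter. The names `jit_range`, `jit_table_sort` and `jit_table_find` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one JIT-compiled function: [start, end). */
struct jit_range {
    uintptr_t start;
    uintptr_t end;
};

/* qsort comparator ordering ranges by start address. */
static int cmp_start(const void *a, const void *b)
{
    const struct jit_range *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort once after a batch of registrations. Ranges must not overlap. */
void jit_table_sort(struct jit_range *table, size_t n)
{
    qsort(table, n, sizeof table[0], cmp_start);
}

/* Binary search over the sorted table: returns the range containing pc,
 * or NULL if pc is not inside any registered function. O(log n). */
const struct jit_range *jit_table_find(const struct jit_range *table,
                                       size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].start)
            hi = mid;               /* pc lies below this range */
        else if (pc >= table[mid].end)
            lo = mid + 1;           /* pc lies above this range */
        else
            return &table[mid];     /* start <= pc < end */
    }
    return NULL;
}
```

With this layout each unwinder lookup is logarithmic in the number of JIT functions. A real JIT that adds functions one at a time would want an insertion-friendly structure (a balanced tree, say) rather than re-sorting an array.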

While RtlInstallFunctionTableCallback was designed specifically for JIT compilers, RtlAddFunctionTable may also be used. The trick is to give every function its own function table. I wrote Chakra's P/XData handler and was surprised to find that they still haven't switched to function table callbacks [1], like the CLR has.

[1] https://github.com/Microsoft/ChakraCore/blob/7587147fac16654...

A few months ago I tried to port the "supposedly portable" libunwind to OpenBSD, as a part of an effort to port Julia under OpenBSD. I failed miserably, as I found the task was too complex for my abilities (see this post of mine: https://groups.google.com/forum/#!topic/julia-dev/ndzuU9NixK...).

At that time, I blamed myself for not having been able to understand libunwind's code. I confess that this post has changed my perspective a bit.

If you ever pick this up again, I'd recommend starting with LLVM's libunwind library, which is an import of Apple's libunwind + support for other operating systems. As far as I can tell it's the only fully functional unwind library that's actually maintained at the moment. We discussed porting whatever functionality it is missing (probably Linux support and the basic assembly profiling heuristics we added to our fork of Apple's libunwind) and just using it for Julia on all platforms.

In the "C ecosystem", there is no such thing as universal unwinding. Everything that needs unwinding rolls its own.

The "unwinding" that libunwind refers to is merely the restoration of the procedure state (stack pointer and callee-saved registers). It doesn't know the ad hoc schemes for cleaning up the resources associated with a frame.

So that is to say, we can't take this library into the implementation of some language, and gain the ability to unwind through the frames of a foreign library which we called (and which called back into us) for doing things like safely performing a non-local return across that library. It's not going to deal with its resources, even if that library has some sort of unwind-protect scheme.

Basically, it's hard to understand what libunwind is for; what do I gain by integrating libunwind, beyond dumping back traces? Maybe I can integrate an interactive mini-debugger into the program where you can step through frames and examine state?

To be fair it's not unreasonable that every language has its own approach.

In "the old days" some systems had system-wide calling conventions. For example VMS had a standard calling convention all languages were supposed to use. Sounds good -- tools and subroutines can interoperate, right? Actually not really: object representation differed, for example. Plus new features (such as exceptions) that hadn't been thought up when the calling convention was developed couldn't be supported.

SVR4 attempted to make a common set of calling conventions, and to some degree succeeded, but really didn't help that much for the same reasons above.

As with so many problems in computing, the problem can possibly be solved by adding a layer of indirection: add a special ELF section with special entry points that support unwinding and examining frames. As long as you compile all your code with one compiler you should be OK...but what happens when new language features are added?

Unwinding under Android, especially from JNI, is also a mess.


Is anyone able to add perspective as to how this compares to OS X?

In general, OS X infrastructure looks more like Linux than Windows. If the overall complaint is that it's not built into the OS, well, welcome to UNIX, island of isolated, uncooperative, and sometimes broken toys.

I like UNIX, don't get me wrong, but some of the API design is truly appalling.

Luckily Apple is sane enough to compile its core libraries with frame pointers, so unwinding is pretty trivial. See for yourself (this function is called by backtrace(3)): http://opensource.apple.com//source/Libc/Libc-825.40.1/gen/t...
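
For anyone wondering why frame pointers make this trivial: when every function keeps a frame pointer, each frame starts with the caller's saved frame pointer followed by the return address, so the stack is a linked list. The sketch below is not Apple's code; it walks an explicit chain of such records so that it stays safe to run anywhere. The names `frame_record` and `walk_frame_chain` are made up for the example, and the record layout assumes the usual x86-64 prologue (`push %rbp; mov %rsp,%rbp`).

```c
#include <stddef.h>

/* Frame record as left by a frame-pointer prologue on x86-64: the saved
 * caller frame pointer, followed by the return address. */
struct frame_record {
    struct frame_record *caller;
    void *return_address;
};

/* Hypothetical helper: follow the chain starting at fp and store up to max
 * return addresses in out. Stops at a NULL link, or at a link that does not
 * point to a higher address (the stack grows down, so callers live higher). */
size_t walk_frame_chain(const struct frame_record *fp, void **out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        if (fp->caller != NULL && fp->caller <= fp)
            break;  /* corrupt chain or a frame without a frame pointer */
        fp = fp->caller;
    }
    return n;
}
```

On GCC or Clang the head of the live chain can be obtained with `__builtin_frame_address(0)`, but walking the real stack is only reliable if every frame on it was compiled with frame pointers, which is exactly the guarantee the comment above says Apple provides for its core libraries.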

On the Windows side, Vista was the first client release to begin reincluding frame pointers, having omitted them since (at least) NT 3.51.


> If the overall complaint is that it's not built into the OS, well, welcome to UNIX, island of isolated, uncooperative, and sometimes broken toys.

> I like UNIX, don't get me wrong, but some of the API design is truly appalling.

I guess this is time for my annual post on Plan 9, which is basically Unix Done Right™. It really does feel like a coherent operating system, where all the parts fit together, feel well-thought-out and are well-finished. Every time I play with it (and really, at this point that's most of what it's good for) I want to cry at the lost potential. In that way, it's like SmallTalk or Lisp: something better, ignored by the world.

Sadly, while it offered built-in stack unwinding, it looks like the functions involved were different depending on RISC vs. CISC, and they could unwind at most 40 levels deep at a time (!): http://plan9.bell-labs.com/magic/man2html/2/debugger

Friday Q&A has a good little writeup on OS X stack unwinding: https://mikeash.com/pyblog/friday-qa-2012-04-27-plcrashrepor...
