Stack unwinding in Rust (pocoo.org)
91 points by the_mitsuhiko on Oct 31, 2014 | 28 comments

> Currently there is definitely a case where there are too many calls in Rust that will just panic.

So, as a newcomer to Rust, how can I identify which calls could blow up my single-task app? The docs on println! (http://doc.rust-lang.org/std/macro.println!.html) say nothing about this.

I could use task::try everywhere (http://doc.rust-lang.org/guide-tasks.html#handling-task-pani...), but I'd prefer to have an indication of which functions are pure and cannot error (modulo bugs).

> For instance on Windows there is Structured Exception Handling (SEH) which however is not used by LLVM currently and as such not by Rust.

FYI, LLVM does actually support SEH on 64-bit Windows and Rust does actually make use of it on that platform. It's SEH on 32-bit Windows (which is completely different) that isn't supported.

It's sad that all of this would have been a lot easier if only the x86-64 C ABI didn't decide to say "Eh, you don't have to reserve rbp and rsp for stacks".

The code to unwind C-code that follows rbp and rsp is super trivial (and lightning fast). Once you get into the hell of DWARF you're talking tons and tons of cycles. In all cases, dealing with inlined functions is messy and imprecise (one of the reasons I've heard for not relying on rbp/rsp), but when building a runtime it just feels sad to have tossed out the stack frames.

I don't get it. If you're unwinding the stack aren't you going to be referring to DWARF tables anyway for things like symbols, CFI to locate catch blocks (in C++), or something else that is actually going to allow you to do something useful? Having to switch off of the instruction pointer to decode the stack doesn't seem like much of an extra burden. Just dumping a list of stack frame addresses isn't terribly useful.

On another note, the DWARF standard is fairly approachable reading: http://dwarfstd.org/doc/DWARF4.pdf

You can offer quite a lot of info (like backtraces, sampling profilers, etc.) for many runtimes just with the function name (which you can do without full on DWARF). The old LLVM JIT engine had a good NotifyFunctionEmitted hook for exactly this kind of usage. It's really about taking DWARF parsing / loading off the critical path for common use cases.

I'm not terribly fond of either option in Rust; however I would encourage the formalization of a protocol for handling non-fatal errors. It's the route Common Lisp went when it inherited the idea from the Multics PL/1 OS. It was a huge win and they chose a very user-friendly (ie: programmer-friendly) design.

Did anyone else read this and get the feeling they'd absorbed words but learnt nothing?

To be honest I don't know enough about the argument for/against to form an opinion. However this article did absolutely nothing to sway me. The article gives a couple of "reasons" as to why stack unwinding is hard, and seems to imply that they wouldn't be an issue with API level error reporting - but doesn't explain how or why or give any reasoning or practical examples.

I had absolutely no idea that this issue of stack unwinding having a cost and being non-portable even existed, so that article was really interesting to me.

Fair enough. I was mostly just confused as to why this had many up-votes but no discussion around it. I guess there was a bit more content in the article than I realised, but I just happened to already be familiar with that part.

Performance is an argument pro unwinding, because the default case (no exception, no unwinding) does not have to check for errors.

It is not nearly as simple as you are hoping. The threat of unwinding forces compilers to insert landing pads (increasing code size) and forgo some optimisations because they have to be careful to preserve semantics even during unwinding, restricting things like how much code can be reordered etc.

It certainly makes things more complex for the compiler. However, I don't know an example where it would prevent an optimization.

It's not just a matter of the compiler missing optimizations. Some (unsafe, low-level) algorithms have to be written differently (and less efficiently) because of the possibility of unwinding. Disallowed patterns typically look something like

  * Put an object in an invalid state
  * Perform some operation that might unwind
  * Fix the object again
If you unwind while the object is in an invalid state, and its destructor is called during unwinding, you can end up with undefined behavior.
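One common way to cope with that pattern is a drop guard that repairs the invariant on both the normal path and the unwind path. The following is only a minimal sketch in present-day Rust, not code from the article: `Summed`, `RestoreSum`, and `map_in_place` are hypothetical names, and the "invalid state" here is just a stale cached sum rather than anything memory-unsafe.

```rust
/// Hypothetical container. Invariant: `sum` equals the sum of `items`.
struct Summed {
    items: Vec<i64>,
    sum: i64,
}

/// Hypothetical guard: recomputes `sum` when dropped, which also happens
/// while a panic unwinds through `map_in_place`.
struct RestoreSum<'a>(&'a mut Summed);

impl Drop for RestoreSum<'_> {
    fn drop(&mut self) {
        self.0.sum = self.0.items.iter().sum();
    }
}

impl Summed {
    fn new(items: Vec<i64>) -> Self {
        let sum = items.iter().sum();
        Summed { items, sum }
    }

    /// Applies `f` to every item. While the loop runs, `sum` is stale
    /// (the object is "invalid"), and `f` may panic at any point.
    fn map_in_place(&mut self, f: impl Fn(i64) -> i64) {
        let mut guard = RestoreSum(self);
        for x in guard.0.items.iter_mut() {
            *x = f(*x); // may unwind
        }
        // `guard` is dropped here, or during unwinding, and fixes `sum`.
    }
}
```

If `f` panics halfway through, the items are left partially updated, but the cached sum is consistent with them again by the time anyone else can observe the value. The cost is exactly the kind of extra structure the comment above complains about.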

The SIGABRT (or similar) approach is similarly tempting. No checking for errors, no complicated control paths. If Unix systems let processes register (and unregister) files to be automatically deleted on abnormal exit, it'd be pretty convenient.

The generalized form of this would take arbitrary cleanup actions, not just file deletions, and would look a lot like atexit(). In fact, one could install a SIGABRT handler that would (a) deregister itself and (b) call exit() to run the atexit handlers.

Of course, if the reason for the abort was memory scrambling that destroyed the registry of cleanup actions, things get messy, which is why it needs to deregister the signal handler.

I'm not keen on having to hijack the signal handler, nor on the global variables needed for the atexit (or equivalent) handler registry.

All things considered, I'd rather have language and compiler support for unwinding with user-settable cleanup handlers, a.k.a. real exceptions.

That is already possible in UNIX: just unlink(2) the file and it will be automatically deleted once its last file descriptor is closed.

Linux 3.11 added support for the O_TMPFILE flag to open(2) so it's not even necessary to call unlink(2).
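The unlink(2) trick mentioned above can be sketched with the Rust standard library. This assumes a Unix-like system, where `std::fs::remove_file` maps to unlink(2) and an open handle keeps the data alive; `scratch_roundtrip` is a hypothetical helper name, and the sketch does not use O_TMPFILE.

```rust
use std::fs::{self, OpenOptions};
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Creates a scratch file at `path`, unlinks it immediately, and then keeps
/// using the open handle. Returns the bytes read back and whether the path
/// still exists afterwards. Assumes Unix semantics for open-but-unlinked files.
fn scratch_roundtrip(path: &Path, payload: &[u8]) -> std::io::Result<(Vec<u8>, bool)> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true) // fail rather than reuse an existing file
        .open(path)?;

    // From here on the name is gone; the kernel frees the data once the
    // last descriptor is closed, even if the process aborts.
    fs::remove_file(path)?;

    file.write_all(payload)?;
    file.seek(SeekFrom::Start(0))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    Ok((buf, path.exists()))
}
```

Nothing has to run at exit for the cleanup to happen, which is why this works for SIGKILL and aborts as well as for normal termination.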

So how does Rust currently deal with calling into a C function which calls into a Rust function that panics?

From what I understand based on the article and assuming you don't know which C compiler was used, it seems like this would be impossible to properly deal with and produce undefined behavior.

Unwinding through C is undefined behavior, so a Rust function called from C must catch all exceptions and (preferably) return an error code to the C caller.
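In present-day Rust the usual tool for this is `std::panic::catch_unwind`, which did not exist in this form when the thread was written. A minimal sketch, assuming the default unwinding panic strategy; `checked_div` and its error codes are hypothetical, and a real export would also need an attribute to disable name mangling.

```rust
use std::panic::catch_unwind;

/// Hypothetical entry point meant to be called from C.
/// Returns 0 on success, -1 if the Rust code panicked, -2 on a null `out`.
pub extern "C" fn checked_div(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -2;
    }
    // Integer division panics on b == 0 and on i32::MIN / -1.
    // The panic is stopped here so it never unwinds into the C caller.
    match catch_unwind(move || a / b) {
        Ok(value) => {
            // Safety: `out` was checked for null; the caller promises that
            // it points to writable memory for one i32.
            unsafe { *out = value };
            0
        }
        Err(_) => -1,
    }
}
```

On failure the output location is left untouched and the C side only ever sees an integer status, which is the error-code convention the comment above recommends.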

I didn't know that Rust doesn't have exceptions! What's the rationale? Is it about memory management, or the performance cost, or something else?

From the article's description I'd say Rust has exceptions, you just cannot catch them. In Java-speak, you only have try-finally. They are caught at the bottom of the call stack automatically, killing the thread.
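The "try-finally only" behaviour can be observed with a thread and a destructor. This is a sketch in present-day Rust (threads rather than the tasks of 2014), assuming the default unwinding panic strategy; `Cleanup` and `run_panicking_thread` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical resource whose destructor records that it ran.
struct Cleanup(Arc<AtomicBool>);

impl Drop for Cleanup {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Spawns a thread that panics while holding a `Cleanup`.
/// Returns (thread ended with a panic, destructor ran during unwinding).
fn run_panicking_thread() -> (bool, bool) {
    let flag = Arc::new(AtomicBool::new(false));
    let flag_in_thread = Arc::clone(&flag);

    let handle = thread::spawn(move || {
        let _guard = Cleanup(flag_in_thread);
        panic!("invariant violated"); // unwinds and runs `_guard`'s destructor
    });

    // The panic stops at the thread boundary; the parent only sees an Err.
    let panicked = handle.join().is_err();
    (panicked, flag.load(Ordering::SeqCst))
}
```

The destructor plays the role of the `finally` block, and `join` returning `Err` is the automatic catch at the bottom of the thread's stack.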

Now I'm even more confused. If the machinery for try-finally (automatically cleaning up resources during unwinding) already exists and is paid for, what's the rationale for not having catch?

Indeed, I've heard at least one Rust developer argue for exactly that.

However, since exceptions are not part of the public function API, I think they have to be carefully packaged to ensure that they are only used in cases where catching them makes no sense (that is, for violated invariants, not normal runtime errors). The biggest nontechnical problem with unchecked exceptions in other languages is that they escape the type system and make it impossible to reason about code by looking at function signatures. This makes them unsuitable as general-purpose error reporting mechanisms, even though table-based unwinding is faster in the common case than error propagation through return statements.

Java tried to solve this problem using checked exceptions, which are probably my favorite feature in the whole language, but sadly I appear to be pretty much alone in liking them. Besides the Java community's total rejection of the feature, it was always hampered by the many ways Java offers to subvert it: unchecked exceptions of various kinds, misfeatures like Thread.stop, a standard that allows exceptions to be thrown at literally almost any point in the presence of VM problems, etc.

Personally I generally agree with the article author--exceptions are useful for some projects, but most of the time the best solution is to abort. Not having to worry about exception safety makes writing unsafe code vastly simpler and doesn't force you to write code in ways that can cause missed optimizations. You also don't have to worry about corrupted shared state as you do with thread unwinding, since processes actually have isolated address spaces. You can take advantage of hardware traps on failure modes like division by zero, which are significantly cheaper than exceptions. And, of course, you get faster compile times and smaller binaries (though the latter is not a big deal most of the time). So my feeling is that that should be the default, with an option to opt into using checked exceptions for projects that need it (web servers, Servo).

My other concern with fully supporting C++ exceptions would thus be that I don't think it would be nearly as easy to switch between the abort method and try-catch as between abort and task failure. I can see the argument for it, though--you're already going to have different semantics with abort, and in some cases the performance benefits of landing pads compared to return result propagation and/or process restart are important enough to outweigh the disadvantages.

How is abort better for reasoning about the code?

When code aborts, you don't have to reason about it in calling code. When you decide to abort, you are not just suggesting that there might be a problem. You are asserting "nope, there is no reasonable way I can continue here." Which is exactly what task failure is supposed to be used for, incidentally (which is why it was renamed to panic!)--normal, expected error conditions that you can actually do something useful about should be handled through the type system, not propagated through code that might not even be aware of its existence.
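The split described above can be sketched as two functions: one whose failure is part of its signature, and one that panics because a caller broke an invariant. Both `parse_port` and `nth_byte` are hypothetical examples, not functions from the article.

```rust
use std::num::ParseIntError;

/// Expected, recoverable error: bad input is normal, so the failure is
/// visible in the return type and the caller has to deal with it.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// Violated invariant: the caller promised that `index` is in bounds.
/// If it is not, indexing panics, and no caller is expected to handle that.
fn nth_byte(data: &[u8], index: usize) -> u8 {
    data[index]
}
```

A caller can match on the result of `parse_port` and ask the user for a different value, while a panic from `nth_byte` signals a bug to be fixed rather than a condition to recover from.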

In the meantime, abort completely cleans up any in-process state--closing network pipes, deleting temporary files, deallocating mmaped memory, and so on. And what it can't clean up can't be relied on anyway, because programs can always be aborted unexpectedly in other ways. Whether it's the OOM killer, various Rust behavior that triggers abort (panicking during unwinding, for example), stack overflow, power loss, or just SIGKILL, a robust program can never rely on its destructors running anyway (and destructors failing to run is explicitly not part of Rust's definition of unsafe).
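The point about destructors can be illustrated in present-day Rust, where `std::mem::forget` is a safe function, so skipping a destructor needs no `unsafe` at all. A minimal sketch; `Noisy` and `drops_observed` are hypothetical names.

```rust
use std::cell::Cell;

/// Hypothetical type that counts how many times its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Creates two values, drops one normally and forgets the other.
/// Returns the number of destructor runs that were observed.
fn drops_observed() -> u32 {
    let count = Cell::new(0);
    {
        let _dropped = Noisy(&count); // destructor runs at the end of this block
    }
    std::mem::forget(Noisy(&count)); // safe call; this destructor never runs
    count.get()
}
```

Only one of the two destructors runs, which is why correctness that matters outside the process (lock files, temporary files, remote state) cannot depend on `Drop` alone.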

So ultimately, the reason abort is easier to reason about is that except in a few special cases (like embedded, where you may have complete control of the hardware--but you likely don't want to be using task failure there), your program already has to be designed to expect an abort at literally any time. Aborting may not always be desired behavior, but it never introduces additional cognitive load, or creates unsafety where it didn't already exist, in the way that exceptions do.

I expect it is memory management and safety.

In C++, exceptions can cause problems with dangling pointers if used in constructors; I wouldn't be surprised if the same were true in Rust.

Rust doesn't have the concept of a constructor built into the language, although you'll often see a type with a "new" method that creates an instance of it. I think exceptions interact badly with destructors too, though, and Rust does have those.

So if the x86-64 ABI is suboptimal for getting a stack trace, can anyone name an architecture's ABI that is better, and in what way?
