
Stack unwinding in Rust - the_mitsuhiko
http://lucumr.pocoo.org/2014/10/30/dont-panic
======
aidanhs
> Currently there is definitely a case where there are too many calls in Rust
> that will just panic.

So, as a newcomer to Rust, how can I identify which calls could blow up my
single-task app? The docs on println!
([http://doc.rust-lang.org/std/macro.println!.html](http://doc.rust-lang.org/std/macro.println!.html))
say nothing about this.

I could use task::try everywhere
([http://doc.rust-lang.org/guide-tasks.html#handling-task-pani...](http://doc.rust-lang.org/guide-tasks.html#handling-task-panics)),
but I'd prefer to have an indication of which functions are pure and cannot
error (modulo bugs).

------
pitterpatter
> For instance on Windows there is Structured Exception Handling (SEH) which
> however is not used by LLVM currently and as such not by Rust.

FYI, LLVM does actually support SEH on 64-bit Windows and Rust does actually
make use of it on that platform. It's SEH on 32-bit Windows (which is
completely different) that isn't supported.

------
boulos
It's sad that all of this would have been a lot easier if only the x86-64 C
ABI didn't decide to say "Eh, you don't have to reserve rbp and rsp for
stacks".

The code to unwind C-code that follows rbp and rsp is super trivial (and
lightning fast). Once you get into the hell of DWARF you're talking tons and
tons of cycles. In all cases, dealing with inlined functions is messy and
imprecise (one of the reasons I've heard for not relying on rbp/rsp), but when
building a runtime it just feels sad to have tossed out the stack frames.
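
To illustrate why the frame-pointer walk is so cheap, here is a minimal sketch over a simulated stack rather than real machine state. The memory layout (saved frame pointer at `stack[fp]`, return address at `stack[fp + 1]`, 0 as the end marker) and the name `walk_frames` are assumptions made for the example, not taken from any real ABI document or runtime.

```rust
/// Walks a simulated frame-pointer chain. `stack` models memory as 8-byte
/// words: stack[fp] holds the caller's saved frame pointer and stack[fp + 1]
/// holds the return address. A saved frame pointer of 0 ends the chain.
fn walk_frames(stack: &[u64], mut fp: usize) -> Vec<u64> {
    let mut return_addrs = Vec::new();
    while fp != 0 && fp + 1 < stack.len() {
        return_addrs.push(stack[fp + 1]);
        let next = stack[fp] as usize;
        // Caller frames live at higher addresses; stop on a corrupt chain.
        if next != 0 && next <= fp {
            break;
        }
        fp = next;
    }
    return_addrs
}

fn main() {
    // Three hypothetical frames at indices 2, 5 and 8.
    let stack = [0, 0, 5, 0x1000, 0, 8, 0x2000, 0, 0, 0x3000];
    let trace = walk_frames(&stack, 2);
    assert_eq!(trace, vec![0x1000, 0x2000, 0x3000]);
    println!("{:x?}", trace);
}
```

Each step is two loads and a comparison, with no table lookup keyed on the instruction pointer.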

~~~
nly
I don't get it. If you're unwinding the stack aren't you going to be referring
to DWARF tables anyway for things like symbols, CFI to locate catch blocks (in
C++), or something else that is actually going to allow you to do something
useful? Having to switch off of the instruction pointer to decode the stack
doesn't seem like much of an extra burden. Just dumping a list of stack frame
addresses isn't terribly useful.

On another note, the DWARF standard is fairly approachable reading:
[http://dwarfstd.org/doc/DWARF4.pdf](http://dwarfstd.org/doc/DWARF4.pdf)

~~~
boulos
You can offer quite a lot of info (like backtraces, sampling profilers, etc.)
for many runtimes just with the function name (which you can do without
full-on DWARF). The old LLVM JIT engine had a good NotifyFunctionEmitted hook for
exactly this kind of usage. It's really about taking DWARF parsing / loading
off the critical path for common use cases.

------
agentultra
I'm not terribly fond of either option in Rust; however I would encourage the
formalization of a protocol for handling non-fatal errors. It's the route
Common Lisp went when it inherited the idea from the Multics PL/1 OS. It was a
huge win and they chose a very user-friendly (i.e., programmer-friendly) design.

------
Benjamin_Dobell
Did anyone else read this and get the feeling they'd absorbed words but learnt
nothing?

To be honest I don't know enough about the argument for/against to form an
opinion. However this article did absolutely nothing to sway me. The article
gives a couple of "reasons" as to why stack unwinding is hard, and seems to
imply that they wouldn't be an issue with API level error reporting - but
doesn't explain how or why or give any reasoning or practical examples.

~~~
bsaul
I had absolutely no idea that stack unwinding has a cost and is non-portable,
so that article was really interesting to me.

~~~
Benjamin_Dobell
Fair enough. I was mostly just confused as to why this had many up-votes but
no discussion around it. I guess there was a bit more content in the article
than I realised, but I just happened to already be familiar with that part.

------
ash
Another thread:
[https://news.ycombinator.com/item?id=8536412](https://news.ycombinator.com/item?id=8536412)

------
qznc
Performance is an argument pro unwinding, because the default case (no
exception, no unwinding) does not have to check for errors.

~~~
cliffbean
The SIGABRT (or similar) approach is similarly tempting. No checking for errors,
no complicated control paths. If Unix systems let processes register (and
unregister) files to be automatically deleted on abnormal exit, it'd be pretty
convenient.

~~~
the_why_of_y
That is already possible in UNIX: just unlink(2) the file and it will be
automatically deleted once its last file descriptor is closed.

Linux 3.11 added support for the O_TMPFILE flag to open(2) so it's not even
necessary to call unlink(2).
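
A rough sketch of the unlink-while-open trick, written against the current Rust standard library and assuming a Unix-like system (other platforms may refuse to remove an open file). The function name `scratch_roundtrip` and the file name are made up for the example.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Creates a scratch file at `path`, unlinks it immediately, then writes and
/// reads back `data` through the still-open handle.
fn scratch_roundtrip(path: &Path, data: &[u8]) -> io::Result<Vec<u8>> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true)
        .open(path)?;
    // On Unix this removes the directory entry; the open handle stays valid
    // and the storage is reclaimed when the handle is closed, however the
    // process exits.
    fs::remove_file(path)?;
    file.write_all(data)?;
    file.seek(SeekFrom::Start(0))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let name = format!("scratch-{}.tmp", std::process::id());
    let path = std::env::temp_dir().join(name);
    let back = scratch_roundtrip(&path, b"hello")?;
    assert_eq!(back, b"hello".to_vec());
    assert!(!path.exists()); // already gone from the file system namespace
    Ok(())
}
```

Nothing has to run at exit for the cleanup to happen, which is why this also covers abort and SIGKILL.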

------
DasIch
So how does Rust currently deal with calling into a C function which calls
into a Rust function that panics?

From what I understand based on the article and assuming you don't know which
C compiler was used, it seems like this would be impossible to properly deal
with and produce undefined behavior.

~~~
veddan
Unwinding through C is undefined behavior, so a Rust function called from C
must catch all exceptions and (preferably) return an error code to the C
caller.
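
A minimal sketch of that pattern, assuming `std::panic::catch_unwind`, which was added to the standard library after this discussion (at the time, task::try played this role). The names `checked_div` and `safe_div` and the 0 / -1 return convention are invented for the example.

```rust
use std::panic;

/// Hypothetical Rust logic that panics on a violated precondition.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Hypothetical entry point meant to be called from C. Returns 0 on success
/// (result written through `out`) and -1 on a null pointer or a panic.
/// A real export would also need a no_mangle attribute.
pub extern "C" fn safe_div(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -1;
    }
    // catch_unwind stops the unwind here, before it can reach any C frame.
    match panic::catch_unwind(|| checked_div(a, b)) {
        Ok(value) => {
            // Safety: `out` was checked for null; the caller promises it
            // points to a writable i32.
            unsafe { *out = value };
            0
        }
        Err(_) => -1,
    }
}

fn main() {
    let mut result = 0;
    assert_eq!(safe_div(10, 2, &mut result), 0);
    assert_eq!(result, 5);
    // The panic message still goes to stderr, but the caller only sees -1.
    assert_eq!(safe_div(1, 0, &mut result), -1);
}
```

This only works when the program is built with unwinding enabled; with panics configured to abort, there is nothing to catch and the process simply dies.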

------
cousin_it
I didn't know that Rust doesn't have exceptions! What's the rationale? Is it
about memory management, or the performance cost, or something else?

~~~
qznc
From the article's description I'd say Rust has exceptions, you just cannot
catch them. In Java-speak, you only have try-finally. They are caught at the
bottom of the call stack automatically, killing the thread.
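
To make the "try-finally only" behaviour concrete, here is a small sketch using today's `std::thread` API (the task API of the time differed). The `Guard` type and `panic_in_thread` function are made up for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical guard whose destructor records that it ran.
struct Guard(Arc<AtomicBool>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Panics on a fresh thread and reports
/// (thread ended in a panic, guard destructor ran during unwinding).
fn panic_in_thread() -> (bool, bool) {
    let cleaned = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cleaned);
    let handle = thread::spawn(move || {
        let _guard = Guard(flag); // the "finally" part: dropped while unwinding
        panic!("invariant violated");
    });
    // The panic stops at the thread boundary and shows up as Err from join().
    let panicked = handle.join().is_err();
    (panicked, cleaned.load(Ordering::SeqCst))
}

fn main() {
    assert_eq!(panic_in_thread(), (true, true));
}
```

The spawning thread keeps running; only the panicking thread is torn down, with its destructors executed on the way out.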

~~~
cousin_it
Now I'm even more confused. If the machinery for try-finally (automatically
cleaning up resources during unwinding) already exists and is paid for, what's
the rationale for not having catch?

~~~
Jweb_Guru
Indeed, I've heard at least one Rust developer argue for exactly that.

However, since exceptions are not part of the public function API, I think
they have to be carefully packaged to ensure that they are only used in cases
where catching them makes no sense (that is, for violated invariants, not
normal runtime errors). The biggest nontechnical problem with unchecked
exceptions in other languages is that they escape the type system and make it
impossible to reason about code by looking at function signatures. This makes
them unsuitable as general-purpose error reporting mechanisms, even though
table-based unwinding is faster in the common case than error propagation
through return statements.

Java tried to solve this problem using checked exceptions, which are probably
my favorite feature in the whole language, but sadly I appear to be pretty
much alone in liking them. Besides the Java community's total rejection of the
feature, it was always hampered by the many ways Java offers to subvert it:
unchecked exceptions of various kinds, misfeatures like Thread.stop, a
standard that allows exceptions to be thrown at literally almost any point in
the presence of VM problems, etc.

Personally I generally agree with the article author--exceptions are useful
for some projects, but most of the time the best solution is to abort. Not
having to worry about exception safety makes writing unsafe code vastly
simpler and doesn't force you to write code in ways that can cause missed
optimizations. You also don't have to worry about corrupted shared state as
you do with thread unwinding, since processes actually have isolated address
spaces. You can take advantage of hardware traps on failure modes like
division by zero, which are significantly cheaper than exceptions. And, of
course, you get faster compile times and smaller binaries (though the latter
is not a big deal most of the time). So my feeling is that that should be the
default, with an option to opt into using checked exceptions for projects that
need it (web servers, Servo).

My other concern with fully supporting C++ exceptions would thus be that I
don't think it would be nearly as easy to switch between the abort method and
try-catch as between abort and task failure. I can see the argument for it,
though--you're already going to have different semantics with abort, and in
some cases the performance benefits of landing pads compared to return result
propagation and/or process restart are important enough to outweigh the
disadvantages.

~~~
cousin_it
How is abort better for reasoning about the code?

~~~
Jweb_Guru
When code aborts, you don't _have_ to reason about it in calling code. When
you decide to abort, you are not just suggesting that there might be a
problem. You are asserting "nope, there is no reasonable way I can continue
here." Which is exactly what task failure is supposed to be used for,
incidentally (which is why it was renamed to panic!)--normal, expected error
conditions that you can actually do something useful about should be handled
through the type system, not propagated through code that might not even be
aware of its existence.
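
As a rough illustration of that split (the function names are hypothetical), an expected failure shows up in the signature as a `Result`, while a broken precondition panics:

```rust
use std::num::ParseIntError;

/// Expected failure: bad input is a normal condition, so it is part of the
/// signature and the caller has to deal with it.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

/// Violated invariant: callers promise a non-empty slice. Breaking that
/// promise is a bug, so the function panics instead of returning an error.
fn first_port(ports: &[u16]) -> u16 {
    assert!(!ports.is_empty(), "caller must pass at least one port");
    ports[0]
}

fn main() {
    assert_eq!(parse_port(" 8080 "), Ok(8080));
    assert!(parse_port("http").is_err()); // not a number
    assert!(parse_port("70000").is_err()); // does not fit in a u16
    assert_eq!(first_port(&[22, 80]), 22);
}
```

Reading the two signatures is enough to tell which call needs error handling at the call site.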

In the meantime, abort completely cleans up any in-process state--closing
network pipes, deleting temporary files, deallocating mmaped memory, and so
on. And what it can't clean up can't be relied on anyway, because programs can
_always_ be aborted unexpectedly in other ways. Whether it's the OOM killer,
various Rust behavior that triggers abort (panicking during unwinding, for
example), stack overflow, power loss, or just SIGKILL, a robust program can
never rely on its destructors running anyway (and destructors failing to run
is explicitly not part of Rust's definition of unsafe).
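
The last point can be seen in safe code using `std::mem::forget`, which in current Rust is a safe function. A small sketch, with `Noisy` and `drops_observed` as invented names:

```rust
use std::cell::Cell;

/// Hypothetical type that counts how many times its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Creates two values but lets only one destructor run.
fn drops_observed() -> u32 {
    let count = Cell::new(0);
    {
        let _a = Noisy(&count); // dropped normally at the end of this block
    }
    // No `unsafe` needed: this value's destructor is simply never run.
    std::mem::forget(Noisy(&count));
    count.get()
}

fn main() {
    assert_eq!(drops_observed(), 1);
}
```

Code that needs cleanup for correctness, rather than just tidiness, therefore cannot lean on `Drop` alone.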

So ultimately, the reason abort is easier to reason about is that except in a
few special cases (like embedded, where you may have complete control of the
hardware--but you likely don't want to be using task failure there), your
program _already_ has to be designed to expect an abort at literally any time.
Aborting may not always be _desired_ behavior, but it never introduces
additional cognitive load, or creates unsafety where it didn't already exist,
in the way that exceptions do.

------
asb
So if the x86-64 ABI is suboptimal for getting a stack trace, can anyone name
an architecture's ABI that is better, and in what way?

