
Zero overhead deterministic failure: A unified mechanism for C and C++ [pdf] - Sindisil
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2289.pdf
======
twic
I opened the document and immediately searched for "Sutter" to see how this
interacts with Herbceptions. It turns out, nicely. Well done!

That said, the rationale seems a bit bogus to me:

> The C calling convention has the caller allocate the space for the returned
> value before calling a function. union { T value; E error; } takes the size
> of whichever the bigger of types T or E is, which is as optimal as it can
> be. However the additional discriminant in struct { union { T value; E
> error; }; _Bool failed; }; could take up to eight additional bytes more than
> that, or worse, depending on packing. This may not seem like much, but it is
> more than optimal.

> For fails(E) returns, it is proposed for at least AArch64, ARM, x86 and x64,
> that the discriminant be returned via the CPU's carry flag. This is because
> compilers can often fold the setting or clearing of the CPU's carry flag
> into the ordering of other operations, thus making this a zero runtime
> overhead choice of discriminant.

If the ABI for returning discriminated unions is inefficient, then change the
ABI. Don't add a whole new language feature and then only give that an
efficient ABI.

As for the errno delaying stuff, how about adding a new function
consume_errno(), which (up to "as if") reads errno and then sets it to an
undefined value? Compilers would then be free not to set the real errno at all
if they could prove that a setting of errno was always followed by a call to
consume_errno.

I chuckled at this:

> There are two specific subcategories of C++ types which C can transport
> safely:

... followed by descriptions of Rust types which are Copy, and Rust types
which aren't!

The concept of being "insufficiently zero overhead" is also a good one, a bit
like being a little bit pregnant.

~~~
tomp
> If the ABI for returning discriminated unions is inefficient, then change
> the ABI. Don't add a whole new language feature and then only give that an
> efficient ABI.

That's exactly what they did; there's nothing preventing the compiler from
reusing that signalling mechanism for generic (binary) discriminated unions.

~~~
ksherlock
I'm not sure that's true. For ZOE, the compiler knows whether it's returning a
failure or not, and thus whether to set or clear the carry flag; the caller
checks it immediately, so there's no need to store the carry flag anywhere.
For a generic user-defined union, neither of those holds: the compiler won't
be able to deduce which field is active in all situations, and it won't be
able to preserve the carry flag (even if no intervening instruction modified
it, how would you handle two discriminated union variables?), so the flag
needs to be stored in the union. The compiler could automatically add a flag
variable and update it appropriately, but then again, that can be done today
in C++ ... call it std::variant.

------
saurik
Note that this seems to mean "zero extra cost over existing solutions in C",
not "zero extra cost over C++ exceptions" (though I guess they try to make
that argument also): C++ exceptions have no runtime cost at all unless you
throw an exception (which is supposed to be rare), but when that happens the
cost is unpredictable (as the required tables might never even have been
loaded into RAM). This solution is fast and deterministic during an error
condition, but certainly has overhead during the success path: they argue in
the paper that this overhead isn't as bad as you might _expect_ -- one might
expect a pipeline stall caused by a check on "did an error occur?", which
doesn't happen due to speculative execution -- but the reality is it still
uses at least a couple of extra instructions _for every function call that can
fail_ (which rapidly becomes "all function calls"). To avoid burning an extra
register (or making functions return aggregates), what they are proposing is:
"for at least AArch64, ARM, x86 and x64, that the discriminant be returned via
the CPU's carry flag. This is because compilers can often fold the setting or
clearing of the CPU's carry flag into the ordering of other operations, thus
making this a zero runtime overhead choice of discriminant." (with a footnote
stating "As it is a single bit being branched upon, status register update
pipeline stalls should not occur."). It seems like the key part of the
argument is that "Adding explicit checks for whether an exception has been
thrown to the successful code path is now usually free of cost on such CPUs,
_as stalls in other parts of the pipeline will block the CPU, thus leaving
idle spare CPU execution units_." (in other words: "we had spare capacity we
were wasting before, so now we are filling it")... I guess I would feel more
comfortable with benchmarks (as I could have sworn someone did benchmarks on
similar schemes a few years ago, due to Rust, and showed what I considered to
be a non-trivial overhead) :/. (Also, don't CPUs normally execute other
hyperthreads during those stalls? I would want to know not just "my program
isn't being extra limited _if on an otherwise idle machine_", but have a
concept of what total system impact this change would have.)

~~~
user492966
I'm no C++ expert (I'm actually just starting my first job in a few weeks
using it!) but is it really true that C++ exceptions have no runtime cost at
all? I thought various optimisations aren't possible to implement in the
presence of exceptions; for example, a move constructor that isn't marked
"noexcept" will not be used during a std::vector resize (instead it defers to
the copy constructor), right? Isn't that a runtime cost?

~~~
josefx
That isn't a consequence of exceptions; it's a consequence of the fact that
your move can fail and that std::vector has to leave itself in a sane state
if an error occurs (the strong exception guarantee). You could write a
bad::vector that just clears its contents on error, and it could use a
throwing move just fine.

------
anfilt
Hmm, I am not sure what to think. I read through it, and C being able to call
C++ functions and handle their errors would be nice.

However, it does add things to both languages, and I generally think it's best
to avoid adding things at all costs. It just makes things more complicated.

~~~
xenadu02
It trades false simplicity (errno) for explicit yet bounded implementation
complexity, yielding a much simpler model for the programmer to actually use.

There may be a fatal flaw but at first glance I like it.

~~~
int_19h
IMO the biggest advantage of this proposal is that it'd make error reporting
part of the C ABI, which still is - and is likely to remain - the common base
for all kinds of FFI.

~~~
earenndil
Most languages will probably still do their own error reporting, so it will
arguably not do very much, except make extra work for language maintainers
who have to make their own error reporting coexist with the C type.

~~~
int_19h
The point is that right now, that error reporting usually ends at the FFI
boundary. But with this thing, it could be propagated _across_ the boundary,
in a standard way that the other side hopefully has a meaningful projection
for.

And if you cross the boundary twice (back and forth), and nothing in between
handles the error, then it can even be propagated across both boundaries in a
way that allows the _original_ error to be preserved or reconstituted. So you
could e.g. throw Python exceptions across C++ stack frames and catch them on
the other side - and C++ would correctly execute destructors etc.

------
eps
Definitely interesting, but the net result is C code that lacks C's inherent
simplicity. Too much declarative noise, basically, and a lack of operational
transparency. It basically makes C look (and feel) like it's been... erm...
polluted with C++.

