I wrote a program that throws an exception and catches it to warm up, and then throws a second exception. The second() function with LLVM libcxx using normal DWARF exceptions calls pthread_rwlock_rdlock SIX TIMES and calls pthread_rwlock_wrlock once. Overall, this program triggers ~4000 function calls according to --ftrace (see https://justine.lol/ftrace/).
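If it helps, here's roughly the shape of the test (a minimal sketch, not the exact harness I traced; the real thing does more work around the throws):

    // Minimal sketch: warm up by throwing and catching once, then do the
    // second throw in second(), which is the one being traced/measured.
    #include <cstdio>

    void first() {                 // warm-up throw
        try { throw 42; } catch (int) {}
    }

    void second() {                // the throw being traced
        try { throw 42; } catch (int) {}
    }

    int main() {
        first();
        second();
        std::puts("done");
    }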
On the other hand, if I use SJLJ, then this program triggers 269 function calls. It never locks anything. Although, like the first program, it does call pthread_once three times. That doesn't cost anything, though. It's the rwlocks you need to worry about. For an example of a modern C++ toolchain that uses SJLJ exceptions, see https://github.com/jart/cosmopolitan/releases/tag/3.7.1
Please note that SJLJ exceptions carry a tradeoff. While SJLJ will make throwing exceptions much faster, you'll incur a slight performance degradation on normal code that isn't throwing exceptions. Using GCC in SJLJ mode will cause most of your C++ functions to have extra code inserted into the prologue and epilogue which calls _Unwind_SjLj_Register and _Unwind_SjLj_Unregister. Modern DWARF exception handling makes the intentional tradeoff of making exception throwing extremely slow, so that zero overhead is added for normal non-exceptional code. Both methods (SJLJ and DWARF) will bloat your binary size. With SJLJ it's just a few instructions inserted into prologues and epilogues. But with DWARF you get a lot of mandatory .eh_frame junk that's generated for everything, centralized into a particular section of your binary that's only touched if exceptions are actually thrown.
The scalability issue in the OP is that their code throws a lot of exceptions from multiple threads at the same time, which causes resource contention in the libgcc unwind implementation.
Since you have only shown a simple example of throwing exceptions, I'm not sure I understand your point about how SJLJ exceptions solve the scalability problem. Truth be told, I'm not very familiar with SJLJ.
SJLJ exceptions do not need to perform ancillary data lookup using program counter values. Instead, there is code on the non-error path that maintains a linked list of exception landing sites when try blocks are entered/exited. Run-time type information is still needed for catch clause matching, but that data can be accessed easily via the vtable pointer and information next to the vtable. None of this needs to access mutable global data structures.
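Roughly, the bookkeeping amounts to something like this (a conceptual sketch, not libgcc's actual structures; the real _Unwind_SjLj_Register/_Unwind_SjLj_Unregister manage a per-thread chain of function contexts):

    // Conceptual sketch of SJLJ bookkeeping: a thread-local intrusive list
    // of landing sites, pushed/popped by compiler-inserted prologue/epilogue
    // code. No mutable global data, hence no global lock when throwing.
    #include <csetjmp>

    struct SjLjContext {
        SjLjContext* prev;      // previous frame's context
        std::jmp_buf landing;   // where to jump when unwinding into this frame
    };

    thread_local SjLjContext* g_chain = nullptr;  // head of the per-thread list

    // Inserted by the compiler in the prologue of a function with EH regions.
    inline void sjlj_register(SjLjContext* ctx) { ctx->prev = g_chain; g_chain = ctx; }

    // Inserted in the epilogue and on every normal exit path.
    inline void sjlj_unregister() { g_chain = g_chain->prev; }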
DWARF exception handling needs DWARF unwind information, which is mapped in memory along with the code. But the stack only has program counters, not the addresses of the DWARF data. Therefore, the first step is to locate the DWARF data. The semi-portable interface (supported on GNU/Linux, various BSD, and likely others) for that is dl_iterate_phdr, and that invokes a callback under a global lock. The global lock is required because the list of objects can change in response to dlopen and dlclose calls. With the new way, the GCC unwinder directly asks the glibc dynamic linker for the ELF object corresponding to a program counter value and its DWARF unwind information address. This way, the synchronization with dlopen/dlclose is an internal dynamic linker implementation detail.
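For the classic path, a hedged sketch of what the lookup has to do: walk every loaded object via dl_iterate_phdr (whose callback runs under the loader's global lock), test whether the program counter falls inside it, and pull out its PT_GNU_EH_FRAME segment. This is only the shape of the search, not the actual libgcc/libunwind code:

    // Sketch of locating unwind data for a PC via dl_iterate_phdr.
    #include <link.h>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    struct Query { uintptr_t pc; const void* eh_frame_hdr; };

    // Called once per loaded ELF object, under the loader's global lock.
    static int find_eh_frame(struct dl_phdr_info* info, size_t, void* data) {
        auto* q = static_cast<Query*>(data);
        const void* hdr = nullptr;
        bool contains_pc = false;
        for (int i = 0; i < info->dlpi_phnum; i++) {
            const ElfW(Phdr)& ph = info->dlpi_phdr[i];
            uintptr_t beg = info->dlpi_addr + ph.p_vaddr;
            if (ph.p_type == PT_LOAD && q->pc >= beg && q->pc < beg + ph.p_memsz)
                contains_pc = true;
            if (ph.p_type == PT_GNU_EH_FRAME)
                hdr = reinterpret_cast<const void*>(beg);
        }
        if (contains_pc) { q->eh_frame_hdr = hdr; return 1; }  // nonzero stops iteration
        return 0;
    }

    static void probe() {}

    int main() {
        Query q{reinterpret_cast<uintptr_t>(&probe), nullptr};
        dl_iterate_phdr(find_eh_frame, &q);
        std::printf(".eh_frame_hdr covering probe(): %p\n", q.eh_frame_hdr);
    }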
The SJLJ runtime doesn't lock a global mutex seven times each time you throw an exception. I'm not kidding. If you use DWARF exceptions then, each time you use the throw keyword, libunwind::DwarfFDECache<libunwind::LocalAddressSpace>::_lock needs to be acquired six times in read mode, and once in write mode. Yes, it's a GIL. I traced the addresses of the lock objects myself. That effectively makes throwing an exception a serialized operation. Only one exception can be thrown at a time, per process. Not only will this limit your throughput to one core, it'll also saturate all the other cores on your system with pointless busy work, if your pthread_rwlock_t implementation isn't very good, e.g. glibc, musl. See also https://justine.lol/mutex/ The example I gave is just the tip of the iceberg. I'm using it to analyze the runtimes beneath.
Generally warming up caches (l1i, l1d, l2, ...) and other hardware state (e.g. out-of-order engine) so that you get a better picture of how your program will run in the steady state, rather than how it will perform on the first iteration with 'cold caches'. The justification for this is that in most (but not all) software applications, you should expect a reasonable amount of spatial/temporal locality and thus for your caches to be useful. So the 'average' performance of your program would be best captured by how the program runs when 'warm'.
Warm -> populated with the code/data your program uses. Cold -> empty, or filled with stuff your program doesn't use.
I do think that the case of exception handling is one where I would question the validity of this approach: if you catch exceptions rarely, it's quite reasonable to expect that the relevant lines would have been conflicted out of your l1i at some point in the interim, and therefore that the l1i (or even l2!) fetch cost is reasonable to include in the benchmark. However, if you expect your exceptions to fire 'relatively often' then it again becomes reasonable to warm up the caches before measuring, so it really comes down to the characteristics of your workload.
I'm just teasing out the one-time costs. Keep in mind, I'm using the above example to trace the libc/libc++/libcxxabi/libunwind runtimes. The first time you throw, it does a bunch of extra stuff, like getenv() and additional locking. I think the first throw needs about thirteen synchronization barriers total.
Aren't we paying for Result types every time they're used? A result wrapping a 64-bit value wouldn't fit into $eax, for example (I understand there are exceptions to this rule through invalid-bit-pattern optimizations).
Just be aware that this is a variant and can't be returned via a register, but needs the stack. In (very) hot paths it can be worth switching back to single return values.
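As a rough illustration of why (exact behavior is ABI- and type-dependent, so treat this as a godbolt-comparison sketch rather than a guarantee): a small trivially copyable result can still come back in a register pair on x86-64 SysV, while a larger or non-trivial variant-style type is returned through a hidden pointer to caller-provided memory.

    // Sketch for comparing return conventions; exact behavior varies by ABI.
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <variant>

    int64_t plain() { return 1; }            // x86-64 SysV: returned in rax

    struct SmallResult { int64_t value; int32_t err; };   // 16 bytes, trivially copyable
    SmallResult small() { return {1, 0}; }   // typically still a register pair (rax:rdx)

    using BigResult = std::variant<int64_t, std::string>; // non-trivial alternative
    BigResult big() { return int64_t{1}; }   // returned via hidden pointer to caller memory

    int main() {
        std::printf("%zu %zu\n", sizeof(SmallResult), sizeof(BigResult));
        return static_cast<int>(plain() + small().value + std::get<int64_t>(big()));
    }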
I've been programming in Zig recently and I must say, that's the best error handling I've ever used in any programming language. They are just error return types, much like in Go or Rust, but the integration into the language makes it so seamless. It feels like using exceptions, but without the mental and computing overhead that comes from the fact that anything can raise an exception.
I think the issue is that many languages lacked support for option types for a long time so exceptions got overused and abused. Now the pendulum has swung towards "exceptions are bad".
I think both ways of handling errors serve different use cases and it is good to have proper language support for both. I like to use exceptions for errors that are... literally exceptions. Stuff that is out of my control, being out of memory, stuff that I don't expect to happen on the happy path. When the default should be to just log the thing and crash. And if your language supports proper exceptions you get nice stack traces in your logs to see what went wrong.
On the other hand, error types are superior for things that you should explicitly handle in your code, things that are reasonable invariants to take care of. Here, the language forcing you to handle them, or to explicitly opt out of handling them, is extremely valuable.
How do you deal with putting information in your error messages?
My understanding is that Zig errors can't contain any data. But an error message like "file not found" is awful. Are your error messages awful, or is there some way to get information into them so that you can say "file 'foo/bar.json' not found"?
I personally replace it with logging. In most other languages, the common thing is to log the error message further up the stack, where you actually handle the error in some way. In Zig, I do it wherever I trigger an error. I've only used Zig for one server application, so having a lot of error messages in logs works fine, even if some of them are possibly redundant.
The Zig runtime optionally records the last few points in the source where the error is propagated to the caller, so you get stack traces.
If you need more than that and error codes, then either log as the sibling comment suggested or use custom return types. Anything more complex than what can be encoded as an error code is not an exception.
In Go you can do fmt.Errorf("%w: %s", err, path) so that you get "file not found: foo/bar.json", while still being able to match on the wrapped error value with errors.Is(). The caller can do the same with this error, and so on up the stack.
I dislike this style in Go. It adds a lot of noise to the source and essentially builds a stack trace manually. In Go 1.23 there is errors.Wrap() that adds the error stack automatically so maybe at some point Go will consider again adding some sugar for automatic error propagation.
You might be right, but it doesn't seem correct to extrapolate that Zig's error return type is slow just because C++'s is. If Zig was designed with this kind of error handling from the start, it may well have optimized for it.
You can't really "optimize away" the fact that every function call now has a branch after it to check for an error. That's a design choice Zig made at the cost of performance, which is not inherently bad. After all, lots of languages intentionally sacrifice performance for some other benefit. But those types of choices are not things you can just wave a magic wand at later and hope they go away.
> You might be right, but it doesn't seem correct to extrapolate that Zig's error return type is slow just because C++'s is. If Zig was designed with this kind of error handling from the start, it may well have optimized for it.
It seems pretty correct to me? It's just based on first principles. Returning a value + error is strictly more work than returning just a value, and compilers can't solve the halting problem to optimize away arbitrary amounts of extra work. Ergo, performance hit.
Comparing zig error handling to no error handling is a bit disingenuous. Error handling does work and therefore costs something. This whole thread is about how exceptions are very expensive when thrown. The question then becomes how different error handling strategies handle real workloads (which will include some percentage of error cases)
Are there any benchmarks of Windows EH? The implementation is very different from DWARF/SJLJ EH and it would be interesting to see how it fares. I've seen some pretty exceptional claims for both sides of the argument (e.g. "[...] has no code impact for x64 native, ARM, or ARM64 platforms" [1]), but there's rarely if ever any data to back this up.
32-bit Windows SEH is very different for sure, but I thought 64-bit so-called “zero-overhead” SEH was a table-driven affair fairly similar to DWARF / Itanium ABI exceptions?
> The root cause is that the unwinder grabs a global mutex to protect the unwinding tables from concurrent changes from shared libraries.
> We did a prototype implementation where we changed the gcc exception logic to register all unwinding tables in a b-tree with optimistic lock coupling.
Did they try just stripping synchronization altogether to see what the performance ceiling would be?
Why not? That's a good indicator of how much it's costing.
In the late 90s/early 2000s when languages started promoting general purpose threading, they did so without planning and simply threw locks on everything.
Everyone now pays the cost of that feature whether you use it or not.
Is throwing and catching exceptions across threads a good idea?
+1 there is so much unnecessary synchronization in things. It's crazy how much cost we pay for unnecessary locking in single threaded applications. Even multi threaded applications can end up scaling poorly. There are better ways to achieve partitioning of state that scales better than to simply litter granular locks everywhere.
From what I understand, this is about how the performance of code that throws scales with the number of cores.
What is the state of current compilers for the non-throwing case? If I have code with exceptions, but test a case that does not actually throw, will it be as fast as the same code with exceptions removed altogether?
EDIT: I just saw that Justine's (jart) comment answers this.
Having exceptions as a possibility requires at least some bookkeeping and prevents certain kinds of optimizations due to the compiler not being able to know where an exception could come from. Generally however this is not that much of an issue, and only worth dealing with in the rarest of cases.
That makes sense. I wonder how the amount for this bookkeeping stacks up against the overhead of returning a sum type composed of error and result (like Rust Result<T, E>)?
Always returning a slightly "fatter" type must certainly have an associated cost as well, and it has to be paid in every case - error or not.
At the limit, you can make Result<T,E> very cheap: with a specifically designed ABI you just need a bit in the flags register and a single (predictable) conditional jump on the return path (see for example the Herbceptions papers).
At the limit limit, there is no difference between Result<T,E> and checked exceptions: with enough syntactic sugar they can be equivalent both semantically and syntactically [1].
[1] One advantage of typical C++ exceptions implementations, is that if the exception is not caught the stack is not unwound and the stack trace is preserved in the core file. In principle the same thing could be done for Result.
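A portable approximation of that return path, for illustration (the actual proposal would use a CPU flag rather than a bool, so this is only the shape, not the ABI):

    // Sketch of the "one bit + one predictable branch" return path.
    // Portable C++ can't touch the flags register, so a bool stands in for it.
    #include <cstdint>
    #include <cstdio>

    struct ParseResult { uint64_t value; bool ok; };

    ParseResult parse_digit(char c) {
        if (c < '0' || c > '9') return {0, false};   // "throw": set the bit
        return {static_cast<uint64_t>(c - '0'), true};
    }

    ParseResult parse_two(const char* s) {
        ParseResult a = parse_digit(s[0]);
        if (!a.ok) return a;                          // single predictable branch
        ParseResult b = parse_digit(s[1]);
        if (!b.ok) return b;
        return {a.value * 10 + b.value, true};
    }

    int main() {
        ParseResult r = parse_two("42");
        std::printf("%d %llu\n", r.ok, static_cast<unsigned long long>(r.value));
    }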
With an exception (not checked exceptions) you get the advantage that the compiler knows about this and can optimize some code paths: any function that doesn't handle the exception or have cleanup to do can be bypassed, while result/checked exceptions need to go into that function anyway. Which may be faster; more benchmarks needed. (Though it's possible a "sufficiently smart" compiler could understand this pattern and turn it into an exception.)
So the real question is syntax - which do you prefer: having to write some result type all over instead of the regular return type you mean, or knowing that "magically" at unpredictable times your function will just abort in the middle. There are pros and cons to both, so anyone arguing absolutely for either side is wrong - at least now when we don't have decades of experience with result in large projects (though maybe we do in a single project)
> With an exception (not checked exceptions) you get the advantage that the compiler knows about this and can optimize some code paths while result/checked exceptions need to go into that function anyway.

Once you convert them into CPS form there is really no difference between the two. In principle you could compile both to exactly the same code. [edit: also not sure why you are singling out checked exceptions]
[edit: specifically you could compile any code that returns an Either to a function that takes two return continuations instead of just one, and in CPS form exception-like non-local exits are trivial]
Of course in practice traditional procedural languages do not compile through CPS and use abnormal edges in basic blocks to represent exceptions.
> There are pros and cons to both, so anyone arguing absolutely for either side is wrong
I'm in the camp that, in a well designed language[1], there would be no difference between the two :D
[1] according to my personal preferences that I can assure you are completely objective!
It's trivially observed that exceptions simply have less code in the normal path. There's no post-call checking for errors, which is of course present in the Result<T,E> approach. And even where the exception is caught, that's not part of the "hot loop". It's not executed at all normally. Unwinding exceptions are completely free when they don't happen. The same is simply not true and never can be true of a return-value based propagation.
"At the limit limit," you can detect the unstructured branches littered through the codebase of people using Result and convert it all into clean structured exceptions, and even automatically remove all the boilerplate this paradigm uses to propagate errors; this is at least a reasonable thought process due to the same reason you can (much more trivially) implement your exception mechanism by burning a return register for an error code and compiling in a ton of branches (which isn't insane if you in fact want to throw an error often: this is all just a tradeoff that Rust made awkward and its community turned into a religious war).
It isn't easy, though, as the Rust way of doing this--letting people pass around a bare Result without the structure of a monad (which is why Haskell's version of this idea looks like exceptions, and why it is so frustrating when people incorrectly claim Rust has "monadic" error handling)--makes it extremely easy to pull off what should be "stunts", such as holding onto a Result and trying to resolve them in an irreducible order; but, "hopefully", most code isn't making that mistake (and you can still implement it by adding a spurious expensive catch in just the places they did that)... but like, that's why I think the person you are responding to might have said "at the limit limit", to really emphasize that this is a pretty complicated reduction to automatically apply globally and truly result in 0 cost.
What optimizations are disabled? I don't see any reason any of them would be, nor is there any bookkeeping. The stack itself is the bookkeeping which is required regardless. The exception-specific stuff is stored elsewhere, which is why throwing an exception is expensive, but if it doesn't throw then nothing in the hot path was pessimised.
There are really two ideas that get called "exceptions". One is merely a form of control flow (like 'if' or 'goto') that can jump down the call stack. This kind of exception handler could be made fast if necessary.
The other idea is to trigger the processor's actual fault handler, and proceed to the operating system, which then dispatches the exception to the program. The advantage of the latter is that you can catch exceptions raised by the processor that way. But since you need to enter the operating system's exception handler code, it goes slower. C# exceptions are like this.
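The first idea is essentially setjmp/longjmp. A minimal sketch (note that longjmp skips C++ destructors in the frames it jumps over):

    // Minimal sketch of the "jump down the call stack" style of exception:
    // setjmp records a landing point, longjmp transfers control back to it.
    // Note: longjmp does NOT run destructors of the frames it skips.
    #include <csetjmp>
    #include <cstdio>

    static std::jmp_buf handler;

    void leaf() {
        std::longjmp(handler, 7);   // "throw": jump straight back to the handler
    }

    void middle() {
        leaf();
        std::puts("never reached");
    }

    int main() {
        if (setjmp(handler)) {      // nonzero: we arrived here via longjmp ("catch")
            std::puts("caught");
            return 0;
        }
        middle();
    }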
>One idea is merely a form of control flow (like 'if' or 'goto') that could jump down the call stack.
Interestingly this is more or less how Symbian handled exceptions back in the day. It was a giant mess. You basically called setjmp in a surrounding function (using a macro IIRC) and then longjmp (in the form of user::leave()) which unwound the stack without calling any destructors.
It was very fast and the source of endless memory leaks.
Not only were they fast but the codegen was very small. It had exception support before there was any real standardisation on how they should work.
It was also before RAII was a thing and so you had to manage the ‘CleanupStack’ [0] yourself.
At least for the development of the OS itself, and not frameworks such as the awful Series60 UI, we had unit tests that brute-forced memory correctness by getting the memory allocator to deliberately fail at each next step of the code being developed until it succeeded or panicked.
Are the numbers in the second table correct? For 2, 4, 8 and 16 threads it shows a speedup for 0.1% failures vs 0% failures. For 2, 4, and 8 threads it also shows a speedup for 1% failures vs 0% failures.
That speedup isn’t small. The largest one is for 8 threads, 0.1% vs 0%: 32 vs 64, so occasionally throwing exceptions seems to double the speed of that micro-benchmark.
Is there so much noise in that data? If not, what could cause this?
In general, if you are benchmarking exceptions, it's a sign that you don't understand exceptions.
Exceptions are not meant to replace error codes, which you use to get a detailed picture of the current environment. They are rather like a kind of assertion, where you don't actually care too much what happens next, because the exception being thrown means someone, somewhere, somehow, has already fuc*ed up.
Just to state the obvious, this is a design philosophy (one of many) and not something imposed by the language. And reasonable people can disagree about whether it's a good one.
I think where a lot of engineers go wrong is where they go crazy baking exception-throwing into their logic and error handling code, leaving a beautiful "happy path" in their code, but they are not disciplined about actually catching and handling the exceptions. If you're going to go all in on throwing, then you need to go all in on catching. Otherwise your product is just going to be full of unhandled crashing. A couple projects worth of this, and a developer will start to think "hmm, exceptions are bad, I agree we shouldn't use them!" Which is how we get blanket company policies against using any exceptions at all.
Look at the error rates they are testing. A 0.1% exception rate, which is certainly something that'd be "rare", starts to happen an awful lot when you have a program that's using 128 cores, and 5th generation Epyc can push 384 threads in a single socket now.
Basically, a naive implementation of exceptions uses a lock to guard unwind tables, and this scales poorly. The lock is needed only because .so libraries can be loaded and unloaded dynamically. One solution is to say that after a certain moment, the program doesn't load/unload dynamic libraries and doesn't use the lock after that. That's what ScyllaDB and Yandex[0] do, for example. Another solution is to just use the new glibc which fixes this issue.
That aside, both exceptions and error codes introduce overhead, but they do it differently: exceptions add most[1] of their overhead when an error happens, while checking error codes adds overhead when an error doesn't happen[2]. One might conclude that exceptions should be used when errors are exceptionally rare; otherwise, use error codes. Or, just reclassify errors that happen too often as something you expect to happen, return them as a case of valid data and keep using exceptions.
Checking error codes adds overhead only in the way it is implemented in modern programming languages, by returning an integer error code that must be tested for a conditional jump after returning to the invoking procedure.
Many early programming languages, including various dialects of Algol and Fortran, included a better implementation method.
The invoking procedure passed to the invoked procedure not a single return address, but multiple return addresses. One return address was for the normal return and the other addresses were for alternate returns when errors were detected.
On a modern CPU with many registers, all these return addresses would be passed in registers.
The invoking procedure had at its end one or more handlers for error conditions, corresponding to the alternate return addresses. When a procedure was invoked, no tests were done at the invocation place, because any error would cause a jump to the error handler. In the invoked procedure, a test and a conditional jump must always exist in order to detect an error, and there the alternate return was executed only when an error was detected, so this implementation could omit one test and one conditional jump in comparison with the modern implementations.
Moreover, the text of the procedure was more clear in this way, with all the error handlers separated from the normal execution case, but nonetheless close enough to examine them when necessary.
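Standard C++ can't pass extra return addresses, but the shape of the idea can be approximated by handing the callee its error handler as a continuation, so the caller's happy path contains no test at all (a sketch of the idea, not a claim about how those compilers implemented it):

    // Sketch: approximating "alternate returns" by passing the error handler
    // as a continuation. The callee branches once; the caller's happy path
    // has no post-call check.
    #include <cstdio>

    template <class OnError>
    int divide(int a, int b, OnError on_error) {
        if (b == 0) return on_error();   // the "alternate return"
        return a / b;                    // the normal return
    }

    int main() {
        int q = divide(10, 2, [] { std::puts("division by zero"); return 0; });
        std::printf("%d\n", q);
    }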
Thanks, that's an interesting idea, even though I'm not sure that the extra instructions to put these return addresses into registers, and the reduced number of registers available to the codegen, are better than a cmp and jmp. I'm also not sure that will play well with the CPU's branch predictor.
No, use exceptions. In micro-benchmarks we have known for years that exceptions are slow. However, more recent realistic benchmarks have found that exceptions are generally faster than the checking, assuming you write all the if statements needed to propagate errors to where they can be handled. Nobody does that in practice, so "exceptions are slower than ignoring errors" is what we were really saying when calling exceptions slow.
I have to admit I have a hard time relating your comment to my experience.
I remember a case where we were using a library that threw exceptions when it could not parse some string. I think we were parsing IP addresses, or something like that. There was a case where this was called in a hot loop (like thousands of IP addresses in a row) and for whatever reason in some cases most of them were invalid, thus throwing. Then simple error handling: the caller catches the exception, logs the error and checks the next one.

The program was frozen for seconds. Replacing the parser with the same logic but returning a bool instead was something like 1000x faster. That was on macOS with Clang on ARM.
I don't see how a few extra ifs (though none were required in this case) would change anything.
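Roughly the two shapes being compared, for the curious (a sketch; inet_pton stands in for the actual parsing library we used):

    // Roughly the two shapes: throwing parser vs. bool-returning parser.
    #include <arpa/inet.h>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Throwing style: every bad address pays for the full unwind machinery.
    in_addr parse_or_throw(const std::string& s) {
        in_addr a{};
        if (inet_pton(AF_INET, s.c_str(), &a) != 1)
            throw std::invalid_argument("bad IPv4 address: " + s);
        return a;
    }

    // Bool-returning style: a bad address is just a branch.
    bool try_parse(const std::string& s, in_addr& out) {
        return inet_pton(AF_INET, s.c_str(), &out) == 1;
    }

    int count_valid(const std::vector<std::string>& addrs) {
        int n = 0;
        for (const auto& s : addrs) {
            in_addr a;
            if (try_parse(s, a)) n++;   // vs. try { parse_or_throw(s); n++; } catch (...) {}
        }
        return n;
    }

    int main() {
        return count_valid({"10.0.0.1", "not-an-ip", "192.168.1.1"}) == 2 ? 0 : 1;
    }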
Generally I'm not a fan of exceptions; you never know what something is going to throw. The majority of the crashes we see in production are unhandled exceptions. It turned out the documentation did not mention that this function could also throw std exceptions on top of the library's own ones in this specific case.
You can always find a case where you are stressing the very thing that people are using in a micro-benchmark; like, no one is claiming that table unwinding is the correct choice for every single function!
But, the other extreme you get is saying all error handling should be done with checks and branches, and the idea is that this slows down the code which isn't throwing lots of exceptions so much that, for average workloads, it is the wrong tradeoff.
Ideally, you could make this decision more locally, and yet continue to use the cleaner syntax which comes from structured error handling, but I don't know of any language which implements this :/.
And, so, the best we have is to just use a language which maps exception syntax to tables as you can always -- as you did -- fix the few places it matters to use a result type by adding manual checks.
As always, and generally speaking, if anything they show that drawing the line is difficult since they show in their paper [1] that both std::expected and boost::LEAF are worse than std::exception.
Excerpts from paper:
> Single threaded the fib code using std::expected is more than four times slower than using traditional exceptions.
> This has much less overhead than std::expected, but it is still not for free. For fib we see a slowdown of approx. 60% compared to traditional exceptions, which is still problematic.
Now, admittedly, these workloads are too simple (sqrt and fibonacci) and only shown for the single-threaded use-cases but I guess you have to start from something that is easy enough to reason about otherwise getting to any sane conclusions might be too difficult.
However, they continue to show in their blog how they significantly sped up the std::exception implementation in a very non-trivial workload - multi-threaded and JIT'ed database kernel.
So, to prove the point about exceptions being slower in complex workloads, such as the one above, somebody would actually have to rewrite their whole codebase to use something else such as std::expected, return values or whatever. Non-trivial to say the least.
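For reference, the fib benchmark in such comparisons is roughly of this shape (a sketch, not the paper's code; the failure condition here is invented). With work this tiny per call, the per-call check is most of what you end up measuring:

    // Sketch of a fib-with-std::expected benchmark shape (requires C++23).
    // Every return goes through the expected and every call site re-checks it.
    #include <cstdio>
    #include <expected>

    enum class Error { TooDeep };

    std::expected<unsigned, Error> fib(unsigned n, unsigned max_depth) {
        if (max_depth == 0) return std::unexpected(Error::TooDeep);
        if (n <= 1) return n;
        auto a = fib(n - 1, max_depth - 1);
        if (!a) return a;                    // propagate the error by hand
        auto b = fib(n - 2, max_depth - 1);
        if (!b) return b;
        return *a + *b;
    }

    int main() {
        auto r = fib(30, 64);
        std::printf("%u\n", r.value_or(0));
    }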
"Make sure your performance-critical code / tight-loops don't do things which can throw exceptions, and then you won't have to worry (much) about the performance of exceptions"
?
It may not be what the article says, but it's how I avoid having to delve into exception performance.
No, the only TLDR I got from that is that exceptions should be reserved for exceptional cases. Their code is borderline abusing them, probably out of necessity.
However, in this case this abuse provided a nice boost for everyone, so I can't complain.
Honest question: why use exceptions at all? They seem like far more hassle than benefit. Why not just use explicit error handling? This also results in much, much, much more readable code.
Because they are just better for the actual rare issue.
Consider for example out of memory. "proper" error handling via return values would mean everything that allocates memory has to have an error return path and everything that calls anything that has that somewhere in the chain needs to still propagate that error. Essentially every function call now has a branch after it. If done explicitly, that's a lot of code nobody wants to write and, more importantly, that nobody ever actually tests. If done automatically, that's still just a metric shitload of branches which wastes branch predictor entries, it wastes cache size, etc...
Exceptions, meanwhile, are typically free unless actually thrown. So for the truly 0.001% situation, like the aforementioned out of memory, they are perfect. You also get stack traces for free which greatly aid in debugging.
Also, typically you either handle errors pretty close to where they occur, or relatively high up at a more macro level. Exceptions let your "middle" layers be significantly less branchy and verbose for situations where it's a more macro-level error-handling strategy.
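To make the "branch after every call" point concrete, here is a rough sketch of the two shapes; allocation failure is picked only because it's the example above, and the loop bound is arbitrary:

    // Sketch: explicit propagation vs. letting the exception unwind.
    #include <cstdio>
    #include <new>
    #include <string>
    #include <vector>

    // Explicit style: every layer grows a branch it will almost never take.
    bool load_line(std::vector<std::string>& out) {
        try { out.emplace_back("line"); } catch (const std::bad_alloc&) { return false; }
        return true;
    }
    bool load_file(std::vector<std::string>& out) {
        for (int i = 0; i < 1000; i++)
            if (!load_line(out)) return false;   // propagate by hand
        return true;
    }

    // Exception style: the middle layer has no error-handling code at all.
    std::vector<std::string> load_file_throwing() {
        std::vector<std::string> out;
        for (int i = 0; i < 1000; i++) out.emplace_back("line");
        return out;                              // bad_alloc just unwinds through here
    }

    int main() {
        std::vector<std::string> v;
        std::printf("%d %zu\n", load_file(v), load_file_throwing().size());
    }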
The alternative approach, like what Rust does, is to just say that rare errors like that are unrecoverable and to hard-crash when encountered. That's certainly a choice you could make, but it's pretty heavy-handed for the language itself to mandate that. Which is then why Rust also provides std::panic::catch_unwind, but with notes that it doesn't necessarily work, because people can have chosen to compile with panic set to hard-abort. It's pretty much a big mess.
Significant chunks of the c++ community also compile with exceptions disabled, turning them into asserts.
Handling some errors in band is simply not useful in modern programs. While kernels and embedded use cases benefit from in-band allocation failures, most applications would do better to receive an out-of-band signal that they are close to memory thresholds and to perhaps shed load by doing things like stopping accepting new socket connections, shrinking some caches, etc. I really used to think otherwise, but too much software has been written assuming it will never experience a memory failure in band.
Also to clarify, I don't think an exception is out of band. They have a significant cost to program size and performance even if they are never used.
"~zero" is not zero. The existence of exceptions forces changes in the design of normal code to allow for that possibility, which does have a runtime cost. Some objects cannot be safely destroyed at arbitrary points in the program without additional logic to mitigate side effects.
Exceptions don't eliminate error checking conditionals, they just hide them elsewhere in the code.
Can you provide an example of that on godbolt or similar? In what scenarios do exceptions introduce the claimed cost?
The only thing I could see is just the density of code could mean more page spills and less efficient cache line usage for code. There's otherwise no actual cost to the assembly executed itself. And that's compared to no error handling at all, even. Compared to return values it's very clearly faster. But maybe there's some edge case you're aware of that does something different.
Whether or not exception flow control has a cost, it forces you to add code elsewhere that does have a cost which only exists to make the program exception safe.
Some objects cannot be safely destroyed at arbitrary points during program execution because it will corrupt memory, leak resources, etc. In a lot of systems-y code you can have things in-flight that the code can’t reliably stop or undo as an intrinsic property of the system. If errors occur, you have to partially defer the unwinding of state indefinitely (or block) until it is safe to do so. This is pretty idiomatic for things like high-performance I/O.
Making this exception safe requires much more defensive programming. You need extra code on the happy path to collect context in case of an exception, extra code in the error path to interpret that context to figure out what is and is not safe to destroy at the point in execution where the exception occurred, and extra code in the handling path to execute the deferral logic based on an interpretation of the context.
With return code style handling, this is usually implicit in the structure of the code. It is much nicer to do this at compile-time than run-time. If you only write high-level C++ then this might not apply but most C++ tends to be used for systems-y software these days. The widespread practice of disabling exceptions for C++ code bases at companies big and small didn’t happen for no reason.
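A toy illustration of the kind of thing I mean (the submit/completion/reaper functions are hypothetical placeholders): an object whose buffer the kernel may still be writing into can't simply be destroyed by an unwind, so the happy path carries extra state and the destructor carries extra logic purely to make unwinding safe.

    // Toy sketch of exception-safety bookkeeping for in-flight I/O; the
    // submit/completion/reaper functions are hypothetical placeholders.
    #include <cstddef>
    #include <memory>
    #include <utility>
    #include <vector>

    void submit_async_read(void* buf, std::size_t len);                 // hypothetical
    bool read_has_completed();                                          // hypothetical
    void hand_off_to_reaper(std::unique_ptr<std::vector<char>> b);      // hypothetical

    struct PendingRead {
        std::unique_ptr<std::vector<char>> buf = std::make_unique<std::vector<char>>(4096);
        bool in_flight = false;

        void start() {
            submit_async_read(buf->data(), buf->size());
            in_flight = true;   // extra happy-path state, only needed for unwinding
        }

        ~PendingRead() {
            // If an exception unwinds through the owner while the read is still
            // in flight, freeing buf here would corrupt memory. So the destructor
            // has to defer: park the buffer somewhere it can outlive this frame.
            if (in_flight && !read_has_completed())
                hand_off_to_reaper(std::move(buf));
        }
    };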
> "proper" error handling via return values would mean everything that allocates memory has to have an error return path and everything that calls anything that has that somewhere in the chain needs to still propagate that error.
That seems like a win to me for the aforementioned "readable" criterion—failures should be enumerated. But I can see how that would be annoying to write.
I'm also of the opinion that if handling out-of-memory is a serious consideration for your process, you should consider using your own allocator that doesn't throw an exception. This is how Rust handles it, for instance. And on top of this I'm a little confused, because exceptions are mostly disallowed in embedded codebases, which is the primary place where you would likely run out of memory and want to actually handle it.
Disallowing exceptions is a company policy, though, not something imposed by the language. I'm not going to go into whether I think it's a good or bad policy, but just pointing out that you can write embedded software that properly handles exceptions. Granted, the discipline, will (and yes, budget) to do this is pretty rare in embedded shops.
Well, that's rather beside the point—if you're really worried about allocators failing, exceptions don't provide any benefit over simply returning NULL.