The SJLJ runtime doesn't lock a global mutex seven times each time you throw an exception. I'm not kidding. If you use DWARF exceptions, then each time you use the throw keyword, libunwind::DwarfFDECache<libunwind::LocalAddressSpace>::_lock needs to be acquired six times in read mode and once in write mode. Yes, it's a GIL. I traced the addresses of the lock objects myself. That effectively makes throwing an exception a serialized operation: only one exception can be thrown at a time, per process. Not only will this limit your throughput to one core, it'll also saturate all the other cores on your system with pointless busy work if your pthread_rwlock_t implementation isn't very good (e.g. glibc, musl). See also https://justine.lol/mutex/. The example I gave is just the tip of the iceberg. I'm using it to analyze the runtimes beneath.
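If you want to check the serialization claim on your own toolchain, here is a minimal sketch of a throw/catch scaling test. It is not the example referred to above, and the names (throw_catch_loop, run_throw_bench, ThrowBenchResult) are made up for illustration. Each thread throws and catches the same number of exceptions, so if throwing scales across cores the wall time should stay roughly flat as you add threads (up to your core count), and if it is serialized behind one process-wide lock the wall time should grow with the thread count.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: throws and catches `iterations` exceptions on the
// calling thread and returns how many were caught.
static std::size_t throw_catch_loop(std::size_t iterations) {
    std::size_t caught = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        try {
            throw std::runtime_error("boom");
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return caught;
}

struct ThrowBenchResult {
    std::size_t caught;  // total exceptions caught across all threads
    double seconds;      // wall-clock time for the whole run
};

// Hypothetical driver: runs `threads` workers concurrently, each doing
// `iterations` throw/catch cycles, and times the whole run.
ThrowBenchResult run_throw_bench(unsigned threads, std::size_t iterations) {
    assert(threads > 0);
    std::atomic<std::size_t> total{0};
    std::vector<std::thread> workers;
    workers.reserve(threads);

    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&total, iterations] {
            total.fetch_add(throw_catch_loop(iterations));
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return {total.load(), elapsed.count()};
}
```

Call run_throw_bench(1, N) and then run_throw_bench(8, N) from main with the same N and compare the seconds field. Build the same file against each exception runtime you care about to compare them. This only measures wall time; it won't tell you which lock is contended, so you'd still need a profiler or debugger for that.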