One of the major differences between X Window and the Win32 GUI APIs is that Windows builds thread safety into the API, and it cannot be removed. This means you pay the price of mutexes and the like (what the Windows world calls "critical sections") even if your GUI is single threaded. X Window, on the other hand, decided to do nothing about threads at all, leaving the problem entirely to the application.
30 years after these decisions were made, most sensible people do single threaded GUIs anyway (that is, all calls to the windowing API come from a single thread, and all redraws occur synchronously with respect to that thread; this does not block the use of threads functioning as workers on behalf of the GUI, but they are not allowed to make windowing API calls themselves).
Consequently, the overhead present in the win32 API is basically just dead-weight, there to make sure that "things are safe by default".
There's a design lesson here for everyone, though precisely what it is will likely still be argued about.
"If you detached a thread in your application using a non-Cocoa API, such as the POSIX or Multiprocessing Services APIs, this method could still return NO."
Also, I've never heard of this behavior despite years developing for macOS (admittedly tangentially). I don't see how that could work given that threads can come and go during the life of the application.
Interesting. Definitely a 3rd approach that threads the needle between what win32 and X Window chose. Thanks for the link.
[ EDIT: not quite sure how to think about this ... if I create NSThreads to act as worker threads that do not make cocoa calls, I still have to deal with new overhead in any cocoa call stacks. That's not ideal, but again, it's a "middle-way" approach, and like every other approach has its own pros and cons. ]
Yet 30 years later people are calling setenv()/getenv() from different threads even though "it is known" that it crashes. For whatever reason the lesson from GUIs doesn't apply here.
Judging from a lot of the comments in this thread, the idea that there could even be parts of the *POSIX API* that are not thread-safe seems not to have even occurred to a lot of (younger?) programmers ...
Uncontended mutexes are very cheap but not free: a lock cmpxchg has much higher latency (and coherency-traffic) cost than a simple mov (or xchg). Java had lock elision, effectively trying to solve the hardware problem in software, back in the mid-00s. There are also optimizations to be made if you're running on a single core (no need for the lock at all), e.g. Docker with taskset.