If you lock only on writes and not on reads, two writers cannot collide, but readers can still see torn data (a mix of old and new state). Technically speaking, if either the write or the read is non-atomic, this is outright undefined behavior: the compiler may legally generate code (e.g. for the reader) that assumes concurrent writes never happen and misbehaves if they do (for example, the reader thread reads the value twice, expects both reads to match, and misbehaves when they don't).
To prevent tearing, you can use a regular mutex (only one reader at a time) or a readers-writer lock (https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock), which allows multiple concurrent readers but has more overhead than a regular mutex and, when contention occurs, can prioritize either readers or writers. If you don't want readers or writers to block, you can use atomic reads and writes (for <=64-bit values), triple buffers (for a single writer and a single reader), or RCU or hazard pointers (in general).
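To make the two simplest options above concrete, here is a minimal Rust sketch, assuming a hypothetical two-field `Pair` whose fields must always agree. The struct is guarded by a readers-writer lock, and a plain `AtomicU64` (the <=64-bit case) counts any torn snapshots the readers observe. The names `Pair` and `count_torn_reads` are made up for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical value: both fields must always hold the same number.
#[derive(Clone, Copy)]
struct Pair {
    a: u64,
    b: u64,
}

/// Runs one writer and four readers; returns how many torn snapshots were seen.
fn count_torn_reads(iterations: u64) -> u64 {
    let shared = Arc::new(RwLock::new(Pair { a: 0, b: 0 }));
    // A single 64-bit counter can be shared with atomics, no lock needed.
    let torn = Arc::new(AtomicU64::new(0));

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for i in 1..=iterations {
                // Exclusive access: no reader can observe a half-updated Pair.
                let mut guard = shared.write().unwrap();
                guard.a = i;
                guard.b = i;
            }
        })
    };

    let readers: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            let torn = Arc::clone(&torn);
            thread::spawn(move || {
                for _ in 0..iterations {
                    // Shared access: several readers may hold the lock at once.
                    let snapshot = *shared.read().unwrap();
                    if snapshot.a != snapshot.b {
                        torn.fetch_add(1, Ordering::Relaxed);
                    }
                }
            })
        })
        .collect();

    writer.join().unwrap();
    for reader in readers {
        reader.join().unwrap();
    }
    torn.load(Ordering::Relaxed)
}

fn main() {
    println!("torn reads: {}", count_torn_reads(10_000));
}
```

Because every read goes through `read()` and every write through `write()`, no snapshot can mix old and new fields. Note that safe Rust will not let you write the lock-on-writes-only version at all, which is the point the article makes about the API.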
If the value is small enough, like 1 byte, and you know for a fact that your CPU + RAM will always be able to update it atomically, then yes technically you could get away with not locking on reads.
BUT, for large values (structs) you will end up getting ragged reads if another thread is writing at the same time. And if another thread is not writing at the same time, what's the harm in locking for reads? Modern mutexes lock fast if there's no contention.
BUT ALSO, many operations are actually a read followed by a related write. If you don't lock on reads, it's easy to accidentally compose multiple small, correct functions into a large, incorrect one.
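A minimal Rust sketch of that composition problem, using a hypothetical `Counter` type: `get` and `set` each lock correctly on their own, but `increment_racy` releases the lock between the read and the write, so concurrent updates can be lost. `increment` holds one lock across the whole read-modify-write instead.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical counter; every method below takes the lock correctly.
struct Counter {
    value: Mutex<u64>,
}

impl Counter {
    fn get(&self) -> u64 {
        *self.value.lock().unwrap()
    }

    fn set(&self, v: u64) {
        *self.value.lock().unwrap() = v;
    }

    // Incorrect composition: the lock is dropped between get() and set(),
    // so another thread can update the value in that window and be overwritten.
    fn increment_racy(&self) {
        let v = self.get();
        self.set(v + 1);
    }

    // Correct: a single critical section covers both the read and the write.
    fn increment(&self) {
        let mut guard = self.value.lock().unwrap();
        *guard += 1;
    }
}

/// Spawns `threads` threads that each increment `per_thread` times.
fn run(threads: u64, per_thread: u64, racy: bool) -> u64 {
    let counter = Arc::new(Counter { value: Mutex::new(0) });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    if racy {
                        counter.increment_racy();
                    } else {
                        counter.increment();
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.get()
}

fn main() {
    println!("single critical section: {}", run(4, 10_000, false));
    // May print less than 40000: updates can be lost between get() and set().
    println!("composed get + set:      {}", run(4, 10_000, true));
}
```

The racy version has no data race in the language-level sense (every access is locked), yet it is still logically wrong, which is why this kind of bug is easy to miss.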
Your comment reinforces nyanpasu64's point that the average C++ developer doesn't understand multithreading. Your comment is wrong in that it is never correct, in a standard-conforming C++ program, to read and write the same object from different threads without the use of a memory barrier, even if the CPU + RAM supports it. The C++ compiler itself does not support such an access pattern, and it may reorder instructions that are not protected by a barrier in a way that results in undefined memory accesses.
It is always wrong. Compiler optimization and CPU dark magic like instruction reordering will wreak havoc on your unprotected reads. That is what undefined behavior really means, not “it’s actually defined but we don’t want to tell you”.
> There are important language features missing from C (and C++) that make it impossible to implement a Rust-style mutex API with the same guarantees
I've done some experimenting here, and I think it is possible to add a borrow checker to a C- or C++-like language, as the author hints. It couldn't be like Rust's borrow checker; it would just need to be a borrow checker that operates on a "per region" basis, where everything inside the mutex is one region and everything outside is another.
I wrote a little bit about this idea for the Vale language in [0] and then realized shortly afterward that the idea can be used to add fearless concurrency to any existing language.
It is definitely possible. What is not possible is convincing all your co-workers to use it or updating all the libraries you depend on to use it. Thus, the value prop becomes quite low.
Perhaps, but maybe there's a better approach. I prefer having a new language which can build on and speak to other languages in their own paradigms. That's what Swift did for Obj-C and also the direction we're going with Vale's region borrow checker.
> why is the C mutex API structured in a way that is hard to use and trivial to misuse, requiring elaborate comments or even static analysis to get right? .... There are important language features missing from C (and C++) that make it impossible to implement a Rust-style mutex API with the same guarantees
The simple answer is that the C11 Threads API was designed so that it could be trivially implemented using either POSIX or Windows threading primitives. One annoying consequence is that C11 mutexes do not support static initialization, unlike POSIX mutexes. (AFAIU Windows does provide some primitives to accomplish this, but not the standard mutexes that C11 Threads were presumed to be mapped to.)
In principle C11 could have specified a safer API along with any necessary semantic changes. It had to do this to some degree with atomics. Heck, it could have even introduced some narrowly scoped dependent typing, which is basically what some of the recent proposals regarding arrays do. But AFAIU nobody even looked in that direction as the focus was on finding a common, simple subset of POSIX and Windows API facilities.
There are two types of static initialization (initialization before main): constant and running code. Only C++ supports running code before main to initialize static-lifetime variables, and this feature is regarded as dangerous due to the static initialization order fiasco (referencing non-constant-initialized globals defined in other TUs will randomly see either all zeros or the initialized value, depending on the linking order of TUs). (C on Unix has a platform-specific __attribute__ ((constructor)) to do the same thing, but it's not standardized).
Standard C and Rust don't allow running code before main to statically initialize mutexes. Rust's std mutexes could not be statically const-initialized either until std::sync::Mutex::new became a const fn in Rust 1.63. So on older toolchains, or when the protected contents cannot be built in a const context, a global mutex needs a once_cell::sync::Lazy or lazy_static! to construct the contents on first read (adding an extra branch on every access). On the other hand, parking_lot offers a statically constructible (const-initializable) Mutex independent of OS synchronization primitives (https://docs.rs/parking_lot/latest/parking_lot/type.Mutex.ht...) which doesn't require a OnceCell/Lazy/lazy_static around it.
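For illustration, a minimal sketch of the construct-on-first-access pattern, assuming a reasonably recent toolchain: it uses the standard library's `std::sync::OnceLock` (stable since Rust 1.70) as a stand-in for `once_cell::sync::Lazy`, so it needs no external crates. The `COUNTERS` global and the `counters`/`bump` helpers are hypothetical names. A `HashMap` is used as the contents because `HashMap::new` is not a const fn, so it cannot simply be const-initialized.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical global: a map of named counters behind a mutex.
// OnceLock::new() is const, so the static itself needs no code before main.
static COUNTERS: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

/// Returns the global mutex, constructing its contents on first access.
fn counters() -> &'static Mutex<HashMap<String, u64>> {
    // This is the extra branch: every call checks whether init already ran.
    COUNTERS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Increments the named counter and returns its new value.
fn bump(name: &str) -> u64 {
    let mut map = counters().lock().unwrap();
    let slot = map.entry(name.to_string()).or_insert(0);
    *slot += 1;
    *slot
}

fn main() {
    bump("requests");
    println!("requests = {}", bump("requests"));
}
```

The initializer closure runs at most once even if several threads race to call `counters()` first; the others block until it finishes.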
I think the main reason Rust mutexes look the way they do is the lack of a 'try finally' construct, which is emulated with a destructor. Nim has both, but I prefer using a template that uses 'try finally' and is indented so you can tell this code is inside the critical section. There is also a .lock annotation which ties some data to a specific lock, so accessing that data without locking is a compile-time error. A much more flexible design than Rust's.
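For comparison, a rough Rust analogue of the indented critical section (not Nim's actual template, just a sketch): a hypothetical `with_lock` helper that takes a closure, so the locked code is visually nested, while the guard's destructor still plays the role of 'finally' and unlocks on every exit path.

```rust
use std::sync::Mutex;

/// Hypothetical helper: runs `f` with exclusive access to the mutex contents.
fn with_lock<T, R>(mutex: &Mutex<T>, f: impl FnOnce(&mut T) -> R) -> R {
    let mut guard = mutex.lock().unwrap();
    // The guard is dropped when this function returns (or unwinds),
    // which releases the lock, much like a 'finally' block would.
    f(&mut *guard)
}

fn main() {
    let balance = Mutex::new(100_i64);

    let remaining = with_lock(&balance, |b| {
        // Everything indented here runs inside the critical section.
        *b -= 30;
        *b
    });

    println!("remaining = {remaining}");
}
```

The data is still only reachable through the lock, so the tie between data and lock that the annotation provides in Nim comes from the type `Mutex<T>` here.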