For another thread to be able to add a reference, it must already have a reference to that object, so the reference count can't be less than two. Two threads can't both be operating on the same pointer object.
Reference counted pointers themselves are usually not atomic, only the reference count updates are. To atomically access the reference counted pointer itself, you need either a different algorithm (using hazard pointers for example), or some form of mutual exclusion.
shared_ptr's thread safety is simple: it's perfectly thread-safe as long as each thread has its own copy. It of course does not provide any synchronization for the object it's pointing to.
Almost exactly one month ago I had a junior developer share the same shared_ptr across threads, where some threads could be writing to it by calling reset(). They forgot that accessing the shared_ptr itself needs to be protected.
Rust prevents this problem completely though. You are either going to deal with Arc<Mutex<T>> or Mutex<Arc<T>>, depending on the situation (in this case the latter), and forgetting the Mutex will simply cause a compilation failure.
Thanks for this! I've been diving into the basics of lock-free data structures recently. Some posts I've been working through and found useful:
Side question: could anyone recommend exercises for a mid-experienced developer who really has trouble spotting potential race conditions, and generally struggles with concurrency? I've realized I'm spending a lot of time reviewing his PRs, and I'd like to help him improve faster on that topic.
It's actually really easy. Traditional threading has a very obvious design pattern; if someone isn't doing this, you can be sure there's an issue.
TL;DR: if you don't have a "getter/setter", you're doing it wrong, and it's a sign of bad code.
Firstly, make sure that there's only ONE place to write, "write" being changing the variable in any way (obviously make sure you have a lock or other primitive here). If there are multiple places, that's a big smell for a race condition, and a massive turd of a shitty design pattern.
Secondly, just like the write, all reads should be done from a single place.
And then for the advanced developer, you also may need to make sure that the cache is synchronized. Pretty much all libraries will do this for you, but if you ever need to dive into internals that a lot of "guides" or papers DON'T cover, you need fences or whatever your architecture provides. I have this page bookmarked for when I need a quick refresher https://stackoverflow.com/questions/27627969/why-is-or-isnt-...
Sorry, but it is not easy at all. Getters and setters do not help and can give a false sense of security.
And caches do not need to be explicitly synchronized. On all common platforms the hardware will do it for you without user intervention. Fences have a different purpose.
Honestly, just to prove how wrong you are and that what you're saying is blatant misinformation, I'll correct you, since I don't think you're talking in good faith, or have any idea what you're saying.
Regarding fences: in the Intel system programming guide, section 8.1.3:
"
Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with the intent of executing that data
as code is called self-modifying code. IA-32 processors exhibit model-specific behavior when executing selfmodified code, depending upon how far ahead of the current execution pointer the code has been modified.
As processor microarchitectures become more complex and start to speculatively execute code ahead of the retirement point (as in P6 and more recent processor families), the rules regarding which code should execute, pre- or
post-modification, become blurred. To write self-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architectures, use one of the following coding options:
(* OPTION 1 )
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
Vol. 3A 8-5
MULTIPLE-PROCESSOR MANAGEMENT
( OPTION 2 )
Store modified code (as data) into code segment;
Execute a serializing instruction; ( For example, CPUID instruction )
Execute new code;"
It further states:
"The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, IA-32 processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor's current execution pointer the code has been modified. To write cross-modifying code and ensure that it is compliant with current and future versions of the IA-32 architecture, the following processor synchronization algorithm must be implemented:

(* Action of Modifying Processor *)
Memory_Flag := 0; (* Set Memory_Flag to value other than 1 *)
Store modified code (as data) into code segment;
Memory_Flag := 1;

(* Action of Executing Processor *)
WHILE (Memory_Flag ≠ 1)
Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;"
So please tell me how the Intel developer manual is wrong, how I'm wrong, and how they all serve a "different purpose" than what is given. I can link more instances from the Intel Developer Manual if you want. I assume you've read it if you're trying to talk about the subject.
Fences are not mentioned anywhere in your quoted text. Note that serializing instructions are not fences (although there are overlaps: some fences are also serializing instructions).
In any case on x86 the instruction cache is coherent with the data cache all the time, the serializing instruction is needed to flush the instruction pipeline (not the cache).
I'll concede that there are some non-x86 architectures where the instruction cache is not necessarily coherent with the data cache, but it is not terribly relevant as the topic here is concurrency, not self modifying code.
Fences are meant for serializing. Although the Intel documentation says CPUID, you use a fence there. This should be obvious: if you know what a fence is, you know their entire purpose is serialization. I'll put what the manual says on them, but it's not a hard concept; the Intel manual literally says SFENCE = SERIALIZES STORES, LFENCE = SERIALIZES LOADS, MFENCE = SERIALIZES LOADS AND STORES. You clearly have ZERO idea what you're talking about if you don't even know what these simple instructions mean. So to answer your question in the quoted text, where it says to use a serializing instruction, it means to use the appropriate fence! Not only this, but your entire premise of a properly written getter/setter being "wrong" is entirely wrong, as the entire reason for a race condition is having improper getters and setters!
Here are the fences referenced in that section, and any good asm programmer will tell you that fences are used for this purpose (NOT CPUID):
LFENCE (Intel manual Vol. 2A 3-585): Serializes load operations.
"Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction."

MFENCE (Intel manual Vol. 2B 4-22): Serializes load and store operations.
"Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction."

SFENCE (Intel manual Vol. 2B 4-620): Serializes store operations.
"Orders processor execution relative to all memory stores prior to the SFENCE instruction. The processor ensures that every store prior to SFENCE is globally visible before any store after SFENCE becomes globally visible. The SFENCE instruction is ordered with respect to memory stores, other SFENCE instructions, MFENCE instructions, and any serializing instructions (such as the CPUID instruction)."
I'm not sure what you think fences are, or what they're for. But they're clearly for serializing. And we're talking about concurrency; that passage is from the Intel manual's chapter on multiprocessor management, which is concurrency! Do you not even know basic terms? Do you not even read replies?
Please do not resort to personal attacks, we are just having a technical discussion.
In Intel parlance, 'serializing instruction' is a Word of Power; please see Vol. 3A, under serializing instructions. CPUID is listed there.
In the same section, LFENCE, MFENCE, SFENCE are listed as 'memory ordering instructions' and the documentation explicitly says that:
The following instructions are memory-ordering instructions, not serializing instructions. These drain the data memory subsystem. They do not serialize the instruction execution stream
Handling self modifying code requires serializing the instruction stream, so the memory fences are neither necessary nor sufficient[1].
Going back to the original argument, the memory fence instructions (and the various LOCK prefixed operations) do indeed serialize memory operations to guarantee ordering, which I don't dispute. But that's nothing to do with synchronizing caches like you claimed.
[1] IIRC MFENCE is actually also a serializing instruction in practice, but Intel has not yet committed to documenting this behavior as architectural.
> Honestly, just to prove how wrong you are and that what you're saying is blatant misinformation, I'll correct you, since I don't think you're talking in good faith, or have any idea what you're saying.
Are you this rude and condescending in real life to people? A different phrasing: "you are incorrect; here are some sources." This is neutral and still conveys your knowledge. You could even level it up a bit with, "I believe you are incorrect and here is why."
Yes, when someone acts like a complete idiot and with such bad faith, I always respond the same. The person wasn't here for intellectual discussion; he led with a downvote of my original post and posted that "UMMMM AKTUALLY" reply, while being wrong. He's a troll, and trolls shouldn't be treated nicely.
People who do that, leading with the downvote and an "AKTUALLY UR WRONG", hamper a site's intellectual discussion, drive away contributors and people who want to have an actual interesting conversation, and should be treated with no tolerance. My responses treat him like the child he is, one who is closed-minded and doesn't want a discussion.
Feel free to look at my profile; I had a longstanding account before I forgot the login when I changed jobs. This is the first time I've ever encountered a troll on this site, and the first time I've ever taken a dismissive tone with someone. He's unique: he's a troll, and he doesn't want a discussion, he wants to troll.
This sounds very similar indeed. The central thesis of the paper seems to be that the grain of locking is the concern, which is what Disruptor directly touches on as well. In a Disruptor implementation, you are typically processing batches of things, and locks can be acquired along lines of batches.
Tangentially related, but I'm interested in doing file system traversal for finding some paths in Python: a simple scandir takes minutes on the data I'm working with, and shelling out to find is faster but still takes minutes. I know ‘fd’ exists and does parallel traversal, but I'd like to learn more about how to implement something like that: what structures do file systems use, and what limitations should be considered when working with files?
Now, OT was frankly a terrible way to do networking. I learned it because this was in the Mac OS 9 days before protected memory and multitasking, so if you wrote games, you often had to deal with interrupts. OT got lost in the weeds with its complete lack of understanding of UNIX sockets (everything is a file). So I went several years thinking in terms of packets instead of streams. Once I saw the enormous benefits of synchronous blocking streams, I threw OT away and never looked back, as did Apple.
But OT had fantastic atomic primitives and FIFO/LIFO lock-free queues (see pages 657, 641, 645 of the PDF). That stuff was so vastly better than anything I had seen in Windows that it made me sad inside to see people still using mutexes. The use of locking primitives generally sets one up for a number of failure modes like poor performance and deadlocks. Unfortunately, many of those bad habits show up today in asynchronous web programming and even patterns as fundamental as locking database transactions.
It's too hard to explain where I'm coming from in a forum post. But there really is this whole other way of doing things that's been all but dead for over two decades. And it ties in with my frustration around the lack of a true multicore CPU with perhaps 1024 cores (I like the article's URL!).
Basically we could be using self-parallelizing code where instead of putting special "concurrent" keywords or intrinsics around loops etc, compilers could analyze loops for side effects and unroll them internally into a sequence of operations that get sent to all cores simultaneously. That would be analogous to how shaders work (and potentially MATLAB, Julia, maybe Clojure), but on the CPU itself so we wouldn't need GPUs anymore. We would just download a library that provides an OpenGL interface or whatever, but runs in ordinary userland code without having to worry about manually passing data around in vertex buffers.
I'd really like to see this done in perhaps a small microcontroller for under $100, that would run circles around the most expensive processors for certain workloads. I was also thinking that maybe someone could make 1 core that provides a roughly 90s level of performance like a MIPS or PowerPC 601, or even DEC Alpha, running at perhaps 1 GHz with no cache, for under $1. Then a grid of those cores could be laid out on a board with self-organized mesh networking using content-addressable memory and copy-on-write to automagically ensure data locality. Then we'd have a scalable architecture and wouldn't have to rely on video card manufacturers. Moore's law allows for just exactly what I'm saying here, it's just that nobody's done it, because we're used to centralized control of the supply chain, and we all spend all of our time making rent, because the open source funding model is broken, etc etc etc. I've ranted about this endlessly in previous comments.
A "lock" in programming generally refers to a mutex, semaphore, or similar mechanism implemented mostly in software, e.g. see here https://en.wikipedia.org/wiki/Lock_(computer_science) . The distinction between such locks and atomic instructions is useful, since atomic instructions have various nice properties that locks (or "other locks", if you want to count atomic instructions as locks) do not, such as having no chance of deadlocks, guarantee of (eventual) progress, no chance of interruption, and no chance of blocking other threads forever when a thread executing an atomic instruction gets unlucky and is never scheduled again.
Naturally lockfree algorithms built with atomic instructions may not have (all) of these nice properties.
What better name would you suggest for lockfree algorithms?
If you're being pedantic, "atomic instructions" is a misnomer. But "lock-free" is a property of parallel algorithms: with N > 1 processors evaluating an algorithm, there is no state in which the algorithm cannot make progress ("locked" meaning all N processors are waiting).
Whether you want to debate if a physical system can realize a lock-free algorithm is a different debate. Hell, half the "well known" lock-free algorithms assume memory allocation/reclamation isn't blocking.
"Lockfree"? You mean "heavy lockfree", I guess, since atomics depend on implicit hardware locking for proper operation.
And those algos are so critical and dangerous that a mathematical proof must be provided and peer-reviewed.
For example, as far as I can remember, a reader-writer ring buffer with atomic read/write pointers. You have such ring buffers in USB xHCI, AHCI, and NVMe.
Lock-free means at least one thread is always making progress, and atomics allow that trivially: if an atomic operation fails, it means a different thread's atomic operation succeeded. I didn't make up this definition.
some links, in no particular order:
CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?”
https://www.youtube.com/watch?v=ZQFzMfHIxng
CppCon 2016: Fedor Pikus “The speed of concurrency (is lock-free faster?)”
https://www.youtube.com/watch?v=9hJkWwHDDxs
CppCon 2015: Fedor Pikus PART 1 “Live Lock-Free or Deadlock (Practical Lock-free Programming)”
https://www.youtube.com/watch?v=lVBvHbJsg5Y
CppCon 2015: Fedor Pikus PART 2 “Live Lock-Free or Deadlock (Practical Lock-free Programming)”
https://www.youtube.com/watch?v=1obZeHnAwz4
CppCon 2017: Fedor Pikus “Read, Copy, Update, then what? RCU for non-kernel programmers”
https://www.youtube.com/watch?v=rxQ5K9lo034
Tony Van Eerd has also given some talks on lock free queues, I'll leave it as an exercise to the reader to search youtube :)