
Common Multithreading Mistakes in C# – Unsafe Assumptions - douche
http://benbowen.blog/post/cmmics_iii/
======
exDM69
Nice writeup, and most of this isn't specific to C#, but applies to native
code (or Java) as well. These are typical misconceptions of people who have
not studied concurrency.

What is specific to C# (applies to Java too) is having an implicit
mutex/condition pair in each object. I think this is a terrible design mistake
because it's not very practical and it's confusing to newbies (a big stumbling
block for students when I studied concurrency at the uni).

It's not very practical because in most concurrency tasks I've dealt with (not
using C# or Java), there are typically more conditions than mutexes
(typically 2-4 + O(n) conditions per mutex). In Java/C# land, the typical
solution would be to have a complex conditional expression, all the threads
spinning on a .wait() there, and then overuse of .notifyAll() instead of
.notify(), causing lots of spurious wakeups and wasting precious CPU cycles.

It's confusing for the same reason: it's easy to go "I'll solve this
by adding another mutex". Unfortunately, this is seldom the correct solution
(to any problem); a much better result would usually be achieved by adding
more condition variables to wait on.

I wouldn't mind too much about a design mistake in a language I almost never
use (Java/C#) if it weren't the de facto language for learning about
concurrency at universities. This has produced so many engineers with a twisted
view on concurrency. I understand that "Java is easy" and "C is hard" but when
we're already talking about memory models, multi-core, cache coherency and
atomicity, C-the-language isn't the hard part and Java-the-language does very
little to help with those parts.
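Java's java.util.concurrent.locks package does allow several conditions per lock, which is exactly the shape described above. A rough sketch (my own, not from the article) of a bounded buffer with separate not-full and not-empty conditions, so that signal() wakes only a thread that can actually make progress:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// One mutex, two conditions: full-buffer waiters and empty-buffer waiters
// queue separately, so a signal() wakes only a thread that can proceed.
public class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await();
            items.addLast(item);
            notEmpty.signal(); // wake one consumer, not every waiter
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();
            T item = items.removeFirst();
            notFull.signal(); // wake one producer
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Demo: push 1..n through a small buffer and sum on the other side.
    public static int demoSum(int n) throws InterruptedException {
        BoundedBuffer<Integer> buf = new BoundedBuffer<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buf.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < n; i++) sum += buf.take();
        producer.join();
        return sum;
    }
}
```

With the implicit per-object monitor you'd have one wait set for both roles and would be forced into notifyAll().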

~~~
pvg
_In Java/C# land, the typical solution would be to have a complex conditional
expression, all the threads spinning on a .wait() there and then over-use of
.notifyAll() instead of .notify() causing lots of spurious wakeups and wasting
precious cpu cycles._

The typical solution is to use the higher-level constructs provided by the
standard library that take care of these details for you. In Java, at least,
explicit mutexes, notify, etc in application code are roughly the equivalent
of typing 'AES'. I'm somewhat surprised this C# article happily tells you to
just keep adding locks until you think things work. "I mean if you have a
hunchback, just throw a little glitter on it, honey, and go dancing."

~~~
zzzcpan
It's more complicated than that. Implicit synchronization and multithreading
in libraries is error-prone, performs poorly, and carries significant
cognitive overhead. It's a liability if you are chasing performance and a
liability if you aren't. So, to be good citizens, libraries have to assume
that a user might need complete control in choosing and using a concurrency
model (unless, of course, they are the ones providing concurrency models
themselves).

~~~
pvg
I'm talking about libraries that provide higher-level concurrency constructs,
not implicit multithreading in a library that does, say, image manipulation.
The whole point of the former is to reduce cognitive load and errors.
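For instance (a minimal Java sketch, since the thread treats Java and C# interchangeably on this point), a producer/consumer hand-off needs no explicit mutex, wait(), or notify() in application code once you reach for the standard library's BlockingQueue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// The standard library already ships the construct: no explicit mutex,
// wait(), or notify() appears anywhere in application code.
public class HandoffExample {
    public static int sumViaQueue(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) queue.put(i); // blocks when full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < n; i++) sum += queue.take(); // blocks when empty
        producer.join();
        return sum;
    }
}
```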

~~~
zzzcpan
I assumed you were talking about things like concurrent queues, etc. They do
implicit synchronization and enforce a specific concurrency model, so they
should not be used by libraries.

~~~
m_fayer
Yes, but I think he also meant things like the TPL, PLINQ, TPL Dataflow, RX,
Akka.NET and so on. I have been happily using one or more of these to write
tons of event driven and/or parallel client-side code and haven't touched a
lock in years.

And libraries exposing APIs through observables is a great idea.

~~~
eropple
I don't dislike TPL and PLINQ (though I don't much care for Akka.NET at all),
but at the same time I pretty regularly write code with locks. There are
problems for which the simplest solution is to treat it stupidly and
imperatively and to run those threads side-by-side with well-defined critical
sections.

~~~
m_fayer
For dead-simple parallel execution of a handful of jobs, IMO nothing beats
mapping data to tasks with Linq and then "await Task.WhenAll". And if they
have to share data, use something in System.Collections.Concurrent.
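The Java spelling of the same pattern, sketched with CompletableFuture (playing the role of LINQ Select + await Task.WhenAll; the method names here are mine):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class FanOut {
    // Map each input to an async job, then join them all -- roughly the
    // Java analogue of mapping data to tasks and awaiting Task.WhenAll.
    public static List<Integer> squareAll(List<Integer> inputs) {
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(i -> CompletableFuture.supplyAsync(() -> i * i))
                .collect(Collectors.toList());
        // allOf yields a future that completes when every job has completed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```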

~~~
emn13
For fun bits of API inconsistency there, would you think that...

"task.Wait()" is to "await task" as "Task.WaitAll(tasks)" is to "await
Task.WhenAll(tasks)"?

Maybe it's just me, but I sure did.

~~~
bonzini
WhenAll returns a task (a promise, essentially) that completes "when all" its
arguments have completed.

~~~
emn13
I seem to have misremembered, the odd one out was WhenAny, and the issue is
that it crashes when given the empty set, instead of waiting forever (which is
fine if you're dealing with a specific number of tasks, but not fine if you're
actually using it for a dynamic number of tasks).

I could have sworn it was WaitAll, but heck - clearly not! Sorry :-).

------
sharpercoder
The more threading code I have seen, the more convinced I become that it
should be avoided at all costs. The pitfalls are non-obvious and can silently
corrupt data. Rather, threading should live behind understandable
abstractions (like await/async and Task/Task<T>, though those have their own
pitfalls) where applicable. I can also see value in an "unsafe" block (e.g. a
"threading {}" keyword) where the writer of the code communicates to the
reader: "inspect this with special threading awareness".

~~~
ro_sharp
I concur, with a caveat - feel free to use threading / async I/O if you think
you really understand the runtime behaviour of your code.

I've seen so much code that just runs TaskFactory.StartNew some arbitrary
number of times in a loop to run some CPU intensive task with no/effectively
no I/O, presupposing a performance improvement as a result.

Understand runtime performance. Understand the different failure modes
inherent in multi-threaded approaches, and lastly: test, profile, test and
profile again!

~~~
thomasz
Is there something wrong with

        for (int i = 0; i < Environment.ProcessorCount; i++)
            Task.Factory.StartNew(...);

?

~~~
quinnftw
Potentially, yes. Assuming that you are correctly synchronizing the actions
inside each Task so as to avoid race conditions (a big assumption), there is
no control over the number of threads here. What if your program is already using a
bunch of threads for something else, and then you go ahead and spawn off 10
more? You will oversaturate the CPU and end up with virtual threads, thus
incurring scheduling overhead. In my opinion when you are doing this type of
thing you should almost always use a thread pool to ensure that you aren't
creating an unreasonable number of threads.
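A sketch of that advice in Java (the same idea applies to C#'s managed thread pool): size the pool to the core count and queue the jobs, rather than spawning a thread per job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PoolExample {
    // Submit many CPU-bound jobs but cap concurrency at the core count,
    // so the OS is not juggling more runnable threads than there are CPUs.
    public static long runJobs(int jobs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        AtomicLong total = new AtomicLong();
        for (int i = 1; i <= jobs; i++) {
            final long n = i;
            pool.execute(() -> total.addAndGet(n)); // queued, not a new thread
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total.get();
    }
}
```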

------
ibgib
I enjoyed this article. I started concurrency with Delphi 5 using critical
sections, mutexes, semaphores, etc. Then I started using C# which was a major
upgrade, and I loved having the simplicity of things like the
`BackgroundWorker` class and the `lock` statement which certainly beat the
need to write my own multi-threaded class for every little thing in Delphi. I
recall some of the atomicity problems the article mentions, and I would have
to juggle when to use Interlocked vs. locks, etc. For example, it mentions floats
and hardware-specific issues with 64-bit ints, but from what I recall I
treated booleans as atomic and not needing Interlocked or locks, but floats I
did. That CPU caching detail level though sounds mighty insidious. I also
forged ahead and used TPL, async/await, Rx (Observables are pretty awesome)...

But now I live on Erlang's BEAM (via Elixir) and I freaking love it. The real
gain for me is that I found I didn't need mutexes, locks, critical sections,
etc., because the super lightweight thread-like Erlang processes (not OS
processes) themselves run in parallel _but each one runs in a single-threaded
manner_. This effectively turns each process itself into its own critical
section, and it's this aspect that I personally have found _extremely_
valuable.

~~~
bonzini
Do your Erlang processes never execute an asynchronous (await-like) operation?
If they did, any shared data would need to be protected by a mutex, just as
with multiple OS threads.

~~~
ibgib
> _Do your Erlang processes never execute an asynchronous (await-like)
> operation? If they did, any shared data would need to be protected by a
> mutex, just as with multiple OS threads._

I think perhaps what you are getting at is that there is still need for
coordination for parallel processes, and that is totally correct. But each
BEAM(!) process has its own non-shared memory and runs on a single thread.
This is my point: it _doesn't_ need the mutex because the process's
execution is its own critical section. I've structured my ibGib engine to be
more functional and the need for the mutex/critical sections has disappeared
for me.

For example, check out this SO post that I just googled:
[http://stackoverflow.com/questions/28554114/applying-a-mutex-into-an-erlang-example](http://stackoverflow.com/questions/28554114/applying-a-mutex-into-an-erlang-example)

In the answer, he tells the OP (who thinks he needs a mutex) that what he can
actually do is just do the work in the process. This kind of understanding of
single-threaded "bounded context"-like execution, combined with ubiquitous
immutability has been massively helpful for me with ibGib.
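A rough imitation of that idea in Java (a single-threaded executor standing in for a BEAM process; all names here are mine): the state is owned by one thread and mutated only through "messages", so no lock appears anywhere.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A poor man's BEAM process: state owned by exactly one thread and
// touched only by "messages" (tasks) submitted to that thread's mailbox.
public class CounterProcess {
    private long count = 0; // only the mailbox thread ever touches this
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    public void increment() {
        mailbox.execute(() -> count++); // serialized, so no lock is needed
    }

    public long get() {
        // Reads are messages too, so they observe all prior increments.
        return CompletableFuture.supplyAsync(() -> count, mailbox).join();
    }

    public void shutdown() {
        mailbox.shutdown();
    }
}
```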

In C#, for example, I was trying to apply `Task<ib>` _all over the place_
(yes, the lowercased interface name is non-idiomatic... totally aware of this
sin in C#). And so even with async/await awesomeness, the code is just
riddled with async/await, blah blah blah. The same goes for using Rx in C# for
a "microservices" engine (I started it before that was a fad and I called them
autonomous services lol) and my more recent POC with Rxjs in
TypeScript/JavaScript. The point is that the whole approach to concurrency and
whatnot is "add-ons" in the form of mutexes, semaphores, critical sections,
locks, barriers (I've used quite a few over the years). But with programming
on the BEAM(!) with Elixir and Erlang, parallel coding is just a more natural
experience, because it's just how it's done.

In my view, this largely stems from the aspect I mentioned: that they have
_parallel_ processes, each with its own memory, each running on a single thread,
communicating with message passing (now called the Actor Model but they didn't
know that at the time). On top of this fundamental design decision, they built
up an awesome infrastructure with OTP, Supervision trees, and more. But
anyway, I didn't mean to write that much - but it's just been a delightful
experience, since my very first application in Delphi was a transcription app
with a from-scratch multi-threaded realtime document checker (like a spell
checker, but more pluggable and geared towards transcription with text
expansion which is a totally different use case than existing off-the-shelf
spell checkers).

------
achr2
It's odd to have a whole section on coherency and never mention the built-in
`volatile` keyword, whose entire purpose is to indicate that a variable
requires fresh reads.

~~~
squeaky-clean
> In this case, the correct solution is to mark field _A as volatile. If
> that’s done, the compiler shouldn’t reorder the reads of _A and _B, because
> the load of _A has load-acquire semantics. However, I should point out that
> the .NET Framework, through version 4, doesn’t handle this case correctly
> and, in fact, marking the _A field as volatile will not prevent the read
> reordering. This issue has been fixed in the .NET Framework version 4.5.

From the article linked where they mention volatile and the C# memory model.
Seems like volatile may not actually have worked as intended for this case!

~~~
achr2
I'm not saying volatile is a good solution, but to not cover it in this
article is an odd omission - especially since he shows an explicit memory
barrier.

------
mathw
I'd like some more explanation of why the author thinks it's better to use a
lock statement than an Interlocked method most of the time.

This is really just fundamental concurrency stuff though. Something which is
sadly in short supply in some people's skill sets, but then I guess not
everyone spent four years working on massively multithreaded C++ software
early in their career like I did.

I'd really prefer it if C# made you share memory explicitly - default shared
memory concurrency is just asking for trouble, in this and many other
languages, because you have to do extra to do things _right_ , rather than
extra to do things _wrong_.

~~~
achr2
I agree about the use of Interlocked. Obviously if you already need a lock
around additional code, there is no need to use Interlocked, but for any
single statements where Interlocked is available, it is a simple performance
gain. That said I have seen several incorrect and unnecessary
`CompareExchange` usages that could have been avoided with a simple lock...

~~~
Sharlin
> That said I have seen several incorrect and unnecessary `CompareExchange`
> usages that could have been avoided with a simple lock...

This is, obviously, the answer to the GP's question.

~~~
TillE
That's no reason for avoiding atomics in simple use cases. "Use locks and then
profile" is fairly bad advice. Forget the profiler and _always_ use atomics in
simple situations where you know exactly how they work. Just don't try
anything complicated with them.

------
jaegerpicker
It's a cool series. Mostly review for me but I certainly passed it to my team
to review. Ultimately I say that 99% of the time the correct answer is to not
use raw threads and never share memory. There are much better concurrency and
parallel patterns out there. Akka(.net in this case), Coroutines and Channels,
Erlang Actors, Clojure's core.async, .net's async/await even. Honestly if you
are reaching for threads you probably should ask yourself if some other way
would be better and safer.

------
cm2187
I learned something about Thread.MemoryBarrier().

I expected the following to be safe:

    
    
        Dim V(99) As Integer
        Parallel.For(0, 100, Sub(i)
                               V(i) = i
                             End Sub)
        Dim Result = V.Sum
    

If I understand correctly it looks like I need to flush the memory before
accessing V:

    
    
        Dim V(99) As Integer
        Parallel.For(0, 100, Sub(i)
                               V(i) = i
                             End Sub)
        Threading.Thread.MemoryBarrier()
        Dim Result = V.Sum

~~~
taspeotis
The CLR has some implicit memory barriers that make most things "just work."

Take a look at "[t]he following implicitly generate full fences" here:
[http://www.albahari.com/threading/part4.aspx](http://www.albahari.com/threading/part4.aspx).

Parallel.For would _probably_ be covered by "[a]nything that relies on
signaling." There's more info scattered about in some Stack Overflow answers.
The content is useful, just unofficial.

E.g.
[http://stackoverflow.com/a/6932271/242520](http://stackoverflow.com/a/6932271/242520),
[http://stackoverflow.com/a/681872/242520](http://stackoverflow.com/a/681872/242520)

------
gnur
I know there is a website that has gamified certain race conditions. There was
an interface that allowed you to step through code in two "threads", and the
goal was to defeat the enemy army by deadlocking their code.

Does anyone know where it is? It was very informative.

~~~
ttrunck
I know this one:
[https://deadlockempire.github.io/#menu](https://deadlockempire.github.io/#menu)

------
jsingleton
Nice post. Simply written and easy to understand.

There are two others:

\- [http://benbowen.blog/post/cmmics_i/](http://benbowen.blog/post/cmmics_i/)

\-
[http://benbowen.blog/post/cmmics_ii/](http://benbowen.blog/post/cmmics_ii/)

It's well worth learning about the Interlocked class if you want to do
parallel programming. If you get it wrong then you will see incorrect/corrupt
results. It's also worth keeping in mind that for simple tasks it can be
quicker to do them single threaded, due to the overheads involved. I
demonstrated this in a simple benchmark app [0] that I wrote for a chapter of
my book on ASP.NET Core [1].

[0]: [https://github.com/PacktPublishing/ASP.NET-Core-1.0-High-Per...](https://github.com/PacktPublishing/ASP.NET-Core-1.0-High-Performance/blob/master/Chapter%206/ParallelBenchmarking/ParallelBenchmarking/src/ParallelBenchmarking/Program.cs)

[1]: [https://unop.uk/book/](https://unop.uk/book/)

------
manigandham
Always use Interlocked over a lock statement if you're just dealing with
incrementing numbers. It's one line, faster, and always safe. The author is
mistaken in advocating a lock in these situations.

You only need lock() and the full Monitor class if there's more to be done
within the locked section.
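The same trade-off exists in Java; a sketch comparing the two (AtomicLong standing in for Interlocked, `synchronized` for lock):

```java
import java.util.concurrent.atomic.AtomicLong;

public class Counters {
    private final AtomicLong atomic = new AtomicLong();
    private long plain = 0;

    // One atomic hardware operation; no monitor is acquired.
    public long atomicIncrement() { return atomic.incrementAndGet(); }

    // Also correct, but pays monitor enter/exit on every call.
    public synchronized long lockedIncrement() { return ++plain; }

    public long atomicValue() { return atomic.get(); }
    public synchronized long lockedValue() { return plain; }
}
```

Both versions count correctly under contention; the atomic one just does it without a lock.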

------
sidlls
These are common mistakes novices to multithreaded programming make, presented
and explained well. Very nice.

In my experience another mistake made by novices and journeymen alike (at
least occasionally by the latter) is to reach for this tool without careful
consideration of whether it's even necessary.

~~~
bbcbasic
I'm such a novice I avoid reaching for the tool ;)

------
TazeTSchnitzel
“Newbie” question: if reading a value without having a lock on it can give you
a torn read, wouldn't it be possible to crash the CLR by getting a torn read
on a reference? Or, does it account for this somehow?

~~~
Eyas
A reference fits perfectly in a word-sized chunk. A pointer/reference is
typically 32 or 64 bits wide, since it is simply pointing to a location in
memory, so reads and writes of it are atomic and cannot tear.

------
Pxtl
The one that caught me recently was assuming that you could use a Dictionary
as a concurrent cache.

You can't. Dictionary can be _read_ concurrently, but once you start writing
to it concurrently all bets are off. ConcurrentDictionary implements
IDictionary and allows concurrent writes.
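Java's ConcurrentHashMap tells the same story, and its computeIfAbsent additionally guarantees the loader runs at most once per key (unlike ConcurrentDictionary.GetOrAdd, whose value factory can run more than once under a race). A sketch of a safe concurrent cache (names are mine):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SquareCache {
    private final ConcurrentMap<Integer, Integer> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts loader runs

    // Safe under concurrent writers; the loader runs at most once per key
    // even when several threads race on the same missing key.
    public int square(int key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return k * k;
        });
    }
}
```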

Also, Entity Framework. We had an icky bug coming from somebody storing the
datacontext in a member variable. Don't do that.

~~~
cm2187
The one I am concerned about is iterators (for each). I have read somewhere
that iterators are stateful and I should be careful when using them in a
multi-threaded environment. Currently I am ignoring this advice and use them
happily (on static lists and dictionaries) but I am not sure if this isn't
going to bite me some day. But I don't understand how it would bite me.

~~~
jdmichal
That is because your different threads are not using the same `Enumerator`.
I'm guessing your threads look something like this, based on your mention of
`foreach`:

    
    
        // In multiple threads:
        foreach (var element in collection) { ... }
    

`collection` is not an `IEnumerator` but rather an `IEnumerable`. The
`foreach` statement calls `collection.GetEnumerator()`, so each thread gets
its own enumerator. You would only see inconsistencies if another thread
modified the collection during iteration.

Try this instead, and you'll guaranteed see issues:

    
    
        // Before:
        IEnumerator enumerator = collection.GetEnumerator();
    
        // In multiple threads:
        while (enumerator.MoveNext()) {
            var element = enumerator.Current;
        }

------
Pigo
I've often wondered what the _real_ gains are for using async/await methods
inside a web API call per se, especially when it's a single task being
performed, like a query. I've tried to set up concurrent queries and do
WhenAll or Parallel.ForEach, but I think I always hit a roadblock because
each query has to come from a separate context for that to work.

~~~
kaliku2
While the awaitable task is... well... waiting for the query, the thread
servicing it is returned to the thread pool to be reused. This helps with
thread-pool starvation when the process must service many requests.

~~~
m_fayer
Exactly.

Here's another simple way to think about it: Handling a request usually
invokes a couple of layers of your code until you hit some IO - database, file
system, network, etc. If that IO is done in a blocking fashion, this request
is consuming server resources THE WHOLE TIME, even while it's doing nothing
but waiting for a network response. On the other hand, if the IO is non-
blocking (async), then this request consumes no resources while it's waiting.

So imagine you have a request that takes 2000ms to service, and 1700ms of that
time is spent waiting on the database to fetch some data. Without async, that
request will consume 2000ms of CPU time. With async, only 300ms of CPU time is
used.

~~~
tuwtuwtuw
No, the request won't consume 2000ms of CPU time if it's blocking. It will
prevent the thread from doing other work, which may be bad, but the CPU itself
does not have to spend all 1700ms on that thread. At least my CPU doesn't use
100% of a core while waiting for a network packet from the database server.

~~~
m_fayer
Should have phrased it as "the blocking wait will consume resources (keep a
thread busy) for 1700ms whereas the async one will not." Hard to stay both
accurate and extremely simple when talking about concurrency.

------
jimmaswell
Not working very well on mobile:
[https://imgur.com/xPn8YLc](https://imgur.com/xPn8YLc)

A simple table would have worked fine.

------
sundvor
Great article. I learned a lot from reading this; being relatively new to C#,
now I certainly know what to look for when coming across multithreading.

------
samfisher83
I think I learned this while learning Peterson's algorithm. The professor
emphasized that the flag had to be a variable that could be read and written
in a single operation.

------
jorgeleo
Interestingly, these recommendations also apply to the use of the async/await
pair.

~~~
mihular
Only when running code in different threads. async/await isn't necessarily
multithreaded. Actually usually isn't.

~~~
jorgeleo
Which makes it worse, because it opens up the possibility of intermittent
bugs, and also those "it works on my machine, I don't know why it does not
work on the server" situations.

So it is better to treat them as multithreaded always... which falls back on
the original post.

------
novaleaf
The biggest gotcha I've seen with C# multithreading is the use of the basic
.NET collections: Dictionary<K,V> and List<V>.

They will mostly work (like multiple threads reading, or one writing while
another retrieves), except when two threads try to write simultaneously, or
when one writes while another enumerates.

When that happens, all bets are off, and you will get obscure failures such
as null return values, enumeration past the end of the collection, or a
corrupt collection that throws exceptions from (seemingly) random and
unrelated areas.

I believe the new concurrent collections take care of this, but it is still
so easy for beginners to shoot themselves in the foot with concurrent
programming. Very much in stark contrast to how dev-friendly the rest of the
CLR is.
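For contrast, a sketch (in Java, the same idea as System.Collections.Concurrent) of many threads writing to one map safely, using an atomic per-key merge instead of a racy read-modify-write:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WordCount {
    // Many threads writing to one map: merge() is atomic per key, unlike
    // put() on a plain HashMap, which can corrupt the table under races.
    public static Map<String, Integer> count(String[] words, int threads)
            throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (String w : words) counts.merge(w, 1, Integer::sum);
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return counts;
    }
}
```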

~~~
dahauns
_New_ concurrent collections?

.NET 4 is over eight years old now.

And for all these years, the docs have explicitly informed about lack of
thread safety in the non-concurrent versions and referenced the concurrent
equivalents.

~~~
novaleaf
yes, but it doesn't stop novices from doing it, or people trying to
parallelize existing code.

