
Concurrent Programming, with Examples - begriffs
https://begriffs.com/posts/2020-03-23-concurrent-programming.html?hn=1
======
bmn__
Next article in the series: now that you know about the dangerous/complicated
primitives, don't ever touch them again. Instead use the high-level safe
concurrency/parallelism mechanisms in your programming language:
futures/promises, nurseries, channels, observers, actors, monitors. Ideally,
these should be built-in, but a library whose API composes well into most
programs will also do.

Data races can be statically removed by carefully restricting certain parts of
the language design, see Pony. [https://tutorial.ponylang.io/#what-s-pony-
anyway](https://tutorial.ponylang.io/#what-s-pony-anyway)

Bonus: learn aspects of deadlocking by playing a game:
[https://deadlockempire.github.io/](https://deadlockempire.github.io/)

~~~
closeparen
I disagree; first you need to write a few toy projects with the dangerous
primitives, to really internalize how tricky they are.

It's because of my concurrent C homework assignments that I was able to really
appreciate Go's channels in my first internship.

~~~
keymone
“Never touch them again” means don’t reach for those primitives when higher
level tools are available. Learning about them is fine.

It’s like with cryptography - don’t roll your own solutions just because you
know what xor is. You’ll fail miserably.

------
Random_ernest
The article is very nice, thanks a lot for it. Especially since I often hear
the words concurrency and parallelism thrown around without any distinction.

Very off topic, but I have read several times the argument that the rise of
functional programming is due to its easy concurrency (since functions don't
have side effects) and that concurrency becomes more and more important due to
Moore's law being dead (i.e. we can't scale the hardware up, we have to add
cores to our processors).

Could someone with more experience comment on that? Is concurrency really
easier in functional languages and is the rising importance of concurrency a
valid reason to look into functional programming?

~~~
01100011
I'm just an idiot EE who learned to program, but it seems like functional
languages without side effects accomplish their task by basically working on
data structures stored on the stack. That means each call stack has its own
data structures. If you wrote threaded code so that each thread had its own
data structures, and worked off a shared set of read-only data structures,
you wouldn't need to worry about locking.
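
A minimal pthreads sketch of that pattern, with made-up names and sizes: each
thread reads a shared, read-only array and writes only into its own
accumulator, so no locks appear anywhere.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000

    static double input[N];                /* shared, read-only after init */

    struct worker { int id; double sum; }; /* each thread's private state */

    static void *work(void *arg)
    {
        struct worker *w = arg;
        /* each thread touches a disjoint slice of read-only data and
           its own struct, so there is nothing to lock */
        for (int i = w->id; i < N; i += NTHREADS)
            w->sum += input[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];
        struct worker ws[NTHREADS];

        for (int i = 0; i < N; i++) input[i] = 1.0;

        for (int i = 0; i < NTHREADS; i++) {
            ws[i] = (struct worker){ .id = i };
            pthread_create(&tids[i], NULL, work, &ws[i]);
        }
        double total = 0;
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);   /* join orders the read below */
            total += ws[i].sum;
        }
        printf("%f\n", total);
    }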

IME locking isn't really all that hard until you need to squeeze out
performance. If you can handle one big dumb lock around everything, it's easy.
The finer grained your locking gets, the harder it is to get right (even
then, lock hierarchies can help to a great degree).
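
The big-dumb-lock version is about this much code (a sketch; the names are
made up):

    #include <pthread.h>

    /* one global lock guards every shared structure in the program */
    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static long shared_state;            /* stands in for all shared data */

    void update(void)
    {
        pthread_mutex_lock(&big_lock);
        shared_state++;                  /* any shared accesses go here */
        pthread_mutex_unlock(&big_lock);
    }

Correctness is easy to see; the cost is that every thread serializes on the
one lock.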

~~~
bcrosby95
The problem is that if you have one big dumb lock around everything then you
have no actual concurrency or parallelism.

~~~
01100011
Depends on what the big dumb lock protects. You may still be able to process
data you accessed while holding the big dumb lock.

------
rrss
Does anyone know the history behind the distinction between concurrency and
parallelism presented here? The most frequent reference I see is Pike's
"Concurrency is not parallelism" talk, but I'm curious who first came up with
this distinction.

~~~
rrss
More food for thought, as I've been thinking about this:

1. std::thread::hardware_concurrency

This is the number of threads that can execute in parallel, no?

2. "Memory-level parallelism"

How many memory operations can be "outstanding" at once - seems comparable to
a single core issuing multiple disk reads. The memory operations aren't really
serviced simultaneously, they just have overlapping lifetimes.

For more fun, some people refer to the case where performance is limited by
the amount of memory-level parallelism available as "concurrency-limited":
[https://sites.utexas.edu/jdm4372/2018/01/01/notes-on-non-temporal-aka-streaming-stores/](https://sites.utexas.edu/jdm4372/2018/01/01/notes-on-non-temporal-aka-streaming-stores/)
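
For what it's worth, the usual way to ask the same question as
std::thread::hardware_concurrency from C is sysconf (a sketch;
_SC_NPROCESSORS_ONLN is a common glibc/BSD extension, not strictly POSIX):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* logical processors currently online, i.e. how many threads
           the machine can actually run at the same time */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("hardware concurrency: %ld\n", n);
    }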

~~~
pjc50
You can have concurrency without parallelism per the definition of the article
- on a single processor system with timeslicing, for example.

SIMD systems effectively give you parallelism without concurrency - only one
instruction is executing, but it's operating on multiple dataflows.
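
For instance, with x86 SSE intrinsics a single instruction performs four
additions at once: one control flow, four data lanes (a minimal sketch):

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE */

    int main(void)
    {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);   /* one instruction, four sums */
        _mm_storeu_ps(c, vc);
        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    }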

Your linked definition of "concurrency limited" seems to refer to utilisation.
In the scenario described, how effectively the processor can be utilised
depends on how many _concurrent_ tasks it has in progress so it has something
to do while one of them is waiting for a cache miss.

------
01100011
> The sched_yield() puts the calling thread to sleep and at the back of the
> scheduler’s run queue.

Not necessarily, but it is fine for this purpose I suppose. See
[https://news.ycombinator.com/item?id=21959692](https://news.ycombinator.com/item?id=21959692)

Glad to see lock hierarchies mentioned. Barriers are new to me so that was
nice.

IMO, it would be nice to at least have a mention of lock-free techniques and
their advantages and disadvantages.
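
For a taste of what such a mention might look like: the simplest lock-free
construct is an atomic counter with C11 stdatomic (a sketch; real lock-free
data structures get much hairier):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long hits;

    static void *work(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            /* indivisible read-modify-write: no lock, no lost updates */
            atomic_fetch_add(&hits, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, work, NULL);
        pthread_create(&b, NULL, work, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%ld\n", atomic_load(&hits));   /* always 200000 */
    }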

~~~
Joker_vD
The main disadvantage of lock-free techniques is that you have to write code
without any critical sections, that is, you have to properly manage arbitrary
interleavings of actions. It's hard enough to manage staleness/inconsistency
of data at the high (business logic) level, never mind the low level where
the code is not even executed in the written order.

~~~
01100011
Also, lock-free usually means atomics, which means memory fences. Those can
be slow. It's another tool you should have available, though.

~~~
scott_s
Memory fences and atomics are a part of lock-based code as well. They're just
hidden by the locking and notification primitives.

------
inaseer
There is a good body of knowledge around dealing with concurrency issues
within a single process. We have tools (locks, semaphores, ...) to deal with the
complexity as well as programming paradigms which help us write code which
minimizes data races. It's interesting to realize that in a world with an
increasing number of micro-services manipulating shared resources (a shared
database, shared cloud resources), or even multiple nodes backing a single
micro-service all reading and writing to shared resources, similar concurrency
bugs arise all the time. Unlike a single process where you can use locks and
other primitives to write correct code, there is no locking mechanism we can
use to protect access to these global shared resources. We have to be more
thoughtful so we write correct code in the presence of pervasive concurrency,
which is easier said than done.

~~~
abjKT26nO8
_> Unlike a single process where you can use locks and other primitives to
write correct code, there is no locking mechanism we can use to protect access
to these global shared resources._

Databases provide transactions. This mechanism is also an inspiration for a
synchronisation model called Software Transactional Memory proposed for
Haskell, and used as "the" synchronisation model in Clojure. Locks and
semaphores are lower-level primitives, and they're much harder for humans to
reason about than CSP or STM.
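
Not real STM, but the optimistic retry loop at the heart of it can be
sketched with a C11 compare-and-swap (a hypothetical account example):

    #include <stdatomic.h>

    static _Atomic int balance;

    /* optimistic "transaction": read, compute, commit only if nothing
       changed in between, otherwise retry with the fresh value */
    void deposit(int amount)
    {
        int old = atomic_load(&balance);
        while (!atomic_compare_exchange_weak(&balance, &old, old + amount))
            ;   /* failed CAS refreshed 'old'; recompute and retry */
    }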

~~~
inaseer
Yes, database transactions should be heavily leveraged wherever possible.
We've often had to write services which create multiple resources in response
to user requests. As an example, create an entry in the database and trigger
the creation of, say, an Azure storage account. Transactions across
independent services and resources don't work and correctness requires
thoughtful design. In the more general case, whenever your service talks to
more than one micro-service to complete an operation, you will probably have
to think through issues of consistency and transactionality.

------
highhedgehog
Is anyone aware of good examples that can be used to explain and implement
parallelism/concurrency other than the banker's algorithm? I have seen it too
many times.

~~~
giu
The dining philosophers problem comes to mind, which was originally
formulated by Dijkstra [0]. You can find implementations in different
languages at Rosetta Code [1].

[0]
[https://en.wikipedia.org/wiki/Dining_philosophers_problem](https://en.wikipedia.org/wiki/Dining_philosophers_problem)
[1]
[https://rosettacode.org/wiki/Dining_philosophers](https://rosettacode.org/wiki/Dining_philosophers)
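
The heart of a deadlock-free C solution is just "always grab the
lower-numbered fork first", i.e. the lock hierarchy idea from the article (a
condensed sketch):

    #include <pthread.h>
    #include <stdio.h>

    #define N 5
    static pthread_mutex_t forks[N];

    static void *philosopher(void *arg)
    {
        int i = (int)(long)arg;
        int left = i, right = (i + 1) % N;
        /* lock hierarchy: always take the lower-numbered fork first,
           which breaks the circular wait of the naive solution */
        int first = left < right ? left : right;
        int second = left < right ? right : left;

        for (int meal = 0; meal < 3; meal++) {
            pthread_mutex_lock(&forks[first]);
            pthread_mutex_lock(&forks[second]);
            printf("philosopher %d eats\n", i);
            pthread_mutex_unlock(&forks[second]);
            pthread_mutex_unlock(&forks[first]);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[N];
        for (int i = 0; i < N; i++) pthread_mutex_init(&forks[i], NULL);
        for (long i = 0; i < N; i++)
            pthread_create(&t[i], NULL, philosopher, (void *)i);
        for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    }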

~~~
highhedgehog
Thank you! Actually I saw that too.

I was hoping for more real-life examples where you can see the effects of
concurrency/parallelism.

------
jayd16
No mention of volatile variables or the concept of stale CPU cache reads when
a value is written by another core. I think it's a pretty common and
fundamental concept that should be in a write-up such as this.

~~~
bonzini
If you use the standard blocking synchronization primitives, you cannot have
stale reads. If you don't, the right way to introduce them would be via the
C11 memory model relationships (synchronizes-with, happens-before). Volatile
shouldn't be touched with a ten-foot pole except for synchronization with
signal handlers.
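
Concretely, publishing data to another thread without locks looks like this
in C11 (a minimal sketch):

    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;                    /* plain, non-atomic data */
    static atomic_bool ready;

    void producer(void)
    {
        payload = 42;
        /* release store: everything written before it ... */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    int consumer(void)
    {
        /* ... is visible after an acquire load that sees 'true'.
           The pair synchronizes-with, establishing happens-before,
           so the read of 'payload' below is not a data race. */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;   /* spin */
        return payload;                    /* guaranteed to be 42 */
    }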

------
thallukrish
My experience is that single-threaded execution, replicated with local data
per instance and fine-grained remote lookups when needed, is an easier way to
maintain code. Concurrency and all that synchronisation are damn hard to code
and debug.

~~~
tabtab
In my opinion it's overhyped. I'll probably take point hits for claiming it,
but so be it. Let the truth ring.

Why copy techniques meant for Netflix and Facebook when your org or app is
most likely 1/1000th their size? Phallic size jealousy at work.

Most concurrent and parallel work can and should be done on a tried-and-true
RDBMS for most orgs and apps. Use transactions/rollbacks properly and let the
RDBMS manage most of the grunt work instead of reinventing the wheel in app
code.

K.I.S.S. and use-the-right-tool-for-the-job.

~~~
thallukrish
If you want extreme scale with most data in memory, such as a search engine
processing millions of documents through data pipelines, then an RDBMS isn't
the way to go. For other MVC-type apps, at a reasonable scale-out,
straightforward models with an RDBMS should do.

~~~
tabtab
Note that existing RDBMS are gradually adding and improving their text search
engines. Of course there will always be specialized situations that need
dedicated high-end text search engines.

------
latrasis
Thank you for the great read! Wondering how io_uring would fit into this
picture... I'd be very interested in the author's review:
[https://kernel.dk/io_uring.pdf](https://kernel.dk/io_uring.pdf)

------
Jahak
Interesting article and a great blog

------
moring
I'm a bit disappointed that the article doesn't explain the need for a
memory/consistency model and how it interacts with CPU caches. Locks are the
easy part, and the article makes you think that with them you can now write at
least simple concurrent programs.

Why is that? I'm pretty sure that the author's intention is not to equip the
readers with the tools to make buggy programs, yet that is exactly what
happens here.

~~~
01100011
Don't the standard synchronization APIs documented in the article handle
memory barriers for you?

~~~
rrss
Yes. As long as you use correctly-constructed synchronization primitives (e.g.
pthreads), you don't need to worry about memory consistency.

When you need to start worrying is when you start implementing your own
synchronization (either rolling your own primitives or going lock-free).

'moring needs to clarify what they are talking about. It's perfectly possible
to write correct code using pthreads on modern hardware with no understanding
of memory consistency.

