Teaching Concurrency (2009) [pdf] (microsoft.com)
151 points by luisfoliv on Sept 21, 2016 | 44 comments



Highly recommended: Allen Downey's The Little Book of Semaphores (free PDF here: http://greenteapress.com/wp/semaphores/)


I think the first thing people should be taught about concurrency... is when not to use it.

Concurrency can result in increased maintenance costs and complexity.

Concurrency is also not more efficient on a single core.

Concurrency can help with latency and response time.

In embedded systems in particular, there is an overuse of concurrency, which often results in bloated, complex code.


> Concurrency is also not more efficient on a single core.

Concurrency can be more efficient even on a single core. When blocking synchronous I/O is involved, concurrency may help saturate the bandwidth with multiple in-flight requests.
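
A minimal sketch of that idea (assuming plain blocking HTTP; the URLs and pool size are made up): even pinned to a single core, a handful of threads blocked on I/O keeps several requests in flight, so the link stays busy while the CPU sits mostly idle.

    import java.io.InputStream;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    public class ConcurrentFetch {
        public static void main(String[] args) throws Exception {
            List<String> urls = List.of(
                "https://example.com/a", "https://example.com/b",
                "https://example.com/c", "https://example.com/d");
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Integer>> sizes = new ArrayList<>();
            for (String u : urls) {
                sizes.add(pool.submit(() -> {
                    try (InputStream in = new URL(u).openStream()) {
                        return in.readAllBytes().length; // blocks on I/O, not CPU
                    }
                }));
            }
            for (Future<Integer> f : sizes) System.out.println(f.get());
            pool.shutdown();
        }
    }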


If you are in an I/O-bound situation, then yes, concurrency will allow you to exploit some parallelism.

Otherwise, no.

(and in general, people just throw concurrency at the problem instead of analysing whether they are in fact I/O bound or CPU bound).

(Downvoted why??)


You may have been downvoted for making an unsubstantiated claim that most people don't know when their CPU is running at 100%.


...and that is true.

When was the last time you monitored what parts of your system were truly I/O bound and not just blocked on another task?


People who end up in a position to care about performance use profilers and benchmarks. This is actually the first rule: don't guess, measure.

Of course it helps when the platform has proper tools for it. I end up using JMH and YourKit Profiler weekly, because a memory or CPU leak can crash our process, and we have pretty strict reliability requirements: a single server processes about 5000 events per second. Not extremely demanding, but when it crashes we can lose money, since the redundancy infrastructure is still pretty new.
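
For anyone who hasn't seen it, a minimal JMH benchmark is just this (the workload here is a made-up stand-in, not our actual event processing):

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Fork(1)
    public class EventBenchmark {
        int[] data;

        @Setup
        public void setup() {
            data = new java.util.Random(42).ints(10_000).toArray();
        }

        @Benchmark
        public long processEvents() {
            long total = 0;
            for (int v : data) total += v; // stand-in for real event handling
            return total;
        }
    }

You run it through the JMH harness, which generates the measurement loop and handles warmup, forking, and statistics, so you don't hand-roll a timing loop (and get fooled by the JIT).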


This very thing has occupied my day job for at least four months now. It happens when a pre-existing system one has to improve has lots of DB-heavy and compute-heavy operations commingled.


Yesterday?


> I think the first thing people should be taught about concurrency... is when not to use it.

Concurrency is usually not a feature of the algorithm but a feature of the problem domain. If you have many requests coming in at the same time, all competing for a limited amount of computational resources -- you have concurrency.

How you handle it is a different matter. You can decide to handle the requests one by one, but then, by Little's law, your throughput will be very low, and your server will crash if the rate of requests exceeds some small limit (which depends on the time it takes to service each request).
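
To put illustrative numbers on that (a sketch of the arithmetic):

    L = λ · W   (requests in the system = arrival rate × time per request)
    One-by-one service (L = 1) at W = 200 ms per request caps throughput at
    λ = L / W = 1 / 0.2 s = 5 requests/second; above that rate the queue
    grows without bound and the server eventually falls over.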


No. If the problem is computational resources, performance can't be increased by mere concurrency (while even completely deterministic STM-style parallelism would help).

Concurrency improves performance when a process accesses both the same computational resource and some other high-latency sharable resource.


What is your "no" referring to? I totally agree with you. When I said "computational resource" I wasn't referring necessarily to CPU, let alone on the same machine, but resources of any kind. When one request is blocked, say, waiting for the DB (a computational resource), another can do some computation.


I think you're confusing concurrency with parallelism. Indeed, when these two are combined in the same process with multiple threads sharing memory or other resources, you're effectively juggling with knives.

But in fact concurrency is inevitable in the absence of OS threads that can be blocked (another potential clusterfuck) or of some form of continuation support, because it is a direct consequence of asynchrony.

And asynchrony isn't avoidable, all you can do is to find abstractions that make it more deterministic.


The next thing they should be taught is that even if an existing serial implementation can be made more efficient using concurrency, that doesn't mean it should be. That should be followed quickly by teaching that concurrency should be implemented over the smallest possible surface of the code.


absolutely!


Do you have any suggestions of resources to learn more about that specific topic? I am interested in resources on when to use tasks and when it is better not to, especially for embedded systems. I find people now use tasks for everything without any real idea of the costs. I am also interested in when an embedded OS is needed and when you are better off without one. Any kind of resource (book, website, etc.) is welcome.


Unfortunately, this is the type of real-world problem that isn't well taught (or even documented).

We are all taught at uni how concurrency is implemented, and most applications use concurrency as a design tool to decompose a problem into its functions.

Unfortunately, this tends to produce a sub-optimal result as the inefficiencies become visible on small embedded/real-time systems.

It's difficult to give any general advice, but have a look at real-time analysis to get an idea of the real issues... and don't blindly throw tasks at a problem when a simple superloop is more efficient, simpler, and more maintainable (see the sketch below).
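
For a concrete picture, the superloop is just this shape (sketched in Java for readability; real embedded code would be C, and the sleep would be an idle/WFI on the MCU — the function names are hypothetical stubs):

    public class Superloop {
        public static void main(String[] args) throws InterruptedException {
            while (true) {                 // the entire "scheduler"
                pollInputs();              // non-blocking reads of sensors/UART
                stepStateMachines();       // one bounded slice of work each
                writeOutputs();            // drive actuators/LEDs/TX buffers
                Thread.sleep(10);          // fixed tick
            }
        }
        static void pollInputs()        { /* stub */ }
        static void stepStateMachines() { /* stub */ }
        static void writeOutputs()      { /* stub */ }
    }

No tasks, no preemption, no shared-state hazards: every "job" runs to completion in a known order on one stack.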


> Concurrency is also not more efficient on a single core.

This isn't a hard and fast rule. There is overhead to parallelism.


And if you strip all the parallelism out of your concurrency runtime, you get rid of a big chunk of the overhead. But even with purely cooperative single-threaded multitasking, there's still some overhead vs "ordinary" strictly serial code.


For CPU bound code, concurrency is pure overhead.

For I/O bound code, concurrency can give you some benefit.

...but in general, people do not analyse this before throwing tasks at a problem.
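
One cheap way to do that analysis before reaching for tasks (a sketch; doWork is a stand-in for the code you're about to parallelise, and it assumes the JVM supports thread CPU time, as HotSpot does): compare CPU time against wall time.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class CpuVsWall {
        public static void main(String[] args) throws Exception {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            long cpu0 = bean.getCurrentThreadCpuTime(); // ns of CPU actually used
            long wall0 = System.nanoTime();

            doWork(); // the code under suspicion (stub here)

            double cpu  = (bean.getCurrentThreadCpuTime() - cpu0) / 1e9;
            double wall = (System.nanoTime() - wall0) / 1e9;
            // ratio near 1.0: CPU bound; near 0: mostly blocked (I/O bound)
            System.out.printf("cpu=%.3fs wall=%.3fs ratio=%.2f%n",
                              cpu, wall, cpu / wall);
        }
        static void doWork() throws Exception { Thread.sleep(100); }
    }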


> ...but in general, people do not analyse this before throwing tasks at a problem.

Would that be just anecdotal?


yes... in ~25 years in the business, over ~10 companies.

...and I can tell you that precisely none of them performed a correct analysis of a system's concurrency needs before 'designing' it.

A common artifact of this is far too many tasks.


I have seen egregious abuse of pre-emptive threads... a lot. Abuse of green threads / fibres / coroutines... not so much. I think one has to be a half-decent programmer to even be aware of those options.


When I was a kid, I loved playing the video game Transport Tycoon Deluxe.

When I grew up to be a programmer, I never had much trouble with concurrent stuff.

IMO designing concurrent programs is conceptually similar to building complex high-throughput, low-latency railway networks in the game.


An open source version (remake?) of the game exists[1].

[1] https://www.openttd.org/en/




The only actionable advice in here for someone tasked with developing a curriculum for teaching concurrency is to make sure the prerequisite courses instill the idea of computation as a sequence of state transitions.


“Sequence of state transitions” implies you can order them by time. Generally, various state transitions happen in parallel, their relative order is undefined even on an SMP system.

You can serialize the transitions if you want, but that usually costs performance. This is especially true for distributed parallel computing, where the state is also distributed.


It sounds like you're modeling the state space wrong. In a concurrent system, the state is not just one thread's (or one processor's, or one machine's) internal data (including program counter). That would be like saying a (single-threaded) program's state is the contents of the registers and topmost stack frame. The state in a concurrent system is the combination of every thread's internal state. And what actually happens is a sequence of such arrangements. One arrangement is not realized without replacing the previous one.


I’m aware that the state you were talking about includes the state of all threads.

When you combine several sequences of state transitions (one sequence per thread), you don’t get another sequence of state transitions. When two CPU cores perform two state transitions at the same time, you typically cannot determine which of those transitions happened first.

We might have different definitions of what a sequence is. I mean this: https://en.wikipedia.org/wiki/Sequence As you see, a global order is required for a bunch of elements (in this case, state transitions) to form a sequence. And in concurrent, and especially parallel, programming there’s no global order for those transitions. Hence, those transitions don’t form a sequence.


No, we just have different notions of system state. If we have a pair of, say, Turing machine heads on one tape that aren't necessarily moving in lockstep, I say that the system state is the tape contents, the two heads' locations, and the two heads' states. A transition for this system can be either or both heads moving either direction, overwriting the symbol they're pointing at, and changing its own internal state. The state {tape="000111000", head0loc=0, head0state=2, head1loc=4, head1state=6} might transition to {tape="000101000", head0loc=0, head0state=2, head1loc=5, head1state=2}, or it might transition to {tape="100111000", head0loc=1, head0state=2, head1loc=4, head1state=6}, or to {tape="100101000", head0loc=1, head0state=2, head1loc=5, head1state=2}. In a given run of the system, one of those is what actually happens. Then another happens after that, and so on. So we get a sequence of them.

A transition is not one machine head changing its position and state. {head0loc=0, head0state=2}->{head0loc=1, head0state=2} is not a transition because that does not identify a pair of whole-system states.
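
If it helps, here's the two-head system above as toy Java (the flip-the-symbol-and-step-right move is an arbitrary stand-in for a real transition table); the point is that a transition always maps one whole-system state to another whole-system state:

    import java.util.List;

    record Sys(String tape, int loc0, int st0, int loc1, int st1) {}

    public class ProductStates {
        // One head's move: flip the symbol under it, step right, new state.
        static Sys step0(Sys s) {
            StringBuilder t = new StringBuilder(s.tape());
            t.setCharAt(s.loc0(), t.charAt(s.loc0()) == '0' ? '1' : '0');
            return new Sys(t.toString(), s.loc0() + 1, 2, s.loc1(), s.st1());
        }
        static Sys step1(Sys s) {
            StringBuilder t = new StringBuilder(s.tape());
            t.setCharAt(s.loc1(), t.charAt(s.loc1()) == '0' ? '1' : '0');
            return new Sys(t.toString(), s.loc0(), s.st0(), s.loc1() + 1, 2);
        }
        public static void main(String[] args) {
            Sys s = new Sys("000111000", 0, 2, 4, 6);
            // The possible successor *global* states: head 0 steps,
            // head 1 steps, or both step.
            List.of(step0(s), step1(s), step1(step0(s)))
                .forEach(System.out::println);
        }
    }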


> one of those is what actually happens

No it’s not. If the heads aren’t moving in lockstep, what actually happens is you’ll extremely rarely have a moment of time when both locations are whole numbers.

At one moment you have { head1loc=0, head2loc=4.17 }, head #1 reached the cell position on the tape, head #2 is still moving. After a while, you’ll have { head1loc=0.872, head2loc=5}, head #1 is on its way, head #2 reached the cell position.

Looks like for a Turing machine with two asynchronously moving heads, there’s no usable concept of “state” at all. That’s very close to what happens when doing actual parallel programming on real hardware.

Good luck applying your idea of computation as a sequence of state transitions to that.


Pick a reference frame. Sample the state of each CPU on its clock edge. Now you have a sequence of transitions.

The sequence is heavily non-deterministic, but that's life in a concurrent universe.


A non-deterministic sequence ain’t a sequence.

The thing is called “partially ordered set”: https://en.wikipedia.org/wiki/Partially_ordered_set


I'm not sure it's useful to try to discuss concurrency with someone who refuses to believe that a group of machines can be in a state (which is really the point of the article).


Even if a Turing machine were a physical object (I had thought you would be aware that it's not), either one of those moves completes before the other, or both complete simultaneously.


Sure, it’s an abstract machine invented as a mathematical model of computation.

I’d like to point out that the very moment you hooked a second asynchronously moving head up to the machine, the abstraction leaked spectacularly. With two asynchronous heads, the machine no longer has a well-defined state; suddenly we need to consider physical time, not just count steps; and so on.

A Turing machine is a useful abstraction for sequential computing, but nearly useless for researching parallel computing problems. The same happens with many other abstractions and approaches when going parallel.

> either one of those moves completes before the other

Yeah, but in parallel computing, we don’t know that order, and have no way of knowing. Meaning the “sequence of state transitions” idea is IMO nearly useless.


> if a Turing machine were a physical object ... either one of those moves completes before the other, or both complete simultaneously.

...depending on how fast you're moving relative to the machine heads and in what direction. Simultaneity does not exist in the real world.


We use the tape's reference frame, of course! Looking at data stored on some particular device in the physical system is going to force a particular reference frame, where you will have a defined simultaneity. A global definition of simultaneity is only really needed if your system promises to provide sequential consistency.


The state transitions can be partially ordered, but the way this works in TLA is that every computation is a set of legal behaviors (sequences of states).


So, what is the inductive invariant in that simple algorithm?


The inductive invariant is the correctness property itself (when all processes are done then at least one y is 1) in conjunction with "for all processes, when the process is not on line 1, then its x value is 1". You can probably replace the second conjunct with "for all processes, if the x value is 0, then the process is not done", but the proof gets a bit harder.
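
(For readers without the paper open: the algorithm is N processes, where process i executes "x[i] := 1" on line 1 and then "y[i] := x[(i-1) mod N]" on line 2. A quick executable sketch, with an arbitrary N:)

    import java.util.concurrent.atomic.AtomicIntegerArray;

    public class TeachingConcurrency {
        public static void main(String[] args) throws InterruptedException {
            final int n = 4;
            AtomicIntegerArray x = new AtomicIntegerArray(n); // all zeros
            AtomicIntegerArray y = new AtomicIntegerArray(n);
            Thread[] procs = new Thread[n];
            for (int i = 0; i < n; i++) {
                final int p = i;
                procs[p] = new Thread(() -> {
                    x.set(p, 1);                      // line 1
                    y.set(p, x.get((p + n - 1) % n)); // line 2
                });
                procs[p].start();
            }
            for (Thread t : procs) t.join();
            boolean someY = false;
            for (int i = 0; i < n; i++) someY |= (y.get(i) == 1);
            System.out.println("some y[i] = 1? " + someY); // always true
        }
    }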


The proof then follows like so, by induction on the states of the program:

This is our partial correctness property

    PartialCorrectness ≜ AllDone ⇒ ∃ p ∈ ProcSet : y[p] = 1
And this is the inductive invariant:

    Inv ≜ PartialCorrectness
            ∧ ∀ p ∈ ProcSet : pc[p] ≠ "Line1" ⇒ x[p] = 1
We need to show that Inv implies PartialCorrectness (trivial), that Inv holds in the initial state, and that if it holds in any state s, then it holds in any possible successor state s'. It's easy to see that it holds in the initial state. Now, let's assume it holds in s, and prove it for s'. To make this transition, some process p executes either line 1 or line 2.

If it executes line 1, then PartialCorrectness doesn't change because no new process is done. The second conjunct holds because we've just left line 1 and x has been assigned (by the definition of line 1).

If it executes line 2, the second conjunct of the invariant doesn't change. By the definition of this action, the process will be done. Here we have two cases: either we set y to 1, or we set y to 0. If we set y to 1, we're done and PartialCorrectness holds. If we set y to 0, then by the assumption of the invariant, the process we depend on must not be done, hence AllDone is false, and PartialCorrectness holds. QED.



