Hacker News new | past | comments | ask | show | jobs | submit login
Parallelism and concurrency need different tools (yosefk.com)
166 points by mhd on May 15, 2013 | hide | past | favorite | 69 comments

Awesome article. I definitely agree that parallelism and concurrency require different approaches, and one is not superior to the other.

In Rust we've known for a long time that success in Servo will require many different parallel approaches, not just message passing between actors—for example, parallel layout requires performing multiple tree traversals on heterogeneous, garbage-collected doubly-linked trees, which seems all but impossible to do efficiently in a message passing environment.

We have plans to add safe, race-free support for parallel loops to Rust, in the same vein as the Parallel JS (River Trail) work. There is a prototype of the ideas in the par module:


Rust's type system is designed to enable safe parallel computation. The idea is to use uniqueness, lifetimes, and bounds on closures to achieve this: uniqueness is used to ensure that there's no way to sneak in and access the data structure you're operating on in a racy way, lifetimes are used to ensure that data in a parallel loop cannot leak outside the loop, and bounds on closures are used to ensure that closures can't capture mutable data and race on it.

Finally, one nit: "Rust won't let you share memory unless it's immutable." isn't quite accurate. There is a reader-writer lock type in the standard library, which lets you share mutable data, but uses lifetimes to make sure you take a lock on it. (You have to take a lock on the whole structure at a time, to avoid the bank account problem you described.) We haven't used it much though, because we try to avoid such heavyweight locks in performance-sensitive code.

I'm sorry if I misrepresented the way things were in Rust; I also gleaned from some places that you might add things like parallel loops in the future, and was pretty sure that you were heading towards the same place as, say, Haskell where you have tools for both concurrency and parallelism.

I'll update the article to link to your comment.

No problem! Rust's documentation is pretty scattershot at this point in its life.

I think the author is missing the point regarding Haskell's concurrency vs. parallelism.

"Rule of thumb: use Pure Parallelism if you can, Concurrency otherwise."

This isn't talking about parallelism vs. concurrency, the concepts, this is talking about Parallelism vs. Concurrency, the modules (capitalized). The emphasis here is on using PURE when you can (meaning no side effects like disk IO).

In Haskell, when you're working with pure computations, Haskell can make guarantees about what's going on and effectively manage the concurrency for you. From this concurrency, you can get parallelism almost for free (meaning no extra programming restructuring on the devs part). So the module is called Control.Parallel.

If you want to be doing parallel IO with Haskell, concurrency can't be managed automatically for the dev. This is Control.Concurrent.

"The people behind Haskell realize that one tool for both isn't good enough."

Both Control.Concurrent and Control.Parallel are for parallelism! The difference here is whether you're parallelising pure or side-effectful computations and by extension can get free concurrency. Haskell doesn't have different tools for parallelism and concurrency. It has different tools for parallelising pure vs. effects.

If you have parallelism, you have concurrency. You can have concurrent processes without parallelising them though. Simplifying: parallelism means "at the same time" and concurrency means "can safely happen at the same time".

OK, so here's another quote (http://www.haskell.org/haskellwiki/Concurrency):

"This page contains notes and information about how to write concurrent programs in Haskell. If you're more interested in performance than non-determinism, learn about parallelism first."

Here it's not capitalized so it must be the concepts and not the modules...

"If you have parallelism, you have concurrency" - not according to Rob Pike, who called Sawzall parallel but not concurrent.

The main point I was making though was that computational code can be easily checked even if it's not pure - and the fact that you can express it as pure is a result of the same thing that let you easily check an impure spelling as well (basically it's determinism, and avoidance of constructs where determinism is hard to prove.) For event handling systems you can't easily check them, for much the same reasons that you can't express them as pure code.

Parallelism and concurrency are certainly different. Don't think I'm saying otherwise. Our disagreement is that I'm saying parallelism is a subset/type of concurrency and you are saying parallel execution can somehow not be concurrent execution.

I still stand by my assertion that you misunderstood the quote from the Haskell wiki:

"Rule of thumb: use Pure Parallelism if you can, Concurrency otherwise."

as something that supports your claim that parallelism is not concurrency.

From Wikipedia:

"In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors."

I find it hard to believe that parallel execution of multiple threads aren't "computations executing simultaneously" and therefor aren't concurrent.

Do you have any more detail about why you (or Rob Pike) think that parallel execution is not also always concurrent execution?

Pike's definition was "handling many things at once" vs "doing many things at once"; Sawzall or, to take a simple example he also used, a dot product where you compute sub-products in parallel, is basically doing one logical thing at a time - you have your inputs and you're computing the output. You're doing it with more resources but it's one thing. SIMD instructions or superscalar execution is like that and everybody calls it a form of parallelism and nobody ever calls this concurrency, though the Wikipedia quote could be used to defend calling this concurrency.

I think I don't want to argue about whether parallelism is a special case or not, really; I'm fine with it being a special case, as long as I get the point across that this special case deserves special treatment and is underserved by tools aiming at the general case...

Seems reasonable. Worth the discussion at least. Thanks for responding to my comments.

I don't think this does a good job of clarifying concurrency and parallelism - rather it muddies the waters, or perhaps I should say will have have the result of confusing many people.

While people are generally clear on what parallelism is, or at least find themselves in very much the same ballpark, there seem to be two separate and completely different common usages of 'concurrency' in vogue as part of current technical vernacular.

Articles like this draw hackles and controversy because because two groups of people who've defined the same word differently refuse to realize that they're having two completely different conversations.

While the may be a 'correct' usage, it's functionally meaningless if large segments of people refuse to recognize the 'correct' definition.

...meh. Though the article actually had some interesting things to say, it seems to sure to have them overlooked just for the sake of begetting another round of terminology war.

So which terms do you suggest? What I was talking about was event handing systems needing different things than computational systems, and misapplications of tools designed for the former in the latter. The way I use the terms "concurrency" and "parallelism" is largely consistent with the way language designers involved in Haskell, Rust and ParaSail use these terms.

What other widespread uses of these terms conflict with my usage, and what should I have said to avoid "another round of terminology war"?

Your usage is slightly different than mine. My usage comes from what I learned way back in my operating systems class in undergrad. I've been doing work in parallel computing of various forms for a bit now, and this is how I think about it (cribbing from what I wrote here a few months back):

Concurrency is a statement about a duration of time. Parallelism is a statement about a moment in time.

During a one second window, did both of the tasks execute? If yes, then they are concurrent. If no, then they are not concurrent in that window.

Was there any moment in time when both tasks executing simultaneously? If yes, then they executed in parallel. If no, then they did not execute in parallel.

From this, it's easy to see that if something is parallel, it must also be concurrent - in order for two tasks to execute during the same moment in time, they must also both have executed during a window of time.

Concurrency without parallelism happens on machines with a single core. Processes and threads must time-share the only core. Frequently, some of these processes will have multiple threads, so the threads need to synchronize with each other, even though they don't execute at the same time.

Concurrency and parallelism implies non-deterministic interleavings of threads and processes, but whether or not the result is deterministic is up to how we program.

I don't really know what terms are best to use, all I know is I've met a fair number of people who don't think context switching is a true form of concurrency.

I think a large segment of this group sees parallelism as many backhoes digging one hole (or working as a team on a few holes) and concurrency as many backhoes all digging their own holes, roughly speaking.

I also encounter another (smaller) group of people who blur/conflate concurrency with distributed systems, probably because of things like websites bragging about how many 'concurrent connections' they can serve.

These two groups can blur with each other fairly easily because they share what I'd term an 'intuition overlap.'

Oddly enough, I think this helps explain why many people are often so confused about things like NodeJS. Touted for its concurrency, a lot of people first using it seem to think that it's somehow parallel by nature.

I think the reason this kind of confusion is so prevalent is that in vernacular English, concurrent is essentially a synonym for 'simultaneous,' while as used in programming (along the lines of how it was used in this article) it comes to mean something more like 'how many balls can a juggler keep in the air,' with highly concurrent systems being able to juggle many more balls with the same two hands, so to speak. And in fact, in a way, all the balls are concurrently in the air, but it's not (I think) the most intuitive thing, hence why many vehemently disagree about usage.

When you find yourself arguing about whether a tree falling in the forest causes a sound, the right thing to do is to start talking about vibrations and audio sensations instead. Ambiguous words can almost always be bypassed by using more words to nail down meanings properly.

And, more elaboration on the subject than necessary: http://wiki.lesswrong.com/wiki/A_Human%27s_Guide_to_Words

I think the solution is simple: define your terms (or link to a definition that agrees with yours).

More to point, please do this. I think it would make your article less confusing and accessible to more people.

In my opinion the article is somewhere between missing the point and plain wrong.

Concurrency is a property of a set of tasks describing dependencies among these tasks. Think of it as a graph with a vertex for each task and an directed edge between two tasks if one task depends on the other tasks result. Two tasks are concurrent if there is no dependency between them. Concurrent task can be executed in parallel, but do not have to. Two non-concurrent tasks can not be executed in parallel,

Parallelism occurs during runtime when independent task are actually executed at the same time (no matter if you do this on two cores or if you are just interleaving the execution of both tasks on one core). Executing two tasks in parallel requires that these two tasks are concurrent.


Two (read-only) web requests are concurrent because there are no dependencies between them. Each web request consists of some subtasks like parsing the request, querying the database, performing some business logic and building and sending the response. These subtasks are not concurrent because each one depends on the result of the one before.

There are now (at least) two options for processing these both web requests - one after the other or both in parallel. In the first case you ignore the concurrency between both web request, in the second case you exploit the concurrency between both web request by parallelizing their execution.

So IMHO there is no choice between concurrency and parallelism. Concurrency is a property of the system you are building (and there is not much you can do in order to add or remove concurrency). You then have a choice to introduce parallelism to exploit concurrency or you just ignore cocurreny, but that's it.

OK, so how come Rob Pike calls his own Sawzall language parallel but not concurrent?

I presented your way of looking at it in the article more than once and I said it was valid, just not the only one; then several comment threads emerge reiterating that view, more or less wrongly, and arguing that it's got to be the one and only way of looking at it. Sigh.

IMHO - and I may of course be wrong - there are (widely) accepted definitions for concurrency and parallelism in the computer science community and in my comment I tried to present them as good as I could (but the comment is far from being flawless). Assuming I am correct with the previous sentence, there is no way of interpreting concurrency and parallelism this way or another because the terms are fixed by their definition. You can of course look at their relationship from different angels and highlight or ignore different aspects in different situations but this does not change their basic relationship or even definition.

I can think of two reasons why one would call a language parallel but no concurrent. First this could obviously happen because somebody is not aware of the meaning of the concepts or uses the words in a non-standard way but I think this is probably not the case - I would be surprised if one designs a concurrent/parallel language and does not know the distinction.

My best guess is that the intended meaning of the statement is, that the language is no good for general concurrent task but well suited for SIMD-like concurrent tasks, that is running the some operations on multiple different inputs in parallel. (I have to admit that I know nothing about Sawzall and did not invest any time to figure this out, just guessing.)

Wonderful explanation. Thanks.

I think the comment that synchronization around mutable data doesn't prevent all data races (as in the nicked iphone and the bank acount debit and credit example) is really important.

A very good discussion on this can be found between Rich Hickey and Cliff Click on the Clojure mailing list


This exposes an important, and to me non-obvious, property of concurrency. That it's not the locking that's really hard, it's how to be sure that every piece of related data is included in the lock (or STM).

I think that according to standard terminology, syncing accesses to mutable data does prevent all data races - but not all race conditions: http://blog.regehr.org/archives/490 (Race Conditions vs Data Races)

The nice thing about eliminating queue-like interfaces - and locks, STM, message passing, lock-free containers etc. are actually a form of a queue-like interface - is that it eliminates all race conditions which aren't data races. Then you can have automated debugging tools find said data races.

Unfortunately, this doesn't work with event handling systems, because they're often required to have queue semantics to resolve inevitable conflicts between requests.

Can you give an example of a system where queue-like interfaces have been removed but data races still exist. I feel that there are some important, and interesting, details I am not grasping.

Sure; a parallel for loop with each task processing a[0], a[1], ... a[N] has no data races but if task 0 modifies a[0] and reads a[1], task 1 modifies a[1] and reads a[2], etc. then this loop is full of data races without there being anything queue-like (that is, you never say "I'll do this thing here unless someone beats me to it in which case I will let them finish first.")


The author is making the same classic mistake of conflating concurrency and parallelism. The main difference here is that the author first creates a concurrency straw-man that he then compares to parallelism, but then when talking about parallelism he makes the same old mistake of conflating the two.

Concurrency is about modeling a problem as an orchestration of multiple things going on at once. It is an abstraction mechanism in much the same way as objects or functions. Whereas parallelism is a machine level occurrence like cache or registers. Not something to be ignored for sure, but not something typically targeted when talking high level abstractions.

I might be confusing things here but does concurrency, at least the definition give here, not directly map to task parallelism?

Task parallelism and data parallelism are two important concepts and the tools we need to implement them are very different. The two words used by the author to convey the difference between those two concepts are not chosen optimal in my opinion. There seems to be a hierarchy between concurrency and parallelism whereas task parallelism and data parallelism are directly comparable.

Task/data are orthogonal to concurrent/parallel.

"Task parallel": compute a() in parallel with b().

"Data parallel": compute a(0), a(1), ..., a(N) in parallel.

"Task concurrent" (not idiomatic, but hear me out): handle mouse events concurrently with keyboard events.

"Data concurrent" (again not idiomatic): handle events from 1000 sockets concurrently.

I don't see how

compute a() in parallel with b() and

handle mouse events concurrently (could be a()) with keyboard events (could be b())

differ and how does

compute a(0), a(1), ..., a(N) in parallel differ from

handle events from 1000 sockets concurrently which would be something like a(event0) a(event1)?

Don't we require the same tools for "task parallel" and "task concurrent" as well as "data parallel" and "data concurrent"? To me they don't seem orthogonal but I believe they are very much the same thing.

How timely! I just wrote a blog post that explains STM in haskell with pictures: http://adit.io/posts/2013-05-15-Locks,-Actors,-And-STM-In-Pi...

I tend to agree. I've raised some hackles with colleagues over, "splitting hairs," on the difference between concurrency and parallelism. They're two different things and this articles does a pretty good job of pointing out some of them.

I've always thought of parallel computation as lines in time that never intersect or share resources but are computed in the same order at the same time:

The results of each line are part of one computation. The problem is split over space and their computation happens in the same order as a single task.

Conversely, concurrency is a little more lenient on causality; the lines are scheduled co-operatively and the computation will complete at some point when all the tasks are complete.

Causality in programming languages is certainly an interesting area to improve upon. Some interesting perspectives are http://bloom-lang.org and my own attempt at implementing it: https://github.com/agentultra/knockblock which attempt to address the causality of eventually-consistent distributed replicas.

It's as simple as, something that's concurrent (simultaneous) may not be parallel (physically) but if it's parallel it's also concurrent. When used as a verb, parallel is interchangeable with concurrent.

_yosefk, I'm trying to digest your argument and I'm wondering if I have this correct:

1. Concurrency is a property of any problem that involves resource contention.

2. Parallelism is merely a performance optimization for certain classes of problems.

3. Tools that address the concurrency problem are capable of performing the parallelism optimization by recasting a problem as a resource-contention problem.

4. However, casting a problem as a resource-contention problem introduces necessary nondeterminism and likely race conditions, as these are inherent properties of the problem domain for which these tools were designed.

5. Instead, use a tool that is explicitly designed to address parallelism in a deterministic, race-free manner.

Is this the essence of your essay? Personally, I'd love to see an example of a small real-world problem solved in e.g. Erlang vs. the same solution in checkedthreads.

It's like you said, roughly, though it's certainly possible to use a system designed for concurrency without introducing non-determinism - by not letting the queues provided by the system (message queues, STM, locks, thread-safe data structures - whatever type of queue) affect semantics. As in, you could easily implement a nice load balancer in Go for computational code and you could implement a checker to make sure the code is deterministic, perhaps based on their race detector. I'm only saying that it should be done, not that it can't be done...

Erlang vs checkedthreads (or Cilk or anything C-based), computing an image convolution kernel or handling some other computational problem? I could, though I think such a showdown with Erlang predictably ripped to shreds would be trolling, and I really like Erlang - I don't necessarily like Armstrong's marketing where he says Erlang makes more sense than C for parallelizing computations (seriously - you gonna send the image sub-regions via message queues byte by byte?), but I like Erlang, and it'd be a pity if anyone looked at such a silly showdown and conclude that Erlang was inferior - it's not, it's just for different things entirely.

I use deterministic results as my #1 debugging tool.

I do this by building queue-based concurrent systems with a resulting nondeterministic order of execution but I then insure that the results themselves are deterministic.

Consider it to be a single queue with multiple vending machines to process requests. No one knows which vending machine will dispense their soda, but they will always get the soda they desire.

No need for special tools there. The overhead of insuring a deterministic result I'd estimate is <2% (the use of 32-bit float to 64-bit int type conversions (fixed point) and 64-bit versus 32-bit atomic operations on GPUs for accumulation). The biggest problems I run into are end users who'd happily trade that predictable output for that <2% perf gain and competitors who are happy to oblige them.

And the problem there is that I don't know how you diagnose race conditions or the quality of the underlying algorithm without a deterministic output from a deterministic input.

Recently, someone told me they address this problem by running single-threaded to check for race conditions. Sigh... But I guess that lets them evaluate the model.

Well, I wouldn't call that "concurrency" just because there's a queue in it - I'd call it "parallel but not concurrent"; or we can call it differently - I really don't claim to use the terms in "the right way" - but that's what I was referring to as "parallel, not concurrent", anyway.

"Insure that the results are deterministic" - how? Race detectors deterministically insure that results are deterministic :) which I always found nice... "No special tools" - you mean you just run the thing several times and check the results don't change, or "just" write the code correctly? How do you know it's yielding deterministic results except that it's supposed to do so by design without any tools?

I run the code multiple times on different GPUs of the same generation (not necessarily the same model) in different physical machines and get the same bit-accurate output. You're right, technically, there could be a race condition somewhere somehow that just doesn't fire very often. It isn't proven to be free of race conditions, but in practice I suspect it is so (right up until it isn't, a situation often detected by end users using a code path in a unique way, at which point it's a bug, and I fix it).

But in practice, since it works like gangbusters to find broken GPUs if one checks for that deterministic output on known working examples, even finding broken GPUs that otherwise pass existing memory tests, I'm reasonably sure it works.

PS in my experience, the existing automated GPU tools for finding race conditions Heisenberg the process sufficiently that the race conditions stop occurring.

So in practice, I end up running the code as usual, but writing all the intermediate data to a buffer, dumping that to a file with a key for ordering it, sorting it, and then diffing an enormous text file to find out what's inconsistent between two runs.

If race detectors so behave, then they aren't very good race detectors - most aren't good, in fact, because they can't be very good, because they support concurrency (that's why Helgrind misses bugs and Cilkscreen doesn't), which was a part of my point.

It's a pity if GPUs, which are really a platform for computation acceleration and not for concurrent event handling, lack deterministic race detectors; a good detector not only wouldn't mask a bug that was occurring, but it would detect a bug which perhaps you never observed until running the tool.

I see; deterministic by design then. I guess I'd trust myself without much tooling as well; my background is a hundred of developers hacking on a million of lines of code, hence my love for automated debugging tools.

Heh, this is one of the main arguments I make in support of deterministic results - hundreds of developers are nearly guaranteed to introduce race conditions - and if your output isn't deterministic, how else can you detect them?

I don't even trust myself to not introduce them hence the harping on determinism.

It mostly falls on deaf ears though. I would love an automated tool that genuinely found these for me. I would immediately incorporate it into my QA tests.

Seriously? I mean, it can't be that hard; if the GPU code is written in OpenCL then it could be cross-compiled to the CPU instead of the GPU and then the checkedthreads checker or the Cilk checker or some such would be almost applicable to the problem, perhaps with minor tweaks.

We can discuss it here or you can drop me a line at Yossi.Kreinin@gmail.com - it sounds like an interesting thing to hack on, and not incredibly difficult, though I may be missing something and I'm curious to find out what.

> Personally, I'd love to see an example of a small real-world problem solved in e.g. Erlang vs. the same solution in checkedthreads.

Wide Finder 2 would be a good demo for checkedthreads, although it does involve some amount of concurrency in io.


A thought: How does one apply this to socket-based IO?

As far as I can tell, everything involving sockets goes through tons of queues like the ones he describes; concurrency, not parallelism.

If I kick off reads on ten different sockets, when the read completes the kernel has to put the thread waiting for the read onto a 'ready to run' queue before any work can actually happen. Scale up to 1000 pending reads and now that queue could get pretty full pretty quick, and lots of data is moving in and out. As I understand things, select and epoll improve on this by making the queue more efficient: now instead of using a thread per socket, and having the OS queue up threads on the ready-to-run queue, the OS helps you manage a queue of 'ready sockets', and you respond to those manually. But there's still a queue!

Are there IO APIs out there in modern OSes that let you reduce queueing in such a way? Like, hypothetically if performing a read on a socket required me to pass in a callback that would process the data from the read, the OS could respond to incoming data on that socket by immediately running the callback, whether it's on an OS-managed user-mode thread pool or via some other mechanism. This would at least reduce the amount of queueing going on - at most you have a queue of tasks for a thread pool that are responding to complete IO (similar to epoll, I guess?) but I bet there are ways to improve on that too.

(If memory serves, IO Completion Ports on NT have functionality that looks like this... but I bet it still involves lots of queueing and scheduling)

I think that this angle adds another dimension - dealing with latency and not letting it hurt throughput. The problem could be "embarrassingly parallel" - 10000 processes with 10000 sockets not ever interfering with each other - or it could be concurrent, with a lot of communication between the processes and with event ordering greatly affecting results.

Latency would regardless be a problem - the fact that you read from a socket and you get blocked for a while, and can you find something useful to do in the meantime or not, and how easy it will be to get back to what you were doing once the data arrives.

In this sense, sockets are not fundamentally different from, say, DRAM latency - it's a much smaller latency, but you still want to do something in the meanwhile.

It depends upon how big of a picture you want to look at. E.g. if you look down to the hardware level (your single ethernet cable connecting your computer to a router) it is again no longer parallel (from what I recall - it's been a while since I've studied this).

In the parallel gifts example, they had to somehow decide who would buy people what, which tends to not be very parallel. If the store has only one entrance then that too turns into a non-parallel part of the problem. If the gifts were mailed there's only 1 mailman who delivers to an area on a given day. And so on.

You're probably always going to find resource contention somewhere, it just isn't always something you have to deal with directly when solving your problem.

If we just let kernel do the job, it's essentially a queue. If we do the packet processing ourselves, it can be a parallel problem.


Good article but perhaps focused too much on the single machine case and/or shared memory architecture. I think that getting the most our of parallelism at scale on commodity hardware is facilitated by or even necessitates good concurrency tools.

Ie the results of computation do no one any good if they are sitting in memory spread across compute nodes in the cloud and combining them back down to the point that they can be presented to the user as quickly as possible usually involves a fair amount of concurrent code.

The vast - VAST - majority of computing power in the world today is standalone devices (PCs, phones, tablets and innumerable embedded crap with stunning horsepower - each car has ~90 microprocessors for a single example). Most of these standalone devices are multicore and it'd be nice to use the extra compute capability somehow, and it'd be a pity to waste performance by using tools designed for the multibox case which isn't going to happen outside a small share of use cases, even if those use cases are loud.

That said, I didn't focus on the single machine case or shared memory architecture, I think - it was more about correctness. A parallel system is very easy to get right compared to a truly concurrent system; I mentioned memory access monitoring but it can be other communication instead of through memory - the point is that you know a parallelism bug when you see one, but not a concurrency bug, because concurrent systems have non-deterministic results and sometimes it's correctly non-deterministic and sometimes incorrectly so.

In particular, though I don't work on these things, I think tools for detecting parallelism bugs that scale past a single chip/box would be very valuable for large compute clusters - and there again using scalable message-passing-based systems without such tools would underserve computational workloads.

Perhaps I am the one is over focused on distributed use. However, their are a lot of servers in the world ;)

Also how do you classify work loads like physical simulation that are one of the classic use cases for parallel processing but not embarrassingly parallel in that they need lots of communication between cores?

Better tools for analyzing distributed computation would be cool. I think that we would have to think carefully about the kind of bugs to look for. Since data access/distribution is slow more thought tends to go into it (and there are systems in place for doing it well) but there are of course loads of other issues that show up.

I think if the simulation is deterministic it's parallel, if it's OK with not being deterministic then it's concurrent; I'd fight to keep it parallel and not concurrent, but I don't know how hard it is, maybe in some cases you can't realistically win the battle.

I very much think it's valuable to treat distributed, parallel systems specially and not just use concurrency-oriented tools - it's better to exploit determinism for automated debugging, it's just not my area at the moment - I work more on shared-memory systems. I was involved in some work with parallelizing on multiple boxes at process granularity, but it wasn't elaborate enough to go deeply into any of these issues.

Things definitely get messy in scientific computing. It is common to have algorithms that are stochastic for other reasons.

I'm working on a parallel random forest implementation at the moment...very little communication between threads is needed but the algorithm itself is stochastic (unless you seed your random number generators in a clever way i guess). (And I've use go's concurrency primitives and race tool for things like writing the forest to disk as trees are grown in parallel and compiling summary statistics).

I do think that tools like the one you are writing are very useful and needed. I just have a bad habit of wanting everything to meet my needs.

Correct me if I'm wrong but Cilk and Cilk Plus are different technologies so I'd write Cilk Plus instead of Cilk to prevent confusion. And where's Scala with Akka? Nice post though.

I do think this article from peter van roy has pretty clear and useful definitions: https://docs.google.com/viewer?url=http%3A%2F%2Fwww.info.ucl...

See page 25 and 34.

When I started doing multi-threaded programming a number of years ago there really wasn't any confusion about this, parallelism, concurrency, preemption and contention were just some of the basic terms you had to learn.

Why is there so much more confusion now that we have "better" tools?

There's plenty of concurrency and parallelism out there but there's not a single piece of important software that is written in a functional language. Many people claim many things but when it comes to empirical evidence. it turns out that imperative, stateful programming is still the best choice. Erlang has some limited and very niche uses in telecom, bua it was replaced even at Ericsson to good degree with c++.

>There's plenty of concurrency and parallelism out there but there's not a single piece of important software that is written in a functional language.

Important to who? I use tools written in functional languages every single day.

>Many people claim many things but when it comes to empirical evidence. it turns out that imperative, stateful programming is still the best choice.

You don't see the obvious hypocrisy there?

>>Important to who? I use tools written in functional languages every single day.<< To the market. There are all kind of servers from web servers to database servers to other kinds of servers. Almost none are made using functional programming. Then, there is lot of software out there, which is made almost exclusively in C/C++, Java & .Net. Haskell & co. are only used in academia and by hobbyists (math people who can't learn proper programming)

Well, at least you were funny that time.

> Parallelism and concurrency need different tools

Not if you're using Akka.

Care to elaborate? Specifically, what mechanisms are used for handling conflicts (two withdrawals from the same bank account), and what mechanisms are available for preventing/detecting conflicts (two tasks modify the same array index)?

Akka is an actor model concurrency library (so instead of threads and locks -they are abstracted away- you pass immutable messages between actor objects which are managed by other actors with queues - there are locks though if you really need them) for scala and java that just happens to use the same interface (more or less) for distributing work between different nodes. It's a tool for both. I don't know of anything else like it.


OK, so I'm seeing the usual concurrency stuff; what I'm not seeing is conflict detection/prevention for parallel stuff, where you don't let two parallel tasks modify the same array element or some such.

> what I'm not seeing is conflict detection/prevention for parallel stuff, where you don't let two parallel tasks modify the same array element or some such

I could be wrong but based on your own article, that doesn't sound like a parallel task. Aren't parallel tasks already assigned work that don't conflict ala map reduce?

> "concurrency is dealing with inevitable timing-related conflicts, parallelism is avoiding unnecessary conflicts"

did I miss something?

Yeah, the question is what if you have a bug - if you thought you were assigning work so that tasks don't conflict but in fact they do. In Map/Reduce you technically can't have this sort of bug, nor can you have it in pure code; restricting the programmer so that such bugs can not be committed is fair enough. If it is in fact possible to have such bugs, as it is with shared mutable memory, then I expect to see tools for finding such bugs.

>If it is in fact possible to have such bugs, as it is with shared mutable memory, then I expect to see tools for finding such bugs.

Well with the actor model, it is less likely since you're supposed to be passing immutable messages around and you're not coordinating threads around locked resources.

The claim that haskell thinks you need separate tools is silly. The fact that the language provides abstractions for parallel computation and for concurrency does not mean the language or its designers consider them unrelated problems requiring different tools. Haskell is a particularly bad example to support this sort of misleading claim, as the language supports both out of the box without using different tools, and the same page you quoted even says that (http://www.haskell.org/haskellwiki/Parallelism):

>Concurrent threads (forkIO) will run in parallel

The simple example being say a webserver. You use forkIO to spawn a green thread per connection. This gets you concurrency. Now if you want parallelism, you simply tell the haskell runtime to multiplex those green threads over more than one OS thread. You don't have to change your code at all. The whole argument and terrible gift analogy seems to rely on redefining parallelism to mean "making sequential code parallel". I think most people are in fact thinking of making concurrent code parallel when they talk about wanting parallelism, like the webserver example.

From here: http://www.haskell.org/haskellwiki/Concurrency

"This page contains notes and information about how to write concurrent programs in Haskell. If you're more interested in performance than non-determinism, learn about parallelism first."

From here: http://www.haskell.org/ghc/docs/7.0.3/html/users_guide/lang-...

"GHC implements some major extensions to Haskell to support concurrent and parallel programming. Let us first establish terminology:

* Parallelism means running a Haskell program on multiple processors, with the goal of improving performance. Ideally, this should be done invisibly, and with no semantic changes.

* Concurrency means implementing a program by using multiple I/O-performing threads. While a concurrent Haskell program can run on a parallel machine, the primary goal of using concurrency is not to gain performance, but rather because that is the simplest and most direct way to write the program. Since the threads perform I/O, the semantics of the program is necessarily non-deterministic.

GHC supports both concurrency and parallelism."

Finally, the part that I quoted in the article:

"Rule of thumb: use Pure Parallelism if you can, Concurrency otherwise."

How isn't this "providing different tools for the two and recommending to use them in different situations" I don't know.

It is almost like you didn't read my post. They provide forkIO as a basic tool to do both. That is explicitly not different tools. They also provide additional abstractions to make different tasks easier. Tasks like parallelising sequential code, which you want to pretend is the only kind of parallelism that exists. Haskell provides different abstractions for lots of things, that doesn't mean those things require different tools. It means convenience matters.

>If you're more interested in performance than non-determinism, learn about parallelism first."

First, not instead of.

>GHC implements some major extensions to Haskell to support concurrent and parallel programming

Yep, that really sounds like they require different tools, the way the mention them together as a single topic.

>GHC supports both concurrency and parallelism

And does so with the same tool: forkIO. Completely contrary to your claim that haskell is evidence that it requires separate tools. This is once again made very clear on a page you link to:

>Ordinary single-threaded Haskell programs will not benefit from enabling SMP parallelism alone: you must expose parallelism to the compiler. One way to do so is forking threads using Concurrent Haskell (Section 7.18.1, “Concurrent Haskell”), but the simplest mechanism for extracting parallelism from pure code is to use the par combinator

They are literally saying "use a higher level abstraction instead of rolling your own directly with concurrency constructs". This is the same advice that is always given when a higher level abstraction is available. It directly contradicts the claim that parallelism requires a different tool. It explicitly states that you can do it with the same tool.

"It is almost like you didn't read my post."


"They provide forkIO as a basic tool to do both."

Computers provide bits to do both and then some. You can call the ensuing distinctions "matters of convenience" and not "requiring different things", of course.

>You can call the ensuing distinctions "matters of convenience" and not "requiring different things", of course.

Or you can read the code and see that it is quite literally an abstraction layer built on the same tool, not two different tools. That is like saying folds and recursion require different tools, and providing "haskell has both recursion and folds" as proof. Of course it does, but folds are simply an abstraction on top of recursion.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact