In Rust we've known for a long time that success in Servo will require many different parallel approaches, not just message passing between actors—for example, parallel layout requires performing multiple tree traversals on heterogeneous, garbage-collected doubly-linked trees, which seems all but impossible to do efficiently in a message passing environment.
We have plans to add safe, race-free support for parallel loops to Rust, in the same vein as the Parallel JS (River Trail) work. There is a prototype of the ideas in the par module:
Rust's type system is designed to enable safe parallel computation. The idea is to use uniqueness, lifetimes, and bounds on closures to achieve this: uniqueness is used to ensure that there's no way to sneak in and access the data structure you're operating on in a racy way, lifetimes are used to ensure that data in a parallel loop cannot leak outside the loop, and bounds on closures are used to ensure that closures can't capture mutable data and race on it.
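The mechanisms named above (uniqueness, lifetimes, bounds on closures) can be sketched with today's Rust, which postdates this thread; this is not the prototype par module's API, just a minimal illustration using std::thread::scope:

```rust
use std::thread;

fn main() {
    let mut data = vec![1u64, 2, 3, 4, 5, 6, 7, 8];

    // Each thread gets a disjoint &mut chunk; the borrow checker proves
    // no two threads can touch the same element, so no data race.
    thread::scope(|s| {
        for chunk in data.chunks_mut(4) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2; // exclusive access, verified at compile time
                }
            });
        }
    }); // lifetimes guarantee no borrow escapes the parallel region

    assert_eq!(data, vec![2u64, 4, 6, 8, 10, 12, 14, 16]);
    println!("{data:?}");
}
```

Trying to capture `data` itself in two closures at once, or to return a borrow out of the scope, is rejected at compile time, which is the race-freedom guarantee the comment describes.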
Finally, one nit: "Rust won't let you share memory unless it's immutable." isn't quite accurate. There is a reader-writer lock type in the standard library, which lets you share mutable data, but uses lifetimes to make sure you take a lock on it. (You have to take a lock on the whole structure at a time, to avoid the bank account problem you described.) We haven't used it much though, because we try to avoid such heavyweight locks in performance-sensitive code.
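For illustration, here is roughly what that looks like with the modern std::sync::RwLock (the pre-1.0 type the comment refers to had a different API); the point is that the guard's lifetime forces you to hold a lock on the whole structure while touching it:

```rust
use std::sync::RwLock;

struct Accounts {
    alice: i64,
    bob: i64,
}

// Move money atomically: the write lock covers the WHOLE structure,
// so no reader can observe the amount missing from both accounts.
fn transfer(accounts: &RwLock<Accounts>, amount: i64) {
    let mut acc = accounts.write().unwrap();
    acc.alice -= amount;
    acc.bob += amount;
} // the guard's lifetime ends here, releasing the lock

fn main() {
    let accounts = RwLock::new(Accounts { alice: 100, bob: 50 });
    transfer(&accounts, 30);

    let acc = accounts.read().unwrap(); // shared read lock
    assert_eq!((acc.alice, acc.bob), (70, 80));
    println!("total: {}", acc.alice + acc.bob);
}
```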
I'll update the article to link to your comment.
"Rule of thumb: use Pure Parallelism if you can, Concurrency otherwise."
This isn't talking about parallelism vs. concurrency, the concepts; it's talking about Parallelism vs. Concurrency, the modules (capitalized). The emphasis here is on using PURE when you can (meaning no side effects like disk IO).
In Haskell, when you're working with pure computations, Haskell can make guarantees about what's going on and effectively manage the concurrency for you. From this concurrency, you can get parallelism almost for free (meaning no extra restructuring on the dev's part). So the module is called Control.Parallel.
If you want to be doing parallel IO with Haskell, concurrency can't be managed automatically for the dev. This is Control.Concurrent.
"The people behind Haskell realize that one tool for both isn't good enough."
Both Control.Concurrent and Control.Parallel are for parallelism! The difference here is whether you're parallelising pure or side-effectful computations and by extension can get free concurrency. Haskell doesn't have different tools for parallelism and concurrency. It has different tools for parallelising pure vs. effects.
If you have parallelism, you have concurrency. You can have concurrent processes without parallelising them though. Simplifying: parallelism means "at the same time" and concurrency means "can safely happen at the same time".
"This page contains notes and information about how to write concurrent programs in Haskell. If you're more interested in performance than non-determinism, learn about parallelism first."
Here it's not capitalized so it must be the concepts and not the modules...
"If you have parallelism, you have concurrency" - not according to Rob Pike, who called Sawzall parallel but not concurrent.
The main point I was making though was that computational code can be easily checked even if it's not pure - and the fact that you can express it as pure is a result of the same thing that let you easily check an impure spelling as well (basically it's determinism, and avoidance of constructs where determinism is hard to prove.) For event handling systems you can't easily check them, for much the same reasons that you can't express them as pure code.
I still stand by my assertion that you misunderstood the quote from the Haskell wiki:
as something that supports your claim that parallelism is not concurrency.
"In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors."
I find it hard to believe that parallel execution of multiple threads isn't "computations executing simultaneously" and therefore isn't concurrent.
Do you have any more detail about why you (or Rob Pike) think that parallel execution is not also always concurrent execution?
I think I don't want to argue about whether parallelism is a special case or not, really; I'm fine with it being a special case, as long as I get the point across that this special case deserves special treatment and is underserved by tools aiming at the general case...
While people are generally clear on what parallelism is, or at least find themselves in very much the same ballpark, there seem to be two separate and completely different common usages of 'concurrency' in vogue as part of current technical vernacular.
Articles like this draw hackles and controversy because two groups of people who've defined the same word differently refuse to realize that they're having two completely different conversations.
While there may be a 'correct' usage, it's functionally meaningless if large segments of people refuse to recognize the 'correct' definition.
...meh. Though the article actually had some interesting things to say, it seems sure to have them overlooked just for the sake of begetting another round of terminology war.
What other widespread uses of these terms conflict with my usage, and what should I have said to avoid "another round of terminology war"?
Concurrency is a statement about a duration of time. Parallelism is a statement about a moment in time.
During a one second window, did both of the tasks execute? If yes, then they are concurrent. If no, then they are not concurrent in that window.
Was there any moment in time when both tasks were executing simultaneously? If yes, then they executed in parallel. If no, then they did not execute in parallel.
From this, it's easy to see that if something is parallel, it must also be concurrent - in order for two tasks to execute during the same moment in time, they must also both have executed during a window of time.
Concurrency without parallelism happens on machines with a single core. Processes and threads must time-share the only core. Frequently, some of these processes will have multiple threads, so the threads need to synchronize with each other, even though they don't execute at the same time.
Concurrency and parallelism imply non-deterministic interleavings of threads and processes, but whether or not the result is deterministic is up to how we program.
I think a large segment of this group sees parallelism as many backhoes digging one hole (or working as a team on a few holes) and concurrency as many backhoes all digging their own holes, roughly speaking.
I also encounter another (smaller) group of people who blur/conflate concurrency with distributed systems, probably because of things like websites bragging about how many 'concurrent connections' they can serve.
These two groups can blur with each other fairly easily because they share what I'd term an 'intuition overlap.'
Oddly enough, I think this helps explain why many people are often so confused about things like NodeJS. Touted for its concurrency, a lot of people first using it seem to think that it's somehow parallel by nature.
I think the reason this kind of confusion is so prevalent is that in vernacular English, concurrent is essentially a synonym for 'simultaneous,' while as used in programming (along the lines of how it was used in this article) it comes to mean something more like 'how many balls can a juggler keep in the air,' with highly concurrent systems being able to juggle many more balls with the same two hands, so to speak. And in fact, in a way, all the balls are concurrently in the air, but it's not (I think) the most intuitive thing, hence why many vehemently disagree about usage.
And, more elaboration on the subject than necessary:
More to the point, please do this. I think it would make your article less confusing and accessible to more people.
Concurrency is a property of a set of tasks describing dependencies among these tasks. Think of it as a graph with a vertex for each task and a directed edge between two tasks if one task depends on the other task's result. Two tasks are concurrent if there is no dependency between them. Concurrent tasks can be executed in parallel, but do not have to be. Two non-concurrent tasks cannot be executed in parallel.
Parallelism occurs at runtime when independent tasks are actually executed at the same time (no matter whether you do this on two cores or are just interleaving the execution of both tasks on one core). Executing two tasks in parallel requires that these two tasks are concurrent.
Two (read-only) web requests are concurrent because there are no dependencies between them. Each web request consists of some subtasks like parsing the request, querying the database, performing some business logic and building and sending the response. These subtasks are not concurrent because each one depends on the result of the one before.
There are now (at least) two options for processing these two web requests - one after the other, or both in parallel. In the first case you ignore the concurrency between the two web requests; in the second case you exploit that concurrency by parallelizing their execution.
So IMHO there is no choice between concurrency and parallelism. Concurrency is a property of the system you are building (and there is not much you can do to add or remove concurrency). You then have a choice to introduce parallelism to exploit that concurrency, or to just ignore it, but that's it.
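A toy sketch of this definition (task numbering and Rust are my choices, not the commenter's): two tasks are concurrent exactly when neither can reach the other in the dependency graph.

```rust
use std::collections::HashSet;

// Edge (a, b) means: task b depends on task a's result.
fn reaches(edges: &[(usize, usize)], from: usize, to: usize) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(n) = stack.pop() {
        if n == to {
            return true;
        }
        if seen.insert(n) {
            for &(a, b) in edges {
                if a == n {
                    stack.push(b);
                }
            }
        }
    }
    false
}

// Two tasks are concurrent iff neither (transitively) depends on the other.
fn concurrent(edges: &[(usize, usize)], x: usize, y: usize) -> bool {
    !reaches(edges, x, y) && !reaches(edges, y, x)
}

fn main() {
    // Request 1: parse(0) -> query(1) -> respond(2)
    // Request 2: parse(3) -> query(4) -> respond(5)
    let edges = [(0, 1), (1, 2), (3, 4), (4, 5)];

    assert!(!concurrent(&edges, 0, 2)); // subtasks of one request: ordered
    assert!(concurrent(&edges, 1, 4)); // subtasks of different requests: concurrent
    println!("ok");
}
```

A scheduler is then free (but not obliged) to run the concurrent pairs in parallel, which is exactly the "exploit or ignore" choice described above.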
I presented your way of looking at it in the article more than once and I said it was valid, just not the only one; then several comment threads emerge reiterating that view, more or less wrongly, and arguing that it's got to be the one and only way of looking at it. Sigh.
I can think of two reasons why one would call a language parallel but not concurrent. First, this could obviously happen because somebody is not aware of the meaning of the concepts or uses the words in a non-standard way, but I think this is probably not the case - I would be surprised if someone designed a concurrent/parallel language and did not know the distinction.
My best guess is that the intended meaning of the statement is that the language is no good for general concurrent tasks but well suited for SIMD-like concurrent tasks, that is, running the same operations on multiple different inputs in parallel. (I have to admit that I know nothing about Sawzall and did not invest any time to figure this out, just guessing.)
A very good discussion on this can be found in an exchange between Rich Hickey and Cliff Click on the Clojure mailing list.
This exposes an important, and to me non-obvious, property of concurrency: it's not the locking that's really hard, it's being sure that every piece of related data is included in the lock (or STM).
The nice thing about eliminating queue-like interfaces - and locks, STM, message passing, lock-free containers etc. are actually a form of a queue-like interface - is that it eliminates all race conditions which aren't data races. Then you can have automated debugging tools find said data races.
Unfortunately, this doesn't work with event handling systems, because they're often required to have queue semantics to resolve inevitable conflicts between requests.
Concurrency is about modeling a problem as an orchestration of multiple things going on at once. It is an abstraction mechanism in much the same way as objects or functions. Parallelism, on the other hand, is a machine-level occurrence, like caches or registers - not something to be ignored, for sure, but not something typically targeted when talking about high-level abstractions.
Task parallelism and data parallelism are two important concepts, and the tools we need to implement them are very different. The two words used by the author to convey the difference between those two concepts are not chosen optimally, in my opinion. There seems to be a hierarchy between concurrency and parallelism, whereas task parallelism and data parallelism are directly comparable.
"Task parallel": compute a() in parallel with b().
"Data parallel": compute a(0), a(1), ..., a(N) in parallel.
"Task concurrent" (not idiomatic, but hear me out): handle mouse events concurrently with keyboard events.
"Data concurrent" (again not idiomatic): handle events from 1000 sockets concurrently.
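To make the two parallel cases concrete, a minimal Rust sketch (the function names a and b come from the list above; the threading mechanism is my own stand-in):

```rust
use std::thread;

fn a(i: u64) -> u64 {
    i * i
}

fn b() -> u64 {
    42
}

fn main() {
    // "Task parallel": two different computations side by side.
    let (ra, rb) = thread::scope(|s| {
        let ta = s.spawn(|| a(7));
        let tb = s.spawn(|| b());
        (ta.join().unwrap(), tb.join().unwrap())
    });
    assert_eq!((ra, rb), (49, 42));

    // "Data parallel": the same computation over many inputs.
    let results = thread::scope(|s| {
        let handles: Vec<_> = (0u64..8).map(|i| s.spawn(move || a(i))).collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<_>>()
    });
    assert_eq!(results, vec![0u64, 1, 4, 9, 16, 25, 36, 49]);
    println!("{results:?}");
}
```

The concurrent cases differ in that the "inputs" (mouse events, socket events) arrive at unpredictable times from outside, so the program's job is reacting, not just computing.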
How do
"compute a() in parallel with b()" and
"handle mouse events concurrently with keyboard events" (which could be a() and b())
differ, and how does
"compute a(0), a(1), ..., a(N) in parallel" differ from
"handle events from 1000 sockets concurrently", which would be something like a(event0), a(event1), ...?
Don't we require the same tools for "task parallel" and "task concurrent", as well as for "data parallel" and "data concurrent"? To me they don't seem orthogonal; I believe they are very much the same thing.
I've always thought of parallel computation as lines in time that never intersect or share resources but are computed in the same order at the same time:
Conversely, concurrency is a little more lenient on causality; the lines are scheduled co-operatively and the computation will complete at some point when all the tasks are complete.
Causality in programming languages is certainly an interesting area to improve upon. Some interesting perspectives are http://bloom-lang.org and my own attempt at implementing it: https://github.com/agentultra/knockblock, both of which attempt to address the causality of eventually-consistent distributed replicas.
1. Concurrency is a property of any problem that involves resource contention.
2. Parallelism is merely a performance optimization for certain classes of problems.
3. Tools that address the concurrency problem are capable of performing the parallelism optimization by recasting a problem as a resource-contention problem.
4. However, casting a problem as a resource-contention problem introduces necessary nondeterminism and likely race conditions, as these are inherent properties of the problem domain for which these tools were designed.
5. Instead, use a tool that is explicitly designed to address parallelism in a deterministic, race-free manner.
Is this the essence of your essay? Personally, I'd love to see an example of a small real-world problem solved in e.g. Erlang vs. the same solution in checkedthreads.
Erlang vs. checkedthreads (or Cilk or anything C-based), computing an image convolution kernel or handling some other computational problem? I could, though I think such a showdown, with Erlang predictably ripped to shreds, would be trolling. And I really like Erlang - I don't necessarily like Armstrong's marketing where he says Erlang makes more sense than C for parallelizing computations (seriously - are you going to send the image sub-regions via message queues byte by byte?), but I like Erlang, and it'd be a pity if anyone looked at such a silly showdown and concluded that Erlang was inferior. It's not; it's just for different things entirely.
I do this by building queue-based concurrent systems with a resulting nondeterministic order of execution, but I then ensure that the results themselves are deterministic.
Consider it to be a single queue with multiple vending machines to process requests. No one knows which vending machine will dispense their soda, but they will always get the soda they desire.
No need for special tools there. The overhead of ensuring a deterministic result I'd estimate at <2% (from 32-bit float to 64-bit int type conversions (fixed point) and 64-bit versus 32-bit atomic operations on GPUs for accumulation). The biggest problems I run into are end users who'd happily trade that predictable output for that <2% perf gain, and competitors who are happy to oblige them.
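A small sketch of the float-to-fixed-point trick (the 16-fraction-bit scale here is made up, not the commenter's actual format): integer addition is associative and commutative, so any accumulation order, and hence any nondeterministic parallel schedule, produces the same sum, which float addition does not guarantee.

```rust
// Hypothetical scale: treat values as fixed point with 16 fraction bits.
const SCALE: f32 = 65536.0;

fn to_fixed(x: f32) -> i64 {
    (x * SCALE).round() as i64
}

fn main() {
    let values: Vec<f32> = (1..=1000).map(|i| 1.0 / i as f32).collect();

    // Float addition is not associative: summing in a different order
    // (as a nondeterministic parallel schedule would) can change the bits.
    let fwd: f32 = values.iter().sum();
    let rev: f32 = values.iter().rev().sum();
    println!("float fwd={fwd}, rev={rev}");

    // Integer addition IS associative and commutative, so fixed-point
    // accumulation gives one answer no matter the schedule.
    let fixed_fwd: i64 = values.iter().map(|&x| to_fixed(x)).sum();
    let fixed_rev: i64 = values.iter().rev().map(|&x| to_fixed(x)).sum();
    assert_eq!(fixed_fwd, fixed_rev);
    println!("fixed-point sum: {fixed_fwd}");
}
```

The cost is exactly what the comment says: a conversion per value and wider atomics for the accumulator.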
And the problem there is that I don't know how you diagnose race conditions or the quality of the underlying algorithm without a deterministic output from a deterministic input.
Recently, someone told me they address this problem by running single-threaded to check for race conditions. Sigh... But I guess that lets them evaluate the model.
"Ensure that the results are deterministic" - how? Race detectors deterministically ensure that results are deterministic :) which I always found nice... "No special tools" - do you mean you just run the thing several times and check that the results don't change, or do you "just" write the code correctly? How do you know it's yielding deterministic results, except that it's supposed to do so by design, without any tools?
But in practice, I'm reasonably sure it works: checking for that deterministic output on known-working examples works like gangbusters at finding broken GPUs - even ones that otherwise pass existing memory tests.
So in practice, I end up running the code as usual, but writing all the intermediate data to a buffer, dumping that to a file with a key for ordering it, sorting it, and then diffing an enormous text file to find out what's inconsistent between two runs.
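A toy version of that workflow (channels and integer keys are my stand-ins for the intermediate buffer and file): each run emits keyed records in nondeterministic order, and sorting by key imposes a canonical order so two runs can be diffed.

```rust
use std::sync::mpsc;
use std::thread;

// One "run": worker threads emit (key, value) records in whatever
// order the scheduler happens to interleave them.
fn run() -> Vec<(usize, u64)> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for t in 0..4usize {
            let tx = tx.clone();
            s.spawn(move || {
                for i in 0..10usize {
                    let key = t * 10 + i; // key identifies the work item
                    tx.send((key, (key * 3) as u64)).unwrap();
                }
            });
        }
    });
    drop(tx); // close the channel so the iterator below terminates
    let mut records: Vec<_> = rx.iter().collect();
    records.sort(); // impose a canonical order before diffing
    records
}

fn main() {
    // Two runs interleave differently, but after sorting by key the
    // "files" diff clean; any mismatch would point at a race.
    let (r1, r2) = (run(), run());
    assert_eq!(r1, r2);
    println!("runs match: {} records", r1.len());
}
```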
It's a pity if GPUs, which are really a platform for computation acceleration and not for concurrent event handling, lack deterministic race detectors; a good detector not only wouldn't mask a bug that was occurring, but it would detect a bug which perhaps you never observed until running the tool.
I don't even trust myself to not introduce them hence the harping on determinism.
It mostly falls on deaf ears though. I would love an automated tool that genuinely found these for me. I would immediately incorporate it into my QA tests.
We can discuss it here or you can drop me a line at Yossi.Kreinin@gmail.com - it sounds like an interesting thing to hack on, and not incredibly difficult, though I may be missing something and I'm curious to find out what.
Wide Finder 2 would be a good demo for checkedthreads, although it does involve some amount of concurrency in I/O.
As far as I can tell, everything involving sockets goes through tons of queues like the ones he describes; concurrency, not parallelism.
If I kick off reads on ten different sockets, when the read completes the kernel has to put the thread waiting for the read onto a 'ready to run' queue before any work can actually happen. Scale up to 1000 pending reads and now that queue could get pretty full pretty quick, and lots of data is moving in and out. As I understand things, select and epoll improve on this by making the queue more efficient: now instead of using a thread per socket, and having the OS queue up threads on the ready-to-run queue, the OS helps you manage a queue of 'ready sockets', and you respond to those manually. But there's still a queue!
Are there IO APIs out there in modern OSes that let you reduce queueing in such a way? Like, hypothetically if performing a read on a socket required me to pass in a callback that would process the data from the read, the OS could respond to incoming data on that socket by immediately running the callback, whether it's on an OS-managed user-mode thread pool or via some other mechanism. This would at least reduce the amount of queueing going on - at most you have a queue of tasks for a thread pool that are responding to complete IO (similar to epoll, I guess?) but I bet there are ways to improve on that too.
(If memory serves, IO Completion Ports on NT have functionality that looks like this... but I bet it still involves lots of queueing and scheduling)
Latency would be a problem regardless - the fact that you read from a socket and get blocked for a while, whether you can find something useful to do in the meantime, and how easy it will be to get back to what you were doing once the data arrives.
In this sense, sockets are not fundamentally different from, say, DRAM latency - it's a much smaller latency, but you still want to do something in the meanwhile.
In the parallel gifts example, they had to somehow decide who would buy people what, which tends to not be very parallel. If the store has only one entrance then that too turns into a non-parallel part of the problem. If the gifts were mailed there's only 1 mailman who delivers to an area on a given day. And so on.
You're probably always going to find resource contention somewhere, it just isn't always something you have to deal with directly when solving your problem.
I.e., the results of computation do no one any good if they are sitting in memory spread across compute nodes in the cloud; combining them back down to the point that they can be presented to the user as quickly as possible usually involves a fair amount of concurrent code.
That said, I didn't focus on the single machine case or shared memory architecture, I think - it was more about correctness. A parallel system is very easy to get right compared to a truly concurrent system; I mentioned memory access monitoring but it can be other communication instead of through memory - the point is that you know a parallelism bug when you see one, but not a concurrency bug, because concurrent systems have non-deterministic results and sometimes it's correctly non-deterministic and sometimes incorrectly so.
In particular, though I don't work on these things, I think tools for detecting parallelism bugs that scale past a single chip/box would be very valuable for large compute clusters - and there again using scalable message-passing-based systems without such tools would underserve computational workloads.
Also, how do you classify workloads like physical simulation, which are one of the classic use cases for parallel processing but not embarrassingly parallel, in that they need lots of communication between cores?
Better tools for analyzing distributed computation would be cool. I think that we would have to think carefully about the kind of bugs to look for. Since data access/distribution is slow more thought tends to go into it (and there are systems in place for doing it well) but there are of course loads of other issues that show up.
I very much think it's valuable to treat distributed, parallel systems specially and not just use concurrency-oriented tools - it's better to exploit determinism for automated debugging, it's just not my area at the moment - I work more on shared-memory systems. I was involved in some work with parallelizing on multiple boxes at process granularity, but it wasn't elaborate enough to go deeply into any of these issues.
I'm working on a parallel random forest implementation at the moment... very little communication between threads is needed, but the algorithm itself is stochastic (unless you seed your random number generators in a clever way, I guess). (And I've used Go's concurrency primitives and race tool for things like writing the forest to disk as trees are grown in parallel, and compiling summary statistics.)
I do think that tools like the one you are writing are very useful and needed. I just have a bad habit of wanting everything to meet my needs.
See pages 25 and 34.
Why is there so much more confusion now that we have "better" tools?
Important to who? I use tools written in functional languages every single day.
>Many people claim many things, but when it comes to empirical evidence, it turns out that imperative, stateful programming is still the best choice.
You don't see the obvious hypocrisy there?
Not if you're using Akka.
I could be wrong, but based on your own article, that doesn't sound like a parallel task. Aren't parallel tasks already assigned work that doesn't conflict, a la map-reduce?
> "concurrency is dealing with inevitable timing-related conflicts, parallelism is avoiding unnecessary conflicts"
did I miss something?
Well with the actor model, it is less likely since you're supposed to be passing immutable messages around and you're not coordinating threads around locked resources.
>Concurrent threads (forkIO) will run in parallel
The simple example being, say, a webserver. You use forkIO to spawn a green thread per connection. This gets you concurrency. Now if you want parallelism, you simply tell the Haskell runtime to multiplex those green threads over more than one OS thread. You don't have to change your code at all. The whole argument and terrible gift analogy seem to rely on redefining parallelism to mean "making sequential code parallel". I think most people are in fact thinking of making concurrent code parallel when they talk about wanting parallelism, like the webserver example.
From here: http://www.haskell.org/ghc/docs/7.0.3/html/users_guide/lang-...
"GHC implements some major extensions to Haskell to support concurrent and parallel programming. Let us first establish terminology:
* Parallelism means running a Haskell program on multiple processors, with the goal of improving performance. Ideally, this should be done invisibly, and with no semantic changes.
* Concurrency means implementing a program by using multiple I/O-performing threads. While a concurrent Haskell program can run on a parallel machine, the primary goal of using concurrency is not to gain performance, but rather because that is the simplest and most direct way to write the program. Since the threads perform I/O, the semantics of the program is necessarily non-deterministic.
GHC supports both concurrency and parallelism."
Finally, the part that I quoted in the article:
How this isn't "providing different tools for the two and recommending to use them in different situations", I don't know.
>If you're more interested in performance than non-determinism, learn about parallelism first."
First, not instead of.
>GHC implements some major extensions to Haskell to support concurrent and parallel programming
Yep, that really sounds like they require different tools, the way they mention them together as a single topic.
>GHC supports both concurrency and parallelism
And does so with the same tool: forkIO. Completely contrary to your claim that haskell is evidence that it requires separate tools. This is once again made very clear on a page you link to:
>Ordinary single-threaded Haskell programs will not benefit from enabling SMP parallelism alone: you must expose parallelism to the compiler. One way to do so is forking threads using Concurrent Haskell (Section 7.18.1, “Concurrent Haskell”), but the simplest mechanism for extracting parallelism from pure code is to use the par combinator
They are literally saying "use a higher level abstraction instead of rolling your own directly with concurrency constructs". This is the same advice that is always given when a higher level abstraction is available. It directly contradicts the claim that parallelism requires a different tool. It explicitly states that you can do it with the same tool.
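That relationship, a parallelism construct layered as an abstraction over the concurrency primitive, can be sketched in any language; here is a Rust stand-in (my own illustration, not GHC's implementation), where a deterministic parallel map is built on nothing but plain thread spawning:

```rust
use std::thread;

// A tiny parallel map built on nothing but the plain threading
// primitive, analogous to `par` being an abstraction over the same
// runtime machinery that forkIO uses.
fn par_map<T: Send, R: Send, F: Fn(T) -> R + Sync>(items: Vec<T>, f: F) -> Vec<R> {
    thread::scope(|s| {
        let f = &f; // share the function by reference across threads
        let handles: Vec<_> = items
            .into_iter()
            .map(|item| s.spawn(move || f(item)))
            .collect();
        // Joining in spawn order makes the output order deterministic,
        // even though the threads themselves interleave arbitrarily.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let squares = par_map(vec![1u64, 2, 3, 4], |x| x * x);
    assert_eq!(squares, vec![1u64, 4, 9, 16]);
    println!("{squares:?}");
}
```

Whether you then call the abstraction and the primitive "the same tool" or "different tools" is exactly the disagreement in this subthread.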
"They provide forkIO as a basic tool to do both."
Computers provide bits to do both and then some. You can call the ensuing distinctions "matters of convenience" and not "requiring different things", of course.
Or you can read the code and see that it is quite literally an abstraction layer built on the same tool, not two different tools. That is like saying folds and recursion require different tools, and providing "haskell has both recursion and folds" as proof. Of course it does, but folds are simply an abstraction on top of recursion.