Data races can be statically removed by carefully restricting certain parts of the language design, see Pony. https://tutorial.ponylang.io/#what-s-pony-anyway
Bonus: learn aspects of deadlocking by playing a game: https://deadlockempire.github.io/
It's because of my concurrent C homework assignments that I was able to really appreciate Go's channels in my first internship.
It’s like with cryptography - don’t roll your own solutions just because you know what xor is. You’ll fail miserably.
Though IIUC Pony is also supposed to prevent deadlocks.
Very off topic, but I have read the argument several times that the rise of functional programming is due to its easy concurrency (since pure functions don't have side effects) and that concurrency becomes more and more important because Moore's law is dead (i.e. we can't scale the hardware up, we have to add cores to our processors).
Could someone with more experience comment on that? Is concurrency really easier in functional languages and is the rising importance of concurrency a valid reason to look into functional programming?
I think it depends on who's writing the code. A little bit of shared mutable state here and there isn't gonna kill you, but if the program's architecture is poor, that'll blow up and spread everywhere, and the next thing you know, you're spending half your time worrying about data races, deadlocks, and a locking nightmare.
Shared-nothing architectures fix that; your synchronization happens over a narrow range of well-defined primitives (e.g. message queues), though this doesn't prevent race conditions.
Functional programming that encourages pure functions & immutable state is a bit of a straitjacket that keeps you from making a mess that blows up. I think it helps, and I wish $stuffAtWork were written in such a language because I'm tired of wrestling with terrible architecture (that I'm not really allowed to fix, given the time constraints).
The other thing that makes a difference is what your problem looks like. Sometimes concurrency is absolutely trivial, like dispatching sections of an array to threads that perform (independent) computation and come back with a result. Sometimes it's way more complicated...
Many of them provide a concurrency monad which allows you to write expressions like this:
read_from_socket_into_buffer params >>= process_buffer
BTW, it's quite easy to implement your own concurrency monad; the real work, however, is wrapping all the blocking system calls into this framework.
Specifically, concurrent Haskell is good and also quite fast. It doesn't quite compare to Erlang in the concurrency department although the languages are very different in other ways, so there are reasons to prefer Haskell anyway. I think this shows that just having a functional language that emphasizes effect-free programming is not sufficient. If you want a good concurrent language, then you have to design it like that, and not simply rely on the absence of shared mutable state.
In practice, people write massively concurrent systems all the time, in the form of dynamic web sites that handle many concurrent requests, and synchronise access to a shared mutable state via a transactional database of some kind. These don't depend on fancy languages, but on a rigid architecture that tries to isolate synchronisation issues in the web server and database components.
While this article is about concurrency, I'll also add a note on parallelism: Parallel functional programming has IMO not been shown to work well in practice. While it is trivial to parallelise more or less any pure functional program correctly, it is very hard to actually gain meaningful speedup from it. The single point of parallelism is performance, and most current implementations of functional languages (including all general purpose functional languages I know of) simply have too many bottlenecks or pitfalls to scale for computational work as well in practice as one might expect in theory. For example, concurrent Haskell is fast because most of the actual work tends to be IO and GHC has an excellent parallel IO manager and scheduler, while parallel Haskell is hampered by, among other things, a slow garbage collector.
Btw, what are some of those other things hampering Haskell's performance with parallel computing? Wouldn't the garbage collector mainly produce pauses instead of really destroying parallel performance? Although a concurrent garbage collector has recently been merged to the nightly GHC.
Erlang has great concurrency also on a single machine. That it scales well to clusters is of course also good. And yes, it is the actor model that makes it work so well. That the individual processes are written in a (mostly) functional language is less important, I think. It could have been an imperative language and it wouldn't have made much difference, as long as process state was still isolated from other processes.
> Btw, what are some of those other things hampering Haskell's performance with parallel computing? Wouldn't the garbage collector mainly produce pauses instead of really destroying parallel performance?
Haskell's garbage collector is pretty slow when run on multiple cores, and when run on a single core it becomes a bottleneck to parallelism (and can also trash the caches of the other cores). I don't know all the specific technical details of GHC's GC, though.
Another problematic thing is laziness, which is operationally control flow plus side effects. Those are exactly the things that make efficient parallelism difficult.
> Although a concurrent garbage collector has recently been merged to the nightly GHC.
A concurrent GC is a different thing, that interleaves garbage collection with ordinary computation. That can be very useful, but it doesn't help with exploiting more cores.
> it doesn't help with exploiting more cores.
Yeah you're right, I believe the aim is to get rid of those garbage collection pauses for better concurrency (not parallelism).
From what I understand, GHC can do pre-emptive or cooperative scheduling. Pre-emptive being better for concurrency, just like Erlang does it.
It works really well and intuitively, I would say; in fact, this is a common application used to show the advantages of functional programming. Here is an article comparing sending messages asynchronously between threads in Elixir and Clojure (two common functional languages): https://bentomi.github.io/posts-output/2017-06-04-message-se...
Moreover, I was mainly referring to using the actor model in a language with the static type system of functional languages (ML). What would be the type of the mailbox? Of the message? I believe even Facebook (WhatsApp) has struggled to find a solution to this question; they're developing a statically typed language on top of BEAM right now.
There's been plenty of good discussion around implementing a strong static type system around BEAM, here's some bits for reference:
Another advantage is that you can unit test each function on its own, without setting up a whole host of preconditions. That's because of referential transparency.
IME locking isn't really all that hard until you need to squeeze out performance. If you can handle one big dumb lock around everything, it's easy. The finer-grained your locking gets, the harder it is to get right (even then, lock hierarchies can help to a great degree).
For example, iterator invalidation, which is probably the most common thing I've seen when people use dumb locks to make a concurrent program sound. Either your iterators need to be aware of the lock and hold it through their lifetime (which means practically only one iterator can exist, which is undesirable), or your data structure needs to be aware of outstanding iterators and update them (which means you have added logic to simple mutations, which is also undesirable).
It's problems like these that, in non-functional languages, require library authors to say things like "this is/isn't thread safe" in their documentation while providing those guarantees manually. And as a consumer of the library you need to be aware of these things, look for them, and know what an error looks like when someone messes up.
I don't know what you're trying to say about stacks/the call stack in functional languages, because they implement their patterns in a variety of ways - including locking. It's just about providing particular guarantees in the API, and the benefit of those guarantees is that they prevent entire classes of bugs like iterator invalidation and data races.
There are also all sorts of clever data structures you can use that don't require duplication of the underlying data. The easiest is a singly linked list without insertion. No locks required.
Only you know if concurrency is relevant to your work. A lot of internet stuff is deeply concurrent, but there's a lot of sequential work out there, too.
I have experience with Erlang. It's amazing for concurrency, but the reason is really not anything it gives you, but everything it prevents you from doing, or requires you to do.
Erlang's immutability is important because it means you can't save a reference to something and have it magically updated; if you want a value updated somewhere, you need to send the current value or request the current value at time of use; explicit synchronization makes concurrency clearer, and helps influence system design to reduce synchronization. Immutability also makes message passing easier; the content of a message can't change after it's composed, which means the implementation is free to make a copy of the message if it's convenient, and as a programmer, there's no need to consider the effects of changing something that you sent to another process. Immutability also makes garbage collection very simple; it's impossible to form a cyclic data structure, so there's no need for mark and sweep, a simple copying collector is sufficient. Individual stacks per Erlang process (like a green thread) mean GC pauses don't stop the world, only individual processes, and pause length is bounded and proportional to the memory use of the process.
You could certainly set up a similar environment in C or whatever traditional language; the problem would be enforcing your system design on your own code, and partitioning off library code (including libc and/or kernel syscalls) to ensure it doesn't impact your system design either.
You can use the same approaches in non-functional languages: use only immutable data structures and pure functions everywhere. Then you get the same benefits of easier reasoning about concurrent execution as you get from a functional programming language.
So the point of using an actual functional language is that the compiler pushes you very strongly towards programming this way. It's kind of like using a language with built-in garbage collection. You could implement a garbage collector in C and use it everywhere in your program where you allocate and release memory. But having it built into the language frees you from having to think about it very much.
So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.
I don't get the point of functional programming. Machines don't work that way.
A key premise of functional programming is that the constraints of immutable semantics free the compiler up to do all sorts of performance optimizations, and the compiler can apply those optimizations much faster than a human developer can.
Also, when you really need a vector that you can mutate in place, you can still do that. Haskell has a mutable vector type. It's not the default idiom, but if you needed to sort a billion numbers in place, this is what you would use. Note all the function names prefixed with "unsafe": https://hackage.haskell.org/package/vector-0.12.1.2/docs/Dat...
FP is useful because it provides an abstracted model that is pretty easy to reason about. It doesn't matter how it gets implemented, that's the point of the abstraction.
We don't write programs for machines to understand. We write programs for humans to understand.
True, but that program ends up executing on a machine. You normally want that program to execute as efficiently as possible. 'Efficiency' can be measured several ways... time to complete, resources required, financial cost.
A case in point is our company switched from using Cassandra (JVM based) to ScyllaDB (a C++ clone of Cassandra). Query latencies were the same if not faster, and we needed fewer nodes to handle the same load. All because the language used has some mechanical sympathy for the CPU it's going to run on, and no GC.
I see Scala is somewhat the de facto language used for 'big data' stuff. I have no idea why. I reckon if some bean counter translated that choice into $$$ things would change.
No, you normally want that program to execute efficiently enough to do its required job. Abstractions are often necessarily costly, but that's ok because their value is greater than their cost. Otherwise, you'd just throw all the abstractions away and use assembler.
> Query latencies were the same if not faster, and we needed fewer nodes to handle the same load. All because the language used has some mechanical sympathy for the CPU it's going to run on.
It seems like there's a whole load of reasons why it might have been more efficient on ScyllaDB that are unrelated to what language it was written in. Maybe the software design fit your requirements better? Maybe it was just better implemented? Maybe it was easier to configure for your load? Etc. etc. Programs aren't magically "more efficient" because they're written in C.
but... the "required job" is to go as fast as possible - in two ways:
1/ You must go faster than your competitors. Else people will switch, your software will get bad reviews, etc.
2/ You must go fast enough for anything that people will throw at you. And, in many cases, people can throw arbitrarily large problems at your program, and telling them "sorry but it will take 1 hour to process your 2TB dataset" just won't cut it at all. And even when you manage to cut it, then people come at you like "now can this run on a Raspberry Pi?"
Take for instance image processing software, audio or multimedia software... they will never be fast enough, and there is always a use case where getting a 1% perf improvement will allow a project to make instead of break (especially when working with uncompromising artists).
This is sometimes true, but only rarely. Most software is not image/audio processing software. Most software is not even interacted with directly by humans. Most software exists as a component of a larger system, the performance of which is typically determined by a small fraction of that system. In other words, most software is not on the critical path.
Even if you're on the critical path, the performance requirements could be of the form "Here's some data, I need an answer in an hour". If your current system spends 2 minutes computing and delivering the result, why spend time and money making it faster? Just because you're on the critical path for you, doesn't mean you're on the client's critical path.
This is the whole point of performance engineering - identifying which bits are important and then figuring out how to optimise those. Optimising everything to run as fast as possible is a waste of time and effort - worse, it often degrades the overall system by making previously easy-to-understand components more complex and more difficult to reason about. See the whole "Worse is better" thing.
If you're on the critical performance path, go nuts. Throw away the abstractions and take it down to the metal. If you're not? Why bother?
Like I mentioned previously, Scala's use in Big Data is perplexing. I reckon Cloud providers are making an absolute mint from design / language inefficiencies.
I have never been in the case where the answer to the question "when" wasn't "as fast as physically possible".
Since we are not quite there yet, could you explain in broad terms how a large collection is sorted while maintaining performance and immutability?
There will of course be "impure" writes at the machine level, but at the language level, the abstraction doesn't leak.
It seems like you're saying that abstractions are worthless unless they're perfect, which is obviously not true.
> Since we are not quite there yet, could you explain in broad terms how a large collection is sorted while maintaining performance and immutability.
What do you mean by "maintaining performance"? Relative to what? Who's asking for the sorted collection? How are they going to access it? What are the resource constraints? Give me a real-world example.
Don't get me wrong, I'm not saying that FP is the answer to everything. It's a useful tool, and part of being good at software development is knowing what tools to use and when.
How can efficient sorting be implemented with immutable data structures? Genuinely curious.
Also, along the same lines, it is certainly possible to craft types in some FP languages which allow for this same logic to be captured, mostly via some monad construct.
To start, I am aware of Rust and left it out of the above comment purposefully. While Rust does indeed make many 'functional programming' idioms available and, some would say, is basically an impure, eager functional language, I don't tend to view Rust as a language featuring persistent, immutable data structures. I certainly don't think that is a negative characterization, just how I see things. I find that one of the defining features of the functional programming 'persistent' data structure is the structural sharing that is all too frequent in most of the mainline FP languages (Haskell, ML, Clojure, etc.). Rust's focus on low-level/systems-level programming is much more likely to use standard imperative data structures without the sharing and deep pointer chasing style seemingly inherent in FP implementations.
I have had a few spirited web-discussions with people about the claim that Rust has an affine type system. I don't think that stance is accurate, from a type-theoretic standpoint. I subscribe to the view that Rust has a standard non-substructural type system with constraints placed on types that meet a mutability predicate. I think that those constraints are better described as a uniqueness constraint per instance of a type. But the sequent calculus presentations for Rust (especially concerning mutable data) would look very similar and act very similar as well. So really it's much ado about nothing, just fun to talk about.
Sorry for the wall of text, and rambling about type theory. Stay safe.
>So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.
Why does it defeat the point? If you can guarantee that the mutability doesn't leak (there are ways to do this), it behaves just like a pure implementation.
In fact, some structures that lend themselves to immutability represent the hardware better. Take your vector of a billion elements. It probably is too big for L1/L2 cache. Is it actually a billion things next to each other in memory? Not at all.
If you used something like a radix-balanced tree to represent your vector, you get amortized O(1) access, and mutation can be implemented using copy-on-write at individual leaves of the tree. And your abstraction is much friendlier with the cache, because it reflects how the hardware works much better than a simple vector.
In practice, immutable data structures can be much more performant at scale than pretty much anything. They're not just convenient for soundness. Trivial data structures like vectors are really only useful when you're talking hundreds to the low thousands of elements, at which point who cares how much memory they take up?
There is no advantage in sorting a billion numbers the functional way.
There is an advantage to gain, though, when the unsorted numbers are used for other purposes besides sorting. Then, functional programming demands that the output of sorting be written to a new 1-billion-element array instead of sorted "in place", so other clients can consume the same array in parallel without being disturbed by the sorting.
> So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.
It doesn't defeat the point if the abstraction doesn't leak. If a pure functional language offers sorting as a pure primitive, then language-wise it doesn't matter that it actually mutates memory internally.
The problem with "mostly" pure functional languages is that the abstraction leaks enough to make it useless.
Does this make e.g. erlang useless? Processes have mutable dictionaries and you can perform I/O anywhere in your functions..
Many believe the portable Assembler story and aren't aware that ISO C is defined in terms of an abstract machine.
The abstraction level that functional programming allows for, all the way back to Lisp, is to take advantage of the underlying hardware without being forced to hand code lowlevel details.
Something written in an ML-like language could be running on '70s or '80s hardware, NUMA architectures, GPGPUs, whatever, without changing a single line of code. Naturally we are not speaking about the same runtime implementation.
Whereas a language like C requires a bunch of dialects to actually make use of the low-level details, only exposed via Assembly and not part of ISO C.
And you can actually USE these low level details if you really want, with very low friction (if you are willing to lock down to specific architectures). Are you STILL whining?
C level (or Pascal if you're masochistic) is still a very nice abstraction, and for a lot of people it's everything they care about 99% of the time, to get all the control they need, while simultaneously ensuring they're not wasting any time fighting pointless language features that are mutually incompatible.
In the end everybody's using whatever they like. https://twitter.com/fabynou/status/1242517783092400129
This is the number of threads that can execute in parallel, no?
2. "Memory-level parallelism"
How many memory operations can be "outstanding" at once - seems comparable to a single core issuing multiple disk reads. The memory operations aren't really serviced simultaneously, they just have overlapping lifetimes.
For more fun, some people refer to the case where performance is limited by the amount of memory-level parallelism available as "concurrency-limited": https://sites.utexas.edu/jdm4372/2018/01/01/notes-on-non-tem...
SIMD systems effectively give you parallelism without concurrency - only one instruction is executing, but it's operating on multiple dataflows.
Your linked definition of "concurrency limited" seems to refer to utilisation. In the scenario described, how effectively the processor can be utilised depends on how many concurrent tasks it has in progress so it has something to do while one of them is waiting for a cache miss.
Not necessarily, but it is fine for this purpose I suppose. See https://news.ycombinator.com/item?id=21959692
Glad to see lock hierarchies mentioned. Barriers are new to me so that was nice.
IMO, it would be nice to at least have a mention of lock-free techniques and their advantages and disadvantages.
Databases provide transactions. This mechanism is also an inspiration for a synchronisation model called Software Transactional Memory proposed for Haskell, and used as "the" synchronisation model in Clojure. Locks and semaphores are rather lower-level primitives and it's harder for humans to reason about them with an ease comparable to using CSP or STM.
I was hoping for more real-life examples where you can see the effects of concurrency and parallelism.
Every multiprocessor system nowadays has coherent caches for normal memory, meaning that if you have reads that you perceive as stale, it isn't because of the cache, but because the hardware doesn't guarantee sequential consistency.
You can have problems due to relaxed consistency on a system without caches, and the problems on systems with caches aren't due to the caches.
There are situations when you have to worry about lack of coherency for other memory types, but AFAIK these are rarely exposed to userspace.
Why copy techniques meant for Netflix and Facebook when your org or app is most likely 1/1000th their size. Phallic size jealousy at work.
Most concurrent and parallel work can and should be done on a tried-and-true RDBMS for most orgs and apps. Use transactions/rollbacks properly and let the RDBMS manage most of the grunt work instead of reinventing the wheel in app code.
K.I.S.S. and use-the-right-tool-for-the-job.
I like the attitude the article takes. To successfully use concurrency, you should understand the tools available to you and their tradeoffs so you can make the right decision for your requirements.
Why is that? I'm pretty sure that the author's intention is not to equip the readers with the tools to make buggy programs, yet that is exactly what happens here.
When you need to start worrying is when you start implementing your own synchronization (either rolling your own primitives or going lock-free).
'moring needs to clarify what they are talking about. It's perfectly possible to write correct code using pthreads on modern hardware with no understanding of memory consistency.