Hacker News new | comments | show | ask | jobs | submit login
Concurrency I can finally understand and write correctly (inversethought.com)
297 points by jordigh 9 months ago | hide | past | web | favorite | 158 comments

People always used to ask me for advice on multithreading, because I had a reputation for multithreaded code that worked. I recommended they use Retlang (there's a Java version called Jetlang), which works on a shared-nothing message passing basis (and is blindingly fast). They tended to say they didn't want a library and just wanted to use locks and threads. I said that not doing that was how come I had a reputation for correctness.

They're in Java and writing locks and threads. Sigh.

java.util.concurrent is probably the single best concurrency library I have ever used. Bar none.

When I'm writing in Java, and I find myself writing a lock, my first reaction now is: "This is okay for exploration, but which data structure from java.util.concurrent am I going to have swap in here when it's all done. And can I do it now and save myself a lot of grief."

It only took me 7+ years to come to this realization. Apparently I'm a slow learner. :(

My small brush with this: back in the summer of 1990, I did a summer internship with Doug Lea who led the JSR 166 group that led to java.util.concurrent. My work was on a concurrency library for a parallel computer called the Encore Multimax

> summer of 1990

some thread corrupted this date

The main problem with Java is that you rely heavily on libraries and frameworks which may or may not enable you to apply a desired pattern.

Recent example: I am using Tomcat as a servlet engine. For dependency injection I am using JNDI. Tomcat can be configured to call a provided ObjectFactory to generate JNDI objects when they are requested. The environment I am using allows for some kind of blue-green deployment, I can switch between different fired up virtual machines transparently to users. I use this when upgrading. My provided Objectfactory logs when objects are created (some of them should be strictly singletons). Sometimes the logs indicated that multiple objects where created after switching to the newly updated machine. This was due to a high load on the system in which multiple threads where busy with incoming requests simultaneously. Neither the language library nor Tomcat has any safeguards in place (synchronization of any sort) although JNDI and in particular Tomcat supports singleton creation. I had to resort to statically creating the required instances at loading the Objectfactory, a pattern I dislike with a passion (you can't swap the implementation during runtime which can be annyoing in any long-running environment).

To be fair, that's more an issue of using a framework than anything to do with Java.

> "This is okay for exploration, but which data structure from java.util.concurrent am I going to have swap in here when it's all done.

That just doesn't work for most concurrent code. Often you have dependencies between data structures or other operations intervening. Most of the time it isn't a single structure, but a group of semantic operations across the program.

They're using Java instead of Clojure. Sigh.

I'm saying this as a massive clojure fanboy: java.util.concurrent is an amazing set of concurrency utilities and is the building block for many of clojure's concurrency features. It's definitely worth knowing when writing performant java AND clojure (you can wrap the little bit you need so you don't have to scatter java interop all over your codebase). Sometimes a (swap! atom fn) doesn't cut it.

Yeah, because some of the best concurrency primitives in the world aren't quite "what teh cool kids are doin'"

I agree that you shouldn't be touching locks and threads, typically.

However, I think you can get pretty far these days with just the built-in Java class library, using features like Futures (CompletableFuture), ExecutorService, ForkJoinPool, and BlockingQueue and such. Before CompletableFuture there was Guava's ListenableFuture. Synchronized methods are also useful in certain situations. I've built several fairly sophisticated concurrent applications using these primitives and felt they worked out reasonably well. I find that BlockingQueues with thread-pool-backed ExecutorServices provide a model that's fairly robust while being simple to reason about and monitor.

(shared-nothing = copy-everything != blazingly fast)?


(multi-threading + shared-nothing) = much less complexity = more correctness

"shared-nothing = copy-everything != blazingly fast)?"

Not OP, but, if that does not give you more speed I'm not sure your problem should be multithreaded in the first place. Threads themselves are not free, there is a non-trivial cost in switching thread context. Trying to use multiple threads to wiggle the same memory area is insanity in most cases however you look it.

An example where it's not insanity (while it might seem to be): https://arxiv.org/abs/1106.5730

I dont think this is true. The first example that came to my mind was a database. Shared-nothing + copy-everything is a very bad idea there and it is a thing that definitely needs to be multithreaded for performance.

Comment if you can:

We've got an app that is pretty slow. Partially the fault of the front-end doing a lot of synchronous promises.

But on the backend we have a lot of calls where we are getting a list off foos from the DB, then spinning through for loops and making one or two synchronous calls to other microseconds to fill in the details.

I hated algorithms - but this is O(n^2) and we could cut it down to 2n -> O(n) by spinning off the fir loop to futures.

This is java using completable futures with whatever thread pool the generic executor creates.

Any issues?

Make the messages immutable. Then the message passing is blazingly fast. You can argue about whether that’s strictly “shared-nothing” but practically it works.

Ofc, in Rust you can safely send mutable messages and transfer ownership.

You don't need to copy immutable values.

It makes no sense to choose immutable values for the multithreading performance benefit. You can usually get a much larger benefit from using a single threaded mutable implementation. For example, a mutable hash table will outperform a persistent hash-tries for updates.

(Of course there are good reasons for using immutable values; performance is just not one of them.)

Swift collections are logically immutable but may mutate under the hood if there is a unique owner. This conceptually provides a best-of-both-worlds, but a downside is that the language does not provide much help to enforce single-owner (yet).

I feel that immutable (almost) everything is good thing for multi-threaded programming and programming in general.

took decades and hardware capabilities for mainstream to realize that immutability can be extremely good for 1) people reasoning 2) machine execution.

Immutability is quite the opposite of good for machine execution. Much of modern CPU architecture is based on the idea of memory locality, and that you'd be reusing the same memory addresses in a given unit of work.

Also, immutability helps people reason when the language doesn't have a clock. If the language does have a clock -- i.e. it is synchronous, like Eve or Céu -- mutability poses no difficulty.

If you don't copy it and don't wait on any synchronisation mechanisms, how do you protect the memory?

You don’t need to protect it if every thread only ever reads it without writing.

Someone writes to it. And if they write before the read threads are run (and never again), then this is not really equivalent to the complexity we imagine when we say complexity in the context of multi-threading.

You are misunderstanding me. When I said no thread ever writes to it, I truly mean no one. There’s not a possibility of “someone writing to it.”

And I think you also have a very narrow view of multi-threading. Shared memory, mutable-everything isn’t the only way of doing multi-threading (or more precisely concurrency). This very article is trying to introduce to you a new way of doing concurrency that doesn’t involve mutable variables shared between threads.

"shared-nothing" does NOT implicitly mean "copy-everything"

You can pass an object around between threads with no issues, as long as only one of them owns it at a time. This is generally the model that is used in practice (although often in functional languages it's hidden under some CoW semantics).

copy-everything != copy-everything. clever compilers can reuse addresses of variables for as long as they are not written to, and still call it a „copy“. Eg structs in Swift are most of the time not copied, but rather referenced. Even copying is quite cheap nowadays. Waiting on a trap or semaphore, THAT‘s slow.

"clever compilers can reuse addresses of variables for as long as they are not written to"

Still chewing on this, but it would immediately strike me that relying on this behaviour in a software application, for it to be performant, would be a bad idea.

Since the type system is so strict, the compiler can often infer immutability even for mutable variables, and avoid copying in those cases. Swift’s compiler will even warn you if you declared a variable mutable when it’s never mutated. It’s an optimization rather than a guiding principle, but a very powerful one. Swift now is quasi on par with C++ performance, also thanks to optimizations like this.

Thanks for the info, that's interesting (I didn't know that). But it still seems like cheating :)

I'd go one further and suggest, "don't".

It's not just about safety, it's also about frequently not even needing it. I've achieved some pretty tidy code simplifications without harming the performance of the overall application by removing unnecessary multithreading. I've even sometimes achieved some pretty tidy performance increases while simultaneously simplifying the code by removing unnecessary multithreading.

Can you explain at a high level the type of work the second app you mentioned was doing? I know that multithreading actually causes slowdowns if the task each thread isn't substantial enough to offset the context-switching and if the tasks are dependent/interconnected. Are there any other criteria to take into account when considering whether or not to make an app multithreaded?

Pretty much that. It's not just the context switching, it's that it can be tricky to come up with a synchronization strategy that doesn't confound the CPU scheduler's efforts to keep the pipeline full.

It also depends on your environment. That 2nd app I mentioned was running on physical hardware that was running many applications. In that kind of environment, you can end up in a sort of, "double your CPU cores, double your cache misses" situation. And the performance story ends up not just being about one little module; it's about the entire system. There can be a sort of performance prisoner's dilemma, where trying to individually maximize the performance of every single piece in isolation actually results in slower overall performance.

This is not a helpful suggestion (even if it's often correct) in the same way that "don't get a knee replacement" is also unhelpful (though often correct).

Heh. Coincidentally, I've got a bad knee. "Don't get a knee surgery" was one of the most helpful things my doctor told me.

When you're surrounded by people who (with good intentions) advocate fiddling with things, there's a lot of value in also having someone to argue for a more moderate approach.

When Steve Jobs was away due to his illness, Bertrand Serlet presented Mac OS 10.6 (Snow Leopard) at the WWDC June 2009 with the startling slide that said "0 new features". There was sustained applause from the crowd. But in fact there was a very significant under-the-hood new feature called "Grand Central Dispatch". It is a low-level API that removes a lot of the pain of multithreading. Essential for OS X and iOS developers, as it turns out.

Dear God I wish they would do another 0 new features release.

It used to be that the excuse was bug fixes don’t sell. Maybe not, but buggy software does lose customers. They may bring in the newbies with features, but you retain with bug fixes.

Your observation is shared by hosts of the ATP podcast.

Bug fixes do sell when paired with minor development, major impact features. Push the load to a couple of built-in apps to make them more smooth and focus on under-the-hood stuff.

The press will praise that as the best thing since sliced bread - a modern, polished and stable OS release.

Of course this would be a one-off thing and we'll get another major release with features, features, features!

I really don't think the general population abandons entire operating system ecosystems, because of buggy software.

I think in part they do, otherwise OS X wouldn't have gained as nearly as much marketshare on the promise that "It just works"

OS X gained market share because of the insane popularity of the iPod, and subsequently the iPhone. Gained nearly 4% post iPhone launch.

There are other negative effect as well. People will do everything they can to not get the latest updates. See: Windows 10

Which is totally nonsensical, mostly fed off of nonsense news perpetuated through the general public. I don't like Windows 10 because someone on the internet said it was bad. It's a great OS.

I will never install it until they have dialed back their forced updates. I plan on using 8.1 until I can't, and then jumping to another OS. I had a number of blue screens from updating windows 8.1, and I see photos of windows force updating at crazy times. No forced updates from the brickmasters, thank you.

Ah! So that's why a lot of us still think of Snow Leopard as the best OS X release ever.

In my 30+ years experience of concurrency, the shared-nothing message model makes it relative easy to reason about concurrency (since every actor is single threaded).

I have worked with Java, C# and Python, and for each I have developed a concurrent object framework, making it quite easy to work with multible channels of incoming events, for example for process control systems with barcode-scanners, scales, I/O-busses etc. connected to the system

I think ponylang does a pretty good implementation of the Actor model.

Feel free to have a look at the Python impl. of the framework (www.github.com/pylots/pyworks)

In a way it is ironic that we went full speed into threads, shared memory and plugins, only to go back into processes, messages and sandboxing as a means to tame multi-core programming and safety.

I thought threads and shared memory were introduced more as performance hacks than superior architecture. It is not surprising that as the performance profile has changed (due both increased core counts and improved software primitives), we are getting back those things that were deemed too expensive previously

Concurrency abstractions are still built on primitives that involve locks and memory barriers.

Just like the old UNIX process IPC or micro-kernels communication is.

But don't we have thread pools and futures for that ? You can't get a simpler API.

Futures aren't a natural way to formulate many important concurrency problems. If I want a thread polling an external system and putting records in a work queue for as long as my application is running... well, what's that a future of?

That's what thread pools are for. They have a work queue you fill and workers depop it. Literally 5 lines in python and i imagine most modern languages.

Right, but thread pools are simple because they don't offer any sort of API for the tasks in the work queue. You're just writing arbitrary code. So rather than reasoning about an abstract execution model, you have to carefully think about every line to make sure you don't accidentally share mutable state.

In Java with Spring you'd do something like:

@Scheduled(fixedRate = 5000) public void poll() { getCrapAndPutOnConsumerQueue(); }

I'm surprised nobody's mentioned Clojure in the comments. Its concurrency abilities are top-notch, straightforward to use, and immutable data makes it dead-simple to avoid issues.

It also occupies a different subspace. D is still relatively close to the metal.

That's fair. Though the JVM is more performant than most people think.

In the distant past, I did low-level multithreaded programming in the patchwork style of synchronizing access to various data. Partly because the codebase already did it that way and partly because i didn’t know better. These days I would never torture myself with such a thing. I still do concurrency sans framework, but like in the article - lock free with no sharing. Copying a block of data when it’s ready is cheap compared to computing it.

In addition, you can break your program down into flows of tasks up front. Thinking about the synchronization explicitly helps too, often there is too much unnecessary communication.

I think that the data structures used by D for message passing are the real interesting part here. Akin to not using raw pointers, don't use raw data structures to convey state updates and let the owner of that state do it themselves with a queue or some abstraction for messages.

Hey, if you're curious about anything else I wrote in that blog post, just ask me. I thought the concurrency was a big highlight, but there's lots of other parts of D that I would love to talk about.

As I said, the whole thing was very enjoyable. I had lots of fun using D.


yeah but what about recursive named emacs kmacros ?! jk

nice post

I like concurrency in Rakudo Perl 6. (https://docs.perl6.org/language/concurrency)

Especially the .hyper method that turns a regular sequence into hyper sequence that gets processed in parallel. First 5000 prime numbers:

    # single-thread, takes 14.893s
    put ^∞ .grep(*.is-prime).head: 5000

    # multi-threaded, takes 8.853s on my 2-core box
    put ^∞ .hyper.grep(*.is-prime).head: 5000

Concurrency isn't the same thing as parallelism. You can have concurrency without ever executing two things at the same time. You just give some CPU cycles to one threaad, then pause it, give cycles to the other thread, and keep switching back and forth. Indeed, this is what any multitasking operating system has always done, even when every computer only had one CPU.

For parallelism as what you describe, D instead has approximately the same thing. There's a `parallel` function in the standard library that works on any range (i.e. any iterable). You basically instead of doing

   foreach(object; somerange) {
      // ..

   foreach(object; parallel(somerange)) {
      // ...
and it works like your Perl example.

If you want it more functionalish, without writing out the loop, you can use `taskPool.map` instead of ordinary `map` for the same effect. More details:

stdlib: https://dlang.org/phobos/std_parallelism.html

Programming in D chapter on parallelism: http://ddili.org/ders/d.en/parallelism.html

Is this more than just flat data parallelism, in any way? Honest question!

Concurrency is not a first class citizen in Perl 6 though. As with Perl 5 the best you can hope for is actor model implemented on top of promises/futures.

What would make it a first-class citizen?

Even an empty Rakudo Perl 6 program is a multi-threaded program, as the dynamic optimizer runs on a separate thread. I'd figure if that's possible, concurrency/parallelism would be quite a first-class citizen, not a best hope on top of something.

Language support for actor model as opposed to shared memory multithreading would make concurrency a first class citizen.


BeOS had an threading implementation like this in C++ in 1999: https://www.haiku-os.org/legacy-docs/bebook/TheKernelKit_Thr...

I used to program in this, and it was clearly inspired by the actor model (in retrospect, now that I have been programming in elixir for a while). Lots of void* passing, though.

Initially the actor model was designed in the 70s.

For a C++ implementation of actor patterns, see: https://www.actor-framework.org/

A shout out to the GPars framework [1] which makes actor based coding stunningly easy in Groovy / Java. I'm kind of shocked by the performance I can get out of it, to be honest. I once assumed that for trivial tasks (reading lines from a file, transforming them, writing out result etc), a parallel library would lose a lot of overhead in thread synchronisation. However I find that I can trivially construct data flows that massively outperform single threaded C code in Groovy. Which is to say, whatever overhead there is from thread synchronisation is dwarfed by the benefit of parallelising reading, computation and writing. If I had to think hard about concurrency to do it (as I would in C) I would probably never attempt it for simple programs, but GPars makes it so easy, safe and completely portable that there's really no reason not to.

[1] http://www.gpars.org/

On the other hand an Actor modelled implementation can - will - quickly get out of hand if you start using it to directly represent a business domain. Actor signature is Any -> Unit which essentially means “computes anything” plus there’s internal state (become), message queue size, problems with deterministic testability and so on.. I think Actors as an alternative to Threads, particularly valuable for its distributed supervisor trees, and used to build more abstract consumers for a strongly typed model of business representations.

It amazes me to no end that CML-style concurrency is not used more often. (Even though Go is sort of almost there)

I have been using guile fibers for some time, and it is a marvel! Not only can I write most code as if it was single threaded, but I can mix fibers code with pthreads and use fibers message passing when I need it.

If anyone is looking for another language that also uses the actor model for concurrency and has very similar ways of handling it like D, then checkout Erlang. It's language lessons are clearly still relevant today. :)

Erlang does go way beyond concurrency into the domain of distributed and highly-reliable systems, and has more than lessons for us IMO - together with OTP, it’s still arguably one of the most effective and battle-tested environments for highly concurrent systems.

Like anything, it has its envelope and sweet spots, as well as its warts. Some people find the Erlang language syntax to be ugly; I quite like it and find it easier to comprehend in general than Elixir, which is much more verbose to my eyes (except for the very nice pipe operator!)

Yes, the semicolon / period thing in Erlang can be annoying, but not so much that it really gets in my way.

The tooling can be a real barrier to entry, and I’m glad that Elixir is bringing more people into the fold, although I can’t bring myself to really like Elixir for purely subjective reasons.

For example, one of my major complaints about Elixir is that it is literally Erlang with some syntax changes, rewritten libraries, and extensions like macros - but if you don’t grok Erlang thoroughly, you will get frustrated when you hit the bottom layer - like exceptions and error messages - and it’s full of Erlang data structures. I had the same reaction when working with non-Java languages targeting the JVM. They all feel a bit like bolt-on kits. At some point, you really need to understand the core.

In my opinion, of course.

Important to point out Elixir as well, which is touted as a "more modern" way to write code that runs on the Erlang VM (so you get the same concurrency features, etc.)

The _only_ downside I find of Elixir is that it looks too Ruby-ish to my taste. I'm still waiting for the day a C-like language that runs on BEAM becomes mainstream enough.

Here's my thought process while reading your comment, which illustrates why we can't have nice things:

> The _only_ downside I find of Elixir is that it looks too Ruby-ish to my taste.

Yeah, preach it!

> I'm still waiting for the day a C-like language that runs on BEAM becomes mainstream enough.


Haha yeah, I've become too used to C-like syntax (C, Go, Rust, etc).

I wouldn't mind a Lisp syntax too, but Ruby and ML are definitely a "no" from me. I just can't bring myself to like them. Their syntax feels too "loose", not sure if that makes sense.

Btw, I know there are some Lisp-like implementations out there, but same criteria applies: still not mainstream enough as of now.

No, I agree with you completely, I'm just remarking on how everyone will have different aesthetics and things they're used to, and, ultimately, that might not be a valid complaint. Like, sure, I don't like Ruby's syntax much, but if the ecosystem is good and the language is a joy to write, eh, I'm going to compromise because you can't please everyone's subjective preferences.

There is Lisp Flavored Erlang (LFE) from Robert Virding that you may enjoy[0]

[0] http://docs.lfe.io/current/

Could you clarify, what's wrong with the Ruby syntax, exactly? I don't think there's a single programming language I like in terms of the syntax, but I don't see anything really wrong with Ruby. Surely not to the point I wouldn't use otherwise nice language because of it. Is python a no-no as well?

It's not "wrong"; it just doesn't appeal me.

- `elsif`

- `end`

- `unless`

- `a unless b`

- `f 1` vs `f(1)`

- Implicit method calls (`f` vs `self.f`). I know both have different behavior, I just don't like this "implicit self".

... etc.

I find Python okay-ish though, as paradoxical and nonsensical as that might sound. But in my toolchain Python has been reduced to little more than a calculator, since it's been replaced by Go for most of my automation needs.

Btw, friendly reminder that this is my subjective opinion regarding my own tastes, not absolute truth on aesthetics.

EDIT: Formatting.

Speaking for Elixir, a few notes:

I rarely find myself using if/else statements and I definitely avoid 'unless'.

One of the 'Rubyisms' I really disliked was the optional parentheses for method calls. While these are still optional for good reason, in practice this is no longer the case. The formatter will add parentheses and IIRC the compiler will give you a stern lecture too.

Obviously taste is subjective, but I do agree it matters.

I'd say if the Ruby-like syntax is what keeps you from trying Elixir, remember that's it's only skin deep and at least for me, the bad stuff is not really there compared to Ruby.

Furthermore, the advantages of having a functional and 'lispy'/homoiconic language that doesn't look like my bathroom floor after clipping my toenails is worth any remaining 'niggles' like the begin/end thing, or optional [] in the last parameter of a function call, etc. (which just like the optional parentheses is there for a very good reason).

He thinks the syntax doesn’t “pop”, clearly.

I had the exact same reaction.

> The _only_ downside I find of Elixir is that it looks too Ruby-ish to my taste.

Granted. More ML-like would be nice.

> I'm still waiting for the day a C-like language that runs on BEAM becomes mainstream enough.


>I'm still waiting for the day a C-like language that runs on BEAM

I used to have a similar desire!

I actually started working on a language that transpiled to Haskell, like CoffeeScript -> JS. What I found, however, is a lot of little ways to make it more and more terse, until I ended up with something resembling Haskell’s syntax.

After this happened several times, I had some new found appreciation for Haskell’s syntax. It’s god awful for someone coming from a C-like background, but that seems like a very temporary problem.

I went from hating Haskell’s syntax so much I wanted a language wrapping it, to realizing it’s actually rather well thought out and other languages have a lot of noise to them. Lots of curly braces and semicolons that seem to mostly be there to make writing a compiler easier.

I still love C-like syntax, and still don’t like Haskell’s syntax, but not enough to write a CoffeeScript-like language.

Funny, I love Ruby and it's my primary language of choice, but I'm a heretic, using braces whenever I can:

    # events per day
    ndbh.fetch('select ts from tbl').all.map { |row| 
    }.map { |t|
    }.each { |day|
        days[day] += 1
(yes this could be conciser)

This is generally frowned on, and I'm sure everyone who likes having this style

     .map {}
     .select {}
     .blah {}
will frown as well, but it's so much more visually appealing to me, coming from a C/perl background when I first encountered Ruby (many years ago).

Funny how important syntax is. So easy to look at code in another style/language and go THATS NOT RIGHT!!

> Funny how important syntax is. So easy to look at code in another style/language and go THATS NOT RIGHT!!

I'm waaay too picky in this aspect. I once was reading about Ada (before my bias towards C-like languages kicked in). It looked interesting and all, and I was like "Hmm, not bad, learning this might be interesti--"

> Put_Line


The irony here is, my current language of choice is Go, and tests must be named like `TestType_Method` and I'm like "arrrgggghhh".

EDIT: Btw I don't really mind indentation styles, brace placement, "dot placement" (like your example) and those things, as long as they're consistent across the codebase.

> "more modern"

Is it really touted as that? It's just a more Algol-looking way to write for Erlang semantics.

The other reply mentioned tools and documentation, which are the big differentiators in my opinion, but there are also some language level tools in Elixir which feel more "modern" I guess, the macro system which is on part with Lisp, protocols, a consistent standard library which extends Erlang rather than replacing it (although it does duplicate some things, both because binaries are the default string type in Elixir, and because OTP has a bit of an annoying problem with switching the position of the subject), and some conveniences like pipelines, the `with` construction - basically it produces something which semantically is like Erlang at its core, but how you go about solving problems in it offers a lot more options, which oftentimes result in a lot less code written for the same effect.

It's more than that; there's a lot of tooling and documentation around Elixir that either doesn't exist in Erlang or appears to exist only via tribal knowledge ("Well, of course everyone knows you need to {X} to {Y}, why waste time writing it down?"). My experience trying to write a gen_server that could be upgraded on-line in Elixir was vastly simpler than trying to figure it out in Erlang.

That said, I think Elixir has helped inspire the Erlang community to get better at documentation and tooling, so that's a definite win for both.

Would any sort of actor model language have the raw throughput for a AAA game? I get how this model is nice for the web, for example, but I'm wondering why it doesn't get used for high performance computing (that I know of).

There are languages, like Pony [1], that use actor model for the sort of high performance you are talking about. Also check out Anna paper, there is a description and argumentation on how and why they use actor model in C++ for high throughput.

I would also say that performance wise actor model is usually better, than low level shared memory multithreading, because it enforces locality-friendly contention-free architecture and fundamentally maps better to modern hardware.

[1] https://www.ponylang.org/

[2] http://db.cs.berkeley.edu/jmh/papers/anna_ieee18.pdf

Thanks for addressing the latter point. I forgot to draw that distinction. That's a curious point, however, and I'll have to check that out more for myself as you suggest.

See seastar framework for c++

For their AAA games cloud infrastructure it does! In fact, it is a backbone for some of Microsoft's Halo AAA game. Microsoft has this framework called Orleans which uses the Actor pattern. They have used it in various other project's too.

Who is using Orleans: https://dotnet.github.io/orleans/Community/Who-Is-Using-Orle...

Video presentation on it: https://www.youtube.com/watch?v=7OVU9Mqqzgs

Now if you are wondering about raw throughput for graphics and physics stuff in AAA games, that I don't know. I believe that to be a completely different beast, with different requirements, which may or may not benefit from this paradigm.

The Erlang VM (BEAM) is unusual in that it, like the languages it hosts, is very opinionated.

It is designed for robustness, scalability, concurrency, distributed environments. And immutable data. So far as I know, you literally cannot implement a language with mutability on the VM.

So, raw performance will never be its thing.

In general, I think the actor model could achieve high performance, but perhaps only if messaging is syntactic instead of truly distributed with mailboxes, network transparency, etc.

I'm a beginner, so by all means correct me if I'm wrong, but as I understand BEAM is relatively good when it comes to 'soft realtime', and especially when 'latency' is a concern (in part due to per-process GC?).

Am I correct in thinking that this would be pretty okay for most games, AAA or not? Would an FPS be possible (quick updates, small messages)? Or a World Of Warcraft or Sea of Thieves style game with many actors that need sort-of-realtime performance but don't rely on it entirely?

I've been looking into creating a game, and I'm also learning Elixir, so I'm curious what would be realistic when combining both.

In performance terms, Erlang is broadly speaking a scripting language, in the 10-20x slower than C range. It would be unusable for a AAA game because even using Erlang without any concurrency, it's too slow. It will get even slower if you do what you might be inclined to do for a game and make a separate process for every entity in the game and communicate entirely by message passing. It will be a beautifully clean architecture, but while Erlang may have cheap concurrency, it does not have free concurrency, and if you work realistic math on the sheer number of messages you'd have flying around the system it should become clear that it will not be practical to have literally manifested "messages" in that quantity being continuously created and destroyed.

If we tune our sights down from "AAA game", there are two possibilities. One is you can create a less computationally-intensive game that can run Erlang on the desktop. I suspect you'll find you're a bit short of libraries for that use case, but with motivation you can pound through that. I'm not sure if this has ever been done. The other thing you can use Erlang for is being the backend server of a game system, and that is eminently practical, in the sense that it has been done: http://erlang.2086793.n4.nabble.com/Erlang-Who-uses-it-for-g... You'd encounter some bumps if you tried to scale it up, but that's not a very strong criticism since it's constant regardless of what tech you'd end up using.

Thanks for the info. I was definitely not thinking of using Erlang/Elixir for the actual game, but rather as a back-end. Glad to hear that wouldn't be a bad choice.

I'm afraid I don't know enough about game development to be of assistance.

I assume in general they're very graphics intensive, which is (based on a 20+-year-old education I never finished) very matrix mathy, the type of calculations you absolutely would not want to shove through the Erlang VM.

However, one architecture people have used to varying degrees of success is using Erlang as a control layer (messaging, resilience) and C/something else compute-heavy for data manipulation. Erlang has the ability to drive external binaries via NIFs or ports.

Historically that's been a bit risky because the Erlang scheduler requires insight into its processes to do its job properly, but there have been improvements in recent releases.

So...maybe? Probably a question better suited to the Erlang users mailing list.

Ask yourself this question - "Am I doing lots of math and computationally intensive stuff"?

If so, don't use Erlang.

The corrolary is "Am I mostly passing data around, doing I/O, etc?"

If yes, Erlang is probably a reasonable choice (and a GREAT one if you're doing a lot of concurrent stuff).

So games, backend server -can- work; it just depends what it's doing. You may need to mix and match for functionality if you're doing a mix of things, and then you have to ask if it's worth the effort and translation cost if you have to jump between languages.

>Am I correct in thinking that this would be pretty okay for most games, AAA or not?

No. Soft real time just means that the GC will not pause the process and and it will not cause stuttering during garbage collection pauses. In languages with GCs this is usually achieved by not allocating too many objects in your game loop.

If you write the 3D engine in erlang your game won't stutter but it will have an incredibly low framerate.

The secret ingredient to high performance is primarily data locality and obviously a compiler/runtime that can actually translate the program into efficient code. You can make your compiler as good as you want, without data locality and control over the memory layout the performance simply won't be good enough.

What do I mean by data locality? Primarily these things.

Avoid indirection through pointers.

Java's ArrayList is a nice example. You cannot store raw "int"s in an ArrayList. Only "Integer" objects. This means if you want to calculate the sum of the ArrayList the CPU will have to read the data from main memory if it is not in the CPU cache. This is roughly two orders of magnitude more expensive than an a single addition for example. If data can not be found in the cache it is called a "cache miss". Of course in python, javascript, erlang almost everything is a pointer that points.

Continguous storage of data.

Java still have primitive arrays. You can declare an int[] array to avoid the above problem. This means the integers are stored directly inside the array. Well what if the array is not in the cache? Wouldn't that incurr the same problem as a above?

A CPU is actually quite smart. It has a prefetching unit that can detect if you're loading data in a regular pattern and load the next piece of data according to that pattern.

If you're the CPU and see this pattern. What would you do to avoid a cache miss?

arr[0] arr[1] arr[2] arr[3] arr[x]

Of course you would start loading arr[4], arr[5], arr[6], arr[7] ahead of time!

Efficient storage

ArrayList stores a continguous list of pointers. On a 64bit architecture this means every pointer costs you 8 byte of memory. On top of that every object in Java has a "header". I don't know the exact number but let's say it's size is 8 bytes. Then there are alignment restrictions. Let's say each object is aligned by 8 bytes. 8+8+4 is 20. The nearest multiple of 8 is 24. We're storing a 4 byte number in 24 byte worth of memory. In other words if we are memory bandwidth constrained we can increase our performance by another 6x just by reducing the amount of "useless" data we have to read.


There are efficient immutable datastructures but these usually involve indirection through pointers. Avoiding allocation of new memory avoids GC pauses or time spent in malloc/free. Mutating only a small part of a datastructure is more efficient than creating a copy.

Stack allocation

The primary benefit is that the top of the stack is usually in the L1 cache. Managing a stack is more efficient and a GC or manual memory allocation. It's just a pointer that gets incremented or decremented.

Erlang doesn't give you control over either of these. Ponylang has actors with isolated GC heaps and gives you control over memory layout and data locality.

I couldn't have hoped for a better answer. Thanks!

> So far as I know, you literally cannot implement a language with mutability on the VM.

You could do so, several different ways, e.g.:

(1) use the process dictionary,

(2) store data transparently to the new language’s user in ets (or, similarly, dets/mnesia),

(3) use separate (again, hidden from the language user) Erlang processes for mutable cells.

You can't get hig-performance mutability, but you can definitely implement a language with mutability on BEAM.

>So far as I know, you literally cannot implement a language with mutability on the VM.

Elixir allows reassignment, of course, in the end, it uses different "Erlang" variables, but the language abstracts it.

Changing an alias isn't the kind of mutability that would allow high performance, though.

That is not mutability.

It is though. A mutable state layer written on an immutable backend still has all the pitfalls of a native mutable data structure.

Take the context into consideration: this whole thread is about performance and the comment on mutability was speaking of mutability in place, because that matters for performance.

> It is though.

No. That can trivially be inferred from SSA being a thing, mutable bindings can trivially and automatically be converted to immutable ones.

As far as I know, Ada code using its types of actors (tasks and protected objects) can be made very fast, and more importantly for AAA games, very predictable. The default scheduling methods are good enough for many cases, but with the right restrictions, scheduling can even be statically determined and, in principle, a cyclic executive could be generated by the compiler with the same semantics as the original actor code. Unsure if this is actually done in practice, though.

NaughtyDog used a job system with fibers to parallelize their engine [1]. I suppose that's not exactly the same as an actor system since the fibers don't necessarily own all their state nor do they necessarily communicate using messages.

[1]: https://www.gdcvault.com/play/1022186/Parallelizing-the-Naug...

Even in something like Go which has this kind of concurrency in mind, performance critical code is often written with the more traditional "threads & locks" approach with the goal of using the ideal number of goroutines to maximize hardware use but no more than that.

Not saying that other concurrency models don't have the throughput for a AAA game, but when your goal is to get the most out of the hardware of one desktop/console you're going to have different priorities than a server environment.

Performance would be terrible, Erlang is way too slow to do anything in a game client.

Wooga uses Erlang on the server side.

rust has actors too https://github.com/actix/actix

Libraries aren't the same as native support as the latter case influences all uses of concurrency in a language while the former only affects things that use that library and may be incompatible with other concurrency libraries.

That seems kinda fallacious. I don't think inherently there is anything that would make an ecosystem around a library weaker than an ecosystem around a language. I.e. if you have a popular actor library then it is not unreasonable to think that there is as rich if not richer ecosystem of things built on top of that library than a niche language with native actors.

We dabble in Akka - a good option for Java / Scala shops

There are ports of Akka and Akka-like libraries for a number of languages. Actor frameworks are relatively easy to find for most languages. Another variation of a sort of an Actor framework that gets used for relatively large projects is the Orleans framework for .NET: http://dotnet.github.io/orleans/

I'd argue the most interesting problems are those that end up having some sort of shared state requirement. Clojure handles this nicely with language primitives like atoms and refs, which helps avoid the pitfalls of the locking paradigm found in Go and Java, for example. What does D do in this case, then?

There are the basic primitives found in c/any language, but there are also synchronized{} (or synchronized (my_mutex) {}) blocks that essentially automagically handle locking/unlocking and let you just mutate around willy-nilly across threads (in that block). There are also atomics, although the syntax for using them is a bit clumsy (foo.atomicOp!"+="(7), anyone?).

How does it compare to Go channels? I only know actors from Scala, Go felt similar but more er, simple?

Simple to fault, in that channels and Goroutines don’t prevent you from making mistakes the way actors do - ie you can still share state and memory. Go allows for the possibility of writing cleanly concurrent code, but you must study and practice.

Using an actor system, though, involves a bit more study with regards to setting things up, but it’s almost impossible to make mistakes - actors simply can’t access each other’s state (maybe some systems allow it, but you’ll have to struggle against the language to make that mistake).

Erlang does a great job of protecting data from concurrency problems, but you're still vulnerable to mistakes like deadlock.

> Erlang does a great job of protecting data from concurrency problems, but you're still vulnerable to mistakes like deadlock.

without any shared data, and an asynchronous-message-passing style concurrency (i.e sharing memory by communicating, as opposed to communicating by sharing memory) can you please elucidate how we might get into deadlocks ? thanks !

OTP offers two basic messaging options: async or synchronous. The latter is implemented by forcing the caller to wait for the response.

If process A sends a synchronous message to B, and B while processing it sends a different synchronous message either directly to A or to another process that eventually calls back to A, you end up in a deadlocked state.

> OTP offers two basic messaging options: async or synchronous.

with synchronous messaging it is quite easy to see how deadlocks can happen, without trying too hard. for async messaging, which is what i was referring to, it is quite hard (and or convoluted) to get into that state...

receive is a blocking action in erlang so something like:

  loop(State) ->
      msg -> loop(State)
Will block waiting for `msg` to show up. So if you have two processes that are interacting with eachother and aren't careful in constructing their communication you could end up with something (contrived):

  loop(Pid) ->
      msg -> Pid ! msg
If you have two processes A and B that end up in this same loop but referencing each other, they'd deadlock. Substituting A and B for Pid you'd end up with something like:

  loop(B) ->    % Process A
      msg -> B ! msg
  loop(A) ->    % Process B
      msg -> A ! msg
Both waiting for `msg` but never sending it to their partner process.

> Both waiting for `msg` but never sending it to their partner process.

but then you can 'deadlock' for a single pid as well right ? where you wait for a message which no one is sending...

A single process could wait forever if no message ever arrives, yes. But it's not a deadlock since deadlock requires at least two processes that are waiting on each other in some fashion.

And responding to your lower down comment: Yes, if a third party could send `msg` to either of these it'd break the deadlock. In my example (and the cases I've caused this myself) there was no other process around to do that. But this is where, with experience, you learn to design things better and also use `after` clauses in the receive expression. This will cause them to time out and do something. Which may be to terminate and let a supervisor restart the whole thing or some other behavior.

That wouldn't count as a deadlock. A deadlock implies that it's possible to make a logical determination that the system is permanently stuck, and that state cannot possibly change without external intervention.

i.e. a deadlock can only be determined when you look at the system as a whole from the outside and determine that it's permanently stuck.

> That wouldn't count as a deadlock. A deadlock implies that it's possible to make a logical determination that the system is permanently stuck, and that state cannot possibly change without external intervention. i.e. a deadlock can only be determined when you look at the system as a whole from the outside and determine that it's permanently stuck.

yes, i know.

as i have pointed out elsewhere, with synchronous invocations, it is easy to get into a loop (or deadlock) i.e. pid-a -> pid-b -> pid-c -> pid-a. for the async case, which is what was elucidated by gp, it seems to me that the example is 'deadlocking' only because no one is sending messages expected by the other party...

Also livelocks and race conditions. They're just harder, usually requiring some bad design decisions.

you can voluntarily write Go which doesn't share any state, although clearly if thats an invariant you'd like to keep it would be really nice for the language to enforce it for you.

more importantly I've found that as you grow such Go programs, write higher-order actors, and deal with all the error/cleanup cases, the selects start to get really brittle and you end up spending alot of time refactoring them and carefully going over all the edge cases.

I don't know about D, but Go channels and Erlang processes are sort of complements or inverses of each other in some sense.

A Channel in Go is a first-class communications bus that can be passed around as a value, and senders and receivers are implicit/not first class. Arbitrary numbers of readers and writers can use one channel. Channels are also typed; only specific messages can pass across a given channel, though it can be specified by interface.

In Erlang, you have to send a message to a specific process. Thus, the receivers (and symmetrically, the senders) are first-class objects that can be passed around, but the bus is implicit in the language. Processes may also receive any message, and should be able to deal with them. (One failure case that can occur in Erlang is a memory leak because some process is getting messages that it never receives, so they just build up in the mailbox. In practice this only happened to me maybe twice over the five years I was using Erlang, so it's not a stopper, just "something to be aware of", especially while debugging leaks.)

A positive for the Go model is that it is really easy to set up multiple readers for one writer, a common pattern, which Erlang handles somewhat gracelessly. (Yes, I am aware of the "pool" abstractions, all of which last I knew were one variation or another on "send a message to the pool coordinator to find out which process to send a message to", creating a single-process bottleneck on the pool.) There are some other nice ways to set up channel networks in Go to do some things Erlang would only be able to do with a lot more indirection and performance penalty on top of the fact that Erlang is already substantially (albeit not necessarily fatally) slower than Go. A negative for the Go model is that the way they've specified channels means that they rigidly must run in the same OS process; there is not and can not be a "network channel" in Go with the same semantics as a Go channel, because a Go channel is an "exactly once" abstraction, which is impossible to run over a network [1]. Also, on the off chance you want to "guarantee" that a given recipient will process a message, it's on you to guarantee that the channel does not "get around" to goroutines you didn't expect.

A positive for the Erlang model is that you get that sweet, sweet network transparency that makes writing Erlang-based clustered servers sweeter than any other language I know, because they defined the characteristics of their bus from the very earliest days of the language for that use case, in contrast to Go which wrote their fundamental abstraction in a way that network transparency is impossible. (Bear in mind that systems ought to be designed for that early, it is not automatic, but it is still a staggering advantage for the language.) The downside is that when you want to do anything other than have one process send a message to a specified other process, you're going to have some sort of indirection or bad API or bottleneck process or something like that. Depending on the nature of your server this price may range from utterly irrelevant to quite expensive, although I'd expect it to be your "biggest problem" quite rarely.

(I'm also only comparing the channels vs. PID-based message passing. There are other relevant issues like the shared memory in Go vs. enforced isolation in Erlang, etc.

[1]: And Go channels are "truly" exactly-once, too, so even Kafka's somewhat dodgy twisting of the term "exactly once" wouldn't be sufficient to implement them. Channels are used for memory synchronization, so it must be guaranteed that a non-buffered channel has had its message arrive on the other end because the fact the program counter of the receiver has advanced to that point in their code is something the language critically depends on, and a mere promise that it'll get there eventually, maybe twice, someday breaks that completely.

It's going to be the difference between communicating sequential processes, go tries to do this, and the actor model.



Exactly! What might not be immediately obvious is that CSP works OK for concurrency, but not for distributed systems. For more details, see 2.1.3 (page 28) of https://eprints.illc.uva.nl/943/1/MoL-2015-02.text.pdf

The Go model is also faster but less safe.

golang is a great middle ground; it defaults to safe (but might be slow if you do something the wrong way), it can be fast if you avoid costly mistakes (the syntax tends to help you remember what you're actually working with as a data structure so costs are mostly upfront), and if you /need/ to get fancy and know exactly what you're doing you can use the unsafe library to work with pointers (or interface with other libraries; if you're not sure if they're safe to execute concurrently (most state based systems aren't) there's runtime.LockOSThread() to pin that thread's actions).

It's conceptually much simpler. Erlang(/Elixir) does not have separate concepts of routines and channels, you just send messages directly to a process (and it does whatever it wants with that), and not only does it embrace "don't communicate by sharing memory, share memory by communicating" it enforces it: Erlang's data types are mostly immutable and each process has its own private heap.

The language does look and feel somewhat odd though, it was inspired by prolog (so will look very odd if you've not used prolog) but doesn't do full unification (so will feel very odd if you've used prolog).

That’s because Go is simpler :)

I'm a big fan of Erlang—and believe it's a useful language to know even if the daily language is something else—but my understanding of D is that it excels in a different problem space (build efficient native code, etc.)

I don't see what's so good with that. Don't we have multiprocessing and threading pools that does that stuff automatically ?

Obligatory, but Haskell, comment.

Nice post, looks like a decent API. I wonder how similar each implementation of this model in Language X is and how easily it would be to test their implementation meet the spec.

What about Haskell? Haskell has great support for writing code in the Actor style if you wish. But you can also have traditional locks and simple mutable pointers in Haskell too.

It’s great for writing concurrent code that is both easy to understand and be confident in its correctness.

I just didn’t want to derail the conversation away from D and the authors’ discovery.

Try actors.

I'm not a professional programmer, but I had no problems with Java's multithreading libraries. Lock handling, synchronization etc are a pain in the ass to get used to but not so much of a big deal, and I don't see how D is any different than that.

With the actor model you don't share state between threads, hence no need to lock anything.

Each variable can only be updated by a single and unique thread. If another thread (or actor) wants to modify it, it has to send a modification request messages to that one thread to do the job.

That's quite a difference!

Edit: Java threading model is fine for limited amounts of concurrency. Complexity escalates quickly with highly concurrent applications; the actor model makes reasoning about those programs far easier.

So it's like a lock, except you call it "modification request". Great, got it.

The difference is that it doesn't halt your execution unless you want it to. With shared memory and locks you have to wait when you set a value if anyone else is interacting with that object or that part of the object (if you have fine-grained locks). So you wait both for getting and setting.

If your system behavior does not immediately depend on the value of what you're changing, then actors are fantastic. You don't wait, it's async. You send the data to the controlling actor (which is an object in the OO sense itself). Your process continues on as if nothing changed until you need the updated results.

  Process A
    B ! {update, x, newVal},
    %% A bunch of work
    B ! {lookup, y, A}, % where y perhaps depended on x
    Y = receive % Here we finally wait, if B is processing a lot of requests
      {B,y,Val} -> Val
    %% more work

  Process B
  loop(State) ->
      {update, Var, Val} -> loop(update(State,Var,Val));
      {lookup, Var, Pid} -> Pid ! lookup(State, Var), loop(State)
Where B may be much more complex than that, in this case it looks like a simple KV store.

The receive at the end of Process A is blocking and is the only time we end up with a lock in Process A. So if A never needs to do anything but send values to B, then A can run undisturbed by everyone else's data requests. This lends itself well to certain pipelined and dataflow architectures.

Similar in the sense that they protect data from being accessed by multiple threads at once. But the implementation and semantics are quite different.

The main practical drawback, I would say, is that locks may suffer from the mutual exclusion problem; where you have contention, starvation and deadlocking. Also your ability to parallelize operations over shared state quickly vanishes as the number of threads grow...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact