> However, in most cases threads are no more complex than callbacks, actors, etc. In fact, from what I've seen, concurrent code eventually all converges to some semblance of the actor model anyways.
Threads are the WorseIsBetter approach to concurrency; they're incredibly simple to implement, but that just means that the difficulties are pushed on to the users (i.e. developers using the framework/library).
Threads may be a good idea for code which has no 'design flaws' and is not 'too complex', but as we all know, everything has bugs and everything is more complex than it seems. The arguments in favour of higher-level concurrency models are basically the same as for tests and version control: if you don't use them, you're making a dangerous gamble which may exact a large price down the road.
Concurrency models like callbacks and actors can make dangerous things more difficult; if we use the callback examples from the article:
> It’s hard for a programmer to reason about which line of code executes on which thread, and passing information from one callback to another is cumbersome as well.
Of course, this is the point of callbacks. The callback model tells us to reason using function arguments and function calls, so of course we can't map lines of code to threads, since neither lines of code nor threads have any place in a callback model. Likewise for passing data between callbacks; the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants. When using callbacks, everything is local by default and transferring data between threads requires explicit channels, e.g. free variables.
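To make that concrete, here's a minimal sketch (in Python, my own choice since the thread isn't tied to any language; the functions are invented) of what "local by default, shared only through explicit channels" looks like in callback style:

```python
# Minimal sketch: in callback style, data reaches the next step only
# through function arguments or the free variables a closure explicitly
# captures -- nothing is shared implicitly.

def fetch_user(user_id, on_done):
    # Pretend this is an async lookup; the result travels forward
    # only as an argument to the callback.
    user = {"id": user_id, "name": "alice"}
    on_done(user)

def greet_when_ready(greeting):
    # `greeting` is a free variable captured by the inner callback:
    # the only way data crosses between the two steps.
    def on_user(user):
        print(f"{greeting}, {user['name']}!")
    return on_user

fetch_user(42, greet_when_ready("Hello"))
```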
In the actor model the safety comes from messages having no ordering or latency guarantees, so we can't assume that our data is always up to date.
With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm. This is how it should be: the pain happens up front, while nothing has actually gone wrong yet.
With low-level concurrency models, the machine gladly accepts our dangerously broken code, and the number of interleavings is so huge that our tests never hit an error case (or, more likely, some of the bugs are so obscure that it never occurred to us to test for them). Six months later the application explodes, and as we sift through the pieces we find the true extent of the problem, and discover that subtly corrupt output has permeated every aspect of the business and we can't trust anything that's been done since that code went live.
> the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants.
Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited, well-documented cases where you break that golden rule, go for it and reap the performance benefits.
There are some things that some threading models can be criticized for. For example, POSIX threads cannot be killed if they get stuck. However, threads are a powerful tool. The idea that you can share the in-memory code between all your threads is great. Additionally, you can share state, and you control how and when it is shared. Want complete isolation? Communicate via queues! Want some shared state for performance reasons? Go for it! Want complete and utter chaos that will blow up as soon as you look at it funny? Let threads access each other's data at will.
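As a rough sketch of the "communicate via queues" option, assuming Python's standard threading and queue modules (the names are illustrative): the only objects the workers share are the two queues.

```python
# Rough sketch of "complete isolation: communicate via queues".
# Each worker touches only its own locals; the only shared objects
# are the two queues.
import queue
import threading

jobs, results = queue.Queue(), queue.Queue()

def worker():
    while True:
        item = jobs.get()
        if item is None:          # sentinel: time to shut down
            break
        results.put(item * item)  # local computation, result handed back via queue

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)                # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results.get() for _ in range(10)))
```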
Your argument is similar to saying that table saws are terrible because one cannot guarantee that they will never cut off your fingers.
Edit: one other problem with callbacks. AFAIK, no implementation of callback-based concurrency is able to take advantage of multiple hardware cores for true parallelism. In the meantime OS schedulers already take care of distributing OS threads between CPU cores, and some green thread implementations do this as well.
So, you're not wrong ... but the table-saw argument is a straw-man.
There's a company that sells a revolutionary table saw with intelligent saw stop precisely because experienced, skilled practitioners regularly cut off their fingers.
In general "be smarter / do better" is not a reasonable prescription for large numbers of people. Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.
I don't see it as a straw-man, because I see threads as a tool. The existence of the actor model does not detract from the value that OS threads provide, the same way that the existence of Common Lisp does not detract from the value that C provides. They are both tools. It's just that some tools are more dangerous than others. In other words, I don't believe that threads are a "worse is better" approach. There are things that can be improved about specific implementations of threading, but on the whole, the paradigm is far from broken.
> Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.
The problem is that other implementations of concurrency are not as widely adopted and people tend to fall back on threads (especially OS threads) when they really don't need them. But when you really do need threads, very few things are a good substitute.
P.S.: I am aware of the table saw you refer to, and this is the kind of improvement that tooling around threads could use. Note that this new table saw does not completely re-design how you interact with the blade in order to provide the safety.
>Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited, well-documented cases where you break that golden rule, go for it and reap the performance benefits.
How do you know which code is touching which data, particularly if you're using libraries? Heck, we can't even reliably keep track of which data another piece of data belongs to - even with code written and audited by experts, memory leaks get found all the time. Just as memory management is too hard to do in complex programs without language support, isolating data to the appropriate threads is too hard to do in complex programs without language support.
Bullshit. You know that you are not violating your one golden rule by only having the one golden rule. Break fingers of any developers that violate it. Testing is important but there is a certain level at which mistrust of your code becomes paranoia. How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?
As for libraries, don't use ones where you have not seen the source or good docs that make the guarantees that satisfy you. Thread safety is one of many reasons for this.
As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.
The only thing I agree with you on is that language support often makes things easier. However, using "unsafe" languages does not make large projects impossible.
> How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?
I use a language in which functions that perform disk I/O look different (and are typed differently, so this is not just convention but compiler-enforced) from functions that don't, functions that mutate state look different from functions that don't, and macros don't exist.
Yes, you can forcibly cast around these things. But you have to do so explicitly. Whereas in most threaded languages, access to a variable that's owned by another thread looks exactly like access to a variable that's owned by the current thread.
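To illustrate that last point, a tiny invented Python example: nothing in the syntax distinguishes the safe access from the dangerous one.

```python
# In most threaded languages, access to data that is "owned" by another
# thread is syntactically identical to access to your own data, so
# nothing warns you that the second update below is the dangerous one.
import threading

my_counter = {"total": 0}       # meant to be private to the main thread
their_counter = {"total": 0}    # meant to be private to the worker

def worker():
    for _ in range(1000):
        their_counter["total"] += 1   # the worker's "own" data
        my_counter["total"] += 1      # oops: exactly the same syntax

t = threading.Thread(target=worker)
t.start()
t.join()
print(my_counter, their_counter)
```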
> As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.
I do. I watch the growing list of security advisories for each of them with a mixture of amusement and frustration.
> Threads are the WorseIsBetter approach to concurrency;
Threads/Actors are the obvious way to do concurrency. Like a sibling commenter, I think your comment confuses the comparison. The semantic difference is between threads _and_ actors vs callbacks.
An actor is a sequential context, ideally isolated, but it can run concurrently with other actors. Think of a group of entities in a game. Each one executes some simplified sequence of operations: do x, do y, do z, then go back to x. But there are multiple such entities running in parallel. Another example is handling web requests. A GET request is dispatched and a new actor is spawned. It reads the request body, processes it, maybe reads some data from a database, and returns the response -- very sequential. But there are multiple such requests running concurrently.
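Here's a hedged sketch of that web-request shape, using Python threads to stand in for actors (the handler and the fake database are invented for illustration): each request gets its own sequential context, and several such contexts run concurrently.

```python
# Each request is handled by its own sequential context (a thread
# standing in for an actor); many such contexts run concurrently.
import threading

FAKE_DB = {1: "alice", 2: "bob"}

def handle_get(request_id, user_id):
    # Entirely sequential from this context's point of view:
    body = f"GET /users/{user_id}"           # read the request
    name = FAKE_DB.get(user_id, "unknown")   # read some data from a "database"
    print(f"[req {request_id}] {body} -> {name}")  # return the response

# Several sequential contexts running concurrently:
handlers = [threading.Thread(target=handle_get, args=(i, (i % 2) + 1))
            for i in range(4)]
for h in handlers:
    h.start()
for h in handlers:
    h.join()
```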
Callbacks also form sequences of calls, but there is no explicit concurrency context. If the sequence is simple it works OK, but if it is not, it is very easy to get tangled. You are processing one sequence, then another piece of input comes in and a parallel sequence of callbacks starts; unless the data is immutable and your functions are pure, at some point it becomes a tangled mess.
> With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm.
That is why you'd want to run isolated concurrency contexts (actors). You can do this by making copies of data and storing them locally, talking to threads via queues only, or spawning OS processes. That is how you decompose a highly concurrent system. Using callbacks is not going to fix the problem; it is only going to make it worse.
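For example, a rough sketch of the "spawning OS processes" option, assuming Python's multiprocessing module: each worker gets its own copy of the data, so there is nothing to tangle.

```python
# Spawn OS processes so each concurrency context works on its own copy
# of the data; the only interaction is through the pool's plumbing.
from multiprocessing import Pool

def simulate(entity):
    # `entity` is a copy in this worker's address space; mutating it
    # cannot interfere with any other worker's state.
    entity["x"] += entity["vx"]
    return entity

if __name__ == "__main__":
    entities = [{"id": i, "x": 0, "vx": i} for i in range(8)]
    with Pool(processes=4) as pool:
        updated = pool.map(simulate, entities)
    print(updated)
```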
> Threads/Actors are the obvious way to do concurrency.
Sure. Actors, or other forms of CSP. But I think that a necessary component is some form of shared data structure that works alongside, rather than interferes with, your threading model.
Erlang has ETS tables, which are a little limited – not saying that there aren't better concurrent, shared data structures in Erlang, just that even a language that works purely with the actor model admits that such a data structure is necessary.
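For what it's worth, here's a rough analogue of what such a table buys you, sketched in Python rather than Erlang (the SharedTable class is invented, and ETS itself does considerably more): workers stay sequential and isolated, but one synchronized table sits alongside them.

```python
# Rough analogue of an ETS-style shared table: workers remain isolated
# sequential contexts, but a single lock-protected table sits alongside
# them for lookups/updates that don't fit the message-passing model.
import threading

class SharedTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

table = SharedTable()

def worker(worker_id):
    # Each worker's own state stays local; only the table is shared.
    table.put(worker_id, worker_id * 10)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([table.get(i) for i in range(4)])
```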
Necessary in all cases or necessary in some cases? My take on this is that you can successfully pass state around if it's small enough and only one actor cares about it at a time. Once it gets big enough you probably want to use an external service to store and synchronize it (a database), and then it matters less how your program is structured.
I suppose the exception to this might be gaming and simulations where what's more important is speed as opposed to durability of your data, yet you have lots of state to keep track of.
If it were that simple, people wouldn't be spending so much time configuring caches or using Redis. I think most non-trivial applications require some central, shared data store. More often than not, this data store becomes a bottleneck that limits scaling. Databases compete with one another over which interferes with scaling the least.
If you accept the premise in the opening quote about Amdahl’s law, then you must consider that any global or semi-global lock has a huge impact on scalability. Sometimes we have no choice, but I believe that we can and should remove many single points of synchronization while still keeping the programming model relatively simple. I also believe that rather than hindering scalability, a database can help achieve it.
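To put rough numbers on the Amdahl’s-law point (the figures are invented, purely back-of-the-envelope): even a small serialized fraction, say a single global lock, caps the achievable speedup no matter how many cores you add.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallelizable fraction of the work and n is the number of workers.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.90, 0.95, 0.99):
    print(f"p={p:.2f}: " + ", ".join(
        f"{n} cores -> {amdahl_speedup(p, n):.1f}x" for n in (4, 16, 64)))
```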
That is definitely true. Databases are necessary. In fact in-memory data stores that can handle large volumes of data are not all that useful since they usually lack things like backups, etc. Not everyone is writing a RabbitMQ-like system. And of course locking plays a central role in all of this.
What I am saying is that once you accept that synchronization is going to be handled by your database of choice, it becomes somewhat less important how you structure your application in terms of performance. There are still reasons not to use callbacks, but whether you go with threads, actors, processes, etc. becomes a choice about how you want to utilize memory and, to an extent, which technology your runtime supports best.