> However, in most cases threads are no more complex than callbacks, actors, etc. In fact, from what I've seen, concurrent code eventually all converges to some semblance of the actor model anyways.
Threads are the WorseIsBetter approach to concurrency; they're incredibly simple to implement, but that just means that the difficulties are pushed on to the users (i.e. developers using the framework/library).
Threads may be a good idea for code which has no 'design flaws' and is not 'too complex', but as we all know, everything has bugs and everything is more complex than it seems. The arguments in favour of higher-level concurrency models are basically the same as for tests and version control: if you don't use them, you're making a dangerous gamble which may exact a large price down the road.
Concurrency models like callbacks and actors can make dangerous things more difficult; if we use the callback examples from the article:
> It’s hard for a programmer to reason about which line of code executes on which thread, and passing information from one callback to another is cumbersome as well.
Of course, this is the point of callbacks. The callback model tells us to reason using function arguments and function calls, so of course we can't map lines of code to threads, since neither lines of code nor threads have any place in a callback model. Likewise for passing data between callbacks; the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants. When using callbacks, everything is local by default and transferring data between threads requires explicit channels, e.g. free variables.
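To make that concrete, here's a minimal sketch (in Python, my own choice since the thread isn't tied to any language; the functions are invented) of what "local by default, shared only through explicit channels" looks like in callback style:

```python
# Minimal sketch: in callback style, data reaches the next step only
# through function arguments or the free variables a closure explicitly
# captures -- nothing is shared implicitly.

def fetch_user(user_id, on_done):
    # Pretend this is an async lookup; the result travels forward
    # only as an argument to the callback.
    user = {"id": user_id, "name": "alice"}
    on_done(user)

def greet_when_ready(greeting):
    # `greeting` is a free variable captured by the inner callback:
    # the only way data crosses between the two steps.
    def on_user(user):
        print(f"{greeting}, {user['name']}!")
    return on_user

fetch_user(42, greet_when_ready("Hello"))
```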
In the actor model the safety comes from messages having no ordering or latency guarantees, so we can't assume that our data is always up to date.
With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm. This is how it should be: the pain happens up front, while nothing has actually gone wrong yet.
With low-level concurrency models, the machine gladly accepts our dangerously broken code, and the number of interleavings is so huge that our tests never hit an error case (or, more likely, some of the bugs are so obscure that it never occurred to us to test for them). Six months later the application explodes, and as we sift through the pieces we find the true extent of the problem, and discover that subtly corrupt output has permeated every aspect of the business and we can't trust anything that's been done since that code went live.
> the problem with threads is that everything is shared all of the time, which makes it incredibly difficult to enforce invariants.
Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited, well-documented cases where you break that golden rule, go for it and reap the performance benefits.
There are some things that some threading models can be criticized for. For example, POSIX threads cannot be killed if they get stuck. However, threads are a powerful tool. The idea that you can share the in-memory code between all your threads is great. Additionally, you can share state, and you control how and when it is shared. Want complete isolation? Communicate via queues! Want some shared state for performance reasons? Go for it! Want complete and utter chaos that will blow up as soon as you look at it funny? Let threads access each other's data at will.
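As a rough sketch of the "communicate via queues" option, assuming Python's standard threading and queue modules (the names are illustrative): the only objects the workers share are the two queues.

```python
# Rough sketch of "complete isolation: communicate via queues".
# Each worker touches only its own locals; the only shared objects
# are the two queues.
import queue
import threading

jobs, results = queue.Queue(), queue.Queue()

def worker():
    while True:
        item = jobs.get()
        if item is None:          # sentinel: time to shut down
            break
        results.put(item * item)  # local computation, result handed back via queue

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)                # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results.get() for _ in range(10)))
```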
Your argument is similar to saying that table saws are terrible because one cannot guarantee that they will never cut off your fingers.
Edit: one other problem with callbacks. AFAIK, no implementation of callback-based concurrency is able to take advantage of multiple hardware cores for true parallelism. In the meantime OS schedulers already take care of distributing OS threads between CPU cores, and some green thread implementations do this as well.
So, you're not wrong ... but the table-saw argument is a straw-man.
There's a company that sells a revolutionary table saw with intelligent saw stop precisely because experienced, skilled practitioners regularly cut off their fingers.
In general "be smarter / do better" is not a reasonable prescription for large numbers of people. Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.
I don't see it as a straw-man, because I see threads as a tool. The existence of the actor model does not detract from the value that OS threads provide, the same way that the existence of Common Lisp does not detract from the value that C provides. They are both tools. It's just that some tools are more dangerous than others. In other words, I don't believe that threads are a "worse is better" approach. There are things that can be improved about specific implementations of threading, but on the whole, the paradigm is far from broken.
> Empirically, if people are fucking up, it makes sense to analyze why and to give them automatic solutions to their fuck-ups.
The problem is that other implementations of concurrency are not as widely adopted and people tend to fall back on threads (especially OS threads) when they really don't need them. But when you really do need threads, very few things are a good substitute.
P.S.: I am aware of the table saw you refer to, and this is the kind of improvement that tooling around threads could use. Note that this new table saw does not completely re-design how you interact with the blade in order to provide the safety.
>Your entire comment comes down to this, and my point is that this is not a problem. Design your threaded code around a simple principle: one thread's code must never touch another thread's data. Now you have safe threaded code. If you want to add some limited, well-documented cases where you break that golden rule, go for it and reap the performance benefits.
How do you know which code is touching which data, particularly if you're using libraries? Heck, we can't even reliably keep track of which data another piece of data belongs to - even with code written and audited by experts, memory leaks get found all the time. Just as memory management is too hard to do in complex programs without language support, isolating data to the appropriate threads is too hard to do in complex programs without language support.
Bullshit. You know that you are not violating your one golden rule by only having the one golden rule. Break fingers of any developers that violate it. Testing is important but there is a certain level at which mistrust of your code becomes paranoia. How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?
As for libraries, don't use ones where you have not seen the source or good docs that make the guarantees that satisfy you. Thread safety is one of many reasons for this.
As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.
The only thing I agree with you on is that language support often makes things easier. However, using "unsafe" languages does not make large projects impossible.
> How do you know that your code is not littering the disk with debug files, declaring global variables, adding rogue macros, etc.?
I use a language in which functions that perform disk I/O look different (and are typed differently, so this is not just convention but compiler-enforced) from functions that don't, functions that mutate state look different from functions that don't, and macros don't exist.
Yes, you can forcibly cast around these things. But you have to do so explicitly. Whereas in most threaded languages, access to a variable that's owned by another thread looks exactly like access to a variable that's owned by the current thread.
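To illustrate that last point, a tiny invented Python example: nothing in the syntax distinguishes the safe access from the dangerous one.

```python
# In most threaded languages, access to data that is "owned" by another
# thread is syntactically identical to access to your own data, so
# nothing warns you that the second update below is the dangerous one.
import threading

my_counter = {"total": 0}       # meant to be private to the main thread
their_counter = {"total": 0}    # meant to be private to the worker

def worker():
    for _ in range(1000):
        their_counter["total"] += 1   # the worker's "own" data
        my_counter["total"] += 1      # oops: exactly the same syntax

t = threading.Thread(target=worker)
t.start()
t.join()
print(my_counter, their_counter)
```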
> As for memory management being too complex for large projects, see Linux kernel, BSD kernels, nginx, apache, and a million other large projects written in C.
I do. I watch the growing list of security advisories for each of them with a mixture of amusement and frustration.
> Threads are the WorseIsBetter approach to concurrency;
Threads/Actors are the obvious way to do concurrency. Like a sibling commenter, I think your comment confuses the comparison. The semantic difference is between threads _and_ actors vs callbacks.
An actor is a sequential context, ideally isolated, but it can run concurrently with other actors. Think of a group of entities in a game. Each one executes some simplified sequence of operations: do x, do y, do z, then go back to x. But there are multiple such entities running in parallel. Another example is handling web requests. A GET request is dispatched and a new actor is spawned. It reads the request body, processes it, maybe reads some data from a database, and returns the response -- very sequential. But there are multiple such requests running concurrently.
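Here's a hedged sketch of that web-request shape, using Python threads to stand in for actors (the handler and the fake database are invented for illustration): each request gets its own sequential context, and several such contexts run concurrently.

```python
# Each request is handled by its own sequential context (a thread
# standing in for an actor); many such contexts run concurrently.
import threading

FAKE_DB = {1: "alice", 2: "bob"}

def handle_get(request_id, user_id):
    # Entirely sequential from this context's point of view:
    body = f"GET /users/{user_id}"           # read the request
    name = FAKE_DB.get(user_id, "unknown")   # read some data from a "database"
    print(f"[req {request_id}] {body} -> {name}")  # return the response

# Several sequential contexts running concurrently:
handlers = [threading.Thread(target=handle_get, args=(i, (i % 2) + 1))
            for i in range(4)]
for h in handlers:
    h.start()
for h in handlers:
    h.join()
```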
Callbacks also form sequences of calls, but there is no explicit concurrency context. If the sequence is simple it works OK, but if it is not, it is very easy to get tangled. You are processing one sequence, then another piece of input comes in and a parallel sequence of callbacks starts; unless the data is immutable and your functions are pure, at some point it becomes a tangled mess.
> With higher-level concurrency models we end up screaming at our IDEs as we try to contort our code to fit the paradigm.
That is why you'd want to run isolated concurrency contexts (actors). You can do this by making copies of data and storing them locally, talking to threads via queues only, or spawning OS processes. That is how you decompose a highly concurrent system. Using callbacks is not going to fix the problem; it is only going to make it worse.
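For example, a rough sketch of the "spawning OS processes" option, assuming Python's multiprocessing module: each worker gets its own copy of the data, so there is nothing to tangle.

```python
# Spawn OS processes so each concurrency context works on its own copy
# of the data; the only interaction is through the pool's plumbing.
from multiprocessing import Pool

def simulate(entity):
    # `entity` is a copy in this worker's address space; mutating it
    # cannot interfere with any other worker's state.
    entity["x"] += entity["vx"]
    return entity

if __name__ == "__main__":
    entities = [{"id": i, "x": 0, "vx": i} for i in range(8)]
    with Pool(processes=4) as pool:
        updated = pool.map(simulate, entities)
    print(updated)
```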
> Threads/Actors are the obvious way to do concurrency.
Sure. Actors, or other forms of CSP. But I think that a necessary component is some form of shared data structure that works alongside, rather than interferes with, your threading model.
Erlang has ETS tables, which are a little limited – not saying that there aren't better concurrent, shared data structures in Erlang, just that even a language that works purely with the actor model admits that such a data structure is necessary.
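For what it's worth, here's a rough analogue of what such a table buys you, sketched in Python rather than Erlang (the SharedTable class is invented, and ETS itself does considerably more): workers stay sequential and isolated, but one synchronized table sits alongside them.

```python
# Rough analogue of an ETS-style shared table: workers remain isolated
# sequential contexts, but a single lock-protected table sits alongside
# them for lookups/updates that don't fit the message-passing model.
import threading

class SharedTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

table = SharedTable()

def worker(worker_id):
    # Each worker's own state stays local; only the table is shared.
    table.put(worker_id, worker_id * 10)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([table.get(i) for i in range(4)])
```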
Necessary in all cases or necessary in some cases? My take on this is that you can successfully pass state around if it's small enough and only one actor cares about it at a time. Once it gets big enough you probably want to use an external service to store and synchronize it (a database), and then it matters less how your program is structured.
I suppose the exception to this might be gaming and simulations where what's more important is speed as opposed to durability of your data, yet you have lots of state to keep track of.
If it were that simple, people wouldn't be spending so much time configuring caches or using Redis. I think most non-trivial applications require some central, shared data store. More often than not, this data store becomes a bottleneck that limits scaling. Databases compete with one another over which interferes with scaling the least.
If you accept the premise in the opening quote about Amdahl’s law, then you must consider that any global or semi-global lock has a huge impact on scalability. Sometimes we have no choice, but I believe that we can and should remove many single points of synchronization while still keeping the programming model relatively simple. I also believe that rather than hindering scalability, a database can help achieve it.
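To put rough numbers on the Amdahl’s-law point (the figures are invented, purely back-of-the-envelope): even a small serialized fraction, say a single global lock, caps the achievable speedup no matter how many cores you add.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallelizable fraction of the work and n is the number of workers.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.90, 0.95, 0.99):
    print(f"p={p:.2f}: " + ", ".join(
        f"{n} cores -> {amdahl_speedup(p, n):.1f}x" for n in (4, 16, 64)))
```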
That is definitely true. Databases are necessary. In fact in-memory data stores that can handle large volumes of data are not all that useful since they usually lack things like backups, etc. Not everyone is writing a RabbitMQ-like system. And of course locking plays a central role in all of this.
What I am saying is that once you accept that synchronization is going to be handled by your database of choice, it becomes somewhat less important how you structure your application in terms of performance. There are still reasons not to use callbacks, but whether you go with threads, actors, processes, etc. becomes a choice about how you want to utilize memory and, to an extent, which technology your runtime supports best.