The actor model (share-nothing) is one way to address the problem of shared, mut...

klabb3 · 2024-07-29T09:56:36 1722246996

You can always share immutable data. So "share-nothing" is a bit too strong a choice of words, imo.

> But what if I want to have my cake and eat it too? What if I want to have thread-safe, shared, mutable state.

No, it’s not possible. Shared mutable state invokes ancient evils that violate our way of thinking about traditional imperative programming. Let’s assume you have a magical compiler & CPU that solves safety and performance. You still have unsynchronized reads and writes on your shared state. This means a shared variable you just read can change at any time “under your feet”, so on the next line it may be different. It’s a race condition, but technically not a data race. The classic example is multiple threads incrementing the same counter: each increment reads the value into a temporary, adds one, and writes it back, so two threads can read the same value and one update gets lost. A magical runtime can make it safe, but it can’t make it correct, because it cannot read your mind.
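To make the counter case concrete, here is a minimal sketch of the lost update. It is not real multithreading: generators stand in for threads, and a hypothetical round-robin scheduler (`runInterleaved`) forces the unlucky interleaving. All names are made up for illustration.

```javascript
// One simulated "thread": read the counter into a temporary,
// get preempted, then write the (possibly stale) result back.
function* increment(state) {
  const tmp = state.counter; // read
  yield;                     // another task may run here
  state.counter = tmp + 1;   // write back
}

// Hypothetical scheduler: advances every pending task by one step per round.
function runInterleaved(tasks) {
  let pending = tasks;
  while (pending.length > 0) {
    pending = pending.filter((task) => !task.next().done);
  }
}

// Runs n increments under the forced interleaving and returns the final count.
function lostUpdateDemo(n) {
  const state = { counter: 0 };
  runInterleaved(Array.from({ length: n }, () => increment(state)));
  return state.counter;
}
```

With three tasks, `lostUpdateDemo(3)` returns 1 rather than 3: every individual read and write was "safe", but all three tasks read 0 before any of them wrote.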

This unfortunately leaves you with a few options, all of which violate our simple way of life in some manner. You can explicitly mark critical sections (where the world stands still for a short amount of time). Or you can send messages, which introduces new concepts and control flow constructs to your programming environment (many languages do this, but Erlang perhaps takes it the most seriously). Finally, you can switch to entirely different paradigms like reactive, dataflow, functional, etc., where the idea is that the compiler or runtime parallelizes automatically; CSS and SQL engines are examples.

I like message passing for two reasons: (1) it is already a requirement for networked applications so you can reuse design patterns and even migrate between them easily and (2) it supports and encourages single ownership of data which has proven to work well in applications when complexity grows over time.
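A rough sketch of what single ownership via messages can look like, assuming a hypothetical `CounterActor` whose mailbox is drained by hand (in a real actor runtime the scheduler would do that):

```javascript
// Hypothetical actor: the count is private, so only the actor can mutate it.
class CounterActor {
  #count = 0;    // owned exclusively by this actor
  #mailbox = []; // FIFO queue of pending messages

  // Other code may only enqueue messages, never touch #count directly.
  send(message) {
    this.#mailbox.push(message);
  }

  // Handle queued messages strictly one at a time, in arrival order.
  processMailbox() {
    while (this.#mailbox.length > 0) {
      const message = this.#mailbox.shift();
      switch (message.type) {
        case 'increment':
          this.#count += 1;
          break;
        case 'get':
          message.reply(this.#count); // the answer travels back as a reply
          break;
        default:
          throw new Error(`unknown message type: ${message.type}`);
      }
    }
  }
}

function actorDemo() {
  const actor = new CounterActor();
  let observed = null;
  actor.send({ type: 'increment' });
  actor.send({ type: 'increment' });
  actor.send({ type: 'get', reply: (n) => { observed = n; } });
  actor.processMailbox();
  return observed;
}
```

Because each message is handled to completion before the next one starts, the read-modify-write inside the `increment` case cannot interleave with another increment; `actorDemo()` returns 2.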

OTOH, I am still using all of the above in my day to day. Hopefully in the future we will see clearer lines and fewer variations across languages and runtimes. It’s more complex than it needs to be, and we’re paying a large price for it.

gpderetta · 2024-07-29T12:44:57 1722257097

There isn't anything inherently evil about mutable shared state. If you think about it, what is a database if not mutable shared memory? Concurrency control (or the lack of it) is the issue. But you can build concurrency control abstractions on top of shared memory.

Also remember that shared memory and message passing are duals.

mrkeen · 2024-07-29T14:41:40 1722264100

Not if you put it into moralising terms like that. There's nothing wrong with crime either - lack of law enforcement is the issue.

Shared, mutable state means that your correct (single-threaded) reasoning ceases to be correct.

Databases are a brilliant example of safe, shared, mutable state. You run your transaction, your colleague runs his, the end result makes sense, and not a volatile in sight (not that it would have helped.)

toast0 · 2024-07-29T18:26:55 1722277615

> If you think about it, what is a database if not mutable shared memory?

Eh --- I send the database a message, and it often sends me a message back. From outside, it's communicating processes. Most databases do mutable shared state inside their box (and if you're running an in-process database, there's not necessarily a clear separation).

I don't think shared mutable state is inherently evil, but it's a lot harder to think about than shared-nothing. Of course, even in Erlang, a process's mailbox is shared mutable state, and so is ets. Sometimes the most effective way forward is shared mutable state. But having to consider the entire scope of shared mutation is exhausting; I find it much easier to write a sequential program with async messaging to interface with the world. It's usually really clear what to do when you get a message, although it moves some complexity towards 'how do other processes know where to send those messages' and similar things. It's always easy to make async send/response feel synchronous: after you send a request, you can wait for the response (best to have a timeout, too). It's very painful to take apart a synchronous API into separate send and receive, so I strongly prefer async messaging as the basic building block.
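As an illustration of the "wait for the response, with a timeout" point, here is a small sketch. `request`, `echoPeer` and `silentPeer` are hypothetical names, and the callback-style `send` is an assumption about how the underlying transport delivers replies.

```javascript
// Hypothetical helper: turns a fire-and-forget send plus a later reply
// into a single awaitable call with a deadline.
function request(send, payload, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`no reply within ${timeoutMs} ms`));
    }, timeoutMs);

    // `send` is assumed to deliver the payload and to invoke the callback
    // if and when a response message arrives.
    send(payload, (response) => {
      clearTimeout(timer);
      resolve(response);
    });
  });
}

// Two stand-in peers for illustration: one answers after 5 ms, one never does.
const echoPeer = (payload, reply) => setTimeout(() => reply(`echo:${payload}`), 5);
const silentPeer = () => {};

async function requestDemo() {
  // Reads like a synchronous call, but is built on async send/receive.
  const answer = await request(echoPeer, 'ping', 500);
  // The silent peer never replies, so the timeout turns into an error.
  const failure = await request(silentPeer, 'ping', 20).catch((err) => err.message);
  return { answer, failure };
}
```

Going the other way, splitting an API that only offers a blocking call into separate send and receive halves, has no equally small wrapper, which is the asymmetry described above.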

moffkalast · 2024-07-28T19:57:57 1722196677

As much as I hate to say it, I think Java probably has the best eaten cake implementation by far. Volatile makes sure variables stay sane on the memory side, if you only write to a variable from one thread and read it from others, then it just sort of magically works? Plus the executors to handle thread reuse for async tasks. I assume C# has the same concepts given that it's just a carbon copy with title case naming.

Python can't execute on two cores at once, so it functionally has no multithreading. JS can share data between threads, but must convert it all to string because pointless performance penalties are great to have. Golang has that weird FIFO channel thing (probably sockets in disguise for these last two). C/C++ has a segfault.

throwitaway1123 · 2024-07-28T21:16:17 1722201377

> JS can share data between threads, but must convert it all to string

To be more precise, you can send data to web workers and worker threads by copying via the structured clone algorithm (unlike JSON this supports almost all data types), and you can also move certain transferable objects between threads which is a zero-copy (and therefore much faster) operation.
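A quick sketch of the difference between copying and moving, using the global `structuredClone` function, which exposes the same algorithm outside of `postMessage`. This assumes a runtime that provides it (recent browsers, Node 17 or later); the object contents are made up.

```javascript
// Copy: structured clone handles types that JSON cannot represent.
const original = {
  recordedAt: new Date(0),
  tags: new Map([['format', 'wav']]),
  samples: new Uint8Array([1, 2, 3]),
};
const copy = structuredClone(original);
copy.samples[0] = 99; // mutating the copy leaves the original untouched

// Move: listing the buffer in `transfer` hands it over without copying.
const buffer = new ArrayBuffer(8);
const moved = structuredClone(buffer, { transfer: [buffer] });

// After the transfer the source is detached and reports a length of 0,
// while the receiving side holds all 8 bytes.
const lengths = { source: buffer.byteLength, destination: moved.byteLength };
```

The copy keeps `Map`, `Date` and typed arrays as real instances, and the transferred buffer is no longer usable from the side that gave it away.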

moffkalast · 2024-07-28T21:31:31 1722202291

Ah yeah dataviews, but you still need to convert from json to those and that takes about as much overhead, plus they're much harder to deal with complexity-wise being annoying single type buffers and all. For any other language it would work better, but because JS mainly deals with data arriving from elsewhere it means it needs to be converted every single time instead of just maintaining a local copy for thread comms.

throwitaway1123 · 2024-07-28T22:27:48 1722205668

> Ah yeah dataviews, but you still need to convert from json to those and that takes about as much overhead

You don't necessarily need to have an intermediate JSON representation. Many of the built-in APIs in Node and browsers return array buffers natively. For example:

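  // Fetch the file as an ArrayBuffer directly; no JSON step involved.
  const buffer = await fetch('foo.wav').then(res => res.arrayBuffer())
  // The second argument is the transfer list: the buffer is moved, not copied.
  new Worker('worker.js').postMessage(buffer, [buffer])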

This completely transfers the buffer to the worker thread, after which it is detached (unusable from the sending side) [1][2].

[1] https://developer.mozilla.org/en-US/docs/Web/API/Worker/post...

[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

moffkalast · 2024-07-30T10:24:09 1722335049

Hmm, well this I have to try sometime, thanks for the heads up :)

pjc50 · 2024-07-29T09:15:39 1722244539

C# has Interlocked https://learn.microsoft.com/en-us/dotnet/api/system.threadin... which is strictly better than volatile because it lets you write lockfree code that actually observes the memory semantics.
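JavaScript exposes a similar idea through `Atomics` on a `SharedArrayBuffer`. The sketch below is not C# and runs on a single thread, so it only shows the shape of the operations: an atomic add and a compare-and-swap retry loop. `casIncrement` is a hypothetical helper.

```javascript
// One 32-bit cell in memory that could be shared with worker threads.
const cell = new Int32Array(new SharedArrayBuffer(4));

// Atomic read-modify-write: returns the value from before the addition.
const before = Atomics.add(cell, 0, 5);

// Hypothetical helper: lock-free increment via compare-and-swap.
function casIncrement(view, index) {
  for (;;) {
    const seen = Atomics.load(view, index);
    // compareExchange writes only if the cell still holds `seen`,
    // and returns whatever value it actually found there.
    if (Atomics.compareExchange(view, index, seen, seen + 1) === seen) {
      return seen + 1;
    }
    // Another thread changed the cell in between: retry with a fresh read.
  }
}

const after = casIncrement(cell, 0);
```

Here `before` is 0 and `after` is 6. With real workers sharing the buffer, the retry loop is what keeps an increment from being lost when two threads race.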

"Volatile" specifies nothing about memory barrier semantics in either Java or C++, if I remember correctly?

gpderetta · 2024-07-29T12:46:06 1722257166

std::atomic is the C++ equivalent of Java's volatile. C and C++ volatile is for other things, such as memory-mapped I/O.

neonsunset · 2024-07-28T22:28:06 1722205686

> I assume C# has the same concepts given that it's just a carbon copy with title case naming.

Better not comment than look clueless. Moreover, this applies to the use of volatile keyword in Java as well.

mrkeen · 2024-07-28T20:41:06 1722199266

> Volatile makes sure variables stay sane on the memory side

This doesn't get you from shared-mutable-hell to shared-mutable-safe, it gets you from shared-mutable-relaxedmemorymodel-hell to shared-mutable-hell. It's the kind of hell you don't come across until you start being too smart for synchronisation primitives and start taking a stab at lockfree/lockless wizardry.

> if you only write to a variable from one thread and read it from others, then it just sort of magically works

I'm not necessarily convinced by that - but either way that's a huge blow to 'shared' if you are only allowed one writer.

> Plus the executors to handle thread reuse for async tasks.

What does this solve with regard to the shared-mutable problem? This is like "Erlang has BEAM to handle the actors" or something - so what?

moffkalast · 2024-07-28T21:45:04 1722203104

Well it doesn't get you there because shared-mutable-safe doesn't exist, at least I doubt it can without major tradeoffs. You either err on the side of complete safety with a system that is borderline unusable for anything practical, or you let people do whatever they want and let them deal with their problems once they actually have them.

> either way that's a huge blow to 'shared' if you are only allowed one writer

Yeah for full N thread reading and editing you'd need N vars per var which is annoying, but that kind of every-thread-is-main setup is something that is exceedingly rare. There's almost always a few fixed main ones and lots running specific tasks that don't really need to know about all the other ones.
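One possible reading of "N vars per var" is a slot per writer: each thread updates only its own slot, and readers combine all of them. A minimal sketch under that assumption follows; the function names are hypothetical, and the writers are simulated on one thread.

```javascript
// Hypothetical layout: slot i may be written only by writer i.
function createSlots(writerCount) {
  return new Int32Array(new SharedArrayBuffer(4 * writerCount));
}

// Writer side: a plain read-then-store is enough here, because
// no other writer ever touches this slot.
function bump(slots, writerId) {
  Atomics.store(slots, writerId, Atomics.load(slots, writerId) + 1);
}

// Reader side: any thread may read every slot and add them up.
function total(slots) {
  let sum = 0;
  for (let i = 0; i < slots.length; i++) {
    sum += Atomics.load(slots, i);
  }
  return sum;
}

function slotsDemo() {
  const slots = createSlots(3);
  bump(slots, 0);
  bump(slots, 0);
  bump(slots, 2);
  return total(slots);
}
```

`slotsDemo()` returns 3. The sum is not an atomic snapshot: a reader running concurrently with writers may see some slots before an update and others after it, which is part of the tradeoff being discussed.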