
Your comment seems to be conflating concurrency with parallelism.

JS doesn't have any language-level abstractions for parallelism (async or not) but you do have Web Workers[0] and process forking (depending on runtime) to get actual parallel programming. JS async deals with concurrency, not parallelism.

Threads are the opposite: They are interfaces for parallel programming and their use is orthogonal to how your application handles the resulting concurrency.

You say "the runtime/kernel take care of concurrency" - are you telling me you never write a mutex or implement locking logic? Because that's what "taking care of concurrency" is. I'd choose refactoring callback-/Promise-hell over debugging a complex deadlock any day (unless intra-process parallelism is actually a requirement, which may tip the scale in the other direction).

In the context of doing concurrency and parallelism in Rust, I'd 100% agree that the JS/C#-style async/await approach isn't necessarily always the best approach and it's good to consider alternative idioms if your requirements call for it. For anyone writing "apps" or glue-libraries, though, I'm thankful that they stay away from spawning threads all over my system by default and that they need more tools than "learn Rust async in 15 minutes" gives them to become dangerous.

Messing up your single-threaded event-loop concurrency can hog roughly a single CPU core and cause an OOM. Messing up thread-based concurrency can have larger implications for the hosting system.

[0]: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...




> Your comment seems to be conflating concurrency with parallelism.

No, it really doesn't. I mention both concurrency and parallelism, and their main difference.

> Threads are the opposite: They are interfaces for parallel programming

No, they are not. Threads can do both. When waiting for an i/o bound operation, a thread can simply sleep. Added bonus: A thread-based implementation supports i/o bound concurrency and cpu bound parallelism using the exact same principle, letting the kernel/runtime take care of the details.
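To sketch the point with plain std threads (illustrative only; the sleep stands in for a blocking read):

    use std::{thread, time::Duration};

    fn main() {
        // i/o bound: the kernel parks this thread while it waits
        let io_bound = thread::spawn(|| {
            thread::sleep(Duration::from_millis(100)); // stand-in for a blocking read
            "data"
        });
        // cpu bound: this thread keeps a core busy in parallel
        let cpu_bound = thread::spawn(|| (0..50_000_000u64).sum::<u64>());

        println!("{} {}", io_bound.join().unwrap(), cpu_bound.join().unwrap());
    }

Same spawn/join primitive in both cases; the kernel decides whether a thread sleeps or runs on a core.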

> are you telling me you never write a mutex or implement locking logic?

Pretty sure I never said that. As for what I prefer to debug: Most mutex-based synchronization tasks that come up in practice are easy. And if a "complex deadlock" does occur, it's usually pretty clear which resource was locked. Debugging that is just a question of going over all callers that access that resource.

And as mentioned before, all that code is synchronous. So each one of them is easy to reason about.

So yea, all in all, I prefer debugging problems arising from deadlocks over wading through callback-hell. By a huge margin.

Oh, and all that is before we even talk about using CSP as an approach to synchronizing threads, which makes it both harder to mess up, and again easier to reason about.
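In case CSP sounds abstract, a minimal Rust sketch of the idea: every piece of state has a single owning thread, and threads communicate only over channels, so no locks are needed:

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();
        let worker = thread::spawn(move || {
            let mut total = 0u64; // sole owner of this state
            for n in rx {
                total += n;
            }
            total
        });
        for i in 1..=100u64 {
            tx.send(i).unwrap();
        }
        drop(tx); // close the channel so the worker's loop ends
        println!("{}", worker.join().unwrap());
    }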


> It didn't start because it is so awesome, it started because JS can't do parallel any other way. That's the long and short of it

If the comment is not conflating concurrency and parallelism as you say, mind expanding on this part?


But concurrency is just "single core parallelism" anyways, so this isn't really germane to the discussion.

JS has neither.


Concurrency is not "single core parallelism". Concurrency describes tasks/threads of execution making progress independently of each other. Parallelism describes tasks/threads actually running at the same time.


>Concurrency is not "single core parallelism"

Of course it is. Concurrency gives the user the impression that parallel processing is being done, even when it's not. That's why my parents' old 386 could render a moving mouse cursor and a progress bar at the same time (usually).

Concurrency lets you do things "in parallel" even if you can't actually do them in parallel.


> mind expanding on this part?

Certainly. That part is a sentence written to be short and catchy. It sacrifices precision for reasons of brevity and style. It also doesn't mention either concurrency or parallelism; it just uses the word "parallel".

This is acceptable, because the post goes on to more precise statements later on, quote:

    It works for both i/o bound concurrency and cpu bound parallel computing.
End Quote.


> When waiting for an i/o bound operation, a thread can simply sleep.

I mean if you're fine with blocking I/O then obviously you don't need async, but on the other hand having non-blocking I/O is the whole point of async ^^


It really just depends on what you mean by non-blocking I/O.

Most node code I see in the wild is just a simple `await loadData()` which doesn't block the main node thread but does block that code flow until the data returns. This is roughly the same as what would happen in a normal blocking multithreaded language, apart from the extra overhead of a thread. If you don't have enough threads (or they are efficient enough in your language of choice) for this overhead to be an issue, then you are adding all this complexity for almost no benefit.

Basically it comes down to if you trust your language of choice's threads more or less than your language of choice's event scheduler. Since Node is fully single threaded there isn't really an option but with other languages, a single thread per worker is much simpler.
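For illustration, the thread-per-connection shape this describes, sketched in Rust (any threaded language works the same way; plain blocking I/O, no event loop):

    use std::io::{Read, Write};
    use std::net::TcpListener;
    use std::thread;

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            // one plain blocking thread per connection
            thread::spawn(move || {
                let mut buf = [0u8; 1024];
                while let Ok(n) = stream.read(&mut buf) {
                    if n == 0 { break; }                  // connection closed
                    let _ = stream.write_all(&buf[..n]);  // echo back
                }
            });
        }
        Ok(())
    }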

In Python it is even more opaque which to use, as CPython itself is single-threaded, so you are comparing its thread implementation to its event scheduler implementation. For this small win you get to rewrite all your code to new, non-standard APIs.


> It really just depends on what you mean by non-blocking I/O.

> Most node code I see in the wild is just a simple `await loadData()` which doesn't block the main node thread but does block that code flow until the data returns.

Agreed. Higher-level languages tend to discourage or outright decide not to expose asynchronous I/O. Instead, they optimize blocking I/O within their own runtime, skipping the higher resource cost of the system scheduler and thread representation.

If I am writing a web server in C or C++, I'm likely writing asynchronous I/O directly. I may also decide to use custom memory strategies, such as pooling allocators.

If I write one in classic Java, I'm allocating two threads to represent input and output for each active connection, and hoping the JVM knows how to make that mass of threads efficient. In Go, I'm likely using a lot of goroutines and again hoping the language runtime/standard library figured out how to make that efficient.

Java packages like NIO/Netty and Go packages like gaio are what expose asynchronous programming to the developer.

The footgun is that it is hard to use an asynchronous I/O package when you have a deep dependency tree that may contain blocking code, perhaps in some third-party package. This was one of the attractions of server-side JavaScript; other than a few local filesystem operations, everything sticks to the same concurrency model (even if they may interact with it as callbacks, promises or async/await).


I've seen a lot of people who seem to think all blocking IO completely blocks the entire OS process.

A language + runtime like Go or Erlang doesn't so much have "blocking" or "non-blocking" IO as make the terms simply not apply. I see them yielding far more confusion than understanding when people come from Node and apply them to such threaded runtimes.

But if you had to force a term on such a system, the better understanding is that everything in a large-number-of-threads language+runtime is non-blocking. Both terms yield incorrect understanding, but that one gets you closer to the truth.


> Threads [...] are interfaces for parallel programming

Threads are both for parallelism and for concurrency. Threads have been used for decades on machines without hardware parallelism.

> are you telling me you never write a mutex or implement locking logic

Aside from the fact that mutexes vs. futures is completely orthogonal to async vs. threads, I definitely prefer dealing with mutexes. 99% of mutex usage is trivial and deadlocks are relatively easy to debug. The issue with locks is their performance implications.


async/await doesn't entirely remove the need for mutexes and locks. We still need them if we have multiple coroutines using a shared resource across multiple yield points.
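For instance, in Rust with tokio (a sketch, assuming tokio's async-aware `Mutex`, which, unlike `std`'s, may be held across an `.await`):

    use std::{sync::Arc, time::Duration};
    use tokio::sync::Mutex;

    async fn add_twice(shared: Arc<Mutex<u64>>) {
        let mut n = shared.lock().await; // async-aware lock
        *n += 1;
        tokio::time::sleep(Duration::from_millis(1)).await; // yield point
        *n += 1; // no other task ever sees the intermediate value
    }

    #[tokio::main]
    async fn main() {
        let shared = Arc::new(Mutex::new(0));
        let tasks: Vec<_> = (0..10)
            .map(|_| tokio::spawn(add_twice(shared.clone())))
            .collect();
        for t in tasks {
            t.await.unwrap();
        }
        assert_eq!(*shared.lock().await, 20);
    }

Without the lock, two tasks could interleave at the yield point and lose updates, exactly as with threads.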


> We still need them if we have multiple coroutines using a shared resource across multiple yield points.

We still need them if we have multiple parallel tasks (coroutines spawned non-locally) using a shared resource across multiple yield points.

As long as the accesses to the shared variable are separated in time, sharing is fine.

This is correct code:

        // (this snippet must itself be inside an async fn)
        let mut foo = 1;
        async { foo += 1 }.await;
        foo += 1;
        println!("{foo}");
See - a shared variable used across multiple yield points. Another (more useful) example I showed below in another post with `select!`.


the equivalent threaded code wouldn't need a mutex either:

   int foo = 1;
   std::thread([&] { foo += 1; }).join();
   foo += 1;
   std::cout << foo << '\n';
(sorry for the C++, I don't speak much rust).


Point taken. What about this pattern (pseudo code, obviously it would require e.g. adding some code for tracking how much data there is in the buffer or breaking the loop on EOF, but it illustrates the point):

   let mut buffer: Vec<u8> = ...;
   loop {
     select! {
       _ = stream.readable() => stream.read(&mut buffer),
       _ = stream.writable() => stream.write(&buffer),
     }
   }


Once you add enough tracking metadata to know how much there is in the buffer, you have literally implemented an SPSC queue.


Well, not really, because async/await guarantees I don't have to deal with the producer adding data at the same time as the consumer is removing it in this case. In a proper SPSC queue some degree of synchronization is needed.


You stop adding data when the queue is full, you stop popping when it is empty. You need the exact same synchronisation for async, just different primitives.


But that's not synchronization between two concurrent things. I can still reason about queue being full in a sequential way.

   select! {
     _ = channel.readable(), if queue.has_free_space() => read(&mut queue),
     _ = channel.writable(), if queue.has_data() => write(&mut queue),
   }
The point is I can implement `has_free_space` and `has_data` without thinking about concurrency / parallelism / threads. I don't need to even think what happens if in the middle of my "has_free_space" check another thread goes in and adds some data. And I don't need to invoke any costly locks or atomic operations there to ensure proper safety of my queue structure. Just purely sequential logic. Which is way simpler to reason about than any SPSC queue.


As I mentioned else thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling and then you do not need any synchronization.

In any case acq/rel is the only thing required here and it is extremely cheap.

edit: in any case we are discussing synchronization, and 'has_free_space'/'has_data' are a form of synchronization; we all agree that async and threads have different performance characteristics.
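To make the acq/rel point concrete, a minimal Lamport-style SPSC ring buffer sketch (illustrative, not production code; exactly one thread may call push and one may call pop):

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicUsize, Ordering};

    pub struct Spsc<T> {
        buf: Vec<UnsafeCell<Option<T>>>,
        head: AtomicUsize, // next slot to pop, written only by the consumer
        tail: AtomicUsize, // next slot to push, written only by the producer
    }

    // Safe to share across the two threads: each index has a single writer.
    unsafe impl<T: Send> Sync for Spsc<T> {}

    impl<T> Spsc<T> {
        pub fn new(cap: usize) -> Self {
            Spsc {
                buf: (0..cap + 1).map(|_| UnsafeCell::new(None)).collect(),
                head: AtomicUsize::new(0),
                tail: AtomicUsize::new(0),
            }
        }

        /// Producer side only.
        pub fn push(&self, v: T) -> Result<(), T> {
            let t = self.tail.load(Ordering::Relaxed);
            let next = (t + 1) % self.buf.len();
            if next == self.head.load(Ordering::Acquire) {
                return Err(v); // full
            }
            unsafe { *self.buf[t].get() = Some(v) };
            self.tail.store(next, Ordering::Release); // publish the slot
            Ok(())
        }

        /// Consumer side only.
        pub fn pop(&self) -> Option<T> {
            let h = self.head.load(Ordering::Relaxed);
            if h == self.tail.load(Ordering::Acquire) {
                return None; // empty
            }
            let v = unsafe { (*self.buf[h].get()).take() };
            self.head.store((h + 1) % self.buf.len(), Ordering::Release); // free the slot
            v
        }
    }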


> As I mentioned else thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling and then you do not need any synchronization.

I don't think it is a universal solution. What if I am interested in parallelism as well, only not for the coroutines that operate on the same data? If my app handles 10k connections, I want them to be handled in parallel, as they do not share anything so making them parallel is easy. What is not easy is running stuff concurrently on shared data - that requires some form of synchronization and async/await with event loops is a very elegant solution.

You say that it can be handled with an SPSC queue and it is only one acq/rel. But then add another type of event that can happen concurrently, e.g. a user request to reconfigure the app, or an inactivity timeout. I can trivially handle those by adding more branches to the `select!`, and my code still stays easy to follow. With threads dedicated to each type of concurrent action and trying to update the state of the app directly, I imagine this can get hairy pretty quickly.
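Roughly what I mean, as a sketch (tokio syntax; `Config`, `process` and the channels are hypothetical stand-ins):

    use std::time::Duration;
    use tokio::{select, sync::mpsc, time};

    #[derive(Default)]
    struct Config { verbose: bool }

    fn process(data: Vec<u8>, config: &Config) {
        if config.verbose { println!("{} bytes", data.len()); }
    }

    async fn handle(
        mut socket_data: mpsc::Receiver<Vec<u8>>,
        mut reconfigure: mpsc::Receiver<Config>,
    ) {
        let mut config = Config::default();
        loop {
            select! {
                Some(data) = socket_data.recv() => process(data, &config),
                Some(new_cfg) = reconfigure.recv() => config = new_cfg,
                // inactivity timeout: the timer restarts each iteration
                _ = time::sleep(Duration::from_secs(30)) => break,
            }
        }
    }

Each new concern is one more branch, and the body of every branch still runs sequentially against the same state.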


Don't you need some kind of way of telling the compiler you would like barriers here? I think otherwise the helper thread could run on another cpu and the two cpus would operate on their own cached copies of foo. But then again I'm not 100% on how that works.


There are barriers for join. But without barriers, the risk is compiler reordering, lifting values into registers, or thread scheduling; the CPU cache would not be the direct cause of any "stale" reads. https://news.ycombinator.com/item?id=36333034


Well, I knew there were possible issues both from the compiler and the CPU. It seems you are right that the cache is kept coherent; however, there is another issue owing to out-of-order execution of CPU instructions. Either way, gpderreta is probably right that thread.join tells the compiler to make sure it's all taken care of correctly.


No. All synchronization edges are implied by the thread creation and join. Same as for the async/await example.


There is an implicit mutex/barrier/synchronization in the join.


You definitely need a mutex here (or use atomics), otherwise you have a race condition


Where exactly? Can you point me to the data race? Consider that the thread constructor call happens-before the thread start and the thread termination happens-before the join call returns.


Ah sorry, I missed that you only spawn a single thread. Mea culpa!


> process forking (depending on runtime) to get actual parallel programming

If you get into a time machine back to the 1980s, then yes.


Web Workers are not parallel; they are only concurrent with the main context: think OS threads. Async JS is akin to using very lightweight simulated threads.

You will not necessarily utilize more CPU cores by spawning additional Web Workers because they are not inherently parallel. The actual performance of Web Workers depends on how your browser and OS schedule threads.

They are OS threads despite the mountain of misinformation on the Internet about them implying that they are truly parallel. They are not.


I thought the primary purpose of web workers was that the browser can run the workers in parallel to the main thread. As the spec says:

> [Web workers] allow long tasks to be executed without yielding to keep the page responsive


The workers don't block painting and they do not run in a separate process. That's why it's concurrent but not parallel. The web worker does work whenever the main thread is not painting and there is a free time slot. The browser is not painting all the time.

You don't get extra calculation performance with web workers. You just create the illusion of a smooth experience because you don't block painting. It does not complete faster.


Threads can certainly run in parallel with one another if the OS schedules them on different cores. I did a quick experiment and the main thread and worker threads run in parallel.

https://github.com/jschaf/web-worker-experiment/

> You don't get extra calculation performance with web workers

The primary purpose of web workers is extra calculation performance. From MDN:

> Workers are mainly useful for allowing your code to perform processor-intensive calculations without blocking the user interface thread


I should clarify that you can't get extra calculation performance which easily scales with core count due to the gotchas around threading that you mentioned.



