
async/await doesn't entirely remove the need for mutexes and locks. We still need them if we have multiple coroutines using a shared resource across multiple yield points.



> We still need them if we have multiple coroutines using a shared resource across multiple yield points.

We still need them if we have multiple parallel tasks (coroutines spawned non-locally) using a shared resource across multiple yield points.
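
For instance, something like this does need the lock (a tokio-flavored sketch; the runtime, the counter, and all names are my assumptions): the two tasks are spawned onto the runtime, so they may touch the shared value in parallel.

    use std::sync::Arc;
    use tokio::sync::Mutex;

    #[tokio::main]
    async fn main() {
        // Shared between tasks that may run in parallel, so it needs a lock.
        let counter = Arc::new(Mutex::new(0));
        let mut handles = Vec::new();
        for _ in 0..2 {
            let counter = Arc::clone(&counter);
            // Spawned non-locally: the runtime may run these on other threads.
            handles.push(tokio::spawn(async move {
                *counter.lock().await += 1;
            }));
        }
        for h in handles {
            h.await.unwrap();
        }
        println!("{}", *counter.lock().await);
    }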

As long as the accesses to the shared variable are separated in time, sharing is fine.

This is correct code:

        async fn demo() {
            let mut foo = 1;
            async { foo += 1 }.await; // the shared variable crosses a yield point
            foo += 1;
            println!("{foo}"); // prints 3
        }
See: a shared variable used across multiple yield points. I showed another (more useful) example with `select!` in another post below.


The equivalent threaded code wouldn't need a mutex either:

    #include <iostream>
    #include <thread>
    int main() {
        int foo = 1;
        std::thread([&] { foo += 1; }).join();
        foo += 1;
        std::cout << foo << '\n';
    }
(Sorry for the C++, I don't speak much Rust.)


Point taken. What about this pattern? (Pseudocode: obviously it would require e.g. adding some code for tracking how much data there is in the buffer, or breaking the loop on EOF, but it illustrates the point.)

    let mut buffer: Vec<u8> = ...;
    loop {
        select! {
            _ = stream.readable() => stream.read(&mut buffer),
            _ = stream.writable() => stream.write(&buffer),
        }
    }


Once you add enough tracking metadata to know how much data there is in the buffer, you have literally implemented an SPSC queue.


Well, not really, because async/await guarantees I don't have to deal with the problem of the producer adding data at the same time as the consumer is removing it. In a proper SPSC queue some degree of synchronization is needed.


You stop adding data when the queue is full and stop popping when it is empty. You need the exact same synchronisation for async, just with different primitives.


But that's not synchronization between two concurrent things. I can still reason about the queue being full in a sequential way.

   select! {
     _ = channel.readable(), if queue.has_free_space() => read(&mut queue),
     _ = channel.writable(), if queue.has_data() => write(&mut queue),
   }
The point is I can implement `has_free_space` and `has_data` without thinking about concurrency / parallelism / threads. I don't even need to think about what happens if, in the middle of my `has_free_space` check, another thread goes in and adds some data. And I don't need to invoke any costly locks or atomic operations there to ensure the proper safety of my queue structure. Just purely sequential logic, which is way simpler to reason about than any SPSC queue.
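
To make that concrete, here is a rough sketch of what such a purely sequential queue could look like (the ring-buffer layout and every name besides `has_free_space`/`has_data` are my invention):

    // Only one task (the select! loop) ever touches this, and only between
    // yield points, so there are no locks and no atomics anywhere.
    struct SeqQueue {
        buf: Vec<u8>,
        head: usize, // index of the oldest byte
        len: usize,  // number of bytes currently stored
    }

    impl SeqQueue {
        fn has_free_space(&self) -> bool { self.len < self.buf.len() }
        fn has_data(&self) -> bool { self.len > 0 }

        fn push(&mut self, byte: u8) {
            debug_assert!(self.has_free_space());
            let tail = (self.head + self.len) % self.buf.len();
            self.buf[tail] = byte;
            self.len += 1;
        }

        fn pop(&mut self) -> u8 {
            debug_assert!(self.has_data());
            let byte = self.buf[self.head];
            self.head = (self.head + 1) % self.buf.len();
            self.len -= 1;
            byte
        }
    }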


As I mentioned elsewhere in this thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling, and then you do not need any synchronization.

In any case, acq/rel is the only thing required here, and it is extremely cheap.

edit: in any case, we are discussing synchronization, and `has_free_space`/`has_data` are a form of synchronization; we all agree that async and threads have different performance characteristics.
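
For reference, a rough sketch of what that acq/rel pairing looks like in an SPSC ring (my own illustration: the layout, names, and power-of-two capacity are assumptions, and a production queue would add padding, batching, etc.):

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicUsize, Ordering};

    const CAP: usize = 1024; // must be a power of two for the wrapping math

    // One producer, one consumer. `tail` is written only by the producer and
    // `head` only by the consumer, so a Release store paired with an Acquire
    // load on the other side is the only synchronization needed.
    struct Spsc {
        buf: UnsafeCell<[u8; CAP]>,
        head: AtomicUsize, // next slot to pop (consumer-owned)
        tail: AtomicUsize, // next slot to push (producer-owned)
    }

    // Sound only under the one-producer/one-consumer discipline above.
    unsafe impl Sync for Spsc {}

    impl Spsc {
        fn new() -> Self {
            Spsc {
                buf: UnsafeCell::new([0; CAP]),
                head: AtomicUsize::new(0),
                tail: AtomicUsize::new(0),
            }
        }

        fn push(&self, byte: u8) -> bool {
            let tail = self.tail.load(Ordering::Relaxed);
            // Acquire pairs with the consumer's Release store to `head`, so
            // the consumer is done reading any slot we may overwrite.
            let head = self.head.load(Ordering::Acquire);
            if tail.wrapping_sub(head) == CAP {
                return false; // full
            }
            unsafe { (*self.buf.get())[tail % CAP] = byte };
            // Release publishes the byte before the new tail becomes visible.
            self.tail.store(tail.wrapping_add(1), Ordering::Release);
            true
        }

        fn pop(&self) -> Option<u8> {
            let head = self.head.load(Ordering::Relaxed);
            // Acquire pairs with the producer's Release store to `tail`.
            let tail = self.tail.load(Ordering::Acquire);
            if head == tail {
                return None; // empty
            }
            let byte = unsafe { (*self.buf.get())[head % CAP] };
            // Release hands the slot back to the producer.
            self.head.store(head.wrapping_add(1), Ordering::Release);
            Some(byte)
        }
    }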


> As I mentioned elsewhere in this thread, if you do not care about parallelism you can pin your threads and use SCHED_FIFO for scheduling, and then you do not need any synchronization.

I don't think it is a universal solution. What if I am interested in parallelism as well, only not for the coroutines that operate on the same data? If my app handles 10k connections, I want them handled in parallel; they share nothing, so making them parallel is easy. What is not easy is running stuff concurrently on shared data - that requires some form of synchronization, and async/await with event loops is a very elegant solution.

You say that it can be handled with an SPSC queue and it is only one acq/rel. But then add another type of event that can happen concurrently, e.g. a user request to reconfigure the app, or an inactivity timeout. I can handle those trivially by adding more branches to the `select!`, and my code stays easy to follow; see the sketch below. With threads dedicated to each type of concurrent action, all trying to update the state of the app directly, I imagine this can get hairy pretty quickly.
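
A sketch of what those extra branches could look like (tokio's `select!` is assumed; `SeqQueue` is the sequential queue sketched above, and `Config`, `apply_config`, the channel, and the 30-second timeout are made-up names for illustration):

    use std::time::Duration;
    use tokio::net::TcpStream;
    use tokio::sync::mpsc;

    struct Config; // placeholder for whatever reconfiguration carries
    fn apply_config(_cfg: Config) {}

    async fn connection_loop(
        stream: TcpStream,
        mut queue: SeqQueue, // the sequential queue from the sketch above
        mut reconfig_rx: mpsc::Receiver<Config>,
    ) {
        loop {
            tokio::select! {
                _ = stream.readable(), if queue.has_free_space() => {
                    // try_read from the stream into the queue here
                }
                _ = stream.writable(), if queue.has_data() => {
                    // try_write from the queue to the stream here
                }
                // New concurrent events are just new branches:
                Some(cfg) = reconfig_rx.recv() => apply_config(cfg),
                _ = tokio::time::sleep(Duration::from_secs(30)) => break, // inactivity
            }
        }
    }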


Don't you need some way of telling the compiler you would like barriers here? I think otherwise the helper thread could run on another CPU and the two CPUs would operate on their own cached copies of foo. But then again, I'm not 100% sure how that works.


There are barriers for join. But without barriers, the risks are compiler reordering, values being lifted into registers, and thread scheduling. The CPU cache would not be the direct cause of any “stale” reads. https://news.ycombinator.com/item?id=36333034


Well, I knew there were possible issues both from the compiler and the CPU. It seems you are right that the cache is kept coherent; however, there is another issue owing to out-of-order execution of CPU instructions. Either way, gpderreta is probably right that `thread.join` tells the compiler to make sure it's all taken care of correctly.


No. All synchronization edges are implied by the thread creation and join. Same as for the async/await example.


There is an implicit mutex/barrier/synchronization in the join.


You definitely need a mutex here (or atomics); otherwise you have a race condition.


Where exactly? Can you point me to the data race? Consider that the thread constructor call happens-before the thread start and the thread termination happens-before the join call returns.
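
In Rust terms, those same two edges are what make the scoped-thread equivalent of the C++ snippet above race-free (a sketch assuming Rust 1.63+ for `std::thread::scope`):

    fn main() {
        let mut foo = 1;
        std::thread::scope(|s| {
            s.spawn(|| foo += 1); // spawn happens-before the closure runs
        }); // leaving the scope joins: thread end happens-before this point
        foo += 1;
        println!("{foo}"); // prints 3, with no mutex and no atomics
    }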


Ah sorry, I missed that you only spawn a single thread. Mea culpa!



