Some thoughts on asynchronous Python API design in a post-async/await world (vorpus.org)
301 points by piotrjurkiewicz on Nov 7, 2016 | 114 comments

The idea espoused in this blog post, that

> if you have N logical threads concurrently executing a routine with Y yield points, then there are N**Y possible execution orders that you have to hold in your head

is actively harmful to software maintainability. Concurrency problems don't disappear when you make your yield points explicit.

Look: in traditional multi-threaded programs, we protect shared data using locks. If you avoid explicit locks and instead rely on complete knowledge of all yield points (i.e., all possible execution orders) to ensure that data races do not happen, then you've just created a ticking time-bomb: as soon as you add a new yield point, you invalidate your safety assumptions.
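A minimal sketch of that time-bomb, using Python's asyncio. The names `deposit_v1`, `deposit_v2`, and the shared `balance` dictionary are made up for illustration. The first version has no yield point between the read and the write, so it is safe without a lock; the second adds one `await` in the middle and starts losing updates.

```python
import asyncio

balance = {"value": 0}  # hypothetical shared state, for illustration only

async def deposit_v1():
    # No await between the read and the write: no other task can interleave.
    current = balance["value"]
    balance["value"] = current + 1

async def deposit_v2():
    current = balance["value"]
    await asyncio.sleep(0)          # the newly added yield point (stand-in for I/O)
    balance["value"] = current + 1  # writes back a value that may be stale

async def run(deposit, n=100):
    balance["value"] = 0
    await asyncio.gather(*(deposit() for _ in range(n)))
    return balance["value"]

safe_total = asyncio.run(run(deposit_v1))
racy_total = asyncio.run(run(deposit_v2))
print(safe_total, racy_total)
```

The first total is 100; the second comes out lower, because every task reads the balance before any task writes it back.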

Traditional lock-based preemptive multi-threaded code isn't susceptible to this problem: it already embeds maximally pessimistic assumptions about execution order, so adding a new preemption point cannot hurt anything.

Of course, you can use mutexes with explicit yield points too, but nobody does: the perception is that cooperative multitasking (or promises or whatever) frees you from having to worry about all that hard, nasty multi-threaded stuff you hated in your CS classes. But you haven't really escaped. Those dining philosophers are still there, and now they're angry.
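For what it's worth, asyncio does ship a mutex for exactly this. A rough sketch, assuming a made-up `increment` coroutine with a yield point inside its critical section:

```python
import asyncio

async def main(n=100):
    counter = 0
    lock = asyncio.Lock()

    async def increment():
        nonlocal counter
        async with lock:            # held across the yield point below
            current = counter
            await asyncio.sleep(0)  # stand-in for real I/O
            counter = current + 1

    await asyncio.gather(*(increment() for _ in range(n)))
    return counter

locked_total = asyncio.run(main())
print(locked_total)
```

With the lock in place, adding or removing an `await` inside the critical section no longer changes the result.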

The article claims that yield-based programming is easier because the fewer the total number of yield points, the less mental state a programmer needs to maintain. I don't think this argument is correct: in lock-based programming, we need to keep _zero_ preemption points in mind, because we assume every instruction is a yield point. Instead of thinking about N**Y program interleavings, we think about how many locks we hold. I bet we have fewer locks than you have yields.

To put it another way, the composition properties of locks are much saner than the composition properties of safety-through-controlling-yield.

I believe that we got multithreaded programming basically right a long time ago, and that improvement now rests on approaches like reducing mutable shared state, automated thread-safety analysis, and software transactional memory. Encouraging developers to sprinkle "async" and "await" everywhere is a step backward in performance, readability, and robustness.

It's not clear what you're suggesting as an alternative. My understanding is that you're suggesting thread-per-request, which has many known flaws. There are three approaches to serving requests:

1. Thread-per-request. This is a simple model. You have a fixed-size thread pool of size N, and once you hit that limit, you can't serve any more requests. Thread-per-request has several sources of overhead, which is why people recommend against it: thread limits, per-thread stack memory usage, and context switching.

2. Coroutine style handling with cooperative scheduling at synchronization points (locks, I/O). This is how Go handles requests.

3. Asynchronous request handling. You still have a fixed-size thread pool handling requests, but you no longer limit the number of simultaneous requests with the size of that thread pool. There are several different styles of async request handling: callbacks, async/await, and futures.

#2 and #3 are more common these days because they don't suffer from the many drawbacks of the thread-per-request model, although both suffer from some understandability issues.

Those options aren't as distinct as you might imagine. Would calling it fiber-per-request make you happy?

(By the way: most of the time, a plain-old-boring thread-per-request is just fine, because most of the time, you're not writing high-scale software. If you have at most two dozen concurrent tasks, you're wasting your time worrying about the overhead of plain old pthread_t.)

I'm using a much more expansive definition of "thread" than you are. Sure, in the right situation, maybe M:N threading, or full green threads, or whatever is the right implementation strategy. There's no reason that green threading has to involve the use of explicit "async" and "await" keywords, and it's these keywords that I consider silly.

(I agree that thread-per-request works just fine in the majority of cases, but it's still worthwhile to write about the cases where it doesn't work.)

Responding to your original post: you argue that async/await intends to solve the problem of data races. That's not why people use it, nor does it tackle that problem at all (you still need locks around shared data).

It only tries to solve the issue of highly concurrent servers, where requests are bound by some resource that request-handling threads have to wait on (typically I/O).

Coroutines/fibers are not an alternative to async servers, because they need primitives that are either baked into the language or the OS itself to work well.

Coroutines/fibers are completely orthogonal to async anything. The OP is arguing against poor man's coroutines, a.k.a. stackless coroutines, a.k.a. top-level yield only, which are significantly less expressive and composable than proper stackful coroutines (i.e. first-class one-shot continuations).

An alleged benefit of stackless coroutines is that yield points are explicit, so you know when your state can change. The OP is arguing that this is not really a benefit because it leads to fragile code. I happen to strongly agree.

Green threads / coroutines / fibers are isomorphic with async keyword transparently implemented as a continuation passing style transform, which is how async callbacks usually work. Actual CPU-style stacks in a green thread scenario are nested closure activation records in an explicit continuation passing style scenario, and are implicit closure activation records (but look like stacks) when using an 'async' compiler-implemented CPS.

Properly composed awaits (where each function entered is entered via an await) build a linked list of activation records in the continuations as they drill down. This linked list is the same as the stack (i.e. serves the same purpose and contains the same data in slightly different layout) in a green threads scenario.
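You can see that linked list directly in CPython. This is only an analogy (CPython uses heap-allocated coroutine objects rather than closures from a CPS transform), and `Suspend` and `await_chain` are made-up helpers, but each suspended coroutine exposes the thing it is waiting on through its `cr_await` attribute:

```python
class Suspend:
    """Hypothetical awaitable that yields to whoever is driving the coroutine."""
    def __await__(self):
        yield

async def inner():
    await Suspend()

async def middle():
    await inner()

async def outer():
    await middle()

def await_chain(coro):
    """Follow cr_await links from the outermost coroutine inward."""
    names = []
    while hasattr(coro, "cr_code"):
        names.append(coro.cr_code.co_name)
        coro = coro.cr_await
    return names

top = outer()
top.send(None)            # run until the innermost await suspends
chain = await_chain(top)
top.close()               # clean up the suspended coroutine
print(chain)
```

The walk visits `outer`, `middle`, then `inner`, the same frames a debugger would show as the stack of a green thread.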

What makes all these things different is how much they expose the underlying mechanics, and the metaphors they use in that exposition. But they're not orthogonal.

(If you meant 'async' as in async IO explicitly, rather than the async / await keyword with CPS transform as implemented in C#, Python, Javascript, etc., then apologies.)

I do mean async as in generic async IO.

As you said, you can of course recover stackful behaviour by using yield/await/async/whatever at every level of the call stack, but in addition to being a performance pitfall (you are in practice heap-allocating each frame separately and yield is now O(N): your interpreter/compiler/JIT will need to work hard to remove the abstraction overhead), it leads to the green/red function problem.

Please correct me if I'm wrong, but doesn't asyncio in the form of async/await (or any other ways to explicitly denote context switches) solve the problem of data races in that per-thread data structures can be operated on atomically by different coroutines? My understanding is that unless data structures are shared with another thread, you don't usually need locks for shared data.

I think that the biggest argument against it is code changes. Think about a code change that adds an additional yield point without proper locking.

Has any language tackled this with lazy locking? i.e. lock only on yield. Maybe this could even be done in compile time

async and threads are fundamentally different mechanisms. green threads (async) are scheduled by the runtime, threads are scheduled by the OS.

In CPython threads can be (theoretically) switched at every bytecode instruction. Since calls into extensions / the interpreter are a single instruction, many data structure updates (like dict[a] = b, list.append) will appear atomic from Python.
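A quick way to see this, as a sketch (the `worker` function and the counts are arbitrary, and this relies on CPython behaviour rather than on anything the language guarantees):

```python
import threading

shared = []

def worker(count):
    for i in range(count):
        shared.append(i)  # one call into the interpreter, no explicit lock

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = len(shared)
print(total)
```

No appends are lost even though eight threads write to the same list. A compound update such as `counter += 1` does not get the same treatment, since it spans several bytecode instructions.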

That being said it is rather rare to have multiple threads run an event loop and process requests in Python. If threads and async are combined in the same process in Python, then it's usually only one event loop thread, and a thread pool for background activity. Usually these will be synchronized through async (eg. tornado.concurrent.run_on_executor) -- but that has nothing to do with context switches.
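Roughly what that looks like with the standard library instead of tornado (a sketch: `blocking_work` and the `bg` thread name prefix are made up, and `loop.run_in_executor` stands in for tornado's `run_on_executor`):

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

def blocking_work(n):
    # Runs in a pool thread, off the event loop thread.
    return n * n, threading.current_thread().name

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="bg") as pool:
        # The event loop keeps running other tasks while the pool thread works.
        result, worker_name = await loop.run_in_executor(pool, blocking_work, 7)
    return result, worker_name, threading.current_thread().name

result, worker_thread, loop_thread = asyncio.run(main())
print(result, worker_thread, loop_thread)
```

The coroutine only sees a future; the hand-off between the pool thread and the event loop thread happens inside the library.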

Edit: Reread your post. I may have slightly missed the point :)

Yes. Often one will find/design that there is no shared state, or that shared state is modified completely between yield points, so no locks between coroutines needed.

In Java, NIO is often slower than threads. And I am saying this as somebody who used >60k threads on 256-core machines vs NIO on the same machines for a highly available transactional system.

Why are they silly though? Don't you need them to specify that an operation is in fact async and handle it accordingly? Or in your solution (something about locking?) is that not necessary because everything is safe? How does that work out though? How do I say "No computer, wait for this before we do that"?

Don't be focused on "requests". Requests (where most people mean HTTP requests) are one layer where you need concurrency, but in principle you need it on multiple layers.

E.g. at first you have a server that accepts multiple connections and each must be handled -> Thread per connection or one thread for all connections? If you go for threads you might even need multiples, e.g. a reader thread, a writer thread which processes a write queue and a third one which maintains the state for the connection and coordinates reads and writes.

Then on a higher layer you might have multiple streams per connection (e.g. in HTTP/2), where you again have to decide how these should be represented.

Depending on the protocol and application there might be even more or other layers that need concurrency and synchronization.

But the general approaches that you mention do still apply here: You can either use a thread for each concurrent entity and use blocking operations. Or you can multiplex multiple concurrent entities on a single thread with async operations and callbacks. Coroutines are a mix which provide an API like the first approach with an implementation that looks more like the second approach.

> You have a fixed-size thread pool of size N, and once you hit that limit, you can't serve any more requests.

If your application acts as a stateless proxy between client machines and your persistence layer, can't you just spin up another instance and load balance them at any time? It's not the most efficient solution at scale, but lots of people use this strategy.

> There are three approaches to serving requests:

Let's not forget about forking servers. The kind where each request forks.

I think the general idea is that if you make your yield points explicit, you see them yourself, and if you generally try for 100% unit test coverage, you end up nudging yourself towards fewer yield points, a simpler state machine, and better karma overall.

The complexity you see affects yourself more than the complexity you don't.

I rather agree.

FWIW my friend Abhijit Menon-Sen wrote a blog post on the matter last year, about some code with excellent test coverage and explicit yield points: http://toroid.org/callback-heaven

You're trying to say that async will still have race conditions, but I disagree with that premise. On async, context switching only happens on an "await", which means race conditions won't happen. Race conditions typically happen when you have a shared global state and you read and write to it in two operations. For example a counter that you want to increment. In async it's very rare that you have a shared global state, but even if you do, reading and modifying the state are naturally right next to each other, not on opposite sides of the await.

"On async, context switching only happens on an "await", which means race conditions won't happen. Race conditions typically happen when you have a shared global state and you read and write to it in two operations."

That may be when they "typically" happen, but you can still get race conditions where you have two tasks that can run next, and you accidentally write an assumption about which will happen into your code when there is no such assumption in the scheduler. You certainly will get fewer of these with async/await than with pure event-handling-style code, because async/await carries more information about proper ordering of code, but yes, you can still get things that are correctly described as "race conditions".
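A small asyncio sketch of that kind of ordering assumption, and one way to spell it out instead of assuming it. The `loader` and `user` coroutines and the `config` dictionary are made up for illustration:

```python
import asyncio

async def main():
    config = {}
    ready = asyncio.Event()

    async def loader():
        await asyncio.sleep(0.01)  # stand-in for fetching settings over the network
        config["url"] = "db.example.invalid"
        ready.set()

    async def user():
        # Without this wait, the code would quietly assume loader() already ran,
        # which the scheduler never promised.
        await ready.wait()
        return config["url"]

    _, url = await asyncio.gather(loader(), user())
    return url

url = asyncio.run(main())
print(url)
```

Drop the `ready.wait()` line and `user` raises `KeyError`, because it runs before `loader` has finished its simulated I/O.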

>That may be when they "typically" happen, but you can still get race conditions where you have two tasks that can run next, and you accidentally write an assumption about which will happen into your code when there is no such assumption in the scheduler.

What are some async coding styles where this is prevented? I feel like making incorrect order-of-execution/arrival time assumptions could be an issue for any kind of async code.

I'm not sure 100% prevention is an option. Working in a language with very strong mutation control, either via immutability in Haskell or something very controlling like Rust, probably eliminates most of them, but I'm pretty sure in both cases it would still be possible to write a straight-up logic error where the programmer assumed a guarantee that did not exist.

Still, I recommend that approach personally. The best of all worlds is to write threaded code in languages that structurally eliminate the majority of issues threads have. The downside is that the "languages that structurally tame threads" is a pretty short list, even now; Rust, Haskell, Erlang/Elixir, see the probably-inevitable replies for a couple of others. You can widen the field substantially by working with libraries in languages that encourage that style of programming, but lacking the language support means you're back to exercising a lot of programmer discipline to ensure you stay in the acceptable subset of the language. This is a continuum based on the support available in the languages and the ability to get libraries, but this gets you to the Go language, Java/Scala's Akka, and some stuff like that. I'm still not really willing to put anything Python has into even the latter list, though; the dynamic languages are just so easy to blow your foot off.

(I still like "green threads" personally, because I write network servers where tens of thousands of simultaneous connections is a realistic concern and the price is a small one to pay. I tend to agree with the Rust community that people are generally grotesquely overestimating the run-time costs of threads nowadays. If you're in a language like Rust, and you're working in a program where you can be pretty confident you're never going to spawn 100,000 threads or face unbounded thread creation for some reason, go for it. Threads are not magically bad; they're bad for specific reasons, and if you address those specific reasons, they cease being ravening Elder Gods who eat your soul, and become just another professional-grade tool... yeah, it has some sharp points on it, but it does the job really well.)

Have you seen the tokio stuff in Rust? It's also an interesting take on the "spin up a zillion threads" problem that's not exactly green threads.

> Look: in traditional multi-threaded programs, we protect shared data using locks. If you avoid explicit locks and instead rely on complete knowledge of all yield points (i.e., all possible execution orders) to ensure that data races do not happen, then you've just created a ticking time-bomb: as soon as you add a new yield point, you invalidate your safety assumptions.

> Traditional lock-based preemptive multi-threaded code isn't susceptible to this problem: it already embeds maximally pessimistic assumptions about execution order, so adding a new preemption point cannot hurt anything.

You get an equal and opposite problem: whenever you add one more lock, you invalidate your liveness assumptions.

> The article claims that yield-based programming is easier because the fewer the total number of yield points, the less mental state a programmer needs to maintain. I don't think this argument is correct: in lock-based programming, we need to keep _zero_ preemption points in mind, because we assume every instruction is a yield point. Instead of thinking about N**Y program interleavings, we think about how many locks we hold. I bet we have fewer locks than you have yields.

I'll take that bet. You really don't have to yield very often - only when making a network request, and perhaps not even for that in the case of a fast local network. Whereas you have to lock every piece of state that you have.

> Whereas you have to lock every piece of state that you have.

You need to lock every piece of shared state you have. Where "shared" means stuff that many threads must communicate among themselves. One tends to keep the number of that kind of state low, really low. When zero is not possible, the most common number by a wide margin is one¹.

If you have more than 1, they are normally completely independent pieces of state that will not be used at the same time. If you have more than 1, and they are not independent, the code is either the result of at least one PhD thesis, or it does not work (or, often, both).

I bet you do network requests more than once in your code.

1 - The size of the shared state does not matter, so it's often one really big state.

Every piece of shared state. In the case of something like a web server, you only need one lock: when the connection is being accepted and handed off to a worker thread. How many reads and writes does your web server perform?

In his talk about threads (https://www.youtube.com/watch?v=Bv25Dwe84g0) Raymond Hettinger made a point: when you have a codebase with many locks the result of composition is often a sequential program.

> as soon as you add a new yield point, you invalidate your safety assumptions.

While true, locks aren't free from this problem. They have the inverse. If someone adds code that accesses a data structure that should be protected by a lock and they forget to add the lock, you also lose all of your safety assumptions.

I find it surprising that no one here comments on the actual topic of the blog post: Namely that the internal implementation of asyncio is opaque at best and this unfortunately propagates upwards to the public API. Personally, I have taken a look at its source code a few times as well (to understand what my code was doing because the docs were lacking details) and I remember that the callback hell paired with those additional user-space buffers the author mentions really made it a major PITA to reason about. Now, why should anyone worry about asyncio's internals? Heck, if everything was working, I wouldn't mind, either. However, as pointed out in the blog, there are quite a few edge cases where it isn't. Plus, the documentation traditionally doesn't do a particularly good job at explaining things. Or is it just the fact that the API is quite confusing sometimes that has caused me to take a look at the source code more often than I care to admit? (Compare https://news.ycombinator.com/item?id=12829759) Whatever it is, the fact is that asyncio's internals do matter unfortunately.

…which is why I was happy to hear that not all hope is lost and that someone created an alternative. Now, I haven't taken a look at curio yet, so maybe I'm a bit quick to judge, but I already found it very refreshing that spending not even a minute to read the documentation already left me with a good idea of how it works and how I can use it. Kudos to the author(s), I will definitely give it a try!

I feel like I'm too dumb to understand any of this. And I've been writing python for 12 years.

Just give me greenlets or whatever and let me run synchronous code concurrently.

  async def proxy(dest_host, dest_port, main_task, source_sock, addr):
    await main_task.cancel()
    dest_sock = await curio.open_connection(dest_host, dest_port)
    async with dest_sock:
      await copy_all(source_sock, dest_sock)

Are you kidding me? Simplified that is

  async def func():
    await f()
    dest_sock = await f()
    async with dest_sock:
      await f()

Every other token is async or await. No thank you.

Are you saying that using greenlets is any simpler than this? IMO that mechanism looks way more complex compared to this. And will probably be less efficient.

The point is this: threads are still expensive in bulk (the CPU has to shuffle a lot of data every time you switch). So all kernels have mechanisms to support parallel IO operations. An async library will use the best available kernel mechanism for IO; epoll on Linux, kqueue on BSDs, maybe IO Completion Ports on Windows (not sure). Turns out, doing that requires some help from the language itself or the code turns into a pyramidal mess. Async keyword addresses the readability aspect of code.


a) It's more complex than synchronous code

b) But it solves the performance problem without too much cognitive overhead (once you get used to it).
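For reference, this is roughly the layer an async library sits on. A sketch using Python's `selectors` module, whose `DefaultSelector` picks epoll on Linux and kqueue on the BSDs and falls back to `select` elsewhere (the socket pair and the `"reader"` label are just for illustration):

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # best readiness API available on this platform
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, data="reader")

writer.sendall(b"ping")

# Ask the kernel which registered sockets are ready instead of blocking on one.
events = sel.select(timeout=1.0)
received = [(key.data, key.fileobj.recv(1024)) for key, _mask in events]
print(received)

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

An event loop is essentially this `select` call in a `while` loop, plus bookkeeping that maps each ready socket back to the coroutine or callback waiting on it.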

> threads are still expensive in bulk

They don't have to be. First of all, even ordinary threads are more efficient than you might think. On a really awful low-end Android 4.1 device, I can pthread_create and pthread_join over 5,000 threads per second. On a real computer, my X1 Carbon Gen4, I can create and join over 110,000 threads per second. (And keep in mind that each create-join pair also forces two full context switches.)
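If anyone wants a number for their own machine, here is a rough sketch using Python's `threading` module rather than the `pthread_create`/`pthread_join` calls described above, so expect interpreter overhead on top and don't compare the result directly with the figures quoted. The `measure` helper is made up:

```python
import threading
import time

def measure(n=2000):
    ran = []
    start = time.perf_counter()
    for _ in range(n):
        # Each thread does a trivial amount of work, then is joined immediately.
        t = threading.Thread(target=ran.append, args=(1,))
        t.start()
        t.join()
    elapsed = time.perf_counter() - start
    return len(ran), n / elapsed

completed, per_second = measure()
print(f"{completed} threads, about {per_second:.0f} create/join pairs per second")
```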

For most applications, performance of regular threads is perfectly adequate. In these environments, the maintainability and debuggability advantages of using plain old boring threads make it really hard to justify using something exotic.

But suppose you do have big performance requirements: you can still use normal-looking threaded code. There's a difference between how we represent threads in source code and how we implement them. It's possible to provide green, userspace-switched threads without requiring "await" and "async" keywords everywhere. GNU Pth did it a long time ago, and there are lots of other fibers implementations.

> the CPU has to shuffle a lot of data every time you switch

Any green-threaded system (with or without explicit preemption points) also does context switches! Such a system maintains in user space a queue of things to work on: as the system switches from one of these work items to another, it's switching contexts! You have the same kind of register reloading and cache coldness problems that switching thread contexts has. There's no particular reason that you can do it much better than the kernel can do it, especially since switching threads in the same address space is pretty efficient.

The problem with all green thread implementations that I know of is that they're language and/or framework-specific. So the moment you start using them, you get the same set of problems as using setjmp/longjmp in C across the boundaries of foreign code - it either just blows up spectacularly, or at the very least violates invariants because the interleaving code is not aware that someone's pulling the rug from under it.

This can only be solved by standardizing a fiber API and (per platform) ABI, and by forcing all libraries in the ecosystem to be aware of fibers if their behavior differs with threads in any way (e.g. if TLS and FLS are distinct).

Callbacks (and hence promises), on the other hand, work with what we already have, and are trivially passed across component boundaries as a simple function pointer + context pointer, or some suitable equivalent expressible in C FFI. For example, I can take an asynchronous WinRT API (which returns a future-like COM object), and wrap it in a Python library that returns awaitable futures; with neither WinRT being aware of the specifics of Python async, nor with Python aware of how WinRT callbacks are implemented under the hood. On the other hand, if WinRT used Win32 fibers for asynchrony, Python would have to be aware of them as well.

I expect you can also use a callback that switches greenlets, or one that passes the values it got to Lua's coroutine.resume.

How can it switch green threads without breaking any foreign code currently on the stack? Consider what happens when said code holds an OS mutex, for example.

The only way I see this working is if your green threads roll their own stack on the heap, and switch that, without touching the OS stack. But then how is the result fundamentally different from promise chains? Their callbacks and captured state essentially form that very same green stack.

To start a fiber, you allocate some memory, set RSP to the end of that memory, set your other registers to some arbitrary initial state, and jump to your fiber routine. To switch fibers, you set RSP to some other block of memory, restore your registers, and set PC to whatever it was when you last switched away from that fiber. There's nothing magical, and it works with almost all existing code. If you hold a mutex and switch to a different fiber, the mutex stays held. How could it be otherwise?

I was thinking of a situation where thread-aware but not fiber-aware code uses a mutex to synchronize with itself, which breaks with fibers because they reuse the same thread, and the mutex is bound to that thread (so if another fiber tries to acquire that mutex, it's told that it already has it, and proceeds to stomp over shared data with impunity).

But upon further consideration, I realize that in this narrow scenario - where fibers are used in conjunction with callback-based APIs - this shouldn't apply, because you can't synchronize concurrent callback chains with plain mutexes, either.

Having said all that, are there any actual implementations that seamlessly marry fibers with callbacks? I don't recall seeing any real world code that pulled that off. Which seems to imply that there are other problems here.

Of note is that CLR tried to support fibers, and found it to be something that was actually fairly expensive. By extension, this also applies to any code running on top of that VM:

"If you call into managed code on a thread that was converted to a fiber, and then later switch fibers without involvement w/ the CLR, things will break badly. Our stack walks and exception propagation will rely on the wrong fiber’s stack, the GC will fail to find roots for stacks that aren’t live on threads, among many, many other things." (http://joeduffyblog.com/2006/11/09/fibers-and-the-clr/)

GC is a sticking point here, it seems - clearly it needs to be fiber-aware to properly handle roots in switched-out fibers.

That all sounds correct to me. I'm not familiar with greenlet internals, but lua's stacks live on the heap and the whole situation ends up being similar to promise chains in terms of where your state is at runtime.

Short replies coz on phone.

1. Async await is almost similar to normal looking threaded code. Just add await before a normal looking call.

2. A language could have chosen to make it "exactly the same" by auto inserting awaits, but then you don't get to say when you don't actually want to wait. Many times you don't.

3. I agree native threads are cheap. But you still have a) thread stacks and additional control structures, b) wouldn't you have to deal with things like processor affinity? I mean, either you/lib or the kernel. And the kernel already does it for you.

Excellent discussion, and your points are all very much spot-on. I just wanted to add this re: Windows and fibers/threads because it is very much relevant to the conversation:


I don't believe you. Show me your code. I think you just completely made your numbers up.


> On a real computer, my X1 Carbon Gen4, I can create and join over 110,000 threads per second.

is that C, or python's multithreading?

> And will probably be less efficient.

they're not. gevent (and threads) are way faster than explicit asyncio, as all of asyncio's keywords / yields each have their own overhead. Here's my benches (disclaimer: for the "yield from" version of asyncio). http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...

Is that still true? Some uvloop benchmarking has shown it to be equivalent to gevent when using streams: https://magic.io/blog/uvloop-blazing-fast-python-networking/ . Plus Python 3.6 has a bunch of optimizations for asyncio where all of these numbers are going to have to be re-evaluated.

not sure. I'm hoping the more native support for asyncio in 3.6 has improved matters. Certainly though, it's never going to be faster than gevent. Or threads for most tasks.

People keep inventing funky new ways of representing threads.

With or without async, we're writing threads. (Promise chains are _also_ threads, very awkwardly spelled.) Really, we're arguing over whether we want our preemption points to be explicit or implicit. I prefer implicit myself, because the implicit style leads to much clearer code.

I understand how the JavaScript people might be excited that they can finally have threads, even if ugly ones, but there's no reason to get the rest of the world to switch to explicit-preemption-point threads.

> Really, we're arguing over whether we want our preemption points to be explicit or implicit.

It's not even that!

It's not like you actually get to decide where to await in async/await code - you have to await on any call that is async, if you expect to get the result.

Now, if the underlying framework uses hot tasks - meaning the async operation starts executing as soon as it's invoked, and not when the returned task is awaited (as in e.g. .NET/C#) - you can choose to omit await to, effectively, fork your async "thread". So NOT doing await on something is just a fork operation. It's the reverse from regular sync code, where thread forks are explicit, and sequential flow on a single thread is implicit.

One other case where you wouldn't await is when you need to await on a combination of any or all tasks at the same time (i.e., wait until all tasks complete, or wait until one of the tasks completes). But the first one is equivalent to a thread join in sync code, and the second to a condition variable. So, again, you get a case where something more explicit in sync code is more implicit in async code, and vice versa.
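In Python terms, a sketch of those three cases. One caveat: Python coroutines are cold, so unlike the hot tasks in .NET the fork has to be spelled `asyncio.create_task`. The `fetch` helper and the delays are made up:

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)  # stand-in for an I/O call
    return name

async def main():
    # Fork: start the work now, collect the result later.
    background = asyncio.create_task(fetch("background", 0.01))

    # Wait for all: comparable to joining several threads.
    both = await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

    # Wait for any: resume as soon as the first task finishes.
    fast = asyncio.create_task(fetch("fast", 0.01))
    slow = asyncio.create_task(fetch("slow", 0.5))
    done, pending = await asyncio.wait(
        {fast, slow}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    first = done.pop().result()

    return await background, both, first

forked, joined, first = asyncio.run(main())
print(forked, joined, first)
```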

Now note that all this is solely about syntax! You can take the C# compiler, and change it so that every awaitable statement is automatically awaited, except when the newly introduced operator "taskof" is applied, in which case you get the raw future instead. Voila! Cooperative future-based multitasking with implicit preemption points. Yet it works exactly the same, and will even be able to call into and be called from any existing C# code compiled by the original compiler.

I suspect that this will be the next step after async/await, once enough people notice that the default (non-await) behavior is something that they need very rarely, and figure out that it's better to change the syntax so that the much more common thing (await) is implicit. Similar to how the use of =/== for assignment and comparison has won out over :=/= in imperative languages.

After watching (Curio creator) David Beazley's presentation from earlier this year on async/await[0], I feel I finally get it. Recommended watching.

[0] https://www.youtube.com/watch?v=E-1Y4kSsAFc

The number of times Beazley says "insane", "nightmare", etc. in this talk makes me wary.

Welcome to the wonderful world of writing anything in Javascript.

Imagine the same thing using Promises:

   def proxy(dest_host, dest_port, main_task, source_sock, addr):
       return main_task.cancel()\
           .then(lambda _: curio.open_connection(dest_host, dest_port))\
           .then(lambda dest_sock: copy_all(source_sock, dest_sock))

It's statements like this that have kept me from ever learning _anything_ in Javascript...

Well it is only useful when you really rely on asynchronous programming. Nobody states that every piece of code is supposed to be written like this. You should only use async/await when a thorough performance analysis shows that it is your bottleneck.

Think of handling a web request, where you have to do parallel I/O requests to subsystems like a database, a webservice, redis, and so on. I think async/await gives us a nice standard way of describing "hit me back once X is done".

I don't think most code will be this dense with await.

And Rust's developers think that 'unsafe' in third-party crates will be well-vetted and therefore actually "safe", most C developers don't think somebody will incorrectly free or screw with memory they've allocated and passed back to the caller, most C++ developers don't think anybody will (ab)use 'const_cast', and so on.

A lot of terrible bugs in code are caused by people making assumptions such as yours.

He didn't make an 'assumption' like those ones you described.

This is an artificial example of a function copying unmodified data from source to destination. There are async and await tokens in every line, because every line is doing an IO operation. In a real-world app this data would be processed somehow in between, using synchronous function calls, and therefore without async/await tokens.

>most C++ developers don't think anybody will (ab)use 'const_cast', and so on.

These constructs are opt-in. If you don't want them in your codebase you can find their location by a simple text based search and remove them. In C everything is "unsafe". You can't opt-out.

I don't think anyone is saying that you'll never see crates with bad usage of unsafe. What you will hear them say is that by having the ability to share code, since more people are looking at and using the same codebase, it's more likely issues will be found, and that when they're fixed, they help everyone using the package, rather than just those who found it.

I've been writing async/await code for the past 2.5 years, and no, it actually is typically this dense, if you count tokens (real identifiers are obviously longer, so it's not as bad character-wise, and awaits are not quite as prominent).

Interesting, thanks for sharing that insight. Do you feel that your work is representative, or is there some reason that the code you write would have a higher than usual density? It seems like a lot of code, which is just business logic, would not use these constructs other than on the io barriers.

Don't forget that async is "viral": if you call an async function and need to do something with its result, the calling function must in turn be async for await to work inside it. So the moment you start doing some async I/O at the bottom of some call stack, the entire stack becomes "infected" by async, and needs to have awaits inserted in every frame.
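A small Python sketch of that "infection", with invented function names: one awaited call at the bottom forces every caller above it to become `async def`, and a plain function that calls the chain only gets an un-run coroutine object back.

```python
import asyncio

async def read_setting(key):
    await asyncio.sleep(0)  # stand-in for real async I/O
    return {"theme": "dark"}[key]

async def load_theme():
    # Has to be async, because it awaits read_setting.
    return (await read_setting("theme")).upper()

async def render_header():
    # Has to be async in turn, because it awaits load_theme.
    return f"[{await load_theme()}]"

def render_header_sync_attempt():
    # A plain function cannot await; calling the coroutine function just
    # hands back a coroutine object instead of the rendered string.
    coro = render_header()
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()  # avoid a "never awaited" warning
    return is_coroutine
```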

And it so happens that I work on the kind of products where a lot of useful work revolves around I/O: IDEs.

It doesn't have to be viral. C#'s tasks have a blocking Wait[0] method which allows you to use an asynchronous Task without changing the signature of your synchronous function. The tradeoff is more verbosity.

[0] - https://msdn.microsoft.com/en-us/library/dd235635(v=vs.110)....

As noted in another comment, Wait is extremely prone to deadlocks - if you happen to Wait on a thread that's running the event loop, then no task that's scheduled on that loop can execute until Wait returns. So if you're waiting on such a task, or on a task that depends (no matter how indirectly) on such a task, you get a deadlock.

Now, if you're writing a library, you pretty much cannot assume anything about the event loop and what's scheduled on it. If your library invokes a callback at any point, all bets are off, because you don't know which tasks that callback may have scheduled, or which tasks it's waiting on. Similarly, if you provide a callback to a library, you also don't know which tasks you might block by waiting.

So, in effect, the only safe place to wait is on a background thread that was specifically spawned for that purpose, and that is guaranteed to have no event loop running on it.
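A rough Python analogue of that last point, offered as a sketch and not as the C# behaviour itself: asyncio's `run_coroutine_threadsafe` lets a thread that has no event loop block on a result while the loop keeps running elsewhere. `fetch_value` and `blocking_wait` are made-up names.

```python
import asyncio

async def fetch_value():
    # Hypothetical task that must run on the event loop.
    await asyncio.sleep(0.01)
    return 42

def blocking_wait(loop):
    # Runs on a worker thread that has no event loop of its own,
    # so blocking here cannot starve the loop.
    future = asyncio.run_coroutine_threadsafe(fetch_value(), loop)
    return future.result(timeout=5)

async def main():
    loop = asyncio.get_running_loop()
    # The loop stays free while the worker thread blocks on the result.
    return await loop.run_in_executor(None, blocking_wait, loop)

result = asyncio.run(main())
```

Calling `future.result()` directly inside `main` instead would block the only thread able to run `fetch_value`, which is the deadlock described above.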

That's not the only tradeoff. It completely negates the benefit of asynchrony and can be a source of deadlocks.

I still use gevent any time I need async code. It's also easy to tack onto existing projects with its monkey patching. I've never seen a need to migrate away from gevent, even if it's inarguably a language hack.

It's reasonable compared to the old way of having three layers of callbacks in Node.js.

The main use case for all this async stuff is handling a huge number of simultaneous stateful network connections. At least, that was what Twisted was used for. Are there other use cases for this sort of thing that justify all the complexity that comes with it?

It lets you easily write responsive UI apps without worrying about things like threads - you treat your app as a single conceptual thread, and use async IO operations on it by awaiting them. Since in practice every operation callback is a new item posted onto the event loop, this doesn't block said loop at any point, and UI remains responsive. So the developer can think in simple terms like "if this button is clicked, [await] download this file, then update this label and [await] send this email", instead of background worker threads with condition variables etc.

In particular, WinRT heavily promotes this approach for UWP apps.

My problem with this justification is that these problems have been solved for a long time with simple message passing. In Win32, you just post a message to a window handle from the background thread to notify the UI of status updates, etc. Yes, you do need to worry about shared state/locks if that "message" includes more than a simple integer. But, these are also solved problems and rarely require more exotic lock-less queues, stacks, etc. for the majority of applications that use these types of architectures for UI background processing because the performance implications are inconsequential. Using a shared stack that uses a simple critical section will work fine for managing the messages, especially since Windows now has critical sections that can use spin locks to help minimize context switches.
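The message-passing shape described here can be sketched in Python with a thread-safe queue standing in for the window's message queue. This is only an illustration of the pattern, not Win32 code; the message names and the `label` list are invented.

```python
import queue
import threading

def background_download(messages):
    # Worker thread: never touches UI state, only posts messages.
    for percent in (25, 50, 75, 100):
        messages.put(("progress", percent))
    messages.put(("finished", None))

def run_ui_loop():
    messages = queue.Queue()  # thread-safe; the locking lives inside the queue
    label = []                # stand-in for UI state owned by the UI thread
    worker = threading.Thread(target=background_download, args=(messages,))
    worker.start()
    while True:
        kind, payload = messages.get()  # like pulling from the message pump
        if kind == "finished":
            break
        label.append(f"{payload}%")
    worker.join()
    return label
```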

It's a solved problem in a sense that yes, you can do it that way. But it's also more conceptually complicated, and much easier to get it wrong, which is evident by the fact that so many desktop apps on Windows still lock up occasionally. Not so with UWP apps.

There's a very popular argument that if you don't use an explicit async-on-evented-IO approach for concurrency in general, and instead use threads or even an evented-IO approach that conceals the IO wait just like a thread scheduler, then your program is impossible to reason about and will have bugs forever. Glyph's "Unyielding" at https://glyph.twistedmatrix.com/2014/02/unyielding.html is one well-known blog post that describes this concept. For a lot of developers, the original reasons for event-based IO (e.g. many stateful network connections) are lost. The popularity of Javascript, which in recent years has exploded as a server-side platform as well, is a key driver in this trend.

Part of the problem is that object-oriented programming is now out of fashion. If objects only allow one active thread inside the object at a time, you have a conceptual model of how to deal with concurrency. Rust takes this route, and Java has "synchronized". It's done formally, with object invariants, in Spec#. Objects in C++ are often used this way in multi-thread programs.

If you don't have some organized way of managing concurrency, you're going to have problems. Without OOP, what? "Critical sections" lock relative to the code, not the data. "Which lock covers what data?" is a big issue, and the cause of many race conditions.
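One way to read "lock the data, not the code" in Python is a monitor-style object: the lock is a private field, and every method that touches the guarded state takes it. The `Counter` class and `hammer` helper below are illustrative names only.

```python
import threading

class Counter:
    """All access to _value goes through methods that hold _lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value

def hammer(counter, threads=8, per_thread=1000):
    # Many threads call into the object; none of them ever sees the lock.
    def work():
        for _ in range(per_thread):
            counter.increment()
    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value()
```

The question "which lock covers what data?" is answered by the class definition itself: `_lock` covers `_value` and nothing else.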

(The dislike of OOP seems to stem from the problems of getting objects into and out of databases in web services. One anti-OOP article suggests stored procedures as an alternative. Many database-oriented programs effectively use the database as their concurrency management tool. Nothing wrong with that, but it doesn't help if your problem isn't database driven.)

Python has the threading model of C - no language constructs for threads. It's all done in libraries. There's no protection against race conditions in user code. The underlying memory model is protected, by making operations that could break the memory model atomic, but that's all. CPython also has some major thread performance problems due to the Global Interpreter Lock. Having more CPUs doesn't speed things up; it makes programs slower, due to lock contention inefficiencies. So the use of real threads is discouraged in Python.

There's a suggested workaround with the "multiprocessing" module. This creates ultra-heavyweight threads, with a process for each thread, and talks to them with inefficient message passing. It's used mostly to run other programs from Python programs, and doesn't scale well.

So Python needed something to be competitive. There are armies of Javascript programmers with no experience in locking, but familiarity with a callback model. This seems to be the source of the push to put it in Python. Like many language retrofits, it's painful.

Does this imply that the major libraries will all have to be overhauled to make them async-compatible?

> Does this imply that the major libraries will all have to be overhauled to make them async-compatible?

Well, "have to" implies that the community accepts this system as the One True Way to program, which is why I like to point out that this is unwarranted. But yes: the explicit async model is what I like to call "viral", in that anything that calls async IO must itself be async, and so must the caller of that method, turtles all the way out. That means an enormous amount of code has to be thrown out and rewritten in the explicit async style, which also adds significant function call / generator overhead to everything. It's basically a disaster.

It's very interesting that you refer to database driven programming as the reason OOP is out of fashion, since IMO one of the biggest misconceptions about async programming is that it is at all appropriate for communication with a locally available relational database. I wrote in depth on this topic here: http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an... with the goal of the post being, this is the one time I'm going to have to talk about async :)

Are there any languages that have really nailed this? I've used gevent, eventlet, (both python), promises, callbacks (node) and none of them come close to being as productive as synchronous code.

I'd like to try out Akka and Elixir in the future.

Erlang (and by extension Elixir and LFE) has "nailed" it by making the actor pattern first class. Go's channels are great, but Go itself is quite low level. Also you should check out Clojure's core.async to see what improved channel constructs on top of a high level, lock-free, multithreaded language core look like.

Part of the problem with the Python ecosystem is the insular mindset of its proponents. Python fanboys have no interest in going and seeing what's on the other side. So the platform has become a bit of an echo chamber with Pythonistas declaring their clunky approaches the industry best.

You can see this by looking at how little love a CSP solution for Python gets [https://github.com/futurecore/python-csp] versus the enormous buy-in its more popular frameworks receive.

core.async is using locks under the hood - it's just hiding that from you as an implementation detail.

How is it possible then that core.async works on javascript platform, a platform that has no mutexes?

Maybe there is a lock to implement the thread macro (clojure only), but then that uses native threads. How would you propose to handle access to channels between native threads without locks?

As far as I know there is no locking performed in asynchronous code implemented using the go macro. The go macro is a macro that turns your code inside out into a state machine, is it not? Each <! and >! point becomes an entry/exit into that state machine. There are no locks here because the go macro can essentially "rewrite" your source code for you and there is only a single thread of execution through the interconnected state machines.
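As an analogy only (this is not how core.async is implemented), Python generators give the same flavour: each `yield` is a suspension point in a compiler-generated state machine, and a single-threaded scheduler can interleave them over a shared buffer with no locks. The buffered deque "channel" and the `None` end-of-stream sentinel are simplifying assumptions.

```python
from collections import deque

def producer(channel, values):
    for value in values:
        channel.append(value)
        yield              # roughly a >! point: may be suspended here
    channel.append(None)   # sentinel marking the end of the stream

def consumer(channel, out):
    while True:
        while not channel:
            yield          # roughly a <! point: park until data arrives
        item = channel.popleft()
        if item is None:
            return
        out.append(item * 2)

def run(tasks):
    # Round-robin scheduler: one thread, so no locking around the channel.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue       # task finished; drop it
        ready.append(task)

def demo():
    channel, out = deque(), []
    run([consumer(channel, out), producer(channel, [1, 2, 3])])
    return out
```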

There are obviously platform-specific implementations. In Clojure, core.async utilizes locks, and the original comment wasn't referring to ClojureScript.

I like to tell people that the killer app for Haskell is writing IO bound, asynchronous code. The secret weapon is do-notation, which lets you write code as if it were sequential, but have it desugar into what is (essentially) a series of chained callbacks.

I like to point at Facebook's use of Haskell as a good example of being successful in this space http://community.haskell.org/~simonmar/papers/haxl-icfp14.pd... It would be disingenuous to suggest that Haskell is good in all situations, but if there was one place where it should be used, this is it.

Haskell is good for lots of things, but I don't see it being particularly powerful in this application. The IO monad and do-notation let you write sequential code. So does Python.


There are two responses here. The first is that in any language which has any sort of threading model at all, sequential code in multiple threads obviates the need for callbacks. If you block, fine; it's just a single thread, it's not obstructing the handling of other threads. Why don't Python and JavaScript have multithreading? Well, because programming with unrestricted concurrency and mutable state is really difficult. But there are ways to solve this problem, Haskell's purity being one among several.

The second is that do-notation is distinct from the IO monad. Even if Haskell didn't have green threads in the runtime, I could still write an async/callback library that looked just as natural as sequential code. Why? It has nothing to do with the IO monad: it has to do with the fact that "do x <- e; m" desugars to, in JavaScript notation, bind(e, function(x) { m }); it's been "callbackified automatically".
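The desugaring can be imitated in Python with a continuation-style `bind`, which shows what "callbackified automatically" means when you have to write it by hand. Everything here (`unit`, `bind`, `fetch`, the sample data) is invented for illustration, and the callbacks fire synchronously to keep the sketch short.

```python
def unit(value):
    """An action that immediately passes `value` to its callback."""
    return lambda callback: callback(value)

def bind(action, make_next):
    """Run `action`, feed its result to `make_next`, then run the action it returns."""
    return lambda callback: action(lambda x: make_next(x)(callback))

def fetch(key):
    """Hypothetical callback-style lookup; a real one would call back later."""
    data = {"user": "alice", "alice": 3}
    return lambda callback: callback(data[key])

# Hand-written equivalent of:
#   do name <- fetch "user"; count <- fetch name; return (name, count)
program = bind(fetch("user"),
               lambda name: bind(fetch(name),
                                 lambda count: unit((name, count))))

results = []
program(results.append)
```

The nesting of lambdas is exactly what do-notation hides; the Haskell version reads top to bottom while producing the same chain.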

> could still write an async/callback library that looked just as natural as sequential code

You can do that in any language with AST transforms powerful enough to transform code into CPS. That functions are "automatically callbackified" is an implementation detail and not one particularly germane to high-level code.

well that's kind of the point isn't it? That Python doesn't offer this, and that Haskell is one of the few production-ready tools that offer this in a clean way?

Who cares how it's implemented? Python lets you write straight-line code that does more than one IO-bound thing at a time. So does Haskell. That one is using a CPS transform under the hood and the other stack-switching via OS threads is irrelevant.

it's not exactly the same. Python lets you write straight-line code, but you still have to be explicit about the sync/async nature of each call. You can abstract this away in Haskell, thanks to some interesting tooling around `do` notation. Some people prefer the explicit nature though.

Haskell does not even allow one to write sequential code.

The IO monad enforces sequence on IO operations, and when you fork it, you get a new, independent sequence of IO operations to play with, not a new thread.

Haskell is really great for concurrent programming. Not only because of green threads (the mainstream concept that is nearest to the IO monad), but because of the "everything is immutable" rule, and very powerful primitives available.

> The IO monad and do-notation let you write sequential code

It's actually much subtler than that due to lazy evaluation. The 'palindrome' section of http://learnyouahaskell.com/input-and-output illustrates this.

You basically write code that says: read all input, split lines, do something per line, join result into multiline string, print.

If you did that in Python it would block until EOF on the input and then print the result. In Haskell, due to lazy evaluation you will interactively get the response on a per-line basis.
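That holds for the eager "read everything, then process" style. For comparison, Python generators give a similar per-line interleaving when the pipeline is written lazily. In this sketch a list stands in for standard input, and `traced_input` exists only to record the order of events.

```python
def is_palindrome(line):
    return line == line[::-1]

def respond(lines):
    # A generator: each answer is produced as soon as its input line is pulled.
    for line in lines:
        yield "palindrome" if is_palindrome(line) else "not a palindrome"

def traced_input(lines, log):
    # Stand-in for iterating over sys.stdin; records when each line is read.
    for line in lines:
        log.append(f"read {line}")
        yield line

def demo():
    log = []
    for answer in respond(traced_input(["abba", "hello"], log)):
        log.append(f"print {answer}")
    return log
```

The log alternates reads and prints, so the first answer appears before the second line has been consumed.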

What Haskell really gives you is a declarative language to describe what you want to happen to some data and then the lazy evaluation sorts everything out. Underneath, it's effectively a load of callbacks processing data streams.

Yeah but that's lazy IO, and is highly frowned on, even considered a historical mistake.

I would argue that the Haskell language itself, through lazy evaluation, basically has built-in async/await support. Due to lazy evaluation, everything is an async/await - every time an expression is evaluated. In Python, you pass values around. In Haskell, you pass around descriptions of how to fetch a particular value, and then the runtime system makes sure it happens when/if it needs to.

It's a bit like Excel. Every cell is a variable that contains an expression, which defines what this cell evaluates to. With that description in hand, it's a simple matter of not evaluating cells that are not in view, and marking an exception in the evaluation with #######. If it were Python, each cell could contain code that modifies other cells, and it would be impossible to make sense of anything.

I've written quite a lot of concurrent code through the last years (network servers, protocol, ...) and overall I now like Go most.

The biggest reason for this is not necessarily that I think it has absolutely the best concurrency model, but that it's the most consistent one. Nearly all libraries are written for the model, which means they assume multithreaded access, blocking IO (reads/writes) and no callbacks. As a result most libraries are interoperable without problems.

Erlang/Elixir should have similar properties - however I haven't used it.

Javascript has a similar property because at least everything assumes the singlethreaded environment and concurrency through callbacks (or abstraction of them like promises and async/await on promises). I also like the interoperability and predictability here. But sometimes nested callbacks (even with promises) lead to quite a bit of ugly code. And calling "async methods" is not possible from "sync methods" without converting them to async first (which could mean some big refactoring). So I prefer the Go style in general.

The worst thing from my point of view are all the languages that do not have a standard concurrency model, e.g. C++, Java, C#, and according to this article also Python. Most of them have several libraries for (async) IO which can be beautiful by themselves but won't integrate into remaining parts of the application without lots of glue code. E.g. boost asio is nice, but you need a thread with an EventLoop. If your main thread is already built around Qt/gtk you now need another thread and then have 2 eventloops which need to interact. Same question for Java frameworks, e.g. integrating a Netty EventLoop in another environment (Android, ...). In these languages we then often get libraries which are not generic for the whole language but specific to a parent IO library (works with asio, works with asyncio, ...) and thereby some fragmented ecosystems.

A standard question that also always arises in these "mixed-threaded" languages when you have an API which takes a callback is: From which thread will this callback be invoked? And if I cancel the operation from a thread, will it guarantee that the callback is not invoked? If you don't think about these you are often already in bug/race-condition land.
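Python's own `concurrent.futures` is a handy illustration of why the question matters. In this sketch the same `add_done_callback` call fires on different threads depending on timing: a callback registered after completion runs immediately on the caller's thread (documented behaviour), while one registered earlier runs on the worker that completes the future (how CPython currently behaves, as far as I know). The `demo` function and the `release` event are scaffolding invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def demo():
    main_id = threading.get_ident()
    release = threading.Event()
    seen = {}

    def job():
        release.wait()  # hold the result back until the callback is registered
        return threading.get_ident()

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(job)
        # Registered while the job is still running.
        future.add_done_callback(
            lambda f: seen.setdefault("early", threading.get_ident()))
        release.set()
        worker_id = future.result()
    # Leaving the with-block joins the worker, so the early callback has run.

    # Registered after completion: invoked immediately on the calling thread.
    future.add_done_callback(
        lambda f: seen.setdefault("late", threading.get_ident()))
    return main_id, worker_id, seen
```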

C++? Java? Python? The traditional thread model isn't bad merely because it's traditional. I much prefer it to promise hell and to async-everything. About the only thing that beats it is CSP, which you can also represent sequentially without funky new keywords and which you can implement as a library for C++, Java, or Python.

I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.

The whole point of Golang is that every library and every project that uses Go will support coroutines and channels. Sure you can write a toy project in a language like C that has these concepts, but your toy library will effectively be unusable with all of the other libraries that have ever been written for C. Any library that calls a blocking function will break your coroutine abstraction.

It's like saying that indoor plumbing is no big deal-- it's just liquid moving through a pipe. Well yes. Yes, it is. But if you don't have plumbing in your neighborhood, or a sewage treatment plant in your city, you can't fake it by fooling around in your garage. And frankly, it's not going to smell like a rose.

I wrote such a library in C[1] and in practice it's been no problem. Most libraries that do IO provide hooks (for example I made SQLite fully async[2], with no changes to their code). For cases where that isn't possible (or desirable), there's also an easy way to temporarily move the entire fiber to a thread pool.[3] That's actually much faster than moving back and forth for every call (which is what AIO emulation normally entails).

[1] https://github.com/btrask/libasync [2] https://github.com/btrask/stronglink/blob/master/res/async_s... [3] https://github.com/btrask/libasync/blob/master/src/async.h#L...

Disclaimer: not production ready, for most values of "production"

Edit: stacks don't grow dynamically, of course. But that's also a problem in Go if you want to efficiently call C libraries. If you really need efficiency, you can use raw callbacks for that particular section.

> The whole point of Golang is that every library and every project that uses Go will support coroutines and channels.

Of course, this also means that Go is making it hard for its libraries to be used by other languages. So it's probably a bad candidate to write something like a cross-plat UI toolkit, if you hope for its wide use.

In contrast, threads and callbacks are both well-supported in existing languages; so if you write a library in C using either, pretty much any language will be able to consume it.

That's a fair point. Go was not designed to be used to write libraries-- so much so that the language didn't even have support for dynamically loaded libraries for a very long time. (I'm not sure if they ever implemented their DLL proposal that was out there for a long time... I'm too lazy to check now.) The idea was you would write AWS-style microservices rather than using libraries.

In general, "turducken" designs are awkward and difficult to debug. Ask someone what a joy debugging or writing JNI or CPython code is some time. People often prefer "pure" libraries even when the performance is a little worse. C is the king of libraries awkwardly jammed into existing programming languages, but it's a dubious crown to have. Rust is trying to break into this space, but I'm not sure whether it's really a space worth being in.

People get by just fine with wells and septic tanks, literally and metaphorically.

I'm not sure if you're being sarcastic or not? Wells and septic tanks don't really scale if you have a lot of people living in an area. If too many people dig wells in the same neighborhood, the water table goes dry. Not to mention the need for everyone to buy and operate their own individual water softeners, water filters, and pumps. Septic tanks can leach into the ground (and probably into the aforementioned wells) or clog unless they're properly maintained at all times, by every neighbor. And someone needs to come by every single house and manually empty out the septic tank periodically.

So yes... I guess it is a pretty good analogy for C++. Hundreds of stinky septic tanks, right near hundreds of wells, along with a guy arguing indoor plumbing is overrated. And a buggy, informally specified reimplementation of half of Common Lisp ;)

(I'm not terribly familiar with Python's threading, so I'm not going to talk about it)

> I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.

There are two big reasons for it.

Firstly, goroutines are extremely lightweight. "Traditional" threading in C, C++, and Java means native OS threads, which are comparatively expensive. Sure, fiber/coroutine libraries exist for these languages, but they are far from common (and, the only fiber library for Java that I know of, Quasar, came after Go).

Secondly, Go's ecosystem encourages CSP-style message-passing, rather than "traditional" memory-sharing. This is channels, not goroutines, but they make working with goroutines very nice. This is less concrete than the first reason; you certainly can implement message-passing in any of the other languages' threading styles. But empirically, it doesn't happen as often. A factor in this is also that, unfortunately, many CS curricula don't discuss CSP, which means that Go's use of this is the first exposure many programmers have to it.

> But empirically, it doesn't happen as often.

It's sad that people use choice-of-language as a proxy for choice-of-execution-strategy (interpreted? JITed?), choice-of-allocation-strategy, choice-of-linking-strategy, choice-of-packaging, and so on. All of these factors should be orthogonal. By linking them, we create a lot of inefficiency by fragmenting our efforts.

AFAICT, C++ is the only language that's really been successful at being multi-paradigm.

C++ still drives you very strongly in certain directions for the things you mentioned.

Languages have to hand you very strong default choices for those things, because only the people with the hardest problems and the most time to solve them can afford to pick up a toolbox-box and build their own toolbox to solve a problem. Even the languages that arguably want to be that low of a level like Rust or D still have to offer a much more batteries-included standard library that will make more of those choices for you, and which will be for the vast majority of users the "real" version of that language.

I use Scala without Akka. Just straightforward Futures and for/yield. It's great: the distinction between "=" and "<-" is minimal overhead when writing, but enough to be visible when reading code. You have to learn the tools for gathering effects (e.g. "traverse"), but you only have to learn them once (and they're generic for any effect, rather than being specific to Future, you can use the exact same functions to do error handling, audit logs and the like).

After using Akka-HTTP, I never want to write an HTTP service with anything else.

akka-http is nice. akka-actor (i.e. the project that was originally called "akka") is awful. The name overlap is unfortunate.

In your opinion, what's wrong with akka-actor?

It sacrifices type safety without offering enough value to be worth that - especially given that the model also eliminates useful stack traces. It forces non-distributed applications to pay the price of an API designed for distributed ones. Its FSM model doesn't offer the conciseness it should.

How do you define "Productive"?

Aside from that, personally I've used both Akka and plain Scala with Futures, as well as node with Promises, bare callbacks and async (though I've not tried fibers). I find Promises and Futures are the perfect balance between simplicity of use and the benefits of using the Async model. There's no need to reason about threads, as they abstract away the actual async implementation, and the interface they expose is very easy to reason about.

I'm surprised there aren't more mentions of Tasks in C# or F# on the .NET platform as examples of asynchronicity done well.

From the perspective of uniformity and availability, while C# provided asynchronicity via callbacks before the introduction of Task-based async/await in the 4.5 release of the .NET Framework, all the core libraries that used callback-style async (as well as some that had been strictly synchronous-only) were updated with Task-based overloads, so there are no problems with Task-based async being inconsistently available. Additionally, adoption of Task-based async in third-party libraries has been high, so it's relatively uncommon to encounter code that does not support it.

From the perspective of code productivity, it's hard to get much better than simply adding the async and await keywords where necessary. As a very simple example, consider a typical server application that receives requests via HTTP, processes them via an HTTP call to another service as well as a database call, and then returns an HTTP response. The sync code (blocking with a thread-per-request model) might look something like this:

    void handleRequest(HttpRequest request) {
        var serviceResult = makeServiceCallForRequest(request);
        var databaseResult = makeDatabaseCallForRequest(request);
        sendResponse(constructResponse(request, serviceResult, databaseResult));
    }

In order to make that same process async (non-blocking with a dynamically-sized thread pool handling all requests), the code would look like this:

    async Task handleRequestAsync(HttpRequest request) {
        var serviceResult = await makeServiceCallForRequestAsync(request);
        var databaseResult = await makeDatabaseCallForRequestAsync(request);
        await sendResponseAsync(constructResponse(request, serviceResult, databaseResult));
    }

It could even be taken one step further to make the service request and database call concurrently if there were no dependencies between the two, which would reduce processing latency for individual requests:

    async Task handleRequestAsync(HttpRequest request) {
        var serviceResultTask = makeServiceCallForRequestAsync(request);
        var databaseResultTask = makeDatabaseCallForRequestAsync(request);
        await sendResponseAsync(constructResponse(request, await serviceResultTask, await databaseResultTask));
    }

I've added asynchronicity into a C# server application as above with substantial improvements in both individual request latency and overall scalability. I'm now working on a Java 8 system and bemoaning the comparatively primitive and inconsistent async capabilities in Java 8.

Writing concurrent code in go takes a lot less thinking than js. Or... a different kind of thinking? But holistically I greatly prefer it for complex asynchronous code.

Lack of generics on channels really hurts the library ecosystem though. Many things you need to write yourself.

Concurrent ML according to Andy Wingo. He recently wrote a good series on concurrency in programming languages: https://wingolog.org/tags/concurrency

Seriously, check out Crystal. Go's goroutines seem to do quite well, and Crystal is pretty close to Go in terms of concurrency, but is a higher-level language overall.

goroutines with channels are well loved for concurrency in Go

For Common Lisp, see lparallel and lfarm.


Try await/async in F#.

I think it's because it is really quite new; Perl 6 has only been considered production-worthy since 2016.

Also, as of now, most people who have used it complain that it is slow.

Give it two more years before you worry, and for now continue with Python or whatever you like to use.

No one is in a rush to make Perl 6 popular. It is not a commercial project, so don't bet your career on Perl 6 ... yet.
