This is a great blog post. Concise, lacking fluff or extraneous prose, it gets right to the point, presents the primary-source reference and then gets right to the solution. A bit of editorializing in the middle but that's completely allowed when writing this tightly. Well damn done, OP.
And also it's great information that I - like I'm sure many of you - also never noticed. THANK YOU!
Well, I don't know, I kinda miss the human angle. I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets :^)
> I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets.
With recipes, your problem is usually that you want to learn how to make something, so having the steps listed out is the most important thing. The story behind the recipe isn't important to solving your problem, but in tech the story around the choice matters. Often the "why" is really important, and I really like hearing about what led someone to use something in the first place. Often that's as important as, or more important than, the implementation details.
It wouldn't make sense for this post given its title, but if someone were writing a post about why they chose to use async in Python, I'd expect and hope that half of it goes into the gory details of the alternatives they tried and their shortcomings for their specific use cases. That would help me as the reader generalize the post to my own use cases and see if it applies.
For some reason whenever this comes up there'll be one person saying "I bet you didn't know it's for copyright" and another saying "I bet you didn't know it's for SEO". I've yet to see either prove anything beyond that it's a plausible explanation that could fit the minimal known facts.
I can't find the reference atm, but Jeff Jarvis (https://en.wikipedia.org/wiki/Jeff_Jarvis) says that this _is_ to get around copyright law; the same technique was being used over a hundred years ago (possibly more, which is why I want the reference!). Instead of blogs, think pamphlets.
I feel alone sometimes as seemingly the one person adding "I bet you didn't know it is because blogs sometimes have regular readers, and regular readers drive ad revenue more reliably than SEO or concerns about copyright". I think that fits the facts far better. It's really interesting how many HN commenters underestimate the readership of cooking blogs: they only ever interact with recipes as utilitarian data at the end of a specific web search, and aren't themselves the sort of people who make up the regular audience of recipe blogs, so they discount that such audiences exist. (I had my cooking-show phase, and I have friends and loved ones hooked on the stories of some of the bigger-name recipe blogs; I love letting them tell me about what they're reading. It's amazing the blinders the average HN commenter sometimes wears without realizing it.)
I immediately bounce from those Stack Overflow clones that keep appearing at the top of search results. So I'm wondering how heavily this is still weighted in the rankings.
When is the last time you heard of online recipe blogs enforcing copyright claims on other blogspam? Ridiculous.
The real reason is simple: people who write recipes aren't robots; they're expressing their stories and emotions while explaining how to make food that's dear to them.
>people who write recipes aren’t robots - they’re expressing their stories and emotions, while explaining how to make food that’s dear to them..
the people who write recipes aren't robots; they're narcissists and variously insecure people who seek validation in the form of attention and adulation from others. That's not a bad thing - it's all too human, and we should embrace, not stigmatize, the needy - but if all you want is a recipe rather than to be an acolyte, it can seem like a big ask.
You enjoyed time with your grandparents, and you remember it? Welcome to the club! I remember family as much more complicated than simply being all fun, though, and I feel like you might be Norman Rockwelling a bit.
yes, but you've got the cart before the horse. The structure of newspaper and magazine recipe articles and "prose remembrance" cookbooks is the same. Tweaking that model for online use is very plausible, but it's also simply the way these "stories" get told anyway.
> they're narcissists and various forms of insecure and seek validation in the form of attention and adulation from others
This is incredibly insensitive and judgemental.
Not sure what I expected from HN, I guess...
Why are these "narcissistic" people obliged to provide you with formulaic recipes for free? If the cost is reading through their feel-good story, I feel it's a fair trade-off.
It's not about recipes and tech blogs. It's about how American journalists have been taught to write, ever since the days of Capote and Hemingway. Everything else flowed from there.
As a European, this is painfully evident every time I read something from a US journalist: I have to fast-forward through several paragraphs of useless "human angle" before I can get to the actual meat of the article.
Unfortunately the rot is spreading further and further every year.
For recipes, it also signals effort and provides a hint about quality. There are a lot of low-effort, broken, "would anyone enjoy eating this?" recipes dumped on recipe sites. A few pages of text says that the author thought the recipe was at least worth that amount of effort, and usually confirms that the author thinks the recipe is at least as good as X other recipes, etc.
It draws attention to a problem that a lot of people have created for themselves by not reading the documentation (or not recalling it if they read it). I guess the author could have just linked the documentation but then they couldn't have added the additional context of the github search demonstrating how common it is.
I must have looked through the docs for create_task a dozen times while trying to figure out how async/await works in Python but still managed to overlook this part.
The author doesn't go into much detail on that point: this warning should be present in the documentation of any Python library that calls create_task and returns the result to the user, unless that library stores the tasks in a collection as recommended -- at which point the library author had better roll their own garbage collection!
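For reference, the pattern the asyncio docs recommend - keeping the tasks in a collection until they finish - can be sketched roughly like this (a minimal illustration, not code from the post):

```python
import asyncio

# The event loop keeps only weak references to tasks, so we hold
# strong references ourselves until each task finishes.
background_tasks = set()
results = []

async def worker(n):
    await asyncio.sleep(0)
    results.append(n * 2)

async def main():
    for i in range(3):
        task = asyncio.create_task(worker(i))
        background_tasks.add(task)
        # Drop our reference only once the task is done.
        task.add_done_callback(background_tasks.discard)
    # Wait for any still-pending tasks before exiting.
    while background_tasks:
        await asyncio.wait(set(background_tasks))

asyncio.run(main())
```

The `add_done_callback(background_tasks.discard)` line is what keeps the set from growing forever - that's the "garbage collection" the library author would otherwise have to roll themselves.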
As someone who happens to be eternally grateful to the author for his contribution to the Python ecosystem [0], I kinda feel like this comment thread is overreacting to his overreaction. When I look at this post all I see is a useful, well explained, bite-size writeup that a search engine might recommend to someone looking for help in writing async Python.
Maybe it's because a bunch of my friends are Scottish and I get their sense of humour.
To me, the surprise here is that usually you don’t expect Python finalizers to do something like this: when they dispose something, it’s usually unobservable from the perspective of the program, e.g. an unreachable file descriptor. Here, the runtime is disposing something that is still observably in progress, which is surprising behavior.
I do wish he'd dwelt on task groups a bit more at the end. Many comments here seem to have missed that bit. They're not just a handy way of executing a hack; they're a revolutionary way (OK, maybe that's a bit strong, but not by much) to structure your async program to avoid a whole host of bugs.
This is one of many reasons I'm sceptical of the current trend in Python to "async all the things". The nuance to how it operates is often opaque to the developer, particularly those less experienced.
GUI toolkits (like Textual) however are a really good use case for Asyncio. Human interaction with a program is inherently asynchronous, using async/await so that you can more cleanly specify your control flow is so much better than complicated callbacks. Using async/await in front end JS code for example is a delight.
Where I'm particularly unconvinced of their use is in server-side view and API endpoint processing. The majority of the time you have maybe a couple of I/O ops that depend on each other. There is often little that can be parallelised (within a request), and so there are few performance gains to be made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.
There are always places where it's useful though, such as long-running requests (websockets, long polling), or those very rare occurrences where you do have many easily parallelizable I/O ops within one short request.
The thing is, there's a lot more nuance to it than this. Async/await is part of the language syntax in Python, but asyncio is only one particular implementation of an event loop framework to power it. What async/await really provides is a general-purpose cooperative multitasking syntax. This allows other libraries to implement their own event loop frameworks, each with their own semantics and considerations (the two best-known alternatives being Curio and Trio). At a language level, there's nothing even forcing you to use async/await for async I/O -- you could, if you really wanted, probably write a library that used it to start threads and await their completion.
So you have, from highest-level to lowest-level: application code, async/await language syntax, the event loop framework, and then the implementation of the event loop itself. The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.
But that means that even if you do "async all the things", you'll only encounter this situation if you write your application code in a particular way. It just so happens that "in a particular way" is, in this case, the overwhelming majority of how people write it, which is, of course, why the OP article is relevant.
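As an aside, the point that async/await is general-purpose syntax rather than socket-I/O machinery can be seen in asyncio.to_thread (3.9+), which lets you await work running on an ordinary thread (a small sketch, not from the comment above):

```python
import asyncio

def blocking_work(n):
    # Plain synchronous code, executed on a worker thread.
    return n * 2

async def main():
    # Awaiting a thread's completion: async/await is just
    # cooperative-multitasking syntax, not tied to non-blocking I/O.
    return await asyncio.to_thread(blocking_work, 21)

print(asyncio.run(main()))
```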
> This allows other libraries to implement their own event loop frameworks
At work someone replaced the default library with another faster implementation.
Then the unix socket listener task was not working.
A few hours of git bisect later, I found out the offending commit was the 1 line switching the event loop. Seems the fast implementation didn't implement unix sockets and just had "pass" in the function.
> The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.
Are other async implementations using the asyncio.Task abstraction? I haven't looked into it, but I assumed that asyncio.Task was tied to the asyncio implementation and event loop.
asyncio.Task is part of the asyncio event loop framework. So any event loop implementation that conforms to that framework will have to have one (including the default event loop implementation that ships with asyncio). So for example, uvloop, which is an alternative event loop implementation that works with the asyncio event loop framework, also uses asyncio Tasks.
Other event loop frameworks can do whatever they want, and presumably, wouldn't be importing from asyncio. Whether they have a similar abstraction is completely up to the framework itself. Trio, for example, doesn't have a concept of a task object at all, because it enforces a strict tree structure for tasks.
I am a huge fan of parallel and async code. I spend a lot of time researching it and trying to design software that is easily parallelisable.
Many GUIs use the event/message pump pattern, such as the Win32 API. Qt does something similar with its event loop (QEventLoop).
Threads are a rather low-level instrument for getting background tasks going, because the interface between the main thread and the worker threads is largely left up to you.
In Java you could use a ConcurrentLinkedQueue, and in Python you can use a JoinableQueue.
I am heavily interested in this space because I want to write understandable software that anybody can pick up and work with. I worked on a JMS log viewer that used threads but would crash with ConcurrentModificationException because it wasn't thread-safe. I made it thread-safe, but its performance dropped through the floor. In hindsight, I should have sharded each JMS connection topic to its own thread, or multiplexed multiple JMS topics per thread and looped over them. The main thread can interrogate a worker thread under a lock, which should be faster than every thread contending for the same lock. The work is driven by the main thread but done in the background, and the threads can keep the fetched messages in memory until the main thread is ready for them.
I think with the right abstraction, thread safety can be achieved and concurrency shouldn't be something to be afraid of. It is very difficult and challenging working at the low levels of concurrency such as a concurrent browser engine. (I've not done that though.)
This is why languages such as Pony, Inko, Cyber, Erlang, and Elixir are so promising. We can build high-performance systems that parallelise.
Writing an async/await pipeline that looks synchronous is far easier to understand and maintain than nested callbacks. So I can see where async is useful. I just hope we can design async software to be simpler to maintain and extend.
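The contrast with nested callbacks can be sketched in a few lines (hypothetical fetch/parse stages standing in for real I/O):

```python
import asyncio

async def fetch(url):
    await asyncio.sleep(0)      # stand-in for network I/O
    return f"body of {url}"

async def parse(body):
    await asyncio.sleep(0)      # stand-in for further async work
    return body.upper()

async def pipeline(url):
    # Reads top to bottom like synchronous code; no nested callbacks.
    body = await fetch(url)
    return await parse(body)

print(asyncio.run(pipeline("example.com")))
```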
> Where I'm particularly unconvinced of their use is in server side view and api end point processing. The majority of the time you have maybe a couple of IO opps that depend on each other. There is often little than can be parallelised (within a request) and so there are few performance gains to be a made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.
Python doesn't have multithreading that scales or supports real parallelism. asyncio has very measurable performance benefits for exactly that use case you've mentioned versus threaded servers.
Sorry, that's not accurate. Asyncio and threading offer the same variety of "parallelism", which is that both can wait on multiple I/O streams at once (the GIL is released while waiting on I/O). Neither offers CPU parallelism, unless a lot of your CPU work is in native extensions that release the GIL. In that unusual case, threading would offer parallelism where asyncio wouldn't.
Asyncio's single advantage is that you can wait on lots of I/O streams, like many thousands, very cheaply without having to roll your own non-blocking I/O queueing code.
I didn't say that asyncio offered parallelism, I'm pointing out that normal assumptions about multithreading you'd make with other languages don't always apply to Python. You'd typically assume that threads offer parallelism, a property you might choose to use them for over something like single-threaded asyncio.
I've found that for even IO bound workloads, the amount of throughput plateaus when using a relatively small amount of threads despite the GIL being released on IO.
sorry for misreading that from your post! my own benchmarking of threads vs. asyncio has never found much performance difference between the two approaches (asyncio slightly slower). if you need very wide throughput, then yes, asyncio is better. otherwise, it's very difficult to create equitable comparisons between threaded and asyncio code.
Haha, it's cool. I once had a problem where I needed to scrape a bunch of sites at once, and ended up using it as a benchmark for threads vs asyncio. The overhead with threads was high even when using a handful of threads, and quickly hit a plateau after scaling to 4-8 threads on a 16 core machine. I was a big fan of the concurrent.futures executor model and generally thought it was good performance anyway, but figured I'd try to implement it in asyncio, and rate limited both for a fairer comparison. It was so shockingly fast that it was the last project I reached for threads again with on Python. I can't say that it was an equitable comparison, but it certainly left an impression on me.
it was probably whatever concurrent.futures does. I use plain threads and get great results for anything under 25 threads on a small laptop. you always keep the same set of threads running in a pool and use a decent queueing model.
certainly, doing that kind of thing with asyncio likely has a lower learning curve because you don't need to worry about "pools of workers" so much. concurrent.futures should do that for you too, but that's also the first component I wouldn't trust.
> GUI toolkits (like Textual) however are a really good use case for Asyncio.
Only if the GUI toolkit is explicitly written to be asyncio-aware and use asyncio's event loop. Textual appears to be written specifically to do that.
However, other GUI toolkits that I'm aware of that have Python bindings aren't written that way. Qt, for example, uses its own event loop, and if you want anything other than a GUI event to be fed into Qt's event loop so your event-driven code can process it, you have to do that by hand and make sure it works. There is no point in even trying to use another event loop, such as Python's asyncio event loop, since that loop will never run while Qt's event loop is running.
>Where I'm particularly unconvinced of their use is in server side view and api end point processing.
Sure, performance isn't going to get better, but for websockets and server sent events the occasional long-lived async task can be great. Especially when you need to poll something, or check in on a subprocess.
These days, with GraphQL or complex microservice architectures, you could have multiple hops to fulfill the original request.
Sync Flask will hold that thread hostage until the request is done, whereas async, with properly used async libs, will allow other requests to be processed.
We often have medium-sized reports that take seconds. That's a lot of time to wait, and handling it synchronously would just end up bloating your service's scaling to handle more connections.
Any service with decently long lived network requests will benefit from event loop handled scheduling.
Me too, but threading is botched in Python, and not just because of the Global Interpreter Lock. Some Python packages are not thread-safe, and it's not documented which ones. Years ago I discovered that cPickle was not thread-safe, and that wasn't considered a problem.
You can still have thread safety issues with the GIL in place, because globals and other data is shared between threads.
For example, you can put a dictionary at the module level; thread A can set a key in that dictionary like "name", thread B can overwrite it, and then thread A comes back, does dct["name"], and gets an unexpected answer.
This is a relatively easy mistake to make, a lot of python code has module level variables.
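The hazard described above can be made deterministic with a barrier (a contrived sketch; real races are timing-dependent and much harder to reproduce):

```python
import threading

shared = {}
barrier = threading.Barrier(2)
results = []

def set_then_read(value):
    shared["name"] = value
    barrier.wait()            # ensure both threads have written
    # Both threads now read whichever write landed last, so at
    # least one gets back a value it did not write.
    results.append(shared["name"])

threads = [threading.Thread(target=set_then_read, args=(v,)) for v in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The GIL makes each dict operation atomic, but it never protects the *sequence* "write, then read back" from interleaving.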
I really don't get the whole coloured function thing. How's it not just the function signature? You might as well claim a new function argument makes a new colour. Granted, all of my use of async is in Rust in which the compiler picks this stuff up, so maybe in python there are other concerns I'm missing.
I think the "new function argument makes a new colour" is accurate on several levels. async/await is a dual for streamlining working with the Promise or Future or Task Monad (however you want to call it). (And async/await syntax in most of the languages that have it can actually be [ab]used for a substandard "do-notation" for nearly any Monad you want to use.)
At face value, yeah, every function in Haskell that accepts an IO monad is now "IO colored", but at the same time, that's a silly way to look at it. It's just extra type information and type bindings flowing as types flow through functions. It's just a bit of a tautology that functions that deal with other functions that need that type need to deal with that type themselves.
Functions that use Maybe/Option/nulls are all "nullable colored". Functions that use or return integers are clearly "integer colored". That's what programming languages do: they try to track how your types flow through functions. "Coloring" is a bad metaphor or at least a useless one, we just call that "types". Admittedly, I think that's why Python and JS users predominantly use the "what color is your function" complaints the most because all of the rest of typing information for them is generally opt-in and easily ignorable/forgotten.
I am not sure I agree that GUIs are a good use case for async. A human interaction with the program must almost always pre-empt whatever the program was running, so I cannot see how a cooperative multitasking runtime like async Python can work in such a scenario.
The responses here make me think most commenters don't have experience with this particular footgun.
To clarify: Python can gc your task before it starts, or during its execution if you don’t hold a ref to it yourself.
I can’t think of any scenario in which you’d want that behavior, and it is very surprising as a user.
Python should hold a ref to tasks until they are complete to prevent this. This also “feels right”, in that if I submit a task to a loop, I’d think the loop would have a ref to the task!
It’d be interesting to dig up the internal dev discussions to see why the “fix this” camp hasn’t won.
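One way to see that the loop's reference really is weak (a CPython-specific sketch; exact collection timing is an implementation detail, so this only demonstrates the finished-task case):

```python
import asyncio
import gc
import weakref

async def work():
    await asyncio.sleep(0)

async def main():
    task = asyncio.create_task(work())
    ref = weakref.ref(task)
    await task                 # the task has finished at this point
    del task                   # drop our only strong reference
    await asyncio.sleep(0)     # let the loop release its internal handles
    gc.collect()
    # The event loop's WeakSet does not keep the task alive.
    return ref() is None

print(asyncio.run(main()))
```

If the loop held strong references until completion-and-retrieval, as the comment proposes, a still-running task could never disappear this way.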
I can see this behaviour being useful if you are no longer interested in the result of a "pure" task. For example, imagine fetching some data via HTTP: if you no longer need the response, canceling the request could make sense.
But I agree that this is unexpected, and most code probably isn't ready to be cancelled at arbitrary points. (Although I guess in Python your code should be exception-safe anyway, which is often a similar requirement.)
If you want to avoid an expensive operation when the result is no longer needed, then surely you'd want to cancel the task as soon as you know that, not at some indeterminate time when the GC runs - so I don't think this behavior makes sense even for that scenario.
That would make tidying up rogue tasks impossible. Of course we all like to think we do cancellation perfectly, but it's nice to know that the task scheduler has your back.
Edit: I don't quite understand why a user would expect a task to remain live _after_ the last reference to it has been dropped...
Because they have expressed the intent to run it by scheduling it on the event loop?
I don't follow the argument wrt tidying up rogue tasks. What does it mean for the task to be "rogue"? If there was some state change that made the task redundant - because it clearly wasn't when it was submitted! - then the code that makes that change, or some other code that observes it, should cancel it. If it isn't cancelled, the fact that nobody is able to observe the value that the task will yield is not sufficient to auto-cancel, as there may still be a dependency on side effects.
And, speaking of tidying up, what if the scheduled task is the one that performs some kind of cleanup?
One might argue that dropping an object should result in it being deallocated, as is normally the case with RAII languages, which in the case of a task is to be stopped and tidied up. Speaking from experience with both threads and async Rust tasks, I find the async case in which tasks are dropped when the references are dropped is much easier to work with (once I overcame my prior expectation based on threads). If you want to keep a task alive, then explicitly keep it alive by holding on to its handle.
If you have a scheduled task to clean up, then you need to manage it at whatever level that occurs. You signal to notify completion of clean-up to the top (or whatever) level. It's no different to signalling the other way to notify of shutting down.
A possible solution might be a built-in dunder flag to explicitly tell the interpreter not to get rid of a specific object. Something like __keep_alive__ or similar.
The notion of fire-and-forget is itself the problem. Even with threads, you should have them join the main thread before the program exits, which implies you should hold strong references to them until then. Most people don't go out of their way to do this even when they can, but that's what you're supposed to do.
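The discipline described - hold strong references, join before exit - looks like this with plain threads (a minimal sketch):

```python
import threading

results = []

def worker(n):
    results.append(n * n)   # list.append is atomic under the GIL

# Hold strong references to every thread we start...
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
# ...and join them all before the program exits.
for t in threads:
    t.join()
print(sorted(results))
```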
I came here to write this comment. Also, you usually need to have some means of canceling the task -- otherwise you have to wait for them to finish, or you leak these stray lost tasks that are doing stuff, like manipulating the state of things.
Let's say I write a task that updates a progress bar in an infinite loop, and let it be gc'ed on program exit without ever joining it. What's wrong with that design? I can, of course, modify the task to check a flag that indicates program completion and exit when it's set. But does this extra complexity help the code quality in any way?
Or suppose I spawn a task to warm up some cache (to reduce latency when it's used). It would be nice if it completes before the cache is hit, but surely not at the cost of blocking the main program. I just fire-and-forget that task. If it executes only after the cache was hit, it will realize that, and become a no op. Why would I want to join it at the end? It may not be free (if the cache was never hit, why would I want to warm it up now that the main program is exiting?).
I might not have an answer you'll find convincing, since this is somewhat subjective, as the "should" here isn't based on a "need". The best analogy I can give here is that you don't "need" to, for example, avoid circular references in Python. Some believe it's better design to avoid them if you can. I find value in making things, say, more predictable and deterministic when possible.
Like what, for example? Well, one very simple yet practical one is that when you break into your program with a debugger, you want to minimize noise - any thread or object that's alive unnecessarily is, at the very minimum, extra noise for you to deal with, and at worst, extra surface for a bug to creep in. Moreover, the liveness of the thread/object could provide you with a vital bit of information that you otherwise wouldn't get. Another one is the fact that it lowers the number of obstacles you'll have in the future if you ever want to do something less common - such as suspending a GC, snapshotting the program state, or any number of less common things. Yet another one is the very fact that following the pattern more broadly helps you and future maintainers avoid pitfalls that arise in similar abstractions, like they did here. I could go on, but all of these concerns are basically a bunch of things whose values mostly lie in the potential future, not the present.
There are many other more practically-minded folks who believe the presence of a GC exempts them from caring about such concerns, and see these as adding extra complexity. If you see it that way, I don't have a compelling rebuttal. But if "complexity" is your criterion, perhaps what I can offer is that you can also view it from the opposite standpoint: following the fork-join pattern (or avoiding circular references, etc.) itself avoids complexities that arise from not doing so [1], such as those in the previous paragraph. It's just that not every form of complexity or cost materializes immediately.
[1] Note that complexity is not just a measure of code size, but also the deviation of its behavior from expectation. You can make code more complex to reason about merely by deleting some lines, and that could include a thread.join() call.
You make very good arguments in favor of joining threads in most cases, and I completely agree with you. Perhaps the only disagreement we (may?) have is that I think these arguments may not apply in some cases.
In my first example, I would probably find not joining the thread cleaner than joining (since it would require extra code to rewrite the infinite loop into something joinable, and since the earliest time I can join is at the very end of the program anyway).
In my second example, your arguments are persuasive. It is very likely that there is a place in the program where the cache warming is no longer a good idea (for example once the real traffic started hitting the cache, it's probably too late; in fact warming up at that stage is probably a bug, since it may divert resources from serving the actual user traffic). So yes, in my second example, I now think it's better to either join or cancel the task.
Thanks! Regarding your first example, I'm not entirely sure I understand it. If you have a task that updates a progress bar as an infinite loop... does that mean the task never finishes? What does "progress" even mean for something that goes on infinitely long? What happens if the GUI is destroyed in the middle of that thread's lifetime (which it will be, by the main thread, if the secondary thread runs forever)? How many threads do/should you end up with if you later realize you want to run your program itself (i.e., your main() function) multiple times?
The main cases I can think of where joining might not make sense is when you simply don't have the capability to do so in a reasonable manner, like when the main thread is in third-party code that you have no control over. Otherwise, if I understand the example correctly, you absolutely need to join such a thread - and not merely at program exit, but sometime before the GUI is destroyed.
Conveniences like this library and other threading libraries make it easy for people to trivialize something (concurrent programming) that ought not be.
This. Even if you hold a reference to the task, your program very likely has a bug. At some point you should always await it to see if it failed or not.
It's easy to miss this if you observe completion via a side channel, for example an item removed from a queue. But that's also a bad way to write tasks in the first place: let them return meaningful data rather than mutate shared objects. That way you are forced to await them, and your code becomes much more straightforward. It's counter-intuitive at first if you think in threads, because there you are more used to worker pools and such, whereas asyncio tasks can be written in a more linear way and don't need background workers to the same extent.
After having made this mistake several times, I've concluded one should almost never use bare create_task. It's much better to place tasks into a top-level list of background tasks that is always awaited; that way they are both started automatically and always awaited, so errors surface appropriately.
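That pattern can be sketched as a small helper (TaskRegistry is a hypothetical name for illustration, not a real API):

```python
import asyncio

class TaskRegistry:
    """Owns every background task and awaits them all at shutdown,
    so no task is garbage-collected early and no error is dropped."""

    def __init__(self):
        self._tasks = []

    def spawn(self, coro):
        task = asyncio.ensure_future(coro)
        self._tasks.append(task)   # the strong reference lives here
        return task

    async def drain(self):
        # Surfaces any exception a fire-and-forget task raised.
        results = await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
        return results

async def main():
    reg = TaskRegistry()
    reg.spawn(asyncio.sleep(0, result="a"))
    reg.spawn(asyncio.sleep(0, result="b"))
    return await reg.drain()

print(asyncio.run(main()))
```

Calling `drain()` at shutdown is the "always awaited" part; skipping it reintroduces the original bug.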
> At some point you should always await it to see if it failed or not.
What you're saying is correct, but it doesn't quite imply what I'm saying. I'm saying everything that you spawn asynchronously (be they threads, tasks, whatever) needs to be joined - even if they're no-ops whose success or failure is irrelevant. This is similar to how you should always deallocate memory that you dynamically allocate whenever you can, as a matter of good practice and good hygiene. Sometimes you can get away with not doing so, but you shouldn't really skip it unless you have no choice, as it makes the program logic clearer and can make the program more robust too. (e.g., imagine running your main() in a loop where threads are spawned each time but never guaranteed to join.)
one of the reasons i was never "bitten" by this bug: whenever i use python tasks, i save them in a collection so i can cancel all of them when the program is ready to quit.
Wow. What a strange design decision, as evidenced by sheer number of developers who don't / didn't know about this (myself included). I hope this gets fixed instead of just documented.
Agreed, I’m really surprised at all the comments defending this behavior. I suspect there is a non-obvious reason why it’s this way, but “you should’ve read the docs” and “but why wouldn’t you hold your own strong reference” are weird takes IMHO.
Bugs stemming from the architecture of a poorly specified system become insecurities for the people who rely on that system.
One of the major reasons why Python’s leadership refused to optimize python’s performance, besides Guido’s intransigence, was because they treated the CPython implementation as the specifications.
Legions of script kiddies built their programming identities around the belief that python must be slow, because to admit otherwise would require changing the system.
It was the Python community that consistently treated CPython behavior as the spec, in many cases contrary to explicit statements in the official docs to the effect that "this is a CPython-specific thing" etc, the most notable example being reliance on deterministic refcounting and on the GIL.
But that's not what makes performance hard to fix. It's rather the fact that most Python code out in the wild depends on packages written in native code, and the CPython ABI for said packages exposes way too many implementation details. If you ditch ABI compatibility, you can ditch GIL, for starters (see e.g. IronPython and Jython) - but few people are willing to make do without all the affected packages.
Blaming the followers for not reading the docs is an easy excuse, when the real problem is that the leaders have failed to define Python. The PSF needs to decide what precisely Python is and what it isn't among themselves. Because they did not do so, they can't tell if something is a bug or intentional. See this entire thread as proof.
In lieu of a formal definition, the maintainers resort to a hodgepodge of user docs, PEPs, mailing lists, and the "reference implementation". Worries about making that "reference implementation" more complex stymied Python's development. There's been flamewars about this topic with the PSF and their apologists.
On that note, Rust encountered a similar problem of bad decision-making by maintainers, who also were opposed to specification. They have ejected those maintainers and replaced them with ones who understand the need for a formal definition of their language.
I'm not arguing against the notion that Python doesn't need a more rigorous language spec. But I don't think that's the blocker for alternative implementations with significantly high performance - the parts that are well-documented already provide a lot of leeway to optimize if only user code didn't rely on it.
And if users don't read the docs today, I don't see why they'd suddenly read a formal spec tomorrow if one is available. The problem is that Python got successful specifically in form of CPython, making the latter a de facto standard whether it wants it or not. I understand the users, too: why should they bother writing implementation-agnostic code if they're planning to run it on CPython anyway, and the vast majority of developers who might want to reuse it will likely do the same?
The number of developers defending this decision may also partly explain the decision. Developers' skill has many dimensions that are not perfectly correlated. It's quite possible that even among the (otherwise highly skilled) python core developers, few understand (or care) how API design affects reliability of code written by the library users. And the few that do, have hundreds of higher priority issues to deal with.
Not just that, it is also annoying to work around. Sometimes you really do want to kick off a background task and have it run to completion, even if you can't keep a handle on it, just like a thread. This is surprisingly hard to do.
I have a very similar helper in multiple codebases in production. It has some additional bits to log errors and to handle tasks that should never complete (basically a flag to call sys.exit if they do) but it's basically this.
I don't understand why threads get a background mode but tasks can't be fire-and-forget.
Yes, the base async interface is confusing and overly complex. It's a downside! As they note lots of people have stepped in to provide better helpers (like TaskGroups) - but these are the docs for the base library!
> But who reads all the docs? And who has perfect recall if they do?
Everyone reads the docs? That is why you don't need perfect recall because you can read them whenever you want.
Python has lots of confusing corner cases ("" is truthy, you need to remember to call copy [or maybe deepcopy!] sometimes, all the other situations where you confuse weak v.s. strong references). They cause really common bugs. It's just a hazard of the language in general and the choices it makes (much like tasks being objects is a hazard). I do understand why people think they can throw away task references (based on other languages) - but this is Python! The garbage collector exists and you gotta check if you own the object or something else does.
Edit: this feels like an experienced Python developer, who has already internalized all the older, non-async Python weirdness, being taken aback by weirdness they didn't expect. Like, I feel you, it does suck - but it's not a bug that values you don't retain may get garbage collected.
For Python? The language where everyone just cobbles together random code from the internet and other repos? I can totally see how this mistake happens left and right. The bar of entry for this language is way too low to assume only rigorous senior devs use it.
The author goes on to say they found this pattern lurking in various projects on github. So, no. The problem is that this behavior is subtle, not intuitive, and unless you are reading the actual documentation top to bottom (and not just the function signature and first paragraph from the pop up in your IDE) you will likely get bitten by this.
What is the point of your comment? The author shouldn't have called out the upturned rake in the darkened shed?
I wouldn't say shouldn't - they are free to do what they want. But this is a blog post about something that can trip you up that the docs highlight - which the author calls a "heisenbug". The author doesn't even have a suggestion for the docs, which already calls out the problem they encountered, they just note that there are helpers for this problem (which is true).
The point of my comment is that subtle, non intuitive things like this are all over Python and, while this one is particularly bad, this blog post makes it seem like more of an aberration than it is.
> The author goes on to say they found this pattern lurking in various projects on github.
I'd call it an anti-pattern. If you spawn a process/thread, and never wait/join it, it means you don't actually care what it does, if it crashes, etc. I don't see a problem with Python's behavior here.
Seriously?? I think the vast majority of developers would find it very surprising that the Python runtime would GC a task in the middle of execution. I would expect that the runtime would by default do what the example in the doc says, which is keep a strong reference to the Task object until it finished execution.
The chances are that most people doing this are introducing some nondeterminism that they did not expect, and will have a hard time dealing with it when it bites them in the ass.
What's more to the point is that I am going to have a hard time when it leads to a serious outage or security violation at some major corporation that has become too pervasive in its reach for me to avoid its influence. No amount of schadenfreude is going to compensate for that.
This is unfortunately all too common. I am often dinged by my manager about my throughput when investigating the integration of new libraries because I read the docs, front-to-back, as part of my research and that takes time.
I don't think it's really appropriate or feasible or even helpful for most things. At least for myself, if I read through the entirety of a language's documentation I wouldn't remember half of it, and I probably wouldn't need everything I remembered.
Language and library designs should optimize for least-surprise.
Most libraries are quite poorly documented (to my taste, anyhow). It is certainly worth spending the time reading at the very least documentation pertinent to the components of the library intended for use.
Considering how many times I need to add site:python.org to my python search queries to actually get to the docs, I assume that a surprisingly low number of python developers actually read the docs.
I think you may be too bold with the assumption here, personally I would wager that the majority of people who write Python don't even know Python has official docs outside of a site called Stack Overflow.
Wow I've heard people say that everyone should read all of the docs (which isn't really true) but I've never heard anyone claim that everyone does read all of the docs! Wild.
Oh, sorry, you are right - "" is false-y, even though it's a valid empty value. So it's hard to tell the difference between a value not being filled and a value being filled with an empty value.
I guess I am too deeply in the Python ecosystem to see a problem here. Unless you want to check for the existence of "I exist"? In which case, the Python Way would be
    answers = {}
    answers["I exist"] = ""
    if "I exist" in answers:
        print("a")
It's not a problem? The async interface isn't a problem either. It's just a thing you have to remember about python: "most input is truthy except for the input that isn't"
"Most of the time you don't disrupt your program by not keeping the returned reference in scope except for when you do"
Truthy is a Pythonic core principle of the language. It is not an edge case phenomenon in the language which I would expect a regular practitioner to confuse.
What I learned when I wrote Python professionally was “never rely on truthiness”: explicitly writing out a boolean expression that does what you want is more explicit (“explicit is better than implicit”, PEP 20) and prevents a whole class of bugs down the line.
PEP8 is touted a lot as if it is a perfectly correct tome of ... correctness. I've worked in Python long enough to know that it both doesn't cover everything and the advice is sometimes actively bad.
I mean, I'm not blaming PEP8 per se (A Foolish Consistency is the Hobgoblin of Little Minds) but it has a tendency to be taken as gospel by a lot of people
Funny enough, those who push for more strict adherence are also the ones that neglect other aspects (speed, alg. complexity, etc) especially readability (and no, PEP8 compliant code is not necessarily the most readable)
The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.
Those values tend to propagate out and make a mess, sneaking into your stored data or causing errors in unexpected places -- fixing this feels like one of the main advantages of using things like typescript, and it's just not an issue in python.
It's not a problem, because the alternative is way worse. It's just a different language design than the one you're used to.
You could argue about whether or not truthiness makes sense as a concept (personally I think not), but the way it's defined in python is quite coherent and useful in practice.
> The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.
You get that behavior by using dict.get(): answers.get("I exist") returns None, which is falsy.
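Concretely, with the `answers` example from above:

```python
answers = {}
answers["I exist"] = ""

assert "I exist" in answers                    # the key is present...
assert not answers["I exist"]                  # ...even though its value is falsy
assert answers.get("missing") is None          # .get(): falsy sentinel, no KeyError
assert answers.get("missing", "n/a") == "n/a"  # or an explicit default of your choice
```

So Python gives you both behaviors: `[]` raises on a missing key, `.get()` returns the JavaScript-style sentinel when you ask for it.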
Right, sometimes you want that. I suppose my point was that the definition of truthiness works well with the default, most obvious way of indexing a dictionary, which was the original complaint.
Perhaps I was a bit harsh -- that exposes an issue which does trip people up, where they use `if x:` when they mean `if x is not None:`.
Saying that, I think it's defensible in the same way. The fact that you can write `if x:` (where x is not a bool) tells you that the language has a concept of truthiness, and so maybe you should have a think about what that actually means.
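A small illustration of the distinction (a made-up `describe` function for demonstration):

```python
def describe(x):
    # distinguish "no value" from "empty value" explicitly
    if x is not None:
        return "has a value"
    return "missing"

assert describe("") == "has a value"  # falsy, but not None
assert describe(0) == "has a value"   # likewise
assert describe(None) == "missing"
```

Writing `if x is not None:` says what you mean; `if x:` silently lumps `""`, `0`, and `[]` in with `None`.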
The more of these things there are, the more brainpower you devote to remembering the right way to do things; if you don't you introduce bugs, a subtle, painful one here.
I mean, that's the fundamental reason it's called falsey, that is, Python does automatic type coercion if it is evaluating things in a boolean context. FWIW this is the exact same behavior in javascript.
He didn't even have to read "all the docs" - just the ones that pertain to the function that he is using. And then not ignore the section marked "Important" and the highlighted "Note".
That commit is adding the same disclaimer for the `shield` function, the original one-line mention for `create_task()` is a little older, but also only November 2021, so during 3.11 development...?
It is for this reason in Temporal Python[0], where we wrote a custom durable asyncio event loop, that we maintain strong references to tasks that are created in workflows. This wouldn't be hard for other event loop implementations to do too.
I mean the default asyncio event loop can be replaced/extended where you won't have to know/remember on each create_task. But yes, it is an unintuitive default.
Does anyone understand why the event loop only keeps weak references to tasks? It'd seem wise to do something to stop it from being garbage collected while running, maybe also while waiting to run.
Because it's almost always the case that the consumer is going to keep a reference to the task in some way, so that is the logical choice for the "primary owner" of the task. Python doesn't have ownership per se like rust, but if you keep more than one hard reference to an object around, it'll prevent collection, so in cases such as this it makes sense to designate one primary owner and have all other references be weakref.
Python's reference counted - if the event loop holds a reference until the task has run, then drops it, then everything behaves sanely. That's not a cycle. It just means the task that was scheduled will execute, which seems like the right default.
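The mechanism is easy to demonstrate with a stand-in class (not asyncio's `Task`) and `weakref`, which is roughly what the event loop's `WeakSet` holds:

```python
import weakref

class Job:
    pass

job = Job()
ref = weakref.ref(job)   # a weak reference does not keep the object alive
assert ref() is job      # alive while a strong reference exists
del job                  # drop the only strong reference
assert ref() is None     # refcounting reclaims it immediately, no cycle needed
```

When the loop's `WeakSet` is the only thing "holding" your task, this is exactly what happens to it.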
Only guess I’d have is to protect the system against infinite-loop tasks, but I don’t remember any other runtime caring, and a task which never terminates seems easier to diagnose than one which disappears on you.
And this is why trio got it right, and why I think the task groups (nurseries from trio) can't arrive soon enough in the stdlib.
Because not only you must maintain a reference to any task, but you should also explicitly await it somewhere, using something like asyncio.wait() or asyncio.gather().
Most people don't know this, and it makes asyncio very difficult to use for them.
> task groups (nurseries from trio) can't arrive soon enough in the stdlib
Please, no. Asyncio is horrible, and bodging it to make it less horrible just means we will be forced to live with the remaining horror. Far better to replace it with something that works properly (yes, Trio).
I'm using a lot of `asyncio.get_event_loop().create_task(...)` calls without assigning the task to a variable, but the docs [0] don't mention anything regarding this method on the loop object. Can I assume that I'm safe?
I'm pretty sure that's equivalent to asyncio.create_task(), it just gives you the opportunity to specify a different loop if needed. I think the docs are just less explicit because they're aimed more at power users (since most people don't deal with multiple event loops).
He doesn’t really get into what makes this a Heisenbug, only that it’s indeterminate in nature. Would attaching a debugger/stepping through the code make it less likely that your task would get garbage collected out from under you?
Yeah, he seems to be re-defining the term to mean "a bug that occurs occasionally depending on system state" as opposed to "a bug that changes behavior when you observe it closely e.g. in a debugger."
TIL. I guess I assumed it would hew more closely to the Uncertainty Principle.
Edit: actually, come to think of it, I first heard of it in about 2006 from Jamie Brandon and at the time assumed it was something he'd made up. For a second there I forgot that 2006 is more than 10 years ago! (It was a python bug that went away when run in a debugger.)
CPython does most of its memory management by reference counting, which fails to reclaim circular structure. So to make sure it gets everything, it occasionally runs a conventional tracing GC. If the GC happens to run just after you create that async task, the task itself can get collected, it sounds like. It's good to know about this and is (my own editorializing) yet another reason Python3 should have used Erlang-style concurrency instead of this async stuff.
Thank you! I just did a quick `git grep` in a work code base and found one clear instance of this bug, and two more locations where I'm not 100% certain whether references are kept around long enough. Made a note to open a bug on Monday :-)
Another surprise in python's base library:
I knew that re.search searches for a regex match in a string, so I thought that re.match would match the whole string. I was wrong, re.match only anchors the regex at the start, not the end. re.fullmatch anchors it at both sides.
I felt very stupid when I found out; I started at my current work as a Perl developer and learned Python for a new job; but there are two more Python developers (with previous Python experience outside this company) on the same project, and none of them noticed the mistakes I made based on this misunderstanding.
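The three anchoring behaviors side by side:

```python
import re

s = "2023-01-15 and more text"
assert re.search(r"\d{2}-\d{2}", s)                    # matches anywhere in the string
assert re.match(r"\d{4}-\d{2}-\d{2}", s)               # anchored at the start only
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is None   # anchored at BOTH ends: no match here
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2023-01-15")
```

If you want whole-string matching, `re.fullmatch` (added in Python 3.4) is the one to reach for; `re.match` alone will happily accept trailing garbage.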
I think asyncio is kind of neat for what it's good at, but beginner programmers who have never written code before are going directly to using Python asyncio (I know this because they tell me so when they post sqlalchemy discussions). This is just wrong.
That's extremely insidious. I suppose I never encountered this issue because I almost always call asyncio.gather(*), which makes having a collection of tasks natural.
Another common async footgun I see is unthrottled gathering, and no throttling mechanism in the standard library. Once you gather an unspecified number of awaitables, bad things start to happen, either with CPU starvation, local IO starvation, or hammering an external service.
What I like about threads is they make dangerous things like this harder, and you have to put more thought into how much concurrent work you want outstanding. They also handle CPU starvation better for things that are latency-sensitive. I've seen degenerate requests tie up the event loop with 500 ms of processing time.
Huh! Unless you're using semaphores, you can also recreate a similar situation with threads. Spin up a whole bunch of threads and send all of them towards some shared object, or make 100s of requests with them.
There's not much difference between spinning up threads explicitly and creating async task with asyncio.create_task. In either case, you can throttle them with semaphores.
I don't have a source or affected versions, but semaphores can scale poorly. I vaguely remember each blocked acquire getting checked on every event loop iteration, or something silly like that.
Python has a few weird issues like this. The last one I encountered was with a class inheriting Thread, join and the SQL Server ODBC driver on Linux. Fairly sure I hit page faults thanks to a shallow copy on driver allocated string data but didn't have the time to investigate like the hero of this blog post.
wow. yeah, this absolutely explains a heisenbug that i've been chasing for a while. and i can't count the number of times i've had that exact doc page open on my screen in the last few months, and never bothered to read that block of text that starts with "important"...
asyncio.create_task() doesn't exist in 3.6, and I can't find the string "to avoid a task disappearing" in the doc, so I'll go out on a limb: there is no such doc. However I see the reference to weakref.WeakSet.
Some of us have been writing python since 2.x, and quite unsurprisingly wrote asyncio code at 3.6, and still happily support it. Some of us have even asked on HN about maintaining compatibility backwards and forwards between 3.6 and 3.11.
The documentation didn't exist at 3.6, when I wrote the code. I went and checked the source code, and the documentation and reported my findings. Good to know that there's a potential problem, don't you agree? What would you do differently?
I've got to say, I've never actually noticed a problem with "fire and forget" although I use it for more or less disposable tasks to begin with.
However, I've spent some more time looking through asyncio.base_events and
* BaseEventLoop._scheduled is a list()
* BaseEventLoop._ready is a deque()
There is no change between 3.6 and 3.11 in this regard. So this could be a nothingburger if you don't use asyncgens. OTOH I suppose better safe than sorry; the only question is whether no code addressing it is more mentally taxing for the bystander than having code and trusting its implementation.
Java had a similar but inverse problem in early versions. A counter-intuitive behavior that bit people and caused leaks.
If you instantiated a Thread, and then start() was never called, that thread object would leak. And thus potentially an entire graph of objects, via the references chains beneath it.
Obviously a thread that is never started seems pointless by design. But it could happen easily if, for an example, an error happened or an exception was thrown at some point between the instantiation line and the call of start().
The root cause was because Sun's programmers had made the early implementations of Thread get added to a ThreadGroup by default, under the hood. What would happen is that ThreadGroup stayed alive/reachable and thus it kept your app's thread object reachable too, and thus the GC would never clean it up. It was never eligible.
It ended up being the cause of a few weird leaks we saw in production.
IIRC in Java 1.4 or 1.5 Sun fixed it by ensuring the thread got cleaned up in those cases.
There's a GitHub issue to fix this, arguing that the doc fixes are insufficient, consider it to be a design mistake, and argue that it can be changed without breaking backwards compatibility (current GC behavior is not deterministic): https://github.com/python/cpython/issues/91887
Latest reply from GvR is invoking Chesterton's Fence. Here's to hoping the devs can quickly figure that one out and get this fixed. Per the linked issues [1] even the stdlib asyncio implementation was affected.
Hey, at least it's documented... good developers actually RTFM.
I can't comment on the design of this API, because I don't feel like learning the library, but in some performance critical applications these sorts of contracts aren't all that uncommon. Granted, this is python, I guess it's a bit more suspicious, IDK.
The documentation update is quite recent (Python 3.11). It was added after this ticket: https://bugs.python.org/issue44665 (not the first ticket around this problem).
His argument hinges on "I can't be bothered to read the docs on the stuff I'm using." So instead of reading the docs on coroutines and tasks before using them, he writes a rant about how it's all wrong because he didn't understand how it works.
On a more fundamental level, why would anyone assume that a coroutine is guaranteed to complete if it is never awaited? There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.
At least he bothered to make note of TaskGroups, also clearly shown in his documentation screenshot, immediately above the section marked Important that went ignored, and finishes with "As long as all the tasks you spin up are in TaskGroups, you should be fine." Yep, that's all there was to it.
> There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.
Isn't the point of create_task (which is what the article is about) to launch concurrent tasks without immediately awaiting them? The example in the docs [1] wouldn't work (in the stated manner) if the task didn't start until it was awaited.
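This is observable with a tiny sketch: the task runs as soon as you yield to the loop, before anyone awaits it.

```python
import asyncio

async def worker(log):
    log.append("ran")

async def main():
    log = []
    task = asyncio.create_task(worker(log))  # scheduled right away, not lazily
    await asyncio.sleep(0)  # yield to the loop once; the task gets to run
    assert log == ["ran"]   # it executed before we ever awaited it
    await task              # still join it to collect any exception
    return log

assert asyncio.run(main()) == ["ran"]
```

So a fully lazy scheduler would indeed break the documented semantics of `create_task`.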
> At least he bothered to make note of TaskGroups [...] Yep, that's all there was to it.
That only works on Python 3.11, which was released just a few months ago. Debian still uses 3.9, for example, so the TaskGroups solution can't be used everywhere yet.
The reason I said "on a more fundamental level" is that I'm not talking specifically about Python and asyncio, but coroutines in general. Even for Python, there are multiple event loop libraries available, they do not all work identically, which is why multiple ones exist. Someone here mentioned Temporal Python which works differently from asyncio, and would have avoided the author's problem. If you don't know how the scheduler works, you can't assume that a coroutine is guaranteed to complete just because you yoloed it into the scheduler, no matter how convenient that might be for you.
Yes, TaskGroups are a recent addition. If you can't use Python 3.11 for whatever reason, there is also the clearly written code sample at the bottom of the create_task documentation, which the author did not bother to mention. Probably didn't make it that far.
The oop model (or any "client" model of state) dependent on state outside your code being encapsulated/contained inside an object within your program is always confusing. It's not particular to this library or python.
A gui window handle within your program, or simply an open file handle, if the OS does something to your object, it's hard for you to know about it, and your object continually needs to refresh its state if you are concerned about it. I don't know what is referred to as a "task" in this case, but I don't think the lifetime of the actual task is the issue, it's the lifetime of your object.
It's always the case that if you instantiate an object with a ctor, you can't count on anything about it continuing if the dtor is invoked. The problem is much more general than this API, this library, this language, this use case. Just as you need to structure your C code so malloc and free will always match up spatially and temporally, you need to structure your oop code so ctors and dtors match up sensibly. Otherwise the confusion in your head will spread to your code.
And those who always want the compiler and tools to automatically do as much as possible to free them from the burden, are the ones who are most surprised by Heisenbugs. Computers can (or will try to) do anything, it's up to you to make sure it does what it needs to. Maybe someday AIs will do it better than us, but right now you need to provide the I.
Well, looks like I know what I am doing first thing on Monday. I converted a bunch of code to asyncio a while back. I have yet to run into any heisenbug in that code and want to keep it that way.
I've been working on a PySide6 application recently using asyncio. I read the docs but totally overlooked the requirement to hold references to tasks created with `create_task()`.
Why is this so common? Do people seriously not read a language/library documentation? That's the absolute first thing I do when evaluating a technology.
Because people have deadlines and need to get things working. You read enough to figure out how to do what you need to do and then mostly move on.
This function was added in 3.7 with no note on the importance of saving a reference. In 3.9 a note was added "Save a reference to the result of this function, to avoid a task disappearing mid execution." which was then expanded with the explanation of a weak reference in 3.10.
It absolutely is common. People see there is a len function that takes one argument, they call len(some_collection), see that it indeed returns the number of items in the collection like they expect and move on. They don't expect len to return a negative number instead on Thursdays, and of course it doesn't because that would be a pretty big footgun. People also see that there is a create_task function that takes a coroutine, they call create_task(some_coroutine), see that the coroutine indeed runs like they expect, and move on. Sure, you're supposed to await the result, but maybe they don't need the awaited value anymore, only the side effects, and see that it still works.
I've used asyncio.create_task forever, and I've admittedly never read its documentation in depth.
However, I've ALWAYS assigned the return value of create_task to some variable.
To me this is just a good programming practice. The OP says "tasks are not like threads - that you can just launch and forget" - no! Even a thread should not simply be launched and forgotten! You always need a reference to it, so when your application is terminated you can join all the threads that are still running and things can exit in a clean way! Same goes for the tasks: when your application exits, it's just a good practice to stop the event loop and cancel any pending tasks.
And, in general, one should always keep in mind the reference count rule in Python (the author incorrectly calls it "garbage collection" btw): if something you created in your function isn't referenced/assigned to anything, then its reference count will be zero, and it will be removed when the function stack unwinds. This is totally expected behaviour to me.
Thank you for this. This is really useful information.
I recently adapted some garbage collection code to add register scanning.
I can imagine all sorts of subtle bugs where things go away randomly. One problem I have with my multithreaded code is that sometimes a thread crashes and the logs are so long I don't notice. From my perspective the thread is just not doing anything.
Sometimes the absence of behaviour can be really tricky to debug!
I missed this in a little curses program launcher I wrote.
It looks obvious when he puts a big orange box around it, but in the actual docs it's an unassuming paragraph between two border-wrapped blocks with the only distinguishing feature being the bold "Important".
It should probably be referenced immediately next to the "Return the Task object" sentence.
I'll +1 the Trio shoutout [1], but it's worth emphasizing that the core concept of Trio (nurseries) now exists in the stdlib in the form of task groups [2]. The article mentions this very briefly, but it's easy to miss, and I wouldn't describe it as a solution to this bug, anyways. Rather, it's more of a different way of writing multitasking code, which happens to make this class of bug impossible.
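A minimal sketch of the task-group style (requires Python 3.11+, hence the version guard): the group owns the tasks, so there is no GC hazard, and the `async with` block implicitly awaits them all before it exits.

```python
import asyncio
import sys

async def fetch(i):
    await asyncio.sleep(0)
    return i * 2

async def main():
    # tasks are owned by the group and joined when the block exits
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(fetch(i)) for i in range(3)]
    return [t.result() for t in tasks]

if sys.version_info >= (3, 11):  # TaskGroup was added in 3.11
    assert asyncio.run(main()) == [0, 2, 4]
```

If any child task raises, the group cancels its siblings and re-raises, which is the structured-concurrency discipline Trio pioneered.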
Delphi had the opposite bug in its thread pool. The worker threads would dequeue a work item and process it in a loop. The work items were reference counted.
Now, Delphi doesn't have scoped variable declarations like say C++ or C#, so the dequeued work item was stored in a local variable. However, it didn't drop (nil/null) the work item reference before it looped. Thus it would hang on to that reference until the next work item got dequeued or the pool was destroyed.
The result was that if you in a function started a task which captured a local reference (f.ex. using an anonymous function) and then waited for it, that reference could live after your function returned if the pool didn't have anything else to do. Not what most people would expect.
If I want to create a task that runs even after the function returns, ie "async def f(): asyncio.create_task(coro=10_second_coro.run()); return;" is there any way to mitigate this? Function-scoped set of tasks?
Your task is implicitly not function-scoped, as you want it to survive exiting the function. What you're doing here would be architecturally better done with threads. async is not a direct replacement for threading.
But, you could also return the task object to the caller and have them manage it. There's also nothing async about your function, so you don't need the async or to await it.
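The "hand the task to the caller" option might look like this (names are illustrative):

```python
import asyncio

async def slow_work():
    await asyncio.sleep(0)
    return "done"

def start_work():
    # a plain (non-async) function: it only schedules the coroutine and
    # returns the Task, handing ownership (the strong reference) to the caller
    return asyncio.create_task(slow_work())

async def main():
    task = start_work()
    return await task  # the caller keeps the reference and joins the task

assert asyncio.run(main()) == "done"
```

Note `start_work` must be called while a loop is running, since `create_task` requires one; the point is that the task's lifetime is now the caller's explicit responsibility.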
A little pedantic, but the HUP (Heisenberg uncertainty principle) concerns the fundamental limits of simultaneously knowing a particle's position and momentum, not observation impacting outcomes.
The same problem or something similar exists in many languages. Threads are GC roots because the OS knows about them, but this may not be true for lightweight threads or async callbacks.
It is hard to fix because you don't want to introduce references from an old object (such as a list of callbacks) to many new objects, as that creates GC pressure and opens up many other potential leaks.
This article just makes me feel like Python, while a language with nice-ish syntax, was hacked together with little concern or thought for the real-world implications of poor design decisions like this async one (and also dynamic typing, a terrible thing in any language).
Most languages have something like this, usually around async.
For instance NodeJS has had a bit of this around promises, and eventually needed to institute the rule "if a promise rejects with an error, and nobody is around to hear it, we will crash your program on the assumption that you probably needed to clean up some resources but didn't and now they're going to leak. Listen to the error with a handler that does nothing, if we are wrong about that."
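Python's asyncio has a soft analogue of this: if a task fails and nothing ever retrieves its exception, asyncio logs "Task exception was never retrieved" when the task object is garbage-collected, rather than crashing. A minimal sketch, where retrieving the exception plays the role of the rejection handler:

```python
import asyncio

async def boom():
    raise RuntimeError("nobody listened")

async def main():
    task = asyncio.create_task(boom())
    await asyncio.wait({task})  # let it finish (it fails)
    # Retrieving the exception marks it as handled; skip this and
    # asyncio logs "Task exception was never retrieved" when the
    # task is collected, instead of crashing like Node does.
    return task.exception()

err = asyncio.run(main())
print(type(err).__name__)  # RuntimeError
```

The log-instead-of-crash choice means Python leans further toward silent resource leaks than Node's post-rule behavior.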
> "Async seems to be the first big "footgun" of Rust. It's widespread enough that you can't really avoid interacting with it, yet it's bad enough that it makes..."
I don’t find this behavior odd at all. Collecting unreferenced values is normal Python garbage-collector behavior. Threads are an exception (no pun intended), but they’re an exception in lots of ways - just try pickling them.
That seems very odd for a default behavior. There might be good reasons to allow GCing of a dropped task reference, but it doesn't seem like that would be the most common case.
In the many years since asyncio was added, I have never used it willingly, outside of the cases where a third-party library required it. There has never been a practical benefit to any of that stuff when compared to select. It always worked poorly and never justified the effort one has to put into writing code that uses the library. The behavior OP describes is just one of the many bad design decisions that are so characteristic of this library.
This one seems quite easy to fix - just have the scheduler check that task objects are still referenced from outside the event loop before running them, including the first time.