This is a great blog post. Concise, lacking fluff or extraneous prose, it gets right to the point, presents the primary-source reference and then gets right to the solution. A bit of editorializing in the middle but that's completely allowed when writing this tightly. Well damn done, OP.
And also it's great information that I - like I'm sure many of you - also never noticed. THANK YOU!
Well, I don't know, I kinda miss the human angle. I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets :^)
> I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets.
With recipes, your problem is usually that you want to learn how to make something, so having the steps listed out is the most important thing. The story behind the recipe isn't important to solving your problem, but in tech the story around the choice matters. Often the "why" is really important, and I really like hearing about what led someone to use something in the first place. Often that's as important as, or more important than, the implementation details.
It wouldn't make sense for this post given its title, but if someone were writing a post about why they chose to use async in Python, I'd expect and hope that half of it goes into the gory details of the alternatives they tried and their shortcomings for their specific use cases. That would help me as the reader generalize the post to my own use cases and see if it applies.
For some reason whenever this comes up there'll be one person saying "I bet you didn't know it's for copyright" and another saying "I bet you didn't know it's for SEO". I've yet to see either prove anything beyond that it's a plausible explanation that could fit the minimal known facts.
I can't find the reference atm, but Jeff Jarvis (https://en.wikipedia.org/wiki/Jeff_Jarvis) says that this _is_ to get around copyright law; the same technique was being used over a hundred years ago (possibly more, which is why I want the reference!). Instead of blogs, think pamphlets.
I feel alone sometimes as seemingly the one person adding "I bet you didn't know it is because blogs sometimes have regular readers, and regular readers drive ad revenue more reliably than SEO or concerns about copyright". I think that fits the facts far better. It's really interesting how many HN commenters underestimate the readership of cooking blogs: they only ever interact with recipes as utilitarian data at the end of a specific web search, and aren't themselves the sort of people who make up the regular audience of recipe blogs, so they discount that such audiences exist. (I had my cooking-show phase, and I have friends and loved ones hooked on the stories of some of the bigger-name recipe blogs; I love letting them tell me about what they're reading. It's amazing the blinders the average HN commenter sometimes wears without realizing it.)
I immediately bounce from those Stack Overflow clones that keep appearing at the top of search results. So I'm wondering how heavily this is still weighted in the rankings.
When is the last time you heard of online recipe blogs enforcing copyright claims on other blogspam? Ridiculous.
The real reason is simple: people who write recipes aren't robots; they're expressing their stories and emotions while explaining how to make food that's dear to them.
>people who write recipes aren’t robots - they’re expressing their stories and emotions, while explaining how to make food that’s dear to them..
the people who write recipes aren't robots; they're narcissists and variously insecure people who seek validation in the form of attention and adulation from others. That's not a bad thing - it's all too human, and we should embrace, not stigmatize, the needy - but if all you want is a recipe rather than to be an acolyte, it can seem like a big ask.
You enjoyed time with your grandparents, and you remember it? Welcome to the club! I remember family as much more complicated than simply being all fun, though, and I feel like you might be Norman Rockwelling a bit.
yes, but you've got the cart before the horse. The structure of newspaper and magazine recipe articles and "prose remembrance" cookbooks is the same. Tweaking that model for online use is very plausible, but it's also simply the way these "stories" get told anyway.
> they're narcissists and various forms of insecure and seek validation in the form of attention and adulation from others
This is incredibly insensitive and judgemental.
Not sure what I expected from HN, I guess...
Why are these "narcissistic" people obliged to provide you with formulaic recipes for free? If the cost is reading through their feel-good story, I feel it's a fair trade-off.
It's not about recipes and tech blogs. It's about how American journalists have been taught to write, ever since the days of Capote and Hemingway. Everything else flowed from there.
As a European, this is painfully evident every time I read something from a US journalist: I have to fast-forward through several paragraphs of useless "human angle" before I can get to the actual meat of the article.
Unfortunately the rot is spreading further and further every year.
For recipes, it also signals effort and provides a hint about quality. There are a lot of low-effort, broken, "would anyone enjoy eating this?" recipes dumped on recipe sites. A few pages of text says that the author thought the recipe was at least worth that amount of effort, and usually confirms that the author thinks the recipe is at least as good as X other recipes, etc.
It draws attention to a problem that a lot of people have created for themselves by not reading the documentation (or not recalling it if they read it). I guess the author could have just linked the documentation but then they couldn't have added the additional context of the github search demonstrating how common it is.
I must have looked through the docs for create_task a dozen times while trying to figure out how async/await works in Python but still managed to overlook this part.
The author doesn't go into much detail on that point: this warning should be present in the documentation of any Python library that calls create_task and returns the result to the user, unless that library stores the tasks in a collection as recommended -- at which point the library author had better roll their own garbage collection!
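For reference, the pattern the asyncio docs recommend - keeping the tasks in a collection until they finish - can be sketched roughly like this (a minimal illustration, not code from the post):

```python
import asyncio

# The event loop keeps only weak references to tasks, so we hold
# strong references ourselves until each task finishes.
background_tasks = set()
results = []

async def worker(n):
    await asyncio.sleep(0)
    results.append(n * 2)

async def main():
    for i in range(3):
        task = asyncio.create_task(worker(i))
        background_tasks.add(task)
        # Drop our reference only once the task is done.
        task.add_done_callback(background_tasks.discard)
    # Wait for any still-pending tasks before exiting.
    while background_tasks:
        await asyncio.wait(set(background_tasks))

asyncio.run(main())
```

The `add_done_callback(background_tasks.discard)` line is what keeps the set from growing forever - that's the "garbage collection" the library author would otherwise have to roll themselves.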
As someone who happens to be eternally grateful to the author for his contribution to the Python ecosystem [0], I kinda feel like this comment thread is overreacting to his overreaction. When I look at this post all I see is a useful, well explained, bite-size writeup that a search engine might recommend to someone looking for help in writing async Python.
Maybe it's because a bunch of my friends are Scottish and I get their sense of humour.
To me, the surprise here is that usually you don’t expect Python finalizers to do something like this: when they dispose something, it’s usually unobservable from the perspective of the program, e.g. an unreachable file descriptor. Here, the runtime is disposing something that is still observably in progress, which is surprising behavior.
I do wish he'd dwelt on task groups a bit more at the end. Many comments here seem to have missed that bit. They're not just a handy way of executing a hack; they're a revolutionary way (OK, maybe that's a bit strong, but not by much) to structure your async program to avoid a whole host of bugs.
This is one of many reasons I'm sceptical of the current trend in Python to "async all the things". The nuance to how it operates is often opaque to the developer, particularly those less experienced.
GUI toolkits (like Textual) however are a really good use case for Asyncio. Human interaction with a program is inherently asynchronous, using async/await so that you can more cleanly specify your control flow is so much better than complicated callbacks. Using async/await in front end JS code for example is a delight.
Where I'm particularly unconvinced of their use is in server-side view and API endpoint processing. The majority of the time you have maybe a couple of I/O ops that depend on each other. There is often little that can be parallelised (within a request), and so there are few performance gains to be made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.
There are always places where it's useful though, such as long-running requests (websockets, long polling), or those very rare occurrences where you do have many easily parallelizable I/O ops within one short request.
The thing is, there's a lot more nuance to it than this. Async/await is part of the language syntax in Python, but asyncio is only one particular implementation of an event loop framework to power it. What async/await really provides is a general-purpose cooperative multitasking syntax. This allows other libraries to implement their own event loop frameworks, each with their own semantics and considerations (the two best-known alternatives being Curio and Trio). At a language level, there's nothing even forcing you to use async/await for async I/O -- you could, if you really wanted, probably write a library that used it to start threads and await their completion.
So you have, from highest-level to lowest-level: application code, async/await language syntax, the event loop framework, and then the implementation of the event loop itself. The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.
But that means that even if you do "async all the things", you'll only encounter this situation if you write your application code in a particular way. It just so happens that "in a particular way" is, in this case, the overwhelming majority of how people write it, which is, of course, why the OP article is relevant.
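As an aside, the point that async/await is general-purpose syntax rather than socket-I/O machinery can be seen in asyncio.to_thread (3.9+), which lets you await work running on an ordinary thread (a small sketch, not from the comment above):

```python
import asyncio

def blocking_work(n):
    # Plain synchronous code, executed on a worker thread.
    return n * 2

async def main():
    # Awaiting a thread's completion: async/await is just
    # cooperative-multitasking syntax, not tied to non-blocking I/O.
    return await asyncio.to_thread(blocking_work, 21)

print(asyncio.run(main()))
```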
> This allows other libraries to implement their own event loop frameworks
At work someone replaced the default library with another faster implementation.
Then the unix socket listener task was not working.
A few hours of git bisect later, I found out the offending commit was the 1 line switching the event loop. Seems the fast implementation didn't implement unix sockets and just had "pass" in the function.
> The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.
Are other async implementations using the asyncio.Task abstraction? I haven't looked into it, but I assumed that asyncio.Task was tied to the asyncio implementation and event loop.
asyncio.Task is part of the asyncio event loop framework. So any event loop implementation that conforms to that framework will have to have one (including the default event loop implementation that ships with asyncio). So for example, uvloop, which is an alternative event loop implementation that works with the asyncio event loop framework, also uses asyncio Tasks.
Other event loop frameworks can do whatever they want, and presumably, wouldn't be importing from asyncio. Whether they have a similar abstraction is completely up to the framework itself. Trio, for example, doesn't have a concept of a task object at all, because it enforces a strict tree structure for tasks.
I am a huge fan of parallel and async code. I spend a lot of time researching it and trying to design software that is easily parallelisable.
Many GUIs use the event/message pump pattern, such as the Win32 API. Qt does something similar with its event loop (QEventLoop).
Threads are a rather low-level instrument for getting background tasks going, because the interface between the main thread and the worker threads is largely left up to you.
In Java you could use a ConcurrentLinkedQueue, and in Python you can use a JoinableQueue.
I am heavily interested in this space because I want to write understandable software that anybody can pick up and work with. I worked on a JMS log viewer that used threads but would crash with ConcurrentModificationException because it wasn't thread-safe. I made it thread-safe, but its performance dropped through the floor. In hindsight, I should have sharded each JMS connection topic to its own thread, or multiplexed multiple JMS topics per thread and looped over them. The main thread can interrogate a worker thread under a lock, which should be faster than every thread contending for the same lock. The work is driven by the main thread but done in the background, and the threads can keep the fetched messages in memory until the main thread is ready for them.
I think with the right abstraction, thread safety can be achieved and concurrency shouldn't be something to be afraid of. It is very difficult and challenging working at the low levels of concurrency such as a concurrent browser engine. (I've not done that though.)
This is why languages such as Pony, Inko, Cyber, Erlang, and Elixir are so promising. We can build high-performance systems that parallelise.
Writing an async/await pipeline that looks synchronous is far easier to understand and maintain than nested callbacks. So I can see where async is useful. I just hope we can design async software to be simpler to maintain and extend.
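The contrast with nested callbacks can be sketched in a few lines (hypothetical fetch/parse stages standing in for real I/O):

```python
import asyncio

async def fetch(url):
    await asyncio.sleep(0)      # stand-in for network I/O
    return f"body of {url}"

async def parse(body):
    await asyncio.sleep(0)      # stand-in for further async work
    return body.upper()

async def pipeline(url):
    # Reads top to bottom like synchronous code; no nested callbacks.
    body = await fetch(url)
    return await parse(body)

print(asyncio.run(pipeline("example.com")))
```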
> Where I'm particularly unconvinced of their use is in server side view and api end point processing. The majority of the time you have maybe a couple of IO opps that depend on each other. There is often little than can be parallelised (within a request) and so there are few performance gains to be a made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.
Python doesn't have multithreading that scales or supports real parallelism. asyncio has very measurable performance benefits for exactly that use case you've mentioned versus threaded servers.
Sorry, that's not accurate. Asyncio and threading offer the same variety of "parallelism", which is that both can wait on multiple I/O streams at once (the GIL is released while waiting on I/O). Neither offers CPU parallelism, unless a lot of your CPU work is in native extensions that release the GIL. In that unusual case, threading would offer parallelism where asyncio wouldn't.
Asyncio's single advantage is that you can wait on lots of I/O streams, like many thousands, very cheaply without having to roll your own non-blocking I/O queueing code.
I didn't say that asyncio offered parallelism, I'm pointing out that normal assumptions about multithreading you'd make with other languages don't always apply to Python. You'd typically assume that threads offer parallelism, a property you might choose to use them for over something like single-threaded asyncio.
I've found that for even IO bound workloads, the amount of throughput plateaus when using a relatively small amount of threads despite the GIL being released on IO.
sorry for misreading that from your post! my own benchmarking of threads vs. asyncio has never found much performance difference between the two approaches (asyncio slightly slower). if you need very wide throughput, then yes, asyncio is better. otherwise, it's very difficult to create equitable comparisons between threaded and asyncio code.
Haha, it's cool. I once had a problem where I needed to scrape a bunch of sites at once, and ended up using it as a benchmark for threads vs asyncio. The overhead with threads was high even when using a handful of threads, and quickly hit a plateau after scaling to 4-8 threads on a 16 core machine. I was a big fan of the concurrent.futures executor model and generally thought it was good performance anyway, but figured I'd try to implement it in asyncio, and rate limited both for a fairer comparison. It was so shockingly fast that it was the last project I reached for threads again with on Python. I can't say that it was an equitable comparison, but it certainly left an impression on me.
it was probably whatever concurrent.futures does. I use plain threads and get great results for anything under 25 threads on a small laptop. you always keep the same set of threads running in a pool and use a decent queueing model.
certainly, doing that kind of thing with asyncio likely has a lower learning curve because you don't need to worry about "pools of workers" so much. concurrent.futures should do that for you too, but that's also the first component I wouldn't trust.
> GUI toolkits (like Textual) however are a really good use case for Asyncio.
Only if the GUI toolkit is explicitly written to be asyncio-aware and use asyncio's event loop. Textual appears to be written specifically to do that.
However, other GUI toolkits that I'm aware of that have Python bindings aren't written that way. Qt, for example, uses its own event loop, and if you want anything other than a GUI event to be fed into Qt's event loop so your event-driven code can process it, you have to do that by hand and make sure it works. There is no point in even trying to use another event loop, such as Python's asyncio event loop, since that loop will never run while Qt's event loop is running.
>Where I'm particularly unconvinced of their use is in server side view and api end point processing.
Sure, performance isn't going to get better, but for websockets and server sent events the occasional long-lived async task can be great. Especially when you need to poll something, or check in on a subprocess.
These days, with GraphQL or complex microservice architectures, you could have multiple hops to fulfill the original request.
Sync Flask will hold that thread hostage until the request is done, whereas async, with properly used async libs, will allow other requests to be processed.
We often have medium-sized reports that take seconds. That's a lot of time to wait, and handling it synchronously would just end up bloating your service's scaling to handle more connections.
Any service with decently long lived network requests will benefit from event loop handled scheduling.
Me too, but threading is botched in Python, and not just because of the Global Interpreter Lock. Some Python packages are not thread-safe, and it's not documented which ones. Years ago I discovered that cPickle was not thread-safe, and that wasn't considered a problem.
You can still have thread safety issues with the GIL in place, because globals and other data is shared between threads.
For example, you can put a dictionary at the module level; thread A can set a key in that dictionary like "name", thread B can overwrite it, and then thread A comes back, does dct["name"], and gets an unexpected answer.
This is a relatively easy mistake to make, a lot of python code has module level variables.
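The hazard described above can be made deterministic with a barrier (a contrived sketch; real races are timing-dependent and much harder to reproduce):

```python
import threading

shared = {}
barrier = threading.Barrier(2)
results = []

def set_then_read(value):
    shared["name"] = value
    barrier.wait()            # ensure both threads have written
    # Both threads now read whichever write landed last, so at
    # least one gets back a value it did not write.
    results.append(shared["name"])

threads = [threading.Thread(target=set_then_read, args=(v,)) for v in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The GIL makes each dict operation atomic, but it never protects the *sequence* "write, then read back" from interleaving.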
I really don't get the whole coloured function thing. How's it not just the function signature? You might as well claim a new function argument makes a new colour. Granted, all of my use of async is in Rust in which the compiler picks this stuff up, so maybe in python there are other concerns I'm missing.
I think the "new function argument makes a new colour" is accurate on several levels. async/await is a dual for streamlining working with the Promise or Future or Task Monad (however you want to call it). (And async/await syntax in most of the languages that have it can actually be [ab]used for a substandard "do-notation" for nearly any Monad you want to use.)
At face value, yeah, every function in Haskell that accepts an IO monad is now "IO colored", but at the same time, that's a silly way to look at it. It's just extra type information and type bindings flowing as types flow through functions. It's just a bit of a tautology that functions that deal with other functions that need that type need to deal with that type themselves.
Functions that use Maybe/Option/nulls are all "nullable colored". Functions that use or return integers are clearly "integer colored". That's what programming languages do: they try to track how your types flow through functions. "Coloring" is a bad metaphor or at least a useless one, we just call that "types". Admittedly, I think that's why Python and JS users predominantly use the "what color is your function" complaints the most because all of the rest of typing information for them is generally opt-in and easily ignorable/forgotten.
I am not sure I agree that GUIs are a good use case for async. A human interaction with the program must almost always pre-empt whatever the program was running, so I cannot see how a cooperative multitasking runtime like async Python can work in such a scenario.
The responses here make me think most commenters don't have experience with this particular footgun.
To clarify: Python can gc your task before it starts, or during its execution if you don’t hold a ref to it yourself.
I can’t think of any scenario in which you’d want that behavior, and it is very surprising as a user.
Python should hold a ref to tasks until they are complete to prevent this. This also “feels right”, in that if I submit a task to a loop, I’d think the loop would have a ref to the task!
It’d be interesting to dig up the internal dev discussions to see why the “fix this” camp hasn’t won.
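One way to see that the loop's reference really is weak (a CPython-specific sketch; exact collection timing is an implementation detail, so this only demonstrates the finished-task case):

```python
import asyncio
import gc
import weakref

async def work():
    await asyncio.sleep(0)

async def main():
    task = asyncio.create_task(work())
    ref = weakref.ref(task)
    await task                 # the task has finished at this point
    del task                   # drop our only strong reference
    await asyncio.sleep(0)     # let the loop release its internal handles
    gc.collect()
    # The event loop's WeakSet does not keep the task alive.
    return ref() is None

print(asyncio.run(main()))
```

If the loop held strong references until completion-and-retrieval, as the comment proposes, a still-running task could never disappear this way.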
I can see this behaviour being useful if you are no longer interested in the result of a "pure" task. For example, imagine fetching some data via HTTP: if you no longer need the response, canceling the request could make sense.
But I agree that this is unexpected, and most code probably isn't ready to be cancelled at arbitrary points. (Although I guess in Python your code should be exception-safe anyway, which is often a similar requirement.)
If you want to avoid an expensive operation when the result is no longer needed, then surely you'd want to cancel the task as soon as you know that, not at some indeterminate time when the GC runs - so I don't think this behavior makes sense even for that scenario.
That would make tidying up rogue tasks impossible. Of course we all like to think we do cancellation perfectly, but it's nice to know that the task scheduler has your back.
Edit: I don't quite understand why a user would expect a task to remain live _after_ the last reference to it has been dropped...
Because they have expressed the intent to run it by scheduling it on the event loop?
I don't follow the argument wrt tidying up rogue tasks. What does it mean for the task to be "rogue"? If there was some state change that made the task redundant - because it clearly wasn't when it was submitted! - then the code that makes that change, or some other code that observes it, should cancel it. If it isn't cancelled, the fact that nobody is able to observe the value that the task will yield is not sufficient to auto-cancel, as there may still be a dependency on side effects.
And, speaking of tidying up, what if the scheduled task is the one that performs some kind of cleanup?
One might argue that dropping an object should result in it being deallocated, as is normally the case with RAII languages, which in the case of a task is to be stopped and tidied up. Speaking from experience with both threads and async Rust tasks, I find the async case in which tasks are dropped when the references are dropped is much easier to work with (once I overcame my prior expectation based on threads). If you want to keep a task alive, then explicitly keep it alive by holding on to its handle.
If you have a scheduled task to clean up, then you need to manage it at whatever level that occurs. You signal to notify completion of clean-up to the top (or whatever) level. It's no different to signalling the other way to notify of shutting down.
A possible solution might be a built-in dunder flag to explicitly tell the interpreter not to get rid of a specific object. Something like __keep_alive__ or similar.
The notion of fire-and-forget is itself the problem. Even with threads, you should have them join the main thread before the program exits, which implies you should hold strong references to them until then. Most people don't go out of their way to do this even when they can, but that's what you're supposed to do.
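The discipline described - hold strong references, join before exit - looks like this with plain threads (a minimal sketch):

```python
import threading

results = []

def worker(n):
    results.append(n * n)   # list.append is atomic under the GIL

# Hold strong references to every thread we start...
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
# ...and join them all before the program exits.
for t in threads:
    t.join()
print(sorted(results))
```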
I came here to write this comment. Also, you usually need to have some means of canceling the task -- otherwise you have to wait for them to finish, or you leak these stray lost tasks that are doing stuff, like manipulating the state of things.
Let's say I write a task that updates a progress bar in an infinite loop, and let it be gc'ed on program exit without ever joining it. What's wrong with that design? I can, of course, modify the task to check a flag that indicates program completion and exit when it's set. But does this extra complexity help the code quality in any way?
Or suppose I spawn a task to warm up some cache (to reduce latency when it's used). It would be nice if it completes before the cache is hit, but surely not at the cost of blocking the main program. I just fire-and-forget that task. If it executes only after the cache was hit, it will realize that, and become a no op. Why would I want to join it at the end? It may not be free (if the cache was never hit, why would I want to warm it up now that the main program is exiting?).
I might not have an answer you'll find convincing, since this is somewhat subjective, as the "should" here isn't based on a "need". The best analogy I can give here is that you don't "need" to, for example, avoid circular references in Python. Some believe it's better design to avoid them if you can. I find value in making things, say, more predictable and deterministic when possible.
Like what, for example? Well, one very simple yet practical one is that when you break into your program with a debugger, you want to minimize noise - any thread or object that's alive unnecessarily is, at the very minimum, extra noise for you to deal with, and at worst, extra surface for a bug to creep in. Moreover, the liveness of the thread/object could provide you with a vital bit of information that you otherwise wouldn't get. Another one is the fact that it lowers the number of obstacles you'll have in the future if you ever want to do something less common - such as suspending a GC, snapshotting the program state, or any number of less common things. Yet another one is the very fact that following the pattern more broadly helps you and future maintainers avoid pitfalls that arise in similar abstractions, like they did here. I could go on, but all of these concerns are basically a bunch of things whose values mostly lie in the potential future, not the present.
There are many other more practically-minded folks who believe the presence of a GC exempts them from caring about such concerns, and see these as adding extra complexity. If you see it that way, I don't have a compelling rebuttal. But if "complexity" is your criterion, perhaps what I can offer is that you can also view it from the opposite standpoint: following the fork-join pattern (or avoiding circular references, etc.) itself avoids complexities that arise from not doing so [1], such as those in the previous paragraph. It's just that not every form of complexity or cost materializes immediately.
[1] Note that complexity is not just a measure of code size, but also the deviation of its behavior from expectation. You can make code more complex to reason about merely by deleting some lines, and that could include a thread.join() call.
You make very good arguments in favor of joining threads in most cases, and I completely agree with you. Perhaps the only disagreement we (may?) have is that I think these arguments may not apply in some cases.
In my first example, I would probably find not joining the thread cleaner than joining (since it would require extra code to rewrite the infinite loop into something joinable, and since the earliest time I can join is at the very end of the program anyway).
In my second example, your arguments are persuasive. It is very likely that there is a place in the program where the cache warming is no longer a good idea (for example once the real traffic started hitting the cache, it's probably too late; in fact warming up at that stage is probably a bug, since it may divert resources from serving the actual user traffic). So yes, in my second example, I now think it's better to either join or cancel the task.
Thanks! Regarding your first example, I'm not entirely sure I understand it. If you have a task that updates a progress bar as an infinite loop... does that mean the task never finishes? What does "progress" even mean for something that goes on infinitely long? What happens if the GUI is destroyed in the middle of that thread's lifetime (which it will be, by the main thread, if the secondary thread runs forever)? How many threads do/should you end up with if you later realize you want to run your program itself (i.e., your main() function) multiple times?
The main cases I can think of where joining might not make sense is when you simply don't have the capability to do so in a reasonable manner, like when the main thread is in third-party code that you have no control over. Otherwise, if I understand the example correctly, you absolutely need to join such a thread - and not merely at program exit, but sometime before the GUI is destroyed.
Conveniences like this library and other threading libraries make it easy for people to trivialize something (concurrent programming) that ought not be.
This. Even if you hold a reference to the task, your program very likely has a bug. At some point you should always await it to see if it failed or not.
It's easy to miss this if you observe completion via a side channel, for example an item removed from a queue. But that's also a bad way to write tasks in the first place: let them return meaningful data rather than mutate shared objects. That way you are forced to await them, and your code becomes much more straightforward. It's counter-intuitive at first if you think in threads, because there you are more used to worker pools and such, whereas asyncio tasks can be written in a more linear way and don't need background workers to the same extent.
After having made this mistake several times, I've concluded one should almost never use bare create_task. It's much better to place tasks into a top-level list of background tasks that is always awaited; that way they are both started automatically and always awaited, so errors surface appropriately.
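That pattern can be sketched as a small helper (TaskRegistry is a hypothetical name for illustration, not a real API):

```python
import asyncio

class TaskRegistry:
    """Owns every background task and awaits them all at shutdown,
    so no task is garbage-collected early and no error is dropped."""

    def __init__(self):
        self._tasks = []

    def spawn(self, coro):
        task = asyncio.ensure_future(coro)
        self._tasks.append(task)   # the strong reference lives here
        return task

    async def drain(self):
        # Surfaces any exception a fire-and-forget task raised.
        results = await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
        return results

async def main():
    reg = TaskRegistry()
    reg.spawn(asyncio.sleep(0, result="a"))
    reg.spawn(asyncio.sleep(0, result="b"))
    return await reg.drain()

print(asyncio.run(main()))
```

Calling `drain()` at shutdown is the "always awaited" part; skipping it reintroduces the original bug.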
> At some point you should always await it to see if it failed or not.
What you're saying is correct, but it doesn't quite imply what I'm saying. I'm saying everything that you spawn asynchronously (be they threads, tasks, whatever) needs to be joined - even if they're no-ops whose success or failure is irrelevant. This is similar to how you should always deallocate memory that you dynamically allocate whenever you can, as a matter of good practice and good hygiene. Sometimes you can get away with not doing so, but you shouldn't really skip it unless you have no choice, as it makes the program logic clearer and can make the program more robust too. (e.g., imagine running your main() in a loop where threads are spawned each time but never guaranteed to join.)
one of the reasons i was never "bitten" by this bug: whenever i use python tasks, i save them in a collection so i can cancel all of them when the program is ready to quit.
Wow. What a strange design decision, as evidenced by sheer number of developers who don't / didn't know about this (myself included). I hope this gets fixed instead of just documented.
Agreed, I’m really surprised at all the comments defending this behavior. I suspect there is a non-obvious reason why it’s this way, but “you should’ve read the docs” and “but why wouldn’t you hold your own strong reference” are weird takes IMHO.
Bugs stemming from the architecture of a poorly specified system become insecurities for the people who rely on that system.
One of the major reasons why Python’s leadership refused to optimize python’s performance, besides Guido’s intransigence, was because they treated the CPython implementation as the specifications.
Legions of script kiddies built their programming identities around the belief that python must be slow, because to admit otherwise would require changing the system.
It was the Python community that consistently treated CPython behavior as the spec, in many cases contrary to explicit statements in the official docs to the effect that "this is a CPython-specific thing" etc, the most notable example being reliance on deterministic refcounting and on the GIL.
But that's not what makes performance hard to fix. It's rather the fact that most Python code out in the wild depends on packages written in native code, and the CPython ABI for said packages exposes way too many implementation details. If you ditch ABI compatibility, you can ditch GIL, for starters (see e.g. IronPython and Jython) - but few people are willing to make do without all the affected packages.
Blaming the followers for not reading the docs is an easy excuse, when the real problem is that the leaders have failed to define Python. The PSF needs to decide what precisely Python is and what it isn't among themselves. Because they did not do so, they can't tell if something is a bug or intentional. See this entire thread as proof.
In lieu of a formal definition, the maintainers resort to a hodgepodge of user docs, PEPs, mailing lists, and the "reference implementation". Worries about making that "reference implementation" more complex stymied Python's development. There's been flamewars about this topic with the PSF and their apologists.
On that note, Rust encountered a similar problem of bad decision-making by maintainers, who also were opposed to specification. They have ejected those maintainers and replaced them with ones who understand the need for a formal definition of their language.
I'm not arguing against the notion that Python doesn't need a more rigorous language spec. But I don't think that's the blocker for alternative implementations with significantly high performance - the parts that are well-documented already provide a lot of leeway to optimize if only user code didn't rely on it.
And if users don't read the docs today, I don't see why they'd suddenly read a formal spec tomorrow if one is available. The problem is that Python got successful specifically in form of CPython, making the latter a de facto standard whether it wants it or not. I understand the users, too: why should they bother writing implementation-agnostic code if they're planning to run it on CPython anyway, and the vast majority of developers who might want to reuse it will likely do the same?
The number of developers defending this decision may also partly explain the decision. Developers' skill has many dimensions that are not perfectly correlated. It's quite possible that even among the (otherwise highly skilled) python core developers, few understand (or care) how API design affects reliability of code written by the library users. And the few that do, have hundreds of higher priority issues to deal with.
Not just that, it is also annoying to work around. Sometimes you really do want to kick off a background task and have it run to completion, even if you can't keep a handle on it, just like a thread. This is surprisingly hard to do.
I have a very similar helper in multiple codebases in production. It has some additional bits to log errors and to handle tasks that should never complete (basically a flag to call sys.exit if they do) but it's basically this.
I don't understand why threads get a background mode but tasks can't be fire-and-forget.
Yes, the base async interface is confusing and overly complex. It's a downside! As they note lots of people have stepped in to provide better helpers (like TaskGroups) - but these are the docs for the base library!
> But who reads all the docs? And who has perfect recall if they do?
Everyone reads the docs? That is why you don't need perfect recall because you can read them whenever you want.
Python has lots of confusing corner cases ("" is truthy, you need to remember to call copy [or maybe deepcopy!] sometimes, all the other situations where you confuse weak v.s. strong references). They cause really common bugs. It's just a hazard of the language in general and the choices it makes (much like tasks being objects is a hazard). I do understand why people think they can throw away task references (based on other languages) - but this is Python! The garbage collector exists and you gotta check if you own the object or something else does.
Edit: this feels like an experienced Python developer, who has already internalized all the older, non-async Python weirdness, being taken aback by weirdness they didn't expect. Like, I feel you, it does suck - but it's not a bug that values you don't retain may get garbage collected.
For Python? The language where everyone just cobbles together random code from the internet and other repos? I can totally see how this mistake happens left and right. The bar of entry for this language is way too low to assume only rigorous senior devs use it.
The author goes on to say they found this pattern lurking in various projects on github. So, no. The problem is that this behavior is subtle, not intuitive, and unless you are reading the actual documentation top to bottom (and not just the function signature and first paragraph from the pop up in your IDE) you will likely get bitten by this.
What is the point of your comment? The author shouldn't have called out the upturned rake in the darkened shed?
I wouldn't say shouldn't - they are free to do what they want. But this is a blog post about something that can trip you up that the docs highlight - which the author calls a "heisenbug". The author doesn't even have a suggestion for the docs, which already calls out the problem they encountered, they just note that there are helpers for this problem (which is true).
The point of my comment is that subtle, non intuitive things like this are all over Python and, while this one is particularly bad, this blog post makes it seem like more of an aberration than it is.
> The author goes on to say they found this pattern lurking in various projects on github.
I'd call it an anti-pattern. If you spawn a process/thread, and never wait/join it, it means you don't actually care what it does, if it crashes, etc. I don't see a problem with Python's behavior here.
Seriously?? I think the vast majority of developers would find it very surprising that the Python runtime would GC a task in the middle of execution. I would expect that the runtime would by default do what the example in the doc says, which is keep a strong reference to the Task object until it finished execution.
The chances are that most people doing this are introducing some nondeterminism that they did not expect, and will have a hard time dealing with it when it bites them in the ass.
What's more to the point is that I am going to have a hard time when it leads to a serious outage or security violation at some major corporation that has become too pervasive in its reach for me to avoid its influence. No amount of schadenfreude is going to compensate for that.
This is unfortunately all too common. I am often dinged by my manager about my throughput when investigating the integration of new libraries because I read the docs, front-to-back, as part of my research and that takes time.
I don't think it's really appropriate or feasible or even helpful for most things. At least for myself, if I read through the entirety of a language's documentation I wouldn't remember half of it, and I probably wouldn't need everything I remembered.
Language and library designs should optimize for least-surprise.
Most libraries are quite poorly documented (to my taste, anyhow). It is certainly worth spending the time reading at the very least documentation pertinent to the components of the library intended for use.
Considering how many times I need to add site:python.org to my python search queries to actually get to the docs, I assume that a surprisingly low number of python developers actually read the docs.
I think you may be too bold with the assumption here, personally I would wager that the majority of people who write Python don't even know Python has official docs outside of a site called Stack Overflow.
Wow I've heard people say that everyone should read all of the docs (which isn't really true) but I've never heard anyone claim that everyone does read all of the docs! Wild.
Oh, sorry, you are right - "" is false-y, even though it's a valid empty value. So it's hard to tell the difference between a value not being filled and a value being filled with an empty value.
I guess I am too deeply in the Python ecosystem to see a problem here. Unless you want to check for the existence of "I exist"? In which case, the Python Way would be
    answers = {}
    answers["I exist"] = ""
    if "I exist" in answers:
        print("a")
It's not a problem? The async interface isn't a problem either. It's just a thing you have to remember about python: "most input is truthy except for the input that isn't"
"Most of the time you don't disrupt your program by not keeping the returned reference in scope except for when you do"
Truthy is a Pythonic core principle of the language. It is not an edge case phenomenon in the language which I would expect a regular practitioner to confuse.
What I learned when I wrote Python professionally was “never rely on truthiness”: explicitly writing out a boolean expression that does what you want is more explicit (“explicit is better than implicit”, PEP 20) and prevents a whole class of bugs down the line.
PEP8 is touted a lot as if it is a perfectly correct tome of ... correctness. I've worked in Python long enough to know that it both doesn't cover everything and the advice is sometimes actively bad.
I mean, I'm not blaming PEP8 per se (A Foolish Consistency is the Hobgoblin of Little Minds) but it has a tendency to be taken as gospel by a lot of people
Funny enough, those who push for more strict adherence are also the ones that neglect other aspects (speed, alg. complexity, etc) especially readability (and no, PEP8 compliant code is not necessarily the most readable)
The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.
Those values tend to propagate out and make a mess, sneaking into your stored data or causing errors in unexpected places -- fixing this feels like one of the main advantages of using things like typescript, and it's just not an issue in python.
It's not a problem, because the alternative is way worse. It's just a different language design than the one you're used to.
You could argue about whether or not truthiness makes sense as a concept (personally I think not), but the way it's defined in python is quite coherent and useful in practice.
> The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.
You get that behavior by using dict.get(): answers.get("I exist") returns None, which is falsy.
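Concretely, with the `answers` example from above:

```python
answers = {}
answers["I exist"] = ""

assert "I exist" in answers                    # the key is present...
assert not answers["I exist"]                  # ...even though its value is falsy
assert answers.get("missing") is None          # .get(): falsy sentinel, no KeyError
assert answers.get("missing", "n/a") == "n/a"  # or an explicit default of your choice
```

So Python gives you both behaviors: `[]` raises on a missing key, `.get()` returns the JavaScript-style sentinel when you ask for it.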
Right, sometimes you want that. I suppose my point was that the definition of truthiness works well with the default, most obvious way of indexing a dictionary, which was the original complaint.
Perhaps I was a bit harsh -- that exposes an issue which does trip people up, where they use `if x:` when they mean `if x is not None:`.
Saying that, I think it's defensible in the same way. The fact that you can write `if x:` (where x is not a bool) tells you that the language has a concept of truthiness, and so maybe you should have a think about what that actually means.
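A small illustration of the distinction (a made-up `describe` function for demonstration):

```python
def describe(x):
    # distinguish "no value" from "empty value" explicitly
    if x is not None:
        return "has a value"
    return "missing"

assert describe("") == "has a value"  # falsy, but not None
assert describe(0) == "has a value"   # likewise
assert describe(None) == "missing"
```

Writing `if x is not None:` says what you mean; `if x:` silently lumps `""`, `0`, and `[]` in with `None`.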
The more of these things there are, the more brainpower you devote to remembering the right way to do things; if you don't you introduce bugs, a subtle, painful one here.
I mean, that's the fundamental reason it's called falsey, that is, Python does automatic type coercion if it is evaluating things in a boolean context. FWIW this is the exact same behavior in javascript.
He didn't even have to read "all the docs" - just the ones that pertain to the function that he is using. And then not ignore the section marked "Important" and the highlighted "Note".
That commit is adding the same disclaimer for the `shield` function, the original one-line mention for `create_task()` is a little older, but also only November 2021, so during 3.11 development...?
It is for this reason in Temporal Python[0], where we wrote a custom durable asyncio event loop, that we maintain strong references to tasks that are created in workflows. This wouldn't be hard for other event loop implementations to do too.
I mean the default asyncio event loop can be replaced/extended where you won't have to know/remember on each create_task. But yes, it is an unintuitive default.
Does anyone understand why the event loop only keeps weak references to tasks? It'd seem wise to do something to stop it from being garbage collected while running, maybe also while waiting to run.
Because it's almost always the case that the consumer is going to keep a reference to the task in some way, so that is the logical choice for the "primary owner" of the task. Python doesn't have ownership per se like rust, but if you keep more than one hard reference to an object around, it'll prevent collection, so in cases such as this it makes sense to designate one primary owner and have all other references be weakref.
Python's reference counted - if the event loop holds a reference until the task has run, then drops it, then everything behaves sanely. That's not a cycle. It just means the task that was scheduled will execute, which seems like the right default.
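The mechanism is easy to demonstrate with a stand-in class (not asyncio's `Task`) and `weakref`, which is roughly what the event loop's `WeakSet` holds:

```python
import weakref

class Job:
    pass

job = Job()
ref = weakref.ref(job)   # a weak reference does not keep the object alive
assert ref() is job      # alive while a strong reference exists
del job                  # drop the only strong reference
assert ref() is None     # refcounting reclaims it immediately, no cycle needed
```

When the loop's `WeakSet` is the only thing "holding" your task, this is exactly what happens to it.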
Only guess I’d have is to protect the system against infinite-loop tasks, but I don’t remember any other runtime caring, and a task which never terminates seems easier to diagnose than one which disappears on you.
And this is why trio got it right, and why I think the task groups (nurseries from trio) can't arrive soon enough in the stdlib.
Because not only you must maintain a reference to any task, but you should also explicitly await it somewhere, using something like asyncio.wait() or asyncio.gather().
Most people don't know this, and it makes asyncio very difficult to use for them.
> task groups (nurseries from trio) can't arrive soon enough in the stdlib
Please, no. Asyncio is horrible, and bodging it to make it less horrible just means we will be forced to live with the remaining horror. Far better to replace it with something that works properly (yes, Trio).
I'm using a lot of `asyncio.get_event_loop().create_task(...)` calls without assigning the task to a variable, but the docs [0] don't mention anything regarding this method on the loop object. Can I assume that I'm safe?
I'm pretty sure that's equivalent to asyncio.create_task(), it just gives you the opportunity to specify a different loop if needed. I think the docs are just less explicit because they're aimed more at power users (since most people don't deal with multiple event loops).
He doesn’t really get into what makes this a Heisenbug, only that it’s indeterminate in nature. Would attaching a debugger/stepping through the code make it less likely that your task would get garbage collected out from under you?
Yeah, he seems to be re-defining the term to mean "a bug that occurs occasionally depending on system state" as opposed to "a bug that changes behavior when you observe it closely e.g. in a debugger."
TIL. I guess I assumed it would hew more closely to the Uncertainty Principle.
Edit: actually, come to think of it, I first heard of it in about 2006 from Jamie Brandon and at the time assumed it was something he'd made up. For a second there I forgot that 2006 is more than 10 years ago! (It was a python bug that went away when run in a debugger.)
CPython does most of its memory management by reference counting, which fails to reclaim circular structure. So to make sure it gets everything, it occasionally runs a conventional tracing GC. If the GC happens to run just after you create that async task, the task itself can get collected, it sounds like. It's good to know about this and is (my own editorializing) yet another reason Python3 should have used Erlang-style concurrency instead of this async stuff.
Thank you! I just did a quick `git grep` in a work code base and found one clear instance of this bug, and two more locations where I'm not 100% certain whether references are kept around long enough. Made a note to open a bug on Monday :-)
Another surprise in python's base library:
I knew that re.search searches for a regex match in a string, so I thought that re.match would match the whole string. I was wrong, re.match only anchors the regex at the start, not the end. re.fullmatch anchors it at both sides.
I felt very stupid when I found out; I started at my current work as a Perl developer and learned Python for a new job; but there are two more Python developers (with previous Python experience outside this company) on the same project, and none of them noticed the mistakes I made based on this misunderstanding.
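The three anchoring behaviors side by side:

```python
import re

s = "2023-01-15 and more text"
assert re.search(r"\d{2}-\d{2}", s)                    # matches anywhere in the string
assert re.match(r"\d{4}-\d{2}-\d{2}", s)               # anchored at the start only
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is None   # anchored at BOTH ends: no match here
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2023-01-15")
```

If you want whole-string matching, `re.fullmatch` (added in Python 3.4) is the one to reach for; `re.match` alone will happily accept trailing garbage.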
I think asyncio is kind of neat for what it's good at, but beginner programmers who have never written code before are going directly to using Python asyncio (I know this because they tell me so when they post sqlalchemy discussions). This is just wrong.
That's extremely insidious. I suppose I never encountered this issue because I almost always call asyncio.gather(*), which makes having a collection of tasks natural.
Another common async footgun I see is unthrottled gathering, and no throttling mechanism in the standard library. Once you gather an unspecified number of awaitables, bad things start to happen, either with CPU starvation, local IO starvation, or hammering an external service.
What I like about threads is they make dangerous things like this harder, and you have to put more thought into how much concurrent work you want outstanding. They also handle CPU starvation better for things that are latency-sensitive. I've seen degenerate requests tie up the event loop with 500 ms of processing time.
Huh! Unless you're using semaphores, you can also recreate a similar situation with threads. Spin up a whole bunch of threads and send all of them towards some shared object, or make 100s of requests with them.
There's not much difference between spinning up threads explicitly and creating async task with asyncio.create_task. In either case, you can throttle them with semaphores.
I don't have a source or affected versions, but semaphores can scale poorly. I vaguely remember each blocked acquire getting checked on every event loop iteration, or something silly like that.
Python has a few weird issues like this. The last one I encountered was with a class inheriting Thread, join and the SQL Server ODBC driver on Linux. Fairly sure I hit page faults thanks to a shallow copy on driver allocated string data but didn't have the time to investigate like the hero of this blog post.
wow. yeah, this absolutely explains a heisenbug that i've been chasing for a while. and i can't count the number of times i've had that exact doc page open on my screen in the last few months, and never bothered to read that block of text that starts with "important"...
asyncio.create_task() doesn't exist in 3.6, and I can't find the string "to avoid a task disappearing" in the doc, so I'll go out on a limb: there is no such doc. However I see the reference to weakref.WeakSet.
Some of us have been writing python since 2.x, and quite unsurprisingly wrote asyncio code at 3.6, and still happily support it. Some of us have even asked on HN about maintaining compatibility backwards and forwards between 3.6 and 3.11.
The documentation didn't exist at 3.6, when I wrote the code. I went and checked the source code, and the documentation and reported my findings. Good to know that there's a potential problem, don't you agree? What would you do differently?
I've got to say, I've never actually noticed a problem with "fire and forget" although I use it for more or less disposable tasks to begin with.
However, I've spent some more time looking through asyncio.base_events and
* BaseEventLoop._scheduled is a list()
* BaseEventLoop._ready is a deque()
There is no change between 3.6 and 3.11 in this regard. So this could be a nothingburger if you don't use asyncgens. OTOH I suppose better safe than sorry; the only question is whether no code addressing it is more mentally taxing for the bystander than having code and trusting its implementation.
Java had a similar but inverse problem in early versions. A counter-intuitive behavior that bit people and caused leaks.
If you instantiated a Thread, and then start() was never called, that thread object would leak. And thus potentially an entire graph of objects, via the references chains beneath it.
Obviously a thread that is never started seems pointless by design. But it could happen easily if, for an example, an error happened or an exception was thrown at some point between the instantiation line and the call of start().
The root cause was because Sun's programmers had made the early implementations of Thread get added to a ThreadGroup by default, under the hood. What would happen is that ThreadGroup stayed alive/reachable and thus it kept your app's thread object reachable too, and thus the GC would never clean it up. It was never eligible.
It ended up being the cause of a few weird leaks we saw in production.
IIRC in Java 1.4 or 1.5 Sun fixed it by ensuring the thread got cleaned up in those cases.
There's a GitHub issue to fix this, arguing that the doc fixes are insufficient, consider it to be a design mistake, and argue that it can be changed without breaking backwards compatibility (current GC behavior is not deterministic): https://github.com/python/cpython/issues/91887
Latest reply from GvR is invoking Chesterton's Fence. Here's to hoping the devs can quickly figure that one out and get this fixed. Per the linked issues [1] even the stdlib asyncio implementation was affected.
Hey, at least it's documented... good developers actually RTFM.
I can't comment on the design of this API, because I don't feel like learning the library, but in some performance critical applications these sorts of contracts aren't all that uncommon. Granted, this is python, I guess it's a bit more suspicious, IDK.
The documentation update is quite recent (Python 3.11). It was added after this ticket: https://bugs.python.org/issue44665 (not the first ticket around this problem).
His argument hinges on "I can't be bothered to read the docs on the stuff I'm using." So instead of reading the docs on coroutines and tasks before using them, he writes a rant about how it's all wrong because he didn't understand how it works.
On a more fundamental level, why would anyone assume that a coroutine is guaranteed to complete if it is never awaited? There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.
At least he bothered to make note of TaskGroups, also clearly shown in his documentation screenshot, immediately above the section marked Important that went ignored, and finishes with "As long as all the tasks you spin up are in TaskGroups, you should be fine." Yep, that's all there was to it.
> There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.
Isn't the point of create_task (which is what the article is about) to launch concurrent tasks without immediately awaiting them? The example in the docs [1] wouldn't work (in the stated manner) if the task didn't start until it was awaited.
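This is observable with a tiny sketch: the task runs as soon as you yield to the loop, before anyone awaits it.

```python
import asyncio

async def worker(log):
    log.append("ran")

async def main():
    log = []
    task = asyncio.create_task(worker(log))  # scheduled right away, not lazily
    await asyncio.sleep(0)  # yield to the loop once; the task gets to run
    assert log == ["ran"]   # it executed before we ever awaited it
    await task              # still join it to collect any exception
    return log

assert asyncio.run(main()) == ["ran"]
```

So a fully lazy scheduler would indeed break the documented semantics of `create_task`.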
> At least he bothered to make note of TaskGroups [...] Yep, that's all there was to it.
That only works on Python 3.11, which was released just a few months ago. Debian still uses 3.9, for example, so the TaskGroups solution can't be used everywhere yet.
The reason I said "on a more fundamental level" is that I'm not talking specifically about Python and asyncio, but coroutines in general. Even for Python, there are multiple event loop libraries available, they do not all work identically, which is why multiple ones exist. Someone here mentioned Temporal Python which works differently from asyncio, and would have avoided the author's problem. If you don't know how the scheduler works, you can't assume that a coroutine is guaranteed to complete just because you yoloed it into the scheduler, no matter how convenient that might be for you.
Yes, TaskGroups are a recent addition. If you can't use Python 3.11 for whatever reason, there is also the clearly written code sample at the bottom of the create_task documentation, which the author did not bother to mention. Probably didn't make it that far.
The oop model (or any "client" model of state) dependent on state outside your code being encapsulated/contained inside an object within your program is always confusing. It's not particular to this library or python.
A gui window handle within your program, or simply an open file handle, if the OS does something to your object, it's hard for you to know about it, and your object continually needs to refresh its state if you are concerned about it. I don't know what is referred to as a "task" in this case, but I don't think the lifetime of the actual task is the issue, it's the lifetime of your object.
It's always the case that if you instantiate an object with a ctor, you can't count on anything about it continuing if the dtor is invoked. The problem is much more general than this API, this library, this language, this use case. Just as you need to structure your C code so malloc and free will always match up spatially and temporally, you need to structure your oop code so ctors and dtors match up sensibly. Otherwise the confusion in your head will spread to your code.
And those who always want the compiler and tools to automatically do as much as possible to free them from the burden, are the ones who are most surprised by Heisenbugs. Computers can (or will try to) do anything, it's up to you to make sure it does what it needs to. Maybe someday AIs will do it better than us, but right now you need to provide the I.
Well, looks like I know what I am doing first thing on Monday. I converted a bunch of code to asyncio a while back. I have yet to run into any heisenbug in that code and want to keep it that way.
I've been working on a PySide6 application recently using asyncio. I read the docs but totally overlooked the requirement to hold references to tasks created with `create_task()`.
Why is this so common? Do people seriously not read a language/library documentation? That's the absolute first thing I do when evaluating a technology.
Because people have deadlines and need to get things working. You read enough to figure out how to do what you need to do and then mostly move on.
This function was added in 3.7 with no note on the importance of saving a reference. In 3.9 a note was added "Save a reference to the result of this function, to avoid a task disappearing mid execution." which was then expanded with the explanation of a weak reference in 3.10.
It absolutely is common. People see there is a len function that takes one argument, they call len(some_collection), see that it indeed returns the number of items in the collection like they expect and move on. They don't expect len to return a negative number instead on Thursdays, and of course it doesn't because that would be a pretty big footgun. People also see that there is a create_task function that takes a coroutine, they call create_task(some_coroutine), see that the coroutine indeed runs like they expect, and move on. Sure, you're supposed to await the result, but maybe they don't need the awaited value anymore, only the side effects, and see that it still works.
I've used asyncio.create_task forever, and I've admittedly never read its documentation in depth.
However, I've ALWAYS assigned the return value of create_task to some variable.
To me this is just a good programming practice. The OP says "tasks are not like threads - that you can just launch and forget" - no! Even a thread should not simply be launched and forgotten! You always need a reference to it, so when your application is terminated you can join all the threads that are still running and things can exit in a clean way! Same goes for the tasks: when your application exits, it's just a good practice to stop the event loop and cancel any pending tasks.
And, in general, one should always keep in mind the reference count rule in Python (the author incorrectly calls it "garbage collection" btw): if something you created in your function isn't referenced/assigned to anything, then its reference count will be zero, and it will be removed when the function stack unwinds. This is totally expected behaviour to me.
Thank you for this. This is really useful information.
I recently adapted some garbage collection code to add register scanning.
I can imagine all sorts of subtle bugs where things go away randomly. One problem I have with my multithreaded code is that sometimes a thread crashes and the logs are so long I don't notice. From my perspective the thread is just not doing anything.
Sometimes the absence of behaviour can be really tricky to debug!
I missed this in a little curses program launcher I wrote.
It looks obvious when he puts a big orange box around it, but in the actual docs it's an unassuming paragraph between two border-wrapped blocks with the only distinguishing feature being the bold "Important".
It should probably be referenced immediately next to the "Return the Task object" sentence.
I'll +1 the Trio shoutout [1], but it's worth emphasizing that the core concept of Trio (nurseries) now exists in the stdlib in the form of task groups [2]. The article mentions this very briefly, but it's easy to miss, and I wouldn't describe it as a solution to this bug, anyways. Rather, it's more of a different way of writing multitasking code, which happens to make this class of bug impossible.
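A minimal sketch of the task-group style (requires Python 3.11+, hence the version guard): the group owns the tasks, so there is no GC hazard, and the `async with` block implicitly awaits them all before it exits.

```python
import asyncio
import sys

async def fetch(i):
    await asyncio.sleep(0)
    return i * 2

async def main():
    # tasks are owned by the group and joined when the block exits
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(fetch(i)) for i in range(3)]
    return [t.result() for t in tasks]

if sys.version_info >= (3, 11):  # TaskGroup was added in 3.11
    assert asyncio.run(main()) == [0, 2, 4]
```

If any child task raises, the group cancels its siblings and re-raises, which is the structured-concurrency discipline Trio pioneered.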
Delphi had the opposite bug in its thread pool. The worker threads would dequeue a work item and process it in a loop. The work items were reference counted.
Now, Delphi doesn't have scoped variable declarations like say C++ or C#, so the dequeued work item was stored in a local variable. However, it didn't drop (nil/null) the work item reference before it looped. Thus it would hang on to that reference until the next work item got dequeued or the pool was destroyed.
The result was that if you in a function started a task which captured a local reference (f.ex. using an anonymous function) and then waited for it, that reference could live after your function returned if the pool didn't have anything else to do. Not what most people would expect.
If I want to create a task that runs even after the function returns, ie "async def f(): asyncio.create_task(coro=10_second_coro.run()); return;" is there any way to mitigate this? Function-scoped set of tasks?
Your task is implicitly not function-scoped, as you want it to survive exiting the function. What you're doing here would be architecturally better done with threads. async is not a direct replacement for threading.
But, you could also return the task object to the caller and have them manage it. There's also nothing async about your function, so you don't need the async or to await it.
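The "hand the task to the caller" option might look like this (names are illustrative):

```python
import asyncio

async def slow_work():
    await asyncio.sleep(0)
    return "done"

def start_work():
    # a plain (non-async) function: it only schedules the coroutine and
    # returns the Task, handing ownership (the strong reference) to the caller
    return asyncio.create_task(slow_work())

async def main():
    task = start_work()
    return await task  # the caller keeps the reference and joins the task

assert asyncio.run(main()) == "done"
```

Note `start_work` must be called while a loop is running, since `create_task` requires one; the point is that the task's lifetime is now the caller's explicit responsibility.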
A little pedantic, but the HUP (Heisenberg uncertainty principle) concerns the fundamental limits of simultaneously knowing a particle's position and momentum, not observation impacting outcomes.
The same problem or something similar exists in many languages. Threads are GC roots because the OS knows about them, but this may not be true for lightweight threads or async callbacks.
It is hard to fix because you don't want to introduce references from an old object (such as a list of callbacks) to many new objects, as that creates GC pressure and opens up many other potential leaks.
This article just makes me feel like Python, while a language with nice-ish syntax, was hacked together with little concern or thought for the real-world implications of poor design decisions like this async one (and also dynamic typing, a terrible thing in any language).
Most languages have something like this, usually around async.
For instance NodeJS has had a bit of this around promises, and eventually needed to institute the rule "if a promise rejects with an error, and nobody is around to hear it, we will crash your program on the assumption that you probably needed to clean up some resources but didn't and now they're going to leak. Listen to the error with a handler that does nothing, if we are wrong about that."
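Python's asyncio has a soft analogue of this: if a task fails and nothing ever retrieves its exception, asyncio logs "Task exception was never retrieved" when the task object is garbage-collected, rather than crashing. A minimal sketch, where retrieving the exception plays the role of the rejection handler:

```python
import asyncio

async def boom():
    raise RuntimeError("nobody listened")

async def main():
    task = asyncio.create_task(boom())
    await asyncio.wait({task})  # let it finish (it fails)
    # Retrieving the exception marks it as handled; skip this and
    # asyncio logs "Task exception was never retrieved" when the
    # task is collected, instead of crashing like Node does.
    return task.exception()

err = asyncio.run(main())
print(type(err).__name__)  # RuntimeError
```

The log-instead-of-crash choice means Python leans further toward silent resource leaks than Node's post-rule behavior.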
> "Async seems to be the first big "footgun" of Rust. It's widespread enough that you can't really avoid interacting with it, yet it's bad enough that it makes..."
I don’t find this behavior odd at all. Collecting unreferenced values is normal Python garbage-collector behavior. Threads are an exception (no pun intended), but they’re an exception in lots of ways - just try pickling them.
That seems very odd for a default behavior. There might be good reasons to allow GCing of a dropped task reference, but it doesn't seem like that would be the most common case.
In the many years since asyncio was added, I have never used it willingly, outside of the cases where a third-party library required it. There has never been a practical benefit to any of that stuff when compared to select. It always worked poorly and never justified the effort one has to put into writing code that uses the library. The behavior OP describes is just one of the many bad design decisions that are so characteristic of this library.
This one seems quite easy to fix - just have the scheduler check that task objects are still referenced from outside the event loop before running them, including the first time.