Hacker News new | past | comments | ask | show | jobs | submit login
Overhead of Python asyncio tasks (textualize.io)
147 points by willm 19 days ago | hide | past | favorite | 82 comments

Async Python is still confusing af - when do I need it, what happens under the hood, does it actually help with performance, sometimes the GIL comes into play and sometimes it doesn't, why do we ever use threads at all if there's a GIL, why is it called asyncio if we can use it for anything. My mind is kind of scattered and people seem to be using a lot of async Python for some reason.

Any good resources to clear things up?

I agree that especially within the standard context of python and its syntax, async seems weird (because it's sprinkled into an existing paradigm).

The best mental model for me always was to think: Here's an await, that means "interpreter, go and do something else that's currently waiting while this is not done yet."

And that's all about IO, because what you can wait on is essentially IO.
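A minimal sketch of that mental model (hypothetical names; `asyncio.sleep` stands in for real I/O):

```python
import asyncio

async def slow_io():
    # Stand-in for a network/disk wait; the await hands control back
    # to the event loop until the sleep completes.
    await asyncio.sleep(0.1)
    return "io done"

async def main():
    task = asyncio.create_task(slow_io())
    # While slow_io() is "waiting on IO", the interpreter is free
    # to run this code instead of sitting idle.
    other_work = "did something else"
    result = await task
    return other_work, result

print(asyncio.run(main()))  # ('did something else', 'io done')
```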

By the way, I really wish there was a better story for executors in async Python. To think we still have the same queue/pickle-based multiprocessing for sending async tasks to another core is kind of sad. Hoping for 3.12 and beyond there.

[edit] one really neat example that helped me get asyncio was Guido van Rossum's crawler in 500 lines of python [1]. A lot of the syntax is deprecated now, but it's still a great walk-through

[1] http://aosabook.org/en/500L/a-web-crawler-with-asyncio-corou...

It's great for when you can do concurrent I/O tasks. For example running a web backend, web scraping, or many API calls. FastAPI (and Starlette that it's built off of) is an async web framework and in my experience performs well.

Basically, your program will normally halt when doing I/O, and won't proceed until that I/O is done. During that halt, your program is doing nothing (no cpu being used). With asyncio, you can schedule multiple tasks to run, so if one is halted doing I/O, another can run.
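A rough sketch of that scheduling effect, using `asyncio.sleep` as a stand-in for an I/O wait: ten 0.2 s "requests" overlap instead of running back to back.

```python
import asyncio
import time

async def fake_request(i):
    # Stand-in for a network call: the task is "halted doing I/O" here.
    await asyncio.sleep(0.2)
    return i

async def main():
    start = time.perf_counter()
    # Schedule ten tasks; while one is halted, the others can run.
    results = await asyncio.gather(*(fake_request(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # ~0.2 s total, not 10 x 0.2 s
```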

Edit: And AFAIK, the GIL does not come into play at all with async. Only when multithreading.

> the GIL does not come into play at all with async.

In the sense that the GIL is still held and you can have at most one path of Python code executing at a time regardless of whether you use async or threads, sure. Most blocking I/O was already releasing the GIL so the difference is purely in how you can design your modules; for any reasoning about performance the GIL behaves the same way whether you use asyncio or not.
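A quick way to see that blocking waits release the GIL: two threads that block for 0.2 s each finish in roughly 0.2 s of wall time, not 0.4 s (here `time.sleep` stands in for a blocking I/O call):

```python
import threading
import time

def blocking_io():
    time.sleep(0.2)  # releases the GIL while blocked, like most blocking I/O

start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # ~0.2 s: the waits overlapped
```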

It makes concurrent programming much simpler than using threads.

Very little locking and care is needed with asyncio, as opposed to using threads. Race conditions are basically not a thing if you write reasonably idiomatic code.

It might be (or not) faster than using threads, but that's not the main benefit in my view - this ease of use is.

I've found SuperFastPython to be helpful for understanding Python concurrency: https://superfastpython.com/python-concurrency-choose-api/

It's basically a fancy way to do epoll() around file descriptors that hides the need to keep a main loop and state: your functions run with fake concurrency, stopping whenever they block and being resumed when their file descriptor has activity.
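A bare-bones sketch of that hidden main loop, using the stdlib `selectors` module with a `socketpair` standing in for a real connection:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
r, w = socket.socketpair()  # stand-in for a client connection
r.setblocking(False)

def on_readable(sock):
    return sock.recv(1024)

# Register the fd and the callback to run when it has activity.
sel.register(r, selectors.EVENT_READ, on_readable)
w.send(b"hello")

# The "main loop" that asyncio hides: wait for activity, then run
# the callback associated with the ready file descriptor.
received = None
for key, _events in sel.select(timeout=1):
    received = key.data(key.fileobj)

print(received)  # b'hello'
sel.close()
r.close()
w.close()
```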

It doesn't necessarily improve performance. It's just a much easier way to do non-blocking I/O (note that blocking and threaded I/O is easier to do but much much heavier).

In my personal experience, when writing anything more complicated using threads in Python, I end up with lots of boilerplate code that I wish I could just shove somewhere - Events, one-element Queues, ContextVar contexts, all of which don't benefit from being named and appear in only two spots in the code. Async removed a lot of this for me while also unifying the futures ecosystem and making my code more composable and easier to integrate.

When working in a fully async context, it becomes very logical and natural. A great example of this is the FastAPI web framework. You never write any top-level code, only functions that the framework calls, so you never have to deal with the event loop directly. You basically just sprinkle some async and await keywords around your IO bottlenecks and things suddenly run smoother.

> Async Python is still confusing af - when do I need it,

It's needed when you're spending a lot of time waiting for an I/O request to complete (network/HTTP requests, disk reads/writes, database reads/writes)

> what happens under the hood,


(Please read the entire blog post - It goes through the necessary concepts like generators, event loops, & coroutines)

> does it actually help with performance,

Refer to the first answer: You'll see improved performance if your workloads are mainly comprised of waiting for other stuff to complete. If you're compute-heavy, it'll be better to use the 'multiprocessing' library instead.

> sometimes the GIL comes into play and sometimes it doesn't,

The GIL comes into play when you have a lot of compute-heavy tasks: Otherwise, you'll rarely encounter it.

It's only when you have that many compute-heavy tasks that you start to use the 'multiprocessing' library.

> why do we ever use threads at all if there's a GIL,

Threads exist because they were there before asyncio & event loops came into Python (and they're still useful: most blocking calls release the GIL, so threads help with I/O-bound work and with blocking libraries).

> why is it called asyncio if we can use it for anything.

Its name came from PEP 3156, proposing the asyncio library back in 2012.


As for why, asynchronous I/O stands in contrast to synchronous I/O, where the program/thread has to wait for an I/O request to complete before it can do anything else. Making tasks asynchronous allows the program to do other stuff while it waits for a request to complete, increasing CPU & I/O utilization.

> My mind is kind of scattered and people seems to be using a lot of async Python for some reason. Any good resources to clear things up?

Highly recommend this video from mcoding: It's fairly simple & goes through a sample implementation.


Also, this article:


Async looks like parallelism, but it's just smart scheduling. The core concept of async is saying "hey, I'm waiting for something to complete that's not under my control (like waiting for data on a socket to be able to be read), go ahead and do other things in the meantime".

If you've ever coded sockets in C (and it's a good exercise to do so), you've probably at some point run across `select`, which is essentially a non-blocking way to check which sockets have data available to read, and then sequentially read the data. This gives a program the ability to appear parallelized in the sense that it can handle multiple client connections, but it's not truly parallel. Different clients can be handled at different times depending on the order in which they connect, which is asynchronous in nature (versus processing each client in sequence and waiting on each one to connect and disconnect before moving on to the next one).

Async in Python is basically this concept, with a core fundamental feature of time-limited execution. Functions can say that they are pausing for x seconds, allowing other functions to run, or functions can say that they give a certain function x seconds to run before resuming execution. If your async code (along with any library you may use) doesn't contain any sleeps or timeouts, it's exactly equivalent to synchronous code (since the event loop never really receives a message that it can suspend a routine or cancel it). With sleeps and timeouts, you gain control over things that can potentially block, both from a caller perspective of not having a function call block your own, and from a callee perspective of not making your function blocking.
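The caller-side timeout described above can be sketched with `asyncio.wait_for`:

```python
import asyncio

async def slow():
    await asyncio.sleep(10)  # something that may block for too long

async def main():
    try:
        # Give slow() 0.1 s to finish; cancel it and resume if it doesn't.
        await asyncio.wait_for(slow(), timeout=0.1)
        return "finished"
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # timed out
```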

The use case for it is that it's good for I/O-bound operations, like threading is, but with the addition that you don't have to worry about synchronization or race conditions, since by design your code will have predictable access patterns. The downside is that any libraries you use within your code have to be implemented as async libraries (or given async wrappers around calls to their methods), which in turn means that your entire codebase tends to become async.

Threading in Python is generally not true parallelism because of the GIL, which allows only one thread to run Python code at a time. Threading is safer in Python because of this, but it obviously has drawbacks. In general it's best used if you want asyncio-like performance with a library that isn't written as async, since the interpreter releases the GIL when a thread is blocked waiting for I/O and switches to another thread.
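For that case, the usual escape hatch is to push the blocking call onto a thread from async code, e.g. with `asyncio.to_thread` (Python 3.9+); a minimal sketch with a hypothetical blocking function:

```python
import asyncio
import time

def legacy_blocking_call():
    # Hypothetical library call with no async version.
    time.sleep(0.2)
    return "result"

async def main():
    # Runs the blocking call in a worker thread; the event loop
    # stays free to run other tasks in the meantime.
    return await asyncio.to_thread(legacy_blocking_call)

print(asyncio.run(main()))  # result
```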

True parallelism in Python is achieved with multiprocessing, however the use case is a little different. Rather than spinning off processes, you generally launch a bunch of worker processes up front (to avoid the larger overhead), then use smart scheduling to distribute work between these processes. Here though you do have to worry about race conditions and synchronization, and use things like locks and mutexes.
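A minimal sketch of that pattern with the stdlib `multiprocessing.Pool` (the workload here is hypothetical):

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real computation; each call runs in a worker process,
    # outside the GIL of the parent interpreter.
    return sum(i * i for i in range(n))

def run_jobs():
    # Launch a fixed pool of workers up front, then distribute work to
    # them; the Pool handles the scheduling described above.
    with Pool(processes=4) as pool:
        return pool.map(cpu_heavy, [10, 100, 1000])

if __name__ == "__main__":
    print(run_jobs())
```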

> Clearly create_task is as close as you get to free in the Python world, and I would need to look elsewhere for optimizations. Turns out Textual spends far more time processing CSS rules than creating tasks (obvious in retrospect).


1. Creating async tasks is cheap.
2. It is important to confirm intuitions before acting on them.

async tasks are cool, but the usual PSA applies here:

Be careful to hold your references, because async tasks without active references will be garbage collected. I've been bitten by that in the past.

Long discussion here: https://bugs.python.org/issue21163

Docs: https://docs.python.org/3/library/asyncio-task.html#asyncio....


"Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done."
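The pattern the docs suggest is to keep a strong reference yourself, e.g. in a set that each task removes itself from when done:

```python
import asyncio

background_tasks = set()

async def worker(i):
    await asyncio.sleep(0.05)
    return i

async def main():
    for i in range(5):
        task = asyncio.create_task(worker(i))
        # Strong reference: without this, the event loop's weak
        # reference is all that keeps the task alive.
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)
    results = await asyncio.gather(*background_tasks)
    return sorted(results)

print(asyncio.run(main()))  # [0, 1, 2, 3, 4]
```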

If you can, best is to always spawn them in a task group (either using anyio or Python 3.11's task groups).

This prevents tasks from being garbage collected, but also prevents situations where components can create tasks that outlive their own lifetime. Plus, it's a saner approach when dealing with exception handling and cancellation.

Perhaps I just don't get them, but task groups never really made sense to me.

The whole beauty of async tasks is that you can spawn, retry, and consume them lazily. When you create a task group, you again end up waiting on a single long-running last task, desperately trying to fix individual failures and retries that hold up the entire group.

Here's why task groups may be a good idea https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

Thanks for this!

task groups solve this problem

Python is my strongest language. If you ask me to write some asynchronous code I will try my best not to write it in Python. Usually it's just not worth it.

Have you not used async in Python lately? IMO it's very easy to do.

I don't have huge experience with Python, but I used async code with C#/Typescript and lately I had to use some asyncio magic.

I found this article: https://blog.dalibo.com/2022/09/12/monitoring-python-subproc... and while the async/await syntax is the same, it's not entirely clear to me why there's an event loop and what exactly happens when I pass a function to asyncio.run(), like here: https://github.com/pallets/click/issues/85#issuecomment-5034...

So, you can use it and it's not that hard, but there are some parts that are vague for me, no matter which language implements async support.

I find threading to be much easier to use in Python

What a fun experiment. I quickly converted it to Go using goroutines and waitgroups for fun: https://gist.github.com/schmichael/1a417808b8e88b684838ae9f4...

I am seeing 500~600k tasks/second in .NET6 w/ TPL.

Edit: Updating per request below.

Tested on a TR2950x. I wonder if it's NUMA issues in my case or an old Python version (3.9.4).


  100,000 tasks    77,108 tasks per/s
  200,000 tasks    69,945 tasks per/s
  300,000 tasks    72,453 tasks per/s
  400,000 tasks    74,636 tasks per/s
  500,000 tasks    66,253 tasks per/s
  600,000 tasks    77,576 tasks per/s
  700,000 tasks    69,673 tasks per/s
  800,000 tasks    68,176 tasks per/s
  900,000 tasks    73,846 tasks per/s
  1,000,000 tasks          68,013 tasks per/s
.NET 6

  100000 Tasks 523000 Tasks/s
  200000 Tasks 550000 Tasks/s
  300000 Tasks 550000 Tasks/s
  400000 Tasks 559000 Tasks/s
  500000 Tasks 547000 Tasks/s
  600000 Tasks 539000 Tasks/s
  700000 Tasks 547000 Tasks/s
  800000 Tasks 540000 Tasks/s
  900000 Tasks 560000 Tasks/s
  1000000 Tasks 542000 Tasks/s

Would you mind sharing the Go and Python results running on your machine too? It's an apples-to-oranges comparison otherwise.

EDIT. My results on a 5950x (undervolted)

python3.8.exe test.py

  100,000 tasks 139,130 tasks per/s
  200,000 tasks 121,905 tasks per/s
  300,000 tasks 120,000 tasks per/s
  400,000 tasks 114,286 tasks per/s
  500,000 tasks 119,403 tasks per/s
  600,000 tasks 117,073 tasks per/s
  700,000 tasks 130,612 tasks per/s
  800,000 tasks 122,488 tasks per/s
  900,000 tasks 120,000 tasks per/s
  1,000,000 tasks 110,155 tasks per/s

python3.11.exe .\test.py

  100,000 tasks 206,452 tasks per/s
  200,000 tasks 185,507 tasks per/s
  300,000 tasks 186,408 tasks per/s
  400,000 tasks 179,021 tasks per/s
  500,000 tasks 167,539 tasks per/s
  600,000 tasks 177,778 tasks per/s
  700,000 tasks 188,235 tasks per/s
  800,000 tasks 180,919 tasks per/s
  900,000 tasks 168,421 tasks per/s

.\test.exe (go 1.20 compiled)

  100000 tasks 2710563.336378 tasks per/s
  200000 tasks 3076885.207567 tasks per/s
  300000 tasks 3332292.917434 tasks per/s
  400000 tasks 3040479.422795 tasks per/s
  500000 tasks 2810232.844653 tasks per/s
  600000 tasks 3004138.200371 tasks per/s
  700000 tasks 2738877.029117 tasks per/s
  800000 tasks 2893730.985022 tasks per/s
  900000 tasks 3043877.494077 tasks per/s
  1000000 tasks 2857992.089078 tasks per/s

Yeah, I did the exact same thing and got very similar results. Just to save people clicking through, the Go/goroutine version is 25x as fast as Python. :-)

JavaScript is 10-37x faster out of the box without any imports

    async function time_tasks(count = 100) {
      async function nop_task() {
        return performance.now()
      }

      const start = performance.now()
      // Array(count).map() skips empty slots, so build the array explicitly.
      const tasks = Array.from({ length: count }, () => nop_task())
      await Promise.all(tasks)
      const elapsed = performance.now() - start
      return elapsed / 1e3  // ms -> s
    }

    for (let count = 100000; count <= 1000000; count += 100000) {
      const ct = await time_tasks(count)
      console.log(`${count}: ${count / ct} tasks/sec`)
    }
Outputs (Python 3.11, Bun 0.5.1):

    % bun textual.ts
    100000: 3767797.000743159 tasks/sec
    200000: 9001406.4697609 tasks/sec
    300000: 8281002.001242148 tasks/sec
    400000: 10038491.340232708 tasks/sec
    500000: 8976653.913474608 tasks/sec
    600000: 10437550.828698047 tasks/sec
    700000: 9443895.154523576 tasks/sec
    800000: 11021991.118011119 tasks/sec
    900000: 9790550.215324111 tasks/sec
    1000000: 10263937.143648934 tasks/sec
    % python3 textual.py
    100,000 tasks   303,063 tasks per/s
    200,000 tasks   270,058 tasks per/s
    300,000 tasks   271,621 tasks per/s
    400,000 tasks   261,945 tasks per/s
    500,000 tasks   251,070 tasks per/s
    600,000 tasks   272,520 tasks per/s
    700,000 tasks   250,977 tasks per/s
    800,000 tasks   253,131 tasks per/s
    900,000 tasks   244,696 tasks per/s
    1,000,000 tasks   266,061 tasks per/s

Bun* is that much faster. Wonder what Node or Deno perform like.

FWIW, Mac Air, dual core i7 1.7GHz

    $ deno run tasks.js 
    100000: 2777777.777777778 tasks/sec
    200000: 3225806.4516129033 tasks/sec
    800000: 2395209.580838323 tasks/sec
    900000: 1679104.4776119404 tasks/sec
    1000000: 1851851.8518518517 tasks/sec

Hmm, slower as the number scales up. Wonder why my M1 didn’t do that

Didn't check, but maybe memory pressure. I have 8 gb ram but had Chrome with hundred tabs and other stuff open.

I preferred gevent (it's been probably 10 years since I've used it.) Yes, you need a ton of monkey patching, etc... but it was less intrusive once you had everything set up. Sprinkling await and async everywhere always struck me as inelegant.

I've tested all the methods I know to wait for an asynchronous task.

See: https://gist.github.com/jimmy-lt/4a3c6ad9cab1545692e5a3fe971...

  $ python3.11
  100,000 tasks 22,716,947 tasks per/s
  200,000 tasks 22,706,630 tasks per/s
  300,000 tasks 22,742,779 tasks per/s
  400,000 tasks 22,614,202 tasks per/s
  500,000 tasks 22,760,379 tasks per/s
  600,000 tasks 22,799,818 tasks per/s
  700,000 tasks 22,842,971 tasks per/s
  800,000 tasks 22,778,395 tasks per/s
  900,000 tasks 22,854,241 tasks per/s
  1,000,000 tasks 22,470,395 tasks per/s
  100,000 tasks 10,336,986 tasks per/s
  200,000 tasks 10,405,286 tasks per/s
  300,000 tasks 10,451,505 tasks per/s
  400,000 tasks 10,482,455 tasks per/s
  500,000 tasks 10,451,287 tasks per/s
  600,000 tasks 10,485,478 tasks per/s
  700,000 tasks 10,508,302 tasks per/s
  800,000 tasks 10,505,167 tasks per/s
  900,000 tasks 10,492,568 tasks per/s
  1,000,000 tasks 10,457,516 tasks per/s
  100,000 tasks 219,858 tasks per/s
  200,000 tasks 196,281 tasks per/s
  300,000 tasks 201,530 tasks per/s
  400,000 tasks 193,674 tasks per/s
  500,000 tasks 187,611 tasks per/s
  600,000 tasks 201,972 tasks per/s
  700,000 tasks 187,505 tasks per/s
  800,000 tasks 191,531 tasks per/s
  900,000 tasks 198,127 tasks per/s
  1,000,000 tasks 173,259 tasks per/s
  100,000 tasks 291,095 tasks per/s
  200,000 tasks 193,324 tasks per/s
  300,000 tasks 129,177 tasks per/s
  400,000 tasks 107,024 tasks per/s
  500,000 tasks 123,023 tasks per/s
  600,000 tasks 122,304 tasks per/s
  700,000 tasks 121,674 tasks per/s
  800,000 tasks 106,530 tasks per/s
  900,000 tasks 135,841 tasks per/s
  1,000,000 tasks 106,153 tasks per/s
  100,000 tasks 319,629 tasks per/s
  200,000 tasks 283,560 tasks per/s
  300,000 tasks 204,328 tasks per/s
  400,000 tasks 203,584 tasks per/s
  500,000 tasks 200,968 tasks per/s
  600,000 tasks 214,506 tasks per/s
  700,000 tasks 206,512 tasks per/s
  800,000 tasks 204,556 tasks per/s
  900,000 tasks 210,298 tasks per/s
  1,000,000 tasks 202,523 tasks per/s

Whenever I had to run a lot of Python tasks I preferred Celery over asyncio.

I only ever write an asyncio daemon when I want to launch and manage the results of a bunch of Celery tasks.

Not speaking from some superior position of research here, I'm just saying what I prefer to use.

So, 250,000 per second on a (roughly 10,000 MIPS) i7 core.

Each of those create_task calls is roughly 10,000,000,000 / 250,000 = 40,000 instructions.

That's 40,000 instructions of pure overhead, as it does not contribute to the task at hand (accidental complexity).

Your method of estimating instructions includes printing multiple lines which context switch to the kernel to perform IO. I'm not sure how an "instruction count" metric is useful anyway.

edit: I'm not actually sure you are counting the context switch, but I still don't think estimating instruction count that way is particularly useful.

It's useful to show how much waste there is in a solution.

Having an operation that can only be executed 250,000 times per second on a modern processor is extremely slow... not fast.

250k/s is roughly the same speed as context switching, so while slow for pure computation, it is a reasonable amount of "waste" for switching between concurrent tasks.

If you didn't prevent preemptive context switches during your benchmarking, it's entirely possible the only thing you measured was the context switch time.

This is a fun experiment, but to get a rigorous idea of the overhead involved takes more work than what anyone in the post or comments has done.

Matching the cost of a genuine context switch should be a (laughably bad) upper bound for any language's particular concurrency offerings. It is not reasonable.

> It is not reasonable.

Reasonableness is relative and use case dependent. The post itself illustrates how the cost is insignificant compared to other "wasteful" operations related to CSS handling.

If this is too much overhead for your use case, there are plenty of other approaches and languages to choose from.

If it costs as much as a context switch, you might as well just do context switches. These hosted language scaffoldings - whether that's asyncio, go routines, TPL, Webflux, etc. - exist specifically so you don't have to do a full context switch. If they cost as much as a context switch, they have failed. Regardless of what else is taking time in the system.

If you're not any better, just replace your whole hosted concurrency system with a statement that triggers sched_yield.

It's irrelevant that it's slow if it's only a small part of total runtime.

Waste allows scale, that's a general civilizational rule.

You wouldn't be able to write your comment if the browser were written in extremely efficient assembly code because it wouldn't exist.

NASA and every big organization also has a lot of waste, but only that way can you get to the moon.

I have been looking into the overhead of async on C++, and we found that the cost increases substantially when the function has a return value. It would be interesting to see if this is the case with Python's asyncio.

Seems pretty decent; with that, it's a real shame Python's async ended up coroutine-based rather than task-based. Given the language semantics, it ends up being a lot of pain for fairly little gain at the end of the day.

Mostly a testament to how absurdly fast modern CPUs are despite Python itself being so slow.

/ Recent Python convert, in spite of the horrible general performance of the official implementation of the language. That sweet, sweet module library. Also, with Docker containers the deployment issues have been solved. It might be slow to execute but it's really efficient to develop with.

This has always been the case. Computing power doubles every few years, and it's cheap (probably less than 10% of your total project cost, unless you are doing highly specific stuff). It doesn't make sense to optimise for CPU cycles as the real bottleneck is usually developer efficiency, I/O, and UX.

Invest in Apple, their phone is going to be a big deal.

After running that code on both a Windows SB3 and major souped up Lenovo running Ubuntu...I just feel inadequate.

Above all, this shows the performance gains from 3.10 -> 3.11:

  >> python3.10 create_task_overhead.py
  100,000 tasks   185,694 tasks per/s
  200,000 tasks   165,581 tasks per/s
  300,000 tasks   170,857 tasks per/s
  400,000 tasks   159,081 tasks per/s
  500,000 tasks   162,640 tasks per/s
  600,000 tasks   158,779 tasks per/s
  700,000 tasks   161,779 tasks per/s
  800,000 tasks   179,965 tasks per/s
  900,000 tasks   160,913 tasks per/s
  1,000,000 tasks  162,767 tasks per/s

  >> python3.11 create_task_overhead.py
  100,000 tasks   289,318 tasks per/s
  200,000 tasks   265,293 tasks per/s
  300,000 tasks   266,011 tasks per/s
  400,000 tasks   259,821 tasks per/s
  500,000 tasks   251,819 tasks per/s
  600,000 tasks   267,441 tasks per/s
  700,000 tasks   251,789 tasks per/s
  800,000 tasks   254,303 tasks per/s
  900,000 tasks   249,894 tasks per/s
  1,000,000 tasks  266,581 tasks per/s

Python 3.11 running in a-Shell on an M1 iPad Pro

    Python 3.11.0 (heads/3.11-dirty:8d3dd5b9647, Dec  7 2022, 08:17:48) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    [~/Documents]$ python test.py
    100,000 tasks    127,992 tasks per/s
    200,000 tasks    115,960 tasks per/s
    300,000 tasks    117,205 tasks per/s
    400,000 tasks    113,131 tasks per/s
    500,000 tasks    109,609 tasks per/s
    600,000 tasks    116,649 tasks per/s
    700,000 tasks    110,743 tasks per/s
    800,000 tasks    111,361 tasks per/s
    900,000 tasks    109,688 tasks per/s
    1,000,000 tasks          117,064 tasks per/s


  100,000 tasks   155,257 tasks per/s
  200,000 tasks   138,569 tasks per/s
  300,000 tasks   134,779 tasks per/s
  400,000 tasks   144,371 tasks per/s
  500,000 tasks   135,672 tasks per/s
  600,000 tasks   135,299 tasks per/s
  700,000 tasks   146,456 tasks per/s
  800,000 tasks   139,192 tasks per/s

Windows SB3:

  100,000 tasks    177,778 tasks per/s
  200,000 tasks    150,588 tasks per/s
  300,000 tasks    152,381 tasks per/s
  400,000 tasks    134,031 tasks per/s
  500,000 tasks    160,804 tasks per/s
  600,000 tasks    129,293 tasks per/s

M2 MacBook Pro 16GB 16-inch 2023

  100,000 tasks 184,167 tasks per/s
  200,000 tasks 160,964 tasks per/s
  300,000 tasks 165,278 tasks per/s
  400,000 tasks 149,577 tasks per/s
  500,000 tasks 160,593 tasks per/s
  600,000 tasks 168,098 tasks per/s
  700,000 tasks 161,837 tasks per/s
  800,000 tasks 160,364 tasks per/s
  900,000 tasks 149,479 tasks per/s
  1,000,000 tasks 155,919 tasks per/s

Make sure you set your cpu frequency governor to "performance".

If your Linux machine is working an order of magnitude slower than you'd expect from the hardware, that's the first thing I'd check.

I haven't used asyncio that much, certainly not in any serious sense, but wouldn't ContextVar lookups be a major factor of performance in serious asyncio code? Using tasks for things that aren't io-bound seems likely to give a false sense of performance superiority when doing basically nothing.

I’d be surprised if context vars are more expensive than a dict lookup or two. But I haven’t profiled. Could be wrong.

Why would you want a terminal emulator anywhere near python? Using python for lightweight system utility gui apps seems like using a hammer to screw in a nail. Yeah you can do it and modern hardware is fast enough that you probably won't care, but why??

Python is a really good scripting language? I guess the big alternatives would be C and Bash, and I'd pick Python for short utilities over both of those any day.

How slow do you think python is?

I agree that it is a good scripting language. That's where it excels. A GUI terminal emulator, however, is not a script... Python, once compiled and using its underlying C libraries, is fast enough by modern standards, but it is slow to start, and projects using it are prone to difficult-to-read/messy code.

Obviously the latter is up to the developer(s) and whatever standards they set for themselves, but any software project tends towards the paths of least resistance inherent in the language and frameworks being used over time, as maintainers change and PR fixes from contributors are merged. This is pedantic of course, but that doesn't change the fact that Python is not the optimal solution for this problem.

I think you're misunderstanding what Textual is. It's not a GUI terminal emulator: it's a library for building interactive terminal applications.

Yeah I realized that later.

You seem to be off by an order or three of magnitude about what is "fast enough" for terminal apps, which haven't been a performance bottleneck for decades.

200k ops per second not fast enough for you? How many words per second do you type?

So why wouldn't you use it? I have since the late '90s, and even then it was faster than I could keep up with. Unoptimized Java Swing on SGI was the only thing that wasn't, from memory. More recently, Windows Terminal had a very bad implementation for a while, but fixed it after their ass was handed to them over it here at HN.

I really don't know why you are so focused on the speed. Obviously python merely acting as translation to optimized c libraries is going to be fast enough. I said the performance difference was negligible on modern hardware. Python's speed is far from its largest problem as I outlined in the above comment.

I see, thought it was the main point, but there were others...

> but it is slow to start

This is not really true either. Sure not as fast as C, but imperceptible for the most part. And they have improved it in recent versions.

Yes, you can cause it to be an issue with a poorly written or thought-out system; this happened at one job I had. But that wasn't Python's fault, they decided to pull in thousands of files each invocation.

My Python scripts respond instantly, even big ones. I have a CLI photo editor and implementation lang is not an issue that I even contemplated until now.

> and projects using it are prone towards difficult to read/messy code.

Primarily large ones with a long history of alternating developers. There are great tools to improve its scalability; use them. The simple pyflakes will eliminate most issues. Type checking gets the long tail for the mission-important+.

Python is extremely slow for some tasks. I was surprised to discover how slow when I ran some benchmarks, despite having used python for many years at the time. It has been improving lately, but here is a blog post I made on the topic quite a few years ago that has some interesting comparisons: https://gist.github.com/vishvananda/7a2f1942d0e9ffff4093

Just reran the benchmarks from 10 years ago, python is only 37X slower than C on the benchmark now, and the go version is running faster than the C version. Python still has big productivity wins of course...

Reading this is like reading early Renaissance alchemist arguing about how much mercury they need to combine with how much silver to create gold... This is so far gone I don't even know where to begin...

> It may be IO that gives AsyncIO its name, but Textual doesn't do any IO of its own.

So why on Earth are you using AsyncIO? You don't need it, if that's true...

> Those tasks are used to power message queues

How are your message queues not doing I/O? What on Earth are they doing then?

Needless to say that the whole benchmark is worthless because it never even initiates anything that would be involved when creating actual asynchronous I/O tasks...


I mean, I know, in Pythonland this is just your average Wednesday, but dear lord, if you don't visit that land all that often it shocks you more every time you do.

This reply strongly reads of 'I'm smarter than everyone' and 'I know everything.' Is it possible that the author is doing something that you don't know about? Such as writing and reading text to sockets (which would be I/O). And is it possible they may have a reason to be using asyncio in Python (which uses a single main thread and hence needs something like asyncio for concurrency)?

>Needless to say that the whole benchmark is worthless

Measuring startup overhead isn't worthless.

'I mean, I know, in Pythonland this is just your average Wednesday'

and here we go. The whole post was really just you trying to make out that you're superior to everyone else. In this case looking down on an entire ecosystem. Why? Python is one of the most popular languages. It has an elegant syntax and it's capable of solving most problems. Go jerk yourself off in private.

> This reply strongly reads of 'im smarter than eveyone'

Dude, where's everyone, where?

I'm smarter than the guy who posted this nonsense about asyncio, that's for sure, but that's a very low bar...

Now, when it comes to Python, it's an environment that is flooded with programmers with the lowest skill level imaginable. This is where all those month-long bootcamps pump their graduates, this is where all the people who took a month-long intro-to-CS-with-Python class go, this is where a lot of people who only use Python incidentally, to complement some other programming activity (or just general computer-related activity), go. It's a swamp, and there's no reason to pretend it isn't.

On the other hand, anyone who had any serious aspirations for Python left the scene ten or so years ago. Today, Python is the worst parts of Java and PHP combined. So, again, it's a very low bar to be better than most Python programmers. Without even trying, when I searched for a job and had to do a bunch of automated assessment tests, I was consistently placed in the "best" decile, often in the "best 5%" of all applicants, and I hate the language. I'm honestly not good at it and don't want to be good at it. You don't need to try hard to be "as good" as I am, because I'm not good... but, yeah, in this particular area everyone is hands down awful.

I also had to interview about two dozen applicants in my last job. I had people who couldn't explain the difference between an expression and a statement. I had people who couldn't tell what the __str__() method is for. I had extremely low expectations, and yet I was consistently disappointed by the knowledge level of applicants. It's surreal what's going on there.


> Overhead startup isn't worthless.

Yeah, maybe... but OP never measured it anyway. OP never created any asynchronous I/O tasks. OP measured how long it takes to run a couple of functions in Python. Admittedly, it takes a long time, but it's irrelevant to async I/O, especially in the context of comparing whatever OP is doing to threads, which was the goal stated in their opening.

> automated assessment tests, I was consistently placed in the "best" decile

Of course you were. I'd expect that for a large fraction of gainfully employed devs.

People who can't code their way out of a paper bag are way over-represented in tech screens. Because a highly competent dev will apply at 3 companies they choose and get a job. A terrible dev will do 100 applications and see what sticks.

We opened an ML intern position recently and got over 200 applications, and 95% of them were terrible. Should I conclude most ML grads are incompetent? I don't think so... probably 190 of them are the bottom 5% of the local market, and these same 190 CVs are on the desk - well, in the rubbish bin - of everyone with an open position right now.

Python asyncio is actually a cooperative multitasking system, think real-time operating system type stuff. It doesn't give you real parallelism, but it gives you a nice interface for reasoning about tasks (think threads) while being able to explicitly control when/where they yield control of the event loop, how often they run, etc.

>How are your message queues not doing I/O?

We generally wouldn't consider moving data around in-process to be "I/O"; now, if you started interacting with an external database/file/pipe/MMAP-ed-file/etc., then that would be "I/O".

>What on Earth are they doing then?

Tasks. Kind of like processes, but lighter weight. Running an event loop, dispatching signals, that sort of thing. You can do all that by manually writing your own event loop, but Python's asyncio (IMO) makes it easier to reason about exactly when you're yielding the event loop to some other task, and makes it easier to write code that can yield control of the event loop at arbitrary places. So if you want to update a widget every 10 seconds, you can write something like

    async def refresh_widget():
        while True:
            await asyncio.sleep(10)  # other tasks can run during this 10-second sleep
            ...  # update widget contents here
Useful for stuff that needs to periodically poll data, like a process monitor, or even just for simple clock widgets.


As an aside, that cooperative multitasking can be really nice in MicroPython, where you can write tight loops in straight assembly if you need to and still get a pleasant task interface for managing higher-level tasks/threads. Combined with some interrupt handlers, it makes a pretty elegant real-time-ish operating system. (You probably need to deal with garbage collection manually, though.)
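A runnable sketch of that periodic-polling pattern (the task names and intervals are made up, and the sleeps are shortened so it finishes quickly):

```python
import asyncio

async def poll(name, interval, updates, count):
    # Periodically "update a widget": each await yields the event loop,
    # so other tasks can run while this one sleeps.
    for _ in range(count):
        await asyncio.sleep(interval)
        updates.append(name)

async def main():
    updates = []
    # Two pollers with different periods make progress concurrently
    # on a single thread, interleaved by the event loop.
    await asyncio.gather(
        poll("clock", 0.01, updates, 3),
        poll("cpu", 0.02, updates, 3),
    )
    return updates

updates = asyncio.run(main())
print(updates)  # interleaved, e.g. ['clock', 'cpu', 'clock', ...]
```

The exact interleaving depends on the intervals, but neither task ever blocks the other during its sleep.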

I knew you would be here in the comments. Crabbone never misses a chance to dump on Python in any HN post he can.

Must be so frustrating, knowing that this silly language is one of the most popular programming languages in the world, despite its shortcomings, one of which is slower performance.

This isn't about python or any language. The extreme negative tone and sense of superiority seems to point to something else.

What is it?

I don't care about Python performance.

But, yes, I'm very frustrated this trash is so popular. I never made this a secret :|

Python deserves negative tone, and if you think that popularity somehow makes it exempt from criticism, then you deserve the same.

But this isn't about "something being popular makes it exempt from criticism".

You don't really criticise. You just exaggerate and emo-dump all over Python and the people who use it. Your stance seems to be that Python needs to be destroyed and replaced by some other language. Your stance is that people who use Python are idiots.

That kind of 'criticism' is so totally useless it's kind of seriously pathological. Your 'criticism' says more about you (due to the extremely antagonising tone) and not much about Python, really.

Quite sad when you think about it.

asyncio is really not all about I/O, though, despite the name. You can just as easily use it for concurrency by interleaving tasks with each other. Imagine multiple UI elements that all need to make progress: coroutines let them cooperatively yield to each other, rather than one element blocking progress everywhere else.

It's just using asyncio as a task scheduler, nothing more, nothing less. Maybe when you visit Pythonland you might instead be shocked in a more positive way :-)
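A minimal sketch of that scheduler idea: two hypothetical "UI elements" each do work in small steps, with `asyncio.sleep(0)` as an explicit yield point so neither blocks the other. (The element names and step counts are invented for illustration; no real I/O is involved.)

```python
import asyncio

async def render(element, steps, log):
    # Simulated UI element doing its work in small increments.
    for i in range(steps):
        log.append((element, i))
        await asyncio.sleep(0)  # explicitly yield so sibling tasks can run

async def main():
    log = []
    # Both elements make progress on one thread, round-robin style.
    await asyncio.gather(
        render("sidebar", 3, log),
        render("statusbar", 3, log),
    )
    return log

log = asyncio.run(main())
print(log)
```

Without the `await asyncio.sleep(0)`, each element would run to completion before the other started; with it, the log shows the two elements alternating.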

Asyncio is for cooperative multitasking. I/O is the most common use case but it's not the only one. They're using the event loop to schedule their GUI tasks.

Textual is a framework for building TUI (terminal) apps.

I assume the message queues are in-memory structures used to pass messages between tasks, hence no I/O.

This is really fairly standard stuff.

I understand you may not be familiar with GUI software and/or the Python ecosystem but jumping straight to condescension when you don't understand something is not really a good attitude.

Message queues can be implemented in memory, so no network or disk I/O. In fact, it seems this project does use the built-in "queue" library: https://docs.python.org/3/library/queue.html
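A small sketch of that pattern using `asyncio.Queue` (the event-loop counterpart of the thread-safe `queue.Queue` linked above): messages pass between two tasks entirely in process memory, with no sockets or files involved. The names and message contents are made up.

```python
import asyncio

async def producer(queue):
    # Put messages into a purely in-memory queue.
    for i in range(3):
        await queue.put(f"msg-{i}")
    await queue.put(None)  # sentinel: no more messages

async def consumer(queue):
    received = []
    while True:
        msg = await queue.get()  # suspends this task until a message arrives
        if msg is None:
            break
        received.append(msg)
    return received

async def main():
    queue = asyncio.Queue()
    # Run producer and consumer concurrently; keep the consumer's result.
    _, received = await asyncio.gather(producer(queue), consumer(queue))
    return received

received = asyncio.run(main())
print(received)  # ['msg-0', 'msg-1', 'msg-2']
```

The `await queue.get()` is where the consumer yields the event loop, which is exactly the kind of suspension point a GUI framework can use without any network or disk I/O.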

Lol. What I/O to disk? What are you talking about? Asyncio was never about disk I/O. It's about Internet socket I/O exclusively.

But, what do you think happens when you write (input) data to memory and then read (output) it from memory? I'll help you: it starts with "I" and ends with "O"!

> Asyncio was never about disk I/O. It's about Internet socket I/O exclusively.

Game over. You have no clue what you're talking about.

> But, what do you think happens when you write (input) data to memory and then read (output) it from memory? I'll help you: it starts with "I" and ends with "O"!

That's... that's not what people mean when they talk about I/O in this context. But I think you know that; you're just grasping at straws to win an argument.
