Just a personal take: after not coding in Rust for several months, I find it more and more difficult to return to the async code I was writing.
The whole thing just reads... ugly and inconsistent. It needs too much already-accumulated knowledge. As the article correctly points out, you need a bunch of things that are seemingly unrelated (and syntactically speaking you would never guess they belong together). And as other commenters pointed out, you need to scan a lot of docs -- many useful Tokio tools are not just unpromoted, they are outright difficult to find at all.
Now don't get me wrong, I worked on projects where a single Rust k8s node was ingesting 150k events per second. I have seen and believed and I want to use Rust more. But the async story needs the team's undivided attention for a long time at this point, I feel.
Against my own philosophy and values I find myself attracted to Golang. It has a ton of ugly gotchas and many things are opaque... and I still find that more attractive than Rust. :(
This article is a sad reminder for me -- I am kind of drifting away from Rust. I might return and laugh at myself for this comment several months down the line... but at the moment it seems that my brain prefers stuff that's quicker to grok and experiment with. Not to mention writing demos and prototypes is objectively faster.
If I had executive power in the Rust leadership I'd definitely task the team with taking a good hard look at the current state of async and start making backwards-incompatible changes (backed by new major semver versions, of course). More macros, or simply better-reading APIs, might be a very good start. Start making and promoting higher-order concurrency and parallelism patterns, e.g. the `scoped_pool` thingy.
I really think in the end, in another decade or two, the community consensus is going to be that async as it is conceived of today is simply a mistake, full stop.
Think about it. Where did it come from in its current incarnation? Node. Why did Node choose it? Did it have a multiplicity of options and carefully choose the best one based on years of experience with each choice? No. Async was "chosen" because it was the only option for the weak runtime that it had to work with. It was the only choice.
But the Node marketing machine rumbled into action and made bold claims about performance and quality and phrased them in universals across all languages rather than merely claiming it was good for Javascript and that propaganda is still running around in people's minds to this day.
I know Rust chose it for performance reasons, but I still think in a decade or two the community consensus will be either that the ~5% performance overhead of threads is worth it for any non-trivial program because async is just that much worse, or that someone will come up with an even lower-impact version of some other option that will dominate it.
As your message demonstrates, we're still in the phase where you have to pay lip service to the supposed community consensus before saying "but I think it isn't really working out", but as I don't care about the community consensus I don't mind just saying it. Async is a mistake. It's just you manually doing what the compiler ought to be doing for you, and while I'm open to other possibilities in the future, right now threaded code is straight-up a better option on almost every metric, and the metrics where it may have a slight disadvantage, it's only slight and worth it. It's not like you choose threading and you instantly lose 10x performance or something. The first and second derivative opinions about async are clearly negative and I don't mind jumping farther ahead in the process when it comes to my own engineering decisions.
I used/worked on a similar async model at FB long before I'd ever touched node. Threads are not really solving the same problem as async.
Imagine you are building the FB newsfeed. You have 10 stories and want to fetch them as quickly as possible. The stories are all of different types and each refers to other data: profile info, privacy, comments (which depend on their own profile fetches), likes, etc.
So you have this tree of dependent data fetches, how do you minimize the time waiting? You could walk the tree node-by-node, but that balloons your wait time. So you kick each bit of work off to a new thread and wait for it to be done. How do you wait for it? Lots of options of course, but they all end up looking like an ad-hoc implementation of async. Which is fine! But if you have a large codebase where a large fraction of code is dealing with this stuff you pretty soon want the runtime to handle this. And if you are mixing code written by others you really want everyone to use the same mechanism.
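The "kick each bit of work off and wait" shape described above can indeed be done ad hoc with plain threads. A minimal Rust sketch of the fan-out/fan-in part (the `fetch_*` functions are hypothetical stand-ins for the data fetches described above):

```rust
use std::thread;

// Hypothetical stand-ins for the dependent fetches described above.
fn fetch_profile(id: u32) -> String {
    format!("profile-{id}")
}

fn fetch_likes(id: u32) -> u32 {
    id * 10
}

fn main() {
    let story_ids = [1u32, 2, 3];

    // Fan out: one thread per story; each thread does its own dependent fetches.
    let handles: Vec<_> = story_ids
        .into_iter()
        .map(|id| thread::spawn(move || (fetch_profile(id), fetch_likes(id))))
        .collect();

    // Fan in: wait for every story, preserving order.
    let stories: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(stories[0], ("profile-1".to_string(), 10));
    println!("{stories:?}");
}
```

The point stands, though: once nested dependencies show up (a comment needing its own profile fetch before it resolves), this hand-rolled spawn/join plumbing is exactly the "ad-hoc implementation of async" the comment describes.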
So the value of async isn't really in the specific implementation choices, it's that it lets you represent this tree in a consistent way. This is less about the number of requests per second, and more about how much code you have to deal with; plenty of places have low RPS but have accumulated lots and lots of code.
Second, threads have their own costs even ignoring performance. Arguably a big reason PHP/Hack worked so well for FB is that each request was single-threaded and started with a clean state; there are whole universes of bugs that are avoided in this model. And Node doesn't have threads because JS doesn't have threads, microcontrollers may not have threads, etc.
I used "async" long before Node too. It's one of the reasons I knew to stay away from Node.
"ad-hoc implementation of async"
I think you're confusing the problem with the solution. I solve this sort of thing in Go all the time. This is literally what I was doing yesterday. It's fine in that context. You had to reach for async because it was the only option in your context, not because it was the best choice. Async as it is conceived of today was a hack for dynamic scripting languages that basically had no other option. They "won" there because the environments simply couldn't support any other solution, not because they outcompeted anything.
Which is the proper way of doing it. As soon as you expose all that plumbing, a whole generation of inexperienced programmers will use it to build their sandcastles, and then you can spend the next two decades on the cleanup. Such stuff should be implemented once and made bulletproof, rather than giving everybody a box of interesting shiny parts that will explode when used wrong.
> So you have this tree of dependent data fetches, how do you minimize the time waiting? You could walk the tree node-by-node, but that balloons your wait time. So you kick each bit of work off to a new thread and wait for it to be done. How do you wait for it?
This is exactly the kind of problem that is targeted by Project Loom [1] and structured concurrency [2] on the JVM. Async programming makes a lot of tools not applicable (or at least very cumbersome to use): debuggers, profilers, tracing tools, stack traces, etc.
> Where did it come from in its current incarnation? Node.
Rust's async/await is inspired by C# (which was inspired by F#, which was inspired by Haskell).
> right now threaded code is straight-up a better option on almost every metric,
Agreed that people are generally too eager to reach for async when they could easily get away with threads, and fortunately Rust makes threads extremely pleasant to work with. The whole point of Rust is that it doesn't force you to use any given paradigm when it comes to concurrency/parallelism, and has support for many approaches.
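For what it's worth, std's scoped threads (stable since Rust 1.63) are a big part of why plain threads are pleasant in Rust: they can borrow directly from the enclosing stack frame. A minimal sketch:

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4];
    let mut sums = vec![0; 2];

    // Scoped threads may borrow local data without Arc or 'static bounds;
    // the scope guarantees every thread finishes before `data` goes away.
    thread::scope(|s| {
        let (left, right) = data.split_at(2);
        let (s0, s1) = sums.split_at_mut(1);
        s.spawn(|| s0[0] = left.iter().sum());
        s.spawn(|| s1[0] = right.iter().sum());
    });

    assert_eq!(sums, vec![3, 7]);
    println!("partial sums: {sums:?}");
}
```

No `join` bookkeeping, no reference counting; the borrow checker and the scope do the structured-concurrency work for you.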
And yet Rust didn't learn from the fact that it took the .NET community almost 10 years to sort out async/await support across all layers of the stack.
.NET architects spent the last year researching the Go and Java Loom approaches, and have acknowledged that if the decision were made today, that approach would most likely have been taken instead of async/await, as many .NET devs still get it wrong.
During the "ASP.NET Core and Blazor futures, Q&A" at BUILD 2023.
These are the best practices for async/await in .NET, as written by one of the ASP.NET architects,
Where he notes that the .NET implementation wasn't without issues on Midori.
Which is again another point of how the whole Rust's async/await process failed to learn from previous experiences.
Even worse, because in .NET the runtime is part of the story, while in Rust that is yet another piece of the puzzle that is still in motion, not yet decided (although it can be anything, as long as it is tokio).
> Where he notes that the .NET implementation wasn't without issues on Midori.
Indeed. In my opinion and experience, async/await is a very leaky abstraction. I think it is definitely not worth it in most languages. Whether it is appropriate for Rust, I'm unconvinced. I think it is DOA in C++ with its required allocation, but I have yet to use it in anger.
I have only used them in C++/WinRT, as they are kind of required to handle anything WinRT. It was kind of OK when doing UWP, although much more complex, since in the WinRT case it also has to take into account how COM/WinRT apartments work.
As for Rust supporting threads... yes! Yes it does! And it effectively solves all of the problems with them that can be solved at the programming language level. There's an irreducible increase in complexity with concurrent code, but there's only so much to be done about that.
I think a lot of developers have a tendency to massively overprivilege performance and always reach for the biggest, shiniest thing when they don't need it, and cost themselves a lot in the process. Sure, if you're doing tens of millions of things concurrently in Rust you may need the async support because threads can't do what you want. But before you write code based on the assumption that you're going to be doing tens of millions of concurrent things on a 64-core system, check that you actually are. If you've got an API server whose request rate is measured in tens per second, a really quite common case, you'll never notice the thread overhead. You really need quite a massive system nowadays to get to the point where that's even remotely a problem, let alone your biggest one.
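To put a rough number on the thread-overhead claim: spawning and joining a few hundred OS threads takes milliseconds on a modern machine, which dwarfs a tens-of-requests-per-second load. A crude, std-only sketch:

```rust
use std::thread;
use std::time::Instant;

fn main() {
    let start = Instant::now();

    // Spawn 500 OS threads, each standing in for one "request handler".
    let handles: Vec<_> = (0..500i64)
        .map(|i| thread::spawn(move || i * 2))
        .collect();

    // Join them all and aggregate the results.
    let total: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 249_500); // 2 * (0 + 1 + ... + 499)

    // On typical hardware the elapsed time here is a handful of milliseconds.
    println!("500 threads spawned and joined in {:?}", start.elapsed());
}
```

Hardly the bottleneck for an API server doing tens of requests per second.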
I don't deny that when you are nailed to the wall and seriously wondering if 64 cores is going to be enough you may have specialized needs that require specialized solutions. I am glad that Rust has the option for those who need it. But async should be counted as a cost of such problems, not a benefit, and something you prove out as the only solution, not the first thing reached for.
We wrote asynchronously driven network code for decades without syntactic sugar for it, and it was fine. That 5% performance gain can be had without it. Async syntax is a mistake -- not just because of the mess it makes across the program tree, but also because it brings with it a specific notion of how to do asynchronous I/O.
io_uring for example gives an entirely different model, one with some lovely performance benefits. I'm sure there's a way to make it play well with Rust async and tokio, etc. but they're not really the same model on the surface. And so having bolted async into the language syntax itself, and also adding an explicit requirement on concepts like 'executors' into the language, we've created a legacy problem.
IMHO async in Rust was a mistake. A systems programming language should not have mandated a technique like this, and should have stayed agnostic.
Having written async code for decades without Rust (and still doing so in C++), I don't understand your point. It is really hard to write safe, highly performant asynchronous code. Rust makes it safe, but without the additional "syntax sugar" it would be very unergonomic. The developer velocity I can achieve with async Rust is unbelievable compared to what I had to do previously.
I think your problem may be with tokio, not async. io_uring and async are not incompatible.
> The developer velocity I can achieve with async rust is unbelievable compared to what I had to do previously.
I have seen a glimpse of that in my paid work but I have to say it takes quite a while and a lot of nerves expended to get to that point. It's one of these things where you climb a mountain peak and can now see everything clearly but before that it's all a mess, mists and shadowy figures.
> A systems programming language should not have mandated a technique like this
Rust doesn't mandate anything. It's unopinionated when it comes to concurrency; use whatever approach you think is best. Async/await is just one option among many.
Its presence may not be a mandate, but it is not a purely decorative syntactic element either: it affects compiler code generation, with async fns compiling down to state machines much like Rust's _synchronous_ generators.
A developer is still free to spawn a thousand foot-guns, and I daresay the assumption would be that they made that choice knowing how to properly juggle them.
> io_uring for example gives an entirely different model, one with some lovely performance benefits.
The io_uring model is also fundamentally async? So I'm not sure what you're arguing here. It's just completion-based rather than readiness-based, but otherwise no different, and completely compatible with how other async things work in the Rust ecosystem (see tokio-uring).
If anything, I'd argue that io_uring is the best example of the benefits of async APIs over the "just spawn another blocking thread bro" model.
And I have code still in production written in the "Perl Object Environment", one of several async frameworks for Perl, written before Node even existed. Also before Node existed, I had written code in the Twisted framework for Python.
Node did not invent async by any means and I know it in the strongest possible sense, having used it before Node existed. But those libraries, in use for over a decade, did not make everyone using them exclaim in joy "Yes! This is what all concurrent programming should be!" A few, sure, but far from everyone. They're kind of a bear, even after years of polish.
Not only did those libraries already know about callback hell, they had already implemented some of the (partial) solutions to it. Watching the Node propaganda was almost surreal to me because I could and more-or-less did predict how it would go, years in advance, as it recreated the experience already had by other languages. Experience that did not leave the impression that this was the paradigm for the future. These libraries typically languished within their ecosystems, kept around only because they were the only practical way to, say, keep your Python but also get concurrency. The fact they were the only way to proceed kept them alive, but they never really flourished despite being right there, readily available.
Node is the reason people are running around today claiming async is the bee's knees. Node is the reason why people come on to /r/golang asking how Go can possibly be used for network programming when it lacks the Magic of Async. Contradicting the Node propaganda is why people are nervous about saying "uhmm, hey, I'm using async in a big program and it's not exactly sunshine and rainbows". Prior to Node that would have just been the common understanding; these libraries were often viewed as powerful within their target language because there was no other way to do what they did, but tricky and complex to work with. It was well understood they easily degraded into a mess and were often difficult to onboard a new developer because of the difficulty of following the control flow.
All I'm really saying here is, the community opinion was correct before Node and we should just go back to that opinion, founded on decades of experience.
Exactly. I wonder how many people that are riding the async bandwagon have any actual experience building large systems that operate under some kind of reliability guarantee. It's a great way to pull a whole pile of unnecessary complexity into a codebase, lots of fun debugging hairy little problems and to solve interesting bugs with free footguns provided for all takers.
It's a kludge at best and it's quite annoying to see a first class project such as Rust be distracted like this.
Please keep preaching. Every time I've investigated async I've just asked "why is it so unnecessarily complicated?", but I don't have enough background to know if I'm just missing something.
Async/await isn't useful because it's more ergonomic. It's useful because it's low overhead. If you don't care about that, then Rust might not be the right language for you, the same way a language with a GC might be better suited for most folks. For my domain, any other choice would have made Rust untenable. Rust has made all the right choices that make it an excellent fit for code I was otherwise forced to write in C or C++. Not every language must bend its ways for every possible user, and most folks get upset when that inevitably happens, because it dilutes the qualities that made the language useful for the original set of folks. Being popular unfortunately comes with the curse of not pleasing everyone.
IMO almost nobody demands a language to "bend its way" so as to not "make folks upset", it's just that the way Rust went about async/await is not ergonomic at all. It's also pretty dark, as in it's hard to surface good information.
Rust didn't choose async because of Node. Async's massive popularity is because of Node. It existed long before then, and that's precisely part of my point. It had a bad reputation before then, and it's slowly-but-surely reacquiring it now.
Rust didn't choose async because it was popular. You can go ahead and read the massive, multi-year discussions on the topic if you'd like. They've been going on since before 1.0.
I read them at the time. The gist was basically "it has been empirically established that async is the right way to do concurrency" - ie, it was because of Node.
It was more "it fits best with our needs as a systems language (overloaded term, yada yada) and we should support a robust concurrency model sooner rather than later". The Rust devs were aware of green threads and the like. It is deeply unfortunate that async is still such a mess, though.
I am not versed in computer science well enough to comment on the model in general but I'll agree that the current way of doing async is pretty sloppy. When you fully invest your project in it and use async libraries it does get better but some compilation errors still sound like dark magic (I'll forever regret not writing down a few of them I encountered last year because the team is receptive to changing rustc error message).
Though I'll also say plain old multithreading and async are serving different niches, in my experience.
No, it is completely fair to say that reading and comprehending concurrent Golang code is far, far easier than reading async Rust code. With virtual threads and structured concurrency, even concurrent Java code is easy to read nowadays.
Rust Async is far more difficult to understand than Rust Sync. You can't just dive in. You need to have coffee and truly focus your mind like you are solving a Math Olympiad problem. I find it difficult to believe that a community language came to this ridiculous level of complexity.
The constraints of performance and whatnot limited the options for Rust's async model. It's not bad given the constraints, but it's not great, and how developers use async Rust vs. Go matters even knowing this.
Any kind of mechanism like that requires a mental model that carries a ton of state because it is no longer immediately visible what the scope of execution is.
Continuations and the way Erlang handles this have far less mental overhead and help keeping the model and the mental representation of that model in sync. Differences between the two is where bugs will hide.
The web isn't asynchronous; it's synchronous in almost every aspect. The whole reason all this async stuff became so popular (at least, as far as I understand it) is that there is a mismatch between your typical process and the large number of clients dangling off it. The solution, in my opinion, is to make it easier to have more processes, which can then block while waiting for the other side, not to add layer upon layer of complexity in order to stick to the wrong set of tools, or to shoehorn a larger problem into a box too small to contain it.
It essentially gets you all of the complexity of interrupt driven code for very little gain and without the memory protection that a process driven approach would have. This whole async solution is to offer an alternative to Node, (even if it originated elsewhere that's clearly the competition for Rust) and should have stayed there, it's an ugly wart on Rust and a distraction to boot.
Kind of agree. The erlang runtime just gives you a process abstraction that is extremely cheap. So within a process you can do things logically sequentially without dragging the system as a whole down.
This approach would not have been the right choice for rust itself, since rust as a systems programming language can not have a runtime.
> Continuations and the way Erlang handles this have far less mental overhead and help keeping the model and the mental representation of that model in sync. Differences between the two is where bugs will hide.
You still have a lot to think about in Erlang. For example, you need an entire supervisory system to handle the fact that an actor can die. You need to handle the fact that an actor A might send a message to actor B and not get a response, or it'll get a response way later. You need a native way to link actors, a native way to name actors, a tracing system, etc. Actors are not something you can simply slap onto a system and suddenly you've got Erlang superpowers.
TBH tokio tasks give you a very similar abstraction without the problems - you can join a task. That's a huge ergonomic improvement because you can isolate/join state at will without having to use a channel.
> it's an ugly wart on Rust and a distraction to boot.
It's really not that bad. Like, IMO the major reason to have async is for cancellation. I want to be able to have a network request time out. That is way, way easier to handle in async code than threaded code. The best I can do with a sync system is specify timeouts on an underlying resource like a socket, I can't do anything like "this specific request should have this specific timeout" without async.
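For contrast, here is what the threaded workaround for a per-call timeout tends to look like: race a worker thread against `recv_timeout` on a channel (a std-only sketch; `with_timeout` is a hypothetical helper). Note that the timed-out worker keeps running in the background, which is exactly the cleanup that async cancellation handles for you:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run `work` on a worker thread and give up waiting after `timeout`.
// The worker itself is NOT cancelled; it lingers until it finishes.
fn with_timeout<T: Send + 'static>(
    timeout: Duration,
    work: impl FnOnce() -> T + Send + 'static,
) -> Option<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work()); // receiver may already be gone; ignore the error
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // A fast "request" completes well within its deadline.
    let fast = with_timeout(Duration::from_millis(200), || 42);
    assert_eq!(fast, Some(42));

    // A slow "request" times out, but its thread keeps sleeping in the background.
    let slow = with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(500));
        42
    });
    assert_eq!(slow, None);
    println!("fast={fast:?} slow={slow:?}");
}
```

It works, but every per-request timeout costs a thread plus a channel, and abandoned work keeps consuming resources; dropping a future, by contrast, drops the in-flight state.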
The supervisory system is an extra, you don't technically need it but it can help make your application bullet proof and I would definitely recommend if you go the Erlang route to use it to your advantage.
Every other language and/or runtime will need something similar anyway, but there is a good chance that it won't be nearly as elegant (or as solid) as the way Erlang does this.
Agreed that actors are not something that you can slap onto an existing system, but you can build systems that use a lot of those lessons effectively without them being written in Erlang.
> It's really not that bad. Like, IMO the major reason to have async is for cancellation. I want to be able to have a network request time out. That is way, way easier to handle in async code than threaded code. The best I can do with a sync system is specify timeouts on an underlying resource like a socket, I can't do anything like "this specific request should have this specific timeout" without async.
That's a great way to end up with orphaned resources though. If you kept it at the process level that simply could never happen, your process would error out, the supervisor would take over and that would be that.
> The supervisory system is an extra, you don't technically need it but it can help make your application bullet proof and I would definitely recommend if you go the Erlang route to use it to your advantage.
An actor system without supervisors is throwing away a lot. How do you handle an actor that crashes? The thing is, again, Actors are very low level - they're a foundational model. You end up having to build protocols and systems on top.
> If you kept it at the process level that simply could never happen, your process would error out, the supervisor would take over and that would be that.
I don't think you're going to get an orphaned resource by timing out and dropping the future, which drops its state.
I am responding to the very last point about dropping a future and the resource usage.
Let's assume a future that implements reading from a network socket. And the file descriptor (fd) was already passed to epoll/select (I am ignoring io_uring).
After dropping the future, the socket is still being considered by the kernel for reading. Eventually epoll will return, and the IO runtime (like tokio) will notice the stray fd and discard it. And can you even close the fd before it is out of epoll?
So until the fd is back to user land, it appears very similar to leaking resources. Even if only for a short time while the IO loop is completing an iteration.
I'm not sure how an actor would solve that problem. You're describing the way a system call works, which is not relevant to how your userland asynchrony works. In either case you're always "leaking" (I'd say "holding") the resource while you wait for the kernel to give it back to you.
Indeed I was merely describing how it behaves within the context of future + ioloop with tokio. I didn't take a stance on future vs actor.
In Rust my instinct tells me the difference between actor and future is negligible. Because it compiles down to almost the same thing. But I wouldn't bet my life on it.
Note that my understanding of what an actor is might be wrong. I assume an actor is a sort of thread with a main ioloop receiving and sending messages. Which is basically what a tokio ioloop thread is. Atop of which the futures give you an abstractions to compose IO functions more easily.
You seem to be mostly arguing that all the building blocks are in place and you can assemble your own LEGO.
Which is exactly what I started disliking lately. I want async Rust more opinionated with much more huge red arrows pointing in the right directions for cases A, B and C, and links to docs and even 3rd party crates for cases from D to Z.
And yes I work mostly with Elixir for years and Erlang's model is just irreplaceable so far. You're quite right that everything is synchronous but all the Erlang "processes" (green threads, fibers or w/e people want to call them) are preemptively and forcibly switched. And that has basically eliminated 95% of all parallel programming problems.
I would kill for Erlang's concurrency / parallel primitives in Rust, especially if they come with their own DSL (likely via Rust proc macros?) and are much shorter and comprehensible than they are right now. :|
> It essentially gets you all of the complexity of interrupt driven code for very little gain
Yep, sadly. I've written some fairly impressive commercial Rust code but it added gray hairs and left me wondering whether that was the best way to do it.
In a way it really gets me: we have a very good solution and yet everybody is pushing their own novel footguns with gusto. Baffling, really when you think about it. I'm trying to imagine civil engineering with entirely new ways to design bridges every 4 months, different materials that are entirely untested but this time we'll get it right and so on. That's roughly the state of the software world.
Oh don't even get me started on this, I start fuming just by reading your comments because they often align very closely with my thinking (in other threads as well).
I really think we should all pull up our sleeves and e.g. make transpilers from TLA+ to every mainstream language. (Provided we first made sure TLA+'s parallel logic is exhaustive and flawless, of course.)
I too get super ticked off these days. The programmers at large are constantly running in circles and almost nobody seems to care and everyone loves to pretend that this is exactly how things should be... Argh!
Liability is the big one. As soon as that kicks in a lot of this absolutely irresponsible stuff will stop.
Regulation would still be playing catch up but being liable, preferably all the way to the people involved including directors, could be done pretty fast and would set us on the path to licensed software engineer in short order.
Indeed, and the "what about the poor FOSS developer" argument isn't something I buy into.
The little street bazaar salesperson, the little corner shop, the big shopping mall, whatever: all of them are liable and manage to run their businesses just as well.
Keep an eye on the Gleam language if you're not already. It's a language with an ML-inspired type system (like Rust) that compiles to Erlang. It is likely too nascent to be used in production (in terms of tooling, ecosystem, stability, etc.).
I do already keep an eye on it and I like its syntax a lot. Problem is that the current commercial Elixir ecosystem is very strongly gravitating towards web and API development where several libraries reign supreme (Phoenix [web framework] and many of its dependents and derivatives, plus Absinthe [GraphQL] and Ecto [databases]). They also heavily rely on Elixir's macros so Gleam has quite a lot of work to do before it gains any tangible switching-over power.
To be honest... I am more likely to learn Golang more deeply (I know it quite well already but haven't, like, programmed in it in production for a long time, I am mostly using it for my own scripting and personal project needs) or even dive into OCaml now that it has a multithreaded runtime.
I do like how enthusiastically people make new languages but IMO most of them should be absorbed back into the hivemind at one point. This huge fragmentation does not help anything (except maybe teach you a technique or two which is of course very valuable by itself).
I think part of this is the ego component: it's much more fun to make a relatively big contribution to something new than it is to fix a small part of something much larger.
That, plus many programmers are former nerds who just love tinkering with stuff and they invent languages as thought experiments. Which is 100% cool with me but it gets weird when they start to promote them...
And figuring out the gap between "wanting to tinker" and "being a reliable paid professional" is a big struggle that takes a while to figure out -- and maybe takes your entire life in maintaining a good balance between both.
If you want golang in Rust just use channels and tasks? Or threads?
I don't find async Rust difficult at all, I'm having a hard time really empathizing with this to the extent of needing breaking changes.
To me, async from a lang perspective is virtually done - in 2024 I suspect all of the various impl Trait and async Trait stuff will be done and at that point I don't see anything left.
I do that of course, and that's one of the easiest ways to use async Rust. In real projects you need much more however. F.ex. I had to code an example of how to add tasks to an already running pool of tasks and posted my findings here: https://github.com/dimitarvp/rust-async-examples/blob/main/e... (there's #2 as well with some more comments and a different approach).
The fact that I needed to make a GitHub repo and start making show-and-tell demos on how to do various things with async Rust to me is both a red flag and me being diligent BUT it should be more obvious. And promoted in docs.
Rust started suffering from "you got all the nuts and bolts in place, now build your own solution, son" syndrome which I grew to dislike. Too low-level. I wouldn't mind something akin to e.g. Golang's flowmatic library (check the first two examples at the top of the README): https://github.com/carlmjohnson/flowmatic
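A flowmatic-style "submit work to a running pool" can be assembled from std channels alone, which is precisely the LEGO assembly being complained about. A hypothetical sketch, not a library recommendation:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// The unit of work: any boxed closure producing an i32.
type Task = Box<dyn FnOnce() -> i32 + Send>;

fn main() {
    let (task_tx, task_rx) = mpsc::channel::<Task>();
    let (result_tx, result_rx) = mpsc::channel::<i32>();
    let task_rx = Arc::new(Mutex::new(task_rx));

    // A fixed set of workers draining a shared queue.
    for _ in 0..4 {
        let task_rx = Arc::clone(&task_rx);
        let result_tx = result_tx.clone();
        thread::spawn(move || loop {
            // The lock is held only while receiving, not while running the task.
            let task = match task_rx.lock().unwrap().recv() {
                Ok(t) => t,
                Err(_) => break, // queue closed and drained: worker exits
            };
            let _ = result_tx.send(task());
        });
    }
    drop(result_tx); // keep only the workers' clones alive

    // Tasks can be submitted at any time, including while the pool is running.
    for i in 0..8 {
        task_tx.send(Box::new(move || i * i)).unwrap();
    }
    drop(task_tx); // close the queue so workers shut down once it's drained

    let mut results: Vec<i32> = result_rx.iter().collect();
    results.sort();
    assert_eq!(results, vec![0, 1, 4, 9, 16, 25, 36, 49]);
    println!("{results:?}");
}
```

It works, but it is ~40 lines of plumbing for something flowmatic expresses in a couple of calls, which is the ergonomics gap being argued about.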
`.ManageTasks` doesn't seem to have a clear analogue that I can think of right now, that's an interesting one.
`.TaskPool` looks like `StreamExt::buffer_unordered` unless I'm missing something. The code example in the README didn't really say a lot for someone who isn't already well-versed in the library.
That's on me for making blanket statements -- my apologies. Yep, I know about the first two and I used them regularly.
My issue in general is that async Rust involves using several things at the same time (not just the two keywords). This should be surfaced much earlier in any intro material, and the maintainers of the language (or the libraries, or both) should just double down on whatever they feel is the best way to do async Rust.
The stance of "you have freedom, make your choice and assemble your own LEGO" is not very productive. I want Rust to start being a bit more opinionated.
Though I'll recognize this is just a personal taste but I still have to defend it by saying that as a programmer who is paid to deliver, I want to be able to deliver in predictable timelines and not having to go off on an adventure to learn all the intricacies of async Rust before I am able to write a feature.
I'd like to think that I've worked on some "real projects" and I've never run into these issues tbh. It's just hard for me to wrap my mind around. Like I said, I get wanting more libraries (although I think they exist? A local pool exists and you can spawn tasks in it, using tokio) but I'm just not seeing a fundamental problem.
I found the code you wrote very confusing so I can see why, when you come back to it, you find it confusing. But I don't really understand its purpose either.
Well, I mentioned it a few times in this thread: things are mixed in non-intuitive ways -- async/await, tokio, and various crates (including StreamExt, which was not at all obvious; it took me a while to finally find it mentioned in a few blog posts and GitHub gists). It just introduces friction, and nowadays I am more averse to tech that requires more homework. Admittedly a personal preference, of course, but I don't think it's an invalid concern either.
> I found the code you wrote very confusing so I can see why, when you come back to it, you find it confusing.
Well, I needed such a pattern in a project that had to start as quickly as at all possible and only start and schedule the most critical work -- and then start adding all the other work as the first batch of work was already in progress.
It's really hard to say because I don't fully know your use case. Maybe your way was the right way but I think I probably would have done something differently. Maybe a priority queue of jobs with a static pool of workers.
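The suggested alternative -- a priority queue of jobs drained by a static pool of workers -- can be sketched with std alone: a `BinaryHeap` behind a mutex, popped by a fixed set of threads (`drain_jobs` and the sample jobs are illustrative, not from the linked repo):

```rust
use std::collections::BinaryHeap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A max-heap of (priority, job) shared by a fixed pool of workers;
// higher-priority jobs are always picked up first.
fn drain_jobs() -> Vec<(u8, &'static str)> {
    let heap = Arc::new(Mutex::new(BinaryHeap::from([
        (1u8, "low"),
        (9, "urgent"),
        (5, "medium"),
    ])));
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let heap = Arc::clone(&heap);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // Each worker pops until the heap is empty, then exits.
                let job = heap.lock().unwrap().pop();
                match job {
                    Some(j) => tx.send(j).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    drop(tx);
    let mut done: Vec<_> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    done.sort();
    done
}

fn main() {
    assert_eq!(drain_jobs(), vec![(1, "low"), (5, "medium"), (9, "urgent")]);
}
```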
> The fact that I needed to make a GitHub repo and start making show-and-tell demos on how to do various things
While I resonate with this, because I also went through a period of struggling with async stuff, I genuinely think this is because async is just hard in general. Done properly, it yields an immense amount of power, quite efficiently, but also opens up a lot of “degrees of freedom” about how it can be operated, which leads to confusion.
A lot of the async stuff, how it actually worked, and how to actually use it, only clicked for me when I played around with Glommio, which runs an executor-per-core, and some of the constraints it imposed made understanding it all somewhat easier.
Thanks a lot for the Glommio mention, that's an instant star and I'll review it in more detail Soon™.
> Done properly, it yields an immense amount of power, quite efficiently, but also opens up a lot of “degrees of freedom” about how it can be operated, which leads to confusion.
Yeah, very well put. Indeed it's very powerful and sure it's confusing. I need guard rails. And I need my hands slapped much more. "Looks like you're trying X -- this is how you do it, you idiot" would work well. :D
I think this blog post by tomaka gives a good summary of the current issues with async rust. Certainly much more comprehensive than what you can write in a hacker news post:
I guess I just don't run into these issues. Maybe because I write my code to be effectively single threaded already. I almost never actually need a channel, mutex, etc, except in very specific and scoped areas.
I wonder why I haven't hit these issues despite writing web services in Rust for years.
Sounds to me like you are dealing with this at the process boundary, more like actor and message based software. This is a mature approach and will serve you well.
The typical context for stuff like async/await is code that should be running in a separate process so it can block but doesn't. Then it gets ugly fast.
Yes, I suspect you're right. I do not like building giant monoliths with internal task systems. I prefer using async/await on the inside and microservices with rpcs or, ideally, streams for communication.
I think perhaps people are trying to push too much into a single process?
> I think perhaps people are trying to push too much into a single process?
That's exactly what's happening. A bit of Erlang/Elixir experience would help a lot of people create architectures (even in other languages) that do not require such kludges. If your application starts to take on the kind of complexity you'd normally find in a monolithic OS kernel, you're doing something terribly wrong.
And then people say that their microservice is high performance because it's handling 10,000 req/s on a 72-core node... which would be an okay number per core (depending on the amount and type of logic being processed), but not per node.
I'm handling requests in the high-microseconds range (median) in a high-volume, low-latency network service with hard limits on latency (not HFT). Good luck building that kind of service (a lot of constantly changing state) out of microservices alone with less complexity and higher throughput than a monolith or semi-monolith with a lot of internal state guarded by different synchronization mechanisms. The I/O overhead alone would eat you alive.
Sure it can, but all design patterns have their use cases. Generalizing and saying that one is just "terrible" (which your previous comment implied), and people "should learn" is just untrue and an ignorant take. Sometimes it's the only sane solution compared to others, considering project constraints.
Yes, that one is terrible compared to the alternatives. If there is one thing that really irritates me about the way we go about IT then it is that stuff that works and is reliable and well understood gets tossed and replaced by 'new shiny thing' which then eventually undergoes the same treatment. We are eternally stuck in reinventing wheels, rather than that we make real progress in for instance reliability and are able to accept liability for what we create.
Reducing scope and simplification, increasing reliability and ultimately aiming for software as a true engineering profession should be our goal. Not fashion and running after certain other eco-systems because they are perceived to be in competition. Rust has promise, it is potentially a game changer and this sort of distraction can be done without. It feels as if over time the Rust project is being hijacked by people that want it to be everything for everybody, and now places itself not only in competition with C (and possibly C++) but also with Node (which is itself in competition with C).
Especially considering the Rust project's own constraints, it might pay off to look at what ultimately made C++ a problematic language. This can be traced back to Stroustrup dragging in everything and the kitchen sink rather than sticking to his original vision, which was 'C but with classes'. The end result is a giant hairball and fifty (ok, exaggerated) ways of achieving the same thing.
On the first note, yes, that is what happens when a language whose original scope was systems programming now wants to do everything.
Regarding Bjarne Stroustrup, he didn't drag anything in; his C++ is described in "The C++ Annotated Reference Manual" and "The Design and Evolution of C++". Everything that happened after that is the responsibility of WG21, where he only has one vote among 300+ people.
There are several papers from him where he complains about the direction WG21 is going, like "Remember the Vasa".
Hm, ok, I admit that bit was from memory, the quote that stuck was 'C++ now has as many additions and ways of doing things as there are ways of pronouncing 'Bjarne Stroustrup'', and stems from the days when his contributions were still much larger, say 20 years ago, whereas the 'remember the Vasa' quote is much more recent.
I positively loved C++ when it was still called 'cfront', and after that it became more and more layers of complexity and tricks. C already had plenty of that (for instance, the weird pointer-to-a-function syntax) and C++ went all out to drag in the various concepts that were present in other programming languages. This wasn't the same as the way, say, English borrows words and concepts from other languages freely; those are mostly just words and new ways to combine existing ones. It would be as though English suddenly started using Kanji or logograms, or as if it started using right-to-left writing, or maybe even bottom-to-top, in the middle of 'normal' sentences.
Such inclusivity increases the surface area of the language itself and ultimately makes life harder for everybody using it: you have to learn more because someone else could use these new constructs in their code and you may end up having to interact with it.
I think GvR got that one right with his 'there should be only one way to do it' but even that became a straightjacket.
I'm not saying I have all of the solutions and if Stroustrup wasn't the driving force behind C++'s complexity then I'll be happy to take that back. But I really believe that programming languages should strive for simplicity, just enough to make it all work so that the barrier to entry is small and the amount of gate-keeping can be kept to a minimum. That is the only way to really drive adoption and to hopefully undo some of the crazy fragmentation that we have in the programming language landscape.
The only languages capable of holding on to that simplicity are the ones with some BDFL.
The moment the language evolution is driven by some kind of request for improvement, or a similar process, with votes on what goes in or not, the outcome is design by committee.
Many other languages aren't much better than C++, if you check their current versions and the Features/Year rate, including standard libraries.
Also having BDFL only works out as long as it is the original author, afterwards there is no guarantee that the successor has the same vision for the language.
The tragic long term meaning is to either accept complexity, or reboot the programming language ecosystem every couple of decades.
> Yes, that one is terrible compared to the alternatives
No, that's exactly the opposite of what I meant when I described what I'm working on: all of the alternatives that both you and the OP described are terrible in that specific case because of project constraints. I actually measured that!
As to the rest of your comment I fully agree, tired of all the hype cycles myself.
> I think perhaps people are trying to push too much into a single process?
I'll immediately spin that right back at you: people are trying to do too much via too many OS processes. ¯\_(ツ)_/¯
I like my single-OS-process apps that can distribute work internally (via the aforementioned Erlang "green threads"; they are not exactly that but close), or Rust's tokio, or Golang's goroutines and channels and WaitGroups.
I suppose we can all retreat to our corners, shrug and say "well, it's just how I prefer doing things" but I still find the many-OS-processes approach overrated and leaky. You can optimize the program pretty well but then you have to worry about whether the kernel will schedule and distribute work properly (i.e. on a 24-core server, will spawning 24 copies even with CPU core pinning work well?).
I personally prefer to dispense with these worries and just make self-contained apps / services and then be able to put them in any hosting, and deal with scaling issues either by further optimizations, or just bumping the server bill with $20.
I don't think there's a meaningful difference between an Erlang actor and an operating system process other than the upfront memory allocation size.
The salient difference to me is the supervisory structures. In Erlang you must compose them; in operating systems the kernel does that for you, or you move the responsibility into a distributed queue, etc.
As for scale, I don't really know how to reconcile "I'm fine just paying another 20 dollars" with "I'm worried about how my kernel will schedule my processes".
> in operating systems the kernel does that for you
Does it really? I am not an uber Linux expert but I have only heard of systemd doing that. Serious question, I'll appreciate more examples.
> As for scale, I don't really know how to reconcile "I'm fine just paying another 20 dollars" with "I'm worried about how my kernel will schedule my processes".
Sorry if I was unclear, I meant it as "I don't want to worry if the kernel will properly schedule N clones of my single-threaded program so I prefer to write a fully async one in a single OS process and work on it to make it efficient instead; failing that, I'll just bump my hosting so my program has the throughput I require".
The kernel is ultimately the thing that preempts your process and allows it to be killed, restarted, etc. It's also what handles things like "the connection was dropped", or "the thing timed out", etc. Userland facilitates management of those actions in a user oriented way through helpers like systemd or your container orchestrator or whatever.
It doesn't make much difference though, you can say it's systemd, the kernel, dockerd, or whatever. The point is that operating systems have tons of facilities built in for the management of processes.
> Sorry if I was unclear, I meant it as "I don't want to worry if the kernel will properly schedule N clones of my single-threaded program so I prefer to write a fully async one in a single OS process and work on it to make it efficient instead; failing that, I'll just bump my hosting so my program has the throughput I require".
Why not "I don't want to worry about if my kernel will properly schedule N clones of my single threaded program so I'll just bump my hosting so my program has the throughput I require" ?
Also I didn't suggest single threading, I generally build things as 'async/await' on the inside and 'service oriented' on the outside. So when there's some new domain I don't spawn a new task, I just spawn a separate process.
Well, it seems we're much more aligned than I thought. :D
> Why not "I don't want to worry about if my kernel will properly schedule N clones of my single threaded program so I'll just bump my hosting so my program has the throughput I require" ?
Boring answer: I never prioritized that way of work, I admit. And after gaining tons of experience in several ecosystems and languages I got burned out and right now I can't bring myself to learn systemd properly.
> Also I didn't suggest single threading, I generally build things as 'async/await' on the inside and 'service oriented' on the outside. So when there's some new domain I don't spawn a new task, I just spawn a separate process.
Ah, I see. Fair, I would and have done the same (though that comes with some care when coding the thing so it can determine what it needs to do, and spawning the 2nd copy doesn't step on the toes of the first one).
Disagree. Processes and actors are nearly identical, except that the kernel has the ultimate ability to preempt everything.
Supervisors are built on `link`, and every process in Erlang must always exit with an exit status. This is identical to a process handle with a process exit status.
The whole point of supervisors is that they are extremely simple - you can implement all of Erlang's OTC and supervisory semantics with, more or less, recursive functions (which are the foundation of actors), names, and links.
They are superficially functionally identical, but OTP (which is what I think you meant when you wrote OTC) goes a lot further than just some syntactic sugar. It is effectively a part of a distributed operating system that focuses on reliability at the cost of some other factors. It does that one thing and it does it extremely well with a footprint that can span multiple pieces of hardware. No operating system comes close to delivering that kind of reliability. You can cobble something like it together from a whole raft of services but why would you, it already exists.
Joe Armstrong's view of the software world (one with which I strongly identify, so forgive me the soapbox mode) is that reliability, not throughput, should be our main focus; that failure should be treated as normal rather than exceptional; and that distributed systems are the only viable way to achieve the kind of reliability that we need to bring software engineering into the world. This is a completely different kettle of fish from stringing together a bunch of services using kernel or auxiliary mechanisms, both in terms of results and in terms of overhead.
What you're talking about is the fact that Erlang as a VM is well built. That is irrelevant to whether or not supervisory trees are equivalent to what an OS provides.
> No operating system comes close to delivering that kind of reliability.
Erlang runs on an OS though. Like, Erlang would inherently always be limited by the reliability of the underlying system and in fact relies entirely on it for preemption.
> (one with which I strongly identify so forgive me the soapbox mode)
For the record, I am an extreme fan of Joe's and I routinely re-read his thesis, so you are preaching to the choir in a way.
> This is a completely different kettle of fish than stringing together a bunch of services using kernel or auxiliary mechanisms, both in terms of results and in terms of overhead.
I just totally disagree, and in fact you'll find that Erlang's seminal works use the term 'processes' and that the model is based on OS processes. Off the top of my head, he cites at least two papers about processes and transactions as the foundation for reliability in his thesis.
Yes, but those processes are much lighter weight than OS processes and can be created and destroyed in a very small fraction of the time that the OS does the same thing.
> Like, Erlang would inherently always be limited by the reliability of the underlying system and in fact relies entirely on it for preemption.
This is factually incorrect, sorry. The Erlang VM uses something called 'reductions' which preempt Erlang processes when their computational slice has been reached which has absolutely nothing to do with the OS preemption mechanism.
And an Erlang system can span multiple machines with ease, even if those machines are located in different physical locations.
Erlang processes are more along the lines of greenthreads than full OS processes but they do not use the 'threads' mechanism the OS provides for the basic scheduling. The VM does use OS threads but this is mostly to decouple IO from the rest of the processing.
Oh, and BEAM/Erlang can run on bare metal if it has to.
> Yes, but those processes are much lighter weight than OS processes and can be created and destroyed in a very small fraction of the time that the OS does the same thing.
Yes, that is true. I don't think that is relevant to the semantics of the constructs.
> The Erlang VM uses something called 'reductions' which preempt Erlang processes when their computational slice has been reached which has absolutely nothing to do with the OS preemption mechanism.
That preemption relies on the runtime yielding at every function call. The only thing that can actually preempt a process mid-instruction is the kernel, and actually it can't do that either; it's the hardware that yields to the kernel.
> And an Erlang system can span multiple machines with ease, even if those machines are located in different physical locations.
Yes, that's not unique to Erlang, obviously one can launch processes on arbitrary machines.
> Erlang processes are more along the lines of greenthreads than full OS processes but they do not use the 'threads' mechanism the OS provides for the basic scheduling.
I think you're getting hung up on implementation details. The abstraction is semantically equivalent. One might be faster, one might be heavier, one might have some nice APIs, but in terms of supervisory primitives, as I said before, the only thing required is `link` and processes have that.
I'm too lazy to go through the various papers Joe cites, but if you take the time you'll find that many of them are about processes.
I fully agree that the async story needs attention.
Something as drastic as backwards-incompatible changes might not be needed.
But definitely much more documentation about best practices, highlighting the option of local async tasks, making the APIs for that more convenient, and also some low hanging fruits in terms of language syntax.
It seems a bit like, while there was a lot of excitement about the async syntax a few years ago, now there is an MVP syntax and everything has just slowed down a lot.
> now there is a MVP syntax and everything has just slowed down a lot.
Sadly this is spot on. It seems like after async was delivered there was only one big push, from tokio 0.1 to 0.2 to 1.0, and that was it; the ecosystem and/or language maintainers seem to have lost interest in making it better to read and use.
Nobody has lost interest, the language maintainers have been pushing on this problem for years. Scroll back through Niko's blog posts and see how many of them are related to improving async: https://smallcultfollowing.com/babysteps/
The reason that progress appears slow is because the next steps for async improvement have required extending the type system, and doing that properly (making sure not to introduce unsoundness) required a massive overhaul to the type checker, and forming an entire new team solely dedicated to the verification of the type system. Here's the most recent update on that work: https://blog.rust-lang.org/inside-rust/2023/07/17/trait-syst...
I see, thank you for the clarification. Shame that it takes so long. I really need Rust's async benefits without all the baggage and the 6 months trial-by-fire training that it requires, and I need them right now, but it seems I'll be using other languages for the moment, and just scale infrastructure. Which is a real shame because Rust did make many good calls.
AFAIK the focus right now is on supporting `async` in traits, which required a bunch of seemingly unrelated language features like Generic Associated Types, which were released in Rust 1.65, and Type Alias Impl Trait, which is still WIP due to some complications.
The fun part is over. At least that is how a rust maintainer described it to me.
When I asked why doing the unfun part (documentation, maintenance, etc.) isn't a condition for allowing people to do the fun part (pushing cool new code), it got no response.
Because open-source projects survive on people having fun improving them.
There just aren't that many people working on the Rust language as a day job, and they all have their hands full. Some of them have been working on foundational stuff that is required to get async Rust to work, but it's work taking place over years.
To give an example, the Inside Rust blog recently posted [1] an article about the Rust Trait System Refactor initiative, which is a refactor of one of the most complex chunks of the rustc compiler, started earlier this year.
That refactor is required to unlock some advanced lifetime syntax. Advanced lifetime syntax is required for some specific GAT improvements. GAT improvements are required to unlock async traits.
It's not just "people are too lazy to write the documentation". A lot of thought went into this.
I'm not sure if the parent is correct about adding lifetime syntax. To my understanding, the new trait solver will fix bugs and make features more consistent ("why can I put a trait bound here but not there?"). People have mentioned 'self for self-referential structs but I think that's just wishful thinking right now.
Rust's sweet spot is really use cases where any kind of automatic memory management is forbidden, either due to real requirements (high-integrity computing, kernel drivers, ...), or due to an existing domain culture that frowns upon any alternatives.
For everything else, there are more productive alternatives, even Go, which after generics is kind of ok.
> If you lose deterministic destructors and therefore the possibility of RAII, for me it is a loss.
Hum... Haskell does the "going out of scope means it's cleaned up" thing quite fine. Just like its async handling is paralleled only by the agent-based languages.
It's just a matter of looking at a language where people want to do good things, instead of "we always did it this way".
Maybe for prototypes or small scripts, but I disagree otherwise. The promise of automatic memory management is that you don't have to think about memory.
But the second you don't think about memory you are doomed to write bad code anyway. Maybe because of performance but most probably due to architecture. To have a language that forces you to think about memory is not a curse, it is a blessing.
> But the second you don't think about memory you are doomed to write bad code anyway.
Yeah but the failure modes are different.
"Doomed" in C++ means now I've got worms all over my network. Pretty much a fully negative outcome.
"Doomed" in Rust means I can't change it anymore until I relearn and refactor everything. If it's a pacemaker or rocket engine, that's probably ideal. But if it's not...
"Doomed" in Go or Java means I gotta cough up another few GB of RAM and maybe kick over the service every month. For some random middleware, that's fine.
Dunno, tech debt is pretty easy to solve in Go/Java (easier in Java) as long as you dedicate some time to it. Excellent refactoring, code-coverage and instrumentation and profiling tools.
If attention is paid, if the architecture is good, all else like programmer salary/skill being equal, any of those languages (and plenty of others) will work fine. (And honestly, not a huge difference in the work to achieve that, a good design will look about the same in all four.)
It's only an interesting question which to use if attention is not paid, or you don't want to invest enough to ensure attention will be paid; in which case you need to look at the causes and effects of that.
I am not your parent commenter but to me they added the nuance of "not all programming languages are good and we should stop pretending otherwise". And "we don't actually have as much choice as we think".
Well, thanks. I was going for that first part (with the added detail that C is one of the languages being discussed on the context).
But I don't really agree with that second statement. People seem to not be aware of how much choice we have, and focus only on a few possibilities that are not even all of them good.
I suppose such a discussion could spiral into semantics so I'll only say this:
Yes, in theory we have a lot of choice but in practice you start a project and you need a good ORM / data-mapper, you need a good HTTP/Web/WebSockets framework, you need good DB migration solution, you need SSO for 5 services, and you need 20+ more things and when you start eliminating stuff there are barely any programming languages left. If even 20 are left at the end of such an evaluation that would be still pretty good (but in my practice you usually end up with maximum 7-8 viable choices).
You don't have to agree with me on this, or I with you -- it's my empirical observation which is prone to bias and filter bubbles (of course).
Alright, you confused me. I was simply saying that this (your words):
> any of those languages (and plenty of others) will work fine.
...is not necessarily true. I've tried a good amount of languages in my life and their killer apps lose steam very fast when you try doing commercial code with them. For the rest of discussion, well, I didn't understand its point, so not participating there.
Agree. But the argument put forward in favor of automatic memory management is that it speeds up development.
But if that is only true if you don't care about the architecture that argument doesn't hold (assuming you want a good architecture).
The claim I make is that you have to think about memory, either directly or indirectly. And if you have to think about it dealing with it yourself isn't a burden, it is liberating.
In in a similar sense to how a static type system is liberating. Which might sound a bit weird, until you need to refactor.
This is still a mostly circular argument though. An architecture is "good" if it's saving resources - compute, time, or labor. If your "good" architecture is slowing down development i.e. more time more labor, you'd better be able to point to significant compute gains or it's not actually good. Rust is, only sometimes, and marginally, better than other languages in this regard.
And your compute gains need to be something you can't just trade off against money for development time, i.e. it either needs to be a fundamental part of your problem, or you need to be planning for hyperscale. This is also only a small fraction of problems.
Just because a language has some form of automatic memory management, that doesn't mean it doesn't provide the same mechanisms of a language like C for low-level programming.
Plenty of options, D, Go, Swift, C#, Nim, Eiffel, Common Lisp, Haskell, OCaml,....
Dead options that unfortunately didn't manage to win market adoption: Oberon, Oberon-2, Active Oberon, Component Pascal, Modula-2+, Modula-3, Cedar.
Speaking of Cedar, when will Rust manage to have a full workstation OS?
Redox still has a bit of catching up to do with what a full graphical workstation OS was capable of on 1980s hardware, using a systems programming language with automatic memory management.
Naturally, there are other examples out from Xerox PARC, Genera and Texas Instruments on that regard.
Nope, you're the one missing the point, just because a language has some kind of automatic memory management, it doesn't mean C like features for deterministic manual memory management, or any other kind of manual resource management aren't available.
Honestly, is 150k events per second even a meaningful number to talk about in performance context? I've got Clojure code ingesting 150k messages per second, processing, then outputting 30k msgs per second sitting at 47% CPU on a cheapo 70 $/mo VPS. Straightforward unoptimized C/C++ achieves millions of ZeroMQ messages processed per second in a single thread.
Why use async or even Rust at all at such small loads?
I knew somebody would call me out since I didn't give details. :D
It was a 1-2 vCPU k8s pod, which is still pretty impressive IMO. On my 20-thread workstation I can easily achieve multi-million events per second. Even one of my Linux laptops can do around 1 million, but its Ethernet interface started overheating and couldn't sustain it for long.
> Why use async or even Rust at all at such small loads?
That's the better question, yeah. We had 200+ k8s pods and had to ingest a lot of data is the simple answer. Using anything except C++ or Rust would have made our cloud bill much bigger.
Though I have to admit that nowadays I would be very strongly tempted to try with Golang or even OCaml's new multithreaded runtime.
a) Saying Node + Deno are good is a stretch. Node has horrible performance, even for simple routing. And I'll source that[0].
b) Saying that adding `Send + Sync + 'static` bounds is a serious burden is, to me, overstating things.
> the far better model for writing performant servers.
It's completely workload dependent. For a chat server it's almost definitely not going to be more performant and you may end up with worse latency.
> it only costs you friction everywhere else in your entire codebase, and quite often performance as well.
I am unconvinced tbh. I do not believe that adding Send + Sync + 'static bounds is onerous, I do not believe satisfying those bounds is hard (it's almost always just a matter of moving the value), and I do not believe that the vast majority of programs benefit from TPC architecture.
I recognize that there is a problem here - that we are optimizing for one runtime at the expense of others - but I am not convinced at this point that the problem matters.
Not sure if this is facetious, but HN does have a rule that prohibits editorializing. From the Guidelines: "Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize."
> Yes the RwLock and mpsc comes from Tokio and lets you .await instead of blocking a thread, but these are not async primitives, these are multi-threading synchronization primitives.
The only reason all this async stuff even exists is because we want concurrency. We want to say "while this one task waits for I/O, this other task will do stuff". So it's not too surprising to me that an intro to async would include synchronization primitives. Those primitives aren't really "thread"-specific if by thread you mean OS thread. When you do async like this, you're basically re-implementing OS threads in user space.
Multi threading and concurrency are not the same. You can have a very high performance server that handles thousands of requests concurrently on a single thread. That's how node/deno do things.
But the way to do things in async rust is that if you want concurrency you also have to use multithreading. At least that is what you see in all the examples and docs.
As soon as you require your futures to be Send you have to use not just concurrency but full blown multithreading primitives.
So e.g. you have to use Arc<RwLock<...>> where a Rc<RefCell<...>> would suffice.
In terms of C++ code that would equate to std::shared_ptr<std::shared_mutex<...>> which ... sounds quite wasteful in terms of scalability/performance. Why is it not possible to simply return a Rust promise? That's the way I do it in my C++ async (executor) library backed by work-stealing queues under the hood.
I think it's worth understanding why you need Arc and RwLock here.
If the compiler fails to type check because a future is not Send, it means that the future's context holds onto some state that is not safe to be sent to a different thread. For example, holding a reference to something on the stack across an await point. If it fails because something is not Sync, it means the future's context holds onto some state that is potentially accessed concurrently from multiple threads.
If you write your async code such that it does not have these concurrency bugs then you don't have a problem. Arc is a convenient way to make something that is Send and RwLock is a convenient way to make something Sync. You can also fix your code by not introducing concurrency bugs.
This isn't perfect though because the errors are often cryptic. You can have a function that is fine in one context completely break when called in another with "future is not send". And good luck identifying where the bug is and how to fix it.
As for constructing a future directly, sure, you can do that. But keep in mind Rust futures aren't promises that just wrap some .then callback. They are types that implement the Future trait, whose .poll method takes self by Pin (to prevent the underlying data from being moved, which would invalidate self-references) and a Waker object (to let the executor know when the future should be polled again). If that sounds complex, it is, because async in Rust is actually super complicated and the syntactic sugar does a massive amount of work.
The lengths that the Rust devs went through to minimize the overhead of async/await is impressive but also horrifying. And unfortunately, async/await certainly isn't a zero-overhead abstraction if you consider ergonomics/learnability. I wonder if it can evolve without breaking backwards compatibility.
Not sure what you mean by "returning a rust promise". But yes, Arc<Mutex<...>> is a thread safe smart pointer containing a mutex, which has quite some overhead.
You will see this in many places in the internals of async code that has to be Send.
OK in many cases, but it can kill you if you have to go through this for many small operations.
Yes, that's what I meant: it's a heavy operation consisting of multiple dynamic memory allocations and atomics. The former can be somewhat addressed through custom allocators, which in my case proved to be a big win for small and short-lived tasks.
Because there is no “instead”? A promise (or really a task if you’re in an async context, what you link to is really a thread + a pseudo-oneshot channel) allows independently scheduling a unit of work, it’s neither shared nor mutable state.
The promise object you link to is not Clone, so there is no way for two different threads to refer to the same one without adding an intermediate `Arc`, and because all the methods on it operate by value that’s not exactly helpful either as you can’t move the promise out of the Arc.
Same with a Tokio task: a JoinHandle can't be duplicated (though according to its docs it might be on some platforms, if the task returned a duplicatable value anyway).
I think the article's point is that if you're doing single-threaded concurrency then you don't (or at least might not) need synchronization primitives. If you're writing a Node script, for example, then you don't have to worry about two callbacks trying to modify the same variable at the same time because you know that there is only one thread of execution.
True. Rust's mandate for "zero runtime cost abstractions" is one of the better justifications for async/await. But async/await, too, has significant limitations, notably:
1. Coloured functions
2. Placing the concurrency design decision (async vs not) on the implementer of a function, not the caller
3. Debugging complexity
(1) and (2) are mutually, negatively, reinforcing. As the implementer of a library function, I can't possibly know all the possible scenarios in which my function will be called. If I make it synchronous, then I'm blocking my caller. Which might be an issue - or might not. If I make it async then I don't block the caller - but I've just consigned the entire call stack to be async.
There's clearly strong opinions on both sides. Some are strong advocates of async/await. Personally, I'm not. Multiple, concurrent, fine-grained sequences of actions fit my mental model much better. Async/await doesn't give me that: I find reading async/await code to be a bit like reading control flow with GOTOs.
My preferences are undoubtedly influenced by working with Erlang, which has supported fine-grained sequences since the beginning. There are no coloured functions; just functions. As the caller, I can decide if I want things to run sequentially (call from another function) or concurrently (spawn as a separate Erlang process). I have the context I need to make that decision.
It's not a panacea by any means (see e.g. Structured Concurrency [0]). But it fits my mental model much better.
I'm not a fan of async, but I don't like to deny reality by saying just use Structured Concurrency/Green threads/Magic Multithreading Powder. They all have their trade-offs.
And to be fair, they are working on solving colored functions.
However, async is still half-baked: there's no support for async traits, among other corner cases.
Which in turn was caused by async being stuck in RFC development hell.
"If you write regular synchronous Rust code, unless you have a really good reason, you don't just start with a thread-pool. You write single-threaded code until you find a place where threads can help you, and then you parallelize it, [..]"
I cannot agree more with that. As someone who's done a good deal of Java in my day job, I can tell you a thing or two about spawning threads willy-nilly. At least it is easier to avoid in Rust, but I'd still prefer it the other way round: opt-in, instead of opt-out.
It is opt-in. If you're using Tokio then you can specify whether you want to use a single-threaded or multi-threaded runtime. Multi-threaded is "default" in the sense that if you just use `#[tokio::main]` then you get a multi-threaded runtime but you can also just do `#[tokio::main(flavor = "current_thread")]` to get a single threaded executor.
More to the point, even using a multi-threaded runtime won't spawn threads willy-nilly. It will default to using N worker threads (where N is the number of CPU cores available).
> It is opt-in. If you're using Tokio then you can specify whether you want to use a single-threaded or multi-threaded runtime. Multi-threaded is "default" in the sense that if you just use `#[tokio::main]` then you get a multi-threaded runtime but you can also just do `#[tokio::main(flavor = "current_thread")]` to get a single threaded executor.
Doing something extra to get a behavior different than default is the definition of opt-out my friend.
That argument doesn't hold up. Adding `#[tokio::main]` is opting in because it is that extra something that has to be done. Adding that line is opting in to the multithreaded runtime.
The willy-nilly was in regard to Java and more precisely the ecosystem, not the language itself. In Java I find it hard to have N worker threads, where N is the number of CPU cores available, because many common libraries spawn threads - willy-nilly I'd say.
I have not enough experience about the Rust ecosystem but I hope it will successfully avoid a similar fate.
> Making things thread safe for runtime-agnostic utilities like WebSocket is yet another price we pay for making everything multi-threaded by default. The standard way of doing what I'm doing in my code above would be to spawn one of the loops on a separate background task, which could land on a separate thread, meaning we must do all that synchronization to manage reading and writing to a socket from different threads for no good reason.
Why so? Libraries like quinn[1] provide a "no IO" crate that implements the protocol in a runtime-agnostic way. Done like that, we aren't forced into using synchronization primitives.
Also, IMO it's relatively easy to use a Send-bounded future in a non-Send (in other words, single-threaded) runtime environment, but it's almost impossible to do the opposite. Ecosystem users can freely use a single-threaded async runtime, but ecosystem providers should not. If you want every user to only use a single-threaded runtime, that's a major loss for the Rust ecosystem.
Typechecked Send/Sync bounds are one of the holy grails that Rust provides. Although it's overkill to use multithreaded async runtimes for most users, we should not abandon them, because they open an opportunity for high-end users who might seek out Rust for their high-performance backends.
> Also, IMO it's relatively easy to use Send-bounded future in non-Send(i.o.w. single-threaded) runtime environment, but it's almost impossible to do opposite. Ecosystem users can freely use single threaded async runtime, but ecosystem providers should not.
We have Send and non-Send primitives in Rust for a reason. You could use Arc/Mutex/AtomicUsize/... everywhere on a single thread, but you should use Rc/RefCell/Cell<usize>/... instead whenever possible since those are just cheaper. The problem is that in the ecosystem we are building the prevailing assumption is that anything async must also be Send, which means we end up using Send primitives even in non-Send contexts, which is always a waste.
> If you want every users to only use single threaded runtime, it's a major loss for the Rust ecosystem.
Running single threaded executors does not prohibit you from using threads, it just depends on how you want to do that. You can:
1. Have a single async executor running a threadpool that requires _everything_ to be Send.
2. Have a single threadpool, each thread running its own async executor, in which case only stuff that crosses thread boundaries needs to be Send.
The argument is that there are many scenarios where 2 is the optimal solution, both for performance and developer experience, but the ecosystem does not support it well.
I am eager to see what comes of tokio-uring, though. Especially if someone comes up with a good API for reads using shared buffer pools... that one might cause API changes everywhere.
> The problem is that in the ecosystem we are building the prevailing assumption is that anything async must also be Send, which means we end up using Send primitives even in non-Send contexts, which is always a waste.
Honestly I don't think every user wants top-notch throughput when writing asynchronous Rust applications. Most users just want the correctness of the Rust type system and its "lightweight" runtime characteristics compared to CPython, Node.js, etc., which can provide fairly good performance.
The thing is, using Arc in a single-threaded runtime does not greatly harm performance. If it does matter, you're handling 1M+ rps per core and should be using a multithreaded runtime anyway, because it scales better and benefits from threadsafe primitives.
I find all the async stuff in Rust incredibly ugly and cumbersome, and it's one of the biggest reasons I still prefer C++. C++ lets me just write single- or multithreaded code, because none of the dependencies force their `async` stuff on me. Yeah, it's up to me to ensure things are synchronized, but I'd rather do that than try to figure out how to get some dependency that isn't meant to use async to work in some async move closure.
They are usually doing some ugly stuff internally to make this work, but as a pure library user you don't have to care.
If you want to write something small that e.g. pulls some data via http, performs some computation, then pushes the result, it is totally possible to do this fully in sync rust.
And if you are writing a highly concurrent web server, looking into async might not be the worst idea.
One thing I hate about "async" systems in general is the absurdity of suddenly having two types of functions that behave differently with the same syntax. And you have to add the "await" keyword to async function calls to make them synchronous, i.e. behave normally.
I find Go's approach of "everything is synchronous, except when made asynchronous with the 'go' keyword" much more pleasant and much less error prone. Is there a Rust library that does this, i.e. provides coroutine-style asynchronous behavior, but with synchronous Go-like syntax?
Go can do this because it’s closer to Java or C# than C. It has a runtime that gets compiled into every binary. Rust deliberately avoided that by design.
Why is a runtime needed for a syntactic feature like that? All you need is an async wrapper for system calls, and every sync function could, in principle, be compiled down to a coroutine and run on an executor of your choice.
Just because Go has a runtime and has a go keyword, doesn't in itself mean that runtime is necessary for the go keyword to exist.
Any goroutine can be interrupted to let another goroutine run on the respective OS thread, without the involvement of the code author. The thing managing this scheduling behind the scenes is the runtime. Also, the entire standard library is built atop non-blocking IO, which again is managed by the runtime behind the scenes (e.g. waiting for file descriptor readiness, scheduling the respectively blocked goroutine, etc). And there is no “async system call wrapper” involved - these are fundamentally different system calls (for epoll or kqueue or whatever).
You could argue that all go code is async (in the Rust sense), and because of this uniformity, there’s no need for the syntactic distinction that Rust requires you to make.
Rust could have done the same (and at one point in time they did have green threads), Rust decided that the convenience to Rust programmers wasn’t worth the cost elsewhere (the inclusion of a runtime, performance overhead, C interop, etc).
Sure, but I'm not saying "implement all Go features in Rust" - all I'm saying is "use function call prefix to asynchronize function calls instead of synchronizing them". What exactly makes that particular syntactic construction troublesome? Compiler could compile it exactly the same way it's doing now.
That means that the interop story for Golang is horrible. Golang can somewhat work with libraries with C bindings. But you cannot publish a Golang lib as a lib.so with C bindings because of the runtime.
You can, actually, using '-buildmode c-shared' and '//export FuncName' comment directive. CGo supports exporting Go functions in shared libraries, and sets up the Go runtime to be used properly with native C types (e.g. int64 will be long long).
Having written a lot of async and sync rust, and having both components interact with one another, calling async stuff from a sync context isn't even that ugly these days. But maybe it's stockholm syndrome.
The implementation is really not complicated, though it’s hardly great if you expect it to be performant: reqwest just creates a tokio runtime, schedules the task on it, waits for completion, then tears it down.
Can you expand what ugly stuff they're doing internally? I mean, synchronous code is the most simple way to do anything. POSIX API is synchronous by default. Why would you need ugly stuff? Asynchronous API might force you to do ugly stuff (e.g. for files or DNS).
Let's say you are using a low level networking library that is async to the core, like hyper or quinn.
Then you will have some ugliness at the boundary. You might get by with just spawning a current thread runtime to use your async lib, or do something like blocking channels if your low level library spawns long lived tasks.
If your low level libraries are all sync, then there won't be much ugliness.
Eh, in that case, you can also just write single or multithreaded Rust code, entirely ignoring async Rust. I do. And multithreaded Rust is much more pleasant experience than multithreaded C++.
Which dependency is forcing async Rust on you? Alternatives are usually available.
Postgres (which is a wrapper around tokio-postgres, and as a result drags the whole tokio bloated dependency graph). I would love to have a purely sync alternative to that.
I am so glad someone said this. This also shows nicely how async is wrong, not just that it's viral, but it's bad design because it forces code duplication.
Middleware/library writers that touch on anything that could be async (db, SPI, network etc.) will now have to write two versions of their API and duplicate most code.
I find the async stuff obnoxious, too, and I avoid it. Helps that I don't work in the web space.
But in general the dependency story with Rust is better than with C++. I don't miss the integration story with C++; just bringing in a dep in the first place is a roll of the dice on whether it's going to work with your build system. And then whether it brings with it some other lifestyle assumptions (exceptions, some third party deps you don't want or can't link to, work with your compiler or not, etc.)
Honestly, I work in systems & embedded stuff, and it's not hard to avoid async in Rust. It's only once you start poking at web-adjacent stuff that you run into the wall, and then when you do you find yourself covered in the taint of people who brought their NodeJS Stockholm Syndrome over to Rust, but that's another rant...
Or just stick with cmake + vcpkg/conan, and enjoy the existing ecosystem of C++ libraries.
Nowadays it is unthinkable to do a Java project without Maven/Gradle, yet it took about 10 years for Ant to become relevant (counting from 1996), a couple more for Maven, and yet another few for Gradle.
Similarly with NuGET and MSBuild evolution.
Yet they were eventually adopted, same is happening with vcpkg/conan.
It's not always that simple. What if your sync lib has some callbacks, and you want to do something in the callbacks that requires async.
You could argue that this mismatch exists even with rust itself, where you have Drop, which is sync, and you might have to do something in your drop that requires async, like closing a network connection.
You have to pass in some runtime handle that you can use to spawn a task to do what you have to do. This is definitely not simple and beginner friendly. Or even worse - say goodbye to RAII and tell people that they have to explicitly call an async shutdown fn.
Which panics if you don't call it in the right context. So now you have code which can only be used under certain circumstances and with one particular async runtime.
Since when has it become acceptable in rust to have fns that just panic depending on the state of some thread local? That is one thing I really dislike about tokio.
The thread local runtime is convenience over correctness, which is antithetical to what rust usually favours.
You've made the giant assumption that most people working in Rust are working with some kind of executor or framework, and you're either wrong, or the state of the Rust world has become very depressing.
tokio::spawn_blocking doesn't exist in my codebase because I'm not using tokio, nor should I have to in order to use library code from a crate.
This is the problem with async in a nutshell; it's viral, and it brings with it a lot of lifestyle assumptions. Assumptions that I think people who are used to working in the web framework world are maybe comfortable with, but which should not have been allowed to spread to the crates ecosystem broadly.
async is fine as an implementation option for your service or binary; but its propagation into library code means now people using that library are tied into requiring a framework to host it with. And this is frankly just bad hygiene, and in a way seems contrary to Rust philosophy generally.
Most crate writers are polite enough to offer async and non-async bundling of their code. But on more than one occasion recently I've run into dependencies that did not do this, or did not do this thoroughly (offering sync versions with less features, for example).
> tokio::spawn_blocking doesn't exist in my codebase because I'm not using tokio, nor should I have to in order to use library code from a crate.
If you're trying to use a library that uses Futures, then you need an executor to poll that futures to completion. But then it's not the same situation as the one I'm responding to (where it was about using blocking code in an async context).
If you have blocking code and need a way to poll third-party futures to completion, then it's Tokio's `Runtime::block_on` or an equivalent that you need.
> This is the problem with async in a nutshell; it's viral
That's the biggest misunderstanding about async/await: it's not async that's viral, it's IO. If your library does IO, then you do IO and have to deal with the fact. And you have to actively deal with it no matter what, because IO is slow and you definitely do not want your UI thread be frozen[1] because something under the hood is doing IO. The only difference is that async makes it explicit in the typesystem, where blocking does not and you need to guess where to spawn new threads.
[1] or your thread that's listening on incoming sockets.
Technically true, but in practice you’re constrained by whatever is provided by the ecosystem. And the truth is that many Rust projects were forced to choose async or sync, so these are often disjoint and fragmented. Even within async, crate authors need to pick and choose which async ecosystem to support, due to lack of support in std for spawning tasks and running simple IO, which most projects need. This is slightly alleviated by the Tokio hegemony, but that itself is a deferred migration minefield for Rust to handle.
Libraries don't just sprout out from nothingness, someone has written those. And if there isn't a library that suits you, that someone can be you then. Key observation is that the ecosystem is only additive, it doesn't deny you of anything.
This is true to an extent, except it's also a cultural problem: it spreads across crates, and soon you've got tokio spread everywhere.
But it's also a concern for me that a language feature was added that requires runtime framework support to function. This is bad separation of concerns.
It (async) is viral syntactic sugar that doesn't need to exist; asynchronous programming has been done for decades without it.
The thing the article calls bad is not the thing the examples illustrate. The set-up is supposed to be that multithreading is a pain, but none of the examples actually do that. Take the initial list. "These are not async primitives, these are multi-threading synchronization primitives." Yeah, but they're multi-threaded forms of stuff you still need. If you needed RwLock in multi-threaded mode, you need RefCell in single-threaded mode, and the API is virtually identical. If your state needed to be in an Arc, it'll need to be in an Rc. And if you needed sync::mpsc, you'll instead need... sync::mpsc.
A Send bound is no great burden. The only type that you will regularly interact with that is not Send is the lock type from a Mutex or RwLock, which is good because if you hold it across a long await you can slow down your app, a bug prevention mechanism that does not exist in straight multithreading. The only point that actually illustrates a threading-caused problem is the thing about multithreading sockets, which it admits is almost imperceptible, and which you can also solve by not doing that.
Almost everything the author identifies as a parallelism problem is a 'static problem. Tokio is missing a scoped spawn like std has, and if it gained one then the much described multithreading woes would reduce to basically nothing.
Agreed. Perhaps we can have a syntax to declare what is editorialized. We already add "(YEAR)" and "[pdf]" and "[video]". We could "[Rust]" or other keywords like that about the post missing from the title?
It's a frustrating area. As I've mentioned before, I'm writing a high-performance metaverse client in Rust, something which has many of the problems of both a web browser and a MMO. If you want to have a good looking metaverse, it takes a gamer PC multiple CPUs and a good GPU to deal with the content flood. (This is why Meta's Horizon looks so bad.) Now you have to use all that hardware effectively.
So what I'm writing uses threads. About 20 of them. They're doing different things at different priorities towards a coordinated goal. This is different from the two usual use cases - multiple servers running in the same address space, and parallel computation of array-type data.
Concurrency problems so far:
- Single-thread async is simple. Multi-thread async is complicated. Multi-thread async plus other threads not managed by the async system isn't used enough to be well supported.
- Rust is good at preventing race conditions, but it doesn't yet have a static deadlock analyzer. It needs one.
- Different threads at different priorities do work in both Linux and Windows, but not all that well. With enough low-priority compute-bound threads to keep all CPUs busy, high-priority threads do not get serviced in a timely manner. I was spoiled by previous work on QNX, which, being a true real-time operating system, takes thread priorities very seriously. On QNX, compute-bound background work has almost no effect on the high-priority stuff. Linux just doesn't work well at 100% CPU utilization. Unblocking a lock does not wake up a higher priority waiting thread immediately. This can delay high-priority threads unnecessarily.
- The WGPU crowd has spent a year getting their locking sorted out so that you can load content into GPU memory while the GPU is rendering something else. It's a standard feature of Vulkan graphics that you can do this, but it has to be supported at all levels above Vulkan too. For me, that's WGPU and Rend3. That stack is close enough to ready to test, but not ready for prime time yet.
- There's no way to cancel a pending HTTP request made with "ureq". "reqwest" supports that, but you have to bring in all the async and Tokio stuff, which means you now have multi-thread async plus other threads. This is only a problem for what I'm doing when the user closes the window, and the program needs to exit quickly and cleanly. I'm getting a 5-10 second stall at exit because of this.
- Crossbeam-channel is not "fair"; it's possible to starve out some requests. Parking-lot is fair, but doesn't have thread poisoning, which means that clean shutdowns after a panic are hard.
- Running under Wine with 100% CPU utilization with multiple threads results in futex congestion in Wine's memory allocation library, and performance drops by over 99%, with all CPUs stuck in spinlocks. The program is still running correctly, but at about 0.5 frames per second instead of 60 FPS. Bug reported and recognized by the Wine crew, but it's hard to fix. I can make this happen under gdb running my own code and see all those threads in the spinlocks, so I was able to file a good bug report. But I haven't generated a simple test case. It's a Wine-only problem; doesn't affect Microsoft Windows.
So that's life in a heavily threaded world.
Individual CPUs have not become much faster in over a decade. Everybody has been stuck at 3-4 GHz for a long time now. CPUs with many cores are widely available in everything from phones to game consoles. To use modern hardware effectively, you need threading.
I think rust does it right in many cases. Raw rust multithreading is a pleasure to use, and some libraries like rayon are extremely unobtrusive.
That is why I am somewhat unhappy with the state of async rust. It does not yet have the quality and unobtrusiveness of sync rust libraries like rayon.
CPU cycles are practically free but memory latency and synchronization are not. (Hardware threads gives more CPU cycles capacity). People's scalability problems are memory related, not CPU cycles. In other words, we can't scale updating a single memory location with more hardware threads. People often want to scale the updating of something so they add threads, but the contention of memory synchronization prevents that.
Scaling independent memory updates works well and is embarrassingly/pleasingly parallel.
I am working on a multithreaded design that is scalable and easier.
The idea is that if you represent your problem, data flow and control flow as a tree where branches do not communicate, you can get the purported scalability of multiple hardware threads.
Yes. The state of the art technology has not percolated everywhere. Google's tech stays locked up in Google.
Parallelism in many languages that are mainstream is difficult to get right and error prone. I don't think most developers should be working with low level threads or synchronization primitives for business purposes.
I am working on a notation and runtime and I'm thinking of automatic parallelisation.
There are also Go and Erlang, but this is an area I am interested in enough to have an attempt at it myself.
Multithreading can be difficult and I recommend this blog post:
Google is not the 70s, and they haven't really done much, and not recently.
Google's big thing in that area maybe was MapReduce, which was quite behind the state of the art when they first introduced it.
I believe they eventually moved to BSP, which is more in line with what academia have been developing since the early 90s, but that's not really proprietary stuff.
But in any case, if you're just talking about scheduling graphs of tasks, there are thousands of competing libraries that do it.
I am not familiar with any particular research; I don't doubt that what I'm doing has been done before. I still think it's worth doing.
What I meant was that some ideas get trapped in corporations and never open sourced. If you have any papers you recommend, I would read them. I have a list of whitepapers on my Github profile.
Recently I was trying to parallelise the A* graph-search algorithm for code generation. I got it parallel by sharding neighbour discovery per thread. This speeds it up, and it scales, because I sharded the problem space.
What if people could model their problems as data-flow problems, so that the parallelisation is automatic? In my experience, all the tutorials and materials on the internet refer to threads or Go channels. These are low-level primitives compared to what I'm aiming for.
Thanks for the reference to "Bulk synchronous parallel", I had not heard of that.
I'm investigating turning arbitrary OOP-style code into LMAX-Disruptor-style pipelined actor code that crunches through events in parallel.
My goal is that programs written in my notation are parallel by default, without explicit design, thanks to sharding and scheduling. I naively sharded a simple bank simulation without my syntax and it gets 700 million requests per second on 12 threads, because I shard the money into different integers across threads. I want this kind of thinking to come by default.
Most of the automatic parallelisation I've seen is to do with loops or autovectorisation. I want to combine event streams with loop parallelisation. I am also interested in SIMD + multithreading together, but I've not studied that in any detail.
I am inspired by Alan Kay's original idea of OOP programming.
So I've turned objects into "actors" or "tasks" that emit "event arrays" or streams, and then I parallelise their processing. Loops are first-class citizens. This is inspired by coroutines and generators.
I turn control flow to data and parallelise that in addition to data parallelism.
I use routing to shard.
Take the canonical example of being a search giant and you want to download multiple files, parse them for links, then index those links and then save them somewhere. You have CPU heavy tasks and IO heavy tasks. I want to multiplex IO with coroutines per thread and CPU tasks per multiple threads. You could say this is an example of what you describe as scheduling graphs of tasks.
    url(url) | task download-url
        for url in urls:
            fire document(url, download(url))

    document(url) | task extract-links
        parsed = parse(document)
        fire parsed-document(url, parsed)

    parsed-document(parsed) | task fetch-links
        for link in document.query("a")
            fire new-link(url, document, link)

    new-link(url, document, link) | task save-data
        fire saved-link(url, link, db.save(url, link))

    for url in ["http://samsquire.com/", "https://devops-pipeline.com/"]:
        fire url(url)
This program corresponds to the following event stream:
url() events are in an array, document() events are in an array, parsed-document() events are in an array, so they can be processed performantly by different threads, compiled down to loops.
They can be routed to shard.
The events of this event stream can be dispatched from/in different threads (and machines) and routed to be crunched in parallel. Each event corresponds to a "mailbox" which corresponds to a thread mapping and we can define topologies of graphs like you say based on these objects and events.
Some of my thoughts:
* I am trying to combine coroutines with threads for efficient scheduling.
* If OOP interactions can be mapped to multidimensional array buffers, we can even inline code to be autovectorised as simple loops.
* I want to integrate my 1:M:N lightweight thread scheduler, and epoll based server (hopefully I can rewrite it to use liburing) with a general purpose parallelising runtime.
* Vitess is a sharding database proxy; sharding is an extremely powerful parallelisation technique.
Async is and probably will always be less usable than blocking Rust. It is a very, very useful mode of operating when you really need two of its biggest benefits: lightweight cooperative concurrency and task cancellation, but it comes at a big usability cost.
Rust software should use async tactically, in the places where it is needed. Unfortunately, handling http, which is a large part of many applications, is actually a place where async has benefits. But if you plan to run your http behind nginx anyway (for TLS termination), even there a blocking http server might be a good idea.
> If you write regular synchronous Rust code, unless you have a really good reason, you don't just start with a thread-pool. You write single-threaded code until you find a place where threads can help you, and then you parallelize it,
I disagree with this one. When you work on a software project you should have the basic architecture figured out already, and main part of that is breaking your software into structurally parallel parts that can work independently. Adding ad-hoc parallelism after the fact works only for small scale things and will lead to rather accidental concurrency architecture.
Then for each part (groups of threads), figure out if it *needs* async. Between each part you'd communicate via channels or some shared data structures that rather easily can be made to work with both async/blocking code.
So e.g. an async http server benefits from lightweight async concurrency and makes rpc-like channel-based calls to blocking IO / CPU / business-logic-intense workers (which don't benefit from async) where it makes sense. Each part is written in the best "type of Rust for its use-case".
Or if you need ability to cancel certain computations inside the larger framework (e.g. simulating agents etc.) you might want to nest async executor inside a blocking code.
Note: there are a lot of program archetypes out there (CRUD, ETL, data-intensive, embedded, frontend SPA, native mobile app) and I've noticed that many people are boxed into the type they happen to work on. CRUD applications (which are very common) are often 90% http handling, and it might make sense to write them whole in async Rust.
Your CRUD web application server almost certainly doesn't need async Rust. Using a blocking HTTP server is not "might be a good idea", it simply is a good idea.
I recommend Rouille for this: https://github.com/tomaka/rouille. In case you are worried about performance, check the benchmark. Blocking Rouille is faster than builtin async server in Node.js.
>Your CRUD web application server almost certainly doesn't need async Rust. Using a blocking HTTP server is not "might be a good idea", it simply is a good idea.
How so? By what logic?
> Blocking Rouille is faster than builtin async server in Node.js
Because there are no advantages and only disadvantages of using async Rust. Async Rust is harder to use, and by assumption you don't need async Rust performance.
Yes, but since the Rust ecosystem went all in on async http, you'll have to use some niche http framework, and dumb CRUD code is not going to suffer too badly from async's limitations.
I really wish there was a popular and community embraced blocking http framework in Rust, to tip the scales here a bit.
It's not cooperative if multiple things happen _at the same time_, which is what's always touted about Rust async. Cooperative would be iterators, or generators, or coroutines/continuations. They let you do things in a single thread but have the execution order be interleaved. That's concurrent, but not parallel. What you are talking about is parallel execution, and that changes the classification away from cooperative. Sorry, just a pet peeve of mine.
If you use rust async with a local executor such as the tokio current thread runtime, it provides lightweight cooperative concurrency.
There is nothing ever happening at the same time, since you are on a single thread. That is why you don't need synchronization primitives such as Mutex but can live with something lightweight like RefCell. And that is why you can get by with non atomic reference counting smart pointers (Rc instead of Arc).
And it is cooperative in that you have to yield by calling await, otherwise nothing else will run.
What the article argues is that the option concurrent but not parallel, which both tokio and futures support in principle, should be advertised more and maybe even be the default.
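That single-threaded mode can be sketched in plain (non-async) Rust; the same reasoning is what lets futures on a current-thread runtime hold Rc and RefCell:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Single-threaded sharing: Rc's refcount bump is a plain integer
// increment, and RefCell checks borrows at runtime instead of locking.
// None of this would compile if the state had to cross threads.
fn single_threaded_sharing() -> u32 {
    let counter = Rc::new(RefCell::new(0u32));
    let handle = Rc::clone(&counter); // non-atomic refcount bump
    for _ in 0..10 {
        *handle.borrow_mut() += 1; // no Mutex, just a borrow flag
    }
    let total = *counter.borrow();
    total
}
```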
> If you know anything about asynchronous sockets it should be that multi-threading a socket doesn't actually yield you more requests / second, and it can actually lower it...
Re-read this a few times, and I'm fairly convinced it is not generally true. The author is also being a bit confusing about what exactly he means by "socket" here. Because while it's true that multi-threading over a server socket (e.g. the one that binds to the port when you launch the server) will not yield performance gains, multi-threading clients (that have their own sockets, including file descriptors) definitely will. That's the whole point of nginx thread pools[1]. Note that nginx does zero "CPU-bound" work, it literally just serves files.
Node/Deno being single-threaded is purely a limitation of Javascript. Tomcat, Jetty, etc. are all multi-threaded. I'm a bit tired, so I can't comment on the rest of the post in detail, but this was a bit of a red flag.
For me the biggest issue with async is managing multiple dependent async calls. There's something weird going on and I'm not sure which pattern to use exactly. Some functions expect exactly the same async fn signature, some don't, and I'm not sure why or which one to use.
If you have a function that "depends on" another function you call it within the other function. If it's async you .await it. Do you mean something about spawning tasks or passing callbacks around?
The code dealing with your clients looks suspect. Why do you have a global with a .get method that is async and returns a reference? It should probably be sync. On top of that it probably shouldn't be a global variable. Pass it in as context to the handler.
I'm also suspicious of having this `AwsClients` struct and a bunch of free functions that take it by value. Why aren't you adding an `impl` block with those as methods? And why are they taking ownership, requiring the `.clone()`?
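A hypothetical sketch of that refactor (the actual code isn't shown in the thread, so the `AwsClients` internals and method names here are invented): methods on an `impl` block borrow `&self`, so callers never need to clone.

```rust
// Invented stand-in for the poster's struct: the point is only the
// shape of the API, not AWS specifics.
struct AwsClients {
    region: String,
}

impl AwsClients {
    fn new(region: &str) -> Self {
        AwsClients { region: region.to_string() }
    }

    // takes &self instead of ownership: no .clone() at the call site,
    // and the same AwsClients can be used again afterwards
    fn bucket_url(&self, bucket: &str) -> String {
        format!("https://{}.s3.{}.amazonaws.com", bucket, self.region)
    }
}
```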
Great questions. I guess because I never used any language that had the concept of ownership, and I could not find any examples of how to handle these connections. I'll take your input and refactor the code. Thanks a million!
Rust async's design was probably the right decision for what Rust is: a general systems programming language which you can use in most situations where people today use C, C++ and more.
If Rust were only targeting web-server programming, the right decision might have been no async and green threads instead;
they're just much easier to use.
But that Rust would likely never have succeeded, as most of its initial success came from use-cases where you wouldn't want green threads.
Nicely, we might still get the no-async, green-threads experience: in the form of runtimes which run WASM-compiled Rust code in a Node-like fashion, probably in combination with some serverless/edge-compute providers which hopefully will be nice to use.
While the article mostly focuses on the cognitive cost, which I deeply sympathize with, I do wonder about the runtime performance cost. Are there any good benchmarks actually quantifying the impact of all the extra thread-safety hoops it adds? I'm not asking simply due to personal interest in seeing the numbers; if we want to steer the community in this non-thread-safe direction, it would help to have material to back the ideas, and I suspect the Rust community would be more responsive to complaints about perf than about cognitive cost.
Anecdotal evidence, but many large Rust async codebases perform significantly better on a single-threaded runtime than on a multithreaded runtime.
And this is with quinn (a QUIC implementation) still carrying around the synchronization primitives needed to make everything Send.
Another piece of anecdotal evidence: the functional collection library https://github.com/bodil/im-rs comes in two flavours: one using Arc and one using Rc. The reason for this is that you get a significant performance increase by using Rc (where manipulating the refcount is just an integer inc/dec) vs Arc where manipulating the refcount needs to be an atomic op.
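The mechanical difference is easy to see in an illustrative sketch (not a benchmark): both counts come out identical, but each Rc clone is a plain integer increment while each Arc clone is an atomic read-modify-write that the CPU must synchronize across cores.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Functionally identical refcounting; the cost model differs.
fn refcounts(n: usize) -> (usize, usize) {
    let rc = Rc::new(());
    let _rc_clones: Vec<_> = (0..n).map(|_| Rc::clone(&rc)).collect();

    let arc = Arc::new(());
    let _arc_clones: Vec<_> = (0..n).map(|_| Arc::clone(&arc)).collect();

    // strong_count includes the original handle, so n clones -> n + 1
    (Rc::strong_count(&rc), Arc::strong_count(&arc))
}
```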
Last but not least: making futures Send often requires you to box them. Send and lifetimes just don't play well together. Do something like this:
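The example itself appears to have been lost from the comment; a plausible reconstruction (trait and names are mine) is a hand-rolled "async trait" returning an unboxed future tied to `&self`, the shape you'd want async_trait to desugar to. A minimal busy-poll executor is included just to drive it; this needs Rust 1.75+ for `impl Trait` in trait return position.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-rolled "async trait": the method returns an unboxed future.
trait Storage {
    fn get(&self, key: String) -> impl Future<Output = String> + '_;
}

struct MemStore;

impl Storage for MemStore {
    fn get(&self, key: String) -> impl Future<Output = String> + '_ {
        async move { format!("value:{key}") }
    }
}

// Minimal busy-poll executor, enough to drive the future above.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop_waker() -> Waker {
        unsafe fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
        unsafe fn noop(_p: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```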
This is what you would eventually want something like async_trait to desugar to. Use the above in a Send context and you get a completely undecipherable higher-ranked lifetime error. So you are back to boxing all futures.
> Last but not least - often making futures Send requires you to box your futures. Send and lifetimes just does not play well together. Do something like this:
You should ~never need to box your futures to make them Send. You do sometimes need to box them to make them Unpin, but most of the time this is a tradeoff where it's possible to find a way to avoid moving them instead.
It seems like these screeds (both the OP and this reply) come from the misunderstanding that "X is hard, Y sometimes comes up in the errors when trying to do X, therefore X would be easy if we got rid of Y". In reality, Y often just demonstrates intricacies of X that you weren't thinking about.
Yes, it currently needs to box. But again, Send is just a symptom of the real issues here (async fns have unpronounceable return types, opaque return types (impl Trait) aren't supported in traits, and we don't have a good syntax for expressing constraints on opaque return types). Getting rid of Send wouldn't fix any of those issues.
You don't get weird higher-ranked lifetime errors when you use non-Send futures with hand-rolled async traits.
I am not saying that this is the reason to use local futures. But if you have decided that local futures are the way to go anyway, it's a nice benefit.
Moaning aside, what is (if any?) the direct equivalent to single-threaded asyncio, like Python or node.js, in Rust?
The thing I enjoy about async in Python is that it's very easy to write "thread"-safe code - you know exactly where you might give up execution context, and 90% of the time you have no need for mutexes and locks. But as this article complains, in Rust, it seems to be sync or multi-threaded.
In C# I always wondered why they couldn't hide the async/await logic for most cases. I never need to fire off two IO futures at the same time, so just make the thread do other stuff while I'm waiting for IO feedback; don't make me type out async/await in all affected functions, let the compiler figure out when it can process other stuff.
The use of async and await is a design decision that requires knowledge of the program's logic and desired behavior. It's not just a matter of compiler optimization, and that's why the compiler can't automatically figure out where to use these keywords.
Suppose we have a service where users place orders, and we need to:
1. Save the order.
2. Deduct items from inventory.
3. Send a confirmation email.
If we performed these operations asynchronously but did not `await` each one in sequence, we could end up sending the confirmation email before the order is saved. That shows the importance of `await`: the compiler cannot insert it correctly without understanding the business logic.
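The same ordering point can be sketched in Rust (this thread's language rather than C#; the order functions and the tiny busy-poll executor are made up for illustration): sequential `.await`s guarantee each business step completes before the next one starts.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical order steps; each records that it ran.
async fn save_order(log: &mut Vec<&'static str>) { log.push("saved"); }
async fn deduct_inventory(log: &mut Vec<&'static str>) { log.push("deducted"); }
async fn send_confirmation(log: &mut Vec<&'static str>) { log.push("emailed"); }

fn place_order() -> Vec<&'static str> {
    let mut log = Vec::new();
    block_on(async {
        save_order(&mut log).await;        // finishes before inventory
        deduct_inventory(&mut log).await;  // finishes before the email
        send_confirmation(&mut log).await; // email only after both
    });
    log
}

// Minimal single-future executor, enough to drive the futures above.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop_waker() -> Waker {
        unsafe fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
        unsafe fn noop(_p: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```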
You have it backward. The compiler should implicitly add the awaits for waitable objects, unless an operation is explicitly async.
So you would write (in pseudocode):
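The pseudocode itself appears to have been lost from the comment; based on the surrounding description and the explicit-async example in the same comment, it was presumably something like:

```
Task PlaceOrder(Order order) {
    SaveOrder(order);        // implicitly awaited
    DeductInventory(order);  // implicitly awaited
    SendConfirmation(order); // implicitly awaited
}
```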
And the compiler will implicitly await all three operations (and ideally infer that your function is async).
If you want to overlap computation, you avoid the implicit wait with async:
    Task PlaceOrders(Order order1, Order order2) {
        let done1 = async PlaceOrder(order1); // async prevents implicit waiting
        let done2 = async PlaceOrder(order2);
        // wait_all is also async and implicitly awaited. Ideally this should happen
        // automatically for all unwaited and not-returned futures at end of scope.
        wait_all(done1, done2);
    }
This allows being polymorphic on the async-ness of the function (pardon the pseudo c++):
    template<Range R, callable<R::value_type> F>
    void for_each(R range, F f) {
        for (auto x : range) f(x); // f(x) is awaited if f is async, and for_each itself becomes async
    }
edit: sometimes it is important that no preemption happens in a region [1], so some scoped marker (atomic { ... }, for example) would cause a compilation error if an await would be introduced automatically.
edit2: and of course you should be able to use async even if the called function is a boring old blocking one. The runtime can spawn a background task (or better yet use work stealing) to run it.
[1] personally I think that atomicity guarantees should be about data, not code, but whatever.
I appreciate the original comment by ikekkdcjkfke, and your elaboration, gpderetta. However, I perceive a distinction between compiler optimization and proposing a fundamentally different model for handling asynchronicity. It seems to me that what you're suggesting deviates significantly from the existing C# model. Transitioning to this model wouldn't be a simple matter of compiler optimization—it would be a major breaking change that would require rethinking many aspects of the language.
Please don't misunderstand me; I'm not dismissing your proposal outright. However, it's essential to consider the potential trade-offs such as control flow management and error handling strategies. These are currently well-addressed by the async mechanism in C#.
Regarding polymorphic async-ness, I acknowledge this as a minor limitation within the current C# model. However, a common practice involves returning Task or Task<T>, even when no async calls are involved. Moreover, I find Kotlin's approach to handling this through inlining of suspend functions quite intriguing. You can check it out here: https://kotlinlang.org/docs/kotlin-tips.html#the-suspend-and...
In closing, it would be beneficial to our discussion if we could clarify whether we're contemplating an optimization within the current C# framework or proposing a fundamentally different approach to asynchronicity.
I'm not suggesting that C# makes a change; it is probably too late (although you could easily implement both models). I was just describing my preferred semantics (well, I prefer stackful coroutines, but that's another story). I don't think it deviates much from the existing semantics: the minimum change is adding awaits automatically for any call to an awaitable function and requiring async annotations otherwise. This is a relatively minor change.
I don't think that the async/sync division is a minor thing, but thanks for the Kotlin reference, I'll take a look. Before watching the video, my guess is that they implement 'stackful-like' behaviour in otherwise stackless coroutines by force-inlining any HOF so that the coroutine is again flattened. If that's the case, that's great! What happens if inlining can't happen? Do they reject the code or convert to stackful coroutines? That has always been the holy grail for me: stackful semantics that optimize to stackless (i.e. bounded stack usage) when possible (i.e. when the compiler can see all possible yield points).
I have been trying to figure out (in C++) the subset of the language such that the optimization is always guaranteed (at the very least you need first-class and explicit continuations so that the compiler can track them, and inlining must be possible). Possibly Kotlin has cracked it.
To clarify what I was trying to argue: if one is using the .NET async Task system only to help the thread pool (always immediately awaiting an async method), then the responsibility should fall more on the .NET framework, instead of spilling async/await Task<> into code bases. I don't know what a solution looks like, but maybe plain threads are just too expensive in .NET.
I was thinking more in terms of making good old 'blocking' calls, where the compiler can 'taskify' those blocking calls to let the thread do some other task instead of waiting on the 'blocking' one.
While the compiler could figure out where to insert awaits automatically, it adds cognitive overhead for the developer -- suddenly the same way of calling a function can result in either a [future/task/promise/...] or the expected type.
Then you should annotate the tricky diverging call site with async (see my other comment), instead of making it implicit and requiring annotation of the common expected case:
I just don't run into many cases where I'm not immediately awaiting a task, so instead of littering the code base with async/await and their return types all the way up, there could be a lightweight version that frees the thread up when waiting on a blocking call (I'm assuming the main reason we want to use async in C# is to free the thread up to do other tasks).
Yeah, it is old. But I only found it recently and did not see a discussion of it yet.
I found this quite interesting. I love rust, but async rust always seems almost like a different language.
Lifetimes become much more complex and much less useful, you use Box and Arc way too much. It seems like you would be better off if everything was heap allocated.
Working with local futures brings back some of the things I love about rust, but it is certainly not prominently featured in libraries like tokio and the entire async ecosystem.
I'm aware that Rust's threads have higher overhead because they're system threads, but what about green/user threads? Is there intrinsic overhead to userspace threads that async doesn't have? I think every "task"/thread needs its own call-stack space, and you're paying scheduling overhead no matter what. Is there literature on the topic?
The short answer is that an async function in Rust (and C++ and C#) is rewritten by the compiler into a state machine (a struct + an enum to discriminate the current state + a union to store local variables). You could do it by hand, though of course it's tedious and error-prone, hence why almost nobody writes software like that.
For the compiler, the state-machine code is amenable to optimizations. You can find examples online where a chain of async functions is optimized down to a single call in main.
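For the curious, that rewrite can be done by hand. Here is a minimal sketch (my own, with a made-up `AddOne` future) of the enum-discriminated state machine for something like `async { let a = n; yield_once().await; a + 1 }`, plus a busy-poll executor to drive it:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The states of the "async function": the live local `a` is stored
// inline in the variant, exactly like the compiler's generated union.
enum AddOne {
    Start(u32),     // not yet polled
    Suspended(u32), // parked at the await point, `a` kept alive
    Done,           // completed
}

impl Future for AddOne {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        match std::mem::replace(&mut *self, AddOne::Done) {
            AddOne::Start(a) => {
                *self = AddOne::Suspended(a);
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
            AddOne::Suspended(a) => Poll::Ready(a + 1),
            AddOne::Done => panic!("future polled after completion"),
        }
    }
}

// Minimal busy-poll executor to drive the state machine.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn noop_waker() -> Waker {
        unsafe fn clone(p: *const ()) -> RawWaker { RawWaker::new(p, &VTABLE) }
        unsafe fn noop(_p: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```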
In contrast, green threads and the like are very similar to kernel threads. The difference is that they are lighter because there is less baggage to carry around. But composing green threads requires an (arguably light) context switch.
As far as I understand, green/user threads were initially on the table for the Rust team, but were ruled out because having green threads means having a runtime to manage them, which Rust tries to avoid.
What I do not understand is where the principal difference with async lies: it has no runtime, but you have to bring your own for it to work anyway in form of e.g. tokio. What rust goal conflicts with a similar solution for green threads?
The article is specifically complaining about how the design of Rust's async ecosystem forces one to use wasteful Arc<Mutex<T>> etc in places where the data has no reason to move across threads.
Meanwhile, a greenthread system can happily use local state without atomics.
I personally want Tokio and Glommio to have a baby that inherits good parts of both parents.
When I was between jobs, I decided to learn rust by writing some async code. I really got stung and spent days doing things that would take minutes in C# or Node.
Part of the problem is that, in Rust, stack memory is easier to use than heap memory. When writing traditional threaded code, this isn't much of an issue because naturally most of our code is working with values on the stack.
BUT: when we look more closely at how async works in C# or Javascript, the compiler, under the hood, breaks up an async method into multiple methods and puts values that appear to be on the stack onto the heap. (Of course, I'm oversimplifying.) It just works, and it works well.
But, in Rust, making something async implicitly moves what appears to be on the stack onto the heap. It can quickly become hard to reason about.
I wish I knew about techniques that this article describes. Maybe it would make my code easier? In my current hobby projects, I'm doing traditional blocking IO because it's not "worth it" to write async. (In comparison, in Node, async code helps avoid nesting callbacks within callbacks, and in C#, doing IO in async is a best practice.)
async/await is ugly and hard to use and understand for me. It is pretty reasonable that rust chose it because it is a zero-cost abstraction. But i just don't like it.
This article argues against multithreaded executors by default, and that synchronous code is easier to read and more practical than async code. I completely understand this.
My hobby and main interest is multithreading, async, coroutines, parallelism so I love articles like this so thank you.
I am trying to design a solution that lets us have our cake and eat it too. I want multithreaded coroutines or multithreaded async executors by default. I am trying to design a server and runtime that is largely parallel, concurrent, efficient, easy to understand, easy to reason about, async and easy to read and maintain. I want:
* Tokio is not bad but I think a codebase that uses async rust requires a high level of skill, cognitive load and understanding. It's not straightforward!
* synchronous straight-line flow code is the easiest to read and follow
* promises and callbacks aren't easy to read or follow control flow
* making a single threaded program parallel after writing it is almost a complete rewrite
* work stealing thread pools solve the starvation problem, so they're good!
* IO shouldn't block CPU and CPU shouldn't block IO
* I am trying to design a syntax that is data orientated that means programs can be parallelised after writing them, so parallelisation comes for free
* we can use the LMAX Disruptor pattern for efficient cross-thread communication. I use a lockfree multiconsumer multiproducer ringbuffer in my programs.
I have an epoll server which multiplexes clients/sockets over threads; this is more efficient than a thread per socket/client. I need to change it into a websocket server.
Imagine you're a search engine company and you want to index links between URLs. How would you model this with async rust and thread pools?
    task download-url
        for url in urls:
            download(url)

    task extract-links
        parsed = parse(document)
        return parsed

    task fetch-links
        for link in document.query("a")
            return link

    task save-data
        db.save(url, link)
How would you do control flow and scheduling and parallelism and async efficiently with this code?
`db.save()` and `download()` are IO-intensive, whereas `document.query("a")` and `parse()` are CPU-intensive.
I've tried to design a multithreaded architecture that is scalable which combines lightweight threads + thread pools for work + control threads for IO epoll or liburing loops:
Rust acolytes need to come to terms with the fact that if they want Rust to be the next C++, it's going to be the next C++. Heck, even C++ async is simpler. Anyway, who needs a package manager when you've got a package manager?
Lots of languages have tools which reimplement package management specifically for their libraries, and this pattern has become fairly "normal", but the traditional norm is to access and provide libraries as system packages via the system's package manager. With that in mind, things like Cargo become less magical and more redundant with other tools like Make.
I think the idea is that being the next C++ means being just as hard to write as C++, and then saying "why do we need the next C++ when we already have C++?", but I'm curious if my interpretation is right :)
Yes, the first part is what I was going for. Actually it is even worse than that, because C++ has a more limited syntax. Rust's syntax and semantics must support more features. This fundamentally limits its accessibility.
He isn't suggesting that. He's suggesting it is the right place to start, in the same way that we normally start writing sync code with a single thread.
> “..and when you need to utilize multiple CPU cores, you just spawn multiple processes that listen on the same socket. This is a much better way of structuring servers..”
Yes it is. Spinning up another process is not multithreading, like, literally. It’s multiprocess. I never cease to be amazed with the complete lack of basic CS literacy from JS script kiddies.
> why multi-threaded task executors should be the default
There are many, many reasons, and it's subtle and complicated.
For one, ironically, it's just way easier to use, especially for less experienced programmers. In a task system like that, it's way easier to accidentally cause major issues with many tasks on one thread than with multiple threads (at least with the guard rails Rust provides for thread safety). On the other hand, the performance overhead of multi-threaded-by-default is just fine for a ton of use-cases, to the point you could argue worrying about it is premature optimization.
Though it's important to state that a lot of this comes from the ecosystem around Rust and not Rust itself; as stuff like `LocalSet` shows, you can have a non-multi-threaded runtime, and there is no reason all the libraries you use couldn't provide non-thread-safe versions. Some do. Just many decided that avoiding the performance overhead of being thread safe isn't worth the maintenance overhead and the additional foot guns it can introduce.
Now naturally you can say "but node/deno/etc.", but they are a completely different beast than "just" not being multi-threaded by default. They don't have multi-threaded code at all: just single-threaded code communicating through serialized messages (kinda). They also handle all IO completely separately from your application code and don't have any non-serialized communication between threads, etc.
Interestingly, if you look at the design choices of Tokio's IO event loop (reactor), there are some conceptual similarities. Also, AFAIK there is a company building something similar to the Node model for Rust using WASM.
I mean, in the end the Node-style approach is great for building servers, but Rust isn't just for building servers; it's much more general, with much more low-level use cases.
Now, the main point where Rust could improve quite a bit is making it much easier to write a library that works efficiently in both cases: code which crosses thread boundaries and code which doesn't. Currently you are often stuck between implementing it twice, doing terrible, unusable generics tricks, or using tons of `cfg` (probably generated via macros/annotations) and hoping no dependency accidentally enables the multi-thread feature when you don't need it. None of this is really viable, but it's a surprisingly hard problem. Currently the best idea I can come up with is generic modules which make the "terrible to use" generics tricks usable, but that's probably not enough by a long stretch. (Even if a solution is found it might not work for Waker, and even if it does, you still might want Sync/Send Wakers in some cases.)
Wait, async is multithreaded by default in Rust? For me the whole point of using async in JavaScript or Python (originally with Twisted's @inlineCallbacks) was to get concurrency without threads.
Imagine writing code for a computer game bot: move left, wait for enemy, attack enemy... You normally can't write it like this because it would block the rest of your program. Async allows you to go from "program sequential" to "bot sequential" for lack of better terms. If you are IO bound there is no need for threads. I often like to use one network thread and one GUI thread to keep things separate, and to prevent the occasional blocking in one to cause latency in the other. You just need to have a method to post calls to the other event loop. Works well in Python, as well with Qt-based apps.
C# on the other hand made the same "mistake" of being very general. I think you can even have a coroutine suspend on one thread and wake on another. It looks like you switch threads in the middle of a function.
I guess that is useful when you want to write performant multithreaded servers. I just want to write easy sequential code without worrying about locks or state machines.
> Wait, async is multithreaded by default in Rust? For me the whole point of using async in JavaScript or Python (originally with Twisted's @inlineCallbacks) was to get concurrency without threads.
Async in Rust is nothing by default because the language doesn't ship a runtime. You need to bring a runtime/executor for async to work; the most popular is Tokio, which can be configured to be single-threaded or multi-threaded.
EDIT: But in a way, since Rust makes zero assumptions about how async functions are going to be executed, you do have to design around the fact that they might be multi-threaded.
The point being made isn't whether the futures are evaluated on one thread or many, but that by default library authors assume that something may be evaluated on multiple threads, which imposes constraints on both the argument and return types of functions.
So for example, the tokio executor could run single threaded or on a thread pool. However, tokio::task::spawn takes a future as an argument that may be evaluated on a different thread, so any caller of this API and the designers of the library behind it need to guarantee the argument is thread safe.
That constraint is reflected through the type system by constraining the argument of `tokio::task::spawn`, roughly `fn spawn<F>(fut: F) where F: Future<Output = T> + Send + 'static`, which means "fut is a type that implements Future with an output of type T, is Send (safe to send to a different thread), and is 'static (contains no references to non-static data)".
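A small way to see the `'static` half of that bound in action, using a hypothetical `spawn_like` function with the same bounds but no actual executor behind it:

```rust
use std::future::Future;

// Stand-in with the same bounds tokio::task::spawn places on its
// argument (hypothetical; it only type-checks the future).
fn spawn_like<F>(fut: F) -> &'static str
where
    F: Future + Send + 'static,
{
    let _ = fut;
    "bounds satisfied"
}

fn main() {
    // Owned data moved into the future is Send + 'static, so this compiles:
    let owned = String::from("hello");
    assert_eq!(spawn_like(async move { owned.len() }), "bounds satisfied");

    // A borrow of a local variable would violate 'static:
    // let s = String::from("hi");
    // spawn_like(async { s.len() }); // error: `s` does not live long enough
}
```

The compiler rejects the borrowed version because the executor might keep polling the future after the caller's stack frame is gone.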
That's just what I was trying to say. If your language potentially allows futures to resume on a different thread, you pay a complexity cost. I'm not too familiar with Rust, but I think the cost is especially high here because Rust is so concerned with correctness (there is no Send trait in C++, for example).
And if you are using async only for nicer control flow, then there is no need for multithreaded executors. Just coroutines that live on a single thread and get orchestrated by an event loop (trampoline/reactor/executor).
I think you're still missing it. Rust futures can be evaluated on one thread. They don't need to be Send either. It's that library authors need to add the constraints to their APIs because they may be multithreaded.
> Wait, async is multithreaded by default in Rust?
No, whether or not async is multithreaded depends on whether or not your executor is multithreaded. Rust doesn't ship an executor by default, you need to bring your own. Tokio, the most popular general-purpose executor, is multithreaded by default, and can be configured to be single-threaded.
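For reference, the flavor is picked when the runtime is built. A sketch assuming the `tokio` crate with its `rt` and `rt-multi-thread` features enabled (not runnable standalone):

```rust
// Multi-threaded runtime, the default behind #[tokio::main]:
let multi = tokio::runtime::Builder::new_multi_thread()
    .enable_all()
    .build()
    .unwrap();

// Current-thread runtime: every future runs on the thread that
// calls block_on, so nothing ever hops between threads.
let single = tokio::runtime::Builder::new_current_thread()
    .enable_all()
    .build()
    .unwrap();

single.block_on(async { /* runs entirely on this thread */ });
```

Note that even on the current-thread runtime, `tokio::task::spawn` still demands `Send` futures; `tokio::task::spawn_local` is the escape hatch for `!Send` ones.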