Maybe Rust isn’t a good tool for massively concurrent, userspace software (bitbashing.io)
704 points by mrkline on Sept 8, 2023 | 613 comments



OK, I suppose I should weigh in on this.

As I've mentioned before, I'm writing a high performance metaverse client. Here's a demo video.[1] It's about 40,000 lines of Rust so far.

If you are doing a non-crappy metaverse, which is rare, you need to wrangle a rather excessive amount of data in near real time. In games, there's heavy optimization during game development to prevent overloading the game engine. In a metaverse, as with a web browser, you have to take what the users create and deal with it. You need 2x-3x the VRAM a comparable game would need, a few hundred megabits per second of network bandwidth to load all the assets from servers, a half dozen or so CPUs running flat out, and Vulkan to let you put data into the GPU from one thread while another thread is rendering.

So there will be some parallelism involved.

This is not like "web-scale" concurrency, which is typically a large number of mini-servers, each doing their own thing, that just happen to run in the same address space. This is different. There's a high priority render thread drawing the graphics. There's an update thread processing incoming events from the network. There are several asset loading and decompression threads, which use up more CPU time than I'd like. There are about a half dozen other threads doing various miscellaneous tasks - handling moving objects, updating levels of detail, purging caches, and such.

There's considerable locking, but no "static" data other than constants. No globals. Channels are used where appropriate to the problem. The main object tree is single ownership, and used mostly by the update thread. Its links to graphics objects are Arc reference counted, and those are updated by both the update thread and the asset loading threads. They in turn use reference counted handles into the Rend3 library, which, via WGPU and Vulkan, puts graphics content (meshes and textures) into the GPU. Rendering is a loop which just tells Rend3 "Go", over and over.

This works out quite well in Rust. If I had to do this in C++, I'd be fighting crashes all the time. There's a reason most of the highly publicized failed metaverse projects didn't reach this level of concurrency. In Rust, I have about one memory related crash per year, and it's always been in someone else's "unsafe" code. My own code has no "unsafe", and I have "unsafe" locked out to prevent it from creeping in. The normal development process is that it's hard to get things to compile, and then it Just Works. That's great! I hate using a debugger, especially on concurrent programs. Yes, sometimes you can get stuck for a day, trying to express something within the ownership rules. Beats debugging.

I have my complaints about Rust. The main ones are:

- Rust is race condition free, but not deadlock free. It needs a static deadlock analyzer, one that tracks through the call chain and finds that lock A is locked before lock B on path X, while lock B is locked before lock A on path Y. (A minimal sketch of that pattern follows this list.) Deadlocks, though, tend to show up early and are solid problems, while race conditions show up randomly and are hard to diagnose.

- Async contamination. Async is all wrong when there's considerable compute-bound work, and incompatible with threads running at multiple priorities. It keeps creeping in. I need to contact a crate maintainer and get them to make their unused use of "reqwest" dependent on a feature, so I don't pull in Tokio. I'm not using it, but it's there.

- Single ownership with a back reference is a very common need, and it's too hard to do. I use Rc and Weak for that, but shouldn't have to. (A sketch of that pattern also follows this list.) What's needed is a set of traits to manage consistent forward and back links (that's been done by others) and static analysis to eliminate the reference counts. The basic constraints are ordinary borrow checker restrictions - if you have mutable access to either parent or child, you can't have access to the other one. But you can have non-mutable access to both. If I had time, I'd go work on that.

- I've learned to live without objects, but the trait system is somewhat convoluted. There's one area of asset processing that really wants to be object oriented, and I have more duplicate code there than I like. I could probably rewrite it to use traits more, but it would take some bashing to make it fit the trait paradigm.

- The core graphics crates aren't finished. There was an article on HN a few days ago about this. "Rust has 5 games and 50 game engines". That's not a language problem, that's an ecosystem problem. Not enough people are doing non-toy graphics in Rust. Watch my video linked below.[1] Compared to a modern AAA game title, it's not that great. Compared to anything else being done in Rust (see [2]) it's near the front. This indicates a lack of serious game dev in Rust. I've been asked about this by some pro game devs. My comment is that if you have a schedule to meet, the Rust game ecosystem isn't ready. It's probably about five people working for a year away from being ready.
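For reference, here's a minimal sketch of that lock-ordering pattern (hypothetical locks A and B; run it enough times and the two threads wedge against each other):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let a = Arc::new(Mutex::new(0));
        let b = Arc::new(Mutex::new(0));

        // Path X: locks A, then B.
        let (a1, b1) = (a.clone(), b.clone());
        let t1 = thread::spawn(move || {
            let _ga = a1.lock().unwrap();
            let _gb = b1.lock().unwrap(); // blocks forever if the other thread holds B and waits on A
        });

        // Path Y: locks B, then A -- the opposite order, which is what an analyzer should flag.
        let t2 = thread::spawn(move || {
            let _gb = b.lock().unwrap();
            let _ga = a.lock().unwrap();
        });

        t1.join().unwrap();
        t2.join().unwrap();
    }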
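And a sketch of the Rc/Weak shape mentioned under the back-reference point (hypothetical Parent/Child types; the reference counts here are exactly what a static analysis could, in principle, compile away):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Parent {
        children: Vec<Rc<RefCell<Child>>>, // forward links: owning
    }

    struct Child {
        parent: Weak<RefCell<Parent>>, // back link: non-owning, so no cycle
        value: u32,
    }

    fn main() {
        let parent = Rc::new(RefCell::new(Parent { children: Vec::new() }));
        let child = Rc::new(RefCell::new(Child {
            parent: Rc::downgrade(&parent),
            value: 7,
        }));
        parent.borrow_mut().children.push(child.clone());

        // Going back up requires an upgrade, which can fail if the parent is gone.
        if let Some(p) = child.borrow().parent.upgrade() {
            println!("parent has {} children", p.borrow().children.len());
        }
    }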

[1] https://video.hardlimit.com/w/tp9mLAQoHaFR32YAVKVDrz

[2] https://gamedev.rs/


We've been building our robotic simulators in Rust for the past 3 years and I have the exact same experience. So far, I think, we've encountered maybe 5 actual runtime bugs over the last 3 years. Sure, Rust has some problems, and yes, the async story isn't fully there yet, but overall the benefits outweigh the problems.


Async as a paradigm seems so at odds with what GP was discussing. If I understood correctly, and from my experience, we're talking more about concurrent execution with carefully designed priorities, locks, and timing requirements. This is closer to embedded / systems-level concurrency, if I understand it right. Are we really expecting a coroutine/async style to just lift into this world?


Threads are for doing your own work in parallel. Async is for waiting on others to do their work in parallel.

Your own work would be some CPU-intensive operations you can logically divide and conquer.

Others' work would be waiting for file I/O from the OS, waiting for a DB result set following a query, waiting for a gRPC response, etc.

Conceptually quite distinct, and there are demonstrated advantages and drawbacks to each. Right tool for right job and all that.
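A rough sketch of that split in Rust (assuming reqwest and tokio for the "waiting on others" half):

    // Your own work: CPU-bound, so give it a thread (or a pool like rayon).
    fn cpu_bound(data: Vec<u8>) -> u64 {
        data.iter().map(|&b| b as u64).sum() // stand-in for an expensive decode/hash step
    }

    // Others' work: mostly waiting, so async lets the thread serve other tasks meanwhile.
    async fn io_bound(url: &str) -> Result<String, reqwest::Error> {
        reqwest::get(url).await?.text().await
    }

    fn main() {
        let worker = std::thread::spawn(|| cpu_bound(vec![1u8; 1 << 20]));
        println!("checksum: {}", worker.join().unwrap());

        // The async half needs a runtime to drive it, e.g. tokio:
        let rt = tokio::runtime::Runtime::new().unwrap();
        let _page = rt.block_on(io_bound("https://example.com"));
    }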


All correct. An additional comment is that, when I was coming up, parallelism in its many forms was of the variety "I need to do this job in parallel" or "I need to handle exactly 32 concurrent workers". After the web & such, it became common to just think of parallelism as "I declare this one method as returning a promise" and then "async def", which semantically is very different than managing threads. As pointed out, it's now more like "This function is basically a server for any and all uncontrolled calls from elsewhere".


This was my thought: async is just ONE valid approach to the ultimate problem of "do multiple things at once"; it is not the end-all be-all of approaches.


Out of curiosity, is this robotics simulator open source/available?


> Rust is race condition free, but not deadlock free. It needs a static deadlock analyzer, one that tracks through the call chain and finds that lock A is locked before lock B on path X, while lock B is locked before lock A on path Y.

That sounds like a great idea. Something in the style of lockdep, that (when enabled) analyzes what locks are currently held while any other lock is taken, and reports any potential deadlocks (even if they haven't actually deadlocked).

That would require some annotation to handle cases of complex locking, so that the deadlock detection knows (for instance) that a given class of locks are always obtained in address order so they can't deadlock. But it's doable.


There's tracing-mutex, which builds a DAG of your locks as you acquire them and panics (at runtime) if it could deadlock: https://github.com/bertptrs/tracing-mutex

parking_lot has a deadlock detection feature that, IIRC, tells you what deadlocked when a deadlock does happen (so you're not trying to figure it out with a debugger and a lot of time): https://amanieu.github.io/parking_lot/parking_lot/deadlock/i...

I also just found out about https://github.com/BurtonQin/lockbud which seems to detect deadlocks and a few other issues statically? (seems to require compiling your crate with the same version of rust as lockbud uses, which from the docs is an old 1.63 nightly build?)


Google has built this kind of thing before: https://abseil.io/docs/cpp/guides/synchronization#thread-ann...

It's quite nice, but it's for C++, not Rust.


I wonder if locks could keep some thread-local registry, at least in debug builds.

If locks can be numbered or otherwise ordered, it would be easy to enforce a strict order of taking locks and an inverse strict order of releasing them, by looking up in the registry which locks your thread is currently holding. This would prevent deadlocks.

This, of course, would require having an idea of all the locks you may want to hold, and their relative order (at least a partial one), as Dijkstra described back in the day. But thinking about locks ahead of time is a good idea anyway.
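A sketch of that registry idea: give every lock a rank, track the ranks the current thread holds in a thread-local, and panic on out-of-order acquisition (hypothetical RankedMutex wrapper; a real version would probably gate the bookkeeping behind debug_assertions):

    use std::cell::RefCell;
    use std::ops::{Deref, DerefMut};
    use std::sync::{Mutex, MutexGuard};

    thread_local! {
        // Ranks of the locks this thread currently holds.
        static HELD: RefCell<Vec<u32>> = RefCell::new(Vec::new());
    }

    pub struct RankedMutex<T> {
        rank: u32,
        inner: Mutex<T>,
    }

    pub struct RankedGuard<'a, T> {
        rank: u32,
        guard: MutexGuard<'a, T>,
    }

    impl<T> RankedMutex<T> {
        pub fn new(rank: u32, value: T) -> Self {
            Self { rank, inner: Mutex::new(value) }
        }

        pub fn lock(&self) -> RankedGuard<'_, T> {
            // Strict ordering rule: only acquire locks of strictly increasing rank.
            HELD.with(|held| {
                {
                    let held_now = held.borrow();
                    if let Some(&worst) = held_now.iter().find(|&&r| r >= self.rank) {
                        panic!("lock order violation: taking rank {} while holding rank {}",
                               self.rank, worst);
                    }
                }
                held.borrow_mut().push(self.rank);
            });
            RankedGuard { rank: self.rank, guard: self.inner.lock().unwrap() }
        }
    }

    impl<T> Drop for RankedGuard<'_, T> {
        fn drop(&mut self) {
            // Releasing a lock removes its rank from the registry.
            HELD.with(|held| {
                let mut held = held.borrow_mut();
                if let Some(pos) = held.iter().rposition(|&r| r == self.rank) {
                    held.remove(pos);
                }
            });
        }
    }

    impl<T> Deref for RankedGuard<'_, T> {
        type Target = T;
        fn deref(&self) -> &T { &*self.guard }
    }

    impl<T> DerefMut for RankedGuard<'_, T> {
        fn deref_mut(&mut self) -> &mut T { &mut *self.guard }
    }

    fn main() {
        let world = RankedMutex::new(1, vec![0u32; 8]);
        let gpu = RankedMutex::new(2, 0u64);

        let _w = world.lock();
        let _g = gpu.lock(); // fine: rank 1 then rank 2
        // Taking `world` after `gpu` on some other path would panic here instead of deadlocking later.
    }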


I'm doing basically the same thing in Java for an MMO and the JDK makes it so easy. Just move objects via concurrent queues from network to model creation to UI threads. It's actually quite boring, and fast!


Is there video or a demo?


I don't have anything recorded from the past few years. Here's an old video:

https://youtu.be/L7XIFC2SawY?si=qN7TNxZi-P05uXVa

It's basically a custom 3D multithreaded OSM renderer, and the assets are a custom binary format. Uses very little network bandwidth.

Hoping to have an update this year that shows the updated graphics. I wrote a UI framework to improve my productivity (live hot reloading of UI components written in HTML, with one-way data binding). I had to do this because the game is gonna have so many UIs and I got tired of writing them in Java 8-style Java. Soon I can resume work on the game, after sidewaysdata.com is done-ish (I'm also using the UI library to build the desktop/mobile timing application).


You can sign up to be notified if I ever get it done :) here https://tdworldgame.com/


Nice.

The "many UI" problem is large in Rust. Egui needs far too much Rust code per dialog box. Someone was working on a generator, but I haven't looked in on that project in a while.


I actually quite liked egui. It was Rust that felt too slow to write. Also the egui template project with eframe and no app code yet took 15 seconds for an incremental compile. The entire game so far compiles and starts faster than that in Java, so...


Non-blocking I/O is quite mature on Java, and it shows. Unfortunately Java is still a rabid devourer of memory. Its RAM consumption tends to be the biggest con whenever evaluating the pros of using Java. Sometimes it's worth it. More and more often it's not anymore.


I think the game takes a few hundred MB to run while zooming out of a city right now.

FastComments' pubsub system in Java takes less than 500 MB of heap for like 100k subscribers.

But yes, you have to worry about object field count.


Good luck on the metaverse app! I'd love to see more interesting metaverse takes.

One quibble though. Rust isn't race condition free, it's data race free. You can still end up with race conditions outside of data access. https://news.ycombinator.com/item?id=23599598


> Async is all wrong when there's considerable compute-bound work, and incompatible with threads running at multiple priorities

The priority thing is relatively easy to fix:

Either create multiple thread pools, and route your futures to them appropriately.

Or, write your own event loop, and have it pull from more than one event queue (each with a different priority).

It should be even easier than that, but I don’t know of a crate that does the above out of the box.

One advantage of the second approach is (if your task runtime is bounded) that you can have soft realtime guarantees for high priority stuff even when you are making progress on low priority stuff and running at 100% CPU.
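A sketch of the first option with two tokio runtimes (assuming tokio; actually raising the OS priority of the high-priority pool's threads is platform-specific and would go in on_thread_start):

    use tokio::runtime::{Builder, Runtime};

    fn build_pool(name: &str, threads: usize) -> Runtime {
        Builder::new_multi_thread()
            .worker_threads(threads)
            .thread_name(name)
            .enable_all()
            // .on_thread_start(|| { /* raise or lower the OS thread priority here */ })
            .build()
            .expect("failed to build runtime")
    }

    fn main() {
        let high = build_pool("high-prio", 2);
        let low = build_pool("low-prio", 4);

        // Route each future to the pool that matches its priority.
        let critical = high.spawn(async { /* latency-sensitive work */ 1 });
        let bulk = low.spawn(async { /* background work */ 2 });

        high.block_on(async {
            let (a, b) = (critical.await.unwrap(), bulk.await.unwrap());
            println!("{a} {b}");
        });
    }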


This doesn't help with priority inversions; since you don't know who is waiting on a future/promise until it starts waiting on it, you can't resolve them until then, which means you can have work running at too low a priority. It's not structured enough.


> Single ownership with a back reference is a very common need, and it's too hard to do.

I've been collecting a list[1] of what memory-management policies programmers actually want in their code; it is far more extensive than any particular language actually implements. Contributions are welcome!

I already had back reference on the list, but added some details. When the ownership is indirect (really common) it is difficult to automate.

One thing that always irritates me: Rust's decision to make all objects moveable really hurts it at times.

[1] https://gist.github.com/o11c/dee52f11428b3d70914c4ed5652d43f...


Yes, back-linked objects are probably going to have to be pinned.


Cheering for your metaverse app. Hope to hear more about it. I suspected you might be doing gamedev but this is the first time you’ve shown extensive work.

One challenge with rust is that (for better or worse) most gamedev talent is C++. If you ever open source it I’d be interested in contributing, though I’m not sure how effective the contributions would be.

Good luck!


Email sent.

I'm not that interested in self-promotion here as I am in getting more activity on Rust graphics development. I think the Rust core graphics ecosystem needs about five good graphics people for a year to get unstuck. Rust is a good language for this sort of thing, but you've got to have reliable heavy machinery down in the graphics engine room.

Until that exists, nobody can bet a project with a schedule and a budget on Rust. The only successful commercial high-detail game title I know of that uses Rust is a sailing race simulator. They simply linked directly to "good old DX11" (Microsoft Direct-X 11) and wrote the game logic in Rust. Bypassed Rust's own 3D ecosystems completely.


Is it the one by the same guy who made the gold-standard moddable racing simulator?


It's "Hydrofoil Generation".[1] The only game on the "Released" page of the Rust gaming group that looks post-2000.

[1] https://arewegameyet.rs/games/released/


He co-founded and was the lead dev of Kunos Simulazioni, which made Assetto Corsa (https://store.steampowered.com/app/244210/Assetto_Corsa/).

I miss his Twitch streams! https://www.twitch.tv/kunosstefano


Any pointers on what exactly is missing?

I am neither a Rust guy or a graphics guy, but I have some interest in what is missing in the ecosystem.


> Any pointers on what exactly is missing?

Yes. [1]

[1] https://www.reddit.com/r/rust_gamedev/comments/13qt6rq/were_...


    "I've learned to live without objects, but the trait system is somewhat convoluted. There's one area of asset processing that really wants to be object oriented, and I have more duplicate code there than I like. I could probably rewrite it to use traits more, but it would take some bashing to make it fit the trait paradigm."
Can you expand on this? I come from the C# world and the Rust trait system feels expressive enough to implement the good parts of OOP.


I understand this not as objects being missing (after all, structs with methods and traits are objects, aren't they?) but more as the lack of hierarchical inheritance, which is most often used in OOP to conveniently share common code with added specialization: override only the methods you want. You can do it with traits of course, but it's much more verbose. You can technically use the Deref trait to simulate a sort of method inheritance, but that's frowned upon, as it should be reserved for smart-pointer-like objects (so the docs say).


That's about what I was going to say. Traits have no data of their own. If you need that, you have to construct it, with a data object in each implementing type and accessor functions for it. It turns the notion of inheritance inside out. Awkward enough that it's only done if absolutely necessary.
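Roughly, the inside-out shape looks like this (hypothetical Asset types): the shared state lives in a struct every implementor embeds, the trait only exposes accessors to it, and the "inherited" behavior becomes provided methods.

    // What would be base-class fields in an OOP language.
    struct AssetCommon {
        name: String,
        bytes_loaded: usize,
    }

    trait Asset {
        // Each implementor must hand back its embedded common data...
        fn common(&self) -> &AssetCommon;
        fn common_mut(&mut self) -> &mut AssetCommon;

        // ...so the "inherited" behavior can be written once as provided methods.
        fn describe(&self) -> String {
            format!("{} ({} bytes)", self.common().name, self.common().bytes_loaded)
        }
        fn record_load(&mut self, n: usize) {
            self.common_mut().bytes_loaded += n;
        }
    }

    struct Mesh {
        common: AssetCommon,
        triangles: usize,
    }

    impl Asset for Mesh {
        fn common(&self) -> &AssetCommon { &self.common }
        fn common_mut(&mut self) -> &mut AssetCommon { &mut self.common }
    }

    fn main() {
        let mut m = Mesh {
            common: AssetCommon { name: "rock".into(), bytes_loaded: 0 },
            triangles: 1200,
        };
        m.record_load(4096);
        println!("{} / {} tris", m.describe(), m.triangles);
    }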


I'm from the C# world and am working through learning Rust... in C# we've largely moved away from using inheritance. Not sure that it's a good thing but "best practise" results in serialisation being implemented differently (serialisers which use attributes, or for more advanced teams serialisation wired in at compile time targeted by attributes - advantage here being that the state doesn't have to be public).


I still use inheritance in C# although it is only used for is-a relationships and those aren't that common. But when you need it for that, it's usually pretty important.

I also think it's much more common to see it in library / framework code and not in application code.


UI in Rust without inheritance is tricky. There's still no great UI framework written in Rust yet, though not for lack of trying! I'm interested to see how Bevy's UI turns out. They're currently exploring the design space and requirements for production-grade UI, actually.


Wouldn’t something akin to SwiftUI work well in this situation? I can understand that not having a “component” class to inherit from would make building custom components difficult, but if most layout and skinning can be accomplished via functions then you can sidestep the issue for most cases, I think…


I think people are looking to SwiftUI for inspiration. It'll still take some time to build and evaluate these solutions.


It's "prefer composition over inheritance" though, not "never use inheritance".

There is a time and a place for it.


Which, as I said, results in you *usually* using composition :)


Does delegate support delegating everything (except what you're specifically implementing for your own struct) yet? That's the way to do it.


> Async contamination

I've always wondered why the "color" of a function can't be a property of its call site instead of its definition. That would completely solve this problem - you declare your functions once, colorlessly, and then can invoke them as async anywhere you want.


> I've always wondered why the "color" of a function can't be a property of its call site instead of its definition. That would completely solve this problem - you declare your functions once, colorlessly, and then can invoke them as async anywhere you want.

If you have a non-joke type system (which is to say, Haskell or Scala) you can. I do it all the time. But you need HKT and in Rust each baby step towards that is an RFC buried under a mountain of discussion.


You can do it without HKTs with an effects system, which you can think of as another kind of generics that causes the function to be sliced in different ways depending on how it's called. There is movement in Rust to try to do this, but I wish it had been done before async was implemented, considering async could have been implemented within it...


The Rust folks are working on this very problem with the keyword generics proposal: https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-ge...


If a function calls something async, that can't be evaluated synchronously because 1) there may be no setup: it could be async IO that requires being called in the context of an async runtime (a library feature, not a language feature), and 2) blocking synchronously on an async task inside an async runtime can deadlock: the task waits on the runtime's IO polling, while that very wait prevents the runtime from being polled.


> could be async IO and require being called in the context of an async runtime

The compiler already has knowledge that a function is being called as async - what prevents it from ensuring that a runtime is present when it does?

> blocking synchronously on an async task in an async runtime can result in deadlocks from task waiting on runtime IO polling but the waiting preventing the runtime from being polled

What prevents the runtime from preempting a task?


> what prevents it from ensuring that a runtime is present when it does?

The runtime being a library instead of a language/compiler level feature. Custom runtimes are necessary for systems languages, as they can have specialized constraints.

EDIT: Note that it's the presence of a supported runtime for the async operation (e.g. it relies on runtime-specific state like non-blocking IO, timers, priorities, etc.), not only the presence of any runtime.

> What prevents the runtime from preempting a task?

Memory efficient runtimes use stackless coroutines (think state machines) instead of stackful ones (think green threads / fibers). The latter comes with inefficiencies like trying to guess stack sizes and growing them on demand (either fixing up pointers to them elsewhere or implementing a GC), so it's not always desirable.

To preempt the OS thread of a stackful coroutine (i.e. to catch synchronously blocking on something) you need to have a way to save its stack/registers in addition to its normal state machine context which is the worst of both worlds: double the state + the pointer stability issues from before.

This is why most stackful coroutine runtimes are cooperatively scheduled instead, requiring blocking opportunities to be annotated so the runtime can workaround that to still make progress.


> _green thread inefficiencies_

Ron Pressler (@pron) from Loom @ Java had an interesting talk on the Java Language Summit just recently, talking about Loom’s solution to the stack copying: https://youtu.be/6nRS6UiN7X0


Thank you for your explanation of the trade-space around preemptible coroutines, that greatly helped my understanding. I am still unclear on one thing:

> The runtime being a library instead of a language/compiler level feature. Custom runtimes is necessary for systems languages as they can have specialized constraints.

Compilers link against dynamic libraries all the time. What prevents the compiler from linking against a hypothetical libasync.so just like any other library? (alternatively, if you want to decouple your program from a particular async runtime, what prevents the language from defining a generic interface that async runtimes must implement, and then linking against that?)


This would imply a single/global runtime along with an unrealistic API surface;

For 1) It's common enough to have multiple runtimes in the same process, each setup possibly differently and running independently of each other. Often known as a "thread-per-core" architecture, this is the scheme used in apps focused on high IO perf like nginx, glommio, actix, etc.

For 2) runtime (libasync.so) implementations would have to cover a lot of aspects they may not need (async compute-focused runtimes like bevy don't need timers, priorities, or even IO) and expose a restrictive API (what's a good generic model for a runtime IO interface? something like io_uring, dpdk, or epoll? what about userspace networking as seen in seastar?). A pluggable runtime mainly works when the language has a smaller scope than "systems programming" like Ponylang or Golang.

As a side note; Rust tries to decouple the scheduling of Futures/tasks using its concept of Waker. This enables async implementations which only concern themselves with scheduling like synchronization primitives or basic sequencers/orchestration to be runtime-agnostic.


I did some reading up on this, and found more detail about the "unrealistic API" surface (e.g. [1]), and I think I understand the problem at least at a surface level (and agree with the conclusions of the Rust team).

So then to tie this back to my earlier question - why does this make a difference between "async declared at function definition site" vs "async declared at function call site"?

Libraries have to be written against a specific async API (tokio vs async-std, to reference the linked Reddit thread) - that makes sense. But that doesn't change regardless of whether your code looks like `async fn foo() {...}` or `async foo();`. The compiler has ahead-of-time knowledge of both cases, as well...

[1] https://old.reddit.com/r/rust/comments/f10tcq/confusion_with...


In most runtimes you can just call something like `block_on`. There are some things to be careful about to avoid starving other tasks, but most general-purpose runtimes will spawn more threads as needed. Similarly, blocking in an async task is generally not much of an issue for these runtimes, for the same reasons.

It isn't like JavaScript where there is truly only one thread of execution at a time and blocking it will block everything.
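For example, with tokio (a sketch; the helper names are made up, and spawn_blocking is the usual escape hatch in the other direction, for blocking work inside async code):

    // Sync code driving an async operation to completion:
    fn fetch_sync(rt: &tokio::runtime::Runtime, url: &str) -> Result<String, reqwest::Error> {
        rt.block_on(async { reqwest::get(url).await?.text().await })
    }

    // Async code pushing blocking work off the runtime's worker threads:
    async fn hash_file(path: std::path::PathBuf) -> std::io::Result<u64> {
        tokio::task::spawn_blocking(move || -> std::io::Result<u64> {
            let data = std::fs::read(path)?; // blocking filesystem read
            Ok(data.iter().map(|&b| b as u64).sum())
        })
        .await
        .expect("blocking task panicked")
    }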


`std::thread::spawn()` and `.join()` are the ultimate async implementation.


>The normal development process is that it's hard to get things to compile, and then it Just Works. That's great! I hate using a debugger, especially on concurrent programs. Yes, sometimes you can get stuck for a day, trying to express something within the ownership rules. Beats debugging.

This is a far superior workflow when you factor in outcomes. More up-front time to get a "correct"/more-reliable output scales infinitely better than churning out crap that you need to wrap in 10,000 lines of tests to keep from breaking/validate (see: the dumpster fire that is Rails).


> This is a far superior workflow when you factor in outcomes.

I’m a strong-typing enthusiast, too, but still, I’m not fully convinced that’s true.

It seems you can’t iterate fast at all in Rust because the code wouldn’t compile, but can iterate fast in C++, except for the fact that the resulting code may be/often is unstable.

If you need to try out a lot of things before finding the right solution, the ability to iterate fast may be worth those crashes.

Maybe using C++ for fast iterations, and only bringing in the various tools to hunt down the issues the borrow checker would have caught on the iteration you want to keep, beats using Rust.

Or do Rust programmers iterate fast using unsafe where needed and then fix things once they’ve settled on a design?


> It seems you can’t iterate fast at all in Rust because the code wouldn’t compile

Yup, this is correct - and the reason is because Rust forces you to care about efficiency concerns (lifetimes) everywhere. There's no option to "turn the borrow checker off" - which means that when you're in prototyping mode, you pay this huge productivity penalty for no benefit.

A language that was designed to be good at iteration would allow you to temporarily turn the borrow checker off, punch holes in your type system (e.g. with Python's "Any"), and manage memory for you - and then let you turn those features on again when you're starting to optimize and debug. (plus, an interactive shell and fast compilation times - that's non-negotiable) Rust was never designed to be good at prototyping.

I heard a saying a few years ago that I like - "it's designed to make hardware [rigid, inflexible programs], not software". (it's from Steve Yegge - I could track it down if I cared)


>There's no option to "turn the borrow checker off" - which means that when you're in prototyping mode, you pay this huge productivity penalty for no benefit

That’s not really true. The standard workaround for this is just to .clone() or Rc<RefCell<>> to unblock yourself, then come back later and fix it.

It is true that this needs to be done with some care otherwise you can end up infecting your whole codebase with the “workaround”. That comes with experience.
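A sketch of what that usually looks like in practice (hypothetical Config type):

    use std::cell::RefCell;
    use std::rc::Rc;

    #[derive(Clone)]
    struct Config {
        title: String,
    }

    fn main() {
        let config = Config { title: "draft".into() };

        // Prototyping move 1: just .clone() instead of fighting over a borrow.
        let snapshot = config.clone();

        // Prototyping move 2: shared mutable state via Rc<RefCell<..>>,
        // trading compile-time borrow checks for runtime ones.
        let shared = Rc::new(RefCell::new(config));
        let also_shared = Rc::clone(&shared);
        also_shared.borrow_mut().title = "final".into();

        println!("{} -> {}", snapshot.title, shared.borrow().title);
    }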


> It is true that this needs to be done with some care otherwise you can end up infecting your whole codebase with the “workaround”

It's a "workaround" precisely because the language does not support it. My statement is correct - you cannot turn the borrow-checker off, and you pay a significant productivity penalty for no benefit. "Rc" can't detect cycles. ".clone()" doesn't work for complex data structures.


You can’t turn off undefined behavior in C++ either. Lifetimes exist whether the language acknowledges them or not.

Except if you go to a GC language, but then you’re prototyping other types of stuff than you’d probably pick Rust for.


You can use unsafe if you really want to "turn the borrow-checker off", no?


No, because that doesn't give you automatic memory management, which is the point. When I'm prototyping, there's zero reason for me to care about lifetimes - I just want to allocate an object and let the runtime handle it. When you mark everything in your codebase unsafe (a laborious and unnecessary process that then has to be laboriously undone), you still have to ask the Rust runtime for dynamic memory manually, and then track the lifetimes in your head.


If you're saying you want GC/Arc then that's more than just "turning off the borrow checker".


Pedantry. Later on in my comment I literally say "manage memory for you" - it should be pretty clear that my intent was to talk about a hypothetical language that allowed you to change between use of a borrow checker and managed memory, even if I didn't use the correct wording ("turn off the borrow checker") in that particular very small section of it.


Bit much to complain about pedantry with how prickly your tone has been in this whole thread. If you only want this functionality for rapid iteration/prototyping, which was what you originally said, then leaking memory in those circumstances is not such a problem.


You're right, I have been overly aggressive. I apologize.

> If you only want this functionality for rapid iteration/prototyping, which was what you originally said, then leaking memory in those circumstances is not such a problem.

There's use-cases for wanting your language to be productive outside of prototyping, such as scripting (which I explicitly mentioned earlier in this thread[1] - omission here was not intentional), and quickly setting up tools (such as long-running web services) that don't need to be fast, but should not leak memory.

"Use Rust, but turn the borrow checker off" is inadequate.

[1] https://news.ycombinator.com/item?id=37441120


Yeah, I do think the space where manual memory management is actually desirable is pretty narrow - and so I'm kind of baffled that Rust caught on where the likes of OCaml didn't. But seemingly there's demand for it. (Either that, or programming is a dumb pop culture that elevates performance microbenchmarks beyond all reason)


> There's no option to "turn the borrow checker off" - which means that when you're in prototyping mode, you pay this huge productivity penalty for no benefit.

Frankly I think this is a good thing! And I disagree with your "no benefit" assertion.

I don't like prototyping. Or rather, I don't like to characterize any phase of development as prototyping. In my experience it's very rare that the prototype actually gets thrown away and rewritten "the right way". And if and when it does happen, it happens years after the prototype has been running in production and there's a big scramble to rewrite it because the team has hit some sort of hard limit on fixing bugs or adding features that they can't overcome within the constraints of the prototype.

So I never prototype. I do things as "correctly" as possible from the get-go. And you know what? It doesn't really slow me down all that much. I personally don't want to be in the kind of markets where I can't add 10-20% onto a project schedule without failing. And I suspect markets where those sorts of time constraints matter are much rarer than most people tell themselves.

(And also consider that most projects are late anyway. I'd rather be late because I was spending more time to write better, safer code, than because I was frantically debugging issues in my prototype-quality code.)


The dev cycle is slower, yes, but once it compiles, there is no debug cycle.


Wildly false. Rust's design does virtually nothing to prevent logic errors.


I have found someone that never introduces logic errors, and who has found a way to use dependent types in Rust. /s


95% of my "logic errors" are related to surprise nulls (causing a data leak of sensitive data) or surprise mutability. The idea that there is no debug cycle is ridiculous but I am confident that there will be less of them in Rust.


I bet it won't survive a pentest, and there are more ways a program can miss its expectations than nullability alone.

On the type-system theory side, Rust still has quite a bit of catching up to do relative to theorem provers, and even those aren't without issues.


So then tests are optional?

Most bugs are elementary logic bugs expressible in every programming language.


> So then tests are optional?

Yes and no. You're gonna write far fewer tests in a language like Rust than in a language like Python. In Python you'll have to write tests to eliminate the possibility of bugs that the Rust compiler can eliminate for you. I would much rather just write logic tests.

> Most bugs are elementary logic bugs expressible in every programming language.

I don't think that's true. I would expect that most bugs are around memory safety, type confusion, or concurrency issues (data races and other race conditions).


Python is not a language I would consider to be meaningfully comparable to Rust. They have very different use cases.

In modern C++, memory safety and type confusion aren’t common sources of bugs in my experience. The standard idiomatic design patterns virtually guarantee this. The kinds of concurrency issues that tend to cause bugs can happen in any language, including Rust. Modern C++, for all its deficiencies, has an excellent type safety story, sometimes better than Rust. It doesn’t require the language to provide it though, which is both a blessing and a curse.


I think mostly you just need less iteration with Rust because the language seems to guide you towards nice, clean solutions once you learn not to fight the borrow checker.

Rust programmers don't iterate using unsafe because every single line of unsafe gives you more to think and worry about, not less. But they might iterate using more copying/cloning/state-sharing-with-ref-counting-and-RefCell than necessary, and clean up the ownership graph later if needed.


> I think mostly you just need less iteration with Rust because the language seems to guide you towards nice, clean solutions once you learn not to fight the borrow checker.

That's not iteration. That's debugging. "Iteration" includes design work. Rust's requirement to consider memory management and lifetimes actively interferes with design work with effectively zero contributions towards functional correctness (unlike types, which actually help you write less buggy code - but Rust's type system is not unique and is massively inferior to the likes of Haskell and Idris), let alone creating things.


> Rust's requirement to consider memory management and lifetimes actively interferes with design work with effectively zero contributions towards functional correctness

I don't really agree with that. If you've decided on a design in Rust where you're constantly fighting with lifetimes (for example), that's a sign that you may have designed your data ownership wrong. And while it's not going to be the case all the time, it's possible that a similar design in another language would also be "wrong", but in ways that you don't find out until much later (when it's much harder to change).

> Rust's type system is not unique and is massively inferior to the likes of Haskell and Idris

Sure, but few people use Haskell or Idris in the real world for actual production code. Most companies would laugh me out of an interview if I told them I wanted to introduce Haskell or Idris into their production code base. That doesn't invalidate the fact that they have better type systems than Rust, but a language I can't/won't use for most things in most places isn't particularly useful to me.


No, I was not talking about debugging or correctness. The point was that Rust does not merely guide you towards correct code, it tends to guide you towards good code.


The point of iteration is not typically to find the best implementation for a given algorithm, it's to find the best algorithm for solving a given problem.

I can see the argument that Rust encourages by its design a clean implementation to any given algorithm. But no language's design can guide you to finding a good algorithm for solving a given problem - you often need to quickly try out many different algorithms and see which works best for your constraints.


I don't think iteration is usually about testing different algorithms – it's much more about finding out what the problem is in the first place (that is, attaining a good enough understanding of the problem to solve it), and secondarily about finding out a solution to the problem that satisfies any relevant boundary conditions.


In Rust you have the option to Box and/or Rc everything. That gets you out of all the borrowing problems at the cost of runtime performance (it basically puts you in the C++ world). This is a perfectly reasonable way to program but people forget about it due to the more "purist" approach that's available. But it's a good way to go for iteration and simplicity, and (in my opinion) still miles better than C++, due to the traits, pattern matching, error handling, and tooling.


I tend to agree, but pro game dev is a hell where people demand that a new feature be demoed for the producer by 1 PM tomorrow. I have the luxury of not being under such pressure.


> Yes, sometimes you can get stuck for a day, trying to express something within the ownership rules.

This is a big problem. Fast iteration time is very valuable.

And who likes doing this to themselves anyway? Isn't it a very frustrating experience? How is this the most loved language?


> And who likes doing this to themselves anyway? Isn't it a very frustrating experience? How is this the most loved language?

The thing is, these dependencies exist no matter what language you use, if they stem from an underlying concept. In that case Rust just makes you write them explicitly, which is a good thing: in C++ all these dependencies would be more or less implicit, and every time somebody edits the code they need to think all these cases through and build a mental model (if they see the dependencies at all!). In Rust you at least have the lifetime annotations, which A: make it obvious there is some special dependency going on, and B: show the explicit lifetimes.

So what I'm saying is, you need to put in this work no matter which language you choose; writing it down then isn't a big problem anymore. If you don't think about these rules, your program will probably work most of the time, but only most of the time, and that can be very bad in certain scenarios.


> So what I'm saying, you need to put in this work no matter which language you choose

This is very false. Managed-memory languages don't require you to even think about lifetimes, let alone write them down.

Yes, I understand that this is for efficiency - but claiming that you have to think about lifetimes everywhere is just wrong, and irrelevant when discussing topics (prototyping/design work/scripting) where you don't care about efficiency.


Lifetimes are still important in managed languages. You just have to track them in your head, which is fallible. The difference is that if you get it wrong in a managed language, you get leaks or stale objects or other logic bugs. In rust you get compile time errors.


While this is correct, it's still much easier to think about lifetimes in managed languages. The huge majority of allocated objects gets garbage-collected after a very short time, when they leave the context (similar to RAII).

Mostly you need to think about large and/or important objects, and avoid cycles, and avoid unneeded references to such objects that would live for too long. Such cases are few.

The silver lining is that if you make a mistake and a large object would have to live slightly longer, you won't have to wrangle with the lifetime checker for that small sliver of lifetime. But if you make a big mistake, nothing will warn you about a memory leak, before the prod monitoring does.


> The huge majority of allocated objects gets garbage-collected after a very short time, when they leave the context (similar to RAII).

Those objects are also virtually no problem in languages like Rust or C++. Those are local objects whose lifetimes are trivial and they are managed automatically with no additional effort from the developer.


> The difference is that if you get it wrong in a managed language, you get leaks or stale objects or other logic bugs.

Can you provide concrete examples of this? I've literally never had a bug due to the nature of a memory-managed language.


Once upon a time (at least through IE7) Internet Explorer had separate memory managers for javascript and the DOM. If there was a cycle between a JS object and a DOM object (a DOM node is assigned as a property of an object, and another property was assigned as an event handler to the DOM node) then IE couldn't reclaim the memory.

Developers of anything resembling complex scripts (for the time) had to manually break these cycles by setting to null the attributes of the DOM node that had references to any JS objects.

Douglas Crockford has a little writeup here[0] with a heavy-handed solution, but it was better than doing it by hand if you were worried another developer would come along and add something and forget to remove it.

Other memory managed languages also have to deal with the occasional sharp corners. Most of the time, this can be avoided by knowing to clean up resources properly, but some are easier to fall for than others.

Oracle has a write up on hunting Java memory leaks [1] Microsoft has a similar, but less detailed article here[2]

Of course, sometimes a "leak" is really a feature. One notorious example is variable shadowing in the bad old days of JS prior to the advent of strict mode. I forget the name of the company, but someone's launch was ruined because a variable referencing a shopping cart wasn't declared with `var` and was treated as a global variable, causing concurrent viewers to accidentally get other user's shopping cart data as node runs in a single main thread, and concurrency was handled only by node's event loop.

[0] https://www.crockford.com/javascript/memory/leak.html

[1] https://docs.oracle.com/en/java/javase/17/troubleshoot/troub...

[2] https://learn.microsoft.com/en-us/dotnet/core/diagnostics/de...


My question was about the nature of a memory-managed language causing "leaks or stale objects or other logic bugs". This issue is not that - this is due to a buggy implementation causing memory leaks.

To be more precise: this is a bug, that was fixable, in the runtime, not in user applications that would run on top of it.

Assume a well-designed memory-safe language and implementation. What kinds of memory hazards are there?


> separate memory managers

Notwithstanding the rest of your comment, this doesn't seem like a good example of the problem, since most GCs have a complete view of their memory.


CMEs (ConcurrentModificationExceptions) in Java are a constant thorn for many Java programmers, as a lifetime-violation bug. Hell, even NPEs are, for that matter, lol.


You can certainly get memory abandonment, which is like a leak but for memory that's still referenced and is just never going to be used again.


I will note that in GC literature at least, that is still considered a leak.

In an ideal world, we could have a GC that reclaimed all unused memory, but that turns out to be impossible because of the halting problem. So, we settle for GCs that reclaim only unreachable memory, which is a strict subset of unused memory. Unused reachable memory is a leak.


> Managed-memory languages don't require you to even think about lifetimes, let alone write them down.

Memory is only one of many types of resources applications use. Memory-managed languages do nothing to help you with those resources, and effectively managing those resources is way harder in those languages than in Rust or C++.


What? Rust doesn't do anything to "help you with those resources", either - you can still create cycles in Arc objects or allocate huge amounts of memory and then forget about it.

In both languages you have to rely on careful design, and then profile memory use and manage it.

However, Rust requires you to additionally reason about lifetimes explicitly. Again - great for performance, terrible for design, prototyping, and tools in non-resource-constrained environments.
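Concretely, the cycle case (a sketch: neither value is ever dropped, because each keeps the other's strong count above zero):

    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        other: RefCell<Option<Rc<Node>>>,
    }

    impl Drop for Node {
        fn drop(&mut self) {
            println!("dropped"); // never printed below
        }
    }

    fn main() {
        let a = Rc::new(Node { other: RefCell::new(None) });
        let b = Rc::new(Node { other: RefCell::new(Some(a.clone())) });
        *a.other.borrow_mut() = Some(b.clone());
        // a and b now point at each other; when they go out of scope the
        // strong counts stay at 1, so the memory is never reclaimed.
    }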


> The thing is, these dependencies do exist no matter what language you use

Sure, but in a lot of cases these invariants can be trivially explained, or are intuitive enough that they wouldn't even need explanation. In Rust, meanwhile, you can easily spend a full day just explaining them to the compiler.

I remember spending literal _days_ tweaking intricate lifetimes and scopes just to promise Rust that some variables won't be used _after_ a thread finishes.

Some things I even never managed to be able to express in Rust, even if trivial in C, so I just rely on having a C core library for the hot path, and use it from Rust.
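(For the narrow "borrow some locals for exactly the lifetime of a thread" case, std::thread::scope is the usual answer these days; a minimal sketch, which admittedly doesn't cover the more intricate cases above:)

    use std::thread;

    fn main() {
        let frames = vec![1u32, 2, 3, 4];
        let mut total = 0u32;

        // Borrowing `frames` and `total` is allowed because the scope
        // guarantees every spawned thread is joined before it returns.
        thread::scope(|s| {
            s.spawn(|| {
                println!("processing {} frames", frames.len());
            });
            s.spawn(|| {
                total = frames.iter().sum();
            });
        });

        println!("total = {total}");
    }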

Overall, performance sensitive lifetime and memory management in Rust (especially in multithreaded contexts) often comes down to:

1) Do it in _sane_ Rust, and copy everything all over the place, use fancy smart pointers, etc.

2) Do it in a performant manner, without useless copies, without over-the-top memory management, but prepare for a week of frustrating development and a PhD in Rust idiosyncrasies.


>use fancy smart pointers, etc.

The thing is, you think your code is safe and it most likely is, but mathematically speaking, what you are doing is difficult or even impossible to prove correct. It is akin to running an NP-complete algorithm on a problem that is easier than NP. Most practical problem instances are easy to solve, but the worst case, which can't be ruled out, is utterly, utterly terrible, which forces you to use a more general solution than is actually necessary.


Don't let perfect be the enemy of good.

Since smart pointers became ubiquitous in C++, I've (personally) had only a handful of memory and lifetime issues. They were all deducible by looking at where we "escape hatched" and stored a raw pointer that was actually a unique pointer, or something similar. I'll take having one of those every 18 months over throwing away my entire language, toolchain, ecosystem, and iteration times.


I don't think it's a matter of putting one versus the other.

If you can get away with smart pointers and such, life is beautiful, nothing wrong there!

The debate here is rather for the cases where you cannot afford such things.


> Some things I even never managed to be able to express in Rust, even if trivial in C, so I just rely on having a C core library for the hot path, and use it from Rust.

I can’t think of anything you can do in C that you can’t do in unsafe Rust, and that has the advantage that you can narrow the unsafe code down to exactly where you need it and only there, and you can test it in Miri to find bugs.


To be fair, unsafe Rust has an entirely new set of idiosyncrasies that you have to learn for your code not to cause UB. Most of them revolve around the many ways in which using references can invalidate raw pointers, and using raw pointers can invalidate references, something that simply doesn't exist in C apart from the rarely-used restrict qualifier.

(In particular, it's very easy to inadvertently trigger the footgun of converting a pointer to a reference, then back to a pointer, so that using the original pointer again can invalidate the new pointer.)

Extremely pointer-heavy code is entirely possible in unsafe Rust, but often it's far more difficult to correctly express what you want compared to C. With that in mind, a tightly-scoped core library in C can make a lot of sense; more lines of unsafe code in either language leave more room for bugs to slip in.
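A sketch of that specific footgun (the exact rules are Miri's Stacked/Tree Borrows models, so treat this as illustrative rather than definitive):

    fn main() {
        let mut x = 42u32;
        let p1: *mut u32 = &mut x;      // original raw pointer

        unsafe {
            let r: &mut u32 = &mut *p1; // pointer -> reference
            let p2: *mut u32 = r;       // reference -> new pointer, derived from r

            *p1 += 1;                   // touching the ORIGINAL pointer again...
            *p2 += 1;                   // ...invalidates p2 under Stacked Borrows; Miri flags this
        }
        println!("{x}");
    }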


> i can’t think of anything you can do in c that you can’t do in unsafe rust

That is not my point.

There is a world between "you can do it" and "you will do it".

Some things in Rust are doable in theory, but end up being so insane to implement that you won't do it in practice. That is my point.


> How is this the most loved language?

Personal preference and pain tolerance. Just like learning Emacs[1] - there's lots of things that programmers can prioritize, ignore, enjoy, or barely tolerate. Some people are alright with the fact that they're prototyping their code 10x more slowly than in another language because they enjoy performance optimization and seeing their code run fast, and there's nothing wrong with that. I, myself, have wasted a lot of time trying to get the types in some of my programs just right - but I enjoy it, so it's worth it, even though my productivity has decreased.

Plus, Rust seems to have pushed out the language design performance-productivity-safety efficiency frontier in the area of performance-focused languages. If you're a performance-oriented programmer used to buggy programs that take a long time to build, then a language that gives you the performance you're used to with far fewer bugs and faster development time is really cool, even if it's still very un-productive next to productivity-oriented languages (e.g. Python). If something similar happened with productivity languages, I'd get excited, too - actually, I think that's what's happening with Mojo currently (same productivity, greater performance) and I'm very interested.

[1] https://news.ycombinator.com/item?id=37438842


> even if it's still very un-productive next to productivity-oriented languages (e.g. Python).

The thing is, for many people, including me, Rust is actually a more productive language than Python or other dynamic languages. Actually writing Python was an endless source of pain for me - it was the only language where my code did not initially work as expected more times than it did. Whereas in Rust it works fine on the first go in 99% of cases once it compiles, which is a huge productivity boost. And quite surprisingly, even writing the code in Rust was faster for me, due to the more reliable autocomplete / inline docs features of my IDE.


I think part of the problem is "developer productivity" is a poorly-defined term that means different things to different people.

To some, it means getting something minimal working and running as quickly as possible, accepting that there will be bugs, and that a comprehensive test suite will have to be written later to suss them all out.

To others (myself included), it means I don't mind so much if the first running version takes a bit longer, if that means the code is a bit more solid and probably has fewer bugs. And on top of that, I won't have to write anywhere near as many tests, because the type system and compiler will ensure that some kinds of bugs just can't happen (not all, but some!).

And I'm sure it means yet other things to other people!


> Python or other dynamic languages

I should have stated that I'm comparing Rust to typed Python (or TypeScript or typed Racket or whatever). Typed Python gives you a type system that's about as good as Rust's, and the same kinds of autocompletion and inline documentation that you would get with Rust, while also freeing you from the constraints of (1) being forced to type every variable in your program upfront, (2) being forced to manage memory, and (3) the lack of an interactive shell/REPL/Jupyter notebooks - Rust simply can't compete against that.

Your experience would likely have been very different if you were using typed Python.


> Typed Python gives you a type system that's about a good as Rust's

No, it absolutely does not.

Also consider that Python has a type system regardless of whether or not you use typing, and that type system does not change because you've put type annotations on your functions. It does allow you to validate quite a few more things before runtime, of course.


> Some people are alright with the fact that they're prototyping their code 10x more slowly than in another language because they enjoy performance optimization and seeing their code run fast, and there's nothing wrong with that.

I look at it a little differently: I'm fine with the fact that I'm prototyping my code 10x more slowly (usually the slowdown factor is nowhere near that bad, though; I'd say sub-2x is more common) than in another language because I enjoy the fact that when my code compiles successfully, I know there are a bunch of classes of bugs my code just cannot have, and this wouldn't be the case if I used the so-called "faster development" language.

I also hate writing tests; in a language like Rust, I can get away with writing far fewer tests than in a language like Python, but have similar confidence about the correctness of the code.


> Some people are alright with the fact that they're prototyping their code 10x more slowly than in another language because they enjoy performance optimization and seeing their code run fast, and there's nothing wrong with that.

Disclaimer: I've sort of bounced off of Rust 3 or so times, and while I've created both long-running services and smaller tools in it, I've basically mostly had a hard time (not enjoying it at all, feeling like I'm paying a lot in development friction for very little gain, etc.), and if you're the type to write off most posts with "You just don't get it", this would probably just be one more for the pile. I would argue that I do understand the value of Rust, but I take issue with the idea that the cost is worth it in the majority of cases, and I think there are 80% solutions that work better in practice for most cases.

From personal experience: you could prototype your code faster, and get performance in simpler ways than dealing with the borrow checker, by being able to express allocation patterns and memory usage in better, clearer ways instead, avoiding both of the stated problems.

Odin (& Zig and other simpler languages) with access to these types of facilities are just an install away and are considerably easier to learn anyway. In fact, I think you could probably just learn both of them on top of what you're doing in Rust since the time investment is negligible compared to it in the long run.

With regards to the upsides in terms of writing code in a performance-aware manner:

- It's easier to look at a piece of code and confidently say it's not doing any odd or potentially bad things with regards to performance in both Odin and Zig

- Both languages emphasize custom allocators which are a great boon to both application simplicity, flexibility and performance (set up limited memory space temporarily and make sure we can never use more, set up entire arenas that can be reclaimed or reused entirely, segment your resources up in different allocators that can't possibly interfere with eachother and have their own memory space guaranteed, etc.)

- No one can use one-at-a-time constructs like RAII/`Drop` behind your back so you don't have to worry about stupid magic happening when things go out of scope that might completely ruin your cache, etc.

To borrow an argument from Rust proponents, you should be thinking about these things (allocation patterns) anyway, and you're doing yourself a disservice by leaving them up to magic or just doing them wrong. If your language can't do what Odin and Zig do (pass allocators around; in Odin you can also inherit them from the calling scope, which coupled with passing them around gives you incredible freedom), then you probably should try one where you can, and where the ecosystem is based on that assumption.

My personal experience with first Zig and later Odin is that they've provided the absolute most productive experience I've ever had when it comes to the code that I had to write. I had to write more code because both ecosystems are tiny and I don't really like extra dependencies regardless. Being able to actually write your dependencies yourself but have it be such a productive experience is liberating in so many ways.

Odin is my personal winner in the race between Odin and Zig. It's a very close race but there are some key features in Odin that make it win out in the end:

- There is an implicit `context` parameter primarily used for passing around an allocator, a temp-allocator and a logger that can be implicitly used for calls if you don't specify one. This makes your code less chatty and lets you talk only about the important things in some cases. I still prefer to be explicit about allocators in most plumbing but I'll set `context.allocator` to some appropriate choice for smaller programs in `main` and let it go

- We can have proper tagged unions as errors and the language is built around it. This gives you code that looks and behaves a lot like you'll be used to with `Result` and `Option` in Rust, with the same benefits.

- Errors are just values, but the last value in a multiple-value-return function is understood as the error position if needed, so we avoid the `if error != nil { ... }` boilerplate that would otherwise exist if the language wasn't made for this. We can instead use proper error values (which can be tagged unions) and `or_return`, i.e.:

    doing_things :: proc() -> ParsingError {
        parsed_data := parse_config_file(filename) or_return
        ...
    }
If we wanted to inspect the error this would instead be:

    // The zero value for a union is `nil` by default and the language understands this
    ParsingError :: union {
        UnparsableHeader,
        UnparsableBody,
    }

    UnparsableHeader :: struct {
        ...
    }

    UnparsableBody :: struct {
        ...
    }

    doing_things :: proc() {
        parsed_data, parsing_error := parse_config_file(filename)
        // `p in parsing_error` here unpacks the tag of the union
        // Notably there are no actual "constructors" like in Haskell
        // and so a type can be part of many different unions with no syntax changes
        // for checking for it.
        switch p in parsing_error {
        case UnparsableHeader:
            // In this scope we have an `UnparsableHeader`
            function_that_deals_with_unparsable_header(p)
        case UnparsableBody:
            function_that_deals_with_unparsable_body(p)
        }

        ...
    }
- ZVI or "zero-value initialization" means that all values are by default zero-initialized and have to have zero-values. The entire language and ecosystem is built around this idea and it works terrifically to allow you to actually talk only about the things that are important, once again.

P.S. If you want to make games or the like Odin has the absolute best ecosystem of any C alternative or C++ alternative out there, no contest. Largely this is because it ships with tons of game related bindings and also has language features dedicated entirely to dealing with vectors, matrices, etc., and is a joy to use for those things. I'd still put it forward as a winner with regards to most other areas but it really is an unfair race when it comes to games.


Debugging rare crashes and heisenbugs is more frustrating, and in non-safe languages, a chronic problem.

Whereas after you prove the safety of a design once, it stays with you.


It stays with you until you need to change something and find yourself unable to make incremental changes.

And in many use cases people are throwing Rust (and especially async Rust) on problems solved just fine with GC languages so the safety argument doesn’t apply there.


> change something and find yourself unable to make incremental changes

why do you believe this becomes the case with rust code?


The safety argument is actually the reason why you can use Rust in those cases to begin with. If it was C or C++ you simply couldn't use it for things like webservers due to the safety problems inherent to these languages. So Rust creeps into the part of the market that used to be exclusive to GC languages.


What do you think nginx and Apache are written in?


How few severe vulnerabilities and other major defects (memory corruption or crashes) do you think Nginx and Apache have had over the years?


Sort of. Do you want someone who doesn't understand the constraints, and who is likely creating a bug that will cause crashes? Or do you want to block them until they understand the constraints?


So you use a safe, garbage-collected language like Python, and iterate 5x as fast as Rust. Problem solved. It's 2023 - there are at least a dozen production-quality safe languages.


> and iterate 5x as fast as Rust.

I've been involved in Java, Python, PHP, Scala, C++, Rust, JS projects in my career. I think I'd notice a 5x speed difference in favor of Python if it existed. But I haven't.


You're probably just using Python wrong, then. You can use a Jupyter notebook to incrementally develop and run pieces of code in seconds, and this scales to larger programs. With Rust, you have to re-compile and re-run your entire application every time you want to test your changes. That's a 5x productivity benefit by itself.


You’re seriously suggesting writing a game engine in Python?


You accidentally responded to the wrong comment. I never mentioned a game engine.


This thread is about writing a game engine, so GP didn't "accidentally" respond to the wrong comment. Their question is on-topic.

If your comments aren't relevant to writing a game engine, then they're not relevant to this thread.


> This thread is about writing a game engine

This is false. This "thread" is not "about" anything. The top-level comment was about writing a game engine, and various replies to that thread deviated from that topic to a greater or lesser extent. Nobody has the authority to decide what a thread is "about".

Additionally, the actual article under consideration is about Rust's design in general. That makes my comments more on topic than one about game engines in particular, and so it should be pretty clear that if you're going to assume anything about my comments, then it would not be that they're about game engines.


It doesn't really matter, there doesn't exist a problem space where both Rust and Python are reasonable choices.

Case in point, I once wrote a program to take a 360 degree image and rotate it so that the horizon followed the horizontal line along the middle, and it faced north. I wrote it in python first and running it on a 2k image took on the order of 5 minutes. I rewrote it in rust and it took on the order of 200ms.

Could I iterate in Python faster? Yes, but the end result was useless.


> there doesn't exist a problem space where both Rust and Python are reasonable choices

This thread, and many other threads about Rust, are filled with people arguing the exact opposite - that Rust is a good, productive language for high-level application development. I agree with you, there's relatively little overlap - that's what I'm arguing for!


Both qualify for writing tiny web servers, cli/byte-manipulation scripts, server automation jobs, in-house GUI applications, and other small stuff. You could technically argue that these are "relatively little overlap" depending on what you do, though.


The “beats debugging” part I took as meaning “it is better than spending that day debugging”.

I have fought the ownership rules and lost (replaced references with integer indices into a common vector; ugly stuff, but I was time constrained). But I have seen people spend several weeks debugging a single problem, and that was really soul-crushing.


I think you may be misunderstanding what GP means. It's about spending a day working on issues. You're either doing it before you launch your iteration, or you're doing it after. GP thinks it's better to spend the time before you push the change. From a quality perspective it's hard to see how anyone could disagree with that, but I can certainly see why there would be different preferences from programmers.

I don't personally mind debugging too much, but if your goal is to avoid bugs in your running software, then Rust has some serious advantages. We mainly use TypeScript to do things, which isn't really comparable to Rust. But we do use C when we need performance, and we looked into Rust, even did a few PoCs on real-world issues, and we sort of ended up in a situation similar to GP. Rust is great, though a bit "verbose" to write, but its ecosystem is too young to be "boring" enough for us, so we're sticking with C for the time being. But being able to avoid running into crashes by doing the work before you push your code is immensely valuable in fault-intolerant systems. Like, we do financial work with C; it cannot fail. So we're actually still doing a lot of the work up-front, and then we handle it by rigorously testing everything. Because it's mainly used for small performance enhancements, our C programs are small enough that this isn't an issue, but it would be a nightmare to do with 40,000 lines of C code.


I agree that fast iteration time is valuable, but I don't think this has to hold 100% of the time.

I would much rather bang my head against a compiler for N hours, and then finally have something that compiles -- and thus am fairly confident works properly -- than have something that compiles and runs immediately, but then later I find I have to spend N hours (or, more likely, >N hours) debugging.

Your preferences may differ on this, and that's fine. But in the medium to long term, I find myself much more productive in a language like Rust than, say, Python.


I’m working on an unrelated project that does some stuff similarly to you. I’m at 4k lines right now.

Just wondering, how long did it take you to hit 40k lines? I’m a new Rust developer and it’s taken me ages to get this far.

I totally relate to your experience though. When I finally get my code to compile, it “just works” without crashes. I’ve never felt so confident in my code before.


> how long did it take you to hit 40k lines?

3 years.


Impressive dedication! I hope I can make it that long. The project looks cool and the technical details sound even cooler.

Thanks for the perspective.


>When I finally get my code to compile, it “just works” without crashes. I’ve never felt so confident in my code before.

This isn't a new idea for a desirable state. Same experience with Modula-2 three decades ago. A page or more of compiler errors to clear, then suddenly adiabatic. A very satisfying experience.


I don’t know what you mean by web-scale, but you’d be mistaken if you meant “the multi-threaded services that power giant internet properties”.

If you want extremely low contention and extremely high utilization, you’re doing threading and event-driven simultaneously. There are no easy answers on heavily contended data structures, because you can’t duplicate to exploit immutability if mere insane complexity is an alternative, and mistakes cost millions in real time.

There’s a reason why those places scout so heavily for the lock-free/wait-free galaxy brains the minute they finish their PhDs.


> There was an article on HN a few days ago about this. "Rust has 5 games and 50 game engines".

That's not a serious article. That's a humorous video.

Source: https://youtu.be/TGfQu0bQTKc?t=169


It has some truth to it, still.


Not really. To get to 50 game engines you need some creative accounting. The real joke would be 3 engines and 0 profitable games.


Out of curiosity, why not go all in on Tokio? Make everything a future, including writing to the GPU.

And are you using an ECS based architecture? Do you feel you’d have a different opinion if you were?


As a past active SecondLife user back in the day (circa 15 years ago), and a short-stint OpenSimulator dev, I had been thinking a lot about how much better SecondLife could be if it absorbed modern tech - thanks for doing this! :-) I did a short return to try SL recently, and the lagginess of the viewer made me sad.

Is there a ML to subscribe to, to learn when the viewer is more generally available for testing? Thanks again!


What’s the server for a metaverse client? Is there a standardized protocol, or a particularly popular one you’re targeting?


It's a client for Second Life or Open Simulator.


Rust is not race-condition free; it does guarantee no data races, though.


This is very interesting. How do you manage latency of events coming over the network?

Do... you... wind up having to set TCP_NODELAY?

•͡˘㇁•͡˘


Embarrassingly, yes, because I can't turn off delayed ACKs from Rust.


I learned about nagling 20 years ago when I helped write a networked game server and had to turn it off so that input packets would be sent quickly. Thank you for your response to my troll, easily a top 5 career highlight.


> If I had to do this in C++, I'd be fighting crashes all the time.

Why? I'd take modern C++ over Rust every day of the week.


> As I've mentioned before, I'm writing a high performance metaverse client.

Why? (Serious question)


Started 3 years ago during covid when metaverse looked attractive. In 3 years many of these AI applications will face the same questions.


Will they? AI has adoption already, the metaverse is still waiting for meaningful adoption. You could argue that it’s never coming.


What you've just described is basically every networked video game, the majority of which are happily running via c++.

(Plus some increase in content load over the network, which does exist ala runtime mod loading, streaming, etc)


Yes? Not architecturally different, but with fewer bugs. People are always complaining about bugs in videogames.


Looks great!

Without judgment I must ask, what made you decide to target metaverse specifically? Is it more of a fun challenge, or do you see it having a bright/popular future?


I was really bored during COVID lockdown and needed a hard problem. I may say more about metaverse stuff in another place, but don't want to derail the Rust issues.


Rust is not race-condition free unless the compiler does formal verification like Ada/SPARK, is it?

It is data-race free however.


Props on doing this work! That being said is it just me or does the video seem to stutter?


The big WGPU project to improve concurrency wasn't finished then. All the texture loading that's requested from other threads currently goes into a work queue inside Rend3 executed by the refresh thread. Because the application is frantically loading and unloading textures at various resolutions as the camera moves, there's a stutter. There's too much texture content for it all to be in VRAM at high resolution all at once. Vulkan allows concurrent loading of data. WGPU now does. (As of this morning, unless someone finds another blocking bug.) Rend3 next. Then I'll probably have to change something in my code. That's what I mean about problems down in the graphics engine room.

This is the metaverse data overload problem - many creators, little instancing. No art director. No Q/A department. No game polishing. It's quite solvable, though.

Those occasional flashes on screen are the avatar (just a block in this version) moving asynchronously from the camera. That's been fixed.


>It's probably about five people working for a year from being ready.

The trouble is, we actually have tens/hundreds of people, all working on their own. The blessing and curse of open source development


aside: I love egui


I find myself in this weird corner when it comes to async rust.

The guy's got a point in that doing a bunch of Arc, RwLock, and general sharing of state is going to get messy. Especially once you are sprinkling 'static all over the place, it infects everything, much like colored functions. I did this whole thing once back when I was starting off where I would Arc<RwLock> stuff, and try to be smart about borrow lifetimes. Total mess.

But then rust also has channels. When you read about it, it talks about "messages", which to me means little objects. Like a few bytes little. This is the solution, pretty much everything I write now is just a few tasks that service some channels. They look at what's arrived and if there's something to output, they will put a message on the appropriate channel for another task to deal with. No sharing objects or anything. If there's a large object that more than one task needs, either you put it in a task that sends messages containing the relevant query result, or you let each task construct its own copy from the stream of messages.
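Something like this is what I mean; a rough sketch (assuming tokio with the "full" feature; the Event/Output message types are just made up for illustration), where each task owns its own state and the only thing crossing task boundaries is small messages:

    use tokio::sync::mpsc;

    #[derive(Debug)]
    enum Event {
        LineRead(String),
        Shutdown,
    }

    #[derive(Debug)]
    enum Output {
        Parsed(usize),
    }

    #[tokio::main]
    async fn main() {
        let (event_tx, mut event_rx) = mpsc::channel::<Event>(64);
        let (out_tx, mut out_rx) = mpsc::channel::<Output>(64);

        // Worker task: owns its own state, reads events, emits small result messages.
        tokio::spawn(async move {
            while let Some(event) = event_rx.recv().await {
                match event {
                    Event::LineRead(line) => {
                        let _ = out_tx.send(Output::Parsed(line.len())).await;
                    }
                    Event::Shutdown => break,
                }
            }
            // out_tx is dropped here, which closes the output channel.
        });

        event_tx.send(Event::LineRead("hello".into())).await.unwrap();
        event_tx.send(Event::Shutdown).await.unwrap();

        // Drain results until the worker hangs up.
        while let Some(out) = out_rx.recv().await {
            println!("{out:?}");
        }
    }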

And yet I see a heck of a lot of articles about how to Arc or what to do about lifetimes. They seem to be things that the language needs, especially if you are implementing the async runtime, but I don't understand why the average library user needs to focus so much on this.


I find the criticisms a little strange - async doesn’t imply multithreaded, and you don’t need to annotate everything shared with magic keywords if you’re async within the same thread because there’s no sharing. Only one future at a time is running on the thread and they’re within the same context.

When moving between threads I do what you suggest here and use channels to send signals rather than having a lot of shared state. Sometimes there is some crucial global state that's easier to just access directly, but then I just write a struct that manages all the Arc/RwLock or whatever other exclusive-access mechanism I need for the access patterns. From the caller's point of view everything is just a simple function call. When writing the struct I need to be thoughtful about sharing semantics, but it's a very small struct and I write it once and move on.

I also don’t understand their concern about making things Send+Sync. In my experience almost everything is easily Send+Sync, and things that aren’t shouldn’t or couldn’t be.

I get that sometimes you just want to wear sweatpants and write code without thought of the details, but most languages that offer that out of the box don’t really offer efficient concurrency and parallelism. And frankly you rarely actually need those things even if the “but it’s cool” itch is driving you. Most of the time a nodejs-esque single threaded async program is entirely sufficient, and a lot of the time Async isn’t even necessary or particularly useful. But when you need all these things, you probably need to hike up your sweatpants and write some actual computer code - because microseconds matter, profiled throughput is crucial, and nothing in life that’s complex is easy and anyone selling you otherwise is lying.


> Sometimes there is some crucial global state that's easier to just access directly, but then I just write a struct that manages all the Arc/RwLock or whatever other exclusive-access mechanism I need for the access patterns. From the caller's point of view everything is just a simple function call. When writing the struct I need to be thoughtful about sharing semantics, but it's a very small struct and I write it once and move on.

This is a recurring pattern I've started to notice with Rust: most things that repeatedly feel clunky, or noisy, or arduous, can be wrapped in an abstraction that allows your business logic to come back into focus. I've started to think this mentality is essential to any significant Rust project.
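Concretely, a small sketch of that kind of wrapper (the Settings type and its fields are invented for illustration; only std's Arc/RwLock is used); callers see plain methods while the locking stays in one place:

    use std::sync::{Arc, RwLock};

    #[derive(Clone, Default)]
    pub struct Settings {
        inner: Arc<RwLock<SettingsData>>,
    }

    #[derive(Default)]
    struct SettingsData {
        volume: f32,
        username: String,
    }

    impl Settings {
        // Each accessor takes the lock internally; callers never see it.
        pub fn volume(&self) -> f32 {
            self.inner.read().unwrap().volume
        }

        pub fn set_volume(&self, v: f32) {
            self.inner.write().unwrap().volume = v;
        }

        pub fn username(&self) -> String {
            self.inner.read().unwrap().username.clone()
        }
    }

    fn main() {
        let settings = Settings::default();
        let handle = settings.clone(); // cheap clone, shares the same state
        handle.set_volume(0.8);
        assert_eq!(settings.volume(), 0.8);
        println!("user: {:?}", settings.username());
    }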


Yeah it was a bit of a block for me as well, I don’t know where it came from, but I resisted wrapping things. Reality is breaking things up into crates is encouraged anyway, and just abstracting complexity away is Not That Hard, and can usually be pretty small and concise to boot.

I think I’m used to other languages providing a lot of these abstractions or having some framework that manages it all. The frameworks in Rust tend to be pretty low-level (with a few notable exceptions), so perhaps that’s where it comes from.


Well for one- creating abstractions always comes with a tradeoff, so it's good to have some basic skepticism around them. But Rust embraces them, for better and worse. It equips you to write extremely safe and scalable abstractions, but it's also designed in a way that assumes you're going to use those capabilities (mainly, being really low-level and explicit by default), and so you're going to have a harder time if you avoid them

Another thing, for me, was that I came from mostly writing TypeScript, which is the opposite: the base language is breezy without abstractions, and the type system equips you to strongly-type plain data and language features, so you'll have a great time if you stick to those

But yeah, it's been interesting to see how different the answers to these questions can be in different languages!


Rust embraces abstractions because Rust abstractions are zero-cost. So you can liberally create them and use them without paying a runtime cost.

That makes abstractions far more useful and powerful, since you never need to do a cost-benefit analysis in your head, abstractions are just always a good idea in Rust.


"Zero-cost abstractions" can be a confusing term and it is often misunderstood, but it has a precise meaning. Zero-cost abstractions doesn't mean that using them has no runtime cost, just that the abstraction itself causes no additional runtime cost.

These can also be quite narrow: Rc is a zero-cost abstraction for refcounting with both strong and weak references allocated with the object on the heap. You cannot implement the same thing more efficiently, but you can implement something different-but-similar that is both faster and lighter than Rc. You can make a CheapRc that only has strong counts, and that will be both lighter and faster by a tiny amount, or a SeparateRc that stores the counts separately on the heap, which offers cheaper conversions to/from Rc.


I am very aware of the definition of zero-cost.

We're talking about the comparison between using an abstraction vs not using an abstraction.

When I said "doesn't have a runtime cost", I meant "the abstraction doesn't have a runtime cost compared to not using the abstraction".

If you want your computer to do anything useful, then you have to write code, and that code has a runtime cost.

That runtime cost is unavoidable, it is a simple necessity of the computer doing useful work, regardless of whether you use an abstraction or not.

Whenever you create or use an abstraction, you do a cost-benefit analysis in your head: "does this abstraction provide enough value to justify the EXTRA cost of the abstraction?"

But if there is no extra cost, then the abstraction is free, it is truly zero cost, because the code needed to be written no matter what, and the abstraction is the same speed as not using the abstraction. So there is no cost-benefit analysis, because the abstraction is always worth it.


The way you used it in your parent comment didn't make it clear that you were using it properly, hence my clarification. I'm honestly still not sure you've got it right, because Rust abstractions, in general, are not zero-cost. Rust has some zero-cost abstractions in the standard library and Rust has made choices, like monomorphization for generics, that make writing zero-cost abstractions easier and more common in the ecosystem. But there's nothing in the language or compiler that forces all abstractions written in Rust to be free of extra runtime costs.


I never said that ALL abstractions in Rust are zero-cost, though the vast majority of them are, and you actually have to explicitly go out of your way to use non-zero-cost abstractions.


Are you sure about that?

>Rust embraces abstractions because Rust abstractions are zero-cost. So you can liberally create them and use them without paying a runtime cost.

>you never need to do a cost-benefit analysis in your head, abstractions are just always a good idea in Rust

Again though, and ignoring that, "zero-cost abstraction" can be very narrow and context specific, so you really don't need to go out of your way to find "costly" abstractions in Rust. As an example, if you have any uses of Rc that don't use weak references, then Rc is not zero-cost for those uses. This is rarely something to bother about, but rarely is not never, and it's going to be more common the more abstractions you roll yourself.


There's always a complexity cost even when there isn't a runtime cost. It just so happens that in Rust, the benefits tend to outweigh the costs


The whole point of an abstraction is to remove complexity for the user.

So I assume you mean "implementation complexity" but that's irrelevant, because that cost only needs to be paid once, and then you put the abstraction into a crate, and then millions of people can benefit from that abstraction.


You've got a very narrow view that I'd encourage you to be more open-minded about

No abstraction is perfect. Every abstraction, when encountered by a user, requires them to ask "what does this do?", because they don't have the implementation in front of their eyes

This may be an easy question to answer- maybe it maps very obviously to a pattern or domain concept they already know, or maybe they've seen this exact abstraction before and just have to recall it

It may be slightly harder- a new but well-documented concept, or a concept that's intuitive but complex, or a concept that's simple but poorly-named

Or it may be very hard- a badly-designed abstraction, or one that's impossible to understand without understanding the entire system

But the simplest, most elegant, most intuitive abstraction in the world has nonzero cognitive cost. We abstract despite the cost, when that cost is smaller than the cost of not abstracting.


Even the costs you are talking about are a one-time cost to read the documentation and learn the abstraction. And the long-term benefits of the abstraction are far greater than the one-time costs. That's why we create abstractions, because they are a net gain. If they were not a net gain, we would simply not create them.


The whole point of abstraction is to replace the need of understanding all the details of the implementation with a more general and simpler concept. So while the abstraction itself may have a non zero cognitive cost for the end user, this cost should be lower than the cognitive cost of the full implementation that the abstraction hides. Hence the net cognitive cost of proper abstraction is negative.

Abstractions allow systems to scale. Without them, it would be impossible to work on a system that's 1M lines of code long, because you'd have to read and understand all 1M lines before doing anything.


> abstractions are just always a good idea

The "zero-cost" phrase is deceptive. There's a non-zero cognitive cost to the author and all subsequent readers. A proliferation of abstractions increases the cost of every other abstraction further due to complex interactions. This is true of in all languages where the community has embraced the idea of abstraction without moderation.


Well, the intent of an abstraction is it comes at a non zero cost to the author but a substantial benefit to the user/reader. If it’s a cost to everyone why are you doing it at all?

Rust embraces zero to low cost abstraction at the machine performance level, although to get reflective or runtime adaptive abstractions you end up losing some of that zero cost as you need to start boxing and moving things into heaps and using vtables, etc. IMO this is where rust is weakest and most complex.


> There's a non-zero cognitive cost to the author and all subsequent readers.

No, the cognitive cost of a particular abstraction relative to all other abstractions under consideration can be negative.

The option of not using any abstraction doesn’t exist. If you disagree with that then I think we have to go back one step and ask what an abstraction even is.


It also often makes debugging harder.


> async doesn’t imply multithreaded

Async the keyword doesn’t, but Tokio forces all of your async functions to be multi thread safe. And at the moment, tokio is almost exclusively the only async runtime used today. 95% of async libraries only support tokio. So you’re basically forced to write multi thread safe code even if you’d benefit more from a single thread event loop.

Rust async’s setup is horrid, and I wish the community would pivot away to something else like Project Loom.


No, tokio does not require your Futures to be thread-safe.

Every executor (including tokio) provides a `spawn_local` function that spawns Futures on the current thread, so they don't need to be Send:

https://docs.rs/tokio/1.32.0/tokio/task/fn.spawn_local.html
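For example, a minimal sketch (assuming tokio with the "full" feature) of running a !Send future on a single-threaded runtime:

    use std::rc::Rc;
    use tokio::task::LocalSet;

    fn main() {
        // Single-threaded runtime; tasks spawned with spawn_local don't need Send.
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();

        let local = LocalSet::new();
        local.block_on(&rt, async {
            // Rc is !Send, so this future is !Send; spawn_local accepts it anyway.
            let data = Rc::new(41);
            let data2 = data.clone();
            let handle = tokio::task::spawn_local(async move { *data2 + 1 });
            let result = handle.await.unwrap();
            println!("{data} + 1 = {result}");
        });
    }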

I have used Rust async extensively, and it works great. I consider Rust's Future system to be superior to JS Promises.


So you’re stuck choosing a single CPU or having to write send and sync everywhere. There’s a lot of use cases where you would want a thread-per-core model like Glommio to take advantage of multiple cores while still being able to write code like it’s a single thread.

> I have used Rust async extensively, and it works great. I consider Rust's Future system to be superior to JS Promises.

Sure, but it’s a major headache compared to Java VirtualThreads or goroutines


> So you’re stuck choosing a single CPU or having to write send and sync everywhere. There’s a lot of use cases where you would want a thread-per-core model like Glommio to take advantage of multiple cores while still being able to write code like it’s a single thread.

thread_local! exists, and you can just call spawn_local on each thread. You can even call spawn_local multiple times on the same thread if you want.

You can have some parts of your programs be multi-threaded, and then other parts of your program can be single-threaded, and the single-threaded and multi-threaded parts can communicate with an async channel...

Rust gives you an exquisite amount of control over your programs, you are not "stuck" or "locked in", you have the flexibility to structure your code however you want, and do async however you want.

You just have to uphold the basic Rust guarantees (no data races, no memory corruption, no undefined behavior, etc.)

The abstractions in Rust are designed to always uphold those guarantees, so it's very easy to do.


> Rust gives you an exquisite amount of control over your programs

It does.

The problem is that there isn't the documentation, examples, etc. to help navigate the many options.


> So you’re stuck choosing a single CPU or having to write send and sync everywhere. There’s a lot of use cases where you would want a thread-per-core model like Glommio to take advantage of multiple cores while still being able to write code like it’s a single thread.

No, you're not: you spawn a runtime on each thread and use spawn_local on each runtime. This is how actix-web works, and it uses tokio under the hood.

https://docs.rs/actix-rt/latest/actix_rt/
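A rough sketch of that shape (plain tokio here rather than actix-rt, just to show the idea): one single-threaded runtime per OS thread, each driving its own !Send tasks via spawn_local.

    use std::thread;
    use tokio::task::LocalSet;

    fn main() {
        let handles: Vec<_> = (0..4)
            .map(|i| {
                thread::spawn(move || {
                    // One current-thread runtime per OS thread.
                    let rt = tokio::runtime::Builder::new_current_thread()
                        .enable_all()
                        .build()
                        .unwrap();
                    let local = LocalSet::new();
                    local.block_on(&rt, async move {
                        tokio::task::spawn_local(async move {
                            println!("hello from worker thread {i}");
                        })
                        .await
                        .unwrap();
                    });
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
    }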


Yea this is exactly what I do. It makes everything much cleaner.


How is the future system superior? Is this a case of the language's type constraints being better vs. non-existent? Saying something is superior doesn't really add much.

I am genuinely asking because I have little formal background in CS, so "runtimes" and the actual low-level differences between, for instance, async and green threads mystify me. E.g., what makes them actually different from the "runtime" perspective?


Wow, I've been using tokio for years and never knew about this. Thanks!


>but Tokio forces all of your async functions to be multi thread safe

While there are other runtimes that are always single-threaded, you can do it with tokio too. You can use a single-threaded tokio runtime and !Send tasks with LocalSet and spawn_local. There are a few rough edges, and the runtime internally uses atomics where a from-the-ground-up single-threaded runtime wouldn't need them, but it works perfectly fine and I use single-threaded tokio event loops in my programs because the tokio ecosystem is broader.


So with another async runtime it's possible to write async Rust that doesn't need to be thread-safe??? Can you show some example?


You don't even need other runtimes for this. Tokio includes a single-threaded runtime and tools for dealing with tasks that aren't thread safe, like LocalSet and spawn_local, that don't require the future to be Send.


Every executor (including tokio) supports spawning Futures that aren't Send:

https://docs.rs/tokio/1.32.0/tokio/task/fn.spawn_local.html

There is a lot of misinformation in this thread, with people not knowing what they're talking about.


I really like the message passing paradigm. And languages like Erlang have shown that it's an excellent choice... for distributed systems. But writing code like that is a very different experience from, say, async JavaScript, which feels more like writing synchronous code with green threads (except you have to deal with function coloring as well). I believe people will try to write code in a way that is already familiar to them, leading them down the path of Arc and RwLock in Rust.


> But writing code like that is a very different experience from, say, async JavaScript,

I write a fair amount of code in Elixir professionally and this isn't how I view it.

There are some specific Elixir/Erlang bits of ceremony you need to do to set up your supervision tree of GenServers, but once that's done you get to write code that feels like single-threaded "ignore the rest of the world" code. Some of the function calls you're making might be "send a message and wait for a response" from GenServers etc., but the framework takes care of that.

I wrote some driver code for an NXP tag chip. Driving the inventory process is a bit involved: you have to do a series of things (set up the hardware, turn on the radio, wait a bit, send data) while servicing the SPI the whole time in parallel. With the right setup for the hardware interface I just wrote the whole thing as a sequence; it was the simplest possible code you could imagine for it. And this at the same time as running a web server and servicing hardware interrupts that cause it to reload the state of some registers and show them to each connected web session.


Go also uses goroutines and channels to facilitate message passing, or as they describe it, "sharing memory by communicating."

I imagine Rust to be a language far more similar to Go, in both use cases and functionality, than JS.


And in the end, almost everything ends up using Mutex, RWMutex, WaitGroup, Once, and some channels that exist only to ever be closed (like Context.Done), and only if you need to select around them.

It's great, but message passing it is not.


As a quite senior Go developer, I'd like to +1 this a ton. You're far more likely to have shocking edge cases unaccounted for when using channels. I consider every usage very, very carefully. Just like every other language, I think the ultimate solution is to build higher-level abstractions for concurrency patterns (e.g. errgroup) and, now that Go has generics, it's the right time to start building them.

If you haven't seen this paper, I bet you'll find at least one or two new bugs that you didn't know about: https://songlh.github.io/paper/go-study.pdf


The first one is indeed non-obvious, but the remaining snippets presented as bugs would not pass a review unless hidden inside 1k+ LOC PRs. Some are so blatantly obvious (seriously, for loop and not passing current value as variable?) that I'm surprised the authors have listed them as if they're somehow special.


> for loop and not passing current value as variable

In most languages, the current for-loop value is accessed as a per-iteration variable, not a reference. The only languages I know of where that's not the case are Go and Python (JavaScript used to have this problem with for(var ...); it was fixed with for(let ...)). So if you don't regularly write Go, it's easy to make this mistake.


I still like channels because they may be a net reduction in the number of concurrency primitives in use, which complicates quantification in the paper - their taxonomy is great, though. Channels have some sharp corners.


Reducing the number of concurrency primitives does not imply reduction in complexity. On the contrary in fact, I've seen the messes created by golang in large production systems. Here's a good article: https://www.uber.com/blog/data-race-patterns-in-go/


If you choose to use Mutex, that's on you.

Rust gives you channels (both synchronous blocking channels and async channels), and they work great, there is nothing stopping you from using them.
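For the synchronous side, a minimal sketch with std alone (the squaring worker is just an illustration): a worker thread receives work over one channel and sends results back over another, with no Mutex in sight.

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (work_tx, work_rx) = mpsc::channel::<u64>();
        let (result_tx, result_rx) = mpsc::channel::<u64>();

        let worker = thread::spawn(move || {
            // recv() returns Err once all senders are dropped, ending the loop.
            while let Ok(n) = work_rx.recv() {
                result_tx.send(n * n).unwrap();
            }
        });

        for n in 1..=5 {
            work_tx.send(n).unwrap();
        }
        drop(work_tx); // close the channel so the worker exits

        // result_tx is dropped when the worker finishes, ending this loop too.
        for sq in result_rx {
            println!("{sq}");
        }
        worker.join().unwrap();
    }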


I'm pretty sure the gp was talking about Go Mutex, not Rust Mutex.


Ah, my mistake.


Because all languages and developers assume that Erlang is only about message passing. And they completely ignore literally everything else: from immutable structures to the VM providing insane guarantees (like processes not crashing the VM, and monitoring)


> I imagine Rust to be a language far more similar to Go, in both use cases and functionality, than JS.

I mostly agree. But I would wager that for a significant amount of people their first exposure to "async" is JS and not any number of other languages. And when you try to write async Rust the same way as you might write async JS, things just aren't that pretty.


> But then rust also has channels. When you read about it, it talks about "messages", which to me means little objects. Like a few bytes little. This is the solution, pretty much everything I write now is just a few tasks that service some channels. They look at what's arrived and if there's something to output, they will put a message on the appropriate channel for another task to deal with. No sharing objects or anything. If there's a large object that more than one task needs, either you put it in a task that sends messages containing the relevant query result, or you let each task construct its own copy from the stream of messages.

The dream of Smalltalk and true OOP is still alive.


Why does Smalltalk constantly get credit for being true OOP? Simula was doing OOP long before Smalltalk. Most languages choose Simula style OOP, and reject the things that make Smalltalk different.

If you say Smalltalk is better OOP I might agree, but calling it "true" is not correct.


Because the term was coined by Alan Kay, who apparently later said he probably should have called it message oriented (paraphrasing).

There's also a written conversation you can find online where he disqualifies pretty much all of the mainstream languages of being OO.

A lot of people, like you, say that OO == ADTs, or rather, whatever Simula, C++ and Java are doing. Some will say that inheritance is an integral part of it; others say it's all about interfaces.

But then there's people who say that Scheme and JavaScript are more object oriented than Java and C#. Or that when we're using channels or actors we're now _really_ doing OOP.

There's people who talk about patterns, SOLID, clean code and all sorts of things that you should be adhering to when structuring OO code.

Then there's people who say that OO is all about the mental model of the user and their ability to understand your program in terms of operational semantics. They should be able to understand it to a degree that they can manipulate and extend it themselves.

It's all very confusing.


> Because the term was coined by Alan Kay

This is pretty unlikely. See https://news.ycombinator.com/item?id=36879311.


> The term "object-oriented" was applied to a programming language for the first time in the MIT CSG Memo 137 (April 1976)

That's publications though. Alan Kay says he used it in conversation in 1967: http://userpage.fu-berlin.de/~ram/pub/pub_jf47ht81Ht/doc_kay...

There's probably also a distinction to be made between "object-oriented" and "object-oriented programming".


The referenced research also considers the publications by Kay and his team (including his theses and the Smalltalk-72 and 76 manuals) and other uses of the term. I think Kay mixes things up in retrospective; in his 1968 thesis he used the terms "object language" and "object machine", but not "object-oriented"; imagine giving your new breakthrough method a name, but then not using that name anywhere in the publication; that seems unthinkable, especially with an accomplished communicator like Kay. The first time "object-oriented" appears in a publication of his or his team is in 1978.


> Most languages choose Simula style OOP

Right, including Smalltalk 76 and 80 onwards themselves. Remember Kay's statement "actually I made up the term object-oriented and I can tell you I did not have C++ in mind, so the important thing here is I have many of the same feelings about Smalltalk" (https://www.youtube.com/watch?v=oKg1hTOQXoY&t=636s); the reason he refers to Smalltalk this way in his 1997 talk was likely the fact that Smalltalk-80 has more in common with Simula 67 than his brain child Smalltalk-72. Ingalls explicitly refers to Simula 67 in his 2020 HOPL paper.

> and reject the things that make Smalltalk different

Which would mostly be its dynamic nature (Smalltalk-76 can be called the first dynamic OO language) and the use of runtime constructs instead of dedicated syntax for conditions and loops (as is the case in e.g. Lisp). There are a lot of dynamic OO languages still in use today, e.g. Python. Also Smalltalk-80 descendants are still in use, e.g. Pharo.


Alan Kay is generally credited with coming up with the term "object-oriented", so for better or for worse, many people defer to his definition and his embodiment of ideas when looking for a strict definition of the term.


> many people defer to his definition and his embodiment of ideas when looking for a strict definition of the term.

I consider the definition e.g. used by IEEE as sufficiently strict, see e.g. https://ethw.org/Milestones:Object-Oriented_Programming,_196..., but - as you say - it's not the defintion used by Kay.


Honestly, though, that’s like crediting William Burroughs with Blade Runner.


Erlang often crops up in these conversations.


OOP was a shit idea that needed to die.


There's a difference between Smalltalk's concept of OOP and Java's. I'm not talking about Java's.


I remember picking up this sort of advice from a professor way back in college. It's a godsend. Structure the problem as data flowing between tasks and connect them up with queues, avoid sharing state. It's just a better way to deal with multithreading no matter what language you use.


There is a time and place for sharing state and data. However, it is extremely complex to make that work, so if at all possible, don't. In general the only time I can't use queues is when I'm writing the queue implementation (I've done this several times - it turns out there are a number of different special cases in my embedded system where it was worth it to avoid some obscure downside to the queues I already had).

When you need the absolute best performance sharing state is sometimes better - but you need a deep understanding of how your CPUs share state. A mutex or atomic write operation is almost always needed (the exceptions are really weird), and those will kill performance so you better spend a lot of time minimizing where you have them.


I like this too.

I would also suggest looking into ringbuffers and LMAX Disruptor pattern.

There is also Red Planet Labs' Rama, which takes the data-flow idea and uses it to scale.


Async is not a solution for data parallelism.


> But then rust also has channels. When you read about it, it talks about "messages", which to me means little objects. Like a few bytes little.

As a wise programmer once said, "Do not communicate by sharing memory; instead, share memory by communicating"


Ooh, that's very zen. Ah crap, I think I have a race condition.


Hoare Was Right.

(But if you're only firing up a few tasks, why not just use threads? To get a nice wrapper around an I/O event loop?)


Exactly. People are too afraid of using threads these days for some perceived cargo-cult scalability reasons. My rule of thumb is just to use threads if the total number of threads per process won't exceed 1000.

(This is assuming you are already switching to communicating using channels or similar abstraction.)


The performance overhead of threads is largely unrelated to how many you have. The thing being minimized with async code is the rate at which you switch between them, because those context switches are expensive. On modern systems there are many common cases where the CPU time required to do the work between a pair of potentially blocking calls is much less than the CPU time required to yield when a blocking call occurs. Consequently, most of your CPU time is spent yielding to another thread. In good async designs, almost no CPU time is spent yielding. Channels will help batch up communication but you still have to context switch to read those channels. This is where thread-per-core software architectures came from; they use channels but they never context switch.

Any software that does a lot of fine-grained concurrent I/O has this issue. Database engines have been fighting this for many years, since they can pervasively block both on I/O and locking for data model concurrency control.


The cost of context switching in "async" code is very rarely smaller than the cost of switching OS threads. (The exception is when you're using a GC language with some sort of global lock.)

"Async" in native code is cargo cult, unless you're trying to run on bare metal without OS support.


The cost of switching goroutines, Rust Futures, Zig async Frames, or fibers/userspace tasks in general is on the order of a few nanoseconds, whereas it's in the microsecond range for OS threads. This allows you to spawn tons of tasks and have them all communicate with each other very quickly (write to queue; push receiver to scheduler runqueue; switch out sender; switch to receiver), whereas doing so with OS threads would never scale (write to queue; syscall to wake receiver; syscall to switch out sender). Any highly concurrent application (think games, simulations, net services) uses userspace/custom task scheduling for similar reasons.


Node.js is inherently asynchronous, and during its peak years JavaScript developers bragged about how it was faster than Java for webservers despite only using one core, because a classic JEE servlet container launches a new thread per request. Even if you don't count this as a "context switch" and go for a thread pool, you are deluding yourself, because a thread pool is applying the principles of async with the caveat that tasks you send to the thread pool are not allowed to create tasks of their own.

There is a reason why so many developers have chosen to do application level scheduling: No operating system has exposed viable async primitives to build this on the OS level. OS threads suck so everyone reinvents the wheel. See Java's "virtual threads", Go's goroutines, Erlang's processes, NodeJS async.

You don't seem to be aware of what a context switch at the application level is. It is often as simple as a function call. There is no way that returning to the OS and running a generic scheduler (one that has to handle any possible application workload, store all the registers, possibly flush the TLB if the OS makes the mistake of executing a different process first, and then restore all the registers) can be faster than simply calling the next function in the same address space.

Developers of these systems brag about how you can have millions of tasks active at the same time without breaking any sweat.


The challenge is that async colors functions and many of the popular crates will force you to be async, so it isn't always a choice depending on which crates you need.


Please excuse my ignorance, I haven't done a ton of async Rust programming - but if you're trying to call async Rust from sync Rust, can you not just create a task, have that task push a value through a mpsc channel, shove the task on the executor, and wait for the value to be returned? Is the concern that control over the execution of the task is too coarse grained?


Yes, you can do that. You can use `block_on` to convert an async Future into a synchronous blocking call. So it is entirely possible to convert from the async world back into the sync world.
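A minimal sketch of that bridge, using futures::executor::block_on (a runtime's own block_on, e.g. tokio's, works the same way):

    use futures::executor::block_on;

    async fn fetch_value() -> u32 {
        42
    }

    // A plain synchronous function that drives the async code to completion.
    fn sync_caller() -> u32 {
        // Blocks the current thread until the future completes.
        block_on(fetch_value())
    }

    fn main() {
        println!("{}", sync_caller());
    }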


But you have to pull in an async runtime to do it. So library authors either have to force everyone to pull in an async runtime or write two versions of their code (sync and async).


There are ways to call both from both for sure, but my point is if you don't want any async in your code at all...that often isn't a choice if you want to use the popular web frameworks for example.


I can't both perform blocking I/O and wait for a cancellation signal from another thread. So I need to use poll(), and async is a nice interface to that.


99% of the use cases that ought to use async are server-side web services. If you're not writing one of those, you almost certainly don't need async.


Or desktop programs. Many GUI frameworks have a main thread that updates the layout (among other things) and various background ones.


Async and GUI threads are different concepts. Of course most GUIs have an event loop which can be used as a form of async, but with async you do your calculations in the main thread, while with GUIs you typically spin your calculations off to a different thread.

Most often when doing async you have a small number of tasks repeated many times, then you spin up one thread per CPU, and "randomly" assign each task as it comes in to a thread.

When doing GUI style programming you have a lot of different tasks and each task is done in exactly one thread.


Hmm I would say the concepts are intertwined. Lots of GUI frameworks use async/await and the GUI thread is just another concurrency pattern that adds lock free thread exclusivity to async tasks that are pinned to a single thread.


Async for GUIs is also nice. Not essential, but it allows you to simplify a lot of callback code.


Note that if you "just" write responses to queries without yielding execution, you don't need async; you just write sync handlers for an async framework. (Hitting DB requests synchronously is not good for your perf though; you'd better have a mostly-read, well-cached problem.)


A particularly interesting use case for async Rust without threads is cooperative scheduling on microcontrollers[1]; this article also does a really good job of explaining some of the complications referenced in TFA.

[1]: https://news.ycombinator.com/item?id=36790238


Waiting asynchronously on multiple channels/signals. Heterogeneous select is really nice.
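For example, a minimal sketch with tokio (the "full" feature assumed) selecting over a channel and a timeout, whichever fires first:

    use std::time::Duration;
    use tokio::sync::mpsc;
    use tokio::time::sleep;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel::<String>(8);

        tokio::spawn(async move {
            sleep(Duration::from_millis(50)).await;
            let _ = tx.send("work item".to_string()).await;
        });

        // Wait on whichever completes first: a message or a timeout.
        tokio::select! {
            maybe_msg = rx.recv() => {
                println!("got: {maybe_msg:?}");
            }
            _ = sleep(Duration::from_secs(1)) => {
                println!("timed out");
            }
        }
    }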


It really is, but I still favour "unsexy" manual poll/select code with a lot of if/elseing if it means not having to deal with async.

I fully acknowledge that I'm an "old school" system dev who's coming from the C world and not the JS world, so I probably have a certain bias because of that, but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.

I tried to get it, I really did, but my god what a massive mess that is. And it contaminates everything it touches, too. I really love Rust and I do most of my coding in it these days, but every time I encounter async-heavy Rust code my jaw clenches and my vision blurs.

At least my clunky select "runtime" code can be safely contained in a couple functions while the rest of the code remains blissfully unaware of the magic going on under the hood.

Dear people coming from the JS world: give system threads and channels a try. I swear that a lot of the time it's vastly simpler and more elegant. There are very, very few practical problems where async is clearly superior (although plenty where it's arguably superior).


> but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.

Rust adopted the stackless coroutine model for async tasks based on its constraints, such as having a minimal runtime by default, not requiring heap allocations left and right, and being amenable to aggressive optimizations such as inlining. The function coloring problem ("contamination") is an unfortunate consequence. The Rust devs are currently working on an effects system to fix this. Missing features such as standard async traits, async functions in traits, and executor-agnosticism are also valid complaints. Considering Rust's strict backwards compatibility guarantee, some of these will take a long time.

I like to think of Rust's "async story" as a good analogue to Rust's "story" in general. The Rust devs work hard to deliver backwards compatible, efficient, performant features at the cost of programmer comfort (ballooning complexity, edge cases that don't compile, etc.) and compile time, mainly. Of course, they try to resolve the regressions too, but there's only so much that can be done after the fact. Those are just the tradeoffs the Rust language embodies, and at this point I don't expect anything more or less. I like Rust too, but there are many reasons others may not. The still-developing ecosystem is a prominent one.


I read comments like this and feel like I’m living in some weird parallel universe. The vast majority of Rust I write day in and day out for my job is in an async context. It has some rough edges, but it’s not particularly painful and is often pleasant enough. Certainly better than promises in JS. I have also used system threads, channels, etc., and indeed there are some places where we communicate between long running async tasks with channels, which is nice, and some very simple CLI apps and stuff where we just use system threads rather than pulling in tokio and all that.

Anyway, while I have some issues with async around future composition and closures, I see people with the kind of super strong reaction here and just feel like I must not be seeing something. To me, it solves the job well, is comprehensible and relatively easy to work with, and remains performant at scale without too much fiddling.


Honestly, this is me too. The only thing I’d like to also see is OTP-like supervisors and Trio-like nurseries. They each have their use and they’re totally user land concerns.


> It really is, but I still favour "unsexy" manual poll/select code with a lot of if/elseing if it means not having to deal with async.

> I fully acknowledge that I'm an "old school" system dev who's coming from the C world and not the JS world, so I probably have a certain bias because of that, but I genuinely can't understand how anybody could look at the mess that's Rust's async and think that it was a good design for a language that already had the reputation of being very complicated to write.

I'm in the same "old school" system dev category as you, and I think that modern languages have gone off the deep end, and I complained about async specifically in a recent comment on HN: https://news.ycombinator.com/item?id=37342711

> At least my clunky select "runtime" code can be safely contained in a couple functions while the rest of the code remains blissfully unaware of the magic going on under the hood.

And we could have had that for async as well, if languages were designed by the in-the-trenches industry developer, and not the "I think Haskell and Ocaml is great readability" academic crowd.

With async in particular, the most common implementation is to color the functions by qualifying the specific function as async, which IMO is exactly the wrong way to do it.

The correct way would be for the caller to mark a specific call as async.

IOW, which of the following is clearer to the reader at the point where `foo` is called?

Option 1: color the function

      async function foo () {
         // ...
      }
      ...
      let promise = foo ();
      let bar = await promise;

Option 2: schedule any function

      function foo () {
         // ...
      }

      let sched_id = schedule foo ();

      ...

      let bar = await sched_id;

Option 1 results in compilation errors for code in the call-stack that isn't async, results in needing two different functions (a wrapper for sync execution), and means that async only works for that specific function. Option 2 is more like how humans think - schedule this for later execution, when I'm done with my current job I'll wait for you if you haven't finished.
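For what it's worth, a rough analogue of "option 2" is already expressible in today's Rust under some assumptions (tokio's blocking pool here): the function itself stays plain and synchronous, and the caller decides to schedule it and await the handle later.

    fn foo() -> u64 {
        // Ordinary synchronous work; no async coloring on the function itself.
        40 + 2
    }

    #[tokio::main]
    async fn main() {
        // The caller schedules the plain function and keeps a handle.
        let handle = tokio::task::spawn_blocking(foo);

        // ... do other work here ...

        // Wait for the result only when it's actually needed.
        let bar = handle.await.unwrap();
        println!("{bar}");
    }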


Isn't mixing async and sync code like this a recipe for deadlocks?

What if your example code is holding onto a thread that foo() is waiting to use?

Said another way, explain how you solved the problems of just synchronously waiting for async. If that just worked then we wouldn't need to proliferate the async/await through the stack.


> Said another way, explain how you solved the problems of just synchronously waiting for async.

Why? It isn't solved for async functions, is it? Just because the async is propagated up the call-stack doesn't mean that the call can't deadlock, does it?

Deadlocks aren't solved for a purely synchronous callstack either - A grabbing a resource, then calling B which calls C which calls A ...

Deadlocks are potentially there whether or not you mix sync/async. All that colored functions will get you is the ability to ignore the deadlock because that entire call-stack is stuck.

> If that just worked then we wouldn't need to proliferate the async/await through the stack.

It's why I called it a leaky abstraction.


Yes actually it is solved. If you stick to async then it cannot deadlock (in this way) because you yield execution to await.


> Yes actually it is solved. If you stick to async then it cannot deadlock (in this way) because you yield execution to await.

Maybe I'm misunderstanding what you are saying. I use the word "_implementation_type_" below to mean "either implemented as option 1 or option 2 from my post above."

With current asynchronous implementations (like JS, Rust, etc), any time you use `await` or similar, that statement may never return due to a deadlock in the callstack (A is awaiting B which is awaiting C which is awaiting A).

And if you never `await`, then deadlocking is irrelevant to the _implementation_type_ anyway.

So I am trying to understand what you mean by "it cannot deadlock in this way" - in what way do you mean? async functions can accidentally await on each other without knowing it, which is the deadlock I am talking about.

I think I might understand better if you gave me an example call-chain that, in option 1, sidesteps the deadlock, and in option 2, deadlocks.


I'm referring to the situation where a synchronous wait consumes the thread pool, preventing any further work.

A is synchronously waiting on B, which is awaiting C, which could complete but never gets scheduled because A is holding onto the only thread. It's a very common situation when you mix sync and async and you're working in a single-threaded context, like UI programming with async. Of course it can also cause starvation and deadlock in a multithreaded context as well, but the single thread makes the pitfall obvious.
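
A minimal sketch of the pitfall (assuming tokio's current-thread runtime plus the futures crate; purely illustrative):

    use tokio::runtime::Builder;
    use tokio::sync::oneshot;

    fn main() {
        let rt = Builder::new_current_thread().enable_all().build().unwrap();
        rt.block_on(async {
            let (tx, rx) = oneshot::channel::<u32>();
            tokio::spawn(async move {
                let _ = tx.send(42); // this task *could* complete...
            });
            // ...but a synchronous wait here never yields the only thread,
            // so the spawned task never runs and we block forever.
            let value = futures::executor::block_on(rx);
            println!("{value:?}");
        });
    }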


That's an implementation problem, not a problem with the concept of asynchronous execution, and it's specifically a problem in only one popular implementation: Javascript in the browser without web-workers.

That's specifically why I called it a Leaky Abstraction in my first post on this: too many people are confusing a particular implementation of asynchronous function calls with the concept of asynchronous function calls.

I'm complaining about how the mainstream languages have implemented async function calls, and how poorly they have done so. Pointing out problems with their implementation doesn't make me rethink my position.


I don't see how it can be an implementation detail when fundamentally you must yield execution when the programmer has asked to retain execution.

Besides Javascript, it's also a common problem in C# when you force synchronous execution of an async Task. I'm fairly sure it's a problem in any language that would allow an async call to wait for a thread that could be waiting for it.

I really can't imagine how your proposed syntax could work unless the synchronous calls could be pre-empted, in which case, why even have async/await at all?

But I look forward to your implementation.


> I don't see how it can be an implementation detail when fundamentally you must yield execution when the programmer has asked to retain execution.

It's an implementation issue, because "running on only a single thread" is an artificial constraint imposed by the implementation. There is nothing in the concept of async functions, coroutines, etc that has the constraint "must run on the same thread as the sync waiting call".

An "abstraction" isn't really one when it requires knowledge of a particular implementation. Async in JS, Rust, C#, etc all require that the programmer knows how many threads are running at a given time (namely, you need to know that there is only one thread).

> But I look forward to your implementation.

Thank you :-)[1]. I actually am working (when I get the time, here and there) on a language for grug-brained developers like myself.

One implementation of "async without colored functions" I am considering is simply executing all async calls for a particular host thread on a separate dedicated thread that only ever schedules async functions for that host thread. This sidesteps your issue and makes colored functions pointless.

This is one possible way to sidestep the specific example deadlock you brought up. There's probably more.

[1] I'm working on a charitable interpretation of your words, i.e. you really would look forward to an implementation that sidesteps the issues I am whining about.
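
If it helps, here's a rough Rust sketch of the shape of that dedicated scheduler thread idea (just an illustration of the shape, not the actual design):

    use std::sync::mpsc;
    use std::thread;

    type Job = Box<dyn FnOnce() + Send>;

    struct Scheduler {
        tx: mpsc::Sender<Job>,
    }

    impl Scheduler {
        fn new() -> Self {
            let (tx, rx) = mpsc::channel::<Job>();
            thread::spawn(move || {
                for job in rx {
                    job(); // run scheduled calls off the host thread
                }
            });
            Scheduler { tx }
        }

        // "schedule foo()" -- returns a handle the caller can wait on later.
        fn schedule<T: Send + 'static>(
            &self,
            f: impl FnOnce() -> T + Send + 'static,
        ) -> mpsc::Receiver<T> {
            let (done_tx, done_rx) = mpsc::channel();
            let job: Job = Box::new(move || {
                let _ = done_tx.send(f());
            });
            self.tx.send(job).unwrap();
            done_rx
        }
    }

    fn main() {
        let sched = Scheduler::new();
        let id = sched.schedule(|| 2 + 2); // any plain function, no coloring
        // ... do other work ...
        println!("{}", id.recv().unwrap()); // "await sched_id"
    }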


That sounds interesting indeed.

I think the major disconnect is that I'm mostly familiar with UI and game programming. In these async discussions I see a lot of disregard for the use cases that async C# and JavaScript were built around. These languages have complex thread contexts so it's possible to run continuations on a UI thread or a specific native thread with a bound GL context that can communicate with the GPU.

I suppose supporting this use case is an implementation detail but I would suggest you dig into the challenge. I feel like this is a major friction point with using Go more widely, for example.


> and not the "I think Haskell and Ocaml is great readability" academic crowd.

Actually, Rust could still learn a lot from these languages. In Haskell, one declares the call site as async, rather than the function. OCaml 5 effect handlers would be an especially good fit for Rust and solve the "colouration" problem.


That’s how Haskell async works. You mark the call as async, not the function itself.


I think Rust’s async stuff is a little half baked now but I have hope that it will be improved as time goes on.

In the mean time it is a little annoying to use, but I don’t mind designing against it by default. I feel less architecturally constrained if more syntactically constrained.


I'm curious what things you consider to be half-baked about Rust async.

I've used Rust async extensively for years, and I consider it to be the cleanest and most well designed async system out of any language (and yes, I have used many languages besides Rust).


Async traits come to mind immediately, generally needing more capability to existentially quantify Future types without penalty. Async function types are a mess to write out. More control over heap allocations in async/await futures (we currently have to Box/Pin more often than necessary). Async drop. Better cancellation. Async iteration.


> Async traits come to mind immediately,

I agree that being able to use `async` inside of traits would be very useful, and hopefully we will get it soon.

> generally needing more capability to existentially quantify Future types without penalty

Could you clarify what you mean by that? Both `impl Future` and `dyn Future` exist, do they not work for your use case?

> Async function types are a mess to write out.

Are you talking about this?

    fn foo() -> impl Future<Output = u32>
Or this?

    async fn foo() -> u32

> More control over heap allocations in async/await futures (we currently have to Box/Pin more often than necessary).

I'm curious about your code that needs to extensively Box. In my experience Boxing is normally just done 1 time when spawning the Future.

> Async drop.

That would be useful, but I wouldn't call the lack of it "half-baked", since no other mainstream language has it either. It's just a nice-to-have.

> Better cancellation.

What do you mean by that? All Futures/Streams/etc. support cancellation out of the box, it's just automatic with all Futures/Streams.

If you want really explicit control you can use something like `abortable`, which gives you an AbortHandle, and then you can call `handle.abort()`

Rust has some of the best cancellation support out of any async language I've used.
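
For example, a minimal sketch with `abortable` from the futures crate:

    use futures::executor::block_on;
    use futures::future::{abortable, Aborted};

    fn main() {
        let (fut, handle) = abortable(async { 42u32 });
        handle.abort(); // cancel before the future is ever polled
        let result: Result<u32, Aborted> = block_on(fut);
        assert!(result.is_err()); // the future was aborted, not run
    }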

> Async iteration.

Nicer syntax for Streams would be cool, but the combinators do a good job already, and StreamExt already has a similar API as Iterator.


Re: existential quantification and async function types

It'd be very nice to be able to use `impl` in more locations, representing a type which needs not be known to the user but is constant. This is a common occurrence and may let us write code like `fn foo(f: impl Fn() -> impl Future)` or maybe even eventually syntax sugar like `fn foo(f: impl async Fn())` which would be ideal.

Re: Boxing

I find that a common technique needed to make abstractions around futures work is to Box::pin things regularly. This isn't always an issue, but it's frequent enough that it's annoying. Moreover, it's not strictly necessary given knowledge of the future type; it's again more a matter of Rust's minimal existential types.

Re: async drop and cancellation.

It's not always possible to have good guarantees about the cleanup of resources in async contexts. You can use abort, but that will just cause the next yield point to not return and then the Drops to run. So now you're reliant on Drops working. I usually build in a "kind" shutdown with a timer before aborting in light of this.

C# has a version of this with their CancellationTokens. They're possible to get wrong and it's easy to fail to cancel promptly, but by convention it's also easy to pass a cancellation request and let tasks do resource cleanup before dying.
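
A rough sketch of that kind-shutdown-then-abort pattern (assuming tokio; the names and timings are illustrative):

    use std::time::Duration;
    use tokio::sync::oneshot;
    use tokio::time::{sleep, timeout};

    #[tokio::main]
    async fn main() {
        let (shutdown_tx, mut shutdown_rx) = oneshot::channel::<()>();

        let mut task = tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = &mut shutdown_rx => {
                        // cooperative cleanup goes here
                        break;
                    }
                    _ = sleep(Duration::from_millis(100)) => {
                        // ... normal work ...
                    }
                }
            }
        });

        let _ = shutdown_tx.send(()); // ask nicely first...
        if timeout(Duration::from_secs(5), &mut task).await.is_err() {
            task.abort(); // ...then hard-abort if it doesn't wind down in time
        }
    }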

Re: Async iteration

Nicer syntax is definitely the thing. Futures without async/await also could just be done with combinators, but at the same time it wasn't popular or easy until the syntax was in place. I think there's a lot of leverage in getting good syntax and exploring the space of streams more fully.


> That would be useful, but I wouldn't call the lack of it "half-baked", since no other mainstream language has it either. It's just a nice-to-have.

Golang supports running asynchronous code in defers, as did Zig when it still had async.

Async-drop gets upgraded from a nice-to-have into an efficiency concern as the current scheme of "finish your cancellation in Drop" doesn't support borrowed memory in completion-based APIs like Windows IOCP, Linux io_uring, etc. You have to resort to managed/owned memory to make it work in safe Rust which adds unnecessary inefficiency. The other alternatives are blocking in Drop or some language feature to statically guarantee a Future isn't cancelled once started/initially polled.


> Golang supports running asynchronous code in defers, similar with Zig when it still had async.

So does Rust. You can run async code inside `drop`.


To run async in Drop in rust, you need to use block_on() as you can't natively await (unlike in Go). This is the "blocking on Drop" mentioned and can result in deadlocks if the async logic is waiting on the runtime to advance, but the block_on() is preventing the runtime thread from advancing. Something like `async fn drop(&mut self)` is one way to avoid this if Rust supported it.


You need to `block_on` only if you need to block on async code. But you don't need to block in order to run async code. You can spawn async code without blocking just fine and there is no risk of deadlocks.


1) That's no longer "running async code in Drop" as it's spawned/detached and semantically/can run outside the Drop. This distinction is important for something like `select` which assumes all cancellation finishes in Drop.

2) This doesn't address the efficiency concern of using borrowed memory in the Future. You have to either reference count or own the memory used by the Future for the "spawn in Drop" scheme to work for cleanup.

3) Even if you define an explicit/custom async destructor, Rust doesn't have a way to syntactically defer its execution like Go and Zig do, so you'd end up having to call it on all exit points, which is error prone like C (would result in a panic instead of a leak/UB, but that can be equally undesirable).

4) Is there anywhere one can read up on the work being done for async Drop in Rust? Was only able to find this official link but it seems to still have some unanswered questions (https://rust-lang.github.io/async-fundamentals-initiative/ro...)


Now you lose determinism in tear down though.


Ok, in a very, very rare case (so far never happened to me) when I really need to await an async operation in the destructor, I just define an additional async destructor, call it explicitly and await it. Maybe it's not very elegant but gets the job done and is quite readable as well.

And this would be basically what you have to do in Go anyways - you need to explicitly use defer if you want code to run on destruction, with the caveat that in Go nothing stops you from forgetting to call it, when in Rust I can at least have a deterministic guard that would panic if I forget to call the explicit destructor before the object getting out of scope.

BTW async drop is being worked on in Rust, so in the future this minor annoyance will be gone
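
A minimal sketch of the explicit-async-destructor-plus-guard pattern I mean (names purely illustrative):

    struct Connection {
        closed: bool,
    }

    impl Connection {
        fn new() -> Self {
            Connection { closed: false }
        }

        // Explicit async destructor: the owner calls and awaits this.
        async fn close(mut self) {
            // ... flush buffers, say goodbye over the network, etc. ...
            self.closed = true;
        }
    }

    impl Drop for Connection {
        fn drop(&mut self) {
            // Deterministic guard: loudly flag a forgotten close()
            // instead of silently blocking inside Drop.
            debug_assert!(self.closed, "Connection dropped without close().await");
        }
    }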


Yes I am aware of async drop proposals. And the point is not to handle a single value being dropped but to facilitate invariants during an abrupt tear down. Today, when I am writing a task which needs tear down I need to hand it a way to signal a “nice” shutdown, wait some time, and then hard abort it.


Actually, this "old school" approach is more readable even for folks who have never worked in the low-level C world. At least everything is in front of your eyes and you can follow the logic. Unless code leveraging async is very well-structured, it requires too much brain-power to process and understand.


It's great! But there's nothing about it that requires futures.

It really annoys me that something like this isn't built-in: https://github.com/mrkline/channel-drain


That works for channels, but being able to wait on other asynchronous things is better. Timeouts, for instance.

We could imagine extending this to arbitrary poll-able things. And now we have futures, kind of.


> (But if you're only firing up a few tasks, why not just use threads? To get a nice wrapper around an I/O event loop?)

To get easier timers, to make cancellation at all possible (how to cancel a sync I/O operation?), and to write composable code.

There are patterns that become simpler in async code and much more complicated in sync code.
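
For example, putting a timeout around a connect is a one-liner in async code (a sketch assuming tokio):

    use std::time::Duration;
    use tokio::net::TcpStream;
    use tokio::time::timeout;

    #[tokio::main]
    async fn main() {
        match timeout(Duration::from_secs(2), TcpStream::connect("example.com:80")).await {
            Ok(Ok(_stream)) => println!("connected"),
            Ok(Err(e)) => println!("connect error: {e}"),
            Err(_) => println!("timed out"), // the pending connect is simply dropped
        }
    }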


You cancel a sync IO op similar to how you cancel an async one: have another task (i.e. an OS thread in this case) issue the cancellation. Select semantically spawns a task per case/variant and does something similar under the hood if cancellation is implemented.


You can do that, but then the logic of your cancellable thread gets intermingled with the cancellation logic.

And since the cancellation logic runs on the cancellable thread, you can't really cancel a blocking operation. What you can do is to let it run to completion, check that it was canceled, and discard the value.


Not sure I follow; the cancellation logic is on both threads/tasks: 1) the operation itself waiting for either the result or a cancel notification, and 2) the cancellation thread sending that notification.

The cancellation thread is generally the one doing the `select` so it spawns the operation thread(s) and waits for (one of) their results (i.e. through a channel/event). The others which lose the race are sent the cancellation signal and optionally joined if they need to be (i.e. they use intrusive memory).


He didn't say queues though. CSP isn't processes streaming data to each other through buffered channels, it's one process synchronously passing one message to another. Whichever one gets to the communication point first waits for the other.


It is both.

Hoare's later paper introduced buffered channels to CSP.

So one can use it as synchronous passing, or queued passing.


The author does mention that you should probably stop at using Threads and passing data around via channels... but then mentions the C10K problem and says that sometimes you need more... but does not answer the question that I think is begging to be asked: does using Rust async, with all the complications (Arc, cloning, Mutex, whatever), actually outperform Threads/channels? Even if it does, by how much? It would be really interesting to know the answer. I have a feeling that Threads/channels may be more performant in practice, despite the imagined overhead.


There's not a good distributed concurrent benchmark in the Techempower Web Framework benchmarks, because the Multiple Queries and Fortunes test programs don't use any parallelism or concurrency primitives to win at fast SQL queries. https://www.techempower.com/benchmarks/#section=data-r21&tes...

From https://news.ycombinator.com/item?id=37289579 :

> I haven't checked, but by the end of the day, I doubt eBPF is much slower than select() on a pipe()?

Channels have a per-platform implementation.

- "Patterns of Distributed Systems (2022)" (2023) https://news.ycombinator.com/item?id=36504073


Threads cannot scale at all, because you're limited to the number of threads (which is usually quite small).

Async code can scale essentially infinitely, because it can multiplex thousands of Futures onto a single thread. And you can have millions of Futures multiplexed onto a dozen threads.

This makes async ideal for situations where your program needs to handle a lot of simultaneous I/O operations... such as a web server:

http://aturon.github.io/blog/2016/08/11/futures/

Async wasn't invented for the fun of it, it was invented to solve practical real world problems.


Threads, at least on Linux, are much more lightweight than you seem to think. Async Rust can scale better, of course, but you're overexaggerating your case.


If your system cannot be decomposed away from shared mutable state, then you cannot avoid lifetime management and synchronization primitives.

Ultimately, it depends on your data model.


Channels with message passing have been around as a solid way of doing async and multithreading since forever. These systems are called actor-based systems. Erlang is a good example which uses it at its core. Then on the JVM there is Akka. Axon is another.


> If there's a large object that more than one task needs, either you put it in a task that sends messages containing the relevant query result, or you let each task construct its own copy from the stream of messages.

When you can guarantee sole ownership, why not put that exclusive pointer in the message? I’d think that this sort of compile-time lock would be an important advantage for the type system. (I think some VMs actually do this sort of thing dynamically, but I can’t quite remember where I read about it.)

On a multiprocessor, there’s of course a balance to be determined between the overhead of shuffling the object’s data back and forth between CPUs and the overhead of serializing and shuffling the queries and responses to the object’s owning thread. But I don’t think the latter approach always wins, does it? At least I can’t tell why it obviously should.


That was the argument for Go. But Go is not used that way. People still share and lock stuff in Go. Go is only safe for race conditions that break the memory model, not all race conditions, as Rust is.


Rust protects against data races, not race conditions.

https://doc.rust-lang.org/nomicon/races.html


Go doesn’t protect against even those, which is what the parent meant


Go cannot catch data races at compile time (like Rust) but can catch a subset of data races at run time with the race detector. Go provides imperfect, opt-in protection.


Technically, you can break Go's memory model via race conditions: Write to an interface on one thread while reading it from another and you may read the old vtable pointer but new data pointer / the other way around. Same goes for slices with data/length/capacity.


You don't need Rust for that. You can even do it in JavaScript.


how do you do bidirectional channels/rpc?

like “send request to channel A with message 123, make sure to get a response back from channel B exactly for that message”
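
is the usual answer to embed a reply channel in each request? something like this (a sketch with std channels, guessing at the pattern):

    use std::sync::mpsc;
    use std::thread;

    struct Request {
        payload: u32,
        reply: mpsc::Sender<u32>, // per-request response channel
    }

    fn main() {
        let (req_tx, req_rx) = mpsc::channel::<Request>();

        thread::spawn(move || {
            for req in req_rx {
                // respond to exactly this request, nobody else's
                let _ = req.reply.send(req.payload * 2);
            }
        });

        let (reply_tx, reply_rx) = mpsc::channel();
        req_tx.send(Request { payload: 123, reply: reply_tx }).unwrap();
        assert_eq!(reply_rx.recv().unwrap(), 246);
    }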


this is what Go got right like 10 years ago


In the sense that green threads are easier, sure.

But green threads were not and are not the right solution for Rust, so it's kind of beside the point. Async Rust is difficult, but it will eventually be possible to use Async Rust inside the Linux kernel, which is something you can't do with the Go approach.


Rust Futures are essentially green threads, except much lighter-weight, much faster, and implemented in user space instead of being built-in to the language.

Basically Rust Futures is what Go wishes it could have. Rust made the right choice in waiting and spending the time to design async right.


You're overstating your case. Rust's async tasks (based on stackless coroutines) and Go's goroutines (based on stackful coroutines) have important differences. Rust's design introduces function coloring (tentative solution in progress) but is much more suited for the bare-metal scene that C and C++ are famous for. Go's design has more overhead but, by virtue of not having colored functions, is simpler for programmers to write code for. Most things in computer science/programming involve tradeoffs. Also, Rust's async/await is built-in to the language. It's not a library implementation of stackless coroutines.


> Go's design has more overhead but, by virtue of not having colored functions, is simpler for programmers to write code for.

Colored functions is a debatable problem at best. I consider it a feature not a bug and it makes reasoning about programs easier at the expense of writing additional async/await keywords which is really a very minor annoyance.

On the other hand, Go's need to use channels for trivial and common tasks like communicating the result of an async task, together with the lack of RAII and of proper cleanup signaling in channels (you can very easily deadlock if nothing is attached to the other end of the channel), plus no compile-time race detection - all that makes writing concurrent code harder.


ok lol


I think they are referring to channels, which came with the tagline "share memory by communicating."


Rust has had channels since before Go was even publicly announced. Remember that Rust, like Go, was inspired by Pike's earlier language Limbo, which uses CSP. https://en.wikipedia.org/wiki/Limbo_(programming_language)


Rust has had OS channels since forever, and async channels for 5 years.

Rust has changed a lot in the past 5 years, people just haven't noticed, so they assume that Rust is still an old outdated language.


yes


We need a way to bridge the gap. Having a runtime may not be suitable for all apps but it can easily allow you to reach 95%+ concurrency performance. The async compile-to-state-machine model is only necessary for the last 5%. Most userland apps rarely need to maximize concurrency efficiency. They need concurrency yes, but performance at the 95th percentile is more than sufficient.


I really don’t buy this argument that only some small “special” fraction of apps “actually” need async, and for the rest of us “plebs” we should be relegated to blocking.

Async is just hard. That’s it. It’s fundamentally difficult.

In my experience, language implementations of async fall along two axes: clarity and control. C# is straightforward enough (having cribbed its async design off functional languages) but I find it scores low on the "clarity" scale and moderate-high in control, because you could control it, but it wasn't always clear.

JS is moderate-high clarity, low control: easy to understand, because all the knobs are set for you. Before it got async/await sugar, I’d have said it would have been low clarity, because I’ve seen the promise/callback hell people wrote when given rope.

Python is the bottom of the barrel for both clarity and control. It genuinely has to have the most awful and confusing async design I’ve ever seen.

I personally find Rust scores high in both clarity and control. Playing with the Glommio executor was what really solidified my understanding of how async works however.


I learned concurrency and parallelism by confronting blocking behavior: waiting on a networking or filesystem request stops the world, so we need a new execution context to keep things moving.

What I realized, eventually, is that blocking is a beautiful thing. Embrace the thread of execution going to sleep, as another thread may now execute on the (single core at the time) CPU.

Now you have an organization problem, how to distribute threads across different tasks, some sequential, some parallel, some blocking, some nonblocking. Thread-per-request? Thread-per-connection?

And now a management problem. Spawning threads. Killing threads. Thread pools. Multithreaded logging. Exceptions and error handling.

Totally manageable in mild cases, and big wins in throughput, but scaling limits will present themselves.

I confront many of these tradeoffs in a fun little exercise I call "Miner Mover", implemented in Ruby using many different concurrency primitives here: https://github.com/rickhull/miner_mover


Maybe "add a runtime that switches execution contexts on behalf of the user" and "force the programmer to reimplement everything" are not the only options.


in the sense that sharing memory by communicating is the right approach


Go: it turns out that generic is actually useful

Rust: it turns out that not all concurrency needs to be a zero-cost abstraction


Except that Rust hasn’t yet realised it


If Rust had gone with green threads as the core async strategy (I know it was a thing pre-1.0), that would be terrible. You're not understanding Rust's design goals. Rust's async model, while it has several major pain points at present, is still undoubtedly superior for what Rust was made for. It would be a shame to throw all that away. Go can go do its own thing (it has, evidently).


Yes, async is effectively a much harder version of Rust, and it's regrettable how it's been shoved down the throats of everyone, while only 1% of projects using it really need it. However, async is also amazing in that 1% of cases when it's useful.

If you have a service that handles massive amounts of network calls at the core (think linkerd, nginx, etc.), or you want to have a massive amount of lightweight tasks in your game, or working on an embedded software where you want cooperative concurrency, async Rust is an amazing super-power.

Most system/application level things is not going to need async IO. Your REST app is going to be perfectly fine with a threadpool. Even when you do need async, you probably want to use it in a relatively small part of your software (network), while doing most of the things in threads, using channels to pass work around between async/blocking IO parts (aka hybrid model).
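
A tiny sketch of that hybrid shape (assuming tokio; the details are illustrative):

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel::<String>(64);

        // Blocking/compute side: ordinary threads, no async needed.
        std::thread::spawn(move || {
            for i in 0..3 {
                // blocking_send bridges sync code into the async channel
                tx.blocking_send(format!("result {i}")).unwrap();
            }
        });

        // Async side: multiplex many of these streams cheaply.
        while let Some(msg) = rx.recv().await {
            println!("got {msg}");
        }
    }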

Rust community just mindlessly over-did using async literally everywhere, to the point where the blocking IO Rust (the actually better UX one) became a second class citizen in the ecosystem.

Especially visible with web frameworks, where there are N well-designed async web frameworks (Axum, Warp, etc.) and if you want a blocking one you get:

  tiny_http, absolute bare bones but very well done
  rouille - more wholesome, on top of tiny_http, but APIs feel very meh compared to e.g. Axum
  astra - very interesting but immature, and rather barebones


The argument here is that Rust chose to implement coroutines the wrong way. It went the route of stackless coroutines that need async/await and colored functions. This creates all the friction the article laments over.

But it also praises Go for its implementation, which is also based on a coroutine of a different kind. Stackful coroutines, which do not have any of these problems.

Rust considered using those (and, at first, that was the project's direction). Ultimately, they went with the stackless coroutine model because stackful coroutines require a runtime that preempts them (to do essentially what the kernel does with threads). This was deemed too expensive.

Most people forget, however, that almost no one is using runtime-free async Rust. Most people use Tokio, which is a runtime that does essentially everything the runtime they were trying to avoid building would have done.

So we are left in a situation where most people using async Rust have the worst of both worlds.

That being said, you can use async Rust without an async runtime (or rather, an extremely rudimentary one with extremely low overhead). People in the embedded world do. But they are few, and even they often are unconvinced by async Rust for their own reasons.


Rust chose to drop the green thread library so that it could have no runtime, supporting valuable use cases for Rust like embedding a Rust library into a C binary, which we cared about. Go is not really usable for this (technically it's possible, but it's ridiculous for exactly this reason). So those sorts of users are getting a lot of benefit from Rust not having a green threading runtime. As are any users who are not using async for whatever reason.

However, async Rust is not using stackless coroutines for this reason - it's using stackless coroutines because they achieve a better performance profile than stackful coroutines. You can read all about it on Aaron Turon's blog from 2016, when the futures library was first released:

http://aturon.github.io/blog/2016/08/11/futures/

http://aturon.github.io/blog/2016/09/07/futures-design/

It is not the case that people using async Rust are getting the "worst of both worlds." They are getting better performance by default and far greater control over their runtime than they would be using a stackful coroutine feature like Go provides. The trade off is that it's a lot more complicated and has a bunch of additional moving parts they have to learn about and understand. There's no free lunch.


People love(d) rust because it’s a pleasant language to write code for while also being insanely performant. Async is taking away the first point and making it miserable to write code for. If this trend continues, it’ll ultimately destroy the credibility of the language and people will choose other languages. The proposers of async did not take this into account when they were proposing async


I designed async/await and I absolutely did take this into account. I designed it to be as pleasant as possible under the constraints.


Can you admit that you failed in making it a pleasant experience to write async, especially for library authors? I don’t think it’s too late to admit failure and implement something like May https://github.com/Xudong-Huang/may


no, I don't admit that, and I think you're an enormous asshole


Async await is polluting rust. What was once my favorite language is now a pain in the ass. And I am not the only person who feels this way. There’s no shame in pivoting


Naive question, since I tried my hand at rust years ago, but haven't looked at it since: isn't it possible to write another crate to build go-like channels? A kind of "write, then lose the reference" function call that places a value on a queue, and an accompanying receiver. That could make life easier for "normal" software development.


There are many such primitives in Rust (including one in the standard library). And it's effectively the default; the only annoying thing is the libraries which use async (it is possible to just wrap the async code in sync code, it's just a little annoying, but I think that's what most users of the language should do).
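
For example, with the std channel the value is simply moved into the channel, so the sender can't touch it afterwards (a minimal sketch):

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (tx, rx) = mpsc::channel();

        thread::spawn(move || {
            let msg = String::from("hello");
            tx.send(msg).unwrap();
            // `msg` has been moved into the channel; using it here
            // would be a compile error.
        });

        println!("{}", rx.recv().unwrap());
    }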


But "most" users can live with a bit of overhead in return for safe parallelism. It's just a handful that wants to squeeze the last bit of power out of a CPU.

The other day, Intel revealed a processor with support for 66 threads per core. 64 of those threads were called "slow", because there's no prefetching and speculative execution, as they are supposed to be waiting (mainly for memory, but networking could be another option). Perhaps very many cheap hardware threads are a way out of this.


That is my point. For most people just using synchronous code with threads is the best option. Async only shines if you really want to push your I/O relative to your compute (which is becoming more of a challenge on modern hardware as I/O bandwidth is rapidly expanding compared to compute), or if you need to keep track of an extremely large number of tasks with low memory overhead. Starting off with "I'm writing a network application, I should use async" is likely already a mistake, especially in Rust.


Threads are driven by the OS. Something needs to drive coroutines, so there's no way around needing some executor (even a rudimentary one, as in embedded). But to be a versatile and universal systems language, Rust can't just build an executor into the language.

I think that stackless coroutines are better than stackful, in particular for Rust. Everything was done correctly by the Rust team.

Again, this is all fair and good, as long as people understand the tradeoff and make good technical decisions around it. If they all jump on the async bandwagon blind to its obvious limitations, we get where the Rust ecosystem is now.


Well, people who jumped on the async bandwagon are deeply involved in the Rust community. So if they do something, others have to assume they are doing it right.


For better or worse, when faced with choices like this Rust has consistently decided to make sure it's workable for the lowest-level usecases (embedded, drivers, etc). I respect the consistency, and I appreciate that it's focused on an under-served market, especially compared to eg. web applications (an over-served market, if anything), even if it's sometimes a bummer for me personally


> Rust considered using those (and, at first, that was the project's direction). Ultimately, they went with the stackless coroutine model because stackful coroutines require a runtime that preempts them

Stackful coroutines don't require a preemptive runtime. I certainly hope that we didn't end up with colored functions in Rust because of such a misconception.


They often implement soft preemption. Tokio and others like Glommio do. Usually, it's based on interrupts. The runtime schedules a timer to fire an interrupt, and some code is injected into the interrupt handler.

This is used to keep track of task runtime quotas so they can yield as soon as possible afterward.

This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

> I certainly hope that we didn't end up with colored functions in Rust because of such a misconception.

Misconceptions are everywhere unfortunately!


Tokio and glommio using interrupts is ironically another misconception. They're cooperatively scheduled, so yes, a misbehaving blocking task can stall the scheduler. They can't really interrupt an arbitrary stackless coroutine like a Future due to having nowhere to store the OS thread context in a way that can be resumed (each thread has its own stack, but now it's stackful with all the concerns of sizing and growing; or you copy the stack to the task, but now have to somehow fix up stack pointers in places the runtime is unaware of).

https://tokio.rs/blog/2020-04-preemption#a-note-on-blocking

> Tokio does not, and will not attempt to detect blocking tasks and automatically compensate


> This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You may be referring to this particular issue in Go https://github.com/golang/go/issues/10958 which I think was somewhat addressed a couple of releases back.


> You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

This is honestly shocking to hear. I would think that if people had bugs in their programs they would want them to fail loudly so they can be fixed.


As someone else said, it is not, strictly speaking, a bug. If your server receives a request that requires very computationally expensive work, is it okay to delay every other request on that core? That's probably not okay, and it'll show in your latency distribution.

Folks would rather have every future time-sliced so that other tasks get some CPU time in a ~fair way (after all, there is no concept of task priority in most runtimes).

But you're right: it isn't required, and you could sprinkle every loop of your code with yielding statements. But knowing when to yield is impossible for a future. If nothing else is running, it shouldn't yield. If many things are running but the problem space of the future is small, it probably shouldn't yield either, etc.

You simply do not have the necessary information in your future to make an informed decision. You need some global entity to keep track of everything and either yield for you or tell you when you should yield. Tokio does the former, Glommio does the latter.

It gets even more complex when you add IO into the mix because you need to submit IO requests in a way that saturates the network/nvme drives/whatever. So if a future submits an IO request, it's probably advantageous to yield immediately afterward so that other futures may do so as well. That's how you maximize throughput. But as I said, that's a very hard problem to solve.
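
For concreteness, the "sprinkle yields into every loop" option I mentioned looks roughly like this (a sketch assuming tokio; the threshold is arbitrary):

    // Compute-heavy future that periodically hands the thread back.
    async fn crunch(data: &[u64]) -> u64 {
        let mut sum = 0;
        for (i, x) in data.iter().enumerate() {
            sum += x;
            if i % 10_000 == 0 {
                tokio::task::yield_now().await; // let other tasks run
            }
        }
        sum
    }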


Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

I guess if someone wants to use futures as if they were goroutines then it's not a bug, but this sort of presupposes that an opinionated runtime is already shooting signals at itself. Fundamentally the language gives you a primitive for switching execution between one context and another, and the premise of the program is probably that execution will switch back pretty quickly from work related to any single task.

I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling. The described problem cannot even happen in the "runtime" I'm currently using because io_uring won't just completely stop responding to other kinds of sqe's and only give you responses to a multishot accept when a lot of connections are coming in. I strongly suspect equivalent results are achievable with epoll.


>Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

So just like any other kind of scheduling? "Frequently" is also very subjective, and there are tradeoffs between throughput, latency, and especially tail latency. You can improve throughput and minimum latency by never preempting tasks, but it's bad for average, median, and tail latency when longer tasks starve others, otherwise SCHED_FIFO would be the default for Linux.

>I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling

You've misunderstood the problem somehow. There is definitely nothing about tokio (which uses epoll on Linux and can use io_uring) not responding in there. io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything. The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.


> So just like any other kind of scheduling?

Yes. Industries that care about latency take some pains to avoid this as well, of course.

> io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything.

They totally can though? If I write the exact same code that is called out as problematic in the post, my non-preemptive runtime will run a variety of tasks while non-preemptive tokio is claimed to run only one. This is because my `accept` method would either submit an "accept sqe" to io_uring and swap to the runtime or do nothing and swap to the runtime (in the case of a multishot accept). Then the runtime would continue processing all cqes in order received, not *only* the `accept` cqes. The tokio `accept` method and event loop could also avoid starving other tasks if the `accept` method was guaranteed to poll at least some portion of the time and all ready handlers from one poll were guaranteed to be called before polling again.

This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks." The remaining tasks that can starve other tasks are those that perform I/O by bypassing the runtime and those that spend a long time performing computations with no I/O. The former thing sounds like self-sabotage by the user, but unfortunately the latter thing probably requires the user to spend some effort on designing their program.

> The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.

If we leave the obvious defects in our software, we will continue running software with obvious defects in it, yes.


>This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks."

Yeah, there's your misunderstanding, you've got it backwards. The problem being described occurs when I/O isn't happening because it isn't needed, there isn't a problem when I/O does need to happen.

Think of buffered reading of a file, maybe a small one that fully fits into the buffer, and reading it one byte at a time. Reading the first byte will block and go through epoll/io_uring/kqueue to fill the buffer and other tasks can run, but subsequent calls won't and they can return immediately without ever needing to touch the poller. Or maybe it's waiting on a channel in a loop, but the producer of that channel pushed more content onto it before the consumer was done so no blocking is needed.

You can solve this by never writing tasks that can take "a lot" of time, or "continue", whatever that means, but that's pretty inefficient in its own right. If my theoretical file reading task is explicitly yielding to the runtime on every byte by calling yield(), it is going to be very slow. You're not going to go through io_uring for every single byte of a file individually when running "while next_byte = async_read_next_byte(file) {}" code in any language if you have heap memory available to buffer it.


Reading from a socket, as in the linked post, is an example of not performing I/O? I'm not familiar with tokio so I did not know that it maintained buffers in userspace and filled them before the user called read(), but this is unimportant, it could still have read() yield and return the contents of the buffer.

I assumed that users would issue reads of like megabytes at a time and usually receive less. Does the example of reading from a socket in the blog post presuppose a gigabyte-sized buffer? It sounds like a bigger problem with the program is the per-connection memory overhead in that case.

The proposal is obviously not to yield 1 million times before returning a 1 meg buffer or to call read(2) passing a buffer length of 1, is this trolling? The proposal is also not some imaginary pie-in-the-sky idea; it's currently trading millions of dollars of derivatives daily on a single thread.


You're confusing IO not happening because it's not needed with IO never happening. Just because a method can perform IO doesn't mean it actually does every time you call it. If I call async_read(N) for the next N bytes, that isn't necessarily going to touch the IO driver. If your task can make progress without polling, it doesn't need to poll.

>I'm not familiar with tokio so I did not know that it maintained buffers in userspace

Most async runtimes are going to do buffering on some level, for efficiency if nothing else. It's not strictly required but you've had an unusual experience if you've never seen buffering.

>filled them before the user called read()

Where did you get this idea? Since you seem to be quick to accuse others of it, this does seem like trolling. At the very least it's completely out of nowhere.

>it could still have read() yield and return the contents of the buffer.

If I call a read_one_byte, read_line, or read(N) method and it returns past the end of the requested content that would be a problem.

>I assumed that users would issue reads of like megabytes at a time and usually receive less.

Reading from a channel is the other easy example, if files were hard to follow. The channel read might be implemented as a quick atomic check to see if something is available and consume it, only yielding to the runtime if it needs to wait. If a producer on the other end is producing things faster than the consumer can consume them, the consuming task will never yield. You can implement a channel read method that always yields, but again, that'd be slow.

>The proposal is obviously not to yield 1 million times before returning a 1 meg buffer, is this trolling

No, giving an illustrative example is not trolling, even if I kept the numbers simple to make it easy to follow. But your flailing about with the idea of requiring gigabyte-sized buffers probably is.


> You're confusing IO not happening because it's not needed with IO never happening. Just because a method can perform IO doesn't mean it actually does every time you call it. If I call async_read(N) for the next N bytes, that isn't necessarily going to touch the IO driver.

Maybe you can read the linked post again? The problem in the example in the post is that data keeps coming from the network. If you were to strace the program, you would see it calling read(2) repeatedly. The runtime chooses to starve all other tasks as long as these reads return more than 0 bytes. This is obviously not the only option available.

I apologize for charitably assuming that you were correct in the rest of my reply and attempting to fill in the necessary circumstances which would have made you correct.


Actually, no, I misread it trying to make sense of what you were posting so this post is edited.

This is just mundane non-blocking sockets. If the socket never needs to block, it won't yield. Why go through epoll/uring unless it returns EWOULDBLOCK?


For io_uring all the reads go through io_uring and generally don't send back a result until some data is ready. So you'll receive a single stream of syscall results in which the results for all fds are interleaved, and you won't even be able to write code that has one task doing I/O starving other tasks. For epoll, polling the epoll instance is how you get notified of the readiness for all the other fds too. But the important thing isn't to poll the socket that you know is ready, it's to yield to runtime at all, so that other tasks can be resumed. Amusingly upon reading the rest of the blog post I discovered that this is exactly what tokio does. It just always yields after a certain number of operations that could yield. It doesn't implement preemption.


Honestly I assumed you had read the article and were just confused about how tokio was pretending to have preemption. Now you reveal you hadn't read the article so now I'm confused about you in general, it seems like a waste of time. But I'm glad you're at least on the same page now, about how checking if something is ready and yielding to the runtime are separate things.


You're in a reply chain that began with another user claiming that tokio implements preemption by shooting signals at itself.

> But I'm glad you're at least on the same page now, about how checking if something is ready and yielding to the runtime are separate things.

I haven't ever said otherwise?


There's nothing buggy about a future that never yields because it can always make progress, but people prefer that a runtime doesn't let all other execution get starved by one operation. That makes it a problem that runtimes and schedulers work to solve, but not a bug that needs to be prevented at a language level. A runtime that doesn't solve it isn't buggy, but probably isn't friendly to use, like how Go used to have problems with tight loops and they put in changes to make them cause less starvation.


> because stackfull coroutine requires a runtime that preempts coroutines

I've used stackful coroutines many times in many codebases. It never required or used a runtime or preemption. I'm not sure why having a runtime that preempts them would even be useful, since it defeats the reason most people use stackful coroutines in the first place.


"stackful coroutines" the control-flow primitive is cumbersome to build on top of "green threads" but for use cases that are mostly about blocking on lots of distinct I/O calls at the same time people may be indifferent between these two things. These conversations are often muddled because the feature shipped most often is called "async" and not called "jump to another stack please" :(


> I've used stackful coroutines many times in many codebases. It never required or used a runtime or preemption.

Can you tell us which? Go, Haskell and the other usual suspects all have runtimes with automatic, transparent preemption.


It was always C++ for some type of high-performance data processing engine. Around half the stackful coroutine implementations were off-the-shelf libraries (e.g. Boost::Context) and the other half were purpose-built from scratch, depending on the feature requirements. The typical model is that you have stackful coroutines at a coarse level, e.g. per database query, which may dispatch hundreds of concurrent state machines. All execution and I/O scheduling is explicitly done by the software, which enables some significant runtime optimizations.

If coroutines can be preempted then it introduces a requirement for concurrency control that otherwise doesn't need to exist and interferes with dynamic cache locality optimizations. These are some of the primary benefits of using stackful coroutines in this context.

Being able to interrupt a stackful coroutine has utility for dealing with an extremely slow or stuck thread but you want this to be zero-overhead unless the thread is actually stuck. In most system designs, the time required to traverse any pair of sequential yield points is well-bounded so things getting "stuck" is usually a bug.

Letting end-users inject arbitrary code into these paths at runtime does require the ability to interrupt the thread but even that is often handled explicitly by more nuanced means than random preemption. Sometimes "extremely slow" is correct and expected behavior, so you have to schedule around it.


Lua comes with this sort of thing. OCaml, Python, and C have libraries providing this sort of thing in decreasing order of adoption.

Python also comes with 2 features that seem to be stackless coroutines with attached syntax ceremonies, but one of those 2 features is commonly used with a hefty runtime instead of being used for control flow. JavaScript comes with 2 features named similarly to those of Python, but only one of them seems to be "runtime-free" stackless coroutines.


The reason Rust chose stackless coroutines is because it allows zero cost FFI, which for a systems language is extremely important.


> Yes, async is effectively a much harder version of Rust, and it's regrettable how it's been shoved down the throats of everyone, while only 1% of projects using it really need it.

Yes. I just noticed that Tokio was pulled into my program as a dependency. Again. It's not being used, but I'm using a crate which has a function I'm not using which imports reqwest, which imports h2, which imports tokio.


Exactly, because something somewhere needs to make one http call, and it would be impossible if it wasn't done with a scalable async executor. /i


PR them to use ureq. ;)


I recently did this in a relatively small crate, and it halved the dependencies. Highly recommended if you don't need async.


Is there any reason to use async when your platform supports virtual threads?

I ask as someone who uses java and is about to rewrite a bunch of code to be able to chuck the entire async paradigm into the trash can and use a blocking model but on virtual threads where blocking is ok.


Virtual threads or green threads, etc., are all names for the same thing: stackful coroutines. I would say yes! If your language/platform/runtime supports them, that should definitely be your starting point.


> that should definitely be your starting point.

Could you expand a bit? Why?


Not OP, but synchronous code is much, much easier to understand and write than asynchronous code. What Java is doing is making synchronous code have all the advantages of asynchronous code by making blocking a Thread a cheap operation (instead of blocking a real OS Thread), making the whole benefit of async code go away while getting rid of async's difficulties, especially in a language that doesn't have async/await (which makes async code "look" synchronous - but in Rust, as this blog post shows, that is not really the case).


Hot off the presses from the JVM Language Summit a few weeks ago; The Challenges of Introducing Virtual Threads to the Java Platform [1]

[1] https://www.youtube.com/watch?v=WsCJYQDPrrE


Async is also spread through so many crates that your program will have to be async in its entirety, or at least depend on the tokio crate for a lot of things. Want a web server? Async + tokio or gtfo. Want an SQL connector? You better write your own, unless you want async. Each with a different solution to the various problems async brings -- and don't even get me started on async closures and such shit, that's where hell pokes through the earth and does unholy things to your compiler.

I enjoy Rust, and I love how the compiler helps me solve problems. However, the ecosystem is "async or gtfo", or "just write it yourself if you don't want async lmao", and that's not good enough.


A lot of that pain could have been avoided if the language had better primitives for async in the std or in the futures crate. Like a trait that executors must implement, and a "default" blocking executor to execute async code from sync.

Right now even building a library that supports multiple async runtimes is a PITA; I have done it a couple of times. So you end up supporting just tokio, and maybe async-std.


so it's clear to non-Rust devs, we do have basic primitives for "running async code from sync":

https://docs.rs/futures/latest/futures/executor/fn.block_on....

imagine you have an:

    async fn do_things() -> Something { /* ... */ }
you can:

    use futures::executor::block_on;
    fn my_normal_code() {
      let something = block_on(do_things());
    }

but this does get messy if the async code you're running isn't runtime-agnostic :(


> You can break the chain by commanding the entire runtime to block on the completion of a future, but you probably shouldn’t do this pervasively since it isn’t composable. If a function blocks on a future, and that future calls a function that blocks on a future, congrats! The runtime panics!

article says you can panic if you use the pattern you show. specifically, if you call `my_normal_code()` from an async context.

is the author just talking about a quirk in tokio? or is this sort of wrapping intrinsically dangerous somehow?


It's not intrinsic to async functions; you can block on a future inside another future via e.g. pollster or even manual polling.
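
For reference, pollster's basic block_on (a tiny sketch assuming the pollster crate):

    async fn compute() -> u32 {
        7
    }

    fn main() {
        // pollster::block_on parks the current thread until the future is done
        let n = pollster::block_on(compute());
        assert_eq!(n, 7);
    }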


In the .Net/C# world library users just expect that the library has implemented both DoThings() and DoThingsAsync(). However that is easier said than done, because many of the foundational low-level IO APIs were implemented by Microsoft, which first implemented IoMethod() and then implemented IoMethodAsync() when async/await became a thing in C#.


> A lot of that pain could have been avoided if the language had better primitives for async in the std or in the futures crate. Like a trait that executor must implement and a "default" blocking executor to execute async code from sync.

This is one of the goals of the async working group. Hopefully, when ready, that'll make it possible to swap out async runtimes underneath arbitrary code without issues.


Admittedly, I’m no expert in async rust, but I’ve written several thousand lines of sync rust this month. One thing I’ve found is when rustc makes a particular approach hard to implement, it usually does so for a good reason (i.e. there is a better way to achieve a similar result).

If you’re learning the language, I would suggest starting out with some more vanilla sync code, loops and if statements, get used to the borrowing. Async is clearly still under heavy development, and not just from an implementation level, but also from the level of our philosophical paradigm about what async means and how it ought to work for the user. It’s entirely possible for humanity to have the wrong approach to this issue and maybe someone in this discussion will be able to answer it more effectively.

The compiler really depends on traits, and the ability for traits to handle async is not stable. Many highly intelligent people are hard at work thinking about how to make async rust more correct, readable, and accessible. For example, look here: https://blog.rust-lang.org/inside-rust/2022/11/17/async-fn-i...

I would argue that if the async functionality of traits is not stable in Rust, then it is silly for us to attack Rust for not having nice async code, because we're effectively criticizing an early rough draft of what will eventually be a correct, performant, and accessible book.


I am curious what a "good async design" looks like, if we were going "all the way" with async and trying to design a highly scalable and maintainable and understandable server. The X11/Wayland post yesterday was interesting in how it described async drawing APIs in X11.

What does a good async API look like?

Also how do you prevent it spreading throughout a codebase?

I am trying to design a scalable architecture pattern for multithreaded and async servers. My design is that IO threads split asynchronous events into two halves, "submit" and "handle". For example, system events from liburing or epoll are routed to other components. Those IO thread event loops run and block on epoll.poll/io_uring_wait_cqe.

For example, if you create a "tcp-connection" you can subscribe to async events that are "ready-for-writing" and "ready-for-reading". Ready-for-writing would take data out of a buffer (that was written to with a regular mutex) for the IO thread to send when EPOLLOUT/io_uring_prep_writev.

We can use the LMAX Disruptor pattern - multiproducer multiconsumer ringbuffers to communicate events between threads. Your application or thread pool threads have their own event loops and they service these ringbuffers.

I am working on a syntax to describe async event firing sequences. It looks like a bash pipeline, I call it statelines:

   initialstate1 initialstate2 = state1 | {state1a state1b state1c} {state2a state2b state2d} | state3
It first waits for "initialstate1" and "initialstate2" in any order, then it waits for "state1", then it waits for the states "state1a state1b state1c" and "state2a state2b state2d" in any order.
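One possible reading of those semantics, as a rough sketch (all names hypothetical; this is not the actual implementation): each pipe-separated stage is a set of required states, and the line advances once every state in the current stage has fired.

    use std::collections::HashSet;

    struct Stateline {
        // e.g. [{initialstate1, initialstate2}, {state1}, {state1a, ...}, {state3}]
        stages: Vec<HashSet<&'static str>>,
        current: usize,
    }

    impl Stateline {
        fn fire(&mut self, state: &str) {
            if let Some(stage) = self.stages.get_mut(self.current) {
                stage.remove(state);
                if stage.is_empty() {
                    self.current += 1; // current stage satisfied, advance
                }
            }
        }

        fn finished(&self) -> bool {
            self.current >= self.stages.len()
        }
    }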


If it's not stable, then people shouldn't use it in production either.


People who do not know what they are doing should do research before using it in production.

Edit: Of course, since this is what "unstable" means, right?


People who know what they are doing will understand the peril of using a moving-target interface in a production application. Depending on the project that may or may not be a problem.


It's funny how some commenters assume the author is a novice in Rust; what if his experience exceeds theirs...


> Used pervasively, Arc gives you the world’s worst garbage collector. Like a GC, the lifetime of objects and the resources they represent (memory, files, sockets) is unknowable. But you take this loss without the wins you’d get from an actual GC!

The lifetime of an Arc isn’t unknowable, it’s determined by where and how you hold it.

I think maybe the disconnect in this article is that the author is coming at Rust and trying to force their previous mental models on to it (such as garbage collection) rather than learning how to work with the language. It’s a common trap for anyone trying a new programming language, but Rust seems to trip people up more than most.


> The lifetime of an Arc isn’t unknowable, it’s determined by where and how you hold it.

In the same sense that the lifetime of an object in a GC'd system has a lower bound of, "as long as it's referenced", sure. But that's nearly the opposite of what the borrow checker tries to do by statically bounding objects, at compile time.

> maybe the disconnect in this article is that the author is coming at Rust and trying to force their previous mental models on to it

The opposite actually! I spent about a decade doing systems programming in C, C++, and Rust before writing a bunch of Haskell at my current job. The degree to which a big language runtime and GC weren't a boogeyman for some problem spaces was really eye-opening.


> But that's nearly the opposite of what the borrow checker tries to do by statically bounding objects, at compile time.

Arc isn't an end-run around the borrow checker. If you need mutable references to the data inside of Arc, you still need to use something like a Mutex or Atomic types as appropriate.
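A minimal sketch of that point: the data behind an Arc is read-only through the Arc itself, so shared mutation has to go through interior mutability such as a Mutex.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0u32));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    // Mutating through a bare Arc<u32> would not compile;
                    // the Mutex provides the synchronized interior mutability.
                    *counter.lock().unwrap() += 1;
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*counter.lock().unwrap(), 4);
    }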

> The degree to which a big language runtime and GC weren't a boogeyman for some problem spaces was really eye-opening.

I have the opposite experience, actually. I was an early adopter of Go and championed Garbage Collection for a long time. Then as our Go platforms scaled, we spent increasing amounts of our time playing games to appease the garbage collector, minimize allocations, and otherwise shape the code to be kind to the garbage collector.

The Go GC situation has improved continuously over the years, but it's still common to see libraries compete to reduce allocations and add complexity like pools specifically to minimize GC burden.

It was great when we were small, but as the GC became a bigger part of our performance narrative it started to feel like a burden to constantly be structuring things in a way to appease the garbage collector. With Rust it's nice to be able to handle things more explicitly and, importantly, without having to explain to newcomers to the codebase why we made a lot of decisions to appease the GC that appear unnecessarily complex at first glance.


There's a good chance this is a Go issue rather than a GC one. People get fooled by Go's pretense of being a high-level C replacement. It is, at best, highly inadequate in that role.

The reason is that the compiler quality, the design tradeoffs, and the throughput of Go's GC implementation are simply not there for it to ever be a good general-purpose systems-programming-oriented language.

Go receives undeserved hype for use cases that C# and Java are much better at, due to their superior GC implementations and codegen quality (with C# offering better lower-level features like structs+generics and first-class C interop).


Java GC has non-trivial overhead. I've moved workloads from Java to Rust and gotten a 30x improvement from the lack of GC. Likewise, I've gotten a 10x improvement in Java by preallocating objects and reusing them to avoid GC. (Fucking Google and the cult of immutable objects.) Guess what, lots of things that "make it harder to introduce bugs" make your shit run a lot slower too.


This is not an improvement from lack of GC per se, but rather from zero-cost abstractions (everything is monomorphised, no sins such as type erasure) first and foremost, and yes, deterministic memory management. Java is a poor language if you need to push performance to the limit, since it does not offer convenient lower-level language constructs to do so (unlike C#), but for reaching the 80th percentile of performance it is by far the best one.

But yes, GC is very much not free and is an explicit tradeoff vs compile time + manual memory management.


As an ops guy for decades, it makes me laugh to hear claims about Java GC superiority. Please go back in time and fix all the crashes and OOMs caused by enterprise JVMs, as opposed to the near-zero problems with the Go deployments.

Making strong statements without backing them up with hard facts is a sign of zealotry...


I assure you that if that code were ported to Go 1:1, Go's GC would simply crawl to a halt. Write code badly enough and no matter how good the hardware and software are, it won't be able to cope at some point. Even a good tool will give if you beat it down hard enough.

For example, you may be interested in this read: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...

Issues like these simply don't happen with GCs in modern JVM implementations or .NET (not saying they are perfect or don't have other shortcomings, but the sheer amount of developer hours invested in tuning and optimizing them far outstrips Go).


> it makes me laugh to hear claims about Java GC superiority. Please go back in time and fix all the crashes and OOMs caused by enterprise JVMs...

I don’t see how running into an OOM problem is necessarily a problem with the GC. That said, Java is a memory-intensive language; it’s a trade-off that Java is pretty up front about.

I don’t have a horse in this race but I would be quite surprised if Go’s GC implementation could even hold a candle to the ones found in C# and Java. They have spent literally decades of research and development, and god knows how much money (likely north of $1b), optimizing and refining their GC implementations. Go just simply lacks any of the sort of maturity and investment those languages have.


Java has billions spent on marketing and lobbying.

Since the advent of Java in the mid-90s I have heard about the superiority of its VM, yet my observations from the ops PoV say otherwise. So I suspect a huge hoax...

Hey btw, you're saying "Java is _memory intensive_", as if that would magically explain everything. Let's dig into that more deeply. Why is it so, dear Watson? Have you compared the memory consumption of the same algorithm and pretty much similar data structures between languages? Why does Java have to be such a memory hog? Why is its class loading so slow? Are these the qualities of superior VM design and zillions of man-hours invested? Huh?

By the way, if the code implementing functionality X needs N times more memory than in another GC'd language, then however advanced that GC may be (need to find a proof for that btw), it won't catch up speed-wise, because it simply needs to move more around. So simple.


> Java has billions spent on marketing and lobbying.

Marketing is not a silver bullet for success and the tech industry is full of examples of exactly that. The truth is that Sun was able to promote Java so heavily because it was found to be useful.

> Since the advent of Java in the mid-90s I have heard about the superiority of its VM, yet my observations from the ops PoV say otherwise.

The landscape of the 90s certainly made a VM language appealing. And compared to the options of that day it's hardly any wonder.

> So I suspect a huge hoax...

It's you versus a plurality, if not a majority, of the entire enterprise software market. Of course that's not to say that Java doesn't have problems or that the JVM is perfect, but is it so hard to believe that Java got something right? Is it honestly more believable that everyone else is caught up in a collective delusion?

> Hey btw, you're saying "Java is _memory intensive_", as if that would magically explain everything.

It's not that Java is necessarily memory intensive, but that a lot of Java performance tuning is focused towards optimizing throughput performance, not memory utilization. Cleaning out a large heap occasionally is in general better than cleaning out a smaller one more frequently.

> By the way, if the code implementing functionality X needs N times more memory than in another GC'd language, then however advanced that GC may be (need to find a proof for that btw), it won't catch up speed-wise, because it simply needs to move more around. So simple.

It's not so simple. First of all, the choice of a large heap is not mandated by Java, it's a trade off that developers are making. Second of all, GC performance issues only manifest when code is generating a lot of garbage, and believe it or not, Java can be written to vastly minimize the garbage produced. And last of all, Java GCs like Shenandoah have a max GC pause time of less than 1ms for heaps up to 16TiB.

Anyway, at the end of the day no one is going to take Go away from you. Personally I don't have a horse in this race. That said, the fact is that Java GCs are far more configurable, sophisticated, and advanced than anything Go has (and likely ever will). IMO, Go came at a point in time where there was a niche to exploit, but that niche is shrinking.


I would like to answer your points more deeply, not having much time for it now.

But I think you are avoiding a direct answer to the question of why Java needs so much memory in the first place. You talk about the "developer's choice for a big heap"; first, I don't think it is their choice, but rather a consequence of the fact that such a big heap is needed at all for typical code. Why?

Let's code a basic HTTPS endpoint using a typical popular framework, returning some simple JSON data. Usual stuff. Why will it consume 5x-10x more memory in Java? And if one says it's just an unrealistic microbenchmark, things get worse when coding more realistic stuff.

Btw, having more knobs for a GC is not necessarily a good thing if it means there are no good fire-and-forget defaults. If an engineer continuously needs to get their head around these knobs to have a non-crashing app, then we have a problem. Or rather, ops have a problem, and some programmers are, unfortunately, disconnected from the ops realm. Have you been working together with ops guys? On prod, of course?


Honestly, the biggest stumbling block for Rust and async is the notion of memory pinning.

Rust will do a lot of invisible memory relocations under the covers, which can work great in single-threaded contexts. However, once you start talking about threading, those invisible memory moves become a hazard. The moment shared memory comes into play, everything just gets a whole lot harder with the Rust async story.

Contrast that with a language like java or go. It's true that the compiler won't catch you when 2 threads access the same shared memory, but at the same time the mental burden around "Where is this in memory, how do I make sure it deallocates correctly, etc" just evaporates. A whole host of complex types are erased and the language simply cleans up stuff when nothing references it.

To me, it seems like GCs simply make a language better for concurrency. They generally solve a complex problem.


> Rust will do a lot of invisible memory relocations under the covers.

I don't think it's quite accurate to point to "invisible memory relocations" as the problem that pinning solves. In most cases, memory relocations in Rust are very explicit, by moving an owned value when it has no live references (if it has any references, the borrow checker will stop you), or calling mem::replace() or mem::swap(), or something along those lines.

Instead, the primary purpose of pinning is to mark these explicit relocations as unsafe for certain objects (that are referenced elsewhere by raw pointer), so that external users must promise not to relocate certain objects on pain of causing UB with your interface. In C/C++, or indeed in unsafe Rust, the same idea can be more trivially indicated by a comment such as /* Don't mess with this object until such-and-such other code is done using it! */. All pinning does is to enforce this rule at compile time for all safe code.


Memory pinning in Rust is not a problem that has to do with concurrency because the compiler will never relocate memory when something is referencing it. The problem is however with how stackless coroutines in general (even single-threaded ones, like generators) work. They are inherently self-referential structures, and Rust's memory model likes to pretend such structures don't exist, so you need library workarounds like `Pin` to work with them from safe code (and the discussion on whether they are actually sound is still open!)


> (and the discussion on whether they are actually sound is still open!)

Do you have a reference for this? Frankly, maybe I shouldn't ask since I still don't even understand why stackless coroutines are necessarily self-referential, but I am quite curious!


See for example https://github.com/rust-lang/rust/issues/63818 and https://github.com/rust-lang/rfcs/pull/3467

Basically the problem is that async blocks/fns/generators need to create a struct that holds all the local variables within them at any suspension/await/yield point. But local variables can contain references to other local variables, so there are parts of this struct that reference other parts of this struct. This creates two problems:

- once you create such self-references you can no longer move this struct. But moving a struct is safe, so you need some unsafe code that "promises" you this won't happen. `Pin` is a witness of such promise.

- in the memory model having an `&mut` reference to this struct means that it is the only way to access it. But this is no longer true for self referential structs, since there are other ways to access its contents, namely the fields corresponding to those local variables that reference other local variables. This is the problem that's still open.


> I still don't even understand why stackless coroutines are necessarily self-referential, but I am quite curious!

Because when stackless coroutines run they don’t have access to the stack that existed when they were created. Everything that used to be on the stack needs to get packaged up in a struct (this is what `async fn` does). However, everything that used to point to something else on the stack (which Rust understands and is fine with) now points to something else within the “impl Future” struct. Hence you have self-referential structs.
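A minimal illustration of that packaging (`some_io` is a made-up placeholder): both the Vec and the borrow of it survive the .await, so the compiler-generated future stores the value and a reference into itself side by side, which is exactly the self-referential case Pin exists for.

    async fn some_io(_data: &[u8]) { /* pretend this awaits something */ }

    async fn example() {
        let buf = vec![1u8, 2, 3];
        let slice = &buf[..];            // borrows from a local...
        some_io(slice).await;            // ...and both live across the suspension point,
        println!("{}", slice.len());     // so both end up inside the generated future
    }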


Interestingly, the newest Java memory feature (Panama FFI/M) actually can catch you if threads race on a memory allocation. They have done a lot of rather complex and little appreciated work to make this work in a very efficient way.

The new api lets you allocate "memory segments", which are byte arrays/C style structs. Such segments can be passed to native code easily or just used directly, deallocated with or without GC, bounds errors are blocked, use-after-free bugs are blocked, and segments can also be confined to a thread so races are also blocked (all at runtime though).

Unfortunately it only becomes available as a finalized non-preview API in Java 22, which is the release after the next one. In Java 21 it's available but behind a flag.

https://openjdk.org/jeps/8310626


> In the same sense that the lifetime of an object in a GC'd system has a lower bound of, "as long as it's referenced", sure.

These are not the same.

The problem with GC'd systems is that you don't know when the GC will run and eat up your cpu cycles. It is impossible to determine when the memory will actually be freed in such systems. With ARC, you know exactly when you will release your last reference and that's when the resource is freed up.

In terms of performance, ARC offers massive benefits because the memory that's being dereferenced is already in the cache. It's hard to overstate how big of a deal this is. There's a reason people like ARC and stay away from GC when performance actually begins to matter. :)


> With ARC, you know exactly when you will release your last reference and that's when the resource is freed up.

It's more like "you notice when it happens". You don't know in advance when the last reference will be released (if you did, there would be no point in using reference counting).

> In terms of performance, ARC offers massive benefits because the memory that's being dereferenced is already in the cache.

It all depends on your access patterns. When ARC adjusts the reference counter, the object is invalidated in all other threads' caches. If this happens with high frequency, the cache misses absolutely demolish performance. GC simply does not have this problem.

> There's a reason people like ARC and stay away from GC when performance actually begins to matter.

If you're using a language without GC built in, you usually don't have a choice. When performance really begins to matter, people reach for things like hazard pointers.


> It's more like "you notice when it happens". You don't know in advance when the last reference will be released

A barista knows when a customer will pay for coffee (after they have placed their order). A barista does not know when that customer will walk in through the door.

> (if you did, there would be no point in using reference counting).

There’s a difference between being able to deduce when the last reference is dropped (for example, by profiling code) and not being able to tell anything about when something will happen.

A particular developer may not know when the last reference to an object is dropped, but they can find out. Nobody can guess when GC will come and take your cycles away.

> The cache misses absolutely demolish performance

With safe Rust, you shouldn’t be able to access memory that has been freed up. So cache misses on memory that has been released is not a problem in a language that prevents use-after-free bugs :)

> If you’re using a language without GC built in, you usually don’t have a choice.

I’m pretty sure the choice of using Rust was made precisely because GC isn’t a thing (in all places that love and use rust that is)


> A barista knows when a customer will pay for coffee (after they have placed their order). A barista does not know when that customer will walk in through the door.

Sorry, no chance of deciphering that.

> There’s a difference between being able to deduce when the last reference is dropped (for example, by profiling code) and not being able to tell anything about when something will happen.

> A particular developer may not know when the last reference to an object is dropped, but they can find out.

The developer can figure out when the last reference to the object is dropped in that particular execution of the program, but not in the general sense, not anymore than they can in a GC'd language.

The only instance where they can point to a place in the code and with certainty say "the reference counted object that was created over there is always destroyed at this line" is in cases where reference counting was not needed in the first place.

> With safe Rust, you shouldn’t be able to access memory that has been freed up. So cache misses on memory that has been released is not a problem in a language that prevents use-after-free bugs :)

I'm not sure why you're talking about freed memory.

Say that thread A is looking at a reference-counted object. Thread B looks at the same object, and modifies the object's reference counter as part of doing this (to ensure that the object stays alive). By doing so, thread B has invalidated thread A's cache. Thread A has to spend time reloading its cache line the next time it accesses the object.

This is a performance issue that's inherent to reference counting.

> I’m pretty sure the choice of using Rust was made precisely because GC isn’t a thing (in all places that love and use rust that is)

Wanting to avoid "GC everywhere", yes. But Rust/C++ programs can have parts that would be better served by (tracing) garbage collection, but where they have to make do with reference counting, because garbage collection is not available.


GC generally optimises for throughput over latency. But there is also another cost: high-throughput GC usually uses more memory (sometimes 2-3x as much!). Arc keeps your memory usage low and can keep your latency more consistent, but it will often sacrifice throughput compared to a GC tuned for it. (Of course, stack allocation, where possible, beats them all, which is why Rust and C++ tend to win out over Java in throughput even if the GC has an advantage over reference counting: Java has to GC a lot more than other languages because it has no explicit stack allocation.)


> In terms of performance, ARC offers massive benefits

but it also has a big disadvantage: it goes through the actual malloc for memory management, which is usually much less performant than GC for various reasons.


> which is usually much less performant than GC for various reasons.

Can you elaborate?

I've seen a couple of malloc implementations, and in all of them, free() is a cheap operation. It usually involves setting a bit somewhere and potentially merging with an adjacent free block if available/appropriate.

malloc() is the expensive call, but I don't see how a GC system can get around the same costs for similar reasons.

What am I missing?


- Like others have said, both malloc()/free() touch a lot of global state, so you either have contention between threads, or do as jemalloc does and keep thread-local pools that you occasionally reconcile.

- A moving (and ideally, generational) GC means that you can recompact the heap, making malloc() little more than a pointer bump.

- This also suggests subsequent allocations will have good locality, helping cache performance.

Manual memory management isn't magically pause-free, you just get to express some opinion about where you take the pauses. And I'll contend that (A) most programmers aren't especially good at choosing when that should be, and (B) lots (most?) software cares about overall throughput, so long as max latency stays under some sane bound.


> Can you elaborate?

I've seen some benchmarks, but can't find them now, so maybe I am wrong about this.

> free() is a cheap operation. It usually involves setting a bit somewhere and potentially merging with an adjacent free block if available/appropriate.

There is some tree-like structure somewhere that allows malloc() to locate this block again; that structure has to be modified in parallel by many concurrent threads, which will likely need some locks, meaning the program operates outside the CPU cache.

In the JVM, for example, GC is integrated with the thread model, so they can have per-thread heaps, and "free()" happens asynchronously, so it doesn't block calling code. Additionally, malloc-based approaches usually suffer from memory fragmentation, while the JVM GC does compaction in the background all the time, tracks memory-block generations, and applies many other optimizations.


This feels like a large portion of the criticism. I thought this article was going to be more about how the async transformation gets in the way of a lot of transformations that the compiler could make in non-async code.

The point about wrangling with Weak suggests that they're trying to build complex ownership structures (which, to be fair, would be easier to deal with in a single thread), which isn't really easy to express in Rust in general. I use weak smart pointers exceedingly rarely. Outside of the first section (which isn't about async Rust specifically; it's about concurrency generally), channels aren't even mentioned. They're the main thing I use for communication between different parts of my program when writing async code, and when interfacing between async and non-async code, plus the other signalling abstractions like Notify, semaphores, etc. Mutexes are slow and bottlenecky, and shared state quickly gets complicated to manage; this has been known for ages. I think the problem might be more the `BIG_GLOBAL_STATIC_REF_OR_SIMILAR_HORROR` in the first place.
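As a hedged sketch of that sync/async bridging (assuming tokio with its rt and macros features; the specifics are illustrative), a plain OS thread can feed an async consumer through an mpsc channel with no shared mutable state:

    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel::<u64>(64);

        // Ordinary, non-async producer thread.
        std::thread::spawn(move || {
            for i in 0..10 {
                // blocking_send is meant for non-async contexts; it parks
                // the OS thread if the channel is full.
                tx.blocking_send(i).unwrap();
            }
            // When this thread ends, tx is dropped and the channel closes.
        });

        // Async consumer.
        while let Some(v) = rx.recv().await {
            println!("got {}", v);
        }
    }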

The comment about nothing stopping you from calling blocking code in an async context is valid, but it's relatively manageable and you can use `tokio::spawn_blocking` or similar when you must do it.
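Something along these lines (a sketch; tokio's actual item is `tokio::task::spawn_blocking`, and the file name is made up):

    async fn load_config() -> std::io::Result<String> {
        // The closure runs on tokio's dedicated blocking-thread pool,
        // so the async worker threads keep making progress meanwhile.
        tokio::task::spawn_blocking(|| std::fs::read_to_string("config.toml"))
            .await
            .expect("blocking task panicked")
    }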


Reference counting is a type of GC [0]. Just not a very good one in many cases.

I think it's a fair assumption to say that the author is aware of what Arcs are and how they work. I believe their point is more that, because of how async works in Rust, users have to reach for Arc over normal RAII far more often than in sync code. So at a certain point, if you have a program where 90% of objects are refcounted, you might as well use a tracing GC and not take the overhead of many small heap allocations/frees plus atomic ops.

Perhaps there are in fact ways around Arc-ing things for the author's use cases. But in my (limited) experience with Rust async I've definitely run into things like this, and plenty of example code out there seems to do the same thing [1].

For what it's worth, I've definitely wondered whether a real tracing GC (e.g. [2]) could meaningfully speed up many common async applications like HTTP servers. I'd assume that other async use cases like embedded state machines would likely have pretty different performance characteristics, though.

[0] https://en.wikipedia.org/wiki/Garbage_collection_(computer_s...

[1] https://tokio.rs/tokio/tutorial/shared-state

[2] https://manishearth.github.io/blog/2015/09/01/designing-a-gc...


> I think it's a fair assumption to say that the author is aware of what Arcs are and how they work.

Fair, but when reading an article like this I have to refer to what's written, not what we think the author knew but didn't write.


> Reference counting is a type of GC [0]. Just not a very good one in many cases.

…on a server where you can have a ton of RAM. Reference counting is superior on client machines because it's friendlier to swapped-out memory, which is why Swift doesn't have a tracing GC.


> The lifetime of an Arc isn’t unknowable, it’s determined by where and how you hold it.

Obviously it's not random. It's statically unknowable.


Arc in Rust can be moved or borrowed, and used without touching the reference count.

In many cases this means it's much cheaper than objects in languages with implicit reference counting.
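A small sketch of what that looks like in practice; nothing below touches the count except the final drop:

    use std::sync::Arc;

    fn peek(s: &Arc<String>) -> usize {
        s.len() // borrow: no refcount traffic at all
    }

    fn consume(s: Arc<String>) -> usize {
        s.len() // move: ownership transfers, count unchanged
    }

    fn main() {
        let s = Arc::new(String::from("hello"));
        let a = peek(&s);   // count stays at 1
        let b = consume(s); // still 1; it drops to 0 when `consume` returns
        assert_eq!(a, b);
    }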


It's exactly that: people trying to force other mental models onto Rust and then complaining that it does things differently.


I love Rust, but async is a hot mess and you cannot just write async code the same way that you write sync code. I'm getting more convinced that mixing the two is a bad idea, and that Go's approach of making everything sync with a single async channel primitive might be right.

I'm currently plumbing through some logic to call a sync method on a struct that implements Future and it's... an interesting challenge.

While we can make zero-cost async abstractions somewhat easy for users, the library developers are the ones who suffer the pain.


I disagree with you on the last point: async is definitely painful for end users. It indeed feels like you're using a completely different language, one that has Rust's core features removed – lifetimes and explicit types – sprinkled with a mess of Pins on top.

You cannot run scoped fibers, forcing you to "Arc shit up", Pins are unusable without unsafe, and the tiniest change in an async function can make the future !Send across the entire codebase.


It may no longer be necessary for pins to exist for async implementation: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.new_c... (but the current async interface requires using them, so my point is definitely a whatifism).
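For context, that API lets a value hold a (weak) reference to itself from the moment it is constructed; a tiny sketch:

    use std::rc::{Rc, Weak};

    struct Node {
        me: Weak<Node>, // self-reference, held weakly so the Rc can still be freed
        value: u32,
    }

    fn main() {
        let node = Rc::new_cyclic(|me| Node { me: me.clone(), value: 7 });
        assert_eq!(node.me.upgrade().unwrap().value, 7);
    }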


Replacing Pin with Rc is what they refer to as "Arc shit up". Pin avoids the need for a heap allocation like Rc/Arc entirely.


Library developers can afford to deal with complexity much more than users of libraries. Offloading such work on highly skilled people developing the basic infrastructure is surely the right approach.


I totally agree that library developers are the ones who _can_ handle complexity, but I have found even some of the top Rust devs are making async mistakes -- either the APIs are not correct from an async perspective, or there are basic bugs like losing wakers. The latter is so common it's not funny.


Many of these libraries are then debugged and troubleshot by their users...


I've seen one wasm VM for Rust that offered what looked like transparent M:N, which should solve (in that case) most async difficulties. We'll see how that evolves.


Which one? I wonder what its performance is like.

A good candidate for this is Graal. It can compile (JIT/AOT) both WASM and also LLVM bitcode directly so Rust programs can have full hardware/OS access without WASM limitations, and in theory it could allow apps to fully benefit from the work done on Loom and async. The pieces are all there. The main issue is you need to virtualize IO so that it goes back into the JVM, so the JVM controls all the code on the stack at all times. I think Graal can do this but only in the enterprise edition. Then you'd be able to run ~millions of Rust threads.


Curious too. I follow Lunatic [0] as a candidate for future use, and also wasmCloud [1].

[0] https://lunatic.solutions/

[1] https://wasmcloud.com


Async Everything is a bad language.

Async/await was a terrible idea for papering over JavaScript's lack of proper blocking threads, and it is currently being bolted onto every language. It splits every language and every library ecosystem in half and will cause pain for many years to come.

Everyone who worked with multi-threading outside of JavaScript knows that using actors/communicating sequential processes is the best way to do multi-threading.

I recently found an explanation for that in Joe Armstrong's thesis. He argues that the only way to understand multi-threaded programs is writing strictly sequential code for every thread and not muddling all the code for all the threads in one place:

"The structure of the program should exactly follow the structure of the problem. Each real world concurrent activity should be mapped onto exactly one concurrent process in our programming language. If there is a 1:1 mapping of the problem onto the program we say that the program is isomorphic to the problem.

It is extremely important that the mapping is exactly 1:1. The reason for this is that it minimizes the conceptual gap between the problem and the solution. If this mapping is not 1:1 the program will quickly degenerate, and become difficult to understand. This degeneration is often observed when non-CO languages ["non concurrency-oriented", looking at you JavaScript!] are used to solve concurrent problems. Often the only way to get the program to work is to force several independent activities to be controlled by the same language thread or process. This leads to an inevitable loss of clarity, and makes the programs subject to complex and irreproducible interference errors." [0]

[0] https://erlang.org/download/armstrong_thesis_2003.pdf

There is also a good rant against async/await by Ron Pressler who implemented project loom in java: https://www.youtube.com/watch?v=oNnITaBseYQ


> Async/await was a terrible idea for fixing JavaScript's lack of proper blocked threading that is currently being bolted onto every language.

As fun as it is to hate on JavaScript, it's really interesting to go back and watch Ryan Dahl's talk introducing Node.js to the world (https://www.youtube.com/watch?v=EeYvFl7li9E). He's pretty ambivalent about it being JavaScript. His main goal was to find an abstraction around the epoll() I/O event loop that didn't make him want to tear his eyes out, and he tried a bunch of other stuff first.


If you look around at the other competition at the time, it's worth noting that many other languages that already existed for decades ultimately came up with the same basic solution. In fact one of the weird things about the Node propaganda at the time was precisely that every other major scripting language tended to have not just one event-based library, but choices of event based libraries. Perl even had a metapackage abstracting several of them. It was actually a bog-standard choice, not some sort of incredible innovation.

I don't think it's a "good" solution in the abstract, but in the concrete of "I have a dynamically-typed scripting language with already over a decade of development and many more years of development that will happen before the event-based stuff is really standard", it's nearly the only choice. Python's gevent was the only other thing I saw that kinda solved the problem, and I really liked it, but I'm not sure it's a sustainable model in the end as it involves writing a package that aggressively reaches into other packages to do its magic; it is a constant game of catch-up.

I do think it's a grave error in the 2020s to adopt async as the only model for a language, though. There are better choices. And I actually exclude Rust here, because async is not mandatory and not the only model; I think in some sense the community is making the error of not realizing that your task will never have more than maybe a hundred threads in it and a 2023 computer will chomp on that without you noticing. Don't scale for millions of concurrent tasks when you're only looking at a couple dozen max, no matter what language or environment you're in. Very common problem for programmers this decade. It may well be the most impactful premature optimization in programming I see today.


> Don't scale for millions of concurrent tasks when you're only looking at a couple dozen max, no matter what language or environment you're in. Very common problem for programmers this decade.

And also with fibers/virtual threads (project loom) you can actually have a million threads using blocking hand-off on one machine. So the performance argument is kind of gone.


He was strongly advocating callbacks with node.js, which is understandable given the time. But a few years later when he wrote some Go code, he said that's a better model for networked servers (sorry no reference right now, it was in a video I watched, not sure which one)

JS callbacks are indeed better than C callbacks because you can hold onto some state. Although I guess the capture is implicit rather than explicit, so some people might say it's more confusing.

I'm pretty sure Joyent adopted and funded node.js because they were doing lots of async code in C, and they liked the callback style in JavaScript better. It does match the kind of problems that Go is now being used for, and this was pre-Go.

But anyway it is interesting how nobody really talks about callbacks anymore. Seems like async/await has taken over in most languages, although I sorta agree with the parent that it could have been better if designed from scratch.


> As fun as it is to hate on JavaScript, it's really interesting to go back and watch Ryan Dahl's talk introducing Node.js to the world

Agreed. JavaScript was actually my first language after TurboPascal in 1996.

I was also there listening to the first podcasts when node came out.

JavaScript is a very interesting language, especially with its prototype memory model. And the event loop, apart from the language, is interesting as well. And it's no coincidence Apple went as far as baking optimizations for JavaScript primitive operations into the M1 microcode.

But I still think multithreading is best done by using blocking operations.

NIO can be implemented on top of blocking IO as far as I know but not the other way round.

Also, sidenote, I think JavaScript's only real failure is the lack of a canonical module/import system. That error led to countless re-implementations of build systems and tens of thousands of hours wasted debugging.


> Also, sidenote, I think JavaScript's only real failure is the lack of a canonical module/import system. That error lead to countless re-implementations of buildsystems and tens of thousands of hours wasted debugging.

Agreed. I don't hate on JS, in fact I think it's the best tool for the several very common use cases it targets, and I'll even defend the way objects work in it (i.e. lets me do what I want with minimal fuss). The import/require drama was annoying, though.


It’s so interesting that the async/await stuff basically makes his presentation points meaningless? If you're just using async/await, why use the callback style in the first place…

but I get it, you can always go back to the promises and callbacks if you want.


async/await actually originated in C#, not JavaScript. C#'s author, Anders Hejlsberg, also authored TypeScript. TypeScript's additional features like classes, arrow functions and async/await eventually crept into ES6+.

I actually think it was a great solution in JS/TS given it's a single threaded event loop. The lower level the language the worse of an abstraction it is though. So I think most of the complaints here about async Rust are valid.


While the article elucidates well on the intricacies and challenges of async Rust, I feel it's crucial to note that one of Rust's core philosophies is ensuring memory safety without sacrificing performance.

The async patterns in Rust, especially with regards to data safety assurances for the compiler, are emblematic of this philosophy. Though there are complexities, the value proposition is a safer concurrency model that requires developers to think deeply about their data and execution flow. I do concur that Rust might not be the go-to for every massively concurrent userspace application, but for systems where robustness and safety are paramount, the trade-offs are justifiable. It's also worth noting that as the ecosystem evolves, we'll likely see more abstractions and libraries that ease these pain points.

Still, diving into the intricacies as this article does gives developers a better foundational understanding, which in itself is invaluable.


I've been writing a lot of async lock free rust. The main problem is that tokio futures are 'static, which is because of a design mistake that's baked deep into the rust ecosystem: Leaking memory is 'safe'.

This implies that you can't statically guarantee that a future is cleaned up properly, which means that if you spawn some async work, something may std::mem::forget a future, and then the borrow checker won't know that the references that were transitively handed out by the future are still live.
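A tiny sketch of how that 'static bound plays out, i.e. the Arc-based pattern it pushes you toward before reaching for something like the crate below:

    use std::sync::Arc;

    async fn spawn_work() -> usize {
        let data = Arc::new(vec![1, 2, 3]);
        let handle = {
            let data = Arc::clone(&data);
            // `async move` + Arc satisfies tokio::spawn's 'static bound;
            // borrowing `data` directly would be rejected, because the task
            // (or a leaked future) could outlive this stack frame.
            tokio::spawn(async move { data.len() })
        };
        let n = handle.await.unwrap();
        assert_eq!(n, data.len()); // the caller still holds its own Arc
        n
    }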

Rather than sprinkle Arc everywhere, I just use an unsafe crate like this:

https://docs.rs/async-scoped/latest/async_scoped/

This catches 99% of the bugs I would have written in C++, so it's a reasonable compromise. There's been some work to try to implement non-'static futures in a safe way. I'm hoping it succeeds.

The other big problem with Rust (but this is on the roadmap to be fixed this year) is that async traits currently require Box'ed futures, which adds a malloc/free at function call boundaries (!!!)

As for the "just use a channel" advice: I've dealt with large codebases that are structured this way. It explodes your control flow all over the place. I think of channels as the modern equivalent of GOTO. (I do use them, but not often, and certainly not in cases where I just need to run a few things in parallel and then wait for completion.)


> The main problem is that tokio futures are 'static

An important distinction to make is that tokio Futures aren't 'static per se; instead, you can only spawn (i.e. take advantage of the runtime's concurrency with) 'static Futures.

> This implies that you can't statically guarantee that a future is cleaned up properly.

Futures need to be Pin'd to be poll()'d. Any `T: !Unpin` that's pinned must eventually have Drop called on it [0]. A type is `!Unpin` if it transitively contains a `PhantomPinned`. Futures generated by the compiler's `async` feature are such, and you can stick this in your own manually defined Futures. This lets you assume `mem::forget` shenanigans are UB once poll()'d, and is what allows for intrusive/self-referential Future libraries [1]. The future can still be leaked by being kept alive by an Arc/Rc, but as a library developer I don't think you can (or would care to) reasonably distinguish that from normal use.

[0]: https://doc.rust-lang.org/std/pin/#drop-guarantee

[1]: https://docs.rs/futures-intrusive/latest/futures_intrusive/
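For non-Rust readers, a minimal sketch of the `PhantomPinned` opt-out mentioned above (not tied to any particular library):

    use std::marker::PhantomPinned;

    struct MyFuture {
        state: u32,
        _pin: PhantomPinned, // makes MyFuture !Unpin
    }

    fn main() {
        let fut = Box::pin(MyFuture { state: 0, _pin: PhantomPinned });
        // Once pinned, safe code can no longer move the value out of the Box;
        // you only get pinned references to it.
        let pinned_ref = fut.as_ref();
        assert_eq!(pinned_ref.state, 0);
    }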


> which is because of a design mistake that's baked deep into the rust ecosystem: Leaking memory is 'safe'.

Would you prefer not to have internal mutability, not to have `Rc`, or have them but with infectious unsafe trait bounds, or something else?


I don’t see why it would be an either/or. For instance, Rc does not leak memory, and neither do many cases of interior mutability.

If an API leaks memory, then I’d like it to be deemed unsafe. That way, leaking a future would be unsafe, so the borrow checker could infer (transitively) that freeing the future means that any references it handed out are now dead (just as it can already infer this when a synchronous function call returns and its stack frame pops).

Am I missing something subtle?

Edit: Rc with cycles would be a problem. I rarely intentionally use Rc though (certainly less often than I create a future).

Edit 2: maybe an auto trait could statically disallow Rc cycles somehow?


> I don’t see why it would be an either/or. For instance, Rc does not leak memory, and neither do many cases of interior mutability.

`Rc` and internal mutability together do allow creating cycles and thus leaking with only safe code. I suggest you to read https://cglab.ca/~abeinges/blah/everyone-poops/ if you haven't done already, it explains the historical reasons for why `std::mem::forget` was changed to be safe.
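A minimal demonstration of such a leak in 100% safe code (which is why making `mem::forget` unsafe wouldn't have closed the hole):

    use std::cell::RefCell;
    use std::rc::Rc;

    struct Node {
        other: RefCell<Option<Rc<Node>>>,
    }

    impl Drop for Node {
        fn drop(&mut self) {
            println!("dropped"); // never printed below
        }
    }

    fn main() {
        let a = Rc::new(Node { other: RefCell::new(None) });
        let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
        *a.other.borrow_mut() = Some(Rc::clone(&b));
        // a and b keep each other alive: the counts never reach zero,
        // Drop never runs, and the memory is leaked -- all without `unsafe`.
    }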

> Edit 2: maybe an auto trait could statically disallow Rc cycles somehow?

It could be done with an auto and implied trait that is implemented in the opposite case (the type can be safely leaked), but then the whole ecosystem would have to be changed to use `?Leakable`, and that would be a pain.


What about the other way? There could be a trait that means “references have the same escape semantics as a stack frame”, perhaps called NotLeaky, and then a variant of tokio spawn could require NotLeaky, and return a non-‘static future. NotLeaky could be inferred in the same way as Send and Sync.

As a bonus, high-availability systems could require NotLeaky at the top of their event loop, precluding runtime memory leaks.

Edit: that wouldn’t work, since the future could be leaked by the caller… will read that reference.


This won't work because there's nothing forcing you to pass `NotLeaky` types to APIs that respect them. In other words, implementing a trait can only add capabilities to a type, not restrict it.


We do not want red and blue functions. Any language that implements async / await as coroutines instead of green threads is making a fundamental CS mistake. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

Concurrency's correct primitive is Hoare's Communicating Sequential Processes mapped onto green threads. Some languages that have it right are Java (since JDK 21 – virtual threads), Go, and Kotlin.


Always with the blooming red and blue functions. You can say exactly the same thing about const.

The fact that a function can perform asynchronous operations matters to me and I want it reflected in the type system. I want to design my system on such a way that the asynchronous parts are kept where they belong, and I want the type system's help in doing that. "May perform asynchronous operations" is a property a calling function inherits from its callee and it is correctly modelled as such. I don't want to call functions that I don't know this about.

Now you can make an argument that you don't want to design your code this way and that's great if you have another way to think about it all that leads to code that can be maintained and reasoned about equally well (or more so). But calling the classes of functions red and blue and pretending the distinction has no more meaning than that is not such an argument. It's empty nonsense.

"We" don't all agree on this.


> The fact that a function can perform asynchronous operations matters to me and I want it reflected in the type system.

async doesn't tell you whether the function performs asynchronous operations, despite the name. async is an implementation detail about how the function must be invoked.

As TFA correctly points out, there's nothing stopping you from calling a blocking function inside a future, and blocking the whole runtime thread.


I didn't say it tells me whether the function does perform such operations, I said it tells me it can. More importantly it tells me which functions (most) can't.


It would be swell if functions could be generic over this capability at compile time, so that you could get the same guarantees from the type system without implementing the same protocols more than one time.


Haskell supports this, but right from the start Rust was always wary of trying to add higher kinded types, which are necessary to support this.


As a Zig programmer I also get to enjoy this, but from the angle of language implementors not caring about type theory


Keyword generics!


Maybe a better example than const is returning errors.

Either way, all of these changes are really annoying to make. We want less of these annoyances, not more.


We do not want functions that take floating point arguments, only u32 should be used. And don't get me started on more than one argument!


you can convert a float to a u32.

you cannot convert a function that calls async code into a sync function.


An async function is some syntactic sugar around a sync function that returns a future. You can merrily call one from the other.

You can only convert an int to a float with significant caveats. It's not a general trivial conversion. More complicated types may not be convertible at all or behave in all sorts of exciting ways (including having arbitrary side effects).

The point is that none of that is different to async functions. Of course you have to know what to do with them for them to be useful, but there is no requirement for them to "infect" calling code.


You can call .Wait on the Task it returns :)


Right, but now you are forced to convert the calling function to async.

u32 / float does not have the problem. It does not "bubble up", unless you want it to.


No, .Wait in C# or block_on in Rust keeps the caller sync while evaluating the async callee, preventing the "bubble up".


Virtual threading is fun and all until you find out SimpleDateFormat and a bunch of other classes built right into your standard library aren't thread safe, and now you need to go through your program and find out what else you missed. Go too has these fancy green threads, at the cost of manually locking resources and finding out about race conditions when you forget about them.

Futures aren't a fundamental CS mistake, they're a design decision. You may disagree with that decision, but the advantage Rust brings is that you don't need to worry about thread safety once your program actually compiles, at the cost of different code styles.

Neither asynchronous processing design is fundamentally wrong, they both have their strengths and weaknesses.


> Virtual threading is fun and all until you find out SimpleDateFormat and a bunch of other classes built tight into your standard library aren't thread safe and now you need to go through your program and find out what else you missed. Go too has these fancy green threads at the cost of manually locking resources and finding out about race conditions when you forget about them.

Why would that ever be an issue? Instances of those classes shouldn't be shared between virtual threads just the same as when using regular threads.


> until you find out SimpleDateFormat and a bunch of other classes built tight into your standard library aren't thread safe

true, but DateTimeFormatter has been available since Java 8, released almost 10 years ago.

VirtualThreads will be available in Java tomorrow

Also: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDa...


I want colored functions. I want to know which code runs synchronously and which doesn't, which raises errors and which doesn't. Color is just a description of the function's properties (and effects) and of how it's compatible with other colors.

There is also nothing fundamentally bad with cooperative scheduling in scope of a single process.


I got into programming in the 1990s. At that point in time, there was still a large contingent of programmers loudly insisting they needed assembly language to do everything. And to be clear, I mean, everything. Not "Yeah, I can't really bring up an OS without a bit of specialized assembly" but "every programmer should write every program in assembly".

The vast majority of them were already wrong. They only got more wrong.

You may just be used to knowing what code is "synchronous" and what isn't because it's been shoved into your face and you've adapted your thought process to it. In practice, "everything important is doing something 'asynchronously'" turns out to be the vast majority of what you need, and the vast majority of your mental energy you are dedicated to splitting the world in two is a waste. For the little bit that remains, by all means use something specialized, but it's just not something that everyone, everywhere, needs to be doing all the time, any more than everyone everywhere should be manually allocating registers, or any more than programs need to have line numbers because otherwise how can they work? (One of my favorites because I remember having that conception myself.)


I think you have your analogy backwards: the "assembly programmer" in this situation is the person who doesn't understand why one would "color" functions and/or express a fundamental property as part of their types. "Why do we need to express this in their type? Every programmer should be able to understand this without help".


To me this kind of sounds like circular reasoning. Without function coloring there's no distinction that you need to know of.

Can you elaborate?


> Without function coloring there's no distinction that you need to know of

Why do you think you don't need to know of it? I want to know if the function I'm calling is going to make a network request. Just because I can have a programming language that hides that distinction from me doesn't mean I want that.

Ideally I want to have the fundamental behavior of any function I call encoded in the function signature. So if it's async, I know it's going to reach out to some external system for some period of time.


> I want to know if the function I'm calling is going to make a network request.

That has nothing to do with function coloring.

> Ideally I want to have the fundamental behavior of any function I call encoded in the function signature.

There is no distinction of async functions if you don't have function coloring that you can encode in type signatures.


> That has nothing to do with function coloring.

Sure, in the same way that types have nothing to do with enforcing logical correctness of software.

> There is no distinction of async functions if you don't have function coloring that you can encode in type signatures.

What are you trying to say with this statement?


getaddrinfo() is a synchronous function that can do network requests to resolve DNS. The network property isn't reflected in its function signature becoming async. You can have an async_getaddrinfo() which does, but the former is just a practical example of network calls in particular being unrelated to function coloring.


It's nonsense. Async in rust is just syntactic sugar around a function signature. You can merrily call async functions from sync rust, you just have to know what to do with the future you get back.


"you just have to know what to do" is the problem. You can call any color from any color, but for some colors it's trivial, e.g. sync function from a sync or async one, or a non-failing function from a failing or non-failing one.

I don't want to be able to call a fallible function from an infallible one trivially; I want the compiler to force me to specify what exactly I'm going to do with an error if it happens. Likewise for async-from-sync: there are many ways I could call these. I can create a single-threaded executor and use it to drive the future to completion, or maybe I want to create a multithreaded executor, or maybe I expect the future to complete in a single poll and never suspend, so I don't even need a scheduler.


Well yes to all that. I still don't see the problem. An async function isn't really an async function, it's a sync function that returns a future. Would it be better if all that was manual? I've done quite a bit of stuff using manual async traits and it's painful and I highly value the syntax sugar that async brings. That said, I certainly don't want some executor running quietly behind the scenes doing async stuff for me without my explicit and full control. If I want to manually poll a future, that's for me to decide.


You seem to raise valid points and I don't disagree with you, however I don't see how it's relevant to the original concern regarding colored functions.


I suppose I'm struggling to understand what "colour" means in the context of Rust. It's surely just another word for signature. For some reason it's trotted out every time there's a discussion about async. I can only assume it's to do with the original use of the term for JavaScript async (which I know almost nothing about and have no opinion on), but I just cannot see its point in Rust async.


It has to do with the fact that most of the code in the project is not async but having to call async functions often propagates all the way to your main function. It's infectious and many people don't like it, myself included, that's why I'm working with Elixir and Golang where async is transparent and 99% automatic, or explicit but non-infectious, respectively.

I do love Rust and found a number of very valid uses for it but its async story leaves a lot to be desired. I don't enjoy writing it though I do enjoy the results.


This classic article mixes two things:

1. inability to read an async result from a sync function, which is a legitimately major architectural limitation.

2. the author's opinion on how function syntax should look (fully implicit, hiding how the functions are run).

And from this there is the endless confusion and drama.

Problem 1 is mostly limited to JS. Languages that have threads can "change colour" of their functions at will, so they don't suffer from the dramatic problem described in the article.

But people see languages that don't fit opinion 2, of having magic implicit syntax, and treat that as being as big a deal as the dead-end problem 1. In reality, having two syntaxes is somewhere between a minor inconvenience and an actual feature. In systems programming it's very important which type of locks you use, so you really need to know what runs async.


Maybe the issue is that we overload the concept of a function with an entirely different thing, a Future / Promise. Maybe if the syntax had been entirely different too, it would have been easier to understand. We tend to have different syntax for different things.

I’m hesitant about no longer distinguishing different things and letting the underlying system “figure it out”. I’m sure this could work as long as you’re on the happy path, but that’s not the only path there is.


I think with stackful coroutines you lose low-overhead interoperability with C. Also, it is possible to use stackless coroutines without introducing async/await 'colors'.


I've used all of the models mentioned. Personally I've found async/await to be the most annoying. However, I think there's an important option that's overlooked here. For many of the domains where these approaches tend to fall down hardest, notably network servers with very many connections (I was cited in the C10K paper on this subject BTW), arenas can absorb a lot of the annoying lifetime issues. They can be per-request, per-connection, per-something else, but however you do it is likely to leave you with a much smaller set of objects and lifetimes that can be managed by other means. Arenas for most things plus a few very well-isolated bits of code to handle the remainder worked for me from at least 1992 through 2017 in code bases up to more than a million lines (yes, highly concurrent the whole way) without having to deal with async/await or promises/futures. Maybe a bit of CSP/actor model here and there, but that's it.
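Not the parent's code, but a minimal sketch of the per-request flavor in Rust, assuming the `bumpalo` crate:

    use bumpalo::Bump;

    // Everything carved out for one request lives in the arena and is freed
    // in a single reset, instead of having individually tracked lifetimes.
    fn handle_request(raw: &str, arena: &Bump) -> usize {
        let path: &str = arena.alloc_str(raw.split(' ').nth(1).unwrap_or("/"));
        let status: &str = arena.alloc_str("HTTP/1.1 200 OK");
        path.len() + status.len()
    }

    fn main() {
        let mut arena = Bump::new();
        for raw in ["GET / HTTP/1.1", "GET /about HTTP/1.1"] {
            let bytes = handle_request(raw, &arena);
            println!("handled request, {bytes} bytes of arena data");
            arena.reset(); // drop this request's allocations all at once
        }
    }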


This is bound to get some criticism (or some tangent-at-best discussion), but it seems like a pretty fair discussion to me.

What I'm missing at the end of the article is the author's point: I believe they're advocating for the use of raw threads and manual management of concurrency, and doing away with the async paraphernalia. But, at the same time, earlier in the article they give the example of networking-related tasks as something that isn't so easy to deal with using only raw threads.

So, taking into account that await&co. are basically syntactic sugar + an API standard (iirc, I haven't used Rust so much lately), I wonder about what the alternative is. In particular, it seems to me like the alternative you could have would be everyone rolling their own "concurrency API", where each crate (inconsistently) exposes some sort of `await()` function, and you have to manually roll your async runtime every time. This would obviously also not be ideal.


I thought the author's point was relatively clear: Rust might not be a good fit for the kind of tasks that need more concurrency than raw threads can give you. Such programs should be written in some other language instead.

> Maybe Rust isn’t a good tool for massively concurrent, userspace software. We can save it for the 99% of our projects that don’t have to be.


So 99% of projects need raw threads only, according to the author. I doubt that.


It sounds very reasonable to me. I would say 90% of programs don’t need threads or concurrency at all.


Anything that waits on I/O needs concurrency (but not necessarily threads). Web backends, web frontends, deeper backends, desktop GUIs, that's probably 90% of software right there.


Rust is a systems programming language though.


I interpreted the 99% thing as referring to all software. If it's just Rust projects then sure, then again anyone who needs async has probably been avoiding a language that lacked async until recently.


I thought his point was async is not good for apps with lots of work to do, and that green threads are a much better idea. IDK.


The author's point is that Rust is not a good language for software like that example. But very, very little software is like that, and you can always divide it up into large blocks, inside of which Rust fits quite well.

Personally, I'm a bit more radical than the author. You won't be able to write software like the example correctly. It should just not be done, ever. Machines can still optimize some sanely organized software into the same thing, maybe, if it happens to be a tractable problem (I'm not sure anybody knows). But people shouldn't touch that thing.


Rust made a critical safety mistake when it chose its async paradigm. It gave the code the option to decide when to yield.

What that means is that when I'm writing async code, I have to audit every library I import to make sure that library is guaranteed to yield after a few microseconds of execution, otherwise my own core loops starve. Importing unknown code when using async rust is not safe for any application that needs to know its own threads won't starve.

A safe async language must guarantee that threads will make progress. Rust should change the scheduler so that it can pre-empt any code after that code has hogged a thread for too long.
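For illustration, a sketch (assuming the tokio runtime; `expensive` is just a stand-in for real work) of the kind of manual yielding the current model forces onto compute-heavy code:

    fn expensive(x: u64) -> u64 { x.wrapping_mul(2_654_435_761) } // stand-in

    async fn crunch(data: &[u64]) -> u64 {
        let mut sum = 0u64;
        for (i, &x) in data.iter().enumerate() {
            sum = sum.wrapping_add(expensive(x));
            // Without this explicit yield, nothing else scheduled on this
            // worker thread runs until the whole loop finishes.
            if i % 10_000 == 0 {
                tokio::task::yield_now().await;
            }
        }
        sum
    }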


> Rust should change the scheduler

Rust doesn't have a scheduler, and having one would be a no-go for any sufficiently low level code (e.g. in microcontrollers).


> A safe async language must guarantee that threads will make progress.

You might be looking for parallelism, not concurrency.


No? You don't need parallelism to guarantee global progress as long as the scheduler has the ability to preempt tasks. Of course coroutines (as opposed to e.g. userspace threads) can't really be preempted, which is the issue here.


> parallelism to guarantee global progress as long as the scheduler has the ability to preempt task

Preempting tasks is a single-core simulation of parallelism. I suspect there is confusion about what parallelism and concurrency are here: the terms are often used interchangeably (especially saying "concurrency" instead of "parallelism"), but they are definitely not interchangeable - or even, arguably, related at all. Concurrency, by definition, is concerned with continuations. If you remove continuations (async/await and/or futures/promises - depending on the language choices) then you aren't talking about concurrency any more.

Either way, you can use parallelism in Rust today - just use blocking APIs, locking, and threads. I don't get what the big deal is. You can even use concurrency and parallelism together, just use await/async across multiple threads.

I agree with the premise of the article, but the reasoning in this comment chain is something along the lines of "cats are horrible, because 5." The criticism is foreign to the entire subject matter.


Preemption simulates localized concurrency (running multiple distinct things logically at the same time) not parallelism (running them physically at the same time). You can have concurrency outside continuations. OS threads for example are not continuations, but still express concurrency to the kernel so that it can (not guaranteed) express concurrency to the physical CPU cores which hopefully execute the concurrent code in parallel (not guaranteed again, due to hyperthreading).


Naive question: Why not use OS concurrency then?


Promise/Future style of async is just a bad idea regardless of language.

It was used because of the ineptitude of the languages where it became popular, and it's far easier to implement in GC-less languages than message-passing-based asynchrony, but it's just misery to write code in. I'd prefer to suffer Go's ineptitudes just to use the bastardised message passing called channels there rather than any of the Python/JS/Rust async.


Yes. That atomized model of concurrency where your state goes everywhere and you somehow collect it back at some point was always (literally) the textbook example of how not to do it.

It was created to be an improvement over the Javascript situation, and somehow every language that had a sane structure adopted it as if it was not only good, but the way to do things. This is insane.


And yet, people are going to use async in Rust. The feature has already proven itself useful long ago in other languages, beyond the timespan a fad could survive. Everyone started out doing it the other way and got sick of it.


> beyond the timespan a fad could survive

In the voodoo-ridden land that is software development, we have plenty of clearly harmful fads that are much older than Rust and yet are practiced everywhere.

Up to now, rust async has lasted for less time than the NoSQL craziness. I'm hard pressed to think of any large fad that lasted less than it.


Async is much older in other languages. It's new in Rust, and time will tell, but I don't see it playing out differently this time.

Btw, the turnaround time is longer with a database, which often forms the foundation of a system. NoSQL bandwagoning was so destructive in part because of how long it looked like a good idea each time. Same with ORMs.


"Many people use it so it is good" is an idiotic argument


Yes it is. "Many people have been using it for a long time without regrets" is a better reason.


> It was created to be an improvement over the Javascript situation

I see this repeated everywhere in this thread. async/await originated in C# not JS.


The C# implementation is clearly an attempt at putting type safety over exactly the same implementation JS promises use. Done because MS wanted to port the same behavior.


I honestly can't tell if you're trolling.

- the C# implementation predates even Promises in JS, so it is not "the same implementation" and your implication that C# was inspired by JS as opposed to the other way around is false. More background: [0]

- Typescript works fine with the JS implementation so any differences aren't for type safety reasons, but largely because C# has a multithreaded event loop unlike JS

Also promises (or "futures" as they're called elsewhere) aren't unique to any language. They're used in lots of places that predate both C# and JS's use, for example the twisted framework in Python.

[0]: https://news.ycombinator.com/item?id=37438486


You're implying people went from writing C# to JS code willingly, and not only wrote it but went on "improving" the ecosystem. I just don't believe there are people insane enough to do that willingly, so "it was invented separately in both instances" is far more likely.


I can't tell if you're talking about the language designers or users.

For language designers I point out in my other comment that Anders authored both C# and TS. TS' influence on ES6 is documented publicly.. heck TC39 has an open proposal to add type annotations to JS now!

As for users, any dev that touches frontend has to write JS, unless you're purely a mobile or desktop shop (even then, there's electron). So yes, I think tons of folks willingly write C# and JS/TS. I'm certainly one (though write more Python than both these days). Was I an early adopter of async in Python because of my familiarity of it in C#/JS? You bet I was. Maybe I'm "insane."


> concurrency where your state goes everywhere and you somehow collect it back at some point was always (literally) the textbook example of how not to do it.

Can you say why it's an example of how not to do it, in your opinion? What are the obvious issues with this approach?


> Promise/Future style of async is just a bad idea

JVM's futures are a joy to work with compared to JS's promises (or Kotlin's coroutines for that matter). While similar, I don't think you can conflate them.


You're conflating the idea of using channels with green threads. They are different, you can easily use channels with async/await and global state/mutexes with green threads.


I somewhat agree with the author; sometimes with async Rust I need to figure out how to tell the compiler that yes, I want to recursively call this async function. This can be a huge pain, especially because it's not always clear what went wrong.
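(For anyone who hasn't hit this, a sketch of the usual shape of the workaround: box the recursive call so the future type doesn't contain itself.)

    use std::future::Future;
    use std::pin::Pin;

    // A recursive async fn won't compile directly; returning a boxed future
    // breaks the infinitely-sized type.
    fn countdown(n: u32) -> Pin<Box<dyn Future<Output = ()> + Send>> {
        Box::pin(async move {
            if n > 0 {
                countdown(n - 1).await;
            }
        })
    }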

Other times, however, Rust stops me from writing buggy code in places where I didn't quite understand what I was doing. In some sense it can help you understand your software better (when the problem isn't an implementation detail).

I get the author's frustration; I often have the same feelings. Sometimes you just want to tell Rust to get out of your way.

As an aside, I think there is room for a language similar to Golang but with sum types and modules; it would be a joy.


What do you mean by modules?


I wish people would stop saying concurrency and parallelism are different.

Concurrency is a subtype of parallelism. All concurrency is parallelism, but leaving some aspects of parallelism off the table.

I've worked in both worlds: I've built codes that manage thousands of connections through the ancient select() call on single processes (classic concurrency- IO multiplexing where most channels are not active simultaneously, and the amount of CPU work per channel is small) to synchronous parallelism on enormous supercomputers using MPI to eke out that last bit from Amdahl's law.

Over time I've come to the conclusion that the sweet spot is a thread pool (possibly managed by the language runtime) that uses channels for communication, with optimizations for work stealing (to keep queues balanced) and for eliminating context switches. Although it does not reach the optimal throughput of the machine (because shared memory is faster than message passing), it's a straightforward paradigm to work with, and the developers of the concurrency/parallelism frameworks are wise.
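Something like this bare-bones shape, sketched here with the crossbeam-channel crate (no work stealing, just the pool-plus-channel skeleton):

    use crossbeam_channel::unbounded;
    use std::thread;

    fn main() {
        let (tx, rx) = unbounded::<u64>();

        // A fixed pool of worker threads, all pulling jobs off one channel.
        let workers: Vec<_> = (0..8)
            .map(|id| {
                let rx = rx.clone();
                thread::spawn(move || {
                    // recv() returns Err once all senders are dropped.
                    while let Ok(job) = rx.recv() {
                        println!("worker {id} handled job {job}");
                    }
                })
            })
            .collect();

        for job in 0..100 {
            tx.send(job).unwrap();
        }
        drop(tx); // closing the channel lets the workers exit

        for w in workers {
            w.join().unwrap();
        }
    }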


If that's what you need: it's Erlang / Elixir then.


Async Rust also ends up with these super nasty types involving Future that can't even be named half the time and you have to refer to them by existential types, like `impl Future<Output = Foo>`.

But these existential types can only be specified in function return or parameter position, so if you want to name a type for e.g.:

  let x = async { };
You can't! Because you can only refer to it as `impl Future<Output = ()>` but that's not allowed in a variable binding!


You're not wrong, but I don't really see the problem? Even well before async Rust, closures worked the same way in not being able to specify a concrete type, and `impl Trait` syntax didn't even exist for a while. Annotating local variable types is a way to fix certain things that would otherwise be ambiguous; it's a means to an end, not an end in itself.


Ah that's true, but I think it ends up hairier when you combine the two together and have closures that are async, e.g.:

  let x = || -> i32 { 1 };  // fine
  let x = || -> impl Future<Output = i32> { async { 1 } };  // error: `impl Trait` only allowed in function and inherent method return types, not in closure return types
Unless I'm missing something, sometimes you do have to name the return type of an async closure if it's returning e.g Result<T, Box<dyn Error>>, and use of the ? operator means that the return type can't be inferred without an explicit annotation.


I think part of the disconnect is that that what you're calling an "async closure" is more directly an analog of a "regular" function that happens to return a future rather than an "async fn" declared function; you'd need similar syntactic boilerplate for annotating a function that's not declared as "async". Currently, there is no closure version of an "async fn" in Rust, but it's arguably not particularly necessary because you can use a Future as the async version of a closure in a lot of cases due to them being lazy already. For example, spawning a task with tokio just takes a Future, not a closure like spawning a sync thread.


You can annotate the return type inside the closure in those cases, for example:

    let x = || async {
        let file = std::fs::read_to_string("foo.txt")?;
        Ok::<_, Box<dyn std::error::Error>>(file)
    };


Now, I think async is bad syntactic sugar that hides what's really going on under the surface. And I rail against it all the time. Especially the way dropping in async contaminates code bases by building tendrils across call-sites all through the application. ... But the tools that have been built around it are very useful and there's some good stuff there.

I have some quibbles with this article:

"Rust comes at this problem with an “async/await” model"

No, it does not. It allows for that, and there's a big ... community ... around the async stuff, but in reality the language is entirely fine with operating using explicit concurrency constructs. And in fact for most applications I think standard synchronous functions combined with communicating channels is cleaner. I work in code bases that do both, and I find the explicit approach easier to reason about.

In the end, Async is something people ideally reach for only after they hit the wall with blocking on I/O. But in reality they're often reaching for it just because -- either because it's cool... or because some framework they are relying on mandates it.

But I think the pendulum will swing back the other way at some point. I don't think it's fair to tar the whole language with it.


> It allows for that

This is like saying C++ allows for templates, and there's a big community around it. Sure, but it's the entire community.


Practically speaking: the only libraries I regularly use that demand async are HTTP clients, and those have optional blocking APIs. I still need to sprinkle #[tokio::main] in front of the entry point, but from that point on, everything is blocking.
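For example, a sketch of the blocking flavor (behind reqwest's "blocking" cargo feature):

    // No async/await in sight; reqwest drives its own runtime internally.
    fn fetch(url: &str) -> Result<String, reqwest::Error> {
        reqwest::blocking::get(url)?.text()
    }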

I don't think it's "the entire community" at all. Dealing with futures across library calls is a pain and almost every library that can avoid it, will avoid it.

I try to avoid async code because of its annoying pain points and I rarely see any circumstances where spawning a new thread doesn't work. Sure, there's more overhead, and you need some kind of limiting factor to prevent spawning a billion of them, but async isn't really required in most circumstances.

It's like saying Go allows for generics. Very few people and libraries bother with them. Working with them is kind of a pain. They're there if you want to use them, but you generally don't.


If you're a web facing developer and looking at web services, then I guess you could think this about Rust.

Believe it or not, there's other types of things being built in Rust. Systems work, which I think Rust is more appropriate for.


Almost all Rust libraries just don't deal with concurrency at all.

Maybe async is the most popular concurrency construct there (I have no idea). But the entire population here is small.


I don't get the "Maybe Rust isn’t a good tool for massively concurrent, userspace software" conclusion.

Rust is all about lifetimes and the borrow checker. Async code (a la C#) will introduce overhead in reasoning about lifetimes, and it might not be as "fun" as it is in other languages that make use of GCs and bigger runtimes.

The CSP vs Async/Await discussion is valid, but like in the majority of the cases, the drawbacks and benefits are not language relevant.

In CSP, the concurrent snippets behave just like linear/sequential code, as channels abstract away a lot of the ugly bits. Sequential code tends to be easier to reason about, and this might be very important for Rust considering its design.

A good tool for massively concurrent software will, as expected, depend on the aspects you're evaluating:

- Performance: the text does not show benchmarks establishing Rust as a slow language.

- Code/feature throughput: the overall conclusion from the text is that async Rust is a complex tool and exposes programmers to many ways to shoot themselves in the foot.

Assuming the "Maybe Rust..." is only talking about Async Rust, the existence of big Async Rust projects is a good counter argument. We also have the whole rest of the Rust language to code massively concurrent, userspace software.

Massively concurrent, userspace software tends to be complex and big, to the point that design decisions generally matter far more than the choice of language.

Rust is a modern language with interesting features to prevent programmers from writing unsafe programs, and that is a bigger head start for making those kinds of programs than whether or not you use async code.


This is a pretty interesting article... and I generally agree with the pain points... but I don't really like the conclusion

* While the author states that not many apps "need" high concurrency in userspace... I would invert that and say that we may be missing so much performance, new potential applications, etc because highly concurrent code is so hard to get right. One bit of evidence of this (to me at least) is how often in my career I have had to scale things up due to memory or other resource limitations and not CPU. And when it is CPU, so often looking into it more finds bugs with concurrency that are the root cause or at least exacerbate the issue

* While I completely agree that rust is not easy with async and have myself poked around at which magical type things I need to do each time I have touched async rust code, I don't really like the suggestion being to "go use a different language", first, because if you are picking up rust, you (IMHO) should have a very good reason to already have chosen it. Rust is not easy enough or ubiquitous enough that you should be choosing it "just for fun" and your reason for using Rust should be compelling enough that you (right now) are willing to put in the effort to learn async when you need it

* What the author mentions in the body of the article, but I think is more of what my suggestion would be: don't use async unless you need it! While I would love to see Rust (and think it should) evolve to the point where async is "easy", maybe we instead just need to get more pragmatic in what is taught and written about. I think when people start Rust they want to use all the fanciness, which includes async, and while some of that is just programmers being programmers, I think it is also how tutorials, docs, and general communication about a programming language happen: we show the breadth of capability rather than the more realistic learning path, which leads people to feel like if they don't use async, they aren't doing it right

Finally, I do really hope Rust keeps working on the promise of these zero cost abstractions that can really simplify things... but if that doesn't work, I am at least hopeful of what people can build on top of the rust featureset/toolchain to help make things like async more realistic to be the default without the need for a complex VM/runtime.


I wonder if Rust should have gone down the same path as Java’s Project Loom and implemented async I/O using the same memory model that is used with operating system threads.

I suspect that to take advantage of 1024-thread systems the only sane programming model will be structured concurrency with virtual threads instead of coroutines.

It’s the same progression as we saw in the industry going from unstructured imperative assembly programming to structured programming with modular features.

Both traditional mutexes and to a degree async programming are unstructured and global. They infect the whole codebase and can’t be reasoned about in isolation. This just doesn’t scale.


I believe rust started with green threads early on, before ditching them.

To your point, the C# guys seem to be interested in experimenting with green threads: https://twitter.com/davidfowl/status/1532880744732758018


It was looked at and deemed an inferior design. Especially so given that the existing async/await paradigm in .NET works really well with existing language features, which would make adoption of a green-threads-like approach problematic.


Use Erlang/Elixir for orchestration and call into rust implementations.

It's an amazing combination.


Elixir/Rust is the new Python/C++, and Rustler makes communicating between the two languages super easy: https://github.com/rusterlium/rustler


Yup. I think it's because Elixir has powerful LISP-like meta-programming facilities that it allows the seamless communication Rustler and Zigler provide. I haven't seen anything as good as Rustler for Python despite its popularity.


And if Zig feels like it fits the gestalt of Elixir/Erlang better for you... then there's Zigler (https://hexdocs.pm/zigler/Zig.html). The fact that you can just "insert a little zig code right here" in the middle of your Elixir code in a ~Z sigil is the coolest darn thing there is. I haven't seen something that cool in the way of embedding performance enhancing fragments/snippets in another dynamic/expressive system since Klaus Gittinger's Smalltalk/X (https://live.exept.de/doc/online/english/programming/primiti...)


Java performance is really good, and I find Java development much faster than Rust. Also the JVM has all these awesome performance monitoring and profiling tools. Rust is unmatched for very high performance concurrency, but for most things, a garbage collected language is going to do just great.


Most of the rant, apart from the old-man-yells-at-function-colors part, is about the lifetimes of arguments of async functions. And it's presenting a special case as some kind of pervasive limitation.

Async functions don't have to always own their arguments. Just the outermost future that is getting spawned on another thread has to. The rest of the async program can borrow arguments as usual. You don't need to spawn() every task — there are other primitives for running multiple futures, with borrowed data, on the same thread.
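A sketch of that, assuming the `futures` crate: both futures borrow local data, and nothing needs Arc or a 'static bound because nothing is spawned onto another thread:

    // Both calls borrow `data`; join! runs them concurrently on this thread.
    async fn fetch_len(s: &str) -> usize {
        s.len()
    }

    async fn demo() {
        let data = String::from("borrowed, not owned");
        let (a, b) = futures::join!(fetch_len(&data), fetch_len(&data));
        assert_eq!(a + b, 2 * data.len());
    }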

In fact, this ability for a future to borrow from itself is the reason why Rust has native await instead of using callbacks. Futures can be "self-referential" in Rust, and nothing else is allowed to be.


> At this scale, threads won’t cut it—while they’re pretty cheap, fire up a thread per connection and your computer will grind to a halt.

Maybe in the 2000's but I feel this reasoning is no longer valid in 2023 and should be put to rest.

10k problem.. Wouldn't modern computing not work if my Linux box couldn't spin up 10k threads? Htop says I'm currently at 4,000 threads on an 8 core machine.


The context switch for threads remains very expensive. You have 4,000 threads, but that's lots of different processes spinning up their own threads. It's still more efficient to have one thread per core for a single computational problem, or at most one per CPU thread (often 2 threads per core now). You can test this by using something like rayon or GNU parallel with more threads than you have cores. It won't go faster, and after a certain point, it goes slower.

The async case is suited to situations where you're blocking for things like network requests. In that case the thread will be doing nothing, so we want to hand off the work to another task of some kind that is active. Green threads mean you can do that without a context switch.


> The context switch for threads remains very expensive

It got even more expensive in recent years after all the speculative execution vulnerabilities in CPUs, so now you have additional logic on every context switch with mitigations on in kernel.


Since that time, context switching changed from a O(log(n)) operation to an O(1) one.

I have no doubt that having a thread per core and managing the data with only non-blocking operations is much faster. But I'm pretty sure current machines can manage a thousand or so threads, blocked almost the entire time, just fine.


> Since that time, context switching changed from a O(log(n)) operation to an O(1) one.

I'm not sure how that's relevant here. If, for example, something takes 1ms and I do it 1000 times a second, I'm using 1000ms of CPU time versus not doing it at all. So if you want to use big-O notation in this context it should be O(n), where n is the number of context switches, because you are not comparing algorithms used to switch between threads; you are comparing doing a context switch versus not doing it at all.


> Maybe in the 2000's but I feel this reasoning is no longer valid in 2023 and should be put to rest.

So do we discard existing ways of making software more efficient because we can be more wasteful on more recent hardware? What if we could develop our software such that 2000s computers are still useful, rather than letting those computers become e-waste?


Yes, I think you're generally right. I'm a big fan of this blog post: https://eli.thegreenplace.net/2018/measuring-context-switchi...

> The numbers reported here paint an interesting picture on the state of Linux multi-threaded performance in 2018. I would say that the limits still exist - running a million threads is probably not going to make sense; however, the limits have definitely shifted since the past, and a lot of folklore from the early 2000s doesn't apply today. On a beefy multi-core machine with lots of RAM we can easily run 10,000 threads in a single process today, in production. As I've mentioned above, it's highly recommended to watch Google's talk on fibers; through careful tuning of the kernel (and setting smaller default stacks) Google is able to run an order of magnitude more threads in parallel.


So, in that benchmark, a context switch is comparable to copying 64K of memory, which is kinda significant. I run a heavy-load database with a few hundred threads, and sometimes see it doing 100k context switches per second.


> 10k problem.. Wouldn't modern computing not work if my Linux box couldn't spin up 10k threads? Htop says I'm currently at 4,000 threads on an 8 core machine.

By the 2010s the problem had been updated to C10M. The people discussing it (well, perhaps some) aren't idiots and understand that the threshold changes as hardware changes.

Also, the issue isn't creating 10k threads it's dealing with 10k concurrent users (or, again, a much higher number today).


Async Rust is especially problematic in the enterprise world where large software is built out of micro-services connected through RPC.

Typically, if you want to build something with Rust, it'll have to use async, at least because gRPC and the like are implemented that way. So the vanilla (and excellent, IMO) Rust language doesn't exist there. Everything is async from the get-go.


> Async Rust is especially problematic in the enterprise world where large software is built out of micro-services connected through RPC.

A weird way to use Rust since you can do a lot of messaging within the process, and use the computing power much more efficiently.

RPC is essentially messaging and message-passing. Message-passing is a way to avoid mutable shared state - this is the model with which Go became successful.

RPC surely has its uses, but message passing is another, and very often inferior, solution to a problem set for which Rust has excellent solutions of its own.


RPCs are for separate services possibly operating on separate machines, where in-process message passing wouldn't work.


One of these days I really want to sit down and read a bunch of the takes on different async approaches by the Rust designers, and ponder the design choices in depth. Until then, I will defer judgement. I will say I prefer green threads as a user, but the common argument that they are not a zero-cost abstraction, and thus not appropriate for Rust, makes sense to me atm. I do wish async would get rid of its rough edges though (lack of async drop, async traits, scoped tasks, etc.) and at least become a first class citizen.


Does anyone have links to resources that someone new to Rust should read on how to conceptualize async code in Rust? Based on reading the comments, it would seem there are ways to start with writing synchronous code, and if necessary make it async but do it in such a way that is runtime agnostic... don't just reach for Tokio.

If I'm implementing a library, how should I write it so that the consumer of the library doesn't have to pull in Tokio if they don't want to?


I really can't even begin to empathize with the author. I find async/await to be significantly more ergonomic than the alternative in almost every use case.

The arguments about Arc fall flat, because how else would you safely manage shared references, even in other lower-level languages? And so-called "modern GCs" still do come with a significant hit in performance; it's not just some "bizarre psyop".

Really the only problem I've run into with Rust's async/await is the fact that there is not much support for composing async tasks in a structured way (i.e. structured programming) and the author doesn't even touch on this issue.

Ultimately the goals and criticism of the author are just downright confusing because at the end he admits that he doesn't actually care for the fact that Rust is design constrained by being a low level language and instead advocates for using Haskell or Go for any application that requires significant concurrency. So to reformulate his argument: we should never use or design into low level languages an ergonomically integrated concurrency runtime because it may have a handful of engineering challenges. When put concisely, their thesis is really quite ridiculous.


I find this post extremely weird. The author starts by saying sharing state is bad and then goes on to complain that sharing state in async Rust is hard, which is like... use channels if you don't want to pass mutexes everywhere?


Async Rust is a bit half-baked still, but the Rust folks are quite aware of that, and working to improve things. See https://rust-lang.github.io/wg-async/welcome.html and https://github.com/rust-lang/wg-async


To do user space concurrent software efficiently, one really needs at least some degree of OS cooperation. Only the system knows what kind of work is running on the machine and what the relative priorities are. One problem with the common executors is that they attempt to grab all the resources the system can offer - if you have multiple applications like that, you end up with extra thread contention. I also agree with the article in that the common coroutine as state machine model has its pitfalls.

With all this in mind, I really like Swift concurrency runtime. It does automatic thread migration and compaction to reduce the overhead of context switches, balances the thread allocation system-wide taking relative priorities into account, and it appears to be based on continuations instead of state machines. A very interesting design worth studying IMO.


Rust still isn't the language I'm looking for.

It's too complex.

Something simpler is needed with the benefits of memory safety.


So anything from Python, Perl, PHP, Visual Basic, Java, C#, Go, Ruby, Erlang, OCaml, Scheme, Common Lisp?


I don’t know enough rust to comment on that part of the article but I’ve run GC-based systems at scale and load and the remarks about that do not match my experience at all.

I've coded performant applications on an OS that used channels and it sucked. It just got in the way and was confusing to engineers used to lower level constructs. "Just get out of my way!"

I think rust async is hard.

And that's what it comes down to. 99.9% (maybe more nines) of people do not need that level of control. They need conceptually simple things, like channels, and GC, and that will work for nearly everyone. The ones who need to drop to rust either have the engineers to do that, or their problem is intractable (for them). I pity those who drop to rust prematurely because it's cool.


> an OS that used channels

I'm very curious; what OS is this?


> We want to use the whole computer. Code runs on CPUs, and in 2023, even my phone has eight of the damn things. If I want to use more than 12% of the machine, I need several cores.

Isn't that already, in this strong generality, an almost always wrong assumption?

Sure, one can do massively parallel or embarrassingly parallel computation.

Sure, graphic cards are parallel computers.

Sure, OS kernels use multiple cores.

Sure, languages and concepts like Clojure exist and work - for a specific domain, like web services (and for that, Clojure works fascinatingly well).

But there are many, even conceptually simple algorithms which are not easy to parallelize. There is no efficient parallel Fast Fourier Transform I know of.


And there are even different degrees of parallelization. Some things will scale almost linearly to CPU cores, some will share a little state and see diminishing returns, some will share a lot of state and maybe only make good use of 2 cores, and it'll all depend on the hardware too.


Well, if he wants to get close to using 12% of the machine, he'll need the SIMD intrinsics that are hidden behind `unsafe` :(


In the past I did concurrency with non-preemptive multitasking and all I/O was handled by an event-loop. I find this strictly superior to async. It seems to have about 0.1% the popularity of async though.


> Some problems demand a lot of concurrency. The canonical example, described by Dan Kegel as the C10K problem back in 1999, is a web server connected to tens of thousands of concurrent users. At this scale, threads won’t cut it—while they’re pretty cheap, fire up a thread per connection and your computer will grind to a halt.

Try it. It'll probably work fine. It may be very expensive, memory wise, but it's easy to get a machine with a lot of memory.


It's not just that. As you increase OS thread active count, each thread starts to respond slower and slower.

It's been tried, periodically. Still sucks.


Write a little program that starts up 10k threads that just wait. The other tasks on the machine won't be any slower once they're set up.

Of course, if they're doing real work they'll be using CPU time, but that's true of any scheme you might pick.
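A sketch of exactly that experiment, for anyone who wants to try it:

    use std::thread;
    use std::time::Duration;

    fn main() {
        // Spawn 10,000 threads that just sleep.
        let handles: Vec<_> = (0..10_000)
            .map(|_| {
                thread::Builder::new()
                    .stack_size(64 * 1024) // small stacks keep the memory bill down
                    .spawn(|| thread::sleep(Duration::from_secs(60)))
                    .expect("spawn failed")
            })
            .collect();
        println!("spawned {} threads", handles.len());
        for h in handles {
            h.join().unwrap();
        }
    }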


Obviously. My point is that spawning 10_000 "processes" (green threads / fibers, really) on the Erlang BEAM VM is almost not noticeable at all f.ex. in web server mode. Everything just gets a tiny little bit laggier but chugs along nicely. Same goes for Golang's goroutines, though not exactly to the same extent (the runtime does not tolerate as huge a number as easily as Erlang's runtime).

Whereas spawning native OS threads (not sure about the 10k number, could be even more with the good hardware these days) and having them all do stuff is gonna lag a whole lot more due to context switches.

So you know, apples to apples, but some apples are much better than others.


Rust is designed like this because it seeks to achieve zero-cost abstractions and safety.

Or in other words, the goal is that you can think in the abstract about what the natural, optimal machine code would be for a program, and you can write a Rust program that, in principle, can compile to that machine code, with as few constraints as possible on what that machine code looks like.

Unlike C, which also has this property, Rust additionally seeks to guarantee that any code will satisfy a bunch of invariants (such as that a variable of a data type actually always holds a valid value of that data type), provided the unsafe parts of the code satisfy a bunch of invariants.

If you use Go or Haskell, that's not possible.

For example, Go requires a GC (and thus wastes CPU cycles uselessly scanning memory), and Haskell requires using memory to store thunks rather than the actual data and has limited mutation (meaning you waste CPU cycles uselessly handling lazy computations and copying data). Obviously neither of these is required for the vast majority of programs, so choosing such a language means your program is unfixably handicapped in terms of efficiency, and has no chance of compiling to the machine code that any reasonable programmer would conceive as the best solution to the problem.


Is it possible to use Rust and all the important 3rd party crates without using async?

Out of curiosity, could Rust be limited to a language subset to mimic the simplicity of Golang (with channels and message passing) and trade-off some of the powerful features that seem to be causing pain?

Pardon a naïve question. I’m a systems engineer who occasionally dabbles with simple cli tools in all languages for fun, but don’t have a serious need for them.


I'd really like something like this! I would also love a Cargo switch / config that disables all panicky Rust API like `expect` and `unwrap`.

From what I can gather, such projects will never happen though. That's why I moved part of my work to Golang itself.

Rust is an amazing language. The team really takes the "systems language" thing very seriously, though, and they're making decisions and tradeoffs based on that, so it seems we, its users, should adapt and not use Rust for everything. That's what I ended up doing.


Although still experimental, GHC Haskell has a linear types extension that enables developers to specify lifetimes [0] that can be statically checked.

Good call re: garbage collection FUD. Ultimately many programs have to clean up memory after it is no longer needed, and at a certain scale it becomes necessary to write code that handles allocations/deallocations; you end up manually writing a garbage collector. Done well, you can get better performance for certain cases, but often it's done haphazardly and you end up with poor performance.

It seems a good amount of Rust evangelism has given up on the, "no GC is required for performance," maxim. Is that the case, Rust friends?

That being said, I think it would be neat if there were a language like Haskell where there was an interface exposed by the compiler where a user could specify their own GC.

[0] https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/line...


I think the people behind Rust are working on particularly hard and complicated problems, so I don't want to say that the async Rust work is bad. I would instead say it is complicated and hard to understand and use casually.

Async Rust is many language features and behaviours all interacting with each other at the same time, creating something more complicated than how you would describe the problem you're actually trying to solve (I want to do X when Y happens, and I want to do X when Y happens × the number of hardware threads). When you're using async Rust, you have to think more carefully about:

* memory management (Arc) and safety and performance

* concurrency

* parallelism, arbitrary interleavings

* thread safety

* lifetimes

* function colouring

All interacting together to create a high cognitive load.

Is the assembly equivalent of multithreading and async complicated?

Multithreading, async, coroutines, concurrency and parallelism are a research hobby I enjoy. I journal about them all the time.

* I think there's a way to design systems to be data intensive (Kleppmann) and data orientated (Mike Acton) with known-to-scale practices.

* I want programming languages to adopt these known-to-scale practices and make them easy.

* I want programs written in the model of the language to scale (linearly) by default. Rama from Red Planet Labs is an example of a model that scales.

* HN user mgaunard [0] told me about "algorithmic skeletons" which might be helpful if you're trying to parallelise. https://en.wikipedia.org/wiki/Algorithmic_skeleton

I think the concurrency primitives in programming languages are sharp-edged and low-level; people reach for them and build upon primitives that are too low level for the desired outcome.

[0]: https://news.ycombinator.com/item?id=36792796

[1]: https://blog.redplanetlabs.com/2023/08/15/how-we-reduced-the...

Note: You can use async Rust without threading but I assumed you're using multithreading.


Whether or not you agree with the conclusion, this is an excellent article. It gives a great overview of the background topics (I had a few gaps filled in!), and does a really good job of explaining the problem.

Re: the conclusion, I wonder if this is a problem that can be solved over time with abstractions (i.e. async Rust is a good foundation that's just too low-level for direct use)?


As far as I understand the article, the author is talking about "massively concurrent userspace programs" which also have one more constraint: they can't rely on IPC to breakup or offload that concurrency.

(They mention this extra constraint early in the article: "But this approach has its limitations. Inter-process communication is not cheap, since most implementations copy data to OS memory and back.")

I'm familiar with writing services with large throughputs by offloading tasks onto a queue (say Redis/Rabbitmq whatever) and having a lot of single threaded "agents" or "workers" picking them off the queue and processing them.

But as implied in the earlier quote from the article, this is not an acceptable fast or cheap enough solution for the problems the author is talking about.

So now am left wondering: what are some examples of the class of (1%) problems the author is talking about in this article?


I am very happy someone finally addressed the concerns, and kudos for breaking out of the anti-gc zealotry.


Not a single mention of Erlang. Haskell's parallelism is very much inspired by Erlang's model.


Can someone explain where the stackless/stackful naming comes from and what it means?


Rust futures don't have stacks; all local variables get translated into struct fields, so suspending execution at an await point is trivial: you just start executing another future. It also means there is no need for a runtime for a future to make sense (it's just a struct with a trait implemented), and futures are completely static and amenable to compiler optimisations like inlining. On the other hand, it means that the structure is by necessity self-referential (one "local" variable might refer to another, so the struct has addresses in it), which means it can't be safely moved (because the pointers would point at the old address).

"Stackful" coroutines, on the other hand, do have runtime stacks (holding local variables) that get swapped out by the runtime on await points. It makes the code behave exactly like non-async code, but requires a runtime to manage those stacks. Rust didn't go this way, preferring the benefits of the stackless approach.


> You might say this isn’t a fair comparison—after all, those languages hide the difference between blocking and non-blocking code behind fat runtimes, and lifetimes are handwaved with garbage collection. But that’s exactly the point! These are pure wins when we’re doing this sort of programming.

Until all the work you're trying to push is generating so many allocations that your GC goes to shit once every two minutes trying to clean up the mess you made. (https://discord.com/blog/why-discord-is-switching-from-go-to...)


Go's GC is hardly state of the art.


I got no horse in this race -- I like both Golang and Rust and use them for different things -- but as far as I can tell, Golang's GC has improved a lot.

Not sure where it stands on a global competition rating board (if there's even such a thing) but it's pretty good. I've never seen it crap the bed, though I also never worked at the scale of Twitch or Discord.


I have a lot of respect for Discord's technical decisions. They know when to do things the bland way and when to use more specialized technologies. Note that that article also praises async Rust.


Anyone know why there isn't a single type/interface that allows for consumers to supply any of Arc, Rc etc boxed values?

I haven't investigated it deeply, but I was developing something in Rust, and whether something needs to be thread-safe or not depends entirely on the consumer's use case... it's bad separation of concerns for the provider of a generic interface to have to specify the specific type of boxed value. 100% fine if the behavior in this case is to pre-allocate the max possible boxed-type memory requirement.

This is the only thing I was really frustrated with in Rust


> Anyone know why there isn't a single type/interface that allows for consumers to supply any of Arc, Rc etc boxed values?

Your gene