Monoio – A thread-per-core Rust async runtime with io_uring (github.com/bytedance)
158 points by losfair 46 days ago | 81 comments



I truly appreciate that this team uses nightly Rust and only runs on Linux currently (due to relying on io_uring). Truly in the spirit of systems programming - focusing on a single, tight, efficient implementation first and leaving other considerations such as cross-platform compatibility for later. Lets those of us who love to live on the bleeding edge have our nice things too! :)



Following that through to the blog post ([1]), it's interesting how much new API they had to add. The Windows IO API is already completion-style (as opposed to epoll, etc's readiness-style) - userspace submits an async operation to the kernel, blocks on a channel to receive its result, and the kernel enqueues the result to said channel when it's done. So I naively assumed that they'd "just" have to refit io_uring's API on top of the existing Win32 API.

[1]: https://windows-internals.com/i-o-rings-when-one-i-o-operati...


io_uring is used for disk I/O as well, and Windows disk I/O completion has similar limitations to Linux aio for disks, i.e. anything that actually goes into filesystem code, like allocating new blocks, creating files, anything involving metadata, still blocks. Readiness-oriented I/O of course doesn't work for disk I/O at all, because disks are always ready. And IOCPs still mean that for queuing new I/O you're going to call into the kernel at least once for each operation.


> for later

or for never :P

why not prove like... "this is the best. all other platforms/circumstances/situations are subpar"

if the performance differences are enough, i can picture people making excuses to avoid all of those other platforms (or just never using this... more likely)


Agreed. I wouldn't care if Rust dropped Windows/OSX support entirely. Totally niche platforms.


I think you're being sarcastic, but I'm not totally sure.


I'm honestly not. Desktop platforms like Windows are incredibly niche. How much rust software actually runs on those? Some, of course. But probably an incredible minority.

Supporting them is smart because people expect it, and not supporting them would be bad optics, but if a language legitimately only targeted Linux, and was better for it, I'd be fine with that - it would target the most popular OS by far.


I am a Windows user, so all of the Rust software I use runs on it. And that’s virtually all Rust software. Sometimes you need a small patch or two because someone did something weird with path handling, but 99.99% of it Just Works.

Apparently we haven't published platform statistics since 2019, but according to that year's survey: https://blog.rust-lang.org/images/2020-03-RustSurvey/32-what...

* 55% of Rust users develop on Linux

* 24% develop on Windows

* 23% develop on macOS

So yeah Linux was the most popular, but dropping effectively half of your users isn't always a great idea. Of course, some people will build OS-specific software in Rust, and that's great! But it is a tradeoff you're making.


I'm only saying it's a tradeoff. Obviously given Rust's origins and goals dropping desktop as a target would make no sense - it was designed explicitly for those platforms.

But if a language said "we're not going to support those" I wouldn't care at all, and a massive number of use cases - the majority, I think - would be solved with that language.

In terms of what you develop on, that's a whole other story. The majority, of course, are on Linux. But I'd be interested to know what platform they target.


> In terms of what you develop on, that's a whole other story. The majority, of course, are on Linux

I'm not sure that's as obvious as you're making it sound. The number of people who use Linux on the desktop is absolutely minuscule compared to the combined user base of Windows and MacOS. It's probably not as lopsided for developers, but I've never seen anything to imply that most developers in general are on Linux, and I'd honestly be surprised if that's the case given how much smaller the portion is in terms of people I know or have worked with, and that's as someone who does not personally own any laptop or desktop that runs anything other than Linux.


The linked poll has the majority of rust developers on linux


Sure, but only barely a majority. Nearly half of Rust users are on Windows or MacOS. Dropping support for those would be crazily irresponsible and would probably be a bigger programming scandal than even the Python 2->3 ordeal. And for what benefit? Making some low level libraries easier?


Neat! Thanks for admitting you use Windows.

Perhaps MS could pay for crater runs on Windows? Are crater runs done on Windows now?

Looks like yes, https://github.com/rust-lang/crater/blob/master/docs/agent-m...


Not just that, but like, https://github.com/microsoft/windows-rs. Microsoft is a member of the Rust Foundation. etc etc.


> How much rust software actually runs on those?

Some of rust's most well-known and widely distributed success stories for a start - Firefox & Dropbox


Yeah, obviously. For Rust to be a healthy language, it would be stupid not to support Windows. But I wouldn't care at all, and the vast majority of code - not just the code written, but in terms of it being deployed to N systems - will be Linux.


As much as I like deving only on Linux, I'm pretty sure this would be awful for Rust's continued adoption.


With context switches becoming more and more expensive relative to faster and faster I/O devices, almost the same order of magnitude, I believe that thread-per-core is where things are heading, because the alternative of not doing thread-per-core might literally be halving your throughput.

That's also the most exciting thing about io_uring for me: how it enables a simple, single-threaded and yet highly performant thread-per-core control plane, outsourcing to the kernel thread pool for the async I/O data plane, instead of outsourcing to a user space thread pool as in the past. It's much more efficient and at the same time, much easier to reason about. There's no longer the need for multithreading to leak into the control plane.

My experience with io_uring has been mostly working on TigerBeetleDB [1], a new distributed database that can process a million financial transactions a second, and I find it's a whole new way of thinking... that you can now just submit I/O directly from the control plane without blocking and without the cost of a context switch. It really changes the kinds of designs you can achieve, especially in the storage space (e.g. things like LSM-tree compactions can become much more parallel and incremental, while also becoming much simpler, i.e. no longer any need to think of memory barriers). Fantastic also to now have a unified API for networking/storage.
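
To make the submission/completion model concrete, here's a minimal sketch using the third-party `io-uring` crate (not TigerBeetle's own code; treat the details as illustrative): a single thread queues a read, makes one syscall to submit and wait, and then harvests the completion.

```rust
use io_uring::{opcode, types, IoUring};
use std::os::unix::io::AsRawFd;
use std::{fs, io};

fn main() -> io::Result<()> {
    let mut ring = IoUring::new(8)?; // shared submission/completion queues
    let file = fs::File::open("README.md")?;
    let mut buf = vec![0u8; 4096];

    // Describe the read and push it onto the submission queue (no syscall yet).
    let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);
    unsafe {
        ring.submission().push(&read_e).expect("submission queue is full");
    }

    // One syscall submits the operation and waits for at least one completion.
    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("completion queue is empty");
    assert_eq!(cqe.user_data(), 0x42);
    println!("read {} bytes", cqe.result());
    Ok(())
}
```

The same queues handle sockets and files alike, which is the unified networking/storage API being praised here.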

So much good stuff in io_uring. Exciting times.

[1] https://www.tigerbeetle.com


Totally. And these thread per core apps can talk directly to virtio or NVMe passed into the kvm guest, you can get the best of having a unix host but have applications that run directly under KVM w/o sacrificing a rich control plane. And control plane reliability doesn't impact the data plane. Wonderful times!


There are four articles in Chinese about the design and implementation on one of the authors' blogs. Here's a link to the first one: https://www.ihcblog.com/rust-runtime-design-1/. I don't know the subject or Chinese well enough to tell how much is lost by automatic translation (Google in my case), but it looks relatively good.

As an aside, Google translation integrated in Chrome breaks the formatting of the code blocks, which is surprising since they're in a <pre> block.


I've found in my own use of rust I want async/nonblocking for two things.

1. Be able to timeout a read of a socket

2. Be able to select over multiple sockets, and read whichever is ready first

Usually a combination of both.

epoll/io_uring (I guess? I only ever did research on epoll) seem like the solution being handed to me on a silver platter. However, my understanding is that if you want to use either of those in Rust you're meant to use async, and that while there are some libraries which provide interfaces for this kind of behavior outside of async, they're usually very ad-hoc, and the community is just very laser-focused on async as a language construct. What I don't understand is why Rust considers it necessary to introduce async, futures, runtimes, an executor, async versions of TcpListeners, Files, etc. for this?

Why can't I just have a function in the standard library that takes a slice of std::net::TcpListeners, a timeout, blocks, and then gives me whichever is ready to read, when it's ready to read, or nothing if the timeout is reached? It's not like I was going to do anything else on that thread while I wait for a packet to be received; it can happily be parked.

Instead I have to select a runtime, and libraries compatible with the runtime, replace all the TcpListeners from the standard library I'm using with tokio TcpListeners or whatever, deal with API change, and now I have to deal with the "turtles all the way down" problem of async as well.

That's not even to get into the whole nightmare that a lot of really sick libraries which I want to use in a blocking fashion are now only providing async APIs, which means it's "async or the highway". I am very much not happy about this situation and I don't know what I can do about it. It seems the only response I ever get is "just use it, it's easy", but that is not at all convincing me. I don't want to use it!


You can absolutely do readiness-based IO. Either call the system-specific APIs or use the MIO library, which is a low-level platform abstraction for that: https://github.com/tokio-rs/mio
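
For example, a minimal sketch of the parent's use case with mio 0.8: readiness-based, no async, blocking with a timeout (addresses and the timeout are placeholders):

```rust
use std::time::Duration;
use mio::net::TcpListener;
use mio::{Events, Interest, Poll, Token};

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    // Register several listeners, each identified by a Token (its index here).
    let mut listeners = vec![
        TcpListener::bind("127.0.0.1:8080".parse().unwrap())?,
        TcpListener::bind("127.0.0.1:8081".parse().unwrap())?,
    ];
    for (i, l) in listeners.iter_mut().enumerate() {
        poll.registry().register(l, Token(i), Interest::READABLE)?;
    }

    // Block until one of them is ready to accept, or the timeout elapses.
    poll.poll(&mut events, Some(Duration::from_secs(5)))?;
    for event in events.iter() {
        let Token(i) = event.token();
        // mio is edge-triggered; a real loop would keep accepting until WouldBlock.
        let (_stream, addr) = listeners[i].accept()?;
        println!("listener {} accepted {}", i, addr);
    }
    Ok(())
}
```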


If you need to handle <100,000 sockets, you will probably be fine with a thread per socket. Call set_read_timeout to implement the deadline. Run load tests and adjust the socket limit.
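
A minimal sketch of that approach with nothing but std (the port and timeout are arbitrary):

```rust
use std::io::Read;
use std::net::TcpListener;
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        thread::spawn(move || {
            // A read that takes longer than the deadline fails with
            // WouldBlock/TimedOut instead of blocking forever.
            stream.set_read_timeout(Some(Duration::from_secs(5))).ok();
            let mut buf = [0u8; 1024];
            match stream.read(&mut buf) {
                Ok(n) => println!("read {} bytes", n),
                Err(e) => eprintln!("read failed or timed out: {}", e),
            }
        });
    }
    Ok(())
}
```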

Async lets one process handle millions of sockets.

If you want to handle millions of sockets with threads, you can use `mio` [0]. Mio's API has footguns that can cause UB. If your wire protocol is complicated, you may find yourself implementing something like async, but worse.

[0] https://crates.io/crates/mio


I wonder how much and for what Rust is being used at Bytedance, given that this seems to be under their GitHub org. It's pretty interesting that they are apparently using it.

Edit: For those wondering who Bytedance are, I can save you a Google search. These are the people making TikTok.


Bytedance uses Rust extensively in Feishu/Lark (which competes with Microsoft Teams). It is even listed at https://www.rust-lang.org/production/users

From what I read (in Chinese), this "Monoio" is meant to be used for the proxy/whatever part of their next-generation service mesh. So maybe a corp-wide thing.


It looks like they shared a lot of internal projects: https://github.com/orgs/bytedance/repositories

I'm pleasantly surprised. Have many other Chinese tech companies embraced FOSS?


Yes, it's an increasing trend. Much of it is way less well known outside China, but there's lots of code from Alibaba, Tencent, and smaller companies. Many new CNCF projects are from China.


It's exciting to see another thread-per-core async runtime for Rust. It's truly understated how difficult the Send + Sync requirements in Tokio are for writing regular code. It's typically rare for async tasks in Tokio to be used across two threads simultaneously, but now all of your data must be Send+Sync.

Plus, Tokio is unique in that it's one of the very few runtimes in existence that's work-stealing (meaning the task can move off its original thread). Most other runtimes in other languages do not have that requirement, meaning you can use traditional Cell/RefCell/Rc instead of their slower, atomic variants.

Right now, the best you can do is write a thread pool that spawns a Tokio LocalSet to run !Send futures. In fact, this is what actix-web does to achieve its crazy performance. Web requests finish so quickly that you rarely need work-stealing to achieve good performance, and often the cost of atomics/stealing is greater than the performance gain.
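
Roughly, the pattern looks like this - a sketch assuming Tokio with its full feature set; the worker count and task body are purely illustrative:

```rust
use std::rc::Rc;
use tokio::task::LocalSet;

fn main() {
    // One OS thread per worker, each with its own single-threaded runtime + LocalSet.
    let workers: Vec<_> = (0..4)
        .map(|i| {
            std::thread::spawn(move || {
                let rt = tokio::runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                let local = LocalSet::new();
                local.block_on(&rt, async move {
                    // Rc is !Send, so this future couldn't be tokio::spawn'ed onto the
                    // work-stealing runtime, but spawn_local accepts it.
                    let data = Rc::new(i);
                    tokio::task::spawn_local(async move {
                        println!("worker {} sees {}", i, data);
                    })
                    .await
                    .unwrap();
                });
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
}
```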


> It's truly understated how difficult the Send + Sync requirements in Tokio are for writing regular code. It's typically rare for async tasks in Tokio to be used across two threads simultaneously, but now all of your data must be Send+Sync.

https://docs.rs/tokio/1.14.0/tokio/task/fn.spawn_local.html

> Plus, Tokio is unique in that it's one of the very few runtimes in existence that's work-stealing (meaning the task can move off its original thread). Most other runtimes in other languages do not have that requirement, meaning you can use traditional Cell/RefCell/Rc instead of their slower, atomic variants.

Most other languages don't have any concept of Send/Sync, or non-atomic Cell/RefCell/Rc. Rather than the compiler stopping you, you only find out about the issue when you hit it, and if you're lucky you can skate by a long long while (or alternatively the language has no way to sync and everything's send).

Work-stealing runtimes may be more common than you think because of that: Erlang's BEAM, Go's scheduler(s), Java's fork/join, .NET's TPL, most if not all OpenMP implementations, Apple's GCD, ... implement work-stealing in various measures.

Also... thread-per-core makes work-stealing even more necessary because the OS can't perform the balancing? The only ways to avoid work-stealing eventually becoming necessary (for a general-purpose scheduler) are either to have a completely single-threaded scheduler, or to not have an application-level scheduler at all and use OS threads.


Quoting from a reply on an issue in the rust async-wg repo:

> Are people doing this? Does it work? :)

> Kind of every single successful async framework since the inception of eventloops :-)

> libevent, libev, libuv, boost asio, GTK, QT, seastar, nginx, javascript + node.js, dpdk, netty, grizzly, dart, etc.

> Besides Rust the main frameworks which tried to move tasks between executors are Go and C#'s Threadpool executor (although I think the ASP.NET default executor might have fixed threads).

> Therefore the state of the world is actually more that Rust would need to prove that its approach of defaulting to thread-safe is a viable alternative than questioning the effectiveness of single-threaded eventloops. Their effectiveness in terms of reducing context switches and guaranteeing good cache hit rates was more or less what triggered people to move towards the model, despite the ergonomic challenges of using callbacks.


I have absolutely no idea what the relevance of that mess is.

Are you arguing rust’s async runtimes should be single threaded, right after having somehow expressed interest in a non-single-threaded runtime?


I've been thinking about how one could get around the Send + Sync requirements; it's a fascinating conundrum from a language design standpoint.

If we didn't have the Send + Sync requirements, and multiple async functions are running concurrently on the same thread, and multiple of them locked the same RefCell, might that cause a panic?


> If we didn't have the Send + Sync requirements, and multiple async functions are running concurrently on the same thread, and multiple of them locked the same RefCell, might that cause a panic?

Hopefully.
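
For illustration, a minimal sketch of that failure mode on a single-threaded Tokio runtime: a borrow held across an await point lets another task on the same thread hit a BorrowMutError panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let cell = Rc::new(RefCell::new(0u32));
    let local = tokio::task::LocalSet::new();
    local
        .run_until(async {
            let a = cell.clone();
            let t1 = tokio::task::spawn_local(async move {
                let mut guard = a.borrow_mut();
                // Holding the borrow across an await yields to the other task...
                tokio::task::yield_now().await;
                *guard += 1;
            });
            let b = cell.clone();
            let t2 = tokio::task::spawn_local(async move {
                // ...which now panics with BorrowMutError, even though both
                // tasks run on the same thread.
                *b.borrow_mut() += 1;
            });
            let _ = tokio::join!(t1, t2);
        })
        .await;
}
```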


> the best you can do is write a thread pool that spawns a Tokio LocalSet to run !Send futures

Can you go into this a bit more or provide a code reference? I'm new to writing async Rust code and am interested in this idea.


Yeah, it's actix-web's runtime.

I brought it up to tide's maintainers and got a bunch of relevant links. Feel free to click through the tide issue for more context.

- https://github.com/http-rs/tide/issues/837

- https://github.com/rust-lang/wg-async-foundations/issues/87

- https://github.com/rust-lang/wg-async-foundations/issues/128


There's a ton of bleeding edge and novel work being done by Chinese companies and communities due to their insane scale requirements. I've been trying to learn the language to gain more insight and stay current with what they're doing.


Have any other Rust async runtimes used io_uring/gotten at all good yet?

Best of the best modern systems programmers gotta get good sometime. Not sure if it's happening yet. OK, here's one port of call: https://github.com/tokio-rs/tokio-uring


Glommio uses io_uring: https://github.com/DataDog/glommio

And I integrated Hyper as an example: https://github.com/DataDog/glommio/blob/master/examples/hype...

And the performance was blisteringly quick (6x better latency streaming from a file compared to Nginx).


> Have any other Rust async runtimes used io_uring/gotten at all good yet?

yes, check out `actix-rt`

https://github.com/actix/actix-net


actix-rt is a wrapper around tokio's single threaded runtime, and (optionally) tokio-uring.


I have a stupid question: why isn't an async runtime a language feature of Rust? We don't seem to see so many async runtimes in other languages; they seem to have a default way to run async tasks.


It sort of goes against Rust’s philosophy to bake anything like that into the language.

Where they screwed up was not providing the machinery to make libraries agnostic of the runtime an end user wants to use in their program, so libraries either depend on a specific runtime explicitly or use features to allow users to switch runtimes at compile time. This causes a lot of headaches for library maintainers and end users both.

There’s a lot of interest in adding said machinery (through collections of traits in std) to enable libraries to be generic over different runtimes, but a solution is still some ways off.
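
As a rough illustration of what that machinery could look like, here's a hypothetical sketch; nothing like this `Spawn` trait exists in std today, and all the names are made up:

```rust
use std::future::Future;
use std::pin::Pin;

// Hypothetical: a spawn abstraction that std could one day provide.
pub trait Spawn {
    fn spawn(&self, fut: Pin<Box<dyn Future<Output = ()> + Send + 'static>>);
}

// A library written against the trait instead of a concrete runtime;
// the application decides whether this runs on Tokio, async-std, monoio, etc.
pub fn start_background_job<S: Spawn>(spawner: &S) {
    spawner.spawn(Box::pin(async {
        // ... library work, agnostic of the executor driving it ...
    }));
}
```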


> I have a stupid question, why isn't an async runtime a language feature of rust?

Because it was considered inimical to the core values and purpose of the language, which is to be a systems language.

An async runtime being a language feature means a runtime is a language feature, and that is very much undesirable (in fact it used to be part of the language and was removed as it "settled" into its niche from its original design, which was much higher level and more applicative).


An async runtime inherently requires knowledge of the underlying system, which Rust remains agnostic to. Rust can be written to target everything from bare metal (where no OS exists) to WASM, and everything in between. If a runtime were added to the language itself, you would inherently limit the platforms it could target.


Does anyone know the pros/cons of io_uring vs DPDK in these thread-per-core setups?



