Tokio internals: Understanding Rust's async I/O framework (cafbit.com)
110 points by bowyakka on Dec 20, 2017 | hide | past | favorite | 64 comments

I'm one of the authors of Tokio; hopefully I can clarify some points.

> Unfortunately, Tokio is notoriously difficult to learn due to its sophisticated abstractions.

IMO, this is largely due to the current state of the docs (which are going to be rewritten as soon as some API changes land).

The docs were written at a point where we were still trying to figure out how to present Tokio, and they ended up focusing on the wrong things.

The Tokio docs currently focus on a very high-level concept (`Service`), which is an RPC-like abstraction similar to Finagle.

The problem is that Tokio also includes a novel runtime model and futures system, and the docs don't spend any time explaining them.

The next iteration of Tokio's docs is going to focus entirely on the "tokio-core" level: the reactor, the runtime model, and TCP streams.

tl;dr, I think the main reason people have trouble learning Tokio is that the current state of the docs is terrible.

> Aren’t abstractions supposed to make things easier to learn?

Tokio's goal is to provide abstractions that are as ergonomic as possible without adding runtime overhead. Tokio will never be as "easy" as high-level runtimes, simply because we don't accept the overhead that comes with them.

The abstractions are also structured to help you avoid a lot of errors that tend to be introduced in asynchronous applications. For example, Tokio doesn't add any implicit buffering anywhere. A lot of other async libraries hide difficult details by adding unlimited buffering layers.

I've written a few production-facing applications with Tokio, and I think the author identifies exactly the stumbling blocks I hit while learning the landscape.

My takeaway from working with Tokio is that it's a fairly low-level abstraction and doesn't do much to address the challenges of building networked _applications_. And this is OK.

We'll need higher-level layers that use Tokio, however, to address more specific use cases. I'll point to the nascent tower-grpc[1] library as something in this direction. I hope to see more things like this fall out of our work on Conduit[2].

[1] https://github.com/tower-rs/tower-grpc

[2] https://github.com/runconduit/conduit

> Unfortunately, Tokio is notoriously difficult to learn due to its sophisticated abstractions.

Aren’t abstractions supposed to make things easier to learn? Something about the idea of “complex abstractions” seems wrong.

(Edit: this is not a criticism of Tokio, it’s a criticism of the OP’s characterization of “sophisticated abstractions” which IMO should reduce complexity)

> Aren’t abstractions supposed to make things easier to learn?

Not always. Some abstractions are designed to make it easier to solve hard problems correctly than it would be without the abstraction, not necessarily easier to learn.

For example, consider Rust's memory model. Many people criticize that model as difficult to learn; by comparison, you might argue that C's memory model is simpler. Yet the C approach to allocating, using, and freeing memory is highly error-prone. C programs have historically been riddled with mistakes such as use-after-free errors and buffer underflows, overflows, and reuse. The high-profile OpenSSL Heartbleed vulnerability was an example of a weakness in C's memory model and memory-handling abstractions [1].

Rust's memory model may be more difficult to learn than C's, but once learned, its abstractions provide an advantage in building correct software by ruling out certain classes of mistakes. (GC in languages like C#, Java, and Go can also prevent these mistakes, but it comes with a runtime cost. Rust aims to provide zero-cost abstractions.)

Building correct async I/O programs on top of raw kernel abstractions is difficult for reasons similar to why it's difficult to write correct programs with C's memory model. It's especially difficult if you want the async I/O program to be portable across multiple OSes/kernels. I have not used Tokio, but I would guess that its Rust-powered abstractions make it difficult or impossible to leak memory or sockets, or to fail to handle error cases that might arise in async I/O.

[1] https://www.seancassidy.me/diagnosis-of-the-openssl-heartble...

Yeah, but I actually have a reasonable chance of accomplishing what I want in C++.

vs Rust where I bash my head against it for 2 days then give up. I'm not smart enough for Rust, oh well.

I can totally understand that opinion, but let me give a further view on that:

Over the last several years I implemented dozens of async/event-driven networking libraries, and about four years ago I wanted to do the same in Rust. I believe I even started work on one of the first async I/O libraries for it (https://github.com/Matthias247/revbio). However, I got frustrated quite quickly, since the ownership model makes the typical solutions for asynchronous programming (e.g. callbacks or callback interfaces) super hard. So I gave up on that (and also, temporarily, on the language).

However, some years (and some evolution of Rust) later, my viewpoint changed a bit:

- Asynchronous I/O is generally messy and leaves a lot of room for errors. E.g. each callback that is involved might cause reentrancy problems or invalidations, object lifetimes might not be well-defined or might not match expectations, etc. While most languages still allow it, it's hard to get fully right, especially when manual memory management is involved. Async I/O plus multithreading is mostly a recipe for disaster.

- Rust just puts these facts directly in our face, and wants you to go the extra mile to prove that things actually work. It's far from easy to figure out how to do this in a sane way. I think the Tokio authors did an awesome job finding primitive abstractions, based on the poll model, that allow for async I/O and use Rust's type system for safety guarantees. It's a little bit awkward to use without syntactic sugar like async/await or coroutines, but I think that is in the nature of the problem.

- Async I/O is probably the domain where Rust is most inconvenient (along with similar problems like object-oriented, callback-driven UI frameworks). Therefore it shouldn't be used as a general measure of how easy or complex Rust is to use.

I am not trying to personally attack you here, but most times when people say something like this, what they are actually saying is: "I think I am accomplishing the same thing in C++, but by not facing (and solving) these problems, I am actually creating a program with nasty hidden bugs that may or may not blow up in my face later".

No, it's more "I know the lifetimes are right, but I can't figure out how to convince the compiler. I give up, I'm just going to go write it in C++".

I'm not the only one to make such comments. They made sure to try and address this in v2 of the book. It still sucks.

In that case RefCell[1] is the right approach. I use it a ton with C callbacks where, like you said, you can't get the compiler to understand lifetimes. You get the added bonus of Rust validating the borrows at runtime and catching any regressions.

In the rare case where the runtime check is too expensive, you're free to turn the types into raw pointers and use an unsafe block, which gives you the exact same constraints as you have in C++.

[1] https://doc.rust-lang.org/std/cell/index.html

Sorry that you had this experience. If you have the time, any concrete specifics about things would be helpful.

On the other hand, a buggy C++ program is still better than no Rust program.

Having had to track down data races and memory corruption in a production C++ codebase, I'm not sure I completely agree with your assessment.

I once spent quite a few hours debugging a dangling pointer that wasn't zeroed after delete and was written to, which may or may not have corrupted a piece of the Lua interpreter's state.

Fixing that felt good. The preceding 24 hours, not so much.


Data races are particularly nasty because any print statements or debug tracing can trigger a memory fence/reorder and make the problem disappear.

Nothing is more frustrating than adding a printf only to see the issue no longer manifest.

Trivially false: buggy autopilot software is, many times, worse than no autopilot software if the bug is “blow up immediately”.

Some problems are worth the time to solve them.

OK, yes, I wouldn't write an autopilot in C++.

However, for example, for the average desktop or mobile app, it's not clear to me (yet!) that it's worth the pain of writing it in Rust.

For the average desktop or mobile app it’s rarely worth it to write in any language with manual memory management.

But I'm not writing autopilot software, and Rust isn't going to save you from everything.

Rust really doesn’t require much intelligence per se; it’s just a different set of patterns to learn.

I find myself thinking more about data ownership, allocations, and type structure. With C++, it's more about testing memory-ownership correctness and trying to break existing assumptions as I code. Different skillsets entirely!

You may be able to produce C++ faster, but I'd choose maintaining a Rust codebase any day. Memory models require a lot of energy to maintain correctly, and Rust does the heavy lifting for you.

Yup, I'd even go further to say that if you don't need the memory or performance advantages that come with native languages it's probably a better idea to reach for your favorite flavor of GC'd language instead.

That said, when you need it, Rust does the right thing, both in keeping you from blowing your foot off and in making cross-platform development a breeze.

I can tell you from experience that if you persistently ask newbie questions on the Rust help channels, they'll get enough information out of you that they can say "oh, just put X, Y and Z on line 22 of your code" and it will work, even if you don't understand it.

An argument could be made around you not being smart enough for C++ either if you are truly unable to write the equivalent program in Rust.

Rust simply makes memory-correctness something the compiler can check. Correct C++ programs are the same as correct Rust programs, the compiler simply isn’t enforcing it.

> Rust simply makes memory-correctness something the compiler can check

and the latter half of that statement is where things get problematic. Proving it to the compiler has repeatedly been too hard.

If it's too hard, drop to unsafe; there you have the same guarantees as C++.

The real problem is telling when it's impossible, e.g. in recursive data structures like linked lists and trees. That takes practice.

I wouldn't say you're "not smart enough", I would say the subject is not yet well-taught.

You don't have to, as Tokio is just a building block for people who are implementing protocols. But those implementations will have great safe, zero-copy characteristics.

I think you’re comparing apples and oranges here.

Writing memory safe code without Rust is harder than using Rust’s abstractions to do the same task. If you agree with that then my comment stands.

There are various criticisms of Tokio, coming from different directions. Some have to do with the fact that some abstractions in the futures ecosystem are leaky today, and that makes them less easy to use than they could be (though they won't always be leaky [1]). But others have to do with understanding the internal implementation of these abstractions - people who feel they must understand how their library works internally before using it.

Of course, schedulers are just complicated. Most of the time you don't think about how complicated your scheduler is, because it's either an OS primitive in the kernel or a language primitive in your language's runtime. But since Tokio is a library - and modular - it gets criticism for being complex that in my opinion is unfair.

[1] To be more concrete: a future is essentially a state machine representing the stack state at any yield point; it can't (currently) contain lightweight references into itself, because they'd be invalidated when the future is moved around. This means using borrowing in futures programs is often infeasible today. Solutions are in the works.

My point is that “sophisticated abstractions” should reduce complexity, not increase it.

If Tokio’s abstractions are seemingly increasing complexity, maybe they aren’t sophisticated abstractions.

This is a criticism of the OP, not Tokio.

I understood your point. As my comment alluded to, the criticisms of Tokio this blog post is attempting to address have to do with implementation complexity - abstractions do not reduce their own implementation complexity.

It depends on whether an abstraction is intended for novices or, for lack of a better term, power users. Tokio is, IMHO, more of the latter. It's a complex set of concepts that is designed to scale up very well as the complexity of the task increases, but doesn't scale down very well for simple tasks.* Learning quite often involves those simple tasks, so Tokio gets a reputation for being hard to learn. But when you take an easy-to-learn abstraction and try to scale it up to handle very complex problems, you often find the abstraction breaks down much more easily than something like Tokio does.

Tokio doesn't subscribe to the Larry Wall philosophy of making the easy things easy and the hard things possible. It seems more focused on making the hard things as easy as possible, without much regard for the easy things.

* Before anyone attacks this...yes, you can accomplish simple tasks in Tokio, but it requires learning a lot more concepts than should be necessary to accomplish that simple task.

So what, if you aren't already an expert then go away you aren't wanted?

I'd argue it should be "abstractions take care of problems you don't care about". What you care about may be different from others, hence the variety of abstractions. Some (many!) favor ease-of-learning for general use, some favor safety at all costs, some favor explicit memory layout, some hardware independence, etc.

I think the author's quote is a bit of an overstatement. Reading the docs on tokio_core, it seems that knowledge transfers readily if you've worked with Node, browser JS, Java futures, or a game engine -- any of these, of course, means you've been exposed to similar abstractions, potentially with different terminology.

I think the quote is to be interpreted in terms of, if you've only ever seen blocking IO, and have never seen async IO or deferred computation, you have to learn some things first, but this isn't unique to Tokio in particular.

Actually, Tokio's futures are a bit different from async I/O (including most implementations of futures) in other ecosystems. Nothing world-breaking, but they have shaped the abstraction a bit differently in order to be more efficient with respect to memory management, so it can be a bit surprising.

Reading the Rust subreddit, it seems like around 90% of opinions about Tokio are negative; that's why they are rewriting it.

> The tokio-core crate provides the central event loop

Does it mean it only uses a single thread for IO notifications?

If yes, the performance won’t be exceptionally great, especially on servers with many CPU cores and fast network interfaces.

The underlying OS APIs (epoll, kqueue, and IOCP) all support multithreaded asynchronous I/O, so that's not a platform limitation.

You can spawn as many reactors as you would like. The only thing a reactor does is receive events from epoll (or another system selector) and notify the associated task. The task could be on the reactor thread or on a different thread.

Generally speaking, how to optimize concurrency for a network based application is pretty use case specific.

tl;dr, you can fully take advantage of many-core systems with Tokio.

High I/O systems usually spawn multiple reactors, e.g. one reactor per CPU core, and run these reactors on the same set of file descriptors.

Does that library support such a use case? Or does it imply a 1-to-many relation between reactors and files/sockets? The latter doesn't scale well.

Tokio is about being flexible (true today, even more true in the upcoming release). It is a set of primitives that you can assemble in a way that fits your needs. You can structure the concurrency of your application however you want.


> High I/O systems usually spawn multiple reactors, e.g. one reactor per CPU core.

Depends on what you are calling multiple reactors. If you mean a loop that responds to events and runs tasks, then yes. For example, you can plug in [this](http://github.com/carllerche/futures-pool) as the task executor and get a multithreaded, work-stealing scheduler.

Or, maybe you are talking about OS level selectors (epoll), in which case you are going to run up against OS limitations.

> you are talking about OS level selectors (epoll), in which case you are going to run up against OS limitations.

Yes, about them, but I'm not sure which OS limitations you mean?

For Windows, IOCP has worked fine since the very first version in NT 3.51.

For Linux, it indeed didn't work in the very first version of epoll, but they fixed that by adding the EPOLLONESHOT flag, and more recently the EPOLLEXCLUSIVE flag for accept(), allowing proper scaling of I/O readiness notifications across multiple CPU cores.

You can get this working with Tokio if you know what you are doing.

However, I personally advise against it, as I have found that deferring to the OS for scheduling results in poor thread affinity (your state gets bounced around threads unnecessarily).

You generally get better throughput by either running multiple fully isolated reactors (the Seastar approach) or running a single reactor which only flags tasks as ready to be executed, and then using a work-stealing thread pool to do the actual execution.

Both of these structures are easy to achieve with Tokio.

libevent2 (which tokio is very similar to) has the same constraint:

> Currently, only one thread can be dispatching a given event_base at a time. If you want to run events in multiple threads at once, you can either have a single event_base whose events add work to a work queue, or you can create multiple event_base objects.


We don't want to have to decide which thread will handle each connection, just pick an idle one.

Yes you can do this. The actual socket is not pinned to any thread, so you can move it to a thread pool or whatever.

I prefer Go's syscall and os packages. Just direct interfaces to the C functions, no abstractions and no cruft.

That's still an abstraction, just one built into the language. And Go's approach (as well as other things, like garbage collection) requires a runtime; while that does support an interesting programming model, it also makes Go unusable for a variety of use cases that need to run without a runtime.

This approach in Rust, along with Rust's approach to memory management and other things, allows it to run without a runtime, which lets it work anywhere C does.

There's another reason Rust's ecosystem uses futures (as opposed to having some library-based greenthreading system): each future is like the stack of a userspace thread, but perfectly sized (it will be as large as the largest stack space needed at a yield point). This reduces the memory footprint of services using futures.

What use cases need no runtime (genuine question)? I understand the benefit of being compiled, since you can target a cpu instruction set and run without dependencies.

Low-level firmware, embedded applications, OS kernels, interrupt service routines, applications that can't afford the latency of being periodically interrupted, platforms where you don't have a scheduler or most OS services, libraries intended to be loaded into other applications written in other languages (e.g. where you don't control the main program entry point), writing language runtimes themselves, etc.

Early Rust, pre-1.0, did have a runtime and a green-threads mechanism; they ripped it out because they recognized that they couldn't go everywhere they wanted to go if they kept it. And if they hadn't done that, I believe Rust would have been far less successful than it has been.

So I'm guessing it's the runtime itself that expects to interact with things like an OS scheduler. Does that mean if you attempted to write an OS in Go, you'd basically have to do it without many of the features of the runtime (that provide the abstractions that are useful for application programming)? That would be pretty crummy.

> Does that mean if you attempted to write an OS in Go, you'd basically have to do it without many of the features of the runtime (that provide the abstractions that are useful for application programming)? That would be pretty crummy.

You'd have to write the lowest-level of it in a language that wasn't Go. That language could be C, or Rust, or a pseudo-Go that didn't have many of the features people expect (including garbage collection and goroutines).

Isn't Go's green-thread offering part of its runtime? If you use a Go channel or goroutine, is that not calling some Go standard library abstraction? What you're talking about seems like a pretty non-idiomatic use of Go.

I'm not familiar enough with Go, but from your description it seems like, in the Rust world, you're saying "I prefer the mio crate[1]".

[1] https://docs.rs/mio/0.6.10/mio/

AFAICT, a more Go-like approach would be to use std::fs inside a std::thread.

What's missing here is that Go has a userspace scheduler and green threads. Despite presenting the user with something that looks like old-school blocking I/O, because you are only blocking a green thread, it is actually asynchronous I/O under the hood.

And of course there is a huge amount of abstraction to build this up. It's just all hidden in the language runtime, so people can say with a straight face that there is "no abstraction and no cruft."

Go's I/O is anything but a direct interface to the C functions. Its scheduler is a large abstraction.

For all those asking what the OP is talking about, this is an example of using epoll directly in Go:


This is a description (from dotGo 2017) of how netpoll works in Go under the hood, if anyone is interested:


I am not sure what exactly that gist tries to show. As it forces the Go runtime to run each read call on a separate native thread, it just shows that one can use Go to implement a rather bad epoll antipattern.

My understanding is that Go programs typically tackle these problems with goroutines and channels which are definitely an abstraction over kernel functionality.

Does Go use I/O completion ports on Windows? I know Tokio does, so if that's not the case there's probably a pretty large performance delta.

That's not idiomatic Go, is it? Go also abstracts I/O by converting it to async under the hood.

Go I/O is implemented with green threads, using a scheduler underneath.

Rust had green threading in its initial phases, but it was voted out.
