
Tokio internals: Understanding Rust's async I/O framework - bowyakka
https://cafbit.com/post/tokio_internals/
======
carllerche
I'm one of the authors of Tokio; hopefully I can clarify some points.

> Unfortunately, Tokio is notoriously difficult to learn due to its
> sophisticated abstractions.

IMO, this is largely due to the current state of the docs (which are going to
be rewritten as soon as some API changes land).

The docs were written at a point where we were still trying to figure out how
to present Tokio, and they ended up focusing on the wrong things.

The Tokio docs currently focus on a very high level concept (`Service`) which
is an RPC like abstraction similar to finagle.

The problem is that Tokio also includes a novel runtime model and futures
system, and the docs don't spend any time explaining this.

The next iteration of Tokio's docs is going to focus entirely on the "tokio-
core" level, which is the reactor, runtime model, and TCP streams.

tl;dr, I think the main reason people have trouble learning Tokio is that the
current state of the docs is terrible.

> Aren’t abstractions supposed to make things easier to learn?

Tokio's goal is to provide abstractions that are as ergonomic as possible
_without adding runtime overhead_. Tokio will never be as "easy" as high-level
runtimes simply because we don't accept the overhead that comes with them.

The abstractions are also structured to help you avoid a lot of errors that
tend to be introduced in asynchronous applications. For example, Tokio doesn't
add any implicit buffering anywhere. A lot of other async libraries hide
difficult details by adding unlimited buffering layers.
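As an illustration of explicit, bounded buffering, here is a sketch using the
standard library's `sync_channel` (not Tokio's own channel types); the point
is that capacity is a visible decision, not a hidden unlimited queue:

```rust
use std::sync::mpsc::sync_channel;

// A bounded channel makes buffering explicit: once the queue is full,
// the producer blocks (or errors with try_send) instead of the library
// silently queueing unbounded data.
fn main() {
    let (tx, rx) = sync_channel::<u32>(2); // capacity chosen explicitly

    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();
    // The third send fails instead of growing a hidden buffer.
    assert!(tx.try_send(3).is_err());

    assert_eq!(rx.recv().unwrap(), 1);
    // Draining one item frees capacity for the producer again.
    assert!(tx.try_send(3).is_ok());
}
```

The failed `try_send` is the backpressure signal an unlimited buffer would
have hidden from you.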

------
olix0r
I've written a few production-facing applications with Tokio, and I think the
author exactly identifies the stumbling blocks I hit while learning the
landscape.

My takeaway from working with Tokio is that it's a fairly low-level
abstraction and doesn't do much to address the challenges of building
networked _applications_. And this is OK.

We'll need higher-level layers that use Tokio, however, to address more
specific use cases. I'll point to the nascent tower-grpc[1] library as
something in this direction. I hope to see more things like this fall out of
our work on Conduit[2].

[1] [https://github.com/tower-rs/tower-grpc](https://github.com/tower-rs/tower-grpc)

[2] [https://github.com/runconduit/conduit](https://github.com/runconduit/conduit)

------
kindfellow92
> Unfortunately, Tokio is notoriously difficult to learn due to its
> sophisticated abstractions.

Aren’t abstractions supposed to make things easier to learn? Something about
the idea of “complex abstractions” seems wrong.

(Edit: this is not a criticism of Tokio, it’s a criticism of the OP’s
characterization of “sophisticated abstractions” which IMO should reduce
complexity)

~~~
jcrites
> Aren’t abstractions supposed to make things easier to learn?

Not always. Some abstractions are designed to make it easier to solve hard
problems correctly (than without the abstraction).

For example, consider Rust's memory model. Many people criticize that model as
difficult to learn. By comparison, you might argue that C's memory model is
simpler to learn. Yet, the C approach to allocating, using, and freeing memory
is highly error-prone. C programs historically have frequently had mistakes
such as use-after-free errors, or buffer under/overflow/reuse errors. The
high-profile OpenSSL Heartbleed vulnerability was an example of a weakness in
C's memory model and memory handling abstractions [1].

Rust's memory model may be more difficult to learn than C's, but once learned,
they are abstractions that provide an advantage in building correct software,
by ruling out certain classes of mistakes. (GC in languages like C# and Java
and Go can also prevent these mistakes, but comes with a runtime cost. Rust
aims to provide zero-cost abstractions.)

Building correct async IO programs using kernel abstractions is difficult for
similar reasons as it's difficult to write correct programs with C's memory
model. It's especially difficult if you want the async IO program to be
portable across multiple OS/kernels. I have not used Tokio, but I would guess
that its Rust-powered abstractions will make it difficult or impossible to
leak memory or sockets, or to fail to handle error cases that might arise
when handling async IO.

[1] [https://www.seancassidy.me/diagnosis-of-the-openssl-heartbleed-bug.html](https://www.seancassidy.me/diagnosis-of-the-openssl-heartbleed-bug.html)
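As a small illustration of the kind of mistake Rust's ownership rules out at
compile time (plain Rust, not tied to Tokio):

```rust
// Rust's ownership model rules out use-after-free statically: once a
// value is moved or dropped, the compiler rejects any further use.
fn main() {
    let data = vec![1, 2, 3];
    let moved = data; // ownership transfers here

    // println!("{:?}", data); // compile error: value used after move

    assert_eq!(moved.len(), 3);

    // Scoped borrows end automatically: no manual free, no dangling use.
    let first = *moved.first().unwrap();
    assert_eq!(first, 1);
}
```

The commented-out line is exactly the shape of bug (use after the resource is
gone) that C's model only catches, if at all, at runtime.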

~~~
lurr
Yeah, but I actually have a reasonable chance of accomplishing what I want in
C++.

vs Rust where I bash my head against it for 2 days then give up. I'm not smart
enough for Rust, oh well.

~~~
hobofan
I am not trying to personally attack you here, but most of the time when
people say something like this, what they are actually saying is: "I think I
am accomplishing the same thing in C++, but by not facing (and solving) the
problems, I am actually creating a program with nasty hidden bugs that might
or might not blow up in my face later".

~~~
lurr
No, it's more "I know the lifetimes are right, but I can't figure out how to
convince the compiler. I give up, I'm just going to go write it in C++".

I'm not the only one to make such comments. They made sure to try and address
this in v2 of the book. It still sucks.

~~~
vvanders
In that case RefCell[1] is the right approach. I use it a ton with C callbacks
where, like you said, you can't get the compiler to understand lifetimes. You
get the added bonus of Rust validating the borrows at runtime and catching
any regressions.

In the rare case where the lifetime check is too expensive you're free to turn
the types into pointers and use an unsafe block, which gives you the exact
same constraints as you have in C++.

[1] [https://doc.rust-lang.org/std/cell/index.html](https://doc.rust-lang.org/std/cell/index.html)
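A minimal sketch of what `RefCell` buys you: borrow checking moves from
compile time to runtime, and a conflicting borrow becomes a detectable error
(or panic) instead of a compile failure:

```rust
use std::cell::RefCell;

// RefCell counts borrows dynamically: shared and exclusive access are
// still mutually exclusive, but the check happens at runtime.
fn main() {
    let shared = RefCell::new(vec![1, 2, 3]);

    {
        // Immutable borrows can coexist...
        let a = shared.borrow();
        let b = shared.borrow();
        assert_eq!(a.len() + b.len(), 6);
    } // ...and are released here when the guards drop.

    // Now a mutable borrow succeeds because no other borrow is live.
    shared.borrow_mut().push(4);
    assert_eq!(shared.borrow().len(), 4);

    // try_borrow_mut() detects a conflict without panicking.
    let guard = shared.borrow();
    assert!(shared.try_borrow_mut().is_err());
    drop(guard);
}
```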

------
Const-me
> The tokio-core crate provides the central event loop

Does it mean it only uses a single thread for IO notifications?

If yes, the performance won’t be exceptionally great, especially on servers
with many CPU cores and fast network interfaces.

The underlying OS APIs (epoll, kqueue, and IOCP) all support multithreaded
asynchronous IO, so that's not a platform limitation.

~~~
carllerche
You can spawn as many reactors as you like. The only thing a reactor does is
receive events off of epoll (or another system selector) and notify the
associated task. The task could be on the reactor thread or on a different
thread.

Generally speaking, how to optimize concurrency for a network based
application is pretty use case specific.

tl;dr, you can fully take advantage of many core systems w/ Tokio.
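As a rough, plain-std sketch of that shape (one independent loop per thread,
results joined at the end): the function below stands in for a real Tokio
reactor, and none of the names are Tokio APIs.

```rust
use std::thread;

// Toy sketch of "spawn as many reactors as you like": each thread runs
// its own independent loop over the work it owns.
fn run_loop(id: u32, events: Vec<u32>) -> u32 {
    // Each "reactor" only touches the events it owns.
    events.iter().map(|e| e + id).sum()
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|id| thread::spawn(move || run_loop(id, vec![1, 2, 3])))
        .collect();

    // Each loop returns 6 + 3*id, so the four loops yield 6+9+12+15 = 42.
    let total: u32 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 42);
}
```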

~~~
Const-me
High I/O systems usually spawn multiple reactors, e.g. one reactor per CPU
core, and run these reactors on the same set of file descriptors.

Does that library support such a use case? Or does it imply a 1-to-many
relation between reactors and files/sockets? The latter doesn't scale well.

~~~
carllerche
Tokio is about being flexible (true today, even more true in the upcoming
release). It is more a set of primitives that you can assemble in a way
that fits your needs. You can structure the concurrency of your application
however you want.

So...

> High I/O systems usually spawn multiple reactors, e.g. one reactor per CPU
> core.

Depends on what you are calling multiple reactors. If you mean a loop that
responds to events and runs tasks, then yes. For example, you can plug in
[this](http://github.com/carllerche/futures-pool) as the task executor and
get a multi-threaded, work-stealing scheduler.

Or, maybe you are talking about OS level selectors (epoll), in which case you
are going to run up against OS limitations.

~~~
Const-me
> you are talking about OS level selectors (epoll), in which case you are
> going to run up against OS limitations.

Yes, about them, but I'm not sure which OS limitations you mean.

For Windows, this has worked fine since the very first version of IOCP in NT 3.51.

For Linux it indeed didn't work in the very first version of epoll, but they
fixed that by adding the EPOLLONESHOT flag, and more recently the
EPOLLEXCLUSIVE flag for accept(), allowing proper scaling of IO readiness
notifications across multiple CPU cores.

~~~
carllerche
You can get this working with Tokio if you know what you are doing.

However, I personally advise against it as I have found that deferring to the
OS for scheduling results in poor thread affinity (your state gets bounced
around threads unnecessarily).

You generally get better throughput by either running multiple fully isolated
reactors (the seastar approach) or running a single reactor which _only_ flags
tasks as ready to be executed, and then use a work stealing thread pool to do
the actual execution.

Both of these structures are easy to achieve with Tokio.
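A plain-std sketch of the second structure, assuming a shared run queue in
place of a real work-stealing scheduler (none of this is Tokio's actual API):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// One thread only marks work as ready; a pool of workers executes it.
fn main() {
    let queue: Arc<Mutex<VecDeque<u32>>> = Arc::new(Mutex::new(VecDeque::new()));
    let results = Arc::new(Mutex::new(Vec::new()));

    // The "reactor": it never runs tasks itself, it only enqueues them.
    {
        let mut q = queue.lock().unwrap();
        for task in 0..8 {
            q.push_back(task);
        }
    }

    // The worker pool: each worker pulls ready tasks and executes them.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Take the lock only long enough to pop one task.
                let task = queue.lock().unwrap().pop_front();
                match task {
                    Some(task) => results.lock().unwrap().push(task * 2), // "execute"
                    None => break,
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }

    let mut out = results.lock().unwrap().clone();
    out.sort();
    assert_eq!(out, vec![0, 2, 4, 6, 8, 10, 12, 14]);
}
```

A real work-stealing pool gives each worker its own deque and steals from
peers only when idle, which keeps tasks on the thread whose caches are warm.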

------
jbirer
I prefer Go's syscall and os packages. Just direct interfaces to the C
functions, no abstractions and no cruft.

~~~
JoshTriplett
That's still an abstraction, just one built into the language. And Go's
approach (as well as other things, like garbage collection) requires a
runtime; while that does support an interesting programming model, it also
makes Go unusable for a variety of use cases that need to run without a
runtime.

This approach in Rust, as well as Rust's approach to memory management and
other things, allows it to run _without_ a runtime, which lets it work for
anything C does.

~~~
cshenton
What use cases need no runtime (genuine question)? I understand the benefit of
being compiled, since you can target a CPU instruction set and run without
dependencies.

~~~
JoshTriplett
Low-level firmware, embedded applications, OS kernels, interrupt service
routines, applications that can't afford the latency of being periodically
interrupted, platforms where you don't have a scheduler or most OS services,
libraries intended to be loaded into other applications written in other
languages (e.g. where you don't control the main program entry point), writing
language runtimes themselves, etc.

Early Rust, pre-1.0, did have a runtime and a green-threads mechanism; they
ripped it out because they recognized that they couldn't go everywhere they
wanted to go if they kept it. And if they _hadn't_ done that, I believe Rust
would have been far less successful than it has been.

~~~
cshenton
So I'm guessing it's the runtime itself that expects to interact with things
like an OS scheduler. Does that mean if you attempted to write an OS in Go,
you'd basically have to do it without many of the features of the runtime
(that provide the abstractions that are useful for application programming)?
That would be pretty crummy.

~~~
JoshTriplett
> Does that mean if you attempted to write an OS in Go, you'd basically have
> to do it without many of the features of the runtime (that provide the
> abstractions that are useful for application programming)? That would be
> pretty crummy.

You'd have to write the lowest-level of it in a language that wasn't Go. That
language could be C, or Rust, or a pseudo-Go that didn't have many of the
features people expect (including garbage collection and goroutines).

