
Io_uring By Example: cat, cp and a web server with io_uring - shuss
https://unixism.net/2020/04/io-uring-by-example-article-series/
======
Matthias247
> the readv() call will block until all iovec buffers are filled with file
data. Once it returns, we should be able to access the file data from the
iovecs and print them on the console.

This is wrong. readv() can return as soon as a single byte has been read, just
like read(). If you need to read all the bytes, you have to use a loop. As
written, this program can print uninitialized memory.

I think the same applies to the uring examples, which also don't seem to check
the actual processed bytes.

I'm also not sure whether requests in the uring version are guaranteed to be
processed in order. I would have assumed they are not - in which case the
printed result could come out of order.

Obviously the demos also omit all the required memory management - but I guess
that was intended. If we added it, though, the uring versions would grow in
complexity more than the synchronous version would.

==> IO is hard. uring is exciting for high-performance use cases, but it will
make IO harder rather than easier. Ideally, though, most end users would not
have to use it directly (nor liburing), but would instead use an
async/await/coroutine framework on top of it that makes the asynchrony
transparent to the application and avoids most of the pitfalls.

~~~
tene
[https://docs.rs/rio/](https://docs.rs/rio/) is a pretty credible attempt at
providing a misuse-resistant API for io_uring, both synchronous and async.

There's still some nonzero complexity in using it instead of read/write, but
it doesn't seem particularly burdensome to me.

I've got a lot of hope for higher-level misuse-resistant libraries on top of
io_uring.

~~~
jstarks
Unfortunately rio is unsound (and the maintainer doesn’t seem to care) [1].

It’s also GPL3 which limits its use in many projects.

But I too am hopeful. Just not for rio’s API model specifically.

[1]
[https://github.com/spacejam/rio/issues/11](https://github.com/spacejam/rio/issues/11)

~~~
staticassertion
This doesn't read to me as "the maintainer doesn't seem to care" so much as
that they feel the unsafety is likely an edge case; in another issue, there's
some more discussion around fixing it.

[https://github.com/spacejam/rio/issues/12#issuecomment-58375...](https://github.com/spacejam/rio/issues/12#issuecomment-583759740)

~~~
codys
`mem::forget` is not an edge case. In the context of Rust, having an API that
invokes unsafe behavior without being marked unsafe is not acceptable. Holding
ourselves to this rule is what makes the concept of `unsafe` useful, because
we can then rely on the behavior/API contract of other code.

Failing to ensure correct behavior in the presence of `forget` means that once
the program gets complex enough it can blow up in undetermined ways because
one piece of code somewhere wasn't aware of a restriction in another piece of
code that is potentially very far away.

Failing to isolate this type of unsafety is a pitfall waiting for some future
developer to be trapped in.

~~~
staticassertion
I've never personally seen or used mem::forget. I'm sure it happens, of
course, and I'm in agreement that the API should be unsafe. I'm just saying
it's not clear that they "don't care".

------
frevib
Here is an echo server that uses some of the 5.7 features:

[https://github.com/frevib/io_uring-echo-server/tree/io-uring-feat-fast-poll](https://github.com/frevib/io_uring-echo-server/tree/io-uring-feat-fast-poll)

One of the 5.7 features, IORING_FEAT_FAST_POLL, gives a (free) performance
boost of up to 68% compared to epoll:

[https://twitter.com/hielkedv/status/1234135064323280897?s=21](https://twitter.com/hielkedv/status/1234135064323280897?s=21)

------
lazerl0rd
I maintain an NGINX fork with io_uring support (instead of AIO) here:
[https://github.com/lazerl0rd/nginx](https://github.com/lazerl0rd/nginx).
io_uring has been providing promising results, and recent Linux updates have
just kept improving it.

~~~
carterli
Glad to see that my patch is used somewhere :)

------
aseipp
Nice set of articles, thank you! io_uring is actually pretty easy to use with
liburing, and seeing more people adopt it is exciting.

The problem with io_uring, really, is that every kernel release has now become
much, much more exciting than the last due to all the improvements (cf. 5.7
now having buffer-selection primitives, big deal!). I'm constantly, anxiously
awaiting the next kernel release... :(

------
graetzer
I probably don't really know what I'm talking about, since I've only ever used
async IO through Boost.Asio:

Isn't it a bit crazy that there are so many different ways of doing async IO
on Linux alone? You would think this would be a more or less solved problem by
now.

~~~
kccqzy
No, I don't think so.

The terminology here is a bit confusing because different people define
"async" differently, but in the strictest sense of the word, before io_uring
the only way to do async IO was POSIX AIO. See aio(7) in your man pages:
[http://man7.org/linux/man-pages/man7/aio.7.html](http://man7.org/linux/man-pages/man7/aio.7.html)

On the other hand, using select(2), poll(2), and epoll(2) is not really async
IO.

The difference is really simple: for non-async IO, whenever you use read(2) or
write(2) and that call returns without an errno, that operation is decidedly
complete from the perspective of the user space. The buffer might not actually
make it to disk (say because of caching) or the network (say because of the
Nagle algorithm). But really from the user space perspective it is done.

What about "async" libraries in user space? All the bells and whistles added
by these libraries in user space merely give you support for knowing when a
file descriptor is ready. It doesn't make the actual reading or writing
asynchronous. So with such a library, your code might appear to call read(),
but the framework knows that the file descriptor isn't actually ready,
suspends your greenthread/fiber/coroutine or whatever it uses and does
something else. The actual read(2) call doesn't happen. The kernel doesn't
know you want to read.

With io_uring, you actually deal with read or write requests sent to the
kernel that are incomplete. You actually tell the kernel you want to read (by
writing to a ring buffer, hence the name). The kernel knows you want to read.
That's the difference.

Now don't get me wrong but I think for the majority of applications you don't
need true async IO. All you really need is efficient notification of whether a
file descriptor is ready. So AIO and io_uring are both niche topics that
likely won't affect your app.

~~~
loeg
Really importantly, the behavior of poll/select/epoll on _files_ is to always
return "available." But a read from that file may still sleep waiting on disk
— available does not mean the result is already in memory. They're only really
effective primitives for networking sockets and artificial fds like
signalfd().

Prior to io_uring, libraries that provide async file access in userspace (by
necessity) use some kind of threadpool, with each thread processing an
operation synchronously and providing an async result via self-pipe or other
user-driven event.

------
kqr
This might be slightly off-topic, but I'm curious:

> The core of this program is a loop which calculates the number of blocks
> required to hold the data of the file we’re reading by first finding its
> size. Memory for all the required iovec structures is allocated. We iterate
> for a count equal to the number of blocks the file size is, allocate
> block-sized memory to hold the actual data and finally call readv() to read
> in the data.

My reading of this is that the first readv(2) implementation of cat uses space
in O(n) where n is file size. With a very small constant, yes, but still
linear in the number of blocks.

Before reading this article, I would have assumed one wanted to write cat to
be constant space, and stream one block at a time through user space – i.e. in
O(n) number of system calls instead.

When I'm reading this implementation, I'm starting to realise this is the good
old space-time tradeoff, where one can either use O(n) space or O(n) system
calls. For very small files, the preallocated references are probably
preferable, and for bigger files, the streaming approach is better.

Does anyone know of any rules of thumb or approaches to estimate where the
cutoff is? Or is it just a matter of benchmarking on various platforms?

------
Aissen
It might be a newbie question, but why use fputc and output character by
character instead of using liburing to write directly to fd 1 (or writev)?

~~~
shuss
Yes, you could do that. But I wanted to keep the example programs at the
beginning very simple. In the same vein, they only deal with one request at a
time, for instance.

~~~
Aissen
Makes sense, thanks!

------
unlinked_dll
I almost certainly have no idea what I'm talking about but

Can I write a device driver using io_uring? As in talk to i2c endpoints
instead of normal files?

~~~
shuss
Nope. io_uring is an API that helps with asynchronous I/O.

~~~
unlinked_dll
That's kind of my question. I want async I/O with a device, can io_uring help?

~~~
Aissen
If you have a character/block device that does read/write ops, you'd probably
get io_uring support for free.

------
megous
At various places, this code seems to assume there's always space in the ring
when submitting.

------
dirtydroog
Does it support timers / timeouts?

Thanks for the article, it's very appealing to start to play around with this.

~~~
shuss
Yes. Please see io_uring_wait_cqe_timeout() in liburing.

