
Getting Hands-on with io_uring using Go - rbanffy
https://developers.mattermost.com/blog/hands-on-iouring-go/
======
siculars
Another interesting post on io_uring:
[https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-
wi...](https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-will-
revolutionize-programming-in-linux/)

disclaimer - I used to work at ScyllaDB

------
networkimprov
Discussion of supporting this in the Go runtime is here:

[https://github.com/golang/go/issues/31908](https://github.com/golang/go/issues/31908)

TL;DR there's no ETA for it.

~~~
The_rationalist
The JVM project Loom will seamlessly enable io_uring and restartable sequences
for Async IO

------
cycloptic
Just from taking a look, it seems like there are two use cases where one would
really want to use this:

1. Programs that do a lot of IO to and from regular files, and that don't
want to bother with a thread pool. For this reason I would expect to see it
land in the golang runtime, and in other event loop implementations like
libuv.

2. Legitimately IO bound programs, which can use SQ polling. Other users of
aio fall in this category, as well as qemu.

For everything else, it looks like epoll is still mostly an equivalent choice
to io_uring? Has anyone got any benchmarks for using io_uring in a typical
network daemon, i.e. something that would generally be bound by socket I/O?

~~~
the8472
Lots of things that don't seem like heavy IO operations can still incur high
walltime costs, e.g. stating all files in a directory tree. Since each stat is
a syscall, executing all of them sequentially is costly. io_uring can be used
for batching these kinds of things even when your code isn't built around
asynchronous execution.

And with the 5.7 changes you can do polling + buffer selection + reading for
any number of sockets with a single syscall[0].

[0] [https://lwn.net/Articles/815491/](https://lwn.net/Articles/815491/)

~~~
cycloptic
>stating all files in a directory tree

That does seem useful. But wouldn't you want to put this into a library that
falls back to a thread pool approach on older kernels? And in that case, it
seems like the application doesn't particularly care what the underlying
implementation is? Since it's not built around asynchronous execution it just
calls some function that blocks until the CQ is complete. This appears to be
what the golang runtime will have to do.

>And with the 5.7 changes you can do polling + buffer selection + reading for
any number of sockets with a single syscall

Thank you, this is very close to what I was looking for. (It hasn't made its
way into the manpages yet.)

~~~
the8472
> That does seem useful. But [...]

That was mostly meant as an example of the more general case: io_uring can
reduce the overhead of any kind of IO operation, even when IO isn't the
primary focus of your application.

The point is that yes, io_uring is great for event-based, asynchronous
libraries. But even traditional synchronous code can make significant gains by
switching to it. As the article hints with its package name: io_uring is the
one ring to bind them all.

> This appears to be what the golang runtime will have to do.

The Go standard library can tie io_uring into its goroutine scheduler. When
doing IO it suspends the green thread, submits the work to io_uring, and polls
the completion queue of the ring when looking for tasks that need to be woken
up.

------
nickcw
Great article! I now finally understand what io_uring is about. IO with no
syscalls - that is clever.

I've been confused by the name for ages. I can now parse it as IO Userspace
RING, rather than some sort of typo!

------
nnx
Looks like it only supports files right now; could io_uring work for network
FDs? If so, can it reach performance close to kernel bypass (e.g. DPDK)?

~~~
couchand
Yes, io_uring supports network IO. Kernel 5.5 added support for accept, but
you have to wait until 5.7 before you can string together an accept followed
by a read(v). I still haven't seen much in the way of hard numbers, but
theoretically it should be darn quick. If you're using preregistered buffers
there's no copying needed, and if you ask the kernel nicely it will spin up a
kernel thread to monitor the submission queue so you don't even need syscalls
after setup.

------
highfrequency
> Fortunately, there’s an age-old solution to this problem - ring buffers. A
> ring buffer allows efficient synchronization between producers and consumers
> with no locking at all.

I don't think this is correct--ring buffers still require a mutex to prevent
the producer from updating the write pointer while the reader checks the write
pointer.

[https://stackoverflow.com/questions/871234/circular-lock-
fre...](https://stackoverflow.com/questions/871234/circular-lock-free-buffer)

[https://stackoverflow.com/questions/52522512/do-i-need-a-
mut...](https://stackoverflow.com/questions/52522512/do-i-need-a-mutex-when-
using-a-circular-buffer-and-the-producer-consumer-desig)

~~~
WJW
As long as you have a reasonably modern CPU, you can use one of the various
Compare-And-Swap (CAS) instructions to implement properly lock-free FIFO
queues. There are a few variations, but a circular array is one of the
possibilities. We could debate for quite some time whether it's entirely
lock-free if the lock is implemented in hardware, but I think we can agree
it's not a mutex :)

The people over at enqueuezero have a nice article about lock free queues at
[https://enqueuezero.com/lock-free-queues.html](https://enqueuezero.com/lock-
free-queues.html).
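In the single-producer/single-consumer case, which is what io_uring's submission and completion queues are, you don't even need CAS: each index has exactly one writer, so plain atomic loads and stores are enough. A minimal Go sketch of that idea (type and method names are illustrative; requires Go 1.19+ for `atomic.Uint32`):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spscRing is a single-producer/single-consumer ring buffer.
// head is only written by the consumer and tail only by the
// producer, so atomic loads/stores (no mutex, no CAS loop)
// are sufficient to synchronize the two sides -- the same
// shape as io_uring's SQ and CQ rings.
type spscRing struct {
	buf  []int
	mask uint32        // len(buf)-1; length must be a power of two
	head atomic.Uint32 // next slot the consumer reads
	tail atomic.Uint32 // next slot the producer writes
}

func newRing(size uint32) *spscRing {
	return &spscRing{buf: make([]int, size), mask: size - 1}
}

// Push returns false when the ring is full.
func (r *spscRing) Push(v int) bool {
	tail := r.tail.Load()
	if tail-r.head.Load() == uint32(len(r.buf)) {
		return false // full
	}
	r.buf[tail&r.mask] = v
	r.tail.Store(tail + 1) // publish the slot after writing it
	return true
}

// Pop returns false when the ring is empty.
func (r *spscRing) Pop() (int, bool) {
	head := r.head.Load()
	if head == r.tail.Load() {
		return 0, false // empty
	}
	v := r.buf[head&r.mask]
	r.head.Store(head + 1) // release the slot after reading it
	return v, true
}

func main() {
	r := newRing(8)
	sum := 0
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // single consumer
		defer wg.Done()
		for n := 0; n < 100; {
			if v, ok := r.Pop(); ok {
				sum += v
				n++
			}
		}
	}()
	for i := 1; i <= 100; { // single producer
		if r.Push(i) {
			i++
		}
	}
	wg.Wait()
	fmt.Println(sum) // prints 5050
}
```

With multiple producers or consumers this no longer works and you'd need the CAS loop the parent comment describes; io_uring sidesteps that by dedicating one side of each ring to the kernel and the other to userspace.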

