This is wrong. readv() can return as soon as a single byte has been read, just like read(). If you need all the bytes, you have to read in a loop. As written, this program can end up printing uninitialized memory.
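Something like this, shown with plain read() for brevity (the same per-iovec accounting applies to readv()):

    /* Keep reading until the buffer is full or EOF; read() and
       readv() may return fewer bytes than requested. */
    #include <errno.h>
    #include <unistd.h>

    ssize_t read_full(int fd, void *buf, size_t len) {
        size_t done = 0;
        while (done < len) {
            ssize_t n = read(fd, (char *)buf + done, len - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted; just retry */
                return -1;      /* real error */
            }
            if (n == 0)
                break;          /* EOF: caller gets a short count */
            done += n;
        }
        return (ssize_t)done;
    }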
I think the same applies to the uring examples, which also don't seem to check how many bytes were actually processed.
I'm also not sure whether requests in the uring version are guaranteed to be processed in order. I would have assumed they aren't - in which case the printed result could come out of order.
Obviously the demos also omit all the required memory management - but I guess that was intended. If we added it, though, the uring versions would grow in complexity more than the synchronous version.
==> IO is hard. uring is exciting for high-performance use cases, but it will make IO harder rather than easier. Ideally, though, most end users would not use it directly (nor liburing), but would instead use an async/await/coroutine framework on top of it that makes the asynchrony transparent to the application and helps avoid most of the pitfalls.
There's still some nonzero complexity in using it instead of read/write, but it doesn't seem particularly burdensome to me.
I've got a lot of hope for higher-level misuse-resistant libraries on top of io_uring.
It’s also GPLv3, which limits its use in many projects.
But I too am hopeful. Just not for rio’s API model specifically.
Failing to ensure correct behavior in the presence of `forget` means that once the program gets complex enough, it can blow up in unpredictable ways, because one piece of code somewhere wasn't aware of a restriction imposed by another piece of code that may be very far away.
Failing to isolate this kind of unsafety is a pitfall waiting to trap some future developer.
> I'm also not sure whether requests in the uring version are guaranteed to be processed in order. I would have assumed they aren't - in which case the printed result could come out of order.
io_uring may not make completions available in the same order as their submissions; the article goes into some detail explaining this. However, the "cat" example only deals with one request at a time, and that request uses readv(), whose iovecs are filled in order - so it's not a problem there. The "cp" example does deal with multiple submissions and completions, but it issues a writev() on the completion of each readv(), writing at the matching offset, so it doesn't matter if completions arrive out of order. It is indeed something to look out for, though.
You assume correctly. There is no ordering guarantee unless requests are explicitly chained (there's a special flag for that).
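The flag is IOSQE_IO_LINK. A rough, untested sketch with liburing (assumes in_fd/out_fd are already open; error handling elided):

    /* Chain a read and a write: the write only starts after the
       read completes. Note that (I believe) a short or failed read
       severs the chain and the linked write gets cancelled, so the
       completions still need checking. */
    #include <liburing.h>

    void copy_block(struct io_uring *ring, int in_fd, int out_fd,
                    char *buf, unsigned len, off_t off) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, in_fd, buf, len, off);
        sqe->flags |= IOSQE_IO_LINK;   /* link to the next SQE */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_write(sqe, out_fd, buf, len, off);

        io_uring_submit(ring);
    }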
One of the 5.7 features, IORING_FEAT_FAST_POLL, gives a (free) performance boost of up to 68% compared to epoll.
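You can probe for it at runtime, too - the kernel reports supported features when the ring is set up:

    /* Minimal feature probe with liburing. */
    #include <liburing.h>
    #include <stdio.h>

    int main(void) {
        struct io_uring ring;
        struct io_uring_params params = {0};

        if (io_uring_queue_init_params(8, &ring, &params) < 0)
            return 1;

        /* The kernel fills params.features with IORING_FEAT_* bits. */
        if (params.features & IORING_FEAT_FAST_POLL)
            printf("fast poll supported\n");

        io_uring_queue_exit(&ring);
        return 0;
    }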
The problem with io_uring, really, is that every kernel release has now become much, much more exciting than the last due to all the improvements (cf. 5.7 now having buffer selection primops - big deal!). I'm constantly, anxiously awaiting the next kernel release... :(
Isn't it a bit crazy that there are so many different ways of doing async IO on Linux alone? You would think this would be a more or less solved problem by now.
The terminology here is a bit confusing because different people have defined "async" differently, but in the strictest sense of the word, before io_uring the only way to do async IO was POSIX AIO. See aio(7) in your man pages: http://man7.org/linux/man-pages/man7/aio.7.html
Using select(2), poll(2), and epoll(7), on the other hand, is not really async IO.
The difference is really simple: with non-async IO, whenever a read(2) or write(2) call returns without an error, that operation is decidedly complete from the perspective of user space. The buffer might not actually have made it to disk (because of caching, say) or onto the network (because of the Nagle algorithm, say). But from the user-space perspective, it is done.
What about "async" libraries in user space? All the bells and whistles added by these libraries in user space merely give you support for knowing when a file descriptor is ready. It doesn't make the actual reading or writing asynchronous. So with a such library, your code might appear to call read(), but the framework knows that the file descriptor isn't actually ready, suspends your greenthread/fiber/coroutine or whatever it uses and does something else. The actual read(2) call doesn't happen. The kernel doesn't know you want to read.
With io_uring, you actually hand the kernel read or write requests that are not yet complete. You tell the kernel you want to read (by writing to a ring buffer, hence the name). The kernel knows you want to read. That's the difference.
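A minimal sketch of what that looks like with liburing (the file name is just a placeholder; error handling elided):

    /* The kernel is handed the read up front; the result arrives
       later as a completion on the CQ ring. */
    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void) {
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        static char buf[4096];

        int fd = open("some_file", O_RDONLY);
        io_uring_queue_init(8, &ring, 0);

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);          /* the kernel now owns the request */

        /* ... free to do other work here ... */

        io_uring_wait_cqe(&ring, &cqe);  /* block until the read completes */
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        return 0;
    }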
Now don't get me wrong: I think the majority of applications don't need true async IO. All they really need is efficient notification of when a file descriptor is ready. So AIO and io_uring are both niche topics that likely won't affect your app.
Prior to io_uring, libraries that provide async file access in user space (by necessity) use some kind of thread pool, with each thread processing an operation synchronously and delivering the async result via a self-pipe or another user-driven event.
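Roughly this shape (the names are mine; real libraries add request queues and a pool of workers):

    /* A worker thread does the blocking pread(), then signals the
       event loop through a pipe it can poll/epoll like any socket.
       Started with e.g. pthread_create(&tid, NULL, worker, &req). */
    #include <pthread.h>
    #include <unistd.h>

    struct file_req {
        int fd;             /* file to read */
        void *buf;
        size_t len;
        off_t off;
        ssize_t result;     /* filled in by the worker */
        int notify_fd;      /* write end of the self-pipe */
    };

    static void *worker(void *arg) {
        struct file_req *r = arg;
        r->result = pread(r->fd, r->buf, r->len, r->off);
        char done = 1;
        write(r->notify_fd, &done, 1);  /* wakes the event loop */
        return NULL;
    }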
It is true that readiness notification doesn't lend itself well to disk IO, though.
> The core of this program is a loop which calculates the number of blocks required to hold the data of file we’re reading by first finding its size. Memory for all the required iovec structures is allocated. We iterate for a count equal to the number of blocks the file size is, allocate block-sized memory to hold the actual data and finally call readv() to read in the data.
My reading of this is that the first readv(2) implementation of cat uses O(n) space, where n is the file size. With a very small constant, yes, but still linear in the number of blocks.
Before reading this article, I would have assumed one would want to write cat in constant space, streaming one block at a time through user space - i.e. making O(n) system calls instead.
Reading this implementation, I'm starting to realise this is the good old space-time tradeoff: one can either use O(n) space or make O(n) system calls. For very small files, the preallocated iovecs are probably preferable, and for bigger files, the streaming approach is better.
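For reference, the streaming variant is just the classic loop (block size is arbitrary; error and short-write handling elided):

    /* Constant-space cat: one block in flight at a time. */
    #include <unistd.h>

    void stream_cat(int in_fd) {
        char buf[4096];
        ssize_t n;
        while ((n = read(in_fd, buf, sizeof(buf))) > 0)
            write(STDOUT_FILENO, buf, n);  /* should loop on short writes */
    }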
Does anyone know of any rules of thumb or approaches to estimate where the cutoff is? Or is it just a matter of benchmarking on various platforms?
Can I write a device driver using io_uring? As in talk to i2c endpoints instead of normal files?
See the list of supported opcodes: https://github.com/torvalds/linux/blob/master/include/uapi/l... (search for IORING_OP_)
But you can still want async I/O with an I2C device, of course - as in, you don't want to block a thread waiting on a message. And for that, I believe you can still use good old select/epoll as usual on your I2C device file, and consequently also use your favourite async I/O framework of the day (libevent, node.js, what have you).
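i.e. something like this (the device path is just a placeholder, and whether poll is usefully supported is up to the specific driver):

    /* Readiness-based wait on a device fd with poll(2). */
    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/dev/i2c-1", O_RDONLY | O_NONBLOCK);
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
            char buf[64];
            read(fd, buf, sizeof(buf));  /* data is ready; won't block */
        }
        close(fd);
        return 0;
    }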
Thanks for the article; it makes it very appealing to start playing around with this.