> there’s a trick to turning these kinds of I/O operations into something you can put into an event loop: you run the desired operation (mkdir() in this case) in another thread, and then wait for the thread to finish with a timeout

This isn't a trick; this is expected. Calls return when they're supposed to return, or not at all. If you need to return BEFORE the system call is done, you need to do the work somewhere else (like in a new thread or process, or on another node). This is also not limited to I/O but applies to basically any system call: if you return before it's done, it may break something, so interrupting it might not give you a good timeout mechanism.
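
For concreteness, the trick from the quote looks roughly like this on a POSIX system. A minimal sketch, assuming the GNU-specific pthread_timedjoin_np(); the path and the two-second timeout are made up for illustration:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    /* The blocking call lives in its own thread; nothing here can cut it short. */
    static void *do_mkdir(void *arg)
    {
        if (mkdir((const char *)arg, 0755) != 0)
            perror("mkdir");
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        struct timespec deadline;

        pthread_create(&tid, NULL, do_mkdir, "/tmp/example-dir");

        /* Wait for the worker, but give up after roughly 2 seconds. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 2;

        if (pthread_timedjoin_np(tid, NULL, &deadline) == ETIMEDOUT) {
            /* We stop waiting, but mkdir() is still in flight in the worker. */
            printf("mkdir timed out; detaching worker\n");
            pthread_detach(tid);
        }
        return 0;
    }

Which is exactly the point above: the mkdir() itself is never interrupted; you've only moved where you wait.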

Something else to ask yourself is why you need to return before the call is done. It's similar to the NFS hard-vs-soft-mounting argument. Soft mounting can cause damage when improperly interrupted; hard mounting prevents this by waiting until the system is behaving properly again, with the side effect of pissing off the users.



In some cases, you know you're about to do a whole bunch of I/O operations, sometimes all at the same time, sometimes not. (It doesn't really matter.) Ideally, I'd like to transfer this knowledge (that is, the list of I/O operations) to the kernel wholesale, so that it has complete knowledge of the task at hand and can figure out the best way to complete it (which it really can't do if it can't see the whole picture). This might mean scheduling disk operations more efficiently to avoid seeking, batching multiple network requests into a single packet, etc.

You can't do that with synchronous APIs without hackery, since the very structure of the API is self-defeating when it comes to getting the complete picture to the kernel. If I have 1000 I/O operations, I do not want to spawn 1000 hardware threads: relative to the amount of information required to describe an I/O op, threads are incredibly expensive.

I don't want the syscall to represent the entirety of the work, but simply the request to have the work performed. The kernel's response is then essentially "Acknowledged, beginning this I/O. Here's a handle/means¹ to obtain the result of the operation." Then I can batch-request notifications of results through some kernel I/O event queue, e.g., kqueue or epoll.

¹ If handles are too much, you could instead agree to have the result placed in some sort of queue of results, which might itself be usable with kqueue/epoll.
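
For what it's worth, Linux's io_uring ended up providing roughly this model: submit a batch of requests in one go, then reap completions from a queue. A minimal sketch using liburing, assuming a recent kernel with liburing installed; the file path, buffer sizes, and request count are made up for illustration:

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NREQS 4

    int main(void)
    {
        struct io_uring ring;
        static char bufs[NREQS][4096];
        int fd = open("/etc/hostname", O_RDONLY);   /* path just for illustration */

        if (fd < 0 || io_uring_queue_init(64, &ring, 0) < 0)
            return 1;

        /* Describe all the work up front; nothing blocks here. */
        for (int i = 0; i < NREQS; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, bufs[i], sizeof bufs[i], (__u64)i * 4096);
            io_uring_sqe_set_data(sqe, (void *)(long)i);   /* our "handle" for this op */
        }
        io_uring_submit(&ring);   /* one syscall hands the whole batch to the kernel */

        /* Reap completions as the kernel finishes them, in whatever order. */
        for (int i = 0; i < NREQS; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);
            printf("op %ld finished with result %d\n",
                   (long)io_uring_cqe_get_data(cqe), cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }

The submission only describes the work; the completion queue plays the role of the kqueue/epoll-style result queue the footnote describes.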


Do you want this bulk I/O syscall to inform you every time an operation is complete, or in stages? Do you want it to prioritize latency over bulk operations? Do you want it to take up more or less CPU? Will interrupts get thrown each time you query the status? Do you want to know when the operation is complete on the spindle, in the on-disk cache, or in the filesystem cache? Do you want it to handle network filesystems differently? Do you want it to take advantage of multichannel NCQ and other features, or implement your own in the kernel? Do you want this new I/O scheduler to affect the rest of the system's I/O, or only your application's? Do you want multiple applications to use different command queues, or for yours to trump them (priority)? Do you want the kernel to implement its own batch ordering or rely on vendor firmware? (It sounded at first like you were describing vectored I/O, but I assume you want something more abstract than that, kinda like a more generalized blk-multiqueue?)


All good questions, but none of these seem possible in today's POSIX APIs either. (Most, I feel, probably are best just implemented as "options" to the syscall in either the sync or async view of the world.) The point was more to have async operations be possible, whereas today, they're not.

> It sounded at first like you were describing vectored i/o but I assume you want something more abstract than that, kinda like a more generalized blk-multiqueue?

Asynchronous I/O, not so much vectored. Vectored is similar, but I want to stay away from that term, since most of the APIs I've seen for it (e.g., readv/writev) aren't actually asynchronous; they're just more efficient user-to-kernel bindings.
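
To illustrate the distinction: readv() gathers several buffers into a single syscall, but the call itself still blocks until the whole transfer is done, so it's vectored, not asynchronous. A quick sketch (the file path and buffer sizes are made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        char hdr[16], body[4096];
        struct iovec iov[2] = {
            { .iov_base = hdr,  .iov_len = sizeof hdr  },
            { .iov_base = body, .iov_len = sizeof body },
        };

        int fd = open("/etc/hostname", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* One syscall, two destination buffers -- and we sit here until it's done. */
        ssize_t n = readv(fd, iov, 2);
        printf("read %zd bytes into 2 buffers\n", n);

        close(fd);
        return 0;
    }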


In our case at $WORK, it's because we have hard timeouts to meet our service level agreements with the Monopolistic Phone Companies. We need to return a response within X time, no exceptions.


In my past work with teams that had network-service SLAs, they had to design a robust multithreaded backend app and modify the frontend service to ensure all HTTP transactions finished within 60 ms. Timeouts were one of the smaller concerns...



