
Designing a Userspace Disk I/O Scheduler for Modern Datastores (2016) - spooneybarger
https://www.scylladb.com/2016/04/14/io-scheduler-1/
======
bzillins
Using a tool like diskplorer (mentioned in article 1) to gather the
relationship between concurrency and IOPS at various read sizes (mentioned in
article 2) seems like a good data point for tuning a database filesystem.
[https://github.com/avikivity/diskplorer](https://github.com/avikivity/diskplorer)
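
A rough, hypothetical sketch of that kind of concurrency/IOPS sweep (this is
not diskplorer's actual interface; the device path, read sizes, and window
length are made-up parameters, and a real measurement would use O_DIRECT the
way fio-based tools do, to keep the page cache out of the picture):

```python
# Sketch of a concurrency/IOPS sweep: for each (read size, concurrency) pair,
# issue random reads against the device for a fixed window and report IOPS.
# DEV, READ_SIZES, CONCURRENCY, and WINDOW are placeholder assumptions.
import os
import random
import threading
import time

DEV = "/dev/nvme0n1"                    # hypothetical device; reads only
READ_SIZES = [4096, 65536, 131072]      # bytes
CONCURRENCY = [1, 2, 4, 8, 16, 32]      # parallel readers
WINDOW = 2.0                            # seconds per data point

def reader(fd, dev_size, read_size, stop, counts, idx):
    while not stop.is_set():
        # aligned random offset, one synchronous read (pread releases the GIL)
        off = random.randrange(dev_size // read_size) * read_size
        os.pread(fd, read_size, off)
        counts[idx] += 1

def measure(read_size, nthreads):
    fd = os.open(DEV, os.O_RDONLY)
    dev_size = os.lseek(fd, 0, os.SEEK_END)   # block devices report size here
    stop = threading.Event()
    counts = [0] * nthreads
    threads = [threading.Thread(target=reader,
                                args=(fd, dev_size, read_size, stop, counts, i))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    time.sleep(WINDOW)
    stop.set()
    for t in threads:
        t.join()
    os.close(fd)
    return sum(counts) / WINDOW

if __name__ == "__main__":
    for rs in READ_SIZES:
        for c in CONCURRENCY:
            print(f"read_size={rs:7d} concurrency={c:3d} iops={measure(rs, c):9.0f}")
```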

------
spooneybarger
Part 2: [https://www.scylladb.com/2016/04/29/io-scheduler-2/](https://www.scylladb.com/2016/04/29/io-scheduler-2/)

------
glommer
Such interesting timing!

Just yesterday we finished a redesign of this, addressing some room-for-
improvement areas that we saw over those two years.

And I was about to start writing an article about it right now! Link soon =)

~~~
glommer
And here it is: [https://www.scylladb.com/2018/04/19/scylla-i-o-scheduler-3/](https://www.scylladb.com/2018/04/19/scylla-i-o-scheduler-3/)

------
derefr
I've considered, before, setting up my workstation so that it's running a
hypervisor OS (Xen; VMware ESX), installing a Linux or BSD VM on it, using
SR-IOV to pass that VM control over my SATA/SAS/NVMe controller, and then
setting that VM up as an iSCSI target to share the disks out.

Then I could install another OS as a VM (e.g. macOS) and use SR-IOV to give
_it_ all my other hardware—but, rather than relying on my "shell OS" actually
having good semantics for things like software RAID, logical volume management
or thin provisioning—or, indeed, good disk I/O scheduling—I could just set up
those abstractions on Linux, and rely on the Linux kernel's I/O scheduler
(which I could patch to be anything I wanted, like, for example, the above)
and then share out the result as a plain block device for my "shell OS" to
consume using an iSCSI driver.
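
For the "share the disks out" step, a minimal sketch of what the storage VM
might run, assuming the Linux LIO target and its targetcli tool (the device
path, IQNs, and initiator name below are hypothetical, made up for
illustration):

```python
# Sketch: expose a block device from the storage VM as an iSCSI LUN via the
# Linux LIO target's targetcli (run as root). All names/paths are hypothetical.
import subprocess

DEV = "/dev/md0"                                        # e.g. the RAID/LVM volume built in the VM
TARGET_IQN = "iqn.2016-04.local.storage-vm:shell-disk"  # hypothetical target name
INITIATOR_IQN = "iqn.2016-04.local.shell-os:init"       # the "shell OS" initiator

def targetcli(*args):
    # one-shot, non-interactive targetcli invocation
    subprocess.run(["targetcli", *args], check=True)

# register the block device as a backstore
targetcli("/backstores/block", "create", "name=shell_disk", f"dev={DEV}")
# create the iSCSI target and attach the backstore as a LUN
targetcli("/iscsi", "create", TARGET_IQN)
targetcli(f"/iscsi/{TARGET_IQN}/tpg1/luns", "create", "/backstores/block/shell_disk")
# let the shell OS's initiator log in
targetcli(f"/iscsi/{TARGET_IQN}/tpg1/acls", "create", INITIATOR_IQN)
```

The "shell OS" would then log in with its own iSCSI initiator and see one
plain block device, with RAID/LVM/scheduling all handled on the Linux side.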

I suppose the same would be true for the network as well—the Linux VM could
act as a local packet filter and expose services like a caching DNS resolver
for the "shell OS" to consume.

I mean, this kind of stuff is really obvious when you've got more than one
computer (at that point I'm just describing a SAN and a backplane router);
it's just the
fact that you can do it all on _one_ modern computer, with near-zero overhead,
that's interesting.

~~~
wtallis
I think what you're talking about isn't actually SR-IOV; that involves having
multiple virtual functions on a single PCIe device, but you're just talking
about using the IOMMU to partition your devices into disjoint domains without
trying to give multiple VMs access to the same peripheral.

~~~
derefr
This used to be a real distinction (when computers had plain PCI peripherals),
but it's not really one anymore.

SR-IOV is the name for the PCI SIG's standardization of the control protocol
between the CPU and the IOMMU with respect to controlling PCIe devices. Even if
you're just mapping a device to a single VM, if it's a _PCIe_ device, you're
using SR-IOV to do it. (The ability to do this might still be referred to as
VT-d in Intel-platform BIOSes, but if you look in your OS device tree, the
functionality enabled with an "Enable Intel VT-d" setting should appear as an
SR-IOV controller. You could think of VT-d as the part _in the CPU_ that
speaks the SR-IOV protocol to the IOMMU.)

