
Block-layer I/O polling merged into Linux kernel - chilledheart
http://lwn.net/Articles/663879/
======
xlayn
Interesting summary from a comment on the article:

>There are basically two types of polling on the block side. One takes care of
interrupt mitigation, so that we can reduce the IRQ load in high IOPS
scenarios. That is governed by block/blk-iopoll.c, and is very much like NAPI
on the networking side, we've had that since 2009 roughly. It still relies on
an initial IRQ trigger, and from there we poll for more completions, and
finally re-enable interrupts once we think it's sane to do so. This is driver
managed, and opt-in.

>The new block poll support is a bit different. We don't rely on an initial
IRQ to trigger, since we never put the application to sleep. We can poll for a
specific piece of IO, not just for "any IO". It's all about reducing
latencies, as opposed to just reducing the overhead of an IRQ storm.
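
To make that concrete, here's a rough sketch of how an application would hit the new polled path on a 4.4-era kernel. It assumes an NVMe device at /dev/nvme0n1 (a hypothetical path) and that polling has been switched on for the queue via the io_poll sysfs attribute first; that knob and the synchronous O_DIRECT pattern are my reading of the patches, so treat this as illustrative rather than definitive:

    /*
     * Rough sketch of hitting the new polled completion path. Assumes an
     * NVMe device at /dev/nvme0n1 (hypothetical path) and that polling
     * was enabled on its queue first, e.g.:
     *
     *   echo 1 > /sys/block/nvme0n1/queue/io_poll
     *
     * Polling applies to synchronous O_DIRECT I/O, so the read below
     * bypasses the page cache.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* O_DIRECT needs sector-aligned buffers; 4096 covers most devices. */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096)) {
            perror("posix_memalign");
            close(fd);
            return 1;
        }

        /* With io_poll enabled, the kernel busy-polls the completion queue
         * for this specific request instead of sleeping until an IRQ. */
        ssize_t n = pread(fd, buf, 4096, 0);
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes\n", n);

        free(buf);
        close(fd);
        return 0;
    }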

~~~
jpgvm
It's worth mentioning that this isn't just any commenter: that is Jens Axboe, the
current block layer maintainer.

------
mangeletti
Would somebody mind explaining what this means in layman's terms? I understand
what this change is, based on the article, but I don't know what implications
it has from a high-level viewpoint, because I don't know much about the I/O
system in the first place. For instance, does this mean that a program will now
be able to detect block-level disk changes via this mechanism, or something
completely different?

~~~
linuxguy2
I don't think the two other responders to your question quite have the whole
answer, so I'm going to chime in. Warning: I'm not a kernel developer; I've
only read the article, and I work with Linux daily.

To answer your second question, this doesn't have anything to do with disk
changes/inotify/etc. that a program would use. My understanding is that
currently *many* IO devices respond to the kernel's request for data by
raising an interrupt, which the kernel then has to get around to servicing
before it can read the data. The interrupt path can be a bit slow, adding
latency while waiting for the disk to respond. The new system, rather than
waiting for an interrupt, continuously checks the driver for new data; since
it doesn't rely on an interrupt, it can achieve far better latency. Lower
latency means more operations per second.
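
To make the distinction concrete, here's a conceptual sketch of the two
completion paths. Everything in it is invented for illustration; these are
not real kernel APIs:

    /* Conceptual sketch only -- all names below are made up, not real
     * kernel APIs. */
    #include <stdbool.h>

    struct request;                                    /* opaque, illustrative */
    void submit(struct request *req);                  /* hand the I/O to the device */
    void sleep_until_woken(void);                      /* blocks; IRQ handler wakes us */
    bool request_done(struct request *req);            /* has the device posted it? */
    void driver_poll_completions(struct request *req); /* check the completion queue */

    /* Interrupt-driven path: submit, then sleep until the device raises
     * an IRQ and the handler wakes us. Cheap on CPU, but the interrupt
     * and the context switch add latency to every completion. */
    void wait_for_io_irq(struct request *req)
    {
        submit(req);
        sleep_until_woken();
    }

    /* Polled path: submit, then spin asking the driver whether this
     * specific request has finished. Burns CPU while waiting, but
     * notices the completion almost the instant the device posts it. */
    void wait_for_io_poll(struct request *req)
    {
        submit(req);
        while (!request_done(req))
            driver_poll_completions(req);
    }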

With spinning disks, which can only do <200 operations per second with
latencies around 5ms, this won't have much of an effect. But with SSDs, which
can do >2000 operations per second with latencies around 0.5ms, trimming
0.1ms off each operation (I made that number up) by polling rather than
waiting on an interrupt can mean about 25% more operations per second.
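
For what it's worth, the arithmetic for a queue-depth-1 workload is just
IOPS = 1 / latency, so shaving 0.1ms off a 0.5ms operation works out to 25%:

    /* Back-of-envelope IOPS math for a queue-depth-1 workload:
     * IOPS = 1 / latency. Numbers are the ones from the comment above. */
    #include <stdio.h>

    int main(void)
    {
        double before_s = 0.0005;   /* 0.5 ms/op -> 2000 IOPS */
        double after_s  = 0.0004;   /* 0.4 ms/op -> 2500 IOPS */

        printf("before: %.0f IOPS\n", 1.0 / before_s);
        printf("after:  %.0f IOPS (+%.0f%%)\n",
               1.0 / after_s, (before_s / after_s - 1.0) * 100.0);
        return 0;
    }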

~~~
truncate
I fear this may have a negative impact on power consumption. If I'm not wrong,
the point of having interrupts was to save the CPU cycles wasted on checking
status, i.e. polling. Overall, I think it would be interesting to see some
benchmarks on real scenarios.

EDIT: Thinking a bit more about it, interrupts were introduced when CPUs were
much slower than they are now. So the tradeoff I'm thinking of isn't that bad.

~~~
wallacoloo
The power difference may actually be immeasurable, or even an improvement on
some systems. Switching from one power state into a lower-power state may
actually involve one-time energy _payments_, which are eventually negated by
the power savings as time progresses. In an extreme scenario, a core might
power down its cache to reduce leakage current, which means flushing all of
its pending writes onto some bus (costly), and then reading a lot of data
back in from elsewhere once it powers back up and refills the cache.

In the worst case, you might spend more time and energy switching between
power states than you actually spend in a lower power state.
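
As a back-of-the-envelope model (both numbers invented for illustration): if
entering and leaving a sleep state costs a fixed amount of energy and the
state saves some power while resident, the transition only pays off when the
idle period exceeds cost divided by savings:

    /* Break-even model for a power-state transition. Both numbers are
     * invented for illustration, not measurements. */
    #include <stdio.h>

    int main(void)
    {
        double enter_exit_cost_uj = 50.0; /* one-time energy cost, microjoules */
        double power_saved_w      = 0.5;  /* power saved while resident, watts */

        /* Idle must last at least this long before the transition saves
         * energy overall: uJ / W = us. */
        printf("break-even idle time: %.0f us\n",
               enter_exit_cost_uj / power_saved_w);
        return 0;
    }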

But indeed, it does seem counter-intuitive even with that, as there are often
power mode changes available that would pay off in just a few microseconds. It
sounds to me like x86 may suffer from a limited IRQ system - there are other
systems out there in which IRQ overhead is < 10 cycles.

------
atomic77
Where this could be interesting is in cloud/VM environments where the block
device may actually be mounted over the network.

The performance improvements for fast devices cited in a link from the article
[1] are pretty dramatic, but I wonder how slow a device needs to be before
polling becomes a problem. That same link mentions that slow devices benefit
too, but speculates that it may be because the CPU isn't able to drop into a
deeper sleep state.

[1] http://lwn.net/Articles/663543/

~~~
ArkyBeagle
I would think that you'd have the network latency profile and the disk latency
profile, and that they would add in amusing (read: non-intuitive) ways. I read
"mounted" to indicate a drive, which is not always a good assumption.

