
Ask HN: Is Evented or Async I/O just polling at the end of the day? - kureikain
Hi hackers,

So I was trying to understand evented I/O, asyncio, epoll, etc.

I understand that instead of blocking while we wait on I/O, we keep running and tell the "kernel" or "lower stack" that we're interested in knowing when the data arrives...

But how does the kernel, or whatever lower component really does the I/O, "know" when the data has arrived so it can notify us? Isn't it just polling at the lowest layer? Doesn't something have to sit there and keep checking?
======
LowerThanZero
Not necessarily. See, at the hardware level the CPU can use what is called an
"interrupt". In its simplest form it's a CPU pin that receives electrical
signals from the outside world. The CPU can then, well... interrupt what it
was doing and see what the problem is. The code that runs when the interrupt
occurs is called the "interrupt handler". Say you wanna read a sector from a
disk drive and your disk controller is hooked up to the CPU's interrupt
line... you tell the drive controller "gimme sector X" and then you (as an OS
kernel) mind your own business doing other things. Once the platter has spun
into the right position, the read head is positioned, etc., that is,
milliseconds later, the controller has what you asked for and it issues an
"interrupt" to the CPU. The kernel's interrupt handler then goes and fetches
the desired block. Or even better, a DMA controller can do the job: it takes
the data from the disk controller, moves it somewhere in memory, and only then
bothers the CPU to say "The required block has arrived, your highness" ;-)

Network controllers can issue interrupts so you know a packet has arrived...
keyboard controllers too and so on. Hopefully you get the idea. :)
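The request/other-work/callback flow described above can be mimicked in user space. The sketch below is only an analogy, assuming a hypothetical `FakeDiskController` class whose worker thread stands in for the hardware and whose completion callback stands in for the interrupt handler. Real interrupts are delivered by the CPU, not by Python threads.

```python
import threading
import time


class FakeDiskController:
    """Hypothetical stand-in for a disk controller (an analogy, not real hardware)."""

    def __init__(self, latency_s: float = 0.05):
        self.latency_s = latency_s

    def request_sector(self, sector: int, on_complete) -> None:
        # "gimme sector X": start the slow operation elsewhere and return immediately.
        def work():
            time.sleep(self.latency_s)  # platter spins, head seeks...
            data = f"contents of sector {sector}".encode()
            on_complete(sector, data)   # the "interrupt": invoke the handler
        threading.Thread(target=work, daemon=True).start()


completed = {}
done = threading.Event()


def interrupt_handler(sector: int, data: bytes) -> None:
    # Runs only when the "controller" signals completion; nobody polls for it.
    completed[sector] = data
    done.set()


controller = FakeDiskController()
controller.request_sector(42, interrupt_handler)

# Meanwhile the "kernel" minds its own business.
other_work = sum(i * i for i in range(100_000))

done.wait(timeout=5)  # sleep until the handler fires, rather than spinning in a loop
print(completed[42])  # b'contents of sector 42'
```

The main thread never asks "is it ready yet?"; it does its own work and the result is handed to it by the handler.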

~~~
kureikain
Wow, thanks so much. This really clicks for me.

> In its simplest form it's a CPU pin that receives electrical signals from
> the outside world.

That is truly async :-).

------
brudgers
At the lowest layer, it's electrons moving through transistors or quantum
mechanics or quarks or turtles or what have you. It's all a question of
relevant abstractions. I'm not being dismissive: it is genuinely hard to
accept that object-oriented programs quickly become indistinguishable from
functional programs as we move down the stack.

It's really a question of what our chosen abstraction allows us to take for
granted. If we're worried about what we take for granted, then we need to be
wary of "just", as in "just polling". Polling entails scheduling, and optimal
scheduling is NP-hard in general. The code that implements the scheduling
abstraction at the kernel layer is almost certainly more inherently complex
than application-layer code that uses the asyncio abstraction, because the
kernel has to deal with multiple hardware threads, speculative execution,
pipelining, and latencies spanning several orders of magnitude for lots and
lots of processes pooled together.

If you're interested in diving down, I recommend _The Design of the Unix
Operating System_ by Bach. It has clear diagrams and language that show the
big ideas happening below the application layer.

~~~
kureikain
Thanks for the book recommendation. I'll definitely check it out.

------
LowerThanZero
Your question, though, raises an interesting problem: ULTIMATELY, is there a
piece of code which loops while waiting for data? I guess the answer is
"maybe", but it's more of a philosophical question... In my previous example
the OS kernel running on the main CPU doesn't loop waiting for a disk block
or a network packet, but the code running on the controller's CPU most
likely does.

So it's a matter of perspective, I guess. You don't want to poll in your code;
you want to let the kernel do the job for you. Does that mean the kernel
doesn't poll? Not necessarily. The kernel itself might be forced to "poll"
(e.g. when the device driver can't do otherwise), or it can outsource ;-) the
dirty, time-consuming polling job to a peripheral, which in turn will have an
embedded CPU and firmware.

Bottom line: if you poll yourself, it is GUARANTEED to be a waste of time and
CPU cycles, but if you let the kernel do it, there's a chance the system as a
whole will do a better job. That's because main CPU time is really expensive,
while peripheral CPU resources might be cheaper in the big picture. So
ultimately it's not "poll vs. no poll" but rather "specialized poll vs.
wasteful poll".
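The difference between a wasteful poll and letting the kernel do the waiting is visible even from user space. This is a minimal sketch, assuming Python's standard `socket` and `selectors` modules, with a local `socket.socketpair()` standing in for a real network connection. `send_later`, `busy_poll` and `wait_with_selector` are hypothetical helper names. The first approach spins on a non-blocking `recv()`; the second sleeps inside `select()` until the kernel reports the socket as readable.

```python
import selectors
import socket
import threading


def send_later(sock: socket.socket, delay_s: float, payload: bytes) -> None:
    # Simulates data "arriving from the outside world" after a delay.
    threading.Timer(delay_s, sock.sendall, args=(payload,)).start()


def busy_poll(delay_s: float = 0.05) -> tuple[bytes, int]:
    """Wasteful poll: spin on a non-blocking recv() until data shows up."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    send_later(writer, delay_s, b"hello")
    checks = 0
    try:
        while True:
            checks += 1
            try:
                return reader.recv(1024), checks
            except BlockingIOError:
                pass  # nothing yet: loop around and ask again, burning CPU
    finally:
        reader.close()
        writer.close()


def wait_with_selector(delay_s: float = 0.05) -> tuple[bytes, int]:
    """Let the kernel wait: sleep in select() until the socket is readable."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    send_later(writer, delay_s, b"hello")
    wakeups = 0
    try:
        with selectors.DefaultSelector() as sel:
            sel.register(reader, selectors.EVENT_READ)
            while True:
                wakeups += 1
                # The thread sleeps here; the kernel wakes it once data is buffered.
                if sel.select(timeout=5):
                    return reader.recv(1024), wakeups
    finally:
        reader.close()
        writer.close()


print(busy_poll())           # many checks before the data arrives
print(wait_with_selector())  # usually a single wakeup
```

Both return the same bytes. The busy poll's check count grows with the delay, while the selector version typically wakes up once, because the waiting happens in the kernel rather than in your loop.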

------
citrin_ru
I'm not a kernel expert, but to my understanding it works in the following
way: the kernel has threads which, e.g., process incoming TCP/UDP packets or
send outgoing traffic. When such a thread detects an 'interesting' event, e.g.
incoming data has been saved into a buffer for a TCP connection, it sends a
notification which causes user-space threads to wake up from sleep and process
the event.

There is always polling in at least one place, the kernel scheduler, but the
scheduler generally doesn't care whether sync or async I/O is used (with sync
I/O the scheduler has more work, because there are more threads/processes for
the same number of connections, but the operating principles are the same).
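This sleep-and-wake pattern is what an event loop builds on. Here is a minimal sketch, again assuming Python's standard `selectors` module, with local `socketpair()` connections standing in for real TCP clients (`serve_until_drained` is a hypothetical name). A single thread registers interest in several sockets, sleeps in `select()`, and handles whichever ones the kernel reports as readable. With blocking (sync) I/O you would typically need one thread per connection to get the same effect.

```python
import selectors
import socket


def serve_until_drained(n_conns: int = 3) -> dict[int, bytes]:
    """One thread, many connections: sleep in select(), handle whichever socket is ready."""
    sel = selectors.DefaultSelector()
    # Each socketpair stands in for one client connection (server side, client side).
    pairs = [socket.socketpair() for _ in range(n_conns)]
    received: dict[int, bytes] = {}
    try:
        for conn_id, (server_side, _client_side) in enumerate(pairs):
            server_side.setblocking(False)
            # Register interest: "tell me when this socket has data to read".
            sel.register(server_side, selectors.EVENT_READ, data=conn_id)

        # Clients send in reverse order; the kernel buffers the bytes until we read them.
        for conn_id, (_server_side, client_side) in reversed(list(enumerate(pairs))):
            client_side.sendall(f"request {conn_id}".encode())

        while len(received) < n_conns:
            # Sleeps in the kernel (epoll, kqueue, or select) until a socket is readable.
            events = sel.select(timeout=5)
            if not events:
                break  # nothing arrived within the timeout; give up
            for key, _mask in events:
                received[key.data] = key.fileobj.recv(1024)
                sel.unregister(key.fileobj)  # this connection is handled
    finally:
        sel.close()
        for pair in pairs:
            for sock in pair:
                sock.close()
    return received


print(serve_until_drained())
```

The loop itself never checks sockets one by one; it only processes what the kernel says is ready, and the same shape scales to more connections without adding threads.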

