
I/O Is Faster Than CPU – Let’s Partition Resources and Eliminate OS Abstractions [pdf] - ingve
https://penberg.org/parakernel-hotos19.pdf
======
Animats
Mainframe designers had this problem under control by 1970. Mainframes had,
and have, "channels". A channel is part of the processor architecture. It
takes commands, sends them to a peripheral, and manages the data transfer in
both directions. Channels have some privileged functions through which the OS
tells them where the data is supposed to go in memory. The architecture of
channels is well defined, and peripherals are built to talk to channels. The
CPU has I/O instructions to control channels in a well defined way.

The peripheral never has access to main memory. There is no peripheral-
controlled "direct memory access" (DMA). So it's possible to give control of a
channel to a userland program without a memory security risk.
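
For flavor, here is roughly what one link of a channel program looked like on System/360 and its descendants. This is a from-memory sketch of the classic Channel Command Word layout, not code from any real OS:

    #include <cstdint>

    // Rough sketch of a System/360-style Channel Command Word (CCW).
    // The OS builds a chain of these in memory and issues one privileged
    // Start I/O instruction; the channel walks the chain and drives the
    // device, so the peripheral itself never touches main memory.
    struct Ccw {
        uint8_t  command;       // operation code, e.g. read/write (device-specific)
        uint8_t  data_addr[3];  // 24-bit address the *channel* copies to/from
        uint8_t  flags;         // e.g. 0x40 = command chaining: run the next CCW
        uint8_t  unused;
        uint16_t count;         // bytes to transfer
    };
    static_assert(sizeof(Ccw) == 8, "a CCW is one doubleword");

The security point falls out of this: only the channel dereferences the data address, so the OS can validate a chain once and then hand the channel to a userland program.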

Minicomputers of the 1970s had low transistor counts and slow CPUs. So
peripherals were usually put directly on the memory bus, with full access to
memory. I/O operations were performed by storing into memory addresses, which
caused bus transactions detected by the peripheral device. There were no CPU
I/O instructions.
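
That model is what we now call memory-mapped I/O, and it is still how most drivers talk to devices. A minimal sketch (the device and its address are hypothetical):

    #include <cstdint>

    // Memory-mapped I/O in miniature: the "device register" is just an
    // address, and an ordinary store becomes a bus transaction that the
    // peripheral observes. No I/O instruction involved.
    constexpr uintptr_t kUartTxReg = 0x10000000;  // hypothetical UART register

    void putc_mmio(char c) {
        auto* reg = reinterpret_cast<volatile uint8_t*>(kUartTxReg);
        *reg = static_cast<uint8_t>(c);  // the store *is* the I/O
    }

The convenience is obvious; the cost is that a DMA-capable peripheral on the same bus can touch memory the program never intended.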

Microprocessors copied the minicomputer model. IBM's people knew this was a
bad idea, and in the IBM PS/2, they introduced the "microchannel". Peripheral
vendors, facing a new architecture that required more transistors, screamed.
IBM backed down and went back to bus-oriented peripherals.

That model persists, even though the few thousand transistors required for a
channel controller cost nothing today. Most modern CPUs do have I/O
channel-like machinery, but it's exposed to the program as registers the
program stores into and as memory accesses by the peripheral device.

So there's no standardization on how to talk to devices at the hardware level.
Some CPUs have protection systems, an "I/O MMU", and there have been various
channel-like interfaces, especially from Intel, but they have never caught on.

Instead, we mostly have heavy kernel mediation between the hardware and the
user program. And way too many "drivers". This has become a problem with
"solid state disk", which is really a random access memory device that doesn't
write very fast. Mostly, it's used to emulate rotating disks.

Samsung makes a key/value store device which uses SSD-type memory devices but
manages the key/value store itself. But you need a kernel between the device
and the user program. You can't just open a channel to it and let the user
program access it.

~~~
bogomipz
>"Mainframe designers had this problem under control by 1970. Mainframes had,
and have, "channels". A channel is part of the processor architecture."

What are some examples of these processors?

I would be interested in reading more about these mainframe processors'
architecture. Might you or anyone else have some links?

~~~
neop1x
It always surprises me to see how mainframes have always been one step ahead.
I don't have any experience with them, but I've learned that they have had,
for example, live VM migration between hosts for a long time already. And
that there was a cool system called the IBM AS/400, introduced in 1988, with
an integrated database, an "everything is an object" design, and peripherals
with their own processors, which is still being used for some critical
applications today. Our x86 clouds with Kubernetes may sometimes feel like a
bunch of cheap toys in comparison. :P

------
jandrewrogers
At least in database kernels, we noticeably reached this threshold around five
years ago with typical server hardware. This is an interesting computer
science problem in that virtually all of our database literature is based on
the presumption that I/O is much slower than CPU.

If you cleanroom a database kernel design based on the assumption that I/O
performance is not the bottleneck, you end up with an architecture that looks
very different than the classic model you learn at university. It is always a
tradeoff of burning a resource you have in abundance to optimize utilization
of a resource that is scarce, and older database architectures are quite
wasteful of resources that have become relatively scarce on newer hardware.

~~~
owaislone
Are you aware of any efforts by existing databases, or entirely new
databases, built with this realization in mind?

~~~
jchrisa
NoSQL is based on this realization. Rather than optimizing IO through
normalization, you denormalize and pipe data more or less directly from
storage to clients, simplifying the compute and consistency model to make
distributed data easier.

~~~
mike_ivanov
The point of normalization is logical consistency, not IO optimization.

~~~
jchrisa
Sure. But the point of (early) NoSQL was IO optimization, not logical
consistency.

~~~
bogomipz
I was curious about the parenthetical "early" in your comment. Is this
optimization no longer the case then? Could you elaborate on recent
developments regarding this? I've been out of the NoSQL loop for some time and
am genuinely curious.

~~~
tracker1
The problem is, it varies... you have everything from what are effectively
key/value stores to document databases, column stores, and everything in
between. You have systems built on other systems. RethinkDB and CockroachDB
have different approaches than Redis, Mongo, Cassandra, or others.

CockroachDB gives you an SQL interface on top of a distributed data store,
with better consistency and relations. Cassandra has no real relations on top
of its BigTable/column-store design. It really just depends.

I think with what's coming out of SSD/NVMe and even Optane DIMMs, there will
be databases directly tuned to control/set their own data storage in these
environments.

~~~
bogomipz
>"I think with what's coming out of SSD/NVMe and even Optain DIMMS, that there
will be databases directly tuned to control/set their own data storage in
these environments."

I think this is already happening. I think it was Aerospike that used the FTL
of of NANd/NVMe drives for a direct key value store and I think another vendor
maybe Fusion had an SDK for this as well. The Optane stuff looks really
interesting, are there any server vendors shipping with those?

~~~
Rafuino
Not OP but, yes, server vendors are shipping with Optane DC Persistent Memory,
like Lenovo [1] and Supermicro [2], among others. Google also has a testing
program [3, 4] with access to instances with 7TB of total memory.

[1] [https://lenovopress.com/lp1066-intel-optane-dc-persistent-memory](https://lenovopress.com/lp1066-intel-optane-dc-persistent-memory)
[2] [https://www.storagereview.com/supermicro_superserver_with_intel_optane_dc_persistent_memory_first_look_review](https://www.storagereview.com/supermicro_superserver_with_intel_optane_dc_persistent_memory_first_look_review)
[3] [https://cloud.google.com/blog/topics/partners/available-first-on-google-cloud-intel-optane-dc-persistent-memory](https://cloud.google.com/blog/topics/partners/available-first-on-google-cloud-intel-optane-dc-persistent-memory)
[4] [https://docs.google.com/forms/d/e/1FAIpQLSeX1tN6Qt-aQUK2iVVioClFX5N-061jqO46vzpHzAPHkjwzVw/viewform](https://docs.google.com/forms/d/e/1FAIpQLSeX1tN6Qt-aQUK2iVVioClFX5N-061jqO46vzpHzAPHkjwzVw/viewform)

------
thaumaturgy
The paper reads like it's suggesting moving the burden of complexity in
dealing with varying hardware interfaces from the kernel to userland so that
userland can take direct advantage of higher performance hardware when it's
available.

I could see that for some very small niches, but in general I think it would
be a terrible development for the industry.

Hardware vendors don't like to share. They don't share code, they don't share
common interfaces, they don't even share documentation. As it is now, these
are all problems which most userland developers don't have to care about --
those problems get dealt with in the kernel, by developers who specialize in
building support for uncooperative hardware.

The average application developer doesn't want to have to figure out how many
queues are supported by a NIC just to open a connection on the network.
Further: the average application developer isn't experienced enough to do this
correctly.
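
(For a concrete sense of what's involved: even just asking Linux how many
queues a NIC has already means an ethtool ioctl. A sketch from memory of the
ethtool uAPI, so double-check the details:)

    #include <cstdio>
    #include <cstring>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    // Query a NIC's queue ("channel") counts, the long way.
    int main() {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        ethtool_channels ch{};
        ch.cmd = ETHTOOL_GCHANNELS;
        ifreq ifr{};
        std::strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  // hypothetical NIC name
        ifr.ifr_data = reinterpret_cast<char*>(&ch);
        if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
            std::printf("combined queues: %u of max %u\n",
                        ch.combined_count, ch.max_combined);
        close(fd);
        return 0;
    }

And that's just the query; safely partitioning those queues between programs is where it gets genuinely hairy.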

Given the niche where these tradeoffs make sense, I'm not sure why the paper
bothers to emphasize security at all.

~~~
johnm1019
Disclaimer: IANA kernel developer.

Is there any benefit to having those specialized developers create frameworks
or libraries in userland which other developers can leverage? That way they
remain the interface to uncooperative hardware, but the code lives in
userland, so the bold folks can try their own approach.

~~~
dorlaor
Seastar.io is another one: it's an async engine that utilizes all of the
cores in a modern system, and it's the heart of Scylla's DB. However, it's
complex to use correctly.
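
For a taste of the model, here's roughly what a minimal Seastar program looks like (paraphrased from memory of its public tutorial, so treat the details as approximate):

    #include <seastar/core/app-template.hh>
    #include <seastar/core/future.hh>
    #include <iostream>

    // Seastar runs one shard per core, each with its own reactor and
    // networking; everything is a future/continuation, and nothing blocks.
    int main(int argc, char** argv) {
        seastar::app_template app;
        return app.run(argc, argv, [] {
            std::cout << "hello from the reactor\n";
            return seastar::make_ready_future<>();
        });
    }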

------
yingw787
There was a great blog post I read a while back about constructing a caching
layer across the network, by Dan Luu:
[https://danluu.com/infinite-disk/](https://danluu.com/infinite-disk/)

I asked a friend who works in a quant firm and he was like yes it’s true, and
it is pretty insane.

I think there's also research Microsoft and Google are doing on RDMA over
100G Ethernet for intra-data-center communication. Pretty neat.

~~~
theincredulousk
Yes! I was surprised the paper didn't specifically mention RDMA, or to a
lesser degree SR-IOV, with all its focus on NICs.

Also, there may be ongoing research, but it isn't theory at all. HPC shops,
HFT firms, and the cloud providers have been leveraging RDMA for a long time,
e.g. over InfiniBand. Doing it over Ethernet (RoCE) is relatively new, and it
isn't necessarily any big leap that it happens over 100G instead of 40 or 10.

However, an interesting point as network links go to 100G+ (esp. for RDMA) is
again on the storage/processing side. E.g., a Wireshark capture on a 100G
connection? That's ~12.5 GB/second, near the max bandwidth of DDR3, and it
can fill 64GB of RAM in about 5 seconds at full fire-hose. So the bottleneck
hot potato will be passed again, at least for maximum sustained performance
situations.

Side note, AFAIK RoCE exists mostly due to non-technical arguments,
particularly the inertia created by existing familiarity and deployment of
Ethernet in data centers. I think Microsoft was the one flexing on a
standards-body to push it through. It is somewhat of a kludge as Ethernet
wasn't designed with RDMA in mind - no guaranteed predictable latency, frames
can and will disappear if switch buffers overflow, etc. So IMO "research" into
the topic isn't super profound - akin to studying how your sedan might be
heavily modified to go off-roading almost (but not quite) as well as a pick-up
truck.

Even now many that have the luxury are just going Infiniband from the get-go
if RDMA/latency are the key priorities rather than tacked on later.

~~~
Veedrac
The paper does mention RDMA.

------
deRerum
In the past (around the time most programming languages were invented),
memory was faster than the processor, so every variable access was
effectively instantaneous. Languages like C did not have to worry about
memory hierarchies.

If memory latency is 100ns, you would start to notice the memory bottleneck
around the time processor speeds reached 10MHz. That point was reached in the
mid-1980s with the 286. Yet through the addition of cache memory, this
bottleneck was hidden from most software, which continued to operate in a
bubble, as if still running on the hardware of the 1980s.

It's a bit like life itself... we land mammals carry around bags of water
under our skin, and our cells are still bathed in fluids, as if we were still
living in the environment of the oceans hundreds of millions of years ago.

Many programming languages have been invented since the 90s, but as far as I
know none of them explicitly model memory latency or make reference to memory
hierarchies. It's as if they still need to maintain the illusion that they
are running on the hardware of the past.

(Note: I once read about a language called Sequoia, developed at Stanford,
that explicitly modelled the memory hierarchy. I don't know what happened to
it.)

~~~
bogomipz
>"If memory speed is 100ns then you would notice the memory bottleneck around
the time when your processor speed is 10Mhz."

Sorry I'm not following the math there, whats the relation between 100ns and
10Mhz? Why is that the tipping point?

~~~
deRerum
A 10MHz processor has a clock cycle of 100ns (0.1 millionths of a second).
Those are just rough, representative numbers I picked... any particular RAM's
delays would be different, and the actual latency would be complicated by bus
speeds, protocols, etc.
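
Spelled out:

    clock period = 1 / frequency = 1 / (10 * 10^6 Hz) = 100 ns

So at 10MHz, one clock cycle and one memory access both take 100ns; clock the
CPU any faster and it starts waiting on memory.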

~~~
bogomipz
Thanks for the explanation, that math makes sense to me now. Cheers.

------
the8472
Don't kTLS sockets[0] with crypto offloading[1], sendfile/vmsplice,
device-to-device DMA transfers[2], and possibly io_uring solve all of those
things on Linux? Granted, they're not POSIX, but they're incremental
extensions.

Netflix implemented similar extensions in FreeBSD[3]

[0]
[https://www.kernel.org/doc/Documentation/networking/tls.txt](https://www.kernel.org/doc/Documentation/networking/tls.txt)
[1] [https://lwn.net/Articles/734030/](https://lwn.net/Articles/734030/) [2]
[https://lwn.net/Articles/767281/](https://lwn.net/Articles/767281/) [3]
[https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf](https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf)
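
To make the kTLS case concrete, the flow in [0] is roughly: do the handshake
in userspace, then push the negotiated keys into the socket and let the
kernel (or NIC) do the record encryption. A condensed sketch adapted from
that doc, error handling omitted:

    #include <cstring>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <linux/tls.h>

    #ifndef SOL_TLS
    #define SOL_TLS 282  // per the kernel docs, if libc headers lack it
    #endif

    // 'sock' is a connected TCP socket; the key material comes from a TLS
    // handshake already performed in userspace (assumed done elsewhere).
    void enable_ktls_tx(int sock,
                        const unsigned char* key,     // 16 bytes (AES-128)
                        const unsigned char* iv,      // 8 bytes
                        const unsigned char* salt,    // 4 bytes
                        const unsigned char* seq) {   // 8 bytes
        setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

        tls12_crypto_info_aes_gcm_128 ci{};
        ci.info.version = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        std::memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
        std::memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
        std::memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
        std::memcpy(ci.rec_seq, seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
        setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci));
        // From here on, plain send()/sendfile() emits TLS records.
    }

After this, a plain sendfile() of a static file becomes, in effect,
TLS-encrypted zero-copy IO.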

~~~
loeg
Not really. These are all incremental performance improvements on POSIX but
don't address the author's concerns / desires in the paper. All of them
continue to require the kernel to mediate IO between userspace and the
hardware. For some reason the author is fixated on direct user access to
partitioned hardware queues.

Netflix's CDN operating system is based on FreeBSD, and they did add a kind of
kTLS implementation, but they did not add it to FreeBSD upstream for a host of
reasons.

~~~
the8472
The means may be different, but the ends they aim for seem similar. The ends
here are improved latency, throughput, parallelism, non-blocking APIs, and
security.

The above-mentioned improvements aim to address the first four without
completely bypassing the kernel, instead changing the APIs so that the kernel
steps out of the way most of the time, limiting it to coordination tasks and
then either offloading to the hardware or piping the data directly to/from
userspace mappings without additional context switches where needed.

VMs using SR-IOV address the security aspect.

It's basically the difference between a green field design and tacking all
those innovations onto the glueball that is linux. The latter may be ugly and
complex, but it has the advantage of being backwards-compatible.

~~~
loeg
> It's basically the difference between a green field design and tacking all
> those innovations onto the glueball that is linux. The latter may be ugly
> and complex, but it has the advantage of being backwards-compatible.

I'm definitely not trying to argue in favor of the paper's opinion. :-)

I just believe the author of the paper would disagree that the glueball your
earlier remarks describe addresses their concerns. They specifically say that
Linux cannot be fixed to their satisfaction, or at least argue for that idea.

Look for the section on page 5 headed, "Why not use kernel-bypass techniques
on Linux?"

~~~
the8472
Well, and all I am saying is that the paper, including the section you
mention, doesn't take these latest developments into account. They're not
kernel bypasses; they're driver-level optimizations and traditional APIs,
operating on file descriptors and virtual memory, that aim to solve similar
problems.

This stuff is very new of course, so we have to wait for something to actually
integrate all those pieces and then for benchmarks.

------
pulkitsh1234
This paper was very accessible compared to other academic papers. Is there a
way to find other papers like this? Maybe it's the lack of math equations and
benchmarks.

I like how most of the statements are supported by examples, which makes them
easier to understand (after some Googling, ofc), especially for someone like
me who is a million miles away from academia and a programmer who rarely has
to think about kernel/CPU/memory intricacies, mostly due to working with
higher-level languages and abstractions on top of the OS itself.

My uneducated and naive thought on this paper: instead of replacing the
kernel with the `parakernel`, would it be possible to implement a
POSIX-compatible kernel layer on top of the parakernel itself, so that
drivers, linkers, and other abstractions don't have to be re-implemented for
it?

------
bluetomcat
We need entirely new OS abstractions to replace the dated notions of
hierarchical file systems built around the metaphor of file cabinets, I/O as
streams of bytes, terminals, process hierarchy. Essentially, say goodbye to
the Unix model after 50 years. It would open up an entirely new world of
software experimentation and craftsmanship.

~~~
Ericson2314
Amen to that. Unix was never a good design, and now is severely out of date.
We can no longer afford to hack around it.

~~~
thaumaturgy
Yeah, Unix was a terrible design. Composable architecture combined with
standardized i/o and IPC models. I can't imagine anything worse, especially
compared to today's proliferation of data formats and inscrutable APIs.

It's a good thing nobody uses it anymore.

~~~
pjmlp
Had Bell Labs been allowed to sell it in the first place, that would have
been exactly its outcome.

~~~
jstimpfle
missing the sarcasm?

~~~
Ericson2314
I think you are.

------
LorenPechtel
This takes an idea I had years ago and goes much farther with it. My idea:
disk and file access are handled by the memory paging system. A 64-bit
machine's segment registers can point to a space far bigger than the largest
hard drives, so a drive ID would simply be a segment register value, and the
drive would be accessed by reading/writing memory at an offset from it. A
file handle would likewise be a segment register value. The result would be
that all surplus memory gets used for disk caching, the paging system takes
care of all disk buffering, and you could efficiently read/write small chunks
of data.
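
Stock Unix already has a half-step toward this in mmap(2), which routes file
access through the paging machinery, minus the segment-register addressing. A
minimal sketch:

    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // The file becomes an address range; page faults do the actual reads,
    // and surplus RAM becomes disk cache via the page cache, as described.
    int main() {
        int fd = open("data.bin", O_RDONLY);  // hypothetical input file
        struct stat st{};
        if (fd < 0 || fstat(fd, &st) != 0) return 1;
        auto* p = static_cast<unsigned char*>(
            mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
        if (p == MAP_FAILED) return 1;
        std::printf("first byte: %u\n", p[0]);  // this access may page-fault
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }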

Now let's add their approach: when you cause a page fault by accessing
something not in memory, you still get the context switch, but the actual
workload could be handled by an auxiliary controller; it need not be on the
CPU.

Changes: locking parts of a file would be on a friendly basis; you would be
able to get around the rules. Access to remote files in small chunks would
still be inefficient--but the vast majority of accesses are local, and remote
accesses are generally documents that are read in their entirety.

------
ktpsns
Strictly speaking, the sentence "I/O is faster than CPU", a.k.a. "memory
access is faster than computation", is nonsense, because it compares apples
to oranges. One could instead say "transferring x data between CPU and SSD is
faster than performing the computation f(x) on the CPU", where f still
remains undefined.

~~~
masklinn
You're misreading the article. Its subject is that because of the way IO
stacks have been built _CPUs are becoming the bottleneck in IO_ , this is an
issue for both network and non-volatile storage IO e.g.

> a 40 GbE NIC can receive a cache line sized packet every 5 ns, but the last
> level cache (LLC) access latency is already up to 15 ns, which means a
> single LLC access can already prevent the OS from keeping up with arriving
> packets

and

> NVMe SSDs perform I/O faster than the OS can accept new (asynchronous) I/O
> requests and notify their completion.

They also note, e.g., that while NVMe provides for 65k command queues, OSes
generally have one I/O queue per CPU.

~~~
jeanmichelx
Latency ≠ throughput

~~~
masklinn
The LLC access is a sequential cost to processing the packet.

~~~
loeg
Sure, but the CPU can amortize that cost over many packets if it is
infrequent, while maintaining the same throughput. Also, cache line sized
packets are relatively tiny.

------
MrTonyD
Well, I spent a number of years writing drivers for PC systems. Some years
the I/O chips were faster than the CPU, and some years the CPU was faster
than the I/O chips. DMA was usually slower, just because release cycles for
CPUs tended to be faster than release cycles for I/O controllers. Eventually,
most driver writers decided that it was usually better to use the CPU, even
if the I/O controller was faster: that way, when the CPU got upgraded, you
would automatically get a speed boost, while programming an I/O controller
was both more arcane and more likely to require a complete reimplementation
in a couple of years (as well as customer complaints and market share
losses).

I'm not saying that things are the same today - but it kind of sounds to me
like they are. Back in the day, people were always claiming that we should
switch to the newest and fastest I/O controller, since CPUs were more general
purpose and would therefore always be slower. It just didn't work out that
way in practice.

------
pjc50
Interesting. It's long been the case that a "computer" pretends to be a single
processor to the programmer while in fact being a cloud of semi-general
processors which communicate through messages. This makes that completely
explicit, giving the programmer all the power and hassle involved in speaking
as directly to the devices as possible while maintaining isolation. Similar
esoteric architectures are already available (e.g. Tilera, or all the way back
to the Inmos Transputer).

Given the allocation of particular hardware devices - NIC, RAM, NVMe - to
particular processors running a (static?) application process, it's not clear
how the filesystem abstraction would work or whether that's simply delegated
to the application. This is very definitely a server-focused system as no
mention is made of GPUs or interactive devices.

------
laythea
This is kinda like what they did in the graphics API world. Moving from OpenGL
to Khronos in order to "cut the fat" between the user program and the
hardware.

~~~
ambrop7
s/Khronos/Vulkan/

------
m0zg
Some types of IO have been faster than CPU for quite a long time. For
instance, a cache miss is typically about 200x slower than accessing data
already in the register file. What this means in practical terms is if you
miss cache all the time (aren't garbage collected languages wonderful?) your
4GHz CPU turns into a 20MHz pumpkin (or thereabouts), and a fully sequential
read from a modern _spinning drive_ (150+ MB/sec) could produce more
throughput. A consumer-grade 10GbE NIC will leave it completely in the dust,
as will USB3.
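
Back-of-the-envelope version of that claim (the 8 bytes per access is my own
assumption for the last step):

    4 GHz / ~200 cycles per miss      = ~20 M cache-missing accesses/sec
    20 M accesses/sec * 8 B/access    = ~160 MB/sec of useful data
    spinning drive, sequential read   = 150+ MB/sec
    10GbE NIC                         = ~1250 MB/sec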

------
bhouston
Very interesting shift that happened over the last 2 decades.

We likely haven't designed OSes or CPUs to match this new reality.

~~~
ajross
Largely because it's not really a new reality. IBM faced the same issues on
the 360s half a decade (edit: sorry, century!) ago -- you could stream data
off of stacked platters in a drive into core much faster than the CPU could
manage the copy. The solution was to invent "I/O channels", which were early
DMA controllers. And the VM layer (when it was added) was cognizant of this
stuff, so applications could be written directly to the channel interface.

There's nothing new under the sun, basically. It's an Ecclesiastes design. I
haven't read through the whole article, but my guess is that the "parakernel"
interface the authors are positing is going to look a lot like the IBM Channel
interface.

~~~
pjc50
Is there a good concise explanation of the 360 architecture available on the
web? It seems historically important but I've not seen such a thing.

~~~
MagicPropmaker
Why, yes there is! The IBM 360 Principles of Operation. When I got my C.S.
degree we spent a whole semester studying this.

[http://bitsavers.trailing-edge.com/pdf/ibm/360/princOps/A22-6821-0_360PrincOps.pdf](http://bitsavers.trailing-edge.com/pdf/ibm/360/princOps/A22-6821-0_360PrincOps.pdf)

~~~
stormbeard
This is a treasure trove of information; however, it's not what I would call
concise. Did you run across anything else to supplement this during your
program?

------
phkamp
Congratulations!

You have reinvented the Mainframe Channel Processor!

Your next challenge: Try to avoid reinventing the 3745 Frontend Processor.

~~~
tinktank
Why so negative and condescending?

~~~
p_l
While I don't like that tone, it's at times hard not to fall into it.

Because all of this has happened before and will happen again, often without
anyone learning anything from the past (example case: NoSQL).

------
oblio
Is this true for most real-life workloads? There's that famous rule-of-thumb
indicator for latencies:
[https://www.prowesscorp.com/computer-latency-at-a-human-scale/](https://www.prowesscorp.com/computer-latency-at-a-human-scale/)

It doesn't seem to me that the orders of magnitude are so close as to require
totally rethinking mainstream kernels.
Or am I looking at this the wrong way?

~~~
jcranmer
> Is this true for most real life workloads?

No. It's true if you care about NVMe drives, or high-speed networking--which
is to say, it's true if you care about a few kinds of server workloads, but
it's absolutely not true for most consumer hardware.

~~~
wtallis
In consumer use cases, it is common to see the storage bottleneck be the CPU
rather than the SSD. Firing up a video game isn't much faster on an Intel
Optane NVMe SSD than on a SATA SSD, because the data on disk has to be
decompressed and parsed on the CPU before it is usable. A lot of software is
still written under the assumption that the disk is slow, and that capacity is
somewhat limited. Taken together, those assumptions usually lead to single-
threaded loading and decompressing/parsing on the same thread that makes the
system calls for IO.
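
A sketch of the alternative, overlapping the read with the decompress/parse
instead of serializing them on one thread (read_chunk and decompress are
hypothetical stand-ins):

    #include <cstdio>
    #include <future>
    #include <vector>

    std::vector<char> read_chunk(std::FILE* f) {
        std::vector<char> buf(1 << 20);                // 1 MiB at a time
        buf.resize(std::fread(buf.data(), 1, buf.size(), f));
        return buf;                                    // empty at EOF
    }

    void decompress(const std::vector<char>&) { /* hypothetical CPU-bound work */ }

    int main() {
        std::FILE* f = std::fopen("assets.pak", "rb"); // hypothetical pack file
        if (!f) return 1;
        auto chunk = read_chunk(f);
        while (!chunk.empty()) {
            // decompress chunk N on another core while chunk N+1 is read
            auto cpu = std::async(std::launch::async, decompress, std::cref(chunk));
            auto next = read_chunk(f);
            cpu.wait();
            chunk = std::move(next);
        }
        std::fclose(f);
        return 0;
    }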

------
ObscureScience
I apologize for not reading much of it yet, but could someone give a quick
comparison to the exokernel idea?

~~~
_0ffh
The parakernel does not multiplex the resources that can be partitioned,
which lets applications extract maximum performance from the underlying
hardware. For the resources that are still multiplexed, it looks much the
same to me.

------
ketralnis
A real world example at Alipay:
[https://news.ycombinator.com/item?id=17814185](https://news.ycombinator.com/item?id=17814185)

------
sinisa_cyprus
Only mainframes had channels; the best implementation thereof is in IBM
machines. It is nothing like DMA, or that Intel chip, or anything else. No
Unix machines, no PCs, and not even specialised hardware like Tangent had
anything similar.

It is something like a separate FPU or MMU unit, built for total control of
the peripherals, so that the CPU has little or no work to do. Don't forget
that device drivers run on the CPU.

------
wmu
BTW, does anybody know of a paper about doing some DB ops directly on disk
controllers? The other day my former colleague mentioned that he came across
such a paper (maybe a blog post?), but we couldn't find it. It's a really
interesting idea, and I believe it's doable, although only under very
specific circumstances (disk-vendor-specific, sector layout aligned to DB
needs, etc.).

------
Blackstone4
Does this mean we go from virtual machines (e.g. VMware) to Kubernetes &
containers in data centers? Something similar to RancherOS?

~~~
ori_b
It means you go from an OS that provides complex drivers to one that barely
provides drivers -- essentially muxing the raw hardware between processes.

~~~
donaldihunter
And [https://superuser.openstack.org/articles/vpp-vswitchvrouter-openstack/](https://superuser.openstack.org/articles/vpp-vswitchvrouter-openstack/)
for userspace packet processing.

------
inetknght
Look to GPUs for better solutions: make more discrete cores -- less
functional overall but higher performance -- and move them closer to the
data; then let the CPU just handle coordination of the discrete processors.
Think of SIMD, but on a massive scale.

Think of blocks of RAM with math processors, or the same in your
NVMe/NIC/etc.

~~~
convolvatron
'Systems code' like protocol implementations and device drivers tends to be
very control-flow centric. On a SIMD machine, in the worst case, this means
narrowing the 'vector length' to effectively 1; since these are throughput
machines, that's often pretty bad.

I do agree with you that lots of little processors is a good way forward
here, with a careful eye towards reducing shared state, but maybe it's useful
in this case for them to have their own instruction streams.

~~~
inetknght
I thought separate cores _do_ have their own separate instruction streams --
sometimes even completely different architectures and/or supported
instructions? Is that not the case?

~~~
p_l
Not on GPUs, they don't. Generally you have some level of grouping and
complex rules about how and when they can branch.

------
vkaku
C - (stdio) - (net) = what an ideal programming library looks like. io_uring
FTW.

Eventually, you realize that you're trying to use line buffers / getc /
ungetc to parse lines in that packet of data on the io_uring to serve that
cat picture for teh Internetz. :)

We need to eliminate variable-length protocols to make these interfaces go
away.

~~~
warrenm
It's turtles all the way down

\- Wondermark: [http://wondermark.com/357/](http://wondermark.com/357/)

\- XKCD: [https://xkcd.com/676/](https://xkcd.com/676/)

------
racuna
On the Exadata architecture (and in recent versions of Oracle Database), the
search is done by the hardware, not the software.

I'm not a fan of Oracle, but things like that are awesome.

~~~
vetinari
Really? What I vaguely remember is that the storage nodes were embedded Linux
boxes. Yes, they understood indices and would return only the minimum needed,
but it was still software.

------
sly010
Isn't the BPF infrastructure an example of this idea?

------
warrenm
If you _actually_ knew the storage was fast enough, then sure

But you can't - except in _very_ specialized (ie dedicated) designs

------
loeg
Is this like the academia equivalent of an op-ed? No methods, no data, just
opinion based on some recent trends?

~~~
scott_s
That's not a bad way of thinking about it. The HotOS workshop is not a place
where people publish completed work, but rather present new, early-stage
ideas. From the call-for-papers
([https://hotos19.sigops.org/cfp.html](https://hotos19.sigops.org/cfp.html)):

> _We solicit position papers that propose new directions of systems research,
> advocate innovative approaches to long-standing problems, or report on deep
> insights gained from experience with real-world systems. We seek early-stage
> work, where the authors can benefit from community feedback. An ideal
> submission has the potential to open a line of inquiry for the community
> that results in multiple conference papers in related venues, rather than a
> single follow-on conference paper. The program committee will explicitly
> favor early work and papers likely to stimulate reflection and discussion
> over mature ideas on the verge of conference publication._

------
z3t4
I/O faster than CPU turns CS best practice and intuition upside down.

------
kaetemi
So... separating the data and control planes?

------
toolslive
The OS kernel has been the problem for quite a while now.

------
Ericson2314
> A prototype parakernel written in Rust is currently under development

...Where?

------
agumonkey
Are we going to have transputers? :)

------
ummonk
Except I/O is still slower than CPU...

