
> The kernel is the problem

I had an epiphany one day when I realized that the kernel is nothing but a library with an expensive calling convention.

The only reason we bother calling the kernel at all is because it has privileges that userspace programs don't have, and it uses those privileges to control/mediate access to shared resources.
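
To put a rough number on that calling convention, here's a quick sketch (mine, not from the article) that times a trivial syscall in a tight loop; syscall(SYS_getpid) is used because glibc may cache getpid() itself. Results vary a lot by machine:

    /* Rough sketch: measure the cost of the kernel's "calling
     * convention" with about the cheapest syscall available. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        struct timespec t0, t1;
        long n = 10 * 1000 * 1000;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < n; i++)
            syscall(SYS_getpid);   /* user -> kernel -> user each time */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                     (t1.tv_nsec - t0.tv_nsec)) / n;
        printf("~%.0f ns per syscall\n", ns);
        return 0;
    }

An in-process function call costs a few nanoseconds; the round trip through the kernel is typically one to two orders of magnitude more, before you even count the cache and TLB pollution.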

The downside of the kernel is that there's no way around it. You can't "opt out" of any of its decisions. With any normal library, if it had an O(n^2) algorithm somewhere, or wasn't optimized for your use case, or just generally got in the way, you could choose not to link against it. User-space libraries are democratic in this respect; you vote with your linker line. But with the kernel, it's "my way or the highway." The kernel is the only way to the hardware.

Here's an unfortunate example: O_DIRECT is one of the few ways that you can sidestep the kernel. With O_DIRECT you can bypass the kernel's page cache, which there are very good reasons for wanting to do. But Linus's opinion of it is "I should have fought back harder": https://lkml.org/lkml/2007/1/10/233 He thinks it's unfortunate that you can sidestep the kernel, because "You need a buffer whatever IO you do, and it might as well be the page cache."
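
For the curious, opting out looks something like this (a minimal sketch; the file path is made up, and O_DIRECT's alignment requirements vary by device and filesystem):

    /* Minimal O_DIRECT read: data is DMA'd straight into our
     * buffer, bypassing the kernel's page cache entirely. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/data", O_RDONLY | O_DIRECT);
        if (fd < 0) return 1;

        void *buf;
        /* Buffer, offset, and length must all be aligned;
         * 4096 covers most modern devices. */
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;

        ssize_t n = read(fd, buf, 4096);

        free(buf);
        close(fd);
        return n < 0;
    }

Now you're free to build whatever caching layer actually suits your workload on top of that.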

But what if you want better memory accounting, or better isolation between users, or a different data structure, or any number of other tweaks to the page cache implementation? Well thankfully we have O_DIRECT. Otherwise, your only choice would be to try to convince the Linux kernel maintainers to integrate your change, after you've tweaked it so that it's suitable for absolutely everyone else that uses Linux, and given it an interface that Linux is willing to support forever. Good luck with that.

The kernel has always been the problem. User-space is where it's at.




People have had that realization in the past

http://en.wikipedia.org/wiki/Exokernel

http://en.wikipedia.org/wiki/Microkernel

http://en.wikipedia.org/wiki/Tanenbaum%E2%80%93Torvalds_deba...

There are good reasons why they have not caught on, performance being the most salient one.

-----


> People have had that realization in the past

"The kernel is just a library" isn't exactly the same sentiment as "the kernel should be as small as possible" -- I believed the latter before I fully understood the former. "The kernel is just a library" means that all of the experience we have designing and factoring userspace APIs carries over into kernel design. Furthermore it means that the kernel is a strictly less flexible library than userspace libraries, with a strictly more expensive calling convention, and that its only advantage is that it can protect and mediate access to hardware.

> There are good reasons why they have not caught on, performance being the most salient one.

Most of the received wisdom about microkernels is based on outdated designs like Mach, and not modern designs like L4. L4 is significantly more efficient than Mach.

-----


> ...and that its only advantage is that it can protect and mediate access to hardware.

It has another advantage, which is that it acts as a trusted third party that can allow two mutually untrusting users/programs to share a software-implemented resource - for example, a disk cache or a TCP stack.

-----


Exokernels use library operating systems.

L4 was also a one-of-a-kind endeavor, extremely optimized for a specific architecture. I can't really imagine something like that being achievable for a commercial OS.

-----


OKL4 is built on a third-generation L4 microkernel and is deployed to over 1.5 billion mobile devices: http://www.ok-labs.com/releases/release/ok-labs-software-sur...

-----


"[OKL4] is a microkernel-based embedded hypervisor". That does not quite sound like an OS.

-----


What is an OS but a process hypervisor?

-----


Monolithic kernels are good for general-purpose computing, but lose their edge in specific use cases (e.g. 10M connections).

-----


Instead of moving kernel functionality into user mode, another way of shortcutting this "expensive calling convention" would be to run your code in kernel mode (a kernel module, or something like Kernel Mode Linux). (Possibly implementing most of your code with tools/languages that make screwing up global memory difficult or impossible.)

Security and isolation might be provided by other means like virtualization.

Not saying it's a better way; it's just an alternative to writing time-critical drivers in user-space as TFA proposes. I suspect that if you have to handle 10 million connections, you don't really need a full-featured multiuser desktop system, and you can accept specific deployment requirements for this component.
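
For illustration, the skeleton of that route is just a loadable module (module name and messages are made up); everything inside it runs with full kernel privileges and no syscall boundary, which is exactly why you'd want those safer tools on top:

    /* Minimal loadable kernel module skeleton. Build with the usual
     * Kbuild makefile against your kernel headers; load with insmod. */
    #include <linux/init.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
        pr_info("hello: running in kernel mode, no syscall boundary\n");
        return 0;   /* nonzero would abort loading */
    }

    static void __exit hello_exit(void)
    {
        pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");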

-----


'Virtual hardware' is an old idea that nobody has mentioned here. The idea is that the hardware interface can be exposed in a page in user space. The driver is a library that can run in the application process. The hardware has to reconcile several user-mode apps, but that's usually just queueing.

Gain: no user-kernel process switches; no kernel call to poll completion; no data copying between processes; no interrupt transition. This was the idea behind InfiniBand but virtual hardware is older than that.

-----


> Gain: no user-kernel process switches; no kernel call to poll completion; no data copying between processes; no interrupt transition

AFAIK that can be implemented on an existing Linux system using Intel DPDK or PF_RING (routing packets directly from the NIC to mmap'ed memory in userspace), the "maxcpus=" Linux boot directive (to run the Linux kernel on only one or two cores), and pthread_setaffinity_np() in the user process (to run the process on the other cores, which will never be interrupted by the Linux kernel).
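
The affinity part is only a few lines (a sketch; the core number is arbitrary and should be one of the cores you've kept the kernel off of):

    /* Pin the calling thread to one core so it runs undisturbed. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    int pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        /* Returns 0 on success; afterwards the thread is only
         * ever scheduled on `core`. */
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }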

-----


Compile everything to MSIL (or Java) bytecode. Apparently the time saved on syscalls and on not using some of the hardware memory protection features is enough to compensate for the type-safety overhead, so it's not even slower.

http://research.microsoft.com/en-us/projects/singularity/

-----


Agree completely. Linux is developing pretty nice opt-out mechanisms though, despite Linus's best intentions.

In Snabb Switch (http://snabb.co/snabbswitch/) we take large blocks of RAM (HugeTLB) and whole PCI device access (mmap via sysfs) from Linux and then do everything else ourselves in userspace. The kernel is more like a BIOS in that sense: it gets you up and running and takes care of the details you really don't care about.
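
For a rough idea of what taking PCI device access from Linux looks like, here's a sketch of mmap'ing a device's BAR0 through sysfs (the device address and mapping size are made up; requires root):

    /* Map a PCI device's first BAR into userspace via sysfs;
     * afterwards, device registers are plain loads and stores. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    volatile uint32_t *map_bar0(const char *path, size_t len)
    {
        int fd = open(path, O_RDWR | O_SYNC);
        if (fd < 0) return NULL;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);   /* the mapping outlives the descriptor */
        return p == MAP_FAILED ? NULL : p;
    }

    /* e.g. map_bar0("/sys/bus/pci/devices/0000:01:00.0/resource0",
     *               0x20000); no syscalls on the data path after this. */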

EDIT: Extremely early draft ahem detailed design document: http://snabb.co/misc/snabbswitch-07may2013.pdf

-----


> But what if you want better memory accounting, or better isolation between users, or a different data structure, or any number of other tweaks to the page cache implementation?

Oh, I don't know, you could always write your own kernel module. Considering I used to work on a kernel module that hooked the IRQ calls to implement hard RT, I'm not convinced that the kernel is that constraining. If you really feel constrained by the kernel, you could do without and go bare metal. Even if you don't want to do that, you've still got tons of other possibilities (the BSDs, any of the OSS microkernels, etc, etc). At least you've got those options with FLOSS kernels; I'd like to see how far you can get with changes to a closed source kernel, or worse, some hardware that's locked down to "keep you safe from malicious code".

As for the approval process of Linux patches, there's good reason for that, and other FLOSS kernels are just as (if not more) picky. The truth is, the Linux kernel has probably had more use cases thrown at it than any other software, ever. It's not perfect, nor heavily optimized towards any particular configuration; but it's highly configurable, and with a bit of tweaking (which would take less time and be less error prone than going bare metal), you can almost always get an acceptable solution to your problem.

There are cases where routing around the kernel is of genuine interest and necessity; most of the time, though, I hear programmers blaming others for performance problems they haven't even run a profiler on (if they've heard of profiling at all).

-----


That's an interesting perspective.

Thinking in those terms, the benefits of the "privileged lib" are:

- has a lot of functionality (support for hardware X, Y, etc., and good implementations of important, difficult algorithms)

- interoperability and network effects: if you use this library, you can conveniently interoperate with other applications using it.

You can also get around the "strictly more expensive calling convention" by writing your code to run as a loadable kernel module. Obviously that's inconvenient, but that's kind of the point of this - trading convenience for performance.

-----


It's funny that you mention those as benefits, because to me those are two big reasons why userspace is better.

For example, think of LLVM. GCC has a lot of functionality, and implements a lot of important, difficult algorithms. But because GCC is in userspace, LLVM/Clang have been able to get better and better until they can realistically challenge GCC in this space. People are starting to use LLVM more and more; it offers really compelling new tools like ASAN (http://clang.llvm.org/docs/AddressSanitizer.html), Clang Static Analyzer (http://clang-analyzer.llvm.org/), and LLDB. LLVM is also offering an embeddable and modular architecture, which GCC has opposed on philosophical grounds for years.

Because this is in user-space, LLVM was able to form and grow alongside GCC. People didn't have to make a big switch just to try out LLVM; it's not intrusive like it would be to switch from Linux to FreeBSD or anything like that. That's why I think the network effects of user-space are better. In the kernel it's "get merged or die." In user-space, similar projects can compete and people can vote with their linker lines.

There are tons of examples of this. In the last 10 years we've seen a lot of displacement in userspace, where next-gen technologies have made a lot of inroads against the incumbents:

    - Apache has been challenged by nginx
    - Perl has been challenged by Python/Ruby
    - screen has been challenged by tmux
None of these upstart competitors had to ask permission or get buy-in from the incumbents; they just did it.

Now compare this with when Con Kolivas had an idea in 2007 for how to improve the Linux scheduler. He was really passionate about this and was maintaining his own patch against the Linux kernel -- he called his scheduler the Staircase Deadline scheduler (http://lwn.net/Articles/224865/). He showed positive results and had a lot of fans of his patches. But then Ingo Molnar, the established Linux scheduler guy, took Con's ideas and created his own separate rewrite of the scheduler that he called the "Completely Fair Scheduler." Linus picked Ingo's scheduler instead of Con's, which left Con frustrated, and he stopped working on Linux.

We'll never get to find out what might have happened if Con could have realistically offered his work as an alternative. The scheduler is in the kernel, which means that Linus picks winners and losers. In userspace, no one has that power.

-----


I think it's less about "one true kernel" with too much control by an established crew and more about system integration and risk.

The vast majority of people don't run a Linus kernel. They run a distro kernel. Distros can (and do) ship multiple kernels with different sets of patches and options. They have a default, but they also have a default web server and C compiler.

The kernel is abused by all kinds of different workloads. Distros choosing to offer kernels with more "speculative" patches will have to support them. The kernel is a risk-averse environment. I think that's the reason, not fiat by Linus.

Also note, a good counterexample to your main point is the Android kernel. Linus-kernel:Android-kernel is quite close to gcc:llvm.

Replacing a big chunk of functionality takes a lot of resources, so it helps to be Apple (LLVM) or Google (Android kernel).

Also, don't underestimate kernel modules either. If you want to, say, expose page table information that "Linux doesn't let you" see, you can write a module to do so.

[edit - removed Con Kolivas related part of response. Don't want to drag up old flamewar.]

-----


You might find HaLVM interesting. Runs a Haskell stack directly on the machine.

http://corp.galois.com/halvm

-----


Also Mirage for OCaml. Unlike HaLVM (which does seem like an awesome project), it's currently under active development. They also have some clever ideas about security.

-----


> The downside of the kernel is that there's no way around it. You can't "opt out" of any of its decisions.

Well, you can sidestep most of it. For example, re the c10k/c10m problem, you can move the transport protocol implementation to userspace. See this example of how they did it with SCTP: http://www.cs.ubc.ca/labs/dsg/mpi-sctpuserspace_ICCCN2012.pd... - you could do the same with TCP if you don't care about having a TCP-enabled kernel.

The other example of course is all the protocols built on top of UDP.

This still has the kernel driver shoveling at least the raw IP packets to and from the NIC, but you could sidestep that too by using the hypervisor interface to route the NIC's PCI space and interrupts to your program (like Xen does).
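
As a sketch of that userspace-transport idea: a raw socket delivers complete SCTP packets (IP header included) to the process and lets it send its own, so the whole protocol state machine can live in your program. Needs root or CAP_NET_RAW:

    /* Receive raw SCTP packets in userspace; parsing and replying
     * is now the program's job, not the kernel's. */
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int s = socket(AF_INET, SOCK_RAW, IPPROTO_SCTP);
        if (s < 0) { perror("socket"); return 1; }

        unsigned char pkt[65536];
        /* One recv() = one full IP datagram carrying an SCTP packet. */
        ssize_t n = recv(s, pkt, sizeof(pkt), 0);
        printf("got %zd bytes\n", n);

        close(s);
        return 0;
    }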

-----


> I had an epiphany one day when I realized that the kernel is nothing but a library with an expensive calling convention.

And a difficult debugging environment.

-----



