Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck (paralleluniverse.co)
39 points by pron on Feb 4, 2014 | 20 comments

Yes, the OS will become the bottleneck, if you design your application in a way that makes the OS do a shitload of work.

Or, you could just avoid the OS altogether: https://github.com/SnabbCo/snabbswitch

Our current engineering target is 1 million writes/sec and >10 million reads/sec on top of an architecture similar to that, on a single box, for our fully transactional, MVCC database (writes do not block reads, and vice versa). It runs in the same process (a la SQLite), and we've also merged it with our application code and our caching tier, so we're down to, literally, a single process for what would have been at least three separate tiers in a traditional setup.

The result is that we had to move to measuring request latency in microseconds exclusively. The architecture (without additional application-specific processing) supports a wire-to-wire messaging time of 26 nanoseconds per message, or approx. 40 million requests per second. And that's written in Lua!

To put that in perspective, that kind of performance is about 1/3 of what you'd need to handle Facebook's messaging load (on average, obviously; Facebook bursts higher than the average at times...).
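As a sanity check on those figures (my own arithmetic, not from the comment): 26 ns per message works out to roughly 38.5 million messages per second, which matches the "approx. 40 million" claim.

```python
# Sanity-check the claimed figures: 26 ns per message implies
# roughly 1 / 26e-9 ~= 38.5 million messages per second.
per_message_seconds = 26e-9           # claimed wire-to-wire time
throughput = 1 / per_message_seconds  # messages per second

print(f"{throughput / 1e6:.1f} million requests/sec")  # 38.5
```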

Point being, the OS is just plain out-of-date for how to solve heavy data plane problems efficiently. The disparity between what the OS can do and what the hardware is capable of delivering is off by a few orders of magnitude right now. It's downright ridiculous how much performance we're giving up for supposed "convenience" today.

I read a paper recently showing that, in some cases at least, even having the operating system do TCP for you can be slow.

The paper was "Network Stack Specialization for Performance"; by moving pretty much all of TCP out of the kernel and into userspace, the authors built a web server that outperformed nginx by about 3.5x. The point of the paper was that, as the title suggests, keeping these things as general as they must be in the kernel comes at a fairly significant performance cost.

Now obviously I don't want to have to write TCP for every network application I write, but it is interesting to think of such things as libraries rather than kernel facilities, so I can pick the TCP implementation that I know performs best for my particular application.


A related work is Mirage OS ( http://www.cl.cam.ac.uk/projects/ocamllabs/tasks/mirage.html ), which compiles applications and libraries into a microkernel that runs on the Xen hypervisor. IMHO, compiling applications into the kernel and avoiding the kernel share the same idea: the OS layer provides generality at a performance penalty; building everything together for a particular purpose loses the generality but achieves better performance.

Sure, but the point is that to avoid that you need either functional programming or lightweight threads (though the two are certainly not mutually exclusive).

Put the two together and you get Erlang

> What’s remarkable about this result is that it does not depend on the precise distribution of the requests, the order in which requests are processed or any other variable that might have conceivably affected the result.

It says that (# requests) = (# requests / time) * time

There is nothing else "that might have conceivably affected the result", it's just algebraic cancellation.

Everything else that might have mattered is ruled out by saying "stable system" and saying that all variables are averages.

I'm not saying it's not useful, but that's an awfully "gee whiz" tone to use to comment on r = (r/t) * t

This part is more worthy of an exclamation point, since it makes clear a practical impact:

> Because L is the minimum of all these limits, the OS scheduler suddenly dropped our capacity, L, from the high 100Ks-low millions, to well under 20,000!
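Spelled out with Little's Law (L = λW, so λ = L/W), here's a hypothetical example of how a cap on concurrency caps throughput. The numbers below are mine, chosen for illustration, not taken from the article:

```python
# Little's Law: L = lam * W, so max throughput lam = L / W.
# Hypothetical numbers (mine, not from the article):
W = 0.1  # 100 ms average time each request spends in the system

# Thread-per-request: if the OS scheduler only handles ~20,000
# threads well, in-flight requests are capped at L = 20,000:
lam_threads = 20_000 / W      # ~200,000 requests/sec

# Lightweight threads (fibers): millions can be in flight,
# so the same per-request latency sustains far higher throughput:
lam_fibers = 2_000_000 / W    # ~20,000,000 requests/sec
```

The point of the quoted passage is exactly this: L is the minimum of all the system's limits, so the scheduler's thread cap becomes everyone's cap.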

Well, it's not quite as simple as that. We're talking about a stochastic process here, and while a "hand-wavy" proof may satisfy some, there's a Little ;) more to it: http://www.ece.virginia.edu/~mv/edu/715/lectures/littles-law...

Fair enough, although even that terse lecture hand-waved (by leaving it to a reference) about ergodic systems.

I think the deeper point is along the lines of Wigner's classic "The Unreasonable Effectiveness of Mathematics in the Natural Sciences"


It happens to be the case that elementary dimensional analysis is sufficient in the case in hand -- and it often is.

Parallel Universe are churning out some seriously impressive software [0]. How many programmer-hours have gone into Quasar and its supporting cast so far?

[0] Caveat: impressive in terms of features; i don't have any information about the quality of the implementation

Quasar developer here: I'd guess about 1 man-year on Quasar+Pulsar, a similar amount on Galaxy, and much more than that on SpaceBase. Comsat was easy once Quasar was in place.

There's also another option for doing async on the JVM: http://vertx.io/

The polyglot aspect, very impressive performance (as it's based on Netty), and simple deployment model make it an interesting choice. It lets you put computationally intensive operations into worker verticles that run in their own thread pool, while I/O-bound operations run asynchronously on the event loop, Node.js-style.

Yep, it's just that I find that these solutions (Node.js and Vert.x) make you adopt a functional/callback-based style not because it's appropriate from a design perspective, but rather to work around OS limitations.

Fibers first remove the problem, and then let you choose the most appropriate programming style for your domain.
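To illustrate the contrast, here is a sketch in Python, with asyncio standing in for both the callback-driven Node/Vert.x style and the straight-line fiber style (this is not Quasar's actual API):

```python
import asyncio

# Callback style (the shape Node.js/Vert.x push you toward):
# the continuation is passed in, and control flow is inverted.
def fetch_later(loop, on_done):
    loop.call_later(0.01, lambda: on_done("payload"))

# Fiber/coroutine style: the same logic reads as ordinary
# sequential code; the runtime suspends the lightweight task
# cheaply instead of parking an OS thread.
async def fetch():
    await asyncio.sleep(0.01)
    return "payload"

async def main():
    result = await fetch()  # looks blocking, but only this
    print(result)           # lightweight task is suspended

asyncio.run(main())
```

Either style gets you off the one-OS-thread-per-request model; the fiber style just lets the code read the way the problem is actually structured.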

Is it time to ditch the "normal" OS? My startup is looking to do exactly that, but I would like to know what others' thoughts are on the matter.

My proposal is to have a small exokernel between the hardware and the application. The exokernel is there to provide very simple access to the hardware (like the disk or network) and will rely on the application to do anything complicated (like handling TCP/IP).

"Plain" OS threads certainly have their place and the OS does a fine job scheduling them. It's just that the more information you have about the threads' behavior the better they can be scheduled (you can reduce latencies by keeping related fibers on the same core to share cache; that's one of the things Quasar does).

So the increased latency of the OS scheduler has little to do with the number of layers between your application and the hardware, and a lot to do with the assumptions the OS can make about your code.

You most certainly want a general-purpose OS scheduler, it's just that many applications can benefit from user-level lightweight threads.

For me, when it comes to really drastic solutions like running on bare metal or building a custom OS: if you have to ask whether you should, then you shouldn't. But to echo the others, it depends on what you are doing.

I hope I don't come off as a creeper, but I was very curious about what you are working on and dug through your comments. Is it this BareMetalOS project?

You could also be asking about whether it is time to ditch "commodity" hardware (which by the way is not massively parallel).

The answer is, of course, it depends on what your priorities are.

I've seen projects that write OSes from the ground up for a wide variety of reasons (security, correctness, optimization for a use case, etc.). But very rarely is it for price reasons.

I wrote a comment earlier about a paper I read that ditched kernel TCP and had the application handle it for some performance gains, so you might want to page up to see it if you haven't already ("Network Stack Specialization for Performance").

How is it that you intend to do an OS startup? That's a pretty tough nut to crack.

You might want to check out the rustboot project, then. One of its use cases is booting Rust on a hypervisor with not much more than the language runtime, which sounds like what you're after.
