
Why Aren't Operating Systems Getting Faster as Fast as Hardware? (1990) [pdf] - vezzy-fnord
https://web.stanford.edu/~ouster/cgi-bin/papers/osfaster.pdf
======
nickpsecurity
It's clearly a combination of bad OS architecture, bad software architecture,
and misalignment of hardware and software interests. We've seen improvements
on the last two in different ways, while OS improvements are hacks piled on
hacks piled on the original mess. There have been attempts to improve
performance with purpose-built designs: the BeOS demo made the desktop scream
with performance; QNX smoked microkernels such as Mach; IX did this for
dataplanes; mainframes' channel I/O gives them insane throughput and
utilization (90+%); Intel's i432, IBM's System/38, the Burroughs B5500,
Cavium's Octeon network SoCs, and the more recent Azul Systems Vega all
accelerated critical functions with purpose-built hardware.

So, there are all kinds of ways to improve things at each layer. Many were
done, and are still being done, with vast improvements over the competition
of their time. Why do _mainstream_ OSes not improve as _mainstream_ hardware
improves? Bad design, bad implementation, and a strong desire for backward
compatibility that makes fixes hard. The good news is that there's a niche
market and academic R&D that are always creating clean-slate designs that do
it better.

Admittedly, though, replacing a full OS isn't likely to happen, as users
reject anything without feature X, app Y, or benchmark Z. The thing is, there
are a _lot_ of those things in an OS, to the point that the barrier to entry
for building one is insanely high. Even Solaris 10, despite having a hell of
a start, cost over $200 million to design, implement, and test. The best
shortcut to that I've seen is Rump kernels. That, combined with the
clean-slate work, means we might see something better happen incrementally
over time.

~~~
vezzy-fnord
_Admittedly, though, replacing a full OS isn't likely to happen, as users
reject anything without feature X, app Y, or benchmark Z. The thing is, there
are a lot of those things in an OS, to the point that the barrier to entry
for building one is insanely high._

_The best shortcut to that I've seen is Rump kernels._

Well, not really. Rump kernels are a godsend shortcut (similar to DDE, but
much more advanced) for device drivers, but Blub-loving users wanting their
apps means you still need to roll a POSIX userland of some sort (the way GNU
Hurd does it over Mach is quite interesting: it supports things like
processes with multiple UIDs, with all POSIX calls being RPCs that delegate
to object servers) or, since we're talking about a mainstream improvement
here, Darwin/XNU and maybe even Windows NT. No escape.

~~~
nickpsecurity
The L4 kernels have just been porting Linux and the like to user mode to run
the untrusted apps with calls to the underlying API. That might be the best
that can be done outside some kind of source-to-source translation with
heuristics.

------
SixSigma
Because programmers

Systems Software Research is Irrelevant (aka utah2000 or utah2k)

By Rob Pike (Lucent Bell Laboratories, Murray Hill)

“This talk is a polemic that distills the pessimistic side of my feelings
about systems research these days. I won’t talk much about the optimistic
side, since lots of others can do that for me; everyone’s excited about the
computer industry. I may therefore present a picture somewhat darker than
reality. However, I think the situation is genuinely bad and requires action.”

[http://doc.cat-v.org/bell_labs/utah2000/](http://doc.cat-v.org/bell_labs/utah2000/)

------
amelius
I think the paper should be titled "Why isn't hardware as fast as advertised?"

If memory bandwidth forms the bottleneck, then a faster processor is of little
use.
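
As a back-of-the-envelope sketch in C (the 25 GB/s sustained-bandwidth figure
is an assumed round number, not a measurement of any particular chip):

    #include <stddef.h>
    
    /* STREAM-style triad: each iteration moves 24 bytes (read b[i],
       read c[i], write a[i]) for 2 flops. At an assumed 25 GB/s of
       sustained bandwidth, that caps throughput near
       25e9 / 24 * 2 ~= 2.1 GFLOP/s, no matter how high the core's
       peak FLOP rate is. */
    void triad(double *a, const double *b, const double *c,
               double s, size_t n) {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }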

~~~
agumonkey
I recently learned that while reading this article about Cray computers.

[http://www.techrepublic.com/blog/classics-rock/the-80s-supercomputer-thats-sitting-in-your-lap/](http://www.techrepublic.com/blog/classics-rock/the-80s-supercomputer-thats-sitting-in-your-lap/)

The old Crays' sustained memory bandwidth is still higher than that of recent
Core iX CPUs, making them 'faster' for actual workloads, even though their
peak bandwidth is lower. A CPU's GFLOPS are a lot less useful if it doesn't
have data to flop on. System design vs. marketing.

~~~
CyberDildonics
Not true. If you program like it's the '80s you don't get nearly the same
speedups, but some structures like linked lists are basically obsolete, or at
least extremely exotic, for performance. Even so, Intel's out-of-order
execution, branch prediction, and caching are very strong.

If you do trivial operations across arrays of linear memory, current Intel
processors are orders of magnitude faster than naive serial programming. Lots
of C code that looks fine can be sped up by 12x by rearranging memory access,
and by 50x using SIMD.
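
To make the memory-layout point concrete, here's a minimal sketch
(illustrative C, not a benchmark): the same reduction done by chasing
pointers through a list versus streaming a contiguous array.

    #include <stddef.h>
    
    struct node { double value; struct node *next; };
    
    /* Pointer chasing: every load depends on the previous one, so
       the prefetcher can't run ahead and each cache miss stalls. */
    double sum_list(const struct node *n) {
        double s = 0.0;
        for (; n != NULL; n = n->next)
            s += n->value;
        return s;
    }
    
    /* The same reduction over contiguous memory: the hardware
       prefetcher streams it, and the compiler can vectorize the
       loop with SIMD (given -ffast-math or a manual multi-
       accumulator unroll, since FP addition isn't associative). */
    double sum_array(const double *a, size_t len) {
        double s = 0.0;
        for (size_t i = 0; i < len; i++)
            s += a[i];
        return s;
    }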

~~~
walterbell
_> Lots of C code that looks fine can be sped up by 12x by rearranging memory
access, and by 50x using SIMD._

Any recommended papers on this topic?

~~~
lukego
Computer Systems: A Programmer's Perspective (3rd ed)

[http://www.amazon.com/Computer-Systems-Programmers-Perspective-Edition/dp/013409266X](http://www.amazon.com/Computer-Systems-Programmers-Perspective-Edition/dp/013409266X)

------
y_g
Meh. For more than 15 years I have been architecting I/O-intensive server
products (web caches, WAN optimization, filesystems), and unless it is
crashing, we mostly don't worry about the operating system. Our performance
issues are of our own doing and are dealt with in user space, as is nearly
everything we do.

I thought this paper was kind of silly when it came out, and still think it
is.

Also, while I'm piling on Ousterhout, Tcl has got to be the most overrated
winner of the ACM Software System Award.

~~~
ArkyBeagle
This is a USENIX paper from 1990. The past is a different country...

Tcl plus the Welch book is pretty good. You get purely event-driven programs.
I've replaced three or four Big Piles O' Java (that didn't work) with Tcl
scripts, and in short amounts of time. There's nothing particularly wrong
with Java, but these Piles still existed...

The Welch book is the critical component...

Its nearest neighbor is Python, which is also just fine, except that package
management then becomes a real chore. It's also easy to forget that Scotty
(the SNMP extensions to Tcl) was the real winner back in the first dotcom
era.

It's probably pretty lousy for web caches, WAN optimization, and filesystems.
It's very good at constructing workstation-based programs to run against
embedded systems.

------
iolothebard
I'm running DOS 6.22 on an SSD on an i7 (it can only use one core, but
3-4 GHz is pretty fast).

It runs pretty fast, IMO. But that's not really the issue. OSes do 1000x more
than they ever did in the past, and if you go boot up some mid-'80s or '90s
machine, you'll see how much faster OSes actually are.

I play around on my old Pentium 133 from time to time; I always forget how
long it took to load programs (at the time, around 1996, it seemed blazing
fast).

~~~
tracker1
It's funny: I was in a recent discussion in a BBS group on Facebook, and the
question came up of running older DOS programs. IMHO emulation is plenty
fast, but some consistently balked at the idea because it was too slow... I
remember running some of those programs for only a single user on 386/486-
class hardware, and it's leaps and bounds faster today with a dozen users and
a bunch of other stuff running on a server.

Now, for the past 5-6 years computers really haven't gotten much faster (CPU-
or memory-wise)... they've gotten much lower-power, and with SSDs becoming
more commonplace, that helps a lot. It will be interesting to see how the
next few years shape up.

It would be nice to see a ground-up OS effort... I think ChromeOS is a decent
attempt, but it could be better with a slightly cleaner baseline.

------
MichaelCrawford
Because Linus Torvalds actually believes that userspace processes have
"infinite stack".

I swear I'm not making this up.

~~~
MichaelCrawford
Classic Mac OS applications had 32 kB of stack. Desk Accessories had 8. Much
of my effort in fixing Working Software's QuickLetter went toward reducing
stack consumption. My predecessor had apparently not read the fine manual.

We spent a lot of money to purchase MacCLint, but it overflowed the stack by
megabytes and so corrupted the heap. The author wasn't receptive to my advice
to remove all its recursion; he asserted that recursion is necessary for
compiler-like programs.

Recursive algorithms do not require recursive implementations. The procedure
for converting recursion to iteration is documented by Robert Sedgewick in
"Algorithms".

Really, you do still have a stack, but it's not the runtime stack.
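
A minimal sketch of the transformation (illustrative C, not Sedgewick's
code): a recursive tree walk rewritten with an explicit heap-allocated stack,
so the runtime stack stays shallow no matter how deep the data is.

    #include <stdlib.h>
    
    struct tree { long value; struct tree *left, *right; };
    
    /* Recursive form: runtime-stack use grows with tree depth. */
    long sum_rec(const struct tree *t) {
        if (t == NULL) return 0;
        return t->value + sum_rec(t->left) + sum_rec(t->right);
    }
    
    /* Iterative form: pending subtrees live in an explicit,
       heap-allocated stack. 'cap' must be at least the tree
       height + 1; returns -1 on allocation failure or overflow. */
    long sum_iter(const struct tree *t, size_t cap) {
        const struct tree **stk;
        size_t top = 0;
        long s = 0;
        if (cap == 0) return -1;
        stk = malloc(cap * sizeof *stk);
        if (stk == NULL) return -1;
        if (t != NULL) stk[top++] = t;
        while (top > 0) {
            const struct tree *n = stk[--top];
            s += n->value;
            if (n->right) {
                if (top == cap) { free(stk); return -1; }
                stk[top++] = n->right;
            }
            if (n->left) {
                if (top == cap) { free(stk); return -1; }
                stk[top++] = n->left;
            }
        }
        free(stk);
        return s;
    }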

~~~
Someone
It comes 'a bit' late, but couldn't you increase stack size by calling
SetApplLimit before calling MaxApplZone?

And of course, initially, classic Mac OS didn't have 32kB of stack on original
hardware. It didn't even have 32kB available to applications.
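
The pattern I have in mind looks roughly like this (a sketch from memory of
Inside Macintosh; the 32 kB figure and the Memory.h header name are
illustrative):

    #include <Memory.h>   /* classic Memory Manager (Toolbox) */
    
    /* Early in main(), before any heap allocation: pull the
       application heap's ceiling down by 32 kB, then expand the
       heap up to the new limit. The stack grows down from the top
       of the partition, so the reclaimed space becomes extra
       stack headroom. */
    static void GrowStack(void) {
        SetApplLimit((Ptr)((unsigned long)GetApplLimit() - 32 * 1024L));
        MaxApplZone();
    }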

~~~
MichaelCrawford
I started at WSI about seven months before System 7 was introduced. At the
time, the documented limit was 32 kB. Yes, applications could call
SetApplLimit, but Desk Accessories could not; QuickLetter's stack was on top
of the application's stack. We had no control over that.

I expect the System always ensured that DAs could have 8 kB of stack on top
of whatever the application had; that is, app plus DA defaulted to 40 kB.

MacCLint was such a skanky program that I don't think it would have done it a
whole lot of good to increase the stack.
~~~
Someone
Aha! I didn't get that your problems were with a desk accessory.

