The micro- v. macro-kernel debate is moot; all of the interesting development is happening elsewhere in kernel-land. On that, he was wrong. But RISC will dominate CISC unless Intel manages to pull a miracle out of their ass, <s>and GNU will dominate, at least in terms of #installs, through Android.</s> nope, I stand corrected.
So Linus was right about what mattered in the 90's, and Tanenbaum had his finger on the pulse of history. Both knew what they were talking about, and both were right in their own way.
Although reading the Usenet postings, Linus does come across as more of an arrogant upstart, and less of a dickish master than usual. That alone is worth the read.
Sorry, he was wrong.
As to the 2nd point about x86 vs "RISC" processors, it turns out he was so massively wrong that the very basis of his understanding was incorrect. CISC processors are dead today; they stopped being a substantial part of the market in the late 90s. I'm sure they still exist somewhere in new devices (RAD-hardened Pentiums, perhaps), but for the most part the CISC vs RISC battle is over, and RISC won overwhelmingly. But Tanenbaum thought that a necessary consequence of that would be that the x86 architecture would die due to the weaknesses of CISC processor designs. What actually happened is that Intel (and later AMD and others, of course) started making processors with a RISC core that support the x86 (IA32) instruction set through transparent op-code translation. Every Intel CPU since the Pentium Pro has worked that way (and every AMD CPU since the Athlon).
You can't just wave your hands and say "yeah, but he was fundamentally right in some ways though there were some things he couldn't have foreseen". That's part of the deal; there's always something that you can't foresee. Imagining that CISC's weaknesses are identical to the weaknesses of the x86 architecture is just the sort of naivety and shallow reasoning that can lead you to make woefully wrong predictions.
Instead, they have a RISC style pipeline with an op-code translation layer that decodes instructions into smaller RISC-like instructions that get run through the pipeline.
So really, modern x86 processors are more like x86-compatible processors.
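To make the "transparent op-code translation" point concrete, here's a toy sketch. Everything in it is invented for illustration (real decoders emit proprietary micro-ops, not these made-up `LOAD`/`STORE`/`tmp` names): a memory-operand ALU instruction gets cracked into a load/ALU/store triple, while register-register ops pass straight through.

```python
# Toy sketch of CISC-to-micro-op "cracking". Purely illustrative; the
# micro-op names and three-way split are invented, not Intel's actual ones.

def crack(instruction):
    """Split one CISC-style instruction into RISC-like micro-ops.

    A memory-operand ALU instruction like ADD [addr], reg becomes a
    load / ALU / store sequence; register-only instructions map 1:1.
    """
    op, dst, src = instruction
    if dst.startswith("["):           # destination is a memory operand
        addr = dst.strip("[]")
        return [
            ("LOAD",  "tmp", addr),   # micro-op 1: fetch the value into a temp
            (op,      "tmp", src),    # micro-op 2: plain register ALU op
            ("STORE", addr,  "tmp"),  # micro-op 3: write the result back
        ]
    return [instruction]              # register-register ops pass through

print(crack(("ADD", "[0x1000]", "eax")))   # three micro-ops for one instruction
print(crack(("ADD", "ebx", "eax")))        # one micro-op, unchanged
```

The pipeline behind the decoder only ever sees the simple three-operand micro-ops, which is the sense in which the core is "RISC inside".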
That's a pretty far-fetched argument. RISC/CISC is about the instruction set, and the x86 instruction set is CISC.
Of course, since the big RISC/CISC battle the implementations have converged a lot on the microarchitecture level, primarily because the transistor budget sweet spot targeted by RISC melted away, and the amount of chip area saved in instruction decode and ISA simplicity was later dwarfed by out-of-order machinery, caches, etc.
So an equally valid argument (as "CISC is dead") is "RISC is dead", since RISC chips today have brainiac instructions and pipeline features like divide/multiply, unaligned access, variable-length instructions (Thumb on ARM), out-of-order execution, etc.
He said in the early 90's that x86 would be dead in 5 years. If that's not a wrong prediction I don't know what is.
As for microkernels being superior to macrokernels, the trend has been to evolve into hybrid kernels (one of which Linux is now in practice). CISC vs RISC: same outcome. Hybrid approaches have come out on top. Modern x86 processors are RISC inside, CISC outside, and this produces concrete advantages by needing less memory bandwidth. Even if the instruction set were nominally RISC, we'd do more instruction pipelining, producing a similar result. It's got to the point where the difference is nominal. We still call them x86, but they are fundamentally different processors. We still call them RISC, and they do instruction compositing now.
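A quick back-of-envelope illustration of the memory-bandwidth point: a compact variable-length (CISC-style) encoding fetches fewer bytes than a fixed 4-byte (RISC-style) one for the same short sequence. The byte counts below roughly follow real IA-32 encodings, but this is meant as an illustration, not a benchmark.

```python
# Same toy instruction sequence, two encodings: variable-length (CISC-style)
# vs fixed 4 bytes per instruction (RISC-style). Sizes are illustrative.

# (mnemonic, bytes in a compact variable-length encoding)
program = [
    ("push ebp",         1),
    ("mov ebp, esp",     2),
    ("add eax, 1",       3),
    ("mov [ebp-4], eax", 3),
    ("ret",              1),
]

variable_len = sum(size for _, size in program)
fixed_len = 4 * len(program)    # every RISC-style instruction is 4 bytes

print(f"variable-length encoding: {variable_len} bytes")   # 10 bytes
print(f"fixed 4-byte encoding:    {fixed_len} bytes")      # 20 bytes
```

Halving the instruction-fetch traffic for the same work is the concrete advantage of keeping the dense CISC encoding on the outside.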
He was WRONG with capitals, as usually "religious" and opinionated people are. There's rarely black and white in the real world. It's shades of grey.
Also, being off by 400% on the time scale, even if the outcome is similar to what was predicted, is being wrong, no ifs or buts. Predictions like this mean total practical failure in any decision-making.
Android is not GNU, although its kernel is Linux. So much so that it is an example cited by GNU's GNU/Linux FAQ in order to make the case that Linux and GNU/Linux are different.
There was a follow-up to this email joust detailed in "Just for Fun", with this remark: "Maybe a year later, when Linus was in the Netherlands for his first public speech, he made his way to the university where Tanenbaum taught, hoping to get him to autograph Linus's copy of Operating Systems: Design & Implementation, the book that changed his life. He waited outside his door but Tanenbaum never emerged. The professor was out of town at the time, so they never met."
The book also detailed that the main reason for the spat was that Tanenbaum was publicly commenting. Hence the response.
Microkernels are the future
For software that needs to be 'perfect', microkernels are the way to go, and in fact in the embedded world there are more microkernel varieties to choose from now than ever before. Once performance penalties are no longer important and people start to demand software that does not crash with every change of the weather, I believe microkernels will see another wave of increased adoption. As far as I'm concerned this can't come soon enough. Userland drivers are so much better than a monolithic kernel.
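A toy sketch of the fault-isolation argument (everything here is invented for illustration, not any real microkernel's API): because a userland driver is just an ordinary process, a supervisor can restart it after a crash while the rest of the system keeps running, instead of the whole machine going down with a kernel panic.

```python
# Toy microkernel-style driver supervision: the "driver" is a separate
# process, so a crash is just a non-zero exit code, not a kernel panic.
# The flaky behavior (crash twice, then come up) is simulated.
import subprocess
import sys

def start_driver(attempt):
    """Run the driver as its own process; return its exit code."""
    # Attempts 0 and 1 simulate a driver crash; attempt 2 exits cleanly.
    code = f"import sys; sys.exit(0 if {attempt} >= 2 else 1)"
    return subprocess.run([sys.executable, "-c", code]).returncode

def supervise(max_restarts=5):
    """Restart the driver until it comes up, or give up."""
    for attempt in range(max_restarts):
        if start_driver(attempt) == 0:
            return attempt          # number of crashes survived
    raise RuntimeError("driver never came up")

crashes = supervise()
print(f"driver up after {crashes} crash(es); nothing else noticed")
```

In a monolithic kernel the same bug takes the whole system with it; here the blast radius is one process, which is the whole reliability pitch.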
x86 will die out and RISC architectures will dominate the market
(5 years from then) everyone will be running a free GNU OS
As far as microkernels go, I'd say that the future has arrived. We don't call them microkernels, of course -- we call them hypervisors. But they're fundamentally the same thing.
It's the difference between 'potatoes' and 'mashed potatoes' ;)
(edit: I should clarify "same technology" to mean "same use of address space separation". Microkernels don't need to use virtualization technology like VT-d instructions because their separated modules don't need to think they're running on unadulterated hardware.)
Complexity is the difference.
Hypervisors "won" because they were easier to implement; they only had to add another layer to the stack, rather than fundamentally change the structure of the OS.
The outcome is a more baroque collection of code, though. Worse truly is better.
And the fault tolerance argument applies both ways. That's generally the reason behind VM sharing too. One simply separates processes along lines visible to the application (i.e. memcached vs. nginx) or to the hardware (FS process vs. display process).
Potato, potato. This simply isn't something worth arguing over. And it's silly anyway, because there are no microkernels in common use that meet that kind of definition. Find me a consumer device anywhere with a separate "display server", or one in which the filesystem is separated from the block device drivers. They don't exist.
(edit rather than continue the thread: X stopped being a userspace display server when DRM got merged years ago. The kernel is intimately involved in video hardware management on modern systems. I can't speak to RIM products though.)
Blackberry, every computer running 'X'.
It wasn't just Tanenbaum who was wrong about that. Billions of dollars were dumped into RISC architectures on the assumption that x86 wouldn't scale. Microsoft committed to an expensive rewrite of Windows (or OS/2) to make it portable. Apple considered x86 and decided to bet on RISC instead.
So this wasn't just some wacky college professor opinion, the industry thought RISC was a sure thing. (Linus of course didn't really care, he just wanted something to run on his 386 clone.)
edit: it bothers me that this debate is always presented without context. Torvalds is a grad student busy reinventing a 20-year-old unix kernel design, and Prof. Tanenbaum is pointing out he isn't advancing the state of the art, which is totally correct. The fact that Linux turned out to be really useful and popular is mostly beside the point; the advancement was Torvalds's open-source management, not the kernel design.
The ACE Consortium formed in the early 90s picking MIPS as the chosen processor and included Microsoft and SCO.
But microkernels see more and more adoption every day. They offer a degree of reliability that is unprecedented, but they also come with a performance penalty that, for a lot of people, is enough of a drawback that they would rather have 'good enough' than 'perfect'.
For software that needs to be 'perfect', microkernels are the way to go, and in fact in the embedded world there are more microkernel varieties to choose from now than ever before.
I'm looking into this space a bit for some personal projects. Would you be able to point me to some examples/good resources on this?
x86 will die out and RISC architectures will dominate the market
Correct. The future of computing is mobile, and the weakness of the Linux kernel's monolithic architecture is highlighted by Android's numerous design and implementation issues, as well as its maintainability, upgrade, reliability, and performance problems.
Tanenbaum was actually right.
No? Yeah; sounded like a content-free platform flame to me too.
Actually: I'd be curious to hear from some more knowledgeable folks on this. My understanding of the iOS kernel is that it's a microkernel only via historical label: the PVR driver stack, network devices, and filesystems live in the same address space and communicate with userspace via single context-switched syscalls. Is that wrong?
The Android kernel code is more than just the few weird drivers that were in the drivers/staging/android subdirectory in the kernel. In order to get a working Android system, you need the new lock type they have created, as well as hooks in the core system for their security model.
In order to write a driver for hardware to work on Android, you need to properly integrate into this new lock, as well as sometimes the bizarre security model. Oh, and then there's the totally-different framebuffer driver infrastructure as well.
This means that any drivers written for Android hardware platforms cannot get merged into the main kernel tree, because they have dependencies on code that only lives in Google's kernel tree, causing them to fail to build in the kernel.org tree.
Because of this, Google has now prevented a large chunk of hardware drivers and platform code from ever getting merged into the main kernel tree, effectively creating a kernel branch that a number of different vendors are now relying on.
Now branches in the Linux kernel source tree are fine and they happen with every distro release. But this is much worse. Because Google doesn't have their code merged into the mainline, these companies creating drivers and platform code are locked out from ever contributing it back to the kernel community. The kernel community has for years been telling these companies to get their code merged, so that they can take advantage of the security fixes, and handle the rapid API churn automatically. And these companies have listened, as is shown by the larger number of companies contributing to the kernel every release.
But now they are stuck. Companies with Android-specific platform code and drivers cannot contribute upstream, which causes these companies a much larger maintenance and development cycle.
For your 2nd question:
In Mac OS X, Mach is linked with other kernel components into a single kernel address space. This is primarily for performance; it is much faster to make a direct call between linked components than it is to send messages or do remote procedure calls (RPC) between separate tasks. This modular structure results in a more robust and extensible system than a monolithic kernel would allow, without the performance penalty of a pure microkernel.
And how exactly does having a microkernel fix the problem of having a stable driver API? Drivers must be written to some framework. Windows NT derivatives are microkernels too, and they're on, I believe, their third incompatible driver architecture.
And did you actually read that second link? It's drawing a single "kernel environment" with all the standard kernel junk in it. That is not a microkernel.
Sigh. I probably shouldn't have gotten involved.
At this stage I will refer you to, ironically, Andy's book:
1. In terms of pure Micro-Kernels he was off. It was tried; the benefits didn't outweigh the drawbacks, so people moved to Hybrid Micro-Kernels (Windows NT, OS X, iOS, et al.), so from that perspective he was about 50% right.
2. Given the way ARM is trouncing everybody in the mobile space, unless Intel manages the biggest comeback since Lazarus the future is almost certainly RISC. Whether this feeds back to the desktop space remains to be seen.
3. Unlikely to happen, although the future is most likely Open Source in some form or other. GPL v3 has largely ruled out GNU dominating as vendors that previously shipped GNU components replace the GPL components with other Open Source Licenses because they find the new terms a bit much.
What does RISC even mean anymore? Seriously, I remember the debates during the early 90s (back when MIPS and friends were going to destroy Intel), and the RISC of then is very different from the RISC of today. Then the merit of RISC was that you literally reduced the instruction set to the minimum possible, putting the demand on the compiler to gang instructions together to do even rudimentary work. The idea was that the simpler silicon would be easier to scale up (frequency scaling was a major problem), and the compiler would have more insight into the operation of a program, giving such a design a performance advantage.
The MIPS of the 90s had about 45 instructions, total, and a corresponding simplicity of implementation. The 8086 had 114, providing higher-level operations via much more complex silicon, and the count has grown since then.
How many instructions does ARMv7-A provide (this is actually a hard question to answer)? It has floating-point operations, SIMD/NEON, virtualization support, and on and on and on. I do know that while ARM once featured just 25,000 transistors (ARM2), a modern Cortex-A9 design like the Tegra 2 hosts 26 million transistors for just the cores (not the GPU).
I realize that I'm stepping into a linguistic landmine, and various contrived "this is the differentiator" definitions will appear, but the original intent of RISC versus CISC was exactly what I described above. Today the meanings are absolutely nothing like that.
But you are correct that the water today is somewhat muddy especially as CISC processors borrowed stuff from the RISC processors and vice versa. I think someone in the late 90s coined the term CRISP (Complex Reduced Instruction Set Processor) to describe these beasts, although I haven't seen the term mentioned in recent years.
Back when this debate was happening CPU design teams were a lot smaller, meaning that any given feature hadn't had enough effort put into it to get as far into the realm of diminishing returns, so there was a much bigger payoff to be had in reducing the number of features you implemented.
You also weren't devoting most of your die to huge arrays of cache, so adding - say - more addressing modes would tend to mean you couldn't have as many pipeline stages. Any given feature will still make the overall design more complicated and so will make it more difficult to add any other feature you want, but the issue isn't as pressing as it used to be.
One area where RISC does still have a big advantage is instruction decode. When you run into an x86 instruction you have to read a lot of bits to figure out how long it is, and it's not self-synchronizing, so you could read an instruction stream one way if you start at byte FOO, but if you start at byte FOO+1 you can find an entirely different but equally valid sequence of instructions. So decoding N bytes of x86 instructions grows in complexity faster than linearly. In fact, I suspect that modern processors have to use some sort of "guess the three most likely decodings and throw out the results if we're wrong" approach to get the performance needed.
If I were to design an ISA I'd probably want some sort of UTF-8 style variable length scheme, where you can always tell where an instruction boundary is without reading from the beginning but with the space savings from having the most common instructions be shorter than the least common ones.
 This apparently also annoys my security researcher friend.
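Here's roughly what such a UTF-8-style scheme could look like (a hypothetical ISA, not any real one): the first byte of each instruction announces its total length in its high bits, and continuation bytes always start with `10`, so a decoder dropped at an arbitrary offset can resynchronize at the next instruction boundary. That is exactly the property x86 lacks.

```python
# Hypothetical self-synchronizing variable-length instruction encoding,
# borrowing UTF-8's byte layout. Not any real ISA; purely a sketch.

def instr_length(first_byte):
    """Length in bytes of the instruction starting here, or None if this
    is a continuation byte (i.e. not a valid instruction start)."""
    if first_byte < 0x80:           # 0xxxxxxx: one-byte instruction
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx: two-byte instruction
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx: three-byte instruction
        return 3
    return None                     # 10xxxxxx: continuation byte

def resync(stream, pos):
    """From an arbitrary offset, skip forward to the next instruction start."""
    while pos < len(stream) and instr_length(stream[pos]) is None:
        pos += 1                    # continuation bytes are unambiguous
    return pos

stream = bytes([
    0x41,               # one-byte instruction
    0xC3, 0x91,         # two-byte instruction  (0xC3 = 110_00011)
    0xE2, 0x82, 0xAC,   # three-byte instruction (0xE2 = 1110_0010)
])

# Dropped into the middle of the three-byte instruction, the decoder
# recovers at the next boundary instead of misparsing the stream:
print(resync(stream, 4))   # -> 6 (end of stream, i.e. the next boundary)
```

With x86 there is no equivalent of the `10` continuation marker, which is why decoding from FOO and from FOO+1 can yield two different but equally valid instruction streams.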
EDIT: Found the link to that really good explanation Mashey had on RISC vs. CISC:
IIRC, it's way more sophisticated than that.
As I understand Intel's trace caches, their guess is basically the result of decoding the next N instructions, accounting for branch prediction.
And yes, it includes detection/recovery for writes into the instruction memory that would invalidate that guess.
*Can you recommend any (unbiased) literature that points out the strengths and weaknesses of the two approaches?*
Tanenbaum is putting his money where his mouth is with the project (or at least other people's money he acquired), and yes, the traction is slow and it might fail, but I would really enjoy the day I run an operating system based on a microkernel that does everything Tanenbaum promises with Minix 3.
And Mac OS X and Windows are actually hybrid kernels.
Linus "my first, and hopefully last flamefest" Torvalds
>Re 2: your job is being a professor and researcher: That's one hell of a good excuse for some of the brain-damages of minix. I can only hope (and assume) that Amoeba doesn't suck like minix does.
The fact that he doesn't swear doesn't make his arguments weak.