Xv6, a simple Unix-like teaching operating system (mit.edu)
200 points by cturner 1368 days ago | 78 comments



Yes! I'm so happy to see this featured on HN. You'll be fascinated by the topics presented in xv6.

Have you ever:

- ... Wondered how a filesystem can survive a power outage?

- ... Wondered how to organize C code?

- ... Wondered how memory allocation works?

- ... Wondered how memory paging works?

- ... Wondered about the difference between a kernel function and a userspace function?

- ... Wondered how a shell works? (How it parses your commands, or how to write your own, etc)

- ... Wondered how a mutex can be implemented? Or how to have multiple threads executing safely?

- How multiple processes are scheduled by the OS? Priority, etc?

- How permissions are enforced by the OS? Security model? Why Unix won while Multics didn't (simplicity)?

- How piping works? Stdin/stdout and how to compose them together to build complicated systems without drowning in complexity?

- So much more!

I credit studying xv6 as being one of the most important decisions I've made; up there with learning vim or emacs, or touch typing. This is foundational knowledge which will serve you the rest of your life in a thousand ways, both subtle and overt. Spend a weekend or two dissecting xv6 and you'll love yourself for it later. (Be sure to study the book, not just the source code. It's freely available online. The source code is also distributed as a PDF, which seems strange till you start reading the book. Both PDFs are meant to be read simultaneously, rather than each alone.)
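For the piping question in the list above, here is a minimal userspace sketch in C of the underlying mechanism: a pipe is just a pair of file descriptors, and fork() lets two processes share them. This is not xv6's code; `pipe_roundtrip` is a hypothetical name made up for illustration, and a real shell would additionally rewire the pipe ends onto descriptors 0 and 1 before calling exec.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg from a child process to the parent through a pipe.
 * Returns the number of bytes stored in out (NUL-terminated), or -1 on error.
 * pipe_roundtrip is a hypothetical helper, not part of xv6 or any library. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    if (outlen == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the producer side */
        close(fds[0]);
        size_t len = strlen(msg);
        while (len > 0) {
            ssize_t w = write(fds[1], msg, len);
            if (w <= 0)
                _exit(1);
            msg += w;
            len -= (size_t)w;
        }
        close(fds[1]);
        _exit(0);
    }

    /* parent: the consumer side */
    close(fds[1]);                  /* otherwise read() would never see EOF */
    size_t total = 0;
    int failed = 0;
    while (total < outlen - 1) {
        ssize_t n = read(fds[0], out + total, outlen - 1 - total);
        if (n < 0) { failed = 1; break; }
        if (n == 0)                 /* EOF: the writer closed its end */
            break;
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);          /* reap the child */
    return failed ? -1 : (ssize_t)total;
}
```

Closing the unused ends matters: the reader only sees end-of-file once every copy of the write end has been closed.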


> Why Unix won while Multics didn't (simplicity)?

Considering how many not-so-simple systems have succeeded in the market, I suspect that the real reasons for the failure of Multics are more nuanced.


Eh?

How do you figure?


I'm not the commenter you're replying to, but I'll take a crack.

Saying it was a matter of "simplicity" glosses over too many of the details to be useful. Also consider that many things that early UNIX didn't include from multics got added back later (memory mapped files, dynamic linking, symlinks, ...) so by that measure it was maybe too simple.

A few points to consider:

* Multics was a joint project, and each participant had different levels of commitment and priorities for the project. This wasn't a recipe for success (see also: Taligent, OS/2)

* Hardware. UNIX moved to the PDP-11 soon after that machine became available. That machine went on to become the smash hit of the decade, especially in academia. The GE and Honeywell mainframes that Multics ran on continued their decades-long loss to IBM.

I've often wondered how different history would be if Bell Labs had bought any other minicomputer for the project. Probably today UNIX would only be known as a bunch of research papers. By skill or by luck they landed on the platform that would let their creation spread.

* Language choice. When the Multics project kicked off, using PL/I seemed like a progressive choice. After all, it had been designed by all the right committees! However, PL/I only had a moment in the sun.

To me, the single most impressive thing about the original UNIX team was that they also built C at the same time. This created a virtuous cycle where language-geeks wanted to use C (and thus wanted UNIX) and OS geeks wanted UNIX (and ended up learning C). If PL/I had lived up to its billing of being the true successor to Algol then C might not have had a void to fill and this cycle wouldn't be possible.

* Being born to the right (corporate) parent. Building even a small OS at that time cost quite a bit of money: not only to pay bright people for years to work on it, but to buy the minicomputers they needed. Bell Labs had the resources and interest to do this but, uniquely, couldn't meaningfully productize it (due to their status as the telephone monopoly at the time). This meant that in the early days it was near-free, source code included. If UNIX had instead been developed inside of a computer company it would have been turned into an exclusive product and probably wouldn't have spread so far in its early days. Bell Labs was one of the few organizations outside academia that would have done this.

* Just plain talent. The UNIX guys were, of course, flat out brilliant. Take pipes for example. Multics had the ability for processes to be chained, but it wasn't recognized as a first-class feature; they were still thinking of programs as usually being separate beasts. The flash of insight was "this would be super-useful if all programs behaved this way and if the shell made it easy to construct". It's not a matter of "simplicity", but that they simply saw the problem in a different way than everyone who came before. In short, UNIX was just damn good.

Certainly UNIX started out as a simple system, and that was part of its success. However, other things have been simple and never went anywhere. The point I'm trying to make is that to understand why UNIX won you need to consider a lot more of the history.


The quick, short example is VMS and Windows NT, two systems which were quite successful and not so simple.


Interestingly, Dave Cutler worked on both of those: first VMS at DEC, and then Windows NT after being hired by Microsoft.


Wow, this comment made me look twice at the OP and now I want to really give this a go! Are there any other versions of this class for videos/lecture notes? The links in OP seem to be dead.


Yes, there are lecture videos and notes from the 2011 course:

http://pdos.csail.mit.edu/6.828/2011/schedule.html



I'm sold. Based on your experience, would this course be approachable for self-study by individuals without 3 years of CS education? The MIT page[1] suggests the intended audience for the class are seniors and graduate students (M.Eng and PhD).

[1] http://pdos.csail.mit.edu/6.828/2012/general.html


I wish I'd studied xv6 in my early teens. It would've answered many of my inexperienced questions and served as a guide for how to write quality code and architect large systems.

For what it's worth, I was able to learn xv6 just by working hard to understand everything fully. Effort seems to matter more than prerequisites. Plus it was a fun challenge.

Also, please feel free to email me (sillysaurus2 at gmail) if I can be of assistance to anyone, e.g. by talking about some nuance of unix architecture or about anything at all. Intensive self-study can sometimes feel overwhelming, so feel free to message me for encouragement.


In my opinion, the 3 years of CS has more to do with age of the student than actual learning. With a requisite level of maturity, this should not be a problem.



I'll definitely add this to my "List of things to read"

A bit off topic, but you really know how to sell things :D


Really cool project. Older, simpler, unixes are surprisingly relevant for undergrad-level learning.

It reminds me of my OS class in my undergrad days. We skipped a few bits by using the JVM to get a few things for free, but we still had to write (as groups of 3-5 people) an entire unix-like, complete with user-space programs, messaging systems, etc.

Definitely one of the most valuable classes I took for my degree.


There's also the xv6 book that's meant to accompany the source code: http://pdos.csail.mit.edu/6.828/2012/xv6/book-rev7.pdf


The line numbers in the book reference a 92-page printout of the source code here: http://pdos.csail.mit.edu/6.828/2012/xv6/xv6-rev7.pdf


Purdue has a similar OS, Xinu, that was created for teaching. It is actually used in industry around the world. - http://www.xinu.cs.purdue.edu/


+1 I learned a boatload by reading the xinu books.

Also met Doug, he's a great guy, old school C hacker.


The OS classes at several universities revolve around reimplementing Xinu and really getting to know a linksys router.


That reminds me, I stumbled upon the website of the Unix Heritage Society (http://www.tuhs.org/) which has the source code for the early versions of Unix.

Here is the first version

http://minnie.tuhs.org/cgi-bin/utree.pl?file=V1

I did not really spend enough time reading the source to understand what's going on but it's interesting regardless.


There's also a port of V7 Unix to x86, by http://www.nordier.com/v7x86/

I only wish that there was an add-on tcp/ip stack for that version, then one could really have fun with it. Yes, I know, the ip stack add on is called BSD.


You might want to look for 2.9BSD, a PDP-11 Unix variant which was more or less v7 with the TCP stack from 4.xBSD grafted in. Patching this into an x86 port would still be a bit of a project, but it's probably easier than starting with 4.xBSD directly, or starting from scratch.


This is great advice. 2.9BSD was the first UNIX I got to play with in depth; it has all the pieces you need and it can do networking.


What about 386BSD?


This version hasn't been updated in a long time. I hope the author gets back to it at some point. I've only played with it a bit, but it was very nostalgic. I think adding modern stuff to this is a bit missing the point, but it might be fun to add some sort of GUI to it, something much simpler than X Windows.


>Russ Cox (rsc@swtch.com)

>Frans Kaashoek (kaashoek@mit.edu)

>Robert Morris (rtm@mit.edu)

sh$%t, this got to be good!

Cloning the repo and surfing at the source code, just for fun, right now :)


Because splatting object files and intermediates all over the same directories as the source drives me nuts, I shuffled things so that it's a bit easier to find things (splitting source into kernel/, user/, ulib/, tools/, and include/ directories) and generated files are collected into kobj/, uobj/, out/, and fs/ directories.

https://github.com/swetland/xv6

I also tweaked the scheduler to halt if there's nothing to schedule, per tankenmate's suggestion.

A side-effect of this change is the generate-source-code-printout targets were rather severely broken, so they're simply removed for the moment.


If you are running this on your laptop and want to save your battery then use this patch...

http://pastebin.com/BxXDDtTh


I remember in the pre-Pentium II days people cared rather little about power consumption (TDPs were only a few watts or maybe a dozen at most), and the idle process was basically a very tight infinite loop.

Around the time of the PII people (especially overclockers) started noticing that the CPU would stay cooler when halted, and thus started a little industry of "CPU cooling software"... there's an interesting historical page on that here:

http://estu.nit.ac.jp/~e982457/other/cpuidle/idle.htm

A modern processor, on the other hand, takes a few watts halted and a hundred or more when active, and transitions between these states many many times per second. How much things have changed...


Nice catch. Actually speeds everything up a bunch. I did it a bit more tersely with:

if (p == &ptable.proc[NPROC]) hlt();

after the sti() and initializing p to 0;


Interesting. How does the processor restart after hlt?


Any interrupt will take the processor out of hlt state; i.e. hit a key on the keyboard, it will fire an interrupt and the processor will start processing again. The clock tick will also fire an interrupt. Once it fires an interrupt it will start searching for runnable processes again; e.g. a process can return from a read() on its STDIN tty now that you have typed a key.
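To make the halt-when-idle idea concrete, here is a rough userspace simulation in C. It is only a sketch under assumptions: `scheduler_pass` and `struct cpu_stats` are hypothetical names, the "context switch" is a counter increment, and the "halt" is a counter where a real kernel would execute hlt and sleep until the next interrupt. It also uses an explicit count of processes that ran, which is a variation on the one-line pointer check posted above.

```c
#include <stddef.h>

enum procstate { UNUSED, SLEEPING, RUNNABLE, RUNNING };

struct proc {
    enum procstate state;
    int ticks_run;          /* how many times this process was scheduled */
};

/* Hypothetical bookkeeping for the simulation. */
struct cpu_stats {
    int passes;             /* scans of the process table */
    int halts;              /* scans that found nothing runnable */
};

/* One round-robin scan over the table.
 * Returns how many processes ran during this pass. */
int scheduler_pass(struct proc *table, size_t nproc, struct cpu_stats *st)
{
    int ran = 0;

    st->passes++;
    for (struct proc *p = table; p < table + nproc; p++) {
        if (p->state != RUNNABLE)
            continue;
        p->state = RUNNING;     /* a kernel would context-switch here */
        p->ticks_run++;
        p->state = RUNNABLE;    /* simulated yield back to the scheduler */
        ran++;
    }

    if (ran == 0)
        st->halts++;            /* a kernel would execute hlt here and
                                   resume on the next interrupt */
    return ran;
}
```

In the simulation, an "interrupt handler" is just any code that flips a SLEEPING process to RUNNABLE before the next pass; the following scan then finds it and no halt is recorded.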


There's some inefficiency here in that wakeup() doesn't cause the other CPU to wake immediately from hlt, so it'll remain halted until the next timer tick or whatnot.

It'd be fun to add some scheduler stats and poke around with things a bit. Some really trivial cprintf() tracing in scheduler() implies that single processes can ping-pong between CPUs, but the printing itself is slow and invasive and could possibly be the cause.


xv6 is a pleasure to hack on and learn from! It's got everything you'd expect from a unix-like OS (multi-processor support, simple scheduler, unix-like FS, many of the standard syscalls, fork/exec based shell, etc).

Linux and *BSD may be the modern standard for unix-like OSes, but they're often too complex and opaque for beginners to learn from. It's super easy to read and toy with xv6's code if you want to learn more about OS internals, though. For example, I was able to implement a simple log structured file system: https://github.com/Jonanin/xv6-lfs

Note: if you're looking for companion reading material, check out this OS book: http://pages.cs.wisc.edu/~remzi/OSTEP/ (IMO it's super approachable and understandable for undergrad-level learning)


I just did a quick grep through the code, and apparently students are expected to understand this.


Shame this is getting downvoted, probably by people who don't understand the reference. See the second section ("You are not expected to understand this") in: http://cm.bell-labs.com/cm/cs/who/dmr/odd.html


I wonder how this compares to pintos (http://www.stanford.edu/class/cs140/projects/pintos/pintos_1...) which is an i386 Unix-like OS. In my undergrad class we had to build thread scheduling, syscalls, virtual memory, and some file system functionality on top of pintos. It was the most challenging and most interesting set of projects I've ever completed.


Adding to the list of alternatives:

MINIX (http://www.minix3.org/)


At the risk of reigniting an old flame war, it seems to me that a microkernel in which core OS functions such as process management and the virtual file system are in separate servers would be more complicated than a monolithic kernel. Having device drivers and specific file systems in user space may be useful, though.


The impression I get (university class on OSes) is that the microkernel/modular approach is more about making it simpler to port than making a specific instance simple? Similarly it'd be less complicated to extend since that's just a case of adding a new loadable module. In terms of a learning OS I'm not sure exactly which should be prioritised - extendability/portability or a simple structure? If this is a topic with a lot of depth I'm obviously not getting I'd definitely be interested in a discussion - my exam is soon :)


Have you read the debate between Andy Tanenbaum and Linus Torvalds from about 1992? It's a discussion of the tradeoffs between microkernels (Tanenbaum) and monolithic kernels (Torvalds). You can find it here:

http://oreilly.com/catalog/opensources/book/appa.html

The chapter by Torvalds in the same book is also worth reading:

http://oreilly.com/openbook/opensources/book/linus.html

The claim that microkernels improve portability makes no sense to me, since Linux itself is very portable, as are the BSDs (especially NetBSD and OpenBSD).


This is a flame war between two men with HUGE egos, not a debate.


Awesome! I really hope MIT releases video lectures for 6.828 so I can do the full course on my own. I'm listening to Berkeley's OS lectures right now but I would love a course built around xv6.


This page has links to 6.828 lecture videos (2011):

http://pdos.csail.mit.edu/6.828/2011/schedule.html


Alternatively there's also Barrelfish: http://www.barrelfish.org/ but there the focus lies on multiple cores (48ish+++).


Someone's going to have the fun job of porting this to UEFI at some point in the not-too distant future.


Doesn't seem like it'd be a big ordeal. Just adapt an existing bootloader to load the kernel.

Also it looks like they mostly run it in qemu.


It's multiboot, so you can just boot it from grub.efi. See sheet 10, xv6/entry.S.

The more annoying thing is likely to be that it's 32-bit only. Are there any common machines that do x86-32 UEFI boot? Alternatively, does a 64-bit grub.efi know how to exit long mode and go into protected mode, to multiboot a 32-bit kernel?

(I was actually talking the other day with some folks about whether a 6.828 project involving a UEFI port of JOS, the other teaching OS, would make sense, but the 64-bit thing seemed to make it less interesting unless you also plan to do a 64-bit port of JOS. Which has been done before in a final project, admittedly...)
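For anyone wondering what "it's multiboot" amounts to: a Multiboot (version 1) loader looks for a small header in the kernel image consisting of a magic number, a flags word, and a checksum chosen so the three sum to zero modulo 2^32. The C sketch below models that arithmetic only. The struct and function names are hypothetical, and the header in xv6 itself lives in assembly in entry.S, not in C.

```c
#include <stdint.h>

/* Magic value defined by the Multiboot (version 1) specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* The three mandatory header words, in order. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Build a header for the given flags (hypothetical helper). */
struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    /* Unsigned arithmetic wraps modulo 2^32, which is exactly the rule
       the loader checks: magic + flags + checksum == 0. */
    h.checksum = 0u - (h.magic + h.flags);
    return h;
}

/* Return 1 if the header passes the loader's sanity check, else 0. */
int multiboot_header_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

With flags of 0 the checksum works out to 0xE4524FFE. Nothing in the header itself says the kernel is 32-bit only, which is why the interesting question is what CPU mode the loader is in when it jumps to the entry point.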


Ah, true - there's no support for any other BIOS-based services, so that's not going to be a problem. Being 32-bit only isn't really an issue, there's no theoretical problem with a UEFI-based bootloader jumping into 32-bit before executing a multiboot image. You won't be able to make any UEFI runtime calls, but that's probably not a high priority anyway.


It's definitely 32-bit only, but structurally it's simple enough that converting to 64-bit might not be a huge effort (and the kernel is only about 6kloc of .c files).

I haven't done x86 systems stuff since 64bit became common, so it might be a fun way to learn about what's different.


As long as there are things like SeaBIOS, there is no need for UEFI.



We're using the Xv6 kernel for the OS class here at the University of Utah, for the first time. In my opinion, an OS class that makes you hack on low-level code is significantly more beneficial than just a theory class.


For anyone else who is playing with this in OS X Mavericks, I've written a blog post describing how to get things working on it (there are a couple steps that need to be tweaked.) - http://blog.ljs.io/post/71424794630/xv6

Hopefully this is useful to somebody and isn't too blatantly link whorey :)


Alternatively, historical codebases of various unices at http://www.tuhs.org/archive_sites.html


If "Xv6's use of the x86 [is intended to make] it more relevant to students' experience," then why use AT&T assembly language syntax rather than Intel?


Because the GNU toolchain uses AT&T syntax and most students have the GNU toolchain readily available?


> GNU toolchain uses AT&T syntax

But also accepts Intel syntax with a directive, and even admits "almost all 80386 documents use Intel syntax" [1].

As for user acceptance, a poll on the xkcd forums shows that the vast majority of users prefer Intel syntax [2].

That takes care of the political/network effects arguments. As far as the actual technical arguments, the position against AT&T syntax is obvious and overwhelming -- the unnaturalness of src,dest operand order; the laziness of the parser's authors at the expense of its users by requiring percents or dollar signs to indicate token types; the usually-redundant size postfix; and the utter incompatibility with basically all non-GNU x86 documentation.

[1] https://sourceware.org/binutils/docs/as/i386_002dVariations....

[2] http://forums.xkcd.com/viewtopic.php?f=40&t=19558


Strictly imho, the translation overhead of reading one or the other if you're used to the opposite syntax is not that high. We're not talking haskell vs. c++ here, it's operand ordering and maybe a few characters' worth of difference in the commands (e.g. add vs addl) or typing register names and so on. The basis of my opinion, fwiw, is that all of my classes involving asm used GNU tools/AT&T, and I never felt enormous difficulties reading Intel-inflected docs. It's quite possible that I was just lucky enough to have never been bitten badly, I don't know.

So while I don't disagree with your technical reasoning, I don't think that it's a significant pedagogical barrier. The puckish part of me wants to suggest that having to figure something out when the documents are a) insane b) backwards c) written in Martian d) all of the above is great preparation for commercial work. ;)


The reality is you just don't need to write a whole lot of assembly day-to-day so using what the tools support best tends to be the path of least resistance.

Xv6 itself has about 5000 lines of .c and 364 lines of .S in the kernel. Another 1500 lines in vectors.S but that's machine generated. There's a ton of stuff one could add to Xv6 (drivers, syscalls, services, etc), almost all of which one would likely just write in C.

In practice, I've found assembly mostly used for some hand-tuned inner loops (codecs, memcpy, etc) and little snippets of glue code like entry.S, swtch.S, etc.


Which one does the Linux kernel use? Which one does the FreeBSD kernel use? Which one does the Mac OS X kernel use?



Rhetorical question.


Why is src,dst operand order unnatural?


x = 5

x := 5

mov x, 5

mov 5, x

5 = x

x = x - y

sub x, y

Most programming languages have dst, src order. It also makes comparisons/subtractions and conditional jumps very confusing to read.
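The point about comparisons can be spelled out with a small C model. This is only an illustration: the function names are hypothetical, and each one models what a two-operand instruction (or a cmp followed by a signed jg) means in the given syntax, with the arguments in the order they are written in the assembly source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Intel syntax:  sub dst, src      ; dst = dst - src
 * (small values assumed; signed overflow is not modelled) */
int32_t intel_sub(int32_t dst, int32_t src) { return dst - src; }

/* AT&T syntax:   subl src, dst     # dst = dst - src */
int32_t att_sub(int32_t src, int32_t dst) { return dst - src; }

/* Intel syntax:  cmp a, b ; jg L   ; jump taken when a > b */
bool intel_cmp_jg(int32_t a, int32_t b) { return a > b; }

/* AT&T syntax:   cmpl a, b ; jg L  # jump taken when b > a */
bool att_cmp_jg(int32_t a, int32_t b) { return b > a; }
```

Written out, `cmp 5, 2` followed by `jg` jumps under Intel operand order and does not jump under AT&T order, since AT&T compares the second operand against the first. That reversal is the part that tends to trip people up when reading conditional jumps.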


I thought this was pretty awesome when I found it a couple of years ago, especially since it's multicore.


I'm not knowledgeable about different philosophies when it comes to OS's, but it seems that pretty much everyone thinks that being Unix-like is a good thing. Is that true? Or are there some people who think that some different philosophy, perhaps more or less mutually exclusive with being Unix-like, is better? I don't even know if that last question makes sense.


There are of course a bazillion ideas about OS design, there used to be entire conferences on them :-)

One of the features of UNIX was that it was designed by taking a few very simple ideas and more importantly simple constructs, and then building more complex constructs out of them. It made it possible for a handful of researchers to write all the bits of an OS needed to get to a multi-user prompt. At the time, other operating systems had giant teams of engineers associated with them. That made it easy to understand at a fundamental level and you could reason about the effects various things would have on it should you make them.

One of my favorite books is "Operating Systems Concepts" which has a great survey of the features of any good operating system.

So is being Unix-like better? Yes if you want a simple robust operating system. No if you want a hard real-time operating system. Yes if you want a system you can flexibly add devices to. No if you want to build a security model based on capabilities. Yes if you want it to work on small architectures with stack support. No if your processor doesn't support stacks or the C model of frame pointers.

In general though understanding what an Operating system has to do is priceless for a CS person to have in their toolkit.


Inspiring comment from Bane early this year, https://news.ycombinator.com/item?id=5624912

If you need a yacht that can carry many people on the high seas through bad weather at speed, unix is solid. But there hasn't been a great racing dinghy since the 80s.

Maybe there is a demoscene/hacking tradition out there, waiting to be discovered.

Consider the debates that we don't have:

* Hardware abstraction. Casual audio programming is harder now than it was twenty years ago. Also - why doesn't hardware self-describe?

* High-level languages. Chuck Moore's ideas of code bloat are on a totally different level to the mainstream. He's onto something.

* Protected memory. Does it matter on a machine designed for fun? How much more hackable could your OS be if you got rid of stuff like this?

Stability doesn't matter so much in fun systems. If you're in a dinghy and not capsizing, you're not sailing hard enough.


Thanks for the link, very interesting. Have you read Massalin's thesis on Synthesis OS (Partially Evaluated/JIT kernel objects) ? It's from the late 80s and very 'demo-scene' like (The author mentions efficient audio playing through his object based system).

http://valerieaurora.org/synthesis/SynthesisOS/

http://c2.com/cgi/wiki?SynthesisOs

http://lwn.net/Articles/270081/


The VPRI (vpri.org) has some interesting ideas with respect to your second bullet point.

Also, there was something about Arthur Whitney looking to build a stand-alone OS in/under K recently.


Found the reference. http://kparc.com/os.htm

Thanks - looks awesome. I particularly like his approach to HTML.


UNIX-like is known to work. That is not the same as "good", but it may be close enough.

UNIX-like includes philosophies such as "almost everything is a file" and "use human-readable (and writable!) plain text formats by default" and "there will be a stdin, stdout, and stderr; use them all". More than anything else, I think that a UNIX is defined by the idea that any end-user might decide to become a programmer at any moment, and this is OK and perhaps even desirable.
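A tiny C sketch of what "almost everything is a file" buys you in practice: one function that writes to a file descriptor without knowing or caring what is behind it. `write_all` is a hypothetical helper name; the only real API used is the POSIX write() call.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole string to any file descriptor, retrying on short writes.
 * Returns 0 on success, -1 on error. write_all is a hypothetical helper. */
int write_all(int fd, const char *s)
{
    size_t len = strlen(s);

    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return -1;      /* bad descriptor, closed pipe, full disk, ... */
        s += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The same call works unchanged with STDOUT_FILENO, STDERR_FILENO, a descriptor returned by open(), or the write end of a pipe, which is a large part of why small programs compose so easily.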


> UNIX is defined by the idea that any end-user might decide to become a programmer at any moment, and this is OK and perhaps even desirable.

Very, very desirable in my view; in contrast to the raging trend of simplification/hiding going on these days, trying to turn computers into sealed-box appliances "you're not supposed to know how it works" that give little in the way of things that can get people interested in learning more about them, I find classic UNIX to be very open in encouraging users to learn more about the system.


I think the unix design (particularly as implemented in GNU/Linux) sacrifices coherency for flexibility and customizability. There are some nice integrated features you can get if you have e.g. an OS-native command shell that understands the kernel; features like DTrace become much easier, and you can get better interactive performance when the scheduler has a deeper understanding of what's going on. I'd say linux's KMS is a move away from being unix-like; likewise ALSA is a very ununixy API, the whole udev/hal kerfuffle was not unixlike, and network manager is the antithesis of the unix philosophy.

(Myself, I moved to FreeBSD a while ago).


I think one thing has to do with the fact that you can find a lot more detailed information about Unix-like OSs than anything else (e.g. Windows), and the general principles being available across many vendors/variants also makes it a more "generic OS" instead of e.g. tied to Microsoft.


MIT has essentially recreated XINU. BFD. http://en.wikipedia.org/wiki/Xinu



