Hacker News
Selfie: tiny self-compiling C compiler, MIPS emulator and MIPS hypervisor (cs.uni-salzburg.at)
249 points by ingve on Mar 2, 2017 | 38 comments

This is a really, really cool project. Some fun quotes:

> In fact, selfie goes one step further by implementing microkernel functionality as part of the emulator and a hypervisor that can run as part of the emulator, on top of the emulator, and even on top of itself, all with the same code.

> The design of the compiler is inspired by the Oberon compiler of Professor Niklaus Wirth from ETH Zurich. The design of the selfie microkernel is inspired by microkernels of Professor Jochen Liedtke from University of Karlsruhe.

As someone who is vaguely trying to learn more about / get involved in kernel development, the easiest place to start figuring things out is in simpler kernels like this or Minix. (Then followed by OpenBSD. Then later on NetBSD / FreeBSD. Then much, much further, Linux. [Haven't spent time looking into Illumos yet but it's on the list.]) But I haven't actually been able to contribute much more than ports to any of these yet.

...I have just installed Minix 2 on my new laptop; a 386 with 1MB RAM, running at 16MHz, with a 80MB (with an M) hard drive.

In 16-bit mode, it runs beautifully (there's not quite enough RAM for the bigger 32-bit binaries). It's a pretty good Unix, with networking support, cut down versions of all the standard tools, it comes with full source and several toolchains making it almost completely self-hosted, and it'll rebuild its own kernel in 13 minutes.

It is beautiful. If you're interested in operating systems, definitely take a look. (There is a Minix 3, but I find it doesn't have the same minimalist charm.)


"There is a Minix 3, but I find it doesn't have the same minimalist charm"

They have the same name but aren't really related. Minix 3 is a new OS designed to take the place of a modern UNIX with higher reliability. They ported a NetBSD userland to get UNIX apps working. The main benefits are the reincarnation server and more code running in user mode (i.e. limiting damage).

I installed Minix 3 the other day in a VMware Fusion virtual machine. It felt a lot like running NetBSD because of the NetBSD userland you mentioned.

The one oddity I noticed in my brief testing was that the Minix DHCP client doesn't allow for setting your own hostname, even though NetBSD's does.

> The design of the selfie microkernel is inspired by microkernels of Professor Jochen Liedtke from University of Karlsruhe.

It might be worth noting that Liedtke is the architect of the reasonably well-known L4 microkernel.


Here's another one designed for that:


Thank you! I saw Xv6 once before but haven't read much about it. I'll add it to my to-read list.

One of the neat things about Xv6 is that the references in the documentation refer to line numbers in the OS source code.

In a similar vein: tcc, the Tiny C Compiler:


34 kLOC instead of 7, and no fancy VM/kernel stuff, but it does full C99.

Even more impressive (to my way of thinking) is c4:


c4 is what you might call a microscopic C compiler; it's surprising how much it does for the code it has.

C4 handles a similar subset of C to what selfie does, but is a full order of magnitude smaller: about 600 lines. And it is very readable despite the terseness!

One thing I love is how it reads the whole source file into RAM and operates on it as an array. So many compilers (including my own...) avoid this, which I guess ostensibly lets them handle huge machine-generated source files (but that's an extreme outlier, and really I'd be inclined to let people depending on that use something else or work around it). The first compiler I saw that did it this way was Wouter van Oortmerssen's AmigaE compiler [1], which demonstrated that it was a viable strategy even when your compiler is expected to run on a machine with 512KB of RAM. It's one example of where we often over-abstract for dubious benefits.

[1] http://strlen.com/amiga-e
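A minimal C sketch of the read-it-all-into-RAM approach, assuming only standard stdio: the whole file goes into one heap buffer with a trailing NUL, so the scanner can walk a pointer until '\0' instead of checking lengths. `read_source` and `count_identifiers` are hypothetical names chosen for illustration, not code from c4 or AmigaE.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a heap buffer and append a
   NUL sentinel. Returns NULL on failure; *out_len gets the size without
   the sentinel. */
char *read_source(const char *path, long *out_len) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    char *buf = malloc((size_t)len + 1);
    if (!buf) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }
    buf[len] = '\0';  /* sentinel: the scanner's end-of-input marker */
    if (out_len) *out_len = len;
    return buf;
}

static int is_ident_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_ident_char(char c) {
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Hypothetical scanner: counts identifiers by advancing a pointer.
   No length check is needed because '\0' fails both character tests. */
int count_identifiers(const char *p) {
    int n = 0;
    while (*p) {
        if (is_ident_start(*p)) {
            n++;
            while (is_ident_char(*p)) p++;
        } else {
            p++;
        }
    }
    return n;
}
```

The sentinel is what makes the inner loops bound-free: '\0' is neither a letter, digit nor underscore, so every loop stops at the end of input on its own.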

On a machine with virtual memory, couldn't it fix the source size issue by mmapping instead of reading it into RAM, and letting the OS page it in and out of memory?

Yes, that should work, as long as you make sure it gets zero-terminated so you don't have to do length checks all over the place; that'd lose a lot of the benefit. For files whose size is not a multiple of the page size, that's guaranteed. I've never tried to request mapping of more bytes than a file takes, though... I'm going to guess that still does something sane, but I've not tested it.

> I've never tried to request mapping of more bytes than a file takes, though... I'm going to guess that still does something sane, but I've not tested it.

Sadly, the Open Group and Linux man pages say that

> Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process.

Joy... Oh, well, I think on modern systems it'd be reasonable to just refuse to deal with source files that are too large to load into RAM, and leave it at that.

Wait, are you saying that one can't mmap a file larger than physical memory? Isn't that exactly what mmap is for? I think what masklinn is saying is that if you mmap a file, you can't read past the end or over-allocate a single mapping. One could have another mmap region for output.

SIGREFUSE is a lamentable omission in POSIX.

It doesn't reliably do something sane. IIRC, when the file size is an exact multiple of the page size, LLVM just falls back to reading the file into an array the ordinary way. With 4096-byte pages that only happens for about one file in 4096, so there's no practical difference to performance.
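A rough POSIX sketch of the mmap-with-fallback idea discussed above, assuming `open`, `fstat`, `mmap` and `sysconf` are available. When the file size is not a multiple of the page size, the system zero-fills the rest of the last page, so the byte just past the file serves as the NUL terminator; for an exact multiple (or an empty file, which mmap rejects) it falls back to an ordinary read into a buffer. `load_source` and `release_source` are hypothetical names, and this is not LLVM's actual code. As the quoted man page warns, a file truncated by another process while it is mapped can still raise SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a NUL-terminated view of the file, or NULL.
   *is_mapped is 1 for an mmap'd view, 0 for a malloc'd copy. */
const char *load_source(const char *path, size_t *out_len, int *is_mapped) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }
    size_t len = (size_t)st.st_size;
    long page = sysconf(_SC_PAGESIZE);

    /* Fast path: the partial last page is zero-filled, so byte len is '\0'
       and len + 1 still falls inside that same page. */
    if (len > 0 && page > 0 && len % (size_t)page != 0) {
        void *m = mmap(NULL, len + 1, PROT_READ, MAP_PRIVATE, fd, 0);
        if (m != MAP_FAILED) {
            close(fd);  /* the mapping stays valid after close */
            *out_len = len;
            *is_mapped = 1;
            return (const char *)m;
        }
    }

    /* Fallback: exact page multiple, empty file, or mmap failure. */
    char *buf = malloc(len + 1);
    if (!buf) { close(fd); return NULL; }
    size_t got = 0;
    while (got < len) {
        ssize_t r = read(fd, buf + got, len - got);
        if (r <= 0) { free(buf); close(fd); return NULL; }
        got += (size_t)r;
    }
    close(fd);
    buf[len] = '\0';
    *out_len = len;
    *is_mapped = 0;
    return buf;
}

/* Release with the call that matches how the view was created. */
void release_source(const char *src, size_t len, int is_mapped) {
    if (is_mapped) munmap((void *)src, len + 1);
    else free((void *)src);
}
```

Returning an `is_mapped` flag keeps the scanner itself identical for both paths: it only ever sees a pointer to NUL-terminated bytes.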

There's a bunch of other stuff in Selfie beyond the C compiler so you can't really compare them by line count.

c4.c author here - selfie seems to fill a niche between c4 and my other, more full-featured compiler/OS, swieros. Very cool!


Yep, I just meant to point out they're different. Selfie is something educators wrote for their purposes. c4.c is more like a piece of code poetry.

That's mind-boggling to me. My lisp interpreter is hardly shorter than that, granted it was designed with extensibility as a goal. Still, incredible.

Yeah, but wasn't TCC designed to be the opposite of easy to understand? It was entered into the Obfuscated C contest. I'll leave it to a C coder to tell me if TCC and Selfie are in the same ballpark on amateur comprehension.

It was originally an obfuscated C effort, then grew into a full-blown small and fast compiler for stuff like compiling the Linux kernel on the fly during boot. It's not especially designed for easy source comprehension (so I'd answer no to your last question), but it's been a long time since it was specifically designed against it.

@ you and vidar: Appreciate it. That's what I was expecting. It was cool and small but in a different category given readability didn't matter.

Originally, yes. An unobfuscated version is still being developed. The obfuscated version is certainly a massive exercise in frustration to try to read.

Why don't you look at the code and decide for yourself?

Brain injury. Forgot C & ASM minus tiny fragments. Still haven't relearned them. Getting close to time to do that again, though. A subset at least.

Hi, have you written about your experience programming after a brain injury?

I might once I do it. I remember enough fragments of what worked and didn't to do my R&D at a high level just above the actual implementation. I'm seeing so many opportunities in a number of paradigms that I've had a really hard time committing to one to start with. Given it will be a slog of a process...

I might do a write-up if I do. I did code a problem or two afterward. Reusing principles from high-assurance systems and security helped. Keep design, language, & implementation simple. Straightforward control flow with no looping between layers. Safe language. Straightforward toolchain. I ended up doing those problems in a subset of FreeBASIC since I was up and running so fast w/ no fighting with IDEs, etc. Solutions came out correct on the first implementation due to design principles & refining specs on paper. The coding phase simply didn't have much room for added error.

The language is interesting. It is a subset of C, indeed - but one notable thing about it is that it only has two types: int, and pointer thereto.

If they only made one more step and combined these two into a single machine word type, they'd get B...
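To illustrate what a two-type discipline forces on string handling, here is a hedged C sketch that stores text as an array of 64-bit words, eight characters per word, and reads or writes single characters with shifts and masks. The helper names are hypothetical, and `uint64_t` stands in for the language's machine-word `int`; this shows the general technique under that assumption, not selfie's actual source.

```c
#include <stdint.h>

/* Hypothetical helper: return character i of a string packed 8 per word.
   Character i lives in byte (i % 8) of word (i / 8), low byte first. */
uint64_t load_character(const uint64_t *s, uint64_t i) {
    uint64_t shift = (i % 8) * 8;
    return (s[i / 8] >> shift) & 0xFF;
}

/* Hypothetical helper: overwrite character i, leaving the other 7 bytes
   of the word untouched. */
void store_character(uint64_t *s, uint64_t i, uint64_t c) {
    uint64_t shift = (i % 8) * 8;
    uint64_t mask = (uint64_t)0xFF << shift;
    s[i / 8] = (s[i / 8] & ~mask) | ((c & 0xFF) << shift);
}

/* Length of a NUL-terminated string stored in packed words. */
uint64_t packed_length(const uint64_t *s) {
    uint64_t n = 0;
    while (load_character(s, n) != 0) n++;
    return n;
}
```

Because byte positions are computed with shifts rather than by casting to `char *`, this layout does not depend on the host's byte order.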

This is pretty interesting.

Some of the things I saw in the source (such as casting string literals to an int) made me cringe a bit, but remembering it is truly a very small subset of C, I'll give it a pass.

Overall, the code quality seems pretty high and is very easy to read (I quickly read through maybe 20% of it).

I think it'd be easier to follow if it were broken up into separate files, but I noticed they didn't implement a preprocessor, so I assume part of the rationale for a single file was to avoid includes, macros and all the other baggage the preprocessor brings with it.

I'll definitely have to come back to this project later when I've more time to look at it.

The product name should have been different. Searching for development-related issues by product name on search engines will be very difficult.

At first I thought, why another MIPS emulator thing? But they offer a RISC-V port, which seems much more interesting. The naming is questionable, though.

Is there video? A presentation probably sounds hilarious with all the "selfie selfie selfie mipster hypster mipster" talk.

I feel like calling one thing MIPSter and another thing mipster was a poor choice

cool project, really unfortunate name, reminiscent of 2012
