Writing my own init with Go (mustafaak.in)
149 points by CSDude on Feb 9, 2016 | 76 comments

> The reason that I wrote sleep 3 seconds is that I want to see how the kernel reacts when init dies.

That was my favorite line in the entire post. I love seeing this kind of fearless experimentation.

When I first learned to program, I desperately wanted everything to work and would try very hard not to break things. As I've matured into senior roles, I spend a lot of effort creating "safe" development and testing environments like the author did here, just so that I can break things, make observations, and then throw the environment away when I've learned what I needed to learn. Tools like vagrant and cheap cloud computing have made this easier to do than ever.

it should be called acquiring information, not 'breaking' things :D

"Actively exploring modes of failure"

Or maybe hacking ;)

I'll be using this in future conversations, thanks!

NOTE: You can't write an init without threads in Go, as you can in C. (I tried awhile back)

At least you can't with its portable interfaces such as exec.Command() and cmd.Wait():


You end up burning a thread per process, which is pretty lame for an init. The wait() call blocks a goroutine AND an OS thread, so the Go runtime will have to start a new thread for the next wait().
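
A minimal sketch of the pattern being criticized here, not the commenter's own snippet: each child is started with exec.Command() and waited on from its own goroutine. The helper name runAll and the sleep commands are invented for illustration.

```go
package main

import (
	"log"
	"os/exec"
	"sync"
)

// runAll is a hypothetical helper: it starts every command and waits for
// each one in its own goroutine. It returns how many exited successfully.
func runAll(cmds [][]string) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for _, argv := range cmds {
		cmd := exec.Command(argv[0], argv[1:]...)
		if err := cmd.Start(); err != nil {
			log.Println("start failed:", err)
			continue
		}
		wg.Add(1)
		go func(c *exec.Cmd) {
			defer wg.Done()
			// Wait blocks this goroutine until the child has exited.
			if err := c.Wait(); err != nil {
				log.Println(c.Path, "failed:", err)
				return
			}
			mu.Lock()
			ok++
			mu.Unlock()
		}(cmd)
	}
	wg.Wait()
	return ok
}

func main() {
	n := runAll([][]string{{"sleep", "1"}, {"sleep", "2"}})
	log.Println(n, "children exited cleanly")
}
```

Whether each blocked Wait() really pins an OS thread depends on the Go runtime version; the comment reports that it did at the time.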

Whereas in C you can do the entire thing from a simple thread (I call it "async processes"). You set SIGCHLD handler and get async exit notifications, and then process those in a single-threaded loop. I'm pretty sure there is a problem doing this in Go (besides resorting to the syscall module)... I think it has to do with the fact that Go turns signals into messages on a channel, but I don't remember right now...

EDIT: Now I remember. The problem with using the syscall module is that Go does not export Fork and Exec. It exports ForkExec, because it needs to ensure they run on the same thread.


This severely limits the usefulness of your init program, because in Unix there are all sorts of important things that happen in between. It's how you set the child process state. You basically can't do anything with file descriptors or containers at all with this model.

Signal handlers are pretty much impossible to write directly in a managed language (at least without heavy contortions from the runtime), because you could get a signal during a non-GC-safe point and then deadlock or corrupt your program. For example, malloc often takes locks internally; if you get a signal during malloc with that lock held and then try to malloc in your signal handler you can deadlock. Managed languages like Go treat allocation as an implementation detail rather than something the programmer has to explicitly invoke, making them basically incompatible with signal handlers.

(BTW, this is mostly a problem with signal handlers as a concept, not with Go.)

> The problem with using the syscall module is that Go does not export Fork and Exec. It exports ForkExec, because it needs to ensure they run on the same thread.

And because you can't safely allocate after a fork in managed languages, because fork drops your other threads, leading to the same problems above. :)

So the way Python does it is that it doesn't call back into the interpreter loop from the signal handler. That would be a really bad idea as you note, because the interpreter can call malloc and do all sorts of other stuff.

It queues your Python signal handling function, returns from the signal handler, and then runs it on the main loop at the next tick.
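
To make the "queue it, then run it on the next tick" idea concrete, here is a small Go simulation. It is only a sketch of the idea, not CPython's actual code: lowLevelHandler, tick, and pending are made-up names, and no real signal is involved.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pending is set by the (simulated) low-level handler; 1 means "signal seen".
var pending int32

// lowLevelHandler stands in for the C-level signal handler: it only records
// that the signal arrived and does no allocation or locking.
func lowLevelHandler() {
	atomic.StoreInt32(&pending, 1)
}

// tick stands in for one iteration of the interpreter main loop. It runs the
// user's handler if a signal was recorded, and reports whether it did.
func tick(userHandler func()) bool {
	if atomic.CompareAndSwapInt32(&pending, 1, 0) {
		userHandler()
		return true
	}
	return false
}

func main() {
	lowLevelHandler() // pretend a signal arrived mid-instruction
	ran := tick(func() { fmt.Println("user handler runs at a safe point") })
	fmt.Println("handler ran:", ran)
}
```

Note that with a single flag, two signals arriving before the next tick collapse into one handler run, which is one face of the "another signal could occur" concern.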

As far as I know this works fine. I wrote the same program in Python, and it seems to work well and has more functionality than the Go version. I guess one problem is that if you do a long blocking syscall on a tick in the main loop, your signal handler can be delayed indefinitely. There might be some hacks around this in the Python interpreter, but I don't recall offhand.

Yeah, that's basically having the runtime advance to the next GC safe point before running the handler. It's not direct control over signal handlers, but it's fairly low overhead. I guess Go could do this too if it wanted to. I don't think it works for fork though.

Yeah the problem with Go and "async processes" doesn't have to do with GC -- it has to do with M:N threading. I'm glad Rust abandoned M:N threading.

Though it is definitely a killer feature of Go to write in a blocking style rather than in a state machine style... but I feel that there might have been a way to do it without baking it so hard into the runtime (?). Maybe like coroutines in Python, which run on a single thread and are optional.

M:N threading causes a lot of other complications too, like calling from C to Go, etc.

A signal interrupts a blocking syscall. In C code you can call PyErr_CheckSignals(), which may execute the Python signal handler, before you restart the syscall on EINTR. Look at how os.waitpid() is implemented: https://github.com/python/cpython/blob/master/Modules/posixm...

Long blocking syscalls will return with EINTR when a signal is received, at which point the interpreter should be checking the signal flag. The real problem with this scheme is if you call into some long-running C extension code, in which case the interpreter doesn't get a chance to check the signal flag.
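
The retry-on-EINTR pattern described above, sketched in Go rather than C. The names retryOnEINTR and interruptedTimes are hypothetical, and the "syscall" is faked so the loop can be shown without real signals; checkSignals stands in for a call such as PyErr_CheckSignals().

```go
package main

import (
	"fmt"
	"syscall"
)

// retryOnEINTR repeats call until it returns something other than EINTR.
// checkSignals runs between attempts, so pending handlers get a chance to
// run before the call is restarted.
func retryOnEINTR(call func() error, checkSignals func()) error {
	for {
		err := call()
		if err != syscall.EINTR {
			return err
		}
		checkSignals()
	}
}

// interruptedTimes returns a fake syscall that fails with EINTR n times and
// then succeeds; it exists only to exercise the loop without real signals.
func interruptedTimes(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return syscall.EINTR
		}
		return nil
	}
}

func main() {
	checks := 0
	err := retryOnEINTR(interruptedTimes(2), func() { checks++ })
	fmt.Println("signal checks:", checks, "err:", err)
}
```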

Yeah, doing something else on the main interpreter loop, BEFORE checking the signal flag, is what I meant. The "something else" could be long-running C extension code, or a blocking syscall. (I realize the original signal will interrupt a syscall, but there could be another one after that.)

In either case, the signal handler doesn't get run for an arbitrary period of time. And another signal could occur.

I didn't trace through the code to see how this is handled... but it seems like it is hard to handle simply and correctly (you're kind of recapitulating the problems of signals to begin with, in a higher level language).

Any docs / clarity on this would be interesting to me, as I will probably implement this in my own language soon.

This is really tricky to get right: it's kind of like the problems you get with threads, except with the added bonus that you can't even wait on a lock, because the lock might be held by the thread that got interrupted. The easiest thing to do is just ignore the problem and not let users install signal handlers written in your language. The next easiest thing is the Python model, and it's pretty much the only option if your runtime is not multithreaded (as Python's is not: thanks to the GIL only one thread can be running Python at a time). If your language runtime is multithreaded, you can try to set things up so all signals are routed to a special signal handling thread that spins in a loop waiting for a signal handler to set a flag, then runs the appropriate handler.

Writing any code in a signal handler besides setting a flag is a very bad idea, especially when writing code across UNIX-compliant OSes.

Unless it changed, POSIX gives no guarantees about stack size, and the re-entrancy guarantees for function calls depend a lot on the OS and compiler versions.

Hence the best approach in managed system languages is just to mark a signal as active and re-trigger it outside the handler when control gets back to user space.

I don't understand. How is using ForkExec, Wait4 forcing you to use threads? What do you mean by "You basically can't do anything with file descriptors or containers at all with this model."? It is possible to pass a ProcAttr containing a SysProcAttr to ForkExec with fds, chroot, namespaces, &c.

Attached simple demo code of what I mean with ForkExec, Wait4.

EDIT: to clarify, the demo code is not intended to work like init, just to spawn a couple of processes and wait for them to terminate.

        package main

        import (
            "log"
            "syscall"
        )

        func main() {
            cmds := [][]string{
                {"/usr/bin/sleep", "4"},
                {"/usr/bin/sleep", "5"},
                {"/usr/bin/sleep", "1"},
                {"/usr/bin/sleep", "2"},
            }

            // Start every command; ForkExec returns once the child has been started.
            for _, cmd := range cmds {
                if len(cmd) > 0 {
                    if _, err := syscall.ForkExec(cmd[0], cmd, nil); err != nil {
                        log.Println("ForkExec:", err)
                    }
                }
            }

            // Reap children in whatever order they exit (pid -1 means "any child").
            for nwaited := 0; nwaited < len(cmds); nwaited++ {
                var w syscall.WaitStatus
                pid, err := syscall.Wait4(-1, &w, 0, nil)
                if err != nil {
                    log.Println("Wait4:", err)
                } else {
                    log.Println("pid", pid, "exited", w.Exited(), "exit status", w.ExitStatus())
                }
            }
        }

OK, I see that there is a lot of stuff like chroot and UIDs in SysProcAttr [1]. I think they have added a bunch since I last tried this.

But the point still stands... you get a subset of the OS API. Everything has to be "canned" into that struct and you can't execute arbitrary code between fork and exec.
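
For reference, a sketch of what "canned into that struct" looks like for file descriptors: the child's stdout is pointed at a pipe purely through syscall.ProcAttr.Files. captureStdout is an invented name, and the /bin/sh path and PATH value are assumptions about the host.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// captureStdout runs argv with the child's stdout wired to a pipe, using
// only the fields that ForkExec accepts. argv[0] must be a full path,
// because ForkExec does not search PATH.
func captureStdout(argv []string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	attr := &syscall.ProcAttr{
		Dir: "/",
		Env: []string{"PATH=/usr/bin:/bin"},
		// Child fds 0, 1, 2: inherit stdin, pipe for stdout, inherit stderr.
		Files: []uintptr{os.Stdin.Fd(), w.Fd(), os.Stderr.Fd()},
	}
	pid, err := syscall.ForkExec(argv[0], argv, attr)
	w.Close() // the parent must drop its write end or the read never sees EOF
	if err != nil {
		return "", err
	}
	out, readErr := io.ReadAll(r)
	var ws syscall.WaitStatus
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		return "", err
	}
	return string(out), readErr
}

func main() {
	out, err := captureStdout([]string{"/bin/sh", "-c", "echo hello"})
	fmt.Printf("%q %v\n", out, err)
}
```

Anything not expressible as a field of ProcAttr or SysProcAttr has no hook here, which is the limitation the comment is pointing at.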

Especially with containers, this is a never-ending target... They apparently have user namespaces, but what about network namespaces or IPC namespaces?

[1] https://golang.org/src/syscall/exec_linux.go

Now, it might be good enough for your application, and it probably IS good enough for everything except a full-fledged production quality init system. For that case I would be worried about running into limitations.

> Especially with containers, this is a never-ending target... They apparently have user namespaces, but what about network namespaces or IPC namespaces?

There's no reason calls to e.g. unshare(2) and setns(2) can't be added to forkAndExecInChild in exec_linux.go like they have been added for the other syscalls needed to set up a child. I guess if anyone is reading this thread and wants that they can follow the guidelines for contributions and look at https://github.com/golang/go/issues/5968

EDIT: It's also possible to supply Cloneflags through the SysProcAttr struct, e.g., CLONE_NEWIPC, CLONE_NEWNET.

You're free to argue that, but it's still fair for everyone to concede that there's a difference in friction between "I can do what I need, now" and "I have to wait for the language core to add features, trailing the discovery of these niche needs".

Also, the subject's been raised before, and regrettably (IMO) the proposal was rejected. As a result, you'll find a nontrivial amount of C appearing in e.g. the runc project. (Which is a crying shame, because invoking cgo means runc ceased to be trivially reproducibly buildable, last time I checked...)

So, don't do process management inside init. Spawn a child and check on it after every "wait" to handle the process management. Less opportunity for the kernel panic, less for init to handle (fewer bugs), greater flexibility.


Yes, but if you see my edit, even with GOMAXPROCS=1, you are still constrained to the syscall.ForkExec API, which severely limits the usefulness of an init program. Its entire purpose is to start processes in different states (i.e. with different UID, etc.)

Not sure what you mean by 2). If you're suggesting polling, then that sucks, because one important feature of an init system is to restart processes immediately.

All in all my point is that Go is a very poor language for this... worse than Python (which I also tried and ended up using). I think it's great that the author is experimenting though -- I think that's the only way to make these issues clear.

#1 is not equivalent to being single threaded (every Go program is multithreaded); it just means there is only 1 thread for user goroutines. Every call to wait will block and spawn a new thread. If you force it to stay on the same thread, you block the runtime, and your entire program.

An init system has to call wait, that's one of its primary tasks, especially on orphaned processes it inherits.

Option 1) is not feasible though, as you'll block your one thread.

Though I wonder if you can work around all this by using signalfd (http://man7.org/linux/man-pages/man2/signalfd.2.html) and have SIGCHLD delivered through normal file descriptors - I don't know the low levels of Go enough to know if you can integrate a raw file descriptor with its runtime, or whether messing with SIGCHLD causes other issues, but it might be worth pursuing.

You could easily do a syscall to get a signalfd file descriptor, then call os.NewFile to turn that numerical FD into a Go *os.File (https://golang.org/pkg/os/#NewFile). It seems like that should work fine!

wait() doesn't need to block at all actually, you can use WNOHANG.

Yeah, but then you're polling, which isn't good (as mentioned elsewhere).
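
One way to keep WNOHANG without polling on a timer, as a sketch: block on a SIGCHLD channel and only then drain exited children with non-blocking Wait4 calls. reapAll and spawnAndReap are hypothetical names, and /bin/sh is assumed to exist.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// reapAll sleeps until SIGCHLD arrives, then drains every exited child with
// non-blocking Wait4 calls. It returns once want children have been reaped,
// or earlier if no children remain.
func reapAll(sigs <-chan os.Signal, want int) int {
	reaped := 0
	for reaped < want {
		<-sigs // blocks without polling; one wakeup may cover several exits
		for {
			var ws syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
			if err == syscall.EINTR {
				continue
			}
			if err == syscall.ECHILD {
				return reaped // nothing left to wait for
			}
			if err != nil || pid <= 0 {
				break // no more exited children right now
			}
			log.Println("pid", pid, "exit status", ws.ExitStatus())
			reaped++
		}
	}
	return reaped
}

// spawnAndReap starts n short-lived shells and reaps them from one goroutine.
func spawnAndReap(n int) int {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD) // register before spawning anything
	defer signal.Stop(sigs)

	started := 0
	for i := 0; i < n; i++ {
		if _, err := syscall.ForkExec("/bin/sh", []string{"/bin/sh", "-c", "exit 0"}, nil); err != nil {
			log.Println("ForkExec failed:", err)
			continue
		}
		started++
	}
	return reapAll(sigs, started)
}

func main() {
	log.Println("reaped", spawnAndReap(3), "children")
}
```

Because signals can coalesce, one SIGCHLD may stand for several exits, which is why the inner loop keeps calling Wait4 until nothing is left.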

The author might look at how a minimal (yet fully functional) init works:

http://git.suckless.org/sinit/tree/ (look at config.def.h and sinit.c)

and build up from there. Given how minimal sinit is, it's a great place to start your functionality from, and to see the power (and limitations) of a basic SystemV init system.

Wow, this is great, thanks!

Nice project! Even if you believe that systemd is too complicated, it's a great example of how to build an asynchronous, modern init system. Upstart uses some highly questionable tricks (ptrace!) to track processes across forks, whereas systemd uses cgroups.

Not only that, it is a nice example of using Go in a role that many would use C for, so it might convince a few to try their hands to safer programming languages.

Next step, bare metal runtime! :)

Is there even any concurrency in this case? Wouldn't e.g. OCaml be a better choice?

Personally I would use OCaml over Go, as I prefer expressive languages.

However, given Oberon's influence on Go, I find it positive that other devs who like Go do use it for such purposes.

OCaml already has MirageOS advertising it for system level coding. :)

Edit: Regarding concurrency, even systems programming languages older than C have built in support for concurrency. C and C++ are probably the outsiders in terms of built-in support for concurrency, language or std libs.

If you want to build a system that can survive out-of-memory conditions, you need to implement init so that it doesn't allocate memory at semi-random moments. Both Go and OCaml will fight you on that requirement. (IIRC, Go is designed to allocate memory the moment your stack grows too deep - which is usually a good idea, but not here.)

Of course, very few Unices actually survive out-of-memory - Linux runs an OOM killer by default, OpenBSD tends to crash the kernel, etc. Apparently Solaris does have a good story here. (Of course, you'd have to ensure that the system doesn't spend all its time swapping, regardless.)

What benefit would adding the overhead of learning OCaml be to this project? It's not like a functional or immutable language would add any value here, would it?!

Using high-level languages for init systems is a great idea. In fact, we do this for the Guix System Distribution. Our init system is called GNU Shepherd, and it's written in Guile Scheme. It's great because you write services as Scheme code. For example, I have a number of Ruby web servers that I run for development at work that are nearly identical. Since services are first-class Scheme objects, I wrote a simple function that returns a service object for running a Ruby web server, specialized for a given application. For this workflow, I run Shepherd as an unprivileged user and additionally manage all of my user services like gpg-agent, emacs daemon, offlineimap, etc. No crappy external domain specific language to learn; data is code. Highly recommended.


When C is taken as the baseline, there's a very long list of languages that look better for this sort of program. I don't see a huge reason that init has to be C, or even particularly benefits from it. (Specific init implementations may have the advantage of being battle-tested, but that's a characteristic of the program and time, not implementation language per se.)

Even at this 3-line stage, the stateful log.SetOutput construct is ugly and error-prone. So I'd say a functional/immutable language would already be adding value, and that would only increase as the amount of code grows. At a minimum an init is going to involve parsing config and something similar to a state machine, both good use cases for functional languages.

Init's job is to start the init scripts and then hang around to foster orphan processes. Ideally, it should not be parsing configs, but should spawn another process to do that sort of work. A good PID 1 is very, very minimal.

This project isn't just a PID1 though, it's a full system. (Well at the moment it's a hello world, but that's the intent)

Erlang would be a very interesting choice here, seeing how any Erlang system is basically a miniature OS. It even has an init already.

cgroups are overkill for process tracking, per se. The Linux kernel has the Netlink proc connector for that purpose. Subscribing to cgroupfs entails a much higher cognitive and resource load in having to deal with a hierarchical resource management subsystem altogether.

Upstart is much more readable even though it has various idiosyncrasies.

I get the impression that the primary reason for systemd using cgroups was "security". This in that it provided the means of namespace isolating daemons.

This in turn because, if you follow Poettering's project history, it starts with him creating Pulseaudio to handle moving between sound devices while in use; he then created Avahi to provide a means for Pulseaudio-equipped computers to find each other on a network, and then came Systemd, based on what he learned about daemon security while writing Avahi.

Sadly i can't shake the feeling that the security in systemd is all too often leaning towards "security theater".

Writing init in Go probably has a size issue compared to a typical C implementation. The last golang helloworld I built was 1.4MB. The /sbin/init in my ubuntu dist is only 194k. The Busybox version of init, optimized for size/features, can be much smaller.

Tough to use it in OpenWRT- or DD-WRT-type embedded systems, where the system might only have a few megabytes of RAM/ROM.

Yeah, you are right. Go is too heavyweight. But the nice thing is that in Go 1.4+ you can install go std and dynamically link your applications. I will do it in the next blog posts to save some space, but yeah, I would still need some huge lib.

@CSDude: Thanks for sharing this very interesting project!

From a technical point of view, what are the reasons you chose Go over alternatives, such as C, C++, OCaml or Rust?

(Just curious, as all those languages have compilers that produce fast, optimized, self-contained binaries.)

Thank you. I found it easier to develop in Go and I use Go in the work environment heavily. But compared to C/C++, I think having a garbage collector is a must for modern init, and I will write all the other utilities in Go as well for better maintainability.

Garbage collection isn't a must for a modern init.

Memory safety may arguably be, but I see no reasonable argument that garbage collection (a particular strategy for memory management) is.

Well, I would say memory safety, not necessarily garbage collection. It was also a bit of an overstatement, but I would really love memory safety for many core utilities.

Do you also have some comparison to Rust?

Rust has a very advanced memory lifecycle management. The promising part is that it may give you almost all of the advantages of a garbage collector while not needing an actual garbage collector in most cases. Like RAII in C++, but much more advanced.

I'd be interested whether this actually works well for a nontrivial project such as an init system.

Although very different than a traditional Linux distribution, the init system for RancherOS might be worth taking a look at. It's written entirely in Go as well. https://github.com/rancher/os/tree/master/init

Using Go even earlier in the boot stage in an initramfs (as a replacement for tools like dracut) crossed my mind recently. Go's statically linked binaries and fast builds make this particularly appealing to me. With (a lot of) work you could forego busybox entirely.

Good point about static linking. Though Go's big fat binaries count against it as a busybox replacement in areas where storage space is at a premium.

> Linux is mostly perfect as it is

Your critical thinking has not been engaged. Linux is a useful bit of code but "perfect" is not what it is known as everywhere.

My side of the room likes this little ditty

Linux, by amateurs, for amateurs -- Dave Presotto [1]


i’ve wondered whether Linux sysfs should be called syphilis -- Charles Forsyth [2]

Or if you like your detractors with a bit more fame

Unix has retarded OS research by 10 years and linux has retarded it by 20. -- Dennis Ritchie as quoted by Boyd Roberts in 9fans.

I won't go on

[1] http://research.google.com/pubs/author4927.html

[2] http://www.terzarima.net/

So what does an OS purist like yourself run? BeOS? Plan9? All OS's suck. They're just horrible. But so is everything else. It's all flawed in one way or another. Windows is flawed, OS X is flawed, our tools are flawed, our languages, our editors - everything. If you want perfect, you have to throw away your computer and somehow live inside Knuth's books.

> If you want perfect, you have to throw away your computer and somehow live inside Knuth's books.

Either that or wait for GNU/Hurd.

Try reading what I said.

I said "Linux isn't perfect, like you think it is"

Perfection is a nice strawman to argue against. "Mostly perfect" is harder to argue against. In addition, there's an implied "for my purposes" attached to the end of the statement.

Nothing is "perfect", and no one has claimed that "Linux is perfect".

You are being a dick, offering no reasons other than the opinions of people you presumably agree with.

If you could just tell me why it's perfect then, give me something to work with.

I can try... you scorned the commenter for not reading your comment literally, and inferring things instead. When I read your comment literally, I see you quoting people and saying it's not perfect.

Truly an empty comment, as it adds nothing except makes you look smarter than someone else. You don't offer anything insightful, you are trying to be insightful by proximity. So anyway, you reinforced your complex when you said "Try reading what I wrote". So I called you a dick.

Now take your own advice and "try to read what _I_ wrote". I only called you a dick. Why are you trying to make me defend a position I never took about perfection?

> I only called you a dick.

oh we're going down that route. Well if it's literalism we're wasting time with then try reading your own comment

> You are being a dick, offering no reasons other than the opinions of people you presumably agree with.

That isn't just "calling me a dick".

but let's stop here

@CSDude, I recommend putting a link to part 2 on the part 1 page.

Done! Thanks.

If you improve the perf/functionalities of init using Go, I will give this language a go right after. It seems promising tho.

Is it possible to recode init in any other language using your technique? For instance, can I try it in Ruby?

Since init is the first process that runs, you can't use an interpreted language like ruby (since the interpreter would have to run first, which is impossible if your program itself is init). Maybe there's a Ruby-compiler out there somewhere, but as-is Ruby would be impossible.

> Since init is the first process that runs, you can't use an interpreted language like ruby

Sure you can, you can even use a shell script: https://wiki.gentoo.org/wiki/Custom_Initramfs#Init

This is simply not true. It's not impossible and it works exactly like any program would.

No it wouldn't. Either hack a quick shim to start the interpreter or rely on the operating system to figure out how to run a script. The #! is there for a reason.

All you need is fork(), exec(), and signal handling. You could even write one in PHP.

The kernel already supports acquiring a DHCP lease via the “ip=dhcp” boot option. Why does this need to be a feature of init?

I did not know we could have an ip=dhcp option. I will need to create a network manager sooner or later, but for trying out some stuff, I can use that flag. Also, I will not bundle services into init; it will be just forking, signaling and waiting. All the other utilities will be small independent code exercises for me.

Because init is common to several UNIXes and ip=dhcp is a Linux specific feature?

> Because init is common to several UNIXes and ip=dhcp is a Linux specific feature?

And yet the author seems to be comfortable using those linux-specific features:

“Also, the kernel paremeters must have the rw flag or you have to remount the /dev/sda1 as rw, but changing kernel parameters are much more easier.” (from the follow-up post).


Please don't muddy someone's learning experiment by rehashing a tired flame-war
