Systemd can now pull and update container images from the Internet

houst0n_ · on Jan 17, 2015

When the big debate about systemd was going on, I couldn't really care less. In fact, I thought a future where the big distros used the same init system was a rosy one. No need to rewrite some init script for yet another system depending on the flavour of my clients. No need for our config management tools to have a gazillion 'if os == foo' statements.

Now, I'm just not so sure. The feature creep here is remarkable. Why on earth would systemd have anything to do with container management? Surely the 'docker' (or whatever) service which systemd is supposed to be managing should be doing this?

Another nail in the coffin of simple software that does one thing well.

Edit: s/docker/container management/

geofft · on Jan 17, 2015

Where was the simple software that does one thing well? Did it ever exist?

I'd believe you if Docker was some elegant, tiny, well-written tool, and systemd was a bloated mess of AbstractLinuxContainerFactoryImpls. But as far as I can tell, Docker is the thing that is neither simple nor does its one thing particularly well, and systemd implements this straightforwardly, because of all things you might want to do, booting a system is kind of easier if you're already an initsystem.

If systemd decides to implement WebGL or take on Sublime Text, let's talk, but these 200 lines of C don't seem like a reason to believe that we've moved away from simple software that does things well:

http://cgit.freedesktop.org/systemd/systemd/commit/src/impor...

lvh · on Jan 17, 2015

The entire diff being discussed isn't just about import.c, but a bunch of other things too. For example, to get it to do something interesting, you also want import-dkr.c. Add it all up, and we're talking 2000, not 200, lines of C:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=72648...

geofft · on Jan 17, 2015

Oops. I got confused by gitweb's interface... thanks. I have some opinions, mostly negative, about new C code, but even so it doesn't seem all that bad by UNIX standards. The bulk of the stuff I missed mostly looks like it's about connecting to the Docker registry and linking to libcurl, and this is about as big as I'd expect for C.

vruiz · on Jan 17, 2015

It's not about docker, it's about systemd-nspawn, which is systemd's own alternative to docker.

I'm with you here, at first I didn't understand the alarm but now I'm starting to see the danger of this trend.

ris · on Jan 17, 2015

I don't think systemd is moving into this territory just for fun. I seem to recall it's because the kernel is moving towards a unified cgroup tree: if systemd wants to be able to use cgroups, it more or less has to be "in charge" of them at the root, meaning it's going to have to take care of this sort of functionality.

digi_owl · on Jan 17, 2015

Last I read on LWN, the cgroups kernel dev was backing away from the idea of having a single cgroups management daemon.

And no, it's not just for fun. RH is betting big on competing with Amazon and MS on cloud services. It would not surprise me if they are trying to get old RHEL server customers to move their servers onto the RH cloud with the switch to RHEL7.

sciurus · on Jan 18, 2015

What Red Hat cloud?

digi_owl · on Jan 18, 2015

http://www.redhat.com/en/technologies/cloud-computing

sciurus · on Jan 19, 2015

That doesn't answer the question. To clarify, what I meant was that Red Hat doesn't run a public IaaS cloud to which people can "move their servers".

jacquesm · on Jan 17, 2015

Linux should simply adopt jails and get it over with rather than get stuck in NIH territory.

ris · on Jan 17, 2015

Right. They should "port" a feature that's intricately linked with the internal workings of a kernel over from another completely alien kernel.

jacquesm · on Jan 17, 2015

Nobody said it would be easy.

ris · on Jan 17, 2015

You... more or less implied it

dezgeg · on Jan 17, 2015

Cgroups are not a virtualization / containerization feature like jails, their use cases are entirely different.

geofft · on Jan 17, 2015

I'm not super familiar with either, but as I understand it, systemd-nspawn is only an alternative to "docker run". The docker command supports all sorts of other things, like creating container images, manipulating, uploading, etc. systemd-nspawn just runs things, and the upstream docs suggest using debootstrap, yum, etc. to create a thing for nspawn to run. (Running things is both the easy part, and the part that most makes sense for an initsystem / system daemon to do.)

systemd-import will grab Docker images from the internet, but I'm not sure it will do very much else. In other words, even if you adopt systemd's way of doing things in deployment, your developers are still using the docker command.

justincormack · on Jan 17, 2015

Well, that's kind of why Rocket from CoreOS is writing specs and splitting these things up [1] so you can use different tooling.

[1] https://github.com/appc/spec/blob/master/SPEC.md

linuxydave · on Jan 17, 2015

>When the big debate about systemd was going on, I couldn't really care less.

>Now, I'm just not so sure.

You're starting to see what the original opponents feared a long time ago. It doesn't look like the feature creep is going to end any time soon, although I wonder if there will be a breaking point when even the fanboys say, "Hang on, this is a bit much..."

jacquesm · on Jan 17, 2015

By the time the frog realizes the water is boiling it is a bit late.

eleitl · on Jan 17, 2015

> When the big debate about systemd was going on, I couldn't really care less.

> Now, I'm just not so sure. The feature creep here is remarkable.

Now you know why we had that acrimonious debate. This is precisely what was obvious right from the start.

Linux is Windows NT now; the only way to have a decent *nix is to pick BSD.

up-n-atom · on Jan 17, 2015

The name is singular, but it covers a plural set of programs. It may be misbranded as a single monolithic unit, but to an extent it exists as separate entities, and yes, some have hard dependencies on each other. It is no different from iproute2, which succeeded net-tools; what were your thoughts on that?

sciurus · on Jan 17, 2015

It seems like, if they renamed systemd-nspawn and systemd-import to take "systemd" out of their names, a lot of controversy would go away.

pmahoney · on Jan 17, 2015

I was looking into systemd-nspawn recently because I wanted some container features (tcp port namespace, so I can have multiple groups of processes run and connect to an instance of mysqld on the default port, on a ci server).

I installed it, ran it, and it immediately complained "not a systemd system" and refused to run. I've not looked into things further, but presumably systemd-nspawn requires that systemd be running, which was a surprise to me since systemd-nspawn calls itself "chroot on steroids", and chroot cares nothing about the init system.

chimeracoder · on Jan 17, 2015

> systemd-nspawn calls itself "chroot on steroids", and chroot cares nothing about the init system.

If I remember correctly, systemd-nspawn uses cgroups (hence the "steroids" part of "chroot on steroids").

On Linux, cgroups require a single manager for all cgroups (in this case systemd). This is not a systemd limitation; it is a requirement set by the Linux kernel[0].

In theory you could bring your own cgroup manager, but cgroups are nascent enough that trying to make a userland tool like systemd-nspawn work with a completely pluggable cgroup manager would be a nightmare.

[0] http://www.freedesktop.org/wiki/Software/systemd/ControlGrou...

pmahoney · on Jan 17, 2015

Hm, interesting. I'm certainly largely ignorant of how cgroups work, but I was wishing for a tool like "chroot" that could also "ch-network-namespace" or something. I've skimmed over the cgroups documentation [1] several times, but never quite gotten a solid mental model of how everything fits together.

[1] https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt

geofft · on Jan 17, 2015

You're probably looking for the "unshare" command and corresponding system call (or the same flags to the "clone" system call). Specifically, running "sudo unshare -n bash" will get you a shell with no network devices other than lo. You can then find the pid and use "ip link set dev eth0 netns 1234" from outside to move eth0 into the new namespace (more practically, you might make a virtual network device and move one end into the new namespace).

http://man7.org/linux/man-pages/man2/unshare.2.html

http://man7.org/linux/man-pages/man2/clone.2.html

http://man7.org/linux/man-pages/man1/unshare.1.html

Loosely a "container" (in the LXC or Docker sense) is a combination of making new namespaces, which isolate the process tree from the rest of the system in various ways (filesystem, network, hostname as returned by "uname", etc.), and making a cgroup, which allows for process tracking and resource allocation.
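
The system-call side of this can be sketched in a few lines of C. This is only an illustration, assuming Linux with glibc and a process that holds CAP_SYS_ADMIN; the helper name `enter_new_netns` is made up for the example, and without privileges the call is expected to fail (typically with EPERM) rather than succeed.

```c
#define _GNU_SOURCE   /* unshare() and CLONE_NEWNET are Linux/GNU-specific */
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: move the calling process into a fresh network
 * namespace, which is the step "unshare -n" performs before running the
 * requested command. Returns 0 on success or a negative errno value
 * (for example -EPERM when run without CAP_SYS_ADMIN).
 */
int enter_new_netns(void)
{
    if (unshare(CLONE_NEWNET) == -1)
        return -errno;
    return 0;
}
```

After a successful call the process sees only its own loopback device; moving a real or virtual interface in still has to be done from outside the namespace, as described above.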

pmahoney · on Jan 18, 2015

Ah, great stuff, thanks! This small piece of the whole set of container features may be exactly what I want.

geofft · on Jan 17, 2015

It'd still be part of the same source tree, and there was tons of controversy about udev and systemd being joined into the same source tree. Alternatively, it could be a separate source tree if systemd implemented deep APIs for everything it does, but they don't want to do that.

revelation · on Jan 17, 2015

Great, and it's a bunch of C with obvious memory leaks (remember, if there's a dup in that function name, it's gonna allocate memory) and other problems parsing complex formats it's downloaded off the web.

This is terrible in both the "why is systemd doing this" and "it's just plain terrible software" sense.

pedrocr · on Jan 17, 2015

> it's a bunch of C with obvious memory leaks (remember, if there's a dup in that function name, it's gonna allocate memory)

That's only really an issue if someone ever turns this into a library and calls it from long-running code. In a command line utility, the small amounts of memory that are leaked don't amount to much waste, and everything gets reclaimed on exit.

revelation · on Jan 17, 2015

I don't trust someone that can't get memory leaks in a simple utility function right to get the big, big things with writing software in C correct.

This is Poettering's code after all, and he had his fingers in more than just command line utilities when it comes to systemd.

Writing this kind of software in C requires serious discipline and religious use of valgrind.

jjoonathan · on Jan 17, 2015

"Leak everything" is a valid memory management model if you can put bounds on runtime, etc. Seeing it in the wild isn't a reason to conclude that

> someone ... can't get memory leaks in a simple utility function right

because in a simple one-off utility it is right. Poettering didn't write these utilities to prove that he knew how to refcount.

revelation · on Jan 17, 2015

The point has been made elsewhere already, but I'll mention it again: because there's no guarantee this code will stay in a simple command line utility.

At some point someone may come along and refactor this Docker import code into a library or something bigger.

tree_of_item · on Jan 17, 2015

Isn't the point that it's not a leak if the program exits quickly?

jacquesm · on Jan 17, 2015

If you can't be trusted to get the bookkeeping right what makes you think the rest of the program is solid?

Being able to manage your allocations is a pretty good sign that a C programmer knows what he's doing. Relying on 'exit' to free your memory is backporting the web mentality to unix land; it simply doesn't work that way. There should be no difference in attitude between writing a long-running daemon and writing a utility program because, for all you know, your utility code will be repurposed to become part of a longer-running daemon. So you write your code in as clean a manner as possible, balance your allocs/frees, and make sure that you don't have any latent buffer overflows. You may not care about them today because of the context your code executes in, but tomorrow that context might change, and then we're looking at yet another exploit.

digi_owl · on Jan 17, 2015

> Relying on 'exit' to free your memory is backporting the web mentality to unix land

A very fitting description of systemd as a whole.

DougBTX · on Jan 17, 2015

Cleaning up by exiting can be much faster too, eg, https://lists.gnu.org/archive/html/coreutils/2014-08/msg0001...

enqk · on Jan 17, 2015

there's something to be said about the elegance of short-running C programs with no deallocation, using the OS to properly relinquish memory

tomegun · on Jan 17, 2015

Care to point to the line with the memory leak? I can't see one here (are you sure you understood how the _cleanup_* macros work?).

cremno · on Jan 17, 2015

Where are the _obvious_ memory leaks?

revelation · on Jan 17, 2015

http://cgit.freedesktop.org/systemd/systemd/tree/src/import/...

The liberal use of completely unportable cleanup attributes shifts preventing memory leaks to the equally intractable problem of tracking stack allocated variable assignments, with an extra helping of double frees. I won't even comment on the show of horrors that is util:

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/...

tomegun · on Jan 17, 2015

Hm, this criticism seems hardly fair.

systemd is explicitly unportable (it uses glibc/gcc/linux-specific features liberally); this is not some sign of sloppy coding, but an explicit design decision as codified in the CODING_STYLE file.

Preventing memory leaks and tracking stack allocated variable assignments are not comparable at all. In the vast majority of cases the tracking of stack allocated variable assignments is entirely trivial. The problem with double freeing is also very hard to run into in practice. Getting a memory leak when you don't use _cleanup_ is all too common, on the other hand. Basically, as soon as there is more than one branch in your function you either have to use ugly gotos or remember to free on every branch; in practice, either way you are just implementing the _cleanup_ macros manually all over the place (with all the bugs that implies).
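
To make the trade-off concrete, here is a minimal sketch of the pattern being argued about. It assumes GCC or Clang, since the cleanup attribute is a compiler extension. The names `cleanup_free_`, `free_charp`, `dup_string`, `copy_fits` and the `frees_seen` counter are invented for this illustration and are not systemd's actual macros or helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustration-only counter so the automatic free is observable. */
int frees_seen = 0;

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void free_charp(char **p)
{
    if (*p != NULL)
        frees_seen++;
    free(*p);
}

/* Hypothetical stand-in for a _cleanup_-style macro (GCC/Clang only). */
#define cleanup_free_ __attribute__((cleanup(free_charp)))

/* Heap-allocate a copy of s; returns NULL if malloc fails. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Returns 0 if input is at most max_len bytes long, -1 otherwise (or on
 * allocation failure). There are three return paths and no explicit free()
 * on any of them: free_charp runs whenever `copy` goes out of scope.
 */
int copy_fits(const char *input, size_t max_len)
{
    assert(input != NULL);

    cleanup_free_ char *copy = dup_string(input);
    if (copy == NULL)
        return -1;
    if (strlen(copy) > max_len)
        return -1;   /* early exit, the copy is still freed */
    return 0;
}
```

Without the attribute, each of those returns would need its own free() or a jump to a shared exit label. The double-free risk mentioned upthread appears if the annotated pointer is also freed by hand, or handed to another owner, without being reset to NULL first.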

Why don't you point to some code that would have been better without the use of the macros? It seems you think util.c is a good source of examples...

EmanueleAina · on Jan 17, 2015

Note that unfreed results of strdupa(), despite having "dup" in the name, do not result in memory leaks. The "a" in the name means it uses alloca(), thus the memory comes from the stack and is properly released on function return. It's also probably slightly faster than malloc() since it just moves the stack pointer.

Before jumping to conclusions, please verify what the code really does. C is a complex beast and what's "obvious" may not be such.

Relying on such niceties (as weird as it may seem, to a C programmer those are niceties) is what you gain if you are willing to trade them for portability (strndupa() is a GNU extension).

jacquesm · on Jan 17, 2015

This is 100% correct. The 'a' stands for 'automatic', just like the variables allocated on the stack are 'automatic' variables.

When the stack frame is unwound when the function returns the allocation is undone.
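
A small sketch of that behaviour, assuming glibc's <alloca.h>. It spells out by hand roughly what a strndupa()-style copy does, with an explicit length bound before the stack allocation; `first_component_is` and `MAX_COMPONENT` are hypothetical names chosen for this example.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bound for this sketch; keeps the stack allocation small. */
#define MAX_COMPONENT 255

/*
 * Returns 1 if the text before the first '/' in path equals expected,
 * 0 if it differs, and -1 if that component is longer than MAX_COMPONENT.
 */
int first_component_is(const char *path, const char *expected)
{
    assert(path != NULL && expected != NULL);

    size_t len = strcspn(path, "/");
    if (len > MAX_COMPONENT)
        return -1;               /* refuse unbounded stack growth */

    /* Stack allocation: released automatically when this function returns. */
    char *copy = alloca(len + 1);
    memcpy(copy, path, len);
    copy[len] = '\0';

    /* No free() needed, but `copy` must not be returned or stored. */
    return strcmp(copy, expected) == 0;
}
```

The allocation has to happen in the function that uses the copy; wrapping alloca() in an ordinary helper function would release the memory as soon as that helper returned, which is why the GNU variants are macros.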

cremno · on Jan 17, 2015

I don't like the usage of that attribute either (but if that's their strategy, fine by me), and I still don't see where the obvious memory leaks are. Your first link links to a line with a strdup() call, but the corresponding free() can be found in L168.

revelation · on Jan 17, 2015

Sorry, I linked to line 1140, but it doesn't highlight that and can't scroll to it as it is towards the end.

I did just check again and realized that the single 'a' at the end of strndupa means it's using alloca, which is of course similarly unportable and has a whole slew of other problems, all of them intractable (did you pass that pointer to something that will use it later on? stack overflows?).

So hey, not a memory leak! Rejoice! It's just similarly broken.

tomegun · on Jan 17, 2015

Similarly broken how?

Just because strdupa is not magical does not mean it doesn't have its valid uses. In particular, the problem you allude to (knowing the scope of your memory) is something you would also have to get right if you used heap-allocated memory (unless you just leak everything of course, but I think we have established that that's not something we want to do).

cremno · on Jan 17, 2015

Oh, I am the one that has to be sorry! I didn't check the URL, even though I am aware of that issue with cgit (no line highlighting).

Yeah, using alloca() (in any form, incl. VLA) isn't good style. At least it's bounded, since filename_is_valid() checks the length. But it uses FILENAME_MAX, which could be too large for the stack.

jacquesm · on Jan 17, 2015

> But it uses FILENAME_MAX, which could be too large for the stack.

That's extremely unlikely.

This is definitely not pretty or good quality C code by any standard but let's not start spreading wrong information.

The stack overflowing because you use alloca for something of FILENAME_MAX length is just as likely as a malloc call running out of memory. After all, the heap grows 'up' and the stack grows 'down'. Other than that the mechanisms are roughly (very roughly, ok) identical.

The only time when you have to be extremely careful with alloca is when you use it in functions that might be called in a recursive manner. But then you're playing with fire anyway if you do not have hard upper bounds on the depth of your recursion.

to3m · on Jan 17, 2015

Which operating systems does systemd support? Few that don't have gcc as their official compiler, I would have thought?

digi_owl · on Jan 17, 2015

Linux. Poettering has in the past said to take the book on Linux and Unix programming and toss the sections on Unix.

jacquesm · on Jan 17, 2015

That sort of code is the reason why C has become a dirty word.

Spidler · on Jan 17, 2015

I just hope they check for certificates better than Docker does.

systemd-nspawn is a really fun thing to work with for developing early-init/daemons of various kinds, and this adds a bit more tools to that.

jacquesm · on Jan 17, 2015

If you want to avoid systemd your choices are apparently: Slack, Crux and Gentoo or switching to *BSD.

http://distrowatch.com/weekly.php?issue=20140908#qa

Here's to hoping that debian at least will reverse their position or that some group will fork it:

https://devuan.org/

justincormack · on Jan 17, 2015

Or Alpine or Sabotage if you want something a bit different (Musl libc support).

digi_owl · on Jan 18, 2015

http://without-systemd.org/wiki/index.php/Main_Page

elementai · on Jan 18, 2015

Or one can even consider taking a shot at GNU DMD with Guix. Alpha-quality software, but it looks promising (at least for brave and true schemers).

eleitl · on Jan 17, 2015

I am switching to *BSD.

linuxydave · on Jan 17, 2015

I'm seriously considering it. The fact that Digital Ocean offers FreeBSD means I get to try it out beforehand which is a bonus.

jacquesm · on Jan 17, 2015

You're going to have to change your nick ;)

linuxydave · on Jan 17, 2015

Yeah :(

tmikaeld · on Jan 17, 2015

Is there anything more we can throw into systemd, perhaps http/https/v8 server so we can replace apache, nginx and nodejs as well?

I don't like how systemd starts doing everything, people keep pointing to alternatives but when things start to depend on systemd for functioning the alternatives start to disappear one after another.

I'm also thinking that each new feature like this would widen the area for potential security vulnerabilities.

Spidler · on Jan 17, 2015

There already is an HTTP server in there. Its log shipping is done with HTTP as a transport.

digi_owl · on Jan 17, 2015

And presenting your phone with a QR code housing the initial private key of the journald forward security "feature".

A feature apparently developed by Poettering's brother as a doctoral thesis...

transfire · on Jan 17, 2015

What operating system do you use? "Systemd"

dschiptsov · on Jan 17, 2015

Do not forget security updates and service packs.

SixSigma · on Jan 17, 2015

PXE and gPXE were all you ever needed.

http://etherboot.org/

signa11 · on Jan 17, 2015

how close are we to realizing zawinski's law with systemd ?

moe · on Jan 17, 2015

Shouldn't systemd embed a BitTorrent client before it starts downloading things?

throwawayaway · on Jan 17, 2015

If systemd could somehow be controlled from emacs we're looking at a whole new operating system paradigm here.