
Walkthrough for Systemd Portable Services - type0
http://0pointer.net/blog/walkthrough-for-portable-services.html
======
luckycharms810
It’s astonishing to me that people would argue that Docker is somehow a
simpler solution. Some would happily maintain a container registry, constantly
worry about the order of operations in their Dockerfile, create a build
pipeline in a format incompatible with non-Docker use cases, and spend time
understanding how Docker's bridge interacts with multiple network controllers,
rather than learn how their operating system works.

~~~
gsich
>It’s astonishing to me that people would argue that Docker is somehow a
simpler solution.

Some say docker is the new "curl|bash" ;)

~~~
TheDong
Really now? Here's a docker container with a shell you can type anything
into: [https://contained.af/](https://contained.af/)

If you manage to tell me what's in /etc/os-release on the machine (not inside
the container), I'll eat my lunch.

~~~
progval
The issue with curl|bash is not just the security (you can run curl|bash as an
unprivileged user).

It's also the fact that it generally doesn't do anything to integrate well
with your system: it just pulls all its dependencies into a folder and never
updates them afterward.

------
InTheArena
Oh yeah. Systemd and containers. Two technologies certain militant technical
users love to complain about :-)

Do we really need a non-OCI image format and toolchain here?! I get the
immutability benefits, distribution benefits, and isolation. Having a
completely separate toolchain makes no sense to me...

Containers will end up in this role, but fragmentation will only make that
process slower and more painful.

I for one salute our future k8s kubelet overlords.

~~~
TheDong
> Do we really need a non-OCi image format and tool chain here?

This doesn't use a new competing image format, but rather uses disk images or
tarballs, which have existed longer than OCI has.

The toolchain also predates OCI since the underlying technology is really
basically nspawn (and various other systemd.service options), which have been
a part of systemd since before OCI stabilized. This is simply a new coat of
paint on something that has been there for a while.
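
For reference, the new coat of paint is essentially the `portablectl` tool
the article introduces; a minimal session looks roughly like this (the image
path and service name are made up for illustration):

```
# Attach a portable service image (a raw disk image or plain directory
# tree); this copies out its unit files and applies a security profile.
portablectl attach /var/lib/portables/foobar_1.raw

# The units inside now behave like ordinary services:
systemctl enable --now foobar.service

# Detach again when done:
portablectl detach foobar
```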

> Containers will be in this role, but fragmentation will only make that
> process slower and more painful.

Sure, so we shouldn't have let docker do anything and instead used LXC, since
LXC already did containers, already had its own format and toolchain, etc.
That's the exact same argument you're making now.

OCI is also not really a good standard. It has exactly one usable
implementation (and only in Go, not even C where it's easy to link against and
use it from other languages), and since systemd has already defined various
things for services (such as seccomp filters etc) which overlap with what OCI
does, it makes almost no sense for systemd to begin using it.
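
To make that overlap concrete, here is the sort of sandboxing systemd
already expresses as plain unit-file directives, which OCI would re-specify
in JSON (the service itself is hypothetical; the directives are real systemd
options):

```ini
# foobar.service -- illustrative hardening a unit can already declare
[Service]
ExecStart=/usr/bin/foobar-daemon
DynamicUser=yes
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
SystemCallFilter=@system-service
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
```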

~~~
cyphar
I actually think more people should have adopted LXC. The engineering behind
LXC is beyond remarkable and it's a shame that it got such a bad rap in the
early days of Docker. I've worked with some of the LXC folks and they're all
incredibly sharp and have a much wider breadth of work than just LXC.

With regards to image formats, I actually think that we do need to improve the
image format significantly and not just stick to tried and tested stuff.
Tarballs are (quite frankly) simply the most awful format to use for container
images (disk images are a close second). It's a shame that everyone uses them.
I am working on a blog post to better explain what I mean by this.

And yeah, runc being written in Go is probably one of the most frustrating
things about working on it. I often say that one of the worst things you could
decide to write in Go is a container runtime -- it's a little bit odd that
people keep doing exactly that. runc actually isn't entirely written in Go --
it has to have a fairly substantial amount of C because the Go runtime cannot
play well with the delicate dance required to set up a container properly.

All of that being said, containers really did need to be standardised
otherwise the container wars wouldn't have ended as nicely as they did. But
that's not to say that the OCI doesn't have issues. It definitely does, but
I'm hoping they can be fixed over time.

[ I am one of the maintainers of the "exactly one usable" OCI implementation,
and have been working on OCI stuff since its inception. ]

~~~
CameronNemo
Something to note about LXC developers: they don't just provide a wide array
of usable interfaces to containers (daemon with a RESTful API, command line
tools, and a library with many language bindings). They also are some of the
most active contributors to the underlying container technology in the kernel,
including namespaces and cgroups.

~~~
cyphar
> They also are some of the most active contributors to the underlying
> container technology in the kernel, including namespaces and cgroups.

This is what I was referring to when talking about their breadth of work. I've
collaborated with them quite a bit, and it's always amazing working with them
on a hard problem.

------
mathnmusic
Yet another tool for containerization technology :-)

While this is interesting, the kind of containment I'd appreciate is:
multiple versions of programming languages & packages that each user can
install without affecting the system globally. Every language has its own
separate answer (virtualenv/venv, rbenv, Node Version Manager, etc.).

Just today I struggled with "brew install python3" which complained with
"Error: python 2.7.14_2 is already installed" and the only option offered was
to upgrade. Whereas what I wanted was parallel installations of python 2.7 and
python3. :-/
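
(For what it's worth, Homebrew's answer to this was versioned formulae,
which can be installed side by side; formula names have changed over time,
so treat these as illustrative:)

```
brew install python@2   # keg-only Python 2.7, not linked into PATH by default
brew install python     # Python 3.x
```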

~~~
clhodapp
That's exactly what Nix does! Nix is wonderful tech that unfortunately seems
doomed to low adoption due to poor usability. My belief is that this stems
from the fact that maintaining the package tree is obscenely costly in
contributor time, leaving almost no time or energy to make progress on
"niceties" like improving the core CLI experience and the continuous
integration system.

~~~
manveru
The Nix team (in particular grahamc with
[https://github.com/NixOS/ofborg](https://github.com/NixOS/ofborg)) is working
hard on making CI better. But it's true that the on-boarding experience and
CLI could be much improved.

It's quite frustrating to me, because after using Nix & NixOS for about 3
years now, there's no way I'd go back to the insanity out there. I've never
had this amount of stability, control, and predictability on any other
platform. From configuring my OS, editors, WM, and shell, to random ricing,
to building VMs, servers, Docker containers, production deploys, and
development environments, there's really not much Nix can't handle. But
getting familiar with Nix took some serious effort I'm sure not many people
can afford. I begin to understand how frustrated people in the Lisp and
Smalltalk communities must feel while people slowly reinvent everything
they've enjoyed for decades (and Nix is still in its teens).

Teaching people about the value of having just a single package manager, no
matter what OS or language they're using, is mostly futile because there's so
much value attributed to being "mainstream" that now we have hundreds of
mainstream package managers with varying degrees of sophistication, security,
predictability, and ability to _not_ try and take over your system.

I use it to build and develop Go, Ruby, JS, Crystal, Elm, Mint, Haskell,
Perl, Bash, VimL, Elisp, Guile, and whatever else comes my way. All I have to
do to get a working and isolated dev env is go into that directory and let
nix-shell do the rest. On NixOS you also get nixos-container with all its
benefits, so I can test whole networks or just spin up some DBs, all behaving
exactly as they will once deployed, simply by reusing that configuration.
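
That per-directory workflow really is that short; for instance, an ad-hoc
environment with two Python major versions side by side (nixpkgs attribute
names drift between releases, so these are illustrative):

```
# Drop into a shell where both interpreters coexist,
# without touching the global system:
nix-shell -p python27 python3

# Inside the shell:
python2 --version
python3 --version
```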

Of course it also manages systemd, so I guess we'll just write yet another
function to slap a checksum on those containers and control them as well;
it's just a shame that humanity loses so much time reinventing the wheel
every few months.

On the other hand, there's still a lot of opportunities for start-ups in that
space, like [https://nixcloud.io](https://nixcloud.io),
[https://www.tweag.io](https://www.tweag.io) (they just hired Eelco Dolstra so
he can work full-time on the Nix core),
[https://www.packet.net](https://www.packet.net) (they sponsor almost all our
new CI infrastructure), or [https://vpsfree.org/](https://vpsfree.org/) (first
one to offer their own NixOS based distro specifically made for VPSs), which
are all doing well.

~~~
donbright
I spent a lot of time trying to make Nix the 'standard build environment'
for an open source project. It was going really, really well, even on top of
standard distros like Ubuntu, Red Hat, etc.

The problem is OpenGL and Nvidia. Nix doesn't want to deal with Nvidia's
closed-source drivers, and I can't blame them. I got so depressed working on
the issue that I gave up looking into it. There is no good solution. There
are only kludges, and kludges of kludges.

------
cagenut
This is cool. I'm betting hard on k8s at $dayjob, but I often think of how
much _less_ I could be happy with. A simple daemon that used some
gossip/broadcast protocol to work out a list of systemd service states would
be juuust enough.

~~~
navaati
You mean exactly like CoreOS Fleet?

~~~
irq-1
> fleet is no longer actively developed or maintained by CoreOS. CoreOS
> instead recommends Kubernetes for cluster orchestration.

[https://coreos.com/fleet/docs/latest/launching-containers-fleet.html](https://coreos.com/fleet/docs/latest/launching-containers-fleet.html)

~~~
nickik
While they no longer want to do cluster orchestration, as a distributed
systemd it can still be useful. The core functionality is there and well
tested.

No longer being actively maintained is a problem, but I think it still has
quite a few users.

------
CuriousSkeptic
I just spent a few days trying to navigate all this. Perhaps someone here
could enlighten me.

If I just want to download some semi-trusted source code and configure,
make, and run it in an environment where it can't access anything unless I
explicitly whitelist it (in particular, it should not be allowed to access
the internet or any network resources, but I should be able to send HTTP
requests to it), which variant of all this container stuff would make things
simplest to set up?

Docker seems way overkill (and seems to have the wrong defaults for this
need anyway), but handcrafting things with iptables or ip netns seems a bit
too low-level. (And, well, not “contained”... I would prefer something
declarative with automatic setup/teardown.)

Ubuntu snaps looks like an interesting middle ground. But it also looks
completely dead.

Any tips?

Edit: I should probably RTFA before asking. It actually looks like a good fit ;)

~~~
mongol
I think the systemd containerization features are worthwhile to explore. Both
portable services and nspawn. They will be around on most Linux systems "out
of the box" which is a big benefit. It will be like bash, maybe not the best
shell, but the one that is around and that you can count on.
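
As a sketch of that "out of the box" approach for the grandparent's use
case: a transient unit started with systemd-run can deny all network traffic
except loopback (needs a reasonably recent systemd with cgroup-based IP
filtering; the binary and port here are hypothetical):

```
systemd-run --unit=semi-trusted \
    -p DynamicUser=yes \
    -p ProtectSystem=strict -p ProtectHome=yes -p PrivateTmp=yes \
    -p IPAddressDeny=any -p IPAddressAllow=localhost \
    ./my-server --port 8080

# You can still talk to it over loopback:
curl http://127.0.0.1:8080/
```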

------
geertj
Does the portable service run in a PID namespace? I couldn't find the answer
in the article.

~~~
CameronNemo
I don't know. But if it does, the service would need to reap zombies, or
systemd would need to spawn yet another instance of itself inside the
container (like Rocket does).
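
To make the zombie-reaping concern concrete, here is a minimal sketch (in
Python, for brevity) of the wait loop a PID 1 inside a PID namespace has to
run; a real init would also forward signals and react to SIGCHLD rather than
poll:

```python
import os
import time

def reap_children():
    """Collect every child that has exited; return [(pid, status), ...].

    This is the core obligation of PID 1 in a PID namespace: orphaned
    processes get reparented to it, and they linger as zombies in the
    process table until it wait()s on them.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append((pid, status))
    return reaped

if __name__ == "__main__":
    # Fork a child that exits immediately; until someone reaps it,
    # it stays a zombie.
    child = os.fork()
    if child == 0:
        os._exit(7)
    time.sleep(0.2)  # give the child time to exit
    for pid, status in reap_children():
        print(pid, os.WEXITSTATUS(status))
```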

------
king_nothing
I’ll take s6 and some sort of log-structured FS for local buffering of
structured messages logged to stderr, TYVM. No pid files, no lock files, no
log files to rotate awkwardly, and no unstructured log files.

PS: log events shouldn’t be thought of as lines of text but as structured
data messages, so that they don’t need special parsing.
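
A sketch of what "structured messages on stderr" can look like in practice,
using JSON Lines so downstream collectors need no special parsing (the field
names are arbitrary):

```python
import json
import sys
import time

def log_event(level, msg, **fields):
    """Emit one structured event per line (JSON Lines) on stderr.

    No pid files and no log files to rotate: the supervisor captures
    stderr, and each event is machine-parseable as-is.
    """
    event = {"ts": time.time(), "level": level, "msg": msg, **fields}
    sys.stderr.write(json.dumps(event, sort_keys=True) + "\n")
    return event

if __name__ == "__main__":
    log_event("info", "request handled", path="/healthz", status=200)
```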

------
yarrel
The software that is just a better init system continues to not be.

~~~
donbright
"Because the Portable Service concepts introduces zero new metadata and just
builds on existing security and resource bundling features of systemd it's
implemented in a set of distinct tools, relatively disconnected from the rest
of systemd. "

Clearly, this is a messy situation: lots of different pieces doing different
things in different ways. In order to harmonize this, and simplify it, I
believe we need to have systemdd, a daemon for systemd, so that all of its
various pieces can be in one, neat, centralized location.

~~~
chris_wot
What a ridiculous comment. Keeping these tools relatively decoupled is a
_good_ thing.

~~~
dogecoinbase
I can't speak for the GP, but I believe their comment is parody. Keeping tools
decoupled is, of course, very good -- systemd is the antithesis of that norm.

~~~
chris_wot
Ah, if it was sarcasm then apologies - it's not something that one can easily
pick up through text.

~~~
donbright
That is no problem. We had our comments scattered out into separate pieces,
each doing their own thing in a non-standard comment idiom. In order to
simplify and organize things better, I have created commentd which will
combine them all into one comment. To save valuable disk space I have also
compressed them and put them in binary format.

begin 644 - M0F%S:6PZ($QI<W1E;BP@9&]N)W0@;65N=&EO;B!T:&4@=V%R(2!)(&UE;G1I
M;VYE9"!I="!O;F-E+"!B=70@22!T:&EN:R!)(&=O="!A=V%Y('=I=&@@:70@ )86QR:6=H="X* `
end

~~~
chris_wot
Base64 this:

H4sIAAAAAAAA/4vML1UvSlVIVEjJTM7WAwBgPKpvDgAAAA==

------
sandGorgon
Interestingly, systemd maintainers have consistently refused to make
available a Docker-compatible version of systemd to run as init (PID 1).

Everyone brings their own here (e.g.
[http://phusion.github.io/baseimage-docker/](http://phusion.github.io/baseimage-docker/)),
like supervisord, runit, etc.

What would have been ideal is for a docker-compatible systemd to run inside
the container.

IMHO the maintainers are bent on creating a competing standard (like this one)
and don't want to build anything that brings the advantages of systemd to the
Docker ecosystem.

~~~
TheDong
> Interestingly, systemd maintainers have consistently refused to make
> available a Docker compatible version of systemd to run as the init 1.

Please provide evidence of that. I will provide counter-evidence proving
that is not the case, unless you have something very compelling.

1\. [https://lists.freedesktop.org/archives/systemd-devel/2014-May/019005.html](https://lists.freedesktop.org/archives/systemd-devel/2014-May/019005.html)

Poettering writes:

> To say this explicitly: we are really interested in making sure that
> systemd runs out-of-the-box in containers, and docker is just one
> implementation of that.

2\. A PR to make systemd-logind work better in a docker container is accepted
with no issue:
[https://github.com/systemd/systemd/pull/4154](https://github.com/systemd/systemd/pull/4154)

3\. Running systemd as pid 1 in Docker (privileged or not) works, so clearly
the maintainers didn't do a good job of preventing it from happening. (Just
google; there's plenty of info out there on how to do this.)

I'm sure there are many other examples too, but this is with a quick glance
around.

I'd be interested in your reference before you spread such strange FUD.

~~~
cyphar
I would recommend speaking to someone from the LXC team. I only know of their
systemd pains second-hand, but there are many many many things that systemd
has done throughout its history that made it a royal pain to run in a
container. I also have my own battle-scars from systemd but they are mostly
related to systemd on the host rather than inside a container.

Here's one piece of evidence though[1]. Effectively systemd enabled a security
feature for the host's /dev/console and when we said it wasn't necessary for
containers (because /dev/console is not a real console) and it actually broke
runc, Lennart said that we should fix runc.

[1]:
[https://github.com/systemd/systemd/pull/4262](https://github.com/systemd/systemd/pull/4262)

~~~
JdeBP
That discussion does not match your summary of it.

This wasn't a security feature that got enabled. This was an expectation of
the long-standing semantics of /dev/console, which as M. Poettering pointed
out long pre-date systemd, being broken by a container manager.

So yes, it's right that the container manager be fixed so that it is possible
for every open file descriptor for /dev/console to be closed and then the
device re-opened again. Such semantics have been around longer than Linux
itself has; systemd is written to expect them. /dev/console is not supposed to
magically vanish/become inoperable once all currently open file descriptors
for it have been closed. Quite a lot of other software, including everything
that uses openlog() with LOG_CONS, expects this of /dev/console too.

Indeed, /dev/console is one of the very few device files mandated to exist by
the Single UNIX Specification (XBD part 10). The container manager was
actually setting up an execution environment that is not POSIX conformant.

And I observe that indeed said container manager _did_ get fixed.

* [https://github.com/containerd/console/pull/10](https://github.com/containerd/console/pull/10)

~~~
cyphar
GP was arguing that the idea that systemd would refuse to make a change that
would benefit running inside containers was ridiculous and required evidence.
I have provided that evidence -- it is an example of a change in systemd
being rejected in favour of fixing it in the container runtime.

Of course we fixed it, and of course you can argue that Lennart was correct (I
still think that having SAK protections in a container is nonsensical but
that's all water under the bridge). I obviously agree that our /dev/console
handling was incorrect. In our defense, /dev/console doesn't actually make
much sense in a container since the purpose of /dev/console is to access the
physical console not the current PTY -- so anything we put there would still
be "wrong" from the standpoint of POSIX. But you can't just ignore
/dev/console because then a bunch of programs don't work. I could also go on
about how it was also a Go stdlib issue because we'd assumed io.Copy "did the
right thing" but it turns out it really doesn't handle any form of
interruptions properly. But I'm sure you're not interested in that discussion.

The point is that I agree it was fixed, and I agree that fixing it in the
container runtime was overall correct. But that wasn't the point I was making
-- it was that there have been examples where systemd has made a change that
broke running inside a container and they were not willing to make concessions
for container runtimes. Which is what GP was arguing about.

I do have plenty of other examples (cgroups are particularly fruitful for
systemd bugs that won't die), but they aren't really related to running inside
a container.

[ I'm not a hater of systemd, or Lennart. I actually really like having a
declarative service manager. My frustration comes from having to deal with it
when developing system tools that don't want to be tightly coupled with it.
That's where systemd really starts to get ugly to deal with. ]

> M. Poettering

What does the 'M' stand for? His first name is Lennart.

~~~
lwf
>> M. Poettering

> What does the 'M' stand for? His first name is Lennart.

In French, "M." is short for Monsieur.

