

When the kernel ABI has to change - moonboots
http://lwn.net/Articles/557082/

======
yk

    The plan is to add a new control interface, and any new
    features will probably only work with that new interface,
    but the existing interface, including multiple
    hierarchies, will continue to be supported until it's
    clear that it is no longer being used.
    

So they will support it forever or until such a time that devs do the 'right'
thing instead of the quick hack, whichever comes first.

~~~
gizmo686
Rule 1 of kernel development: don't break userspace.

------
jeltz
I do not like that they are creating a new API which is harder to use (see the
example from Tim at Google) while the kernel policy also means that these two
interfaces will have to be maintained forever (or almost forever). I cannot
see how this will simplify the internal code.

------
contingencies
I am absolutely no kernel expert but use cgroups quite a lot. I wrote most of
[http://en.gentoo-wiki.com/wiki/LXC](http://en.gentoo-wiki.com/wiki/LXC),
wrote and maintain
[https://github.com/globalcitizen/lxc-gentoo](https://github.com/globalcitizen/lxc-gentoo),
and have previously had interactions with Serge & Daniel back when they were
at IBM. Gentoo doesn't use systemd by default, but can allegedly be configured
to do so. I offer only my perspective.

First, cgroups are far better than nothing and are immensely valuable as is.

Second, individual kernel subsystem cgroup controllers that present issues
when used in conjunction should be able to come up with _some_ sort of
solution, even if it is complex, takes some time and is suboptimal from a
performance or resource management perspective. Some of them already do this
to implement their own functionality (for example memory and swap accounting
overheads). If people don't want the overheads, they can opt to exclude them
from the kernel. Working code is the ultimate goal, and we are after all
talking about an entirely new feature set that favours higher level userland
requirements of resource management over efficiency (and delivers them far
more efficiently than paravirt, the dominant near-term alternative).

Third, resource management - regardless of the resource - seems to be
essentially done in two ways: declarative (formalistic allocation of declared
portions of a known total) and ad-hoc (swapping, OOM killer, some forms of
rate limiting, etc.). The issues cited in the post seem to fall across these
boundaries. For the first case, it is generally possible to validate the
sanity of a policy and reject or warn on potential craziness. Particularly in
the second case, complex interactions are somewhat inevitable and should be
considered expected. In such cases, there is no substitute for testing.
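
As a minimal sketch of the declarative case, a policy that hands out portions of a known total can be sanity-checked before it is applied. The function name, the group names and the plain-number units here are all hypothetical and are not part of any cgroup interface.

```python
def validate_allocations(total, allocations):
    """Return a list of problems found in a declarative allocation policy.

    total: the known amount of the resource (any unit).
    allocations: mapping of group name -> declared portion (same unit).
    An empty list means the policy passed these basic checks.
    """
    problems = []
    for name, amount in allocations.items():
        if amount < 0:
            problems.append(f"{name}: negative allocation {amount}")
        elif amount > total:
            problems.append(f"{name}: allocation {amount} exceeds total {total}")

    # Sum only the non-negative declarations when checking for overcommit.
    committed = sum(a for a in allocations.values() if a > 0)
    if committed > total:
        problems.append(f"overcommitted: {committed} declared of {total} available")
    return problems
```

The ad-hoc mechanisms (swapping, OOM killer, rate limiting) do not reduce to a check like this, which is why testing remains the only real answer there.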

Fourth, the security allegations are in my view somewhat invalid, since it is
easily possible to deny cgroup filesystem access to a process. Just as many
modern daemons drop privileges and/or chroot after startup, comparable
privilege transitions could easily occur regarding cgroup filesystem access
(change to a user without access, chroot, apply a kernel security toolkit
based policy, etc.). This could potentially be implemented as a capability
(e.g. drop a new capability, CAP_SYS_CGROUP).
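
A rough sketch of the "change to a user without access" variant, in Python. The mount point `/sys/fs/cgroup` is an assumption (it varies by system), the helper names are made up, and whether access is really denied afterwards depends on how the hierarchy's ownership and permissions are set up.

```python
import os

# Assumption: a common mount point for the cgroup filesystem.
CGROUP_ROOT = "/sys/fs/cgroup"


def can_modify_cgroups(root=CGROUP_ROOT):
    """True if the current process may create or remove entries under root."""
    return os.access(root, os.W_OK | os.X_OK)


def drop_privileges(uid, gid):
    """Permanently switch to an unprivileged user. Must be called as root."""
    os.setgroups([])  # drop supplementary groups first
    os.setgid(gid)    # group before user: after setuid this is no longer allowed
    os.setuid(uid)
```

A daemon would do its setup as root, call `drop_privileges()` with an unprivileged uid and gid, and could then use `can_modify_cgroups()` to confirm that the hierarchy is out of reach.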

Fifth, the comment _Multiple hierarchies are seen to be misconceived and
unmaintainable on their face_ seems somewhat loaded. What it really seems to
be saying is that simultaneously referencing multiple, potentially conflicting
hierarchies from within a cgroup controller in order to manage a single
resource or subsystem is potentially fundamentally misconceived. While this
may be true, it does not mean that the in-kernel maintenance of and transition
between multiple hierarchies is without value. It seems like the kernel people
are thinking in terms of the integrity of cgroup controller logic determinism
and wondering why anyone would want to tag processes from any other
perspective, whereas the real users such as myself and Google are using the
multi-hierarchy subsystem to semantically tag processes as a precursor to the
later assignment of post-facto resource management policies. (For example,
consider a resource allocation policy change triggered by a critical power
management event. It could be very valuable to change management hierarchies
in such a case. One potential example might be to reclassify block IO
intensive processes at higher priorities across multiple cgroup controllers in
order to ensure rapid termination prior to the exhaustion of power. Or the
inverse: if a solar powered glider found that energy had returned for an
estimated period above a short threshold, it could thaw and/or reallocate
greater resources to higher power draw batch communications or processing
tasks. These examples hinge upon the assumption that ultimately, embedded
power systems with enough complexity and interplay do probably belong in
userspace. There are probably far better examples.)
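
As a toy illustration of tagging first and assigning policy later, in Python. The pids, tag names, power states and weights are all invented, and nothing here touches the real cgroup interface.

```python
# Hypothetical semantic tags, assigned without reference to any controller.
TAGS = {
    101: {"io-heavy", "batch"},
    102: {"interactive"},
    103: {"batch"},
}

# Hypothetical policies: one tag -> weight table per power state.
POLICIES = {
    "normal": {"interactive": 800, "io-heavy": 200, "batch": 100},
    "power_low": {"io-heavy": 900, "interactive": 300, "batch": 50},
}


def weights_for(state, tags=TAGS, default=100):
    """Map each pid to the highest weight any of its tags earns in this state."""
    policy = POLICIES[state]
    return {
        pid: max((policy.get(tag, default) for tag in labels), default=default)
        for pid, labels in tags.items()
    }
```

Switching from `weights_for("normal")` to `weights_for("power_low")` reprioritises the IO-heavy process without retagging anything, which is the property the multi-hierarchy setup is being used for.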

~~~
jeltz
So it seems most real world users of cgroups are against the change, which in
my world means that the change probably should not be done.

