
Systemd's DefaultTasksMax makes problems for MySQL/RabbitMQ/Docker/lxc/make/ - JdeBP
https://github.com/systemd/systemd/issues/3211
======
JdeBP
The systemd mechanism here was proposed in November 2015. Lennart Poettering
acknowledged that it "had the potential to break some daemons".

* [https://lists.freedesktop.org/archives/systemd-devel/2015-No...](https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html)

In conversation with Lennart Poettering, systemd developer Zbigniew
Jędrzejewski-Szmek took the position that the mechanism should be turned on and
statistics gathered about who and what it broke. Lennart Poettering turned it on
about 20 hours later. Within a week, this was in the released version of systemd.

* [https://lists.freedesktop.org/archives/systemd-devel/2015-No...](https://lists.freedesktop.org/archives/systemd-devel/2015-November/035011.html)

* [https://github.com/systemd/systemd/commit/0af20ea2ee2af2bcf2...](https://github.com/systemd/systemd/commit/0af20ea2ee2af2bcf2258e7a8e1a13181a6a75d6)

* [https://github.com/systemd/systemd/commit/9ded9cd14cc03c6729...](https://github.com/systemd/systemd/commit/9ded9cd14cc03c67291b10a5c42ce5094ba0912f)

* [https://github.com/systemd/systemd/blob/ccddd104fc95e0e76914...](https://github.com/systemd/systemd/blob/ccddd104fc95e0e769142af6e1fe1edec5be70a6/NEWS#L373-393)

Their position since then has been that the correct thing to do if one hits this is
_not_ to revert Lennart Poettering's change to the DefaultTasksMax and
DefaultTasksAccounting settings, but to tweak the service unit files of any and
all affected daemons.

The systemd bug report paints this as an Ubuntu problem, in part because the
systemd developer who submitted it is also the Ubuntu maintainer for systemd,
but there are actually quite a lot of bug reports _across several
distributions_ that relate to this. Several people have simply placed
TasksMax=infinity in their service unit files; but a few have indeed reverted
the DefaultTasksMax setting, as the Ubuntu maintainer also asked for.

The maximum applies to kernel "tasks", not to processes. So whilst one can hit
it by forking a lot of processes, one can also hit it with only a few
processes that happen to use a lot of threads per process. This is of
particular concern to MySQL, MariaDB, Percona, and their ilk, which use a
thread per client connection. As Brad Mashall discovered, the effect is that
the SQL servers start refusing client connections.

* [https://bugs.launchpad.net/charms/+source/percona-cluster/+b...](https://bugs.launchpad.net/charms/+source/percona-cluster/+bug/1578080)
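The point that threads, not only processes, count as tasks can be checked from inside a single process. This Linux-only Python sketch starts idle threads, standing in for a thread-per-connection server, and reads the `Threads:` field of /proc/self/status; under the pids controller each of those threads would also count towards the unit's TasksMax=. The helper names are hypothetical.

```python
import threading


def thread_count() -> int:
    """Read this process's kernel task (thread) count from /proc (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no Threads: line in /proc/self/status")


def tasks_with_workers(n: int) -> int:
    """Start n idle threads, one per pretend client connection, and count tasks."""
    release = threading.Event()
    workers = [threading.Thread(target=release.wait) for _ in range(n)]
    for worker in workers:
        worker.start()  # start() returns once the new thread is running
    try:
        return thread_count()  # one process, but at least n + 1 kernel tasks
    finally:
        release.set()
        for worker in workers:
            worker.join()
```

A process that looks like "one MySQL server" to `ps` can therefore account for hundreds of tasks once hundreds of clients connect.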

Someone hit this with a multi-threaded Java program and reported it to Arch.

* [https://bugs.archlinux.org/task/47787](https://bugs.archlinux.org/task/47787)

The issue was clouded by two things. First, it was obscured by _another_
systemd issue that imposes a task limit on every user service spawned by a
user's per-user instance of systemd. Second, it was obscured by the
complication that GNOME Terminal does not spawn terminal instances directly
but rather uses Desktop Bus to spawn them remotely via "DBUS activation",
resulting in terminals and the processes in their associated login sessions
having the _non_-login-session task maxima applied.

* [https://github.com/systemd/systemd/issues/1955](https://github.com/systemd/systemd/issues/1955)
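Which maximum applies depends on which control group a process ends up in, and that can be read from /proc/<pid>/cgroup, whose lines have the form `hierarchy-ID:controller-list:path`. A minimal Python sketch of such a diagnostic, with hypothetical helper names; the sample paths in the tests are illustrative, not taken from the bug reports.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map controller names to cgroup paths from /proc/<pid>/cgroup content.

    Each line is 'hierarchy-ID:controller-list:path'. On the unified (v2)
    hierarchy the controller list is empty; that path is stored under ''.
    """
    paths = {}
    for line in text.splitlines():
        if not line:
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        for name in (controllers.split(",") if controllers else [""]):
            paths[name] = path
    return paths


def enclosing_unit(path: str) -> str:
    """Return the innermost .service or .scope component of a cgroup path."""
    for part in reversed(path.split("/")):
        if part.endswith((".service", ".scope")):
            return part
    return ""
```

Running `enclosing_unit(parse_proc_cgroup(open("/proc/self/cgroup").read()).get("pids", ""))` from a shell in a terminal window would show whether that shell sits in a login session's `.scope` or inside some `.service`; on a unified hierarchy, look up the `""` key instead of `"pids"`.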

Others have reproduced this with lots of processes. Dominique Leuenberger
reported to openSUSE that this affects a lot of xyr builds because of parallel
make. Gustavo Romero created a short makefile to use with parallel make
(make -j500) that exhibited the problem.

* [https://bugzilla.opensuse.org/show_bug.cgi?id=965564](https://bugzilla.opensuse.org/show_bug.cgi?id=965564)

* [https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1...](https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1561658/comments/1)
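Gustavo Romero's reproducer was a makefile run with `make -j500`; what follows is not that makefile but a rough Python analogue of the same idea: start many concurrent children and count how many could be created before fork() failed. It assumes a Linux system with a `sleep` binary, and relies on the pids controller making fork() fail with EAGAIN once the limit is reached. The function name is hypothetical.

```python
import errno
import subprocess


def spawn_parallel(n: int, seconds: int = 1) -> int:
    """Start up to n concurrent `sleep` children, roughly what `make -jN`
    does with N trivial recipes, and return how many could be started."""
    children = []
    try:
        for _ in range(n):
            try:
                children.append(subprocess.Popen(["sleep", str(seconds)]))
            except OSError as exc:
                if exc.errno == errno.EAGAIN:  # fork refused: task limit hit
                    break
                raise
        return len(children)
    finally:
        for child in children:  # reap every child that did start
            child.wait()
```

Inside a unit or scope with a low TasksMax=, `spawn_parallel(500)` would be expected to return fewer than 500; outside any such limit it should return the full count. The interpreter's own threads also count towards the limit.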

Marko Mihovilic reported to Arch a similar problem on build machines.

* [https://bugs.archlinux.org/task/47662#comment142757](https://bugs.archlinux.org/task/47662#comment142757)

Jakub Sztandera reported to Arch that LXC encounters this limit. Other people
observed there that it has affected Java services such as Freenet. Lennart
Poettering reiterated that people should not revert his changes but instead
change their individual TasksMax settings.

* [https://bugs.archlinux.org/task/47303](https://bugs.archlinux.org/task/47303)

* [https://bbs.archlinux.org/viewtopic.php?id=207255](https://bbs.archlinux.org/viewtopic.php?id=207255)

* [https://github.com/systemd/systemd/issues/2388](https://github.com/systemd/systemd/issues/2388)

* [https://github.com/systemd/systemd/issues/2388#issuecomment-...](https://github.com/systemd/systemd/issues/2388#issuecomment-173320246)

It affected Docker, and this was reported to Red Hat. The Docker people then
noticed that all of the people who had CentOS 7, Debian 8, and the like were
seeing a message about an unknown systemd service unit directive. Others
(including Nathan Cutler, who reported the systemd problem to Ceph) had seen
this, and after discussion considered it cosmetic, as it did not in fact
prevent services from running on older versions of systemd. The Docker people,
however, promptly reverted their fix. The Docker position is now that "people
on newer kernels" have to maintain their own specially tweaked versions of
systemd service unit files.

* [https://github.com/docker/docker/issues/9868](https://github.com/docker/docker/issues/9868)

* [https://github.com/docker/docker/pull/19391](https://github.com/docker/docker/pull/19391)

* [https://bugzilla.redhat.com/show_bug.cgi?id=1311750](https://bugzilla.redhat.com/show_bug.cgi?id=1311750)

* [http://tracker.ceph.com/issues/15583](http://tracker.ceph.com/issues/15583)

* [https://github.com/docker/docker/issues/20036](https://github.com/docker/docker/issues/20036)
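The unknown-directive message on CentOS 7 and Debian 8 comes from systemd releases that predate TasksMax=. One hypothetical way a packaging script could decide whether to emit the directive is to compare the local systemd version against the release that introduced it. This Python sketch is not what Docker did, and the cut-off of 227 is an assumption that should be checked against systemd's NEWS file.

```python
import re
import subprocess

# ASSUMPTION: TasksMax= first appeared in systemd 227; verify against NEWS.
TASKSMAX_MIN_VERSION = 227


def systemd_version(text: str) -> int:
    """Extract the major version from `systemctl --version` output,
    whose first line looks like 'systemd 219' or 'systemd 252 (252.22-1)'."""
    match = re.match(r"systemd (\d+)", text)
    if match is None:
        raise ValueError("unrecognised systemctl --version output")
    return int(match.group(1))


def supports_tasksmax(version: int, minimum: int = TASKSMAX_MIN_VERSION) -> bool:
    """True if a unit file for this systemd version can carry TasksMax=."""
    return version >= minimum


def local_systemd_version() -> int:
    """Ask the running system's systemctl for its version."""
    result = subprocess.run(["systemctl", "--version"],
                            capture_output=True, text=True, check=True)
    return systemd_version(result.stdout)
```

Since older systemd merely warns about the unknown directive, the trade-off is between a cosmetic log message on old systems and a silently applied default limit on new ones.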

It hit Azk, too.

* [https://github.com/azukiapp/azk/issues/634](https://github.com/azukiapp/azk/issues/634)

Sebastian Schmidt hit this with Chrome and reported it to Debian: the entire
session spawned by xdm was limited to 512 threads in total.

* [https://bugs.chromium.org/p/chromium/issues/detail?id=602002](https://bugs.chromium.org/p/chromium/issues/detail?id=602002)

* [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=823530](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=823530)
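Because the limit covers every task in the session's control group, what matters is the sum of threads across all of its processes, not the thread count of any single browser process. A Python sketch of that sum, assuming Linux's /proc/<pid>/task directories and, for `pids_in_cgroup`, a `cgroup.procs` file whose location depends on how cgroups are mounted; the function names are hypothetical.

```python
import os
from typing import Iterable, List


def tasks_of(pid: int) -> int:
    """Number of threads (kernel tasks) in one process, from /proc/<pid>/task."""
    try:
        return len(os.listdir(f"/proc/{pid}/task"))
    except FileNotFoundError:
        return 0  # the process exited between listing and counting


def total_tasks(pids: Iterable[int]) -> int:
    """Sum tasks over a set of processes, e.g. every process in one session."""
    return sum(tasks_of(pid) for pid in pids)


def pids_in_cgroup(procs_file: str) -> List[int]:
    """Read the PIDs listed in a cgroup's cgroup.procs file."""
    with open(procs_file) as f:
        return [int(line) for line in f if line.strip()]
```

With a busy browser, a session can reach a few hundred tasks from a handful of processes, which is how a 512-task session limit gets hit without anything looking like a fork bomb.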

Breno Leitão hit this on Ubuntu, with the SSH server under one set of control
groups on one system and under a different set on another, the result of a
dependency chain involving PAM and PolicyKit.

* [https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1...](https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1561658)

