
Another reason why Docker containers may be slow - pacavaca
https://hackernoon.com/another-reason-why-your-docker-containers-may-be-slow-d37207dec27f
======
mratzloff
One time we observed a dramatic drop-off in performance in one of our
services starting on a certain day that week. I looked at recent releases and
saw it perfectly coincided with one.

I asked the engineer in question to investigate, but after looking he said,
"It's nothing I could be doing."

So I sat with him and used git-bisect to prove to him it was his commit: he
had added trace logging within a couple of tight loops in the hottest parts of
the code base. I smiled.

"But it's trace. That's disabled in production. It can't be that," he said.
But we had already proven it was that commit, and the only thing that changed
was additional logging.

Long story short, the logging library was filtering calls by level just before
actually writing, rather than as close as possible to the call site—a design
bug, for sure.

I had him swap out the library everywhere it was being used.

Moral: logging is not free.
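
Roughly, the difference looks like this (a minimal sketch, not the actual
library we were using; all names are made up):

```cpp
#include <cstdio>
#include <string>

enum class Level { Trace, Info, Error };
Level g_min_level = Level::Info;  // trace disabled in production

// The design bug: the level is checked only just before writing, so the
// message (and any expensive formatting) has already been paid for.
void log_late_filter(Level lvl, const std::string& msg) {
    if (lvl < g_min_level) return;  // too late: msg was already built
    std::fputs(msg.c_str(), stderr);
}

// Call-site filtering: the level check happens before the arguments are
// evaluated, so a disabled trace call costs a single branch.
#define LOG_AT(lvl, expr)                           \
    do {                                            \
        if ((lvl) >= g_min_level) {                 \
            const std::string msg_ = (expr);        \
            std::fputs(msg_.c_str(), stderr);       \
        }                                           \
    } while (0)

int main() {
    for (int i = 0; i < 1000000; ++i) {
        // Builds the string on every iteration even though trace is off.
        log_late_filter(Level::Trace, "iteration " + std::to_string(i) + "\n");
        // Skips building the string entirely when trace is off.
        LOG_AT(Level::Trace, "iteration " + std::to_string(i) + "\n");
    }
}
```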

~~~
hvidgaard
Logging is never free. Either you add additional compile time cost to remove
it, or you add at the very least a single if and/or method call at runtime.

It's not an easy problem to solve. The best I've come up with so far is using a
decorator in conjunction with a DI framework that supports it. It's still only
a wrapper around method calls, but it makes it possible to turn logging
completely off in production and enable it when needed, without making a
special build or wasting resources during normal execution.
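
Something like this, as a hand-wired sketch (the interface and the names are
made up; a DI container that supports decorators would do this wiring for you):

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Hypothetical service interface the application depends on.
struct PaymentService {
    virtual ~PaymentService() = default;
    virtual void charge(const std::string& account, int cents) = 0;
};

struct RealPaymentService : PaymentService {
    void charge(const std::string&, int) override { /* real work */ }
};

// Logging decorator: wraps any PaymentService and logs around each call.
struct LoggingPaymentService : PaymentService {
    explicit LoggingPaymentService(std::unique_ptr<PaymentService> inner)
        : inner_(std::move(inner)) {}
    void charge(const std::string& account, int cents) override {
        std::fprintf(stderr, "charge(%s, %d)\n", account.c_str(), cents);
        inner_->charge(account, cents);
    }
    std::unique_ptr<PaymentService> inner_;
};

// The container would normally decide this from configuration.
std::unique_ptr<PaymentService> make_service(bool logging_enabled) {
    auto real = std::make_unique<RealPaymentService>();
    if (!logging_enabled) return real;  // production: no logging overhead at all
    return std::make_unique<LoggingPaymentService>(std::move(real));
}
```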

~~~
kbutler
Preprocessor-disabled logging (and generally compile-time disabled logging) is
free at run-time.

The compile-time cost of stripping out preprocessor macros disabled via #ifdef
is close enough to free that you'd have a really hard time measuring it: O(n)
in the number of lines of logging code, with n smaller than the number of lines
of actual code in any real scenario, and with a very small constant factor.

The "cost" there is in the visual appearance of the source code.

------
dingo_bat
Nice debugging story, but the conclusion was totally wrong! The author even
knows this. If they had been logging at 3-4x the usual rate they would have
seen the same problem on bare metal too. Nothing to do with Docker or competing
containers or whatever.

~~~
dbenhur
Not only that, but the glog authors observed the overhead of frequent fadvise
and added rate limiting almost a year and a half ago.
[https://github.com/google/glog/commit/dacd29679633c9b845708e...](https://github.com/google/glog/commit/dacd29679633c9b845708e7015bd2c79367a6ea2)

+1 perf. -1 stale library use. -1 misdirected learning

------
lurker9
I don't understand why more people don't use Solaris Zones, they seem to me to
be the superior solution by far, and with work done by Joyent you now have
modern LX-branded zones also. Is the lack of adoption mainly due to the fact
that it's Solaris, and not Linux?

(Solaris lives on in Illumos et al)

~~~
eikenberry
Solaris is not free software and the free software forks never gained much
traction. Plus it is now also associated with Oracle which brings along a lot
of extra distrust, particularly given how litigious they are.

~~~
travbrack
Illumos/SmartOS is FOSS

~~~
aaronchall
I remember being told Java is FOSS too... now tell that to Google...

~~~
asdbffg
Java _is_ FOSS.

Oracle used the excuse that the current OpenJDK license (GPLv2) is
incompatible with the license used by Google's runtime (Apache 2). If Google
had re-licensed its Java implementation under the GPL, some of the arguments
used by Oracle's lawyers (code reuse and patent (?) violations) would have been
void, and arguing about reuse of APIs would have been a lot harder.

Of course, this does not really matter, because the whole lawsuit is just an
excuse for power games between corporations. Oracle's goal wasn't Java
licensing; it was gaining some degree of control over the emerging Android
ecosystem.

~~~
oneweekwonder
Can Google pull a Swift move?

~~~
monocasa
No, the copyright trial is about the "structure, sequence, and organization"
of the APIs, not about any literal copying anymore. Switching languages while
keeping the same class library would still leave them exposed.

------
dstroot
Better title: “Another reason why my code is slow and I’m logging too much”

~~~
pacavaca
The point was not to complain about how bad Docker is, but rather to highlight
that a lot of unexpected things may come from the fact that the kernel is
shared. "Logging too much" was just the reason posix_fadvise was being called
so often, but this becomes a problem ONLY when the kernel is shared. With
virtualization, everyone gets its own kernel (and thus its own fadvise) and the
conflict doesn't happen.
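
The pattern in question looks roughly like this (a sketch of the idea, not
glog's actual code; the path and counts are made up):

```cpp
// After each chunk of log data is written, the logger hints that the written
// pages can be dropped from the page cache.  Each hint is cheap in isolation,
// but when many containers issue it against the same shared kernel, that's
// where the contention shows up.
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("/tmp/example.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return 1;

    const char line[] = "some log line\n";
    for (int i = 0; i < 100000; ++i) {
        write(fd, line, sizeof(line) - 1);
        // Tell the kernel the already-written pages won't be needed again.
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }
    return close(fd);
}
```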

~~~
cjhanks
If your VM call to `fadvise` is not calling the underlying host kernel
operation, is it even working?

~~~
asdbffg
VM certainly does "call underlying host kernel operation", it just does so
indirectly — the guest userspace calls fadvise(), kernel implementation of
fadvise() asks the virtio disk driver to perform particular read/writes, the
virtio driver asks underlying kernel disk driver to read/write individual disk
sectors (without knowing, that they are related to specific file in guest
filesystem).

This specific bug was caused by putting high load on the kernel dentry cache,
i.e. contention for a data structure that lives in kernel memory. Guests
normally don't share memory, so the contention would have been avoided.

Incidentally, there are situations where different guests can compete for the
same memory: when the host uses so-called "memory deduplication" techniques.
Which is why enabling that stuff on production systems may be a bad idea.

~~~
cjhanks
Thanks for the explanation. I've seen the damage KSMD can do as well.

------
rmrfrmrf
Isn't this kind of thing part of the reason why you log to stdout instead of
handling disk writes in your app?

~~~
crcl
A comment in the Medium article asked the same thing. The author's response:

> I agree that piping all logs to stdout would be the best solution in case of
> Dockerized microservices. It’s just that in our case we were porting an
> existing system, which a) already heavily relied on logging to files b)
> consisted of many microservices itself, which we couldn’t yet split into
> separate Docker containers but also couldn’t pipe all their logs to the same
> stdout.

~~~
rickycook
Well, that’s just an argument to use a pluggable logging system like in
Python/log4j, where you don’t configure logging in your service/library and
instead leave that up to the implementor. It also lets you log to a structured
format like JSON and pipe that to a logging aggregator.
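
For instance, something as simple as this (a made-up sketch, not tied to any
particular logging library; the field names are arbitrary):

```cpp
// The service only emits structured records to stdout; whether they end up in
// a file, an aggregator, or nowhere is left to whoever deploys it.
#include <cstdio>
#include <string>

// Naive JSON escaping, enough for this sketch.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

void log_event(const std::string& level, const std::string& msg) {
    std::printf("{\"level\":\"%s\",\"msg\":\"%s\"}\n",
                json_escape(level).c_str(), json_escape(msg).c_str());
}

int main() {
    log_event("info", "request handled");  // pipe stdout to your aggregator
}
```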

------
vivex
[https://sysdig.com/blog/container-isolation-gone-wrong/](https://sysdig.com/blog/container-isolation-gone-wrong/)

------
jwildeboer
TL;DR logging using glog uncovered a bottleneck which was fixed upstream 9
months ago. Maybe keep your dependencies up2date? :)

~~~
voltagex_
Heh, was working with a router today that has a 6 (?) year old dependency -
not everyone is so lucky.

------
abpavel
"Due to a kernel bug..." made me question the whole article right there

------
jaequery
On OSX, I know for a fact that Docker for OSX is pretty darn slow due to the
way they handle the filesystem.

But using Dinghy greatly sped everything up because it uses NFS. Just in case
anyone wanted to know.

~~~
lilactown
It can't be worse than the way they handle the filesystem on Windows. I can't
even get events (e.g. watching for changes) to work on volumes.

~~~
NightlyDev
Reading a bunch of small files from a volume is insanely slow for me on
Windows, so yeah...

------
jorisvh
This is not the original title. The original title is "Another reason why
_your_ Docker containers may be slow".

~~~
pacavaca
Hmm. I didn't notice that 'your' is not in the HN title. So, is it title
filtering, or did I just forget to type it?

~~~
pacavaca
And yes, 'your' is important in this case. I definitely didn't want it to
sound like another Docker-blaming article.

