
From 30 to 230 Docker containers per host - UkiahSmith
http://sven.stormbind.net/blog/posts/docker_from_30_to_230/
======
myrandomcomment
So at one point we were doing scale testing for our product, where we needed
to simulate systems running our software connected back to a central point.
The idea was to run as many docker containers as we could on a server with
2x24 cores and 512GB of RAM. The RAM needed for each container was very small.
No matter what, the system would start to break around ~1000 containers (this
was 4 years ago). After many hours of the normal debugging we did not see
anything on the network stack or linux limits side that we had not already
tweaked (so we thought). So out comes strace! Bingo! We found out that the
system could not handle the ARP cache with so many endpoints. Playing with
net.ipv4.neigh.default.gc_interval and the stuff associated with it got us up
to 2500+ containers.
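
The specific knobs are the neighbour table GC thresholds alongside
gc_interval. A rough sketch of the overrides (the values below are
illustrative, not the exact ones we shipped):

    # /etc/sysctl.d/99-neigh.conf -- illustrative values only
    # keep at least this many neighbour entries before GC considers pruning
    net.ipv4.neigh.default.gc_thresh1 = 8192
    # soft limit: above this, GC gets aggressive
    net.ipv4.neigh.default.gc_thresh2 = 16384
    # hard cap on neighbour (ARP) cache entries
    net.ipv4.neigh.default.gc_thresh3 = 32768
    # how often the GC sweep runs, in seconds
    net.ipv4.neigh.default.gc_interval = 60

Then apply with sysctl --system (or reboot).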

~~~
ktpsns
512GB RAM/2500 containers is still 500MB per container. In former days™ this
was enough for a computer to run a complete desktop environment with a web
browser and 20 tabs open (source: I had a PC with physically 500MB RAM). Is
this really the limit for such a decently equipped machine? (I guess a server-
grade 48-core, 512GB RAM machine should be less than 5kEUR nowadays.)

~~~
jjeaff
>512GB / 2500 is still 500mb per container

Am I missing something? 512gb / 2500 is around 200mb per container.

~~~
Mirioron
I think he miscalculated. His point still stands, just not with 20 browser
tabs.

~~~
hobs
The important bit is "running our software" - depending on the complexity of
said software, 200MB is pretty meh.

~~~
codegladiator
wait, i had windows xp with firefox and all at 256 mb ram! which i later
upgraded to 384 mb because i reused another 128 mb ram chip :)

~~~
philsnow
In ~2003, I replaced my desktop that had been Intel-based with a Duron 800MHz
system, only I didn't have enough budget to get it the RAM it required
(new/different slot iirc), so I only had the 128MB it came with (whereas my
old machine had 768MB cobbled together from like six dimms).

I figured that one hop over 100Mbit Ethernet to remote memory was going to be
faster than swapping to spinning rust (remember this was before consumer SSDs,
and onions on our belts), so I made a ramdisk on the old machine and mounted
it over the network with the nbd (network block device) kernel driver, ran
swapon on the nbd and boom, extra "512MB" of RAM.

It worked amazingly well, and (knock on wood) none of my roommates ever
tripped over the Ethernet cables.
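
The moving parts were roughly the following; hostnames, ports and sizes here
are placeholders since I don't remember the exact ones:

    # on the old machine (the one with spare RAM): export a ramdisk over nbd
    mkdir -p /mnt/ramdisk
    mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
    dd if=/dev/zero of=/mnt/ramdisk/swap.img bs=1M count=512
    nbd-server 2000 /mnt/ramdisk/swap.img

    # on the new machine: attach the remote block device and swap onto it
    modprobe nbd
    nbd-client old-box 2000 /dev/nbd0
    mkswap /dev/nbd0
    swapon /dev/nbd0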

~~~
u6eexrtxjjxjexr
So you're the guy who actually managed to "download more RAM". Congrats!

On a more serious note, gonna add that trick to my book - still plenty of rust
spinnin' round

~~~
paulmd
Infiniband QDR adapters are amazingly cheap and RDMA-aware software can use
DMA to poke directly at memory or devices in the other system.

------
ec109685
Is there any talk of increasing these defaults on higher-memory systems? The
low defaults feel like footguns that people stumble into rather than
something needed for optimal performance.

~~~
heipei
I've given up on expecting sane defaults from every piece of software. Some
packages work perfectly fine out of the box, or simply run below peak
performance if not tuned slightly. Other software has so many dangerous
default settings that it's hard to understand the rationale.

Case in point for me is Docker itself. By default it will write logs to the
json-file driver and never truncate or rotate them. Packages distributed for
Ubuntu et al also don't set these limits, so unless you manually make it a
system-wide default or set it for each container you run individually, it
will eventually eat your disk space. This is a very dangerous setting since
you're probably using Docker in a lot of cases where all you do is run
ephemeral, stateless workloads. Having out-of-disk creep up on you on those
kinds of hosts is most unexpected. The same could be said of having the -p
port forwarding option default to binding to 0.0.0.0 instead of 127.0.0.1.
Also a footgun, and an exposed ElasticSearch with PII waiting to happen...

~~~
sandGorgon
Could you point to some sane configuration for docker? We are planning
to run some of these in production.

~~~
heipei
For the logging aspect it's funny because the Docker manual itself contains a
good snippet with reasonable settings:
https://docs.docker.com/config/containers/logging/configure/
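
The gist is enabling log rotation in /etc/docker/daemon.json, along the lines
of the snippet in those docs (sizes are whatever suits your hosts):

    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "3"
      }
    }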

For the port-binding thing, I'd just remember that it binds to 0.0.0.0 when
not explicitly specified otherwise, and then use docker network and not port-
forwards unless absolutely needed. For example, if you have an application and
a couple of backing services (database, redis, ElasticSearch), then only your
application needs a port forwarded from the host; the rest can live within the
docker network.
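
As a sketch (the names and images here are made up):

    # one internal network shared by the app and its backing services
    docker network create appnet

    # backing services join the network but publish no ports on the host
    docker run -d --name db    --network appnet postgres:12
    docker run -d --name cache --network appnet redis:5

    # only the app publishes a port, and only on loopback
    docker run -d --name app --network appnet -p 127.0.0.1:8080:8080 myorg/myapp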

~~~
rmetzler
Setting up a firewall like ufw could prevent accidental port mapping to
0.0.0.0. I really don't like that this is Docker's default.

~~~
kozziollek
Last time I checked (~a year ago) Docker used different iptables chain(s)
than ufw, or added itself before ufw's rules, so ufw was useless in securing
access to ports exposed by containers.
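
If memory serves, the escape hatch Docker itself documents is the DOCKER-USER
chain, which is evaluated before Docker's own rules; something like the
following, where the interface name and trusted range are placeholders:

    # drop traffic to published container ports unless it comes from a
    # trusted range (eth0 and 10.0.0.0/8 are placeholders)
    iptables -I DOCKER-USER -i eth0 ! -s 10.0.0.0/8 -j DROP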

~~~
rmetzler
Ok, thanks for the heads up, I'm going to test this.

------
falsedan
The big bottleneck we had with docker containers per host was not sustained
peak but simultaneous start. This was with 1.6-1.8 but we’d see containers
failing to start if more than 10 or so (sometimes as low as 2!) were started
at the same time.

Hopefully rootless docker completely eliminates the races by removing the
kernel resource contention.

~~~
justincormack
Rootless docker uses user namespaces; it is all still happening in the kernel.

------
mmaunder
"Access was initially fronted by nginx with consul-template generating the
config. When it did not scale anymore nginx was replaced by Traefik."

Wonder why Nginx didn't scale.

~~~
GorsyGentle
If I were to guess, reloads triggered by config changes.

Consul-template writes a config and then runs an action. In the case of nginx,
I would assume the action is to send a SIGHUP. I think haproxy would have also
been an option here; it has better support for SRV records to drive updates
and the like.
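
Assuming a standard consul-template setup, the relevant stanza would look
roughly like this (paths are placeholders):

    template {
      source      = "/etc/consul-template/upstreams.conf.ctmpl"
      destination = "/etc/nginx/conf.d/upstreams.conf"
      # "nginx -s reload" signals the nginx master (SIGHUP) to reload config
      command     = "nginx -s reload"
    }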

~~~
hvindin
Where I am at the moment we're running clusters of 400-800 containers sitting
behind nginx instances, and even though we own nginx+ licenses, we've found
the consul-template + SIGHUP route to be totally fine; even at a churn of
maybe a dozen containers a minute everything still seems to be working fine.
If a particularly busy node dies then we occasionally see a few requests get
errors back, but Nginx's passive healthchecking (i.e. checking response codes
and not sending traffic to an upstream returning a ton of 500s) seems to
handle all of that ok.

The only time our tried and tested consul-template + SIGHUP method is ever
unsuccessful (and we've ended up just having to put processes in place to stop
this) is if we have the same nginx handling inbound connections to the cluster
under high load and we try to respawn _all_ the containers at once. Then
things start to go wrong for 5 minutes or so before going back to normal.

While "the occasional error response" isn't perfect, I suspect that for most
use cases it's good enough, so I'd still be interested in knowing more
specifically what happened to that nginx...
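
To be clear, by passive healthchecking I just mean the stock upstream
behaviour; a minimal sketch, with addresses and thresholds made up:

    upstream app_backends {
      # consul-template renders one line like this per registered container
      server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
      server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
    }

    server {
      listen 80;
      location / {
        proxy_pass http://app_backends;
        # retry another upstream on errors/timeouts/5xx; these also count
        # as failures towards max_fails for the server that produced them
        proxy_next_upstream error timeout http_500 http_502 http_503;
      }
    }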

------
_nickwhite
"With /proc/sys/kernel/pid_max defaulting to 32768 we actually ran out of
PIDs. We increased that limit vastly, probably way beyond what we currently
need, to 500000. Actuall limit on 64bit systems is 222"

Time to start thinking about 128bit systems!

~~~
mkl
Copy-paste error: it's 2^22, which is 4194304.
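
For anyone wanting to make the same bump, it's a one-line sysctl; 500000 is
just the value the article used:

    # /etc/sysctl.d/99-pidmax.conf
    kernel.pid_max = 500000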

------
lewaldman
Well... I'm running 100~150 per EC2 instance with Kubernetes... ¯\_(ツ)_/¯

~~~
petrikapu
what’s your instance type? t3.medium in EKS gives me 11 pods capacity by
default.

~~~
lewaldman
m5.4xlarge currently, but we are migrating to r5.2xlarge (it will better fit
our memory/cpu ratio).

------
m0zg
They could easily double that density with Go, or quadruple with C++ or Rust.
Why people still use JRE I fail to understand.

~~~
rubyn00bie
The density per server savings is probably a drop in the bucket compared to
the cost of the engineers themselves... also by the sounds of it memory usage
isn't the issue here, which is the only thing I think you'd get from C++. I've
seen well-written Java applications do amazing things performance-wise that
even expert C programmers couldn't match (without an absurd amount of effort).

~~~
m0zg

      >> The density per server savings is probably a drop in
      >> the bucket compared to the cost of the engineers themselves

For C++ and Rust yes, unless scale is huge. For Go, emphatically no. Go is
simpler to work with, and on typical programs it uses half the RAM and fewer
threads. Although even for C++ and Rust at medium scale I'd rather do proper
engineering, and pay my (rather than somebody else's) people better. "Hard"
languages tend to select better engineers. Go is a bit of an odd one in this
regard because it selects better engineers by not allowing the kind of
mindless GoF masturbation one often sees in Java programs, at the language
feature level. Can't abuse OOP if there's no OOP.

      >> [memory] is the only thing I think you'd get from C++

This is far, far from the "only" thing you'd get out of the languages I
mentioned. A quarter of the memory, as few threads as you'd like, access to
vector instruction sets, easier performance optimization where the code calls
for it, not having to tune your GC for a smaller heap, not having to new up a
class every time you do anything (I know you don't have to, but that wouldn't
be idiomatic Java), etc., etc.

