

Docker at Spotify - pini42
http://continuousdelivery.uglyduckling.nl/uncategorized/docker-at-spotify/

======
quaunaut
I've been watching Docker ever since Flynn was announced, and started using it
seriously this last December. Primarily, I'm trying to use it as a faster
alternative to Chef/Puppet, running it over a Vagrant precise64 box, to
potentially be something I can shove onto a foreign machine without much
worry.

Overall, this has worked fairly well: getting certain containers up and
running, and having them work together, hasn't been very difficult. But
there's one place that's consistently been a thorn in my side: PostgreSQL.

Currently, there is an article in Docker.io's docs[1] that claims to help
you set up a Postgres container, but so far I've not seen it work for me or
anyone else I've spoken to about it. The trouble mostly comes from how
Postgres is traditionally installed and run:

1\. Service is started. This can be either a foreground ('active') Postgres
server or one running in the background. Starting it creates a Postmaster.pid
file, which ensures no other Postgres servers are running on the machine (I'm
not entirely sure why, beyond ease-of-use scenarios).

2\. Service is stopped. This deletes the Postmaster.pid file.

In the case of Docker containers, this presents a bit of a problem. If you try
to start the Postgres container that way, it will run, but the Postmaster.pid
seems to sit outside of it. As a result, once you shut that container down,
you can't start another without an error: the leftover Postmaster.pid stops
you.
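
As a rough illustration of the lock-file problem described above, the sketch
below checks whether the PID recorded on the first line of the pid file
(PostgreSQL itself names it `postmaster.pid` and keeps it in the data
directory) still belongs to a running process. The function name
`pidfile_is_stale` is made up for this example; treat it as one possible
check, not as what Docker or Postgres do internally.

```shell
#!/bin/sh
# Sketch: report whether the PID recorded in a postmaster.pid file is gone.
# Usage: pidfile_is_stale /path/to/data/postmaster.pid
# Returns 0 (stale) when the file exists but its PID is not running,
# and 1 otherwise (no file at all, or the process is still there).
pidfile_is_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1        # no pid file: nothing is stale
    pid=$(head -n 1 "$pidfile")          # first line holds the server PID
    case $pid in
        ''|*[!0-9]*) return 0 ;;         # empty or non-numeric: treat as stale
    esac
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    # It also fails for live processes owned by another user, so run this
    # as the user that owns the server (or as root).
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

A start script could call `pidfile_is_stale` on the pid file under
/var/lib/postgresql/9.3/main before launching the server. Two caveats: PIDs
inside a container are small and get reused, so a "still running" answer can
be a false positive, and a "stale" answer says nothing about a server in
another container that shares the same data directory.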

The alternative is to have a container running Postgres as a background
process alongside something else. This can work, but you then have to use
something that keeps a process running while still letting you reach
/bin/bash to get at Postgres.

Neither of these approaches lets you retain data, either.

I've done a lot of poking around, but I haven't had much success in figuring
out how to get past this. Those who are using Docker as a Puppet/Chef
replacement, especially with Vagrant: what are you usually doing for a
database solution?

[1]
[http://docs.docker.io/en/latest/examples/postgresql_service/](http://docs.docker.io/en/latest/examples/postgresql_service/)

~~~
boothead
Here's my Dockerfile for running Postgres 9.3 with plv8:

    
    
        FROM    boothead/saucy
    
        ENV     DEBIAN_FRONTEND noninteractive
    
        # Add repository and install PostgreSQL 9.3
        RUN     echo "deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main 9.3" >> /etc/apt/sources.list
        RUN     apt-get update
        RUN     apt-get -y --force-yes -f install postgresql-9.3 postgresql-client-9.3 libicu48 wget
    
        RUN     wget http://www.mirrorservice.org/sites/archive.ubuntu.com/ubuntu//pool/universe/libv/libv8/libv8-3.7.12.22_3.7.12.22-3_amd64.deb
        RUN     dpkg -i libv8-3.7.12.22_3.7.12.22-3_amd64.deb
        RUN     apt-get -y --force-yes -f install postgresql-9.3-plv8
    
    
        # Postgres is started now, shut it down and replace config
        RUN     service postgresql stop
        ADD     pg_hba.conf     /etc/postgresql/9.3/main/
        ADD     pg_ident.conf   /etc/postgresql/9.3/main/
        ADD     postgresql.conf /etc/postgresql/9.3/main/
    
        # Create superuser, tempo db and plv8 language
        ADD     create_user.sh /
        RUN     /bin/bash /create_user.sh
        RUN     rm /create_user.sh
    
        # Configure for running in container
        EXPOSE  5432
        # VOLUME ["/var/lib/postgresql/9.3/main"]
        CMD     ["/bin/su", "postgres", "-c", "/usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf"]
    

create_user.sh is basically service postgresql start; su echo <SQL> | psql;
service postgresql stop;

This is fine for spinning up a Postgres instance in some known state for
development, but it's unsatisfying when it comes to managing the data
separately from the container. Anyone got any tips for integrating volumes
with this?

~~~
sehrope
I've been trying out Docker recently but haven't tried running persistent
services in it yet (only long running programs like apps). From what I've
gathered so far, Docker allows you to specify volumes (via _-v
/host/path:/container/path_) that are mounted directly and don't get included
in the container's copy-on-write filesystem (basically shared directories with
the host). You can use this to pass data into the container (e.g. a SQL file
to init your schema) or pass data out of the container (e.g. the DB files
themselves or the logs). I've been using the latter to centralize Dokku
logs[1].

To split out the Postgres setup and the initial schema setup, one idea might
be to have the container's CMD run a script that automatically checks for a
bootstrap file on startup and runs it. The bootstrap file could be specified
via a volume mount. To use the container with a different dev setup (i.e. a
different bootstrap SQL script) you'd simply start up the container with the
shared directory pointing somewhere else.
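
A minimal sketch of that bootstrap idea, assuming the start script has already
brought the server up and that the bootstrap scripts are mounted at some
directory such as /bootstrap (a hypothetical path). The helper name
`run_bootstrap` is also made up; it feeds each `*.sql` file it finds, in name
order, to whatever command it is given.

```shell
#!/bin/sh
# Sketch: run every *.sql file found in a mounted bootstrap directory,
# in name order, through a caller-supplied command.
# Usage: run_bootstrap DIR COMMAND [ARGS...]
run_bootstrap() {
    dir=$1
    shift
    [ -d "$dir" ] || return 0          # nothing mounted: nothing to do
    for f in "$dir"/*.sql; do
        [ -f "$f" ] || continue        # the glob matched no files
        "$@" "$f" || return 1          # stop at the first failing script
    done
    return 0
}
```

Inside the container this might be called as
`run_bootstrap /bootstrap psql -U postgres -f`, with the host side chosen at
start time via _-v /host/dev-schema:/bootstrap_ (again, example paths).
Passing the command in as arguments keeps the helper independent of how psql
is invoked in a given image.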

If you want to save the Postgres data files themselves outside of the
container you can again do it with volume mounts, but you'll need some way to
keep track of which host directory goes with which container. The volume
mounts are specified each time you start up the container, so you need
something to save those. For Dokku specifically there's the PG plugin, and it
looks like it does exactly this[2]. I haven't used it, but I guess it
validates the idea.
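
One way to keep track of which host directory belongs to which container is a
small registry file that a wrapper consults before calling `docker run`. The
sketch below is only an illustration: the file format (one "name host_dir"
pair per line), the file name, and the `volume_flag` helper are all
assumptions, and this is not how the Dokku plugin is implemented.

```shell
#!/bin/sh
# Sketch: look up the host data directory registered for a container name
# and print the matching -v argument. Paths containing spaces are not
# supported by this simple format.
# Usage: volume_flag REGISTRY NAME CONTAINER_DIR
# Returns 1 if NAME is not listed in REGISTRY.
volume_flag() {
    registry=$1 name=$2 container_dir=$3
    host_dir=$(awk -v n="$name" '$1 == n { print $2; exit }' "$registry")
    [ -n "$host_dir" ] || return 1
    printf '%s\n' "-v $host_dir:$container_dir"
}
```

With a registry line such as `devdb /srv/pg/devdb`, a wrapper could run
`docker run -d $(volume_flag volumes.txt devdb /var/lib/postgresql/9.3/main) <image>`
so the same data directory is mounted on every start.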

This seems like a general trend with Docker; it's really cool tech but it's
pretty low level so you need something atop it to make usage smoother.

[1]: [https://github.com/sehrope/dokku-logging-supervisord](https://github.com/sehrope/dokku-logging-supervisord)

[2]: [https://github.com/Kloadut/dokku-pg-plugin/blob/master/comma...](https://github.com/Kloadut/dokku-pg-plugin/blob/master/commands#L28)

------
tinco
I like how no one is paying heed to the "Don't use Docker in production"
warning on the Docker site. Every once in a while they release a version
that's buggy (like 0.7.3), and we pin to the previous version and wait for the
next one.

------
ermintrude
They let devs ssh onto production boxes and do what they want?!?! "Things
diverge pretty quickly" - I'm sure they do. That sounds like a recipe for
disaster...

Docker looks promising but until there's a way to allocate maximum resources
to a container I wouldn't use it in production. VMs are much slower to start
but at least a runaway process on one won't affect other VMs on the same
hardware.

Docker only lets you hint that only a certain number of cores should be used,
so a bad process might monopolise the physical machine. Also, until other
tools (Ansible, Salt, Chef, etc.) help with provisioning, you need to edit the
Dockerfile to change parameters (e.g. for settings that depend on your
environment, like SMTP endpoints for test vs. prod).

If these points are addressed I think it'll be awesome though.

~~~
ldng
Aren't cgroups more than a hint? Or do you mean that by default Docker does
not set up strict enough cgroups?

~~~
ermintrude
It seems cpu.shares allows a proportional amount of CPU to be allocated, but
LXC may still use up to 100% of all idle CPUs. Apparently you can't specify a
hard limit such as "only use 1GHz when I have a 2GHz CPU" (according to here:
[http://comments.gmane.org/gmane.linux.kernel.containers.lxc....](http://comments.gmane.org/gmane.linux.kernel.containers.lxc.general/2475)).
So I guess that could mean that if the containers are using 100% of all CPUs,
there wouldn't be resources left e.g. for monitoring on the host? Anyone got
any experience with this?
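
To make the "proportional" part concrete: cpu.shares values are relative
weights that only matter when containers compete for CPU. The sketch below
(the `cpu_percent` helper is invented for this example) computes the
percentage one container would get if every listed container were busy at the
same time; it says nothing about the idle case, where a single container can
still take everything.

```shell
#!/bin/sh
# Sketch: share of CPU a container gets under full contention.
# Usage: cpu_percent MY_SHARES ALL_SHARES...
# ALL_SHARES lists the cpu.shares value of every busy container,
# including this one. Prints an integer percentage, rounded down.
cpu_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    [ "$total" -gt 0 ] || return 1     # no weights given: nothing to compute
    echo $((mine * 100 / total))
}
```

For example, with one container at the cgroup default of 1024 and two at 512,
`cpu_percent 512 1024 512 512` prints 25 and `cpu_percent 1024 1024 512 512`
prints 50.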

------
afandian
If there are any Spotify people reading, please please _please_ fix the
search/filter regression. You broke it a year-ish ago. It's nice to hear about
innovations but a bit galling to paying customers when you break an essential
feature and don't fix it.

To reproduce: be on an artist page. Now try and find a track with a given name
on that page. I dare you. Without reading every single track with your eyes.
You used to be able to filter that list.

[http://community.spotify.com/t5/Spotify-Ideas/0-8-8-Bring-ba...](http://community.spotify.com/t5/Spotify-Ideas/0-8-8-Bring-back-the-filter-option-on-album-artist-pages/idi-p/261856)

------
casca
"* Please note Docker is currently under heavy developement. It should not be
used in production (yet)." [1]

[1] [http://www.docker.io/learn_more/](http://www.docker.io/learn_more/)

~~~
allbutlost
This is addressed in the talk. It's a short talk, so may be worth listening to
it to see how it was used.

~~~
casca
Agreed, but I'd have thought that it was worth raising for those who are not
familiar with where Docker is in the development cycle.

------
ksec
Why Docker? Why not Packer.io? Why no love for OSv?

