

Flat Docker Images - mpasternacki
http://3ofcoins.net/2013/09/22/flat-docker-images/

======
dkulchenko
Very cool. Here's the corresponding GH issue for docker itself to get this
sort of functionality:
[https://github.com/dotcloud/docker/issues/332](https://github.com/dotcloud/docker/issues/332)

~~~
mpasternacki
I have even linked to it in the article. I am going to comment there and
suggest this kind of approach - I wanted to flesh out the idea first and
validate the proof of concept in this script.

------
ef4
I think the intermediate images are a deliberate feature, at least during
Dockerfile development. They serve as a cache of the results at each step, so
that expensive steps don't rerun between invocations of the Dockerfile, even
if you've been editing later steps.

I'm basing this off the documentation I've read, I haven't tried it myself.

~~~
mpasternacki
Yes, this is convenient and speeds up Dockerfile development. At the same
time, it is an issue when you consider using Docker as a part of your
production deployment toolchain. I think both ways should be supported:
incremental multi-layered images for development or exploration, and then the
possibility of creating a single, compact image for lower overhead in
deployment. In a perfect world, it would be a switch for `docker build`, but
I'm not fluent enough in Go to propose a solution on that side.

Some people have considered another approach: flattening an existing stack of
images. Scripts for that are linked from the Docker issue on GitHub. I wasn't
able to get any of these working, and the logic behind these seemed quite
convoluted.

Still, my script is just a proof of concept - I tested whether it is possible
to take the approach I use internally for Docker build scripts and use it to
build Dockerfiles. It seems it is possible, and it delivers good results. Time
and actual usage will show whether it's a good idea; if this approach makes
sense, it will hopefully make its way into the Docker core, and my hack won't
stay relevant for too long.

~~~
pault
Can you not just `docker export [imageid] > myimage.tar` and `docker import -
< myimage.tar`?

EDIT: from the [github
issue](https://github.com/dotcloud/docker/issues/332#issuecomment-23034961):

"Currently the only way to "squash" the image is to create a container from
it, export that container into a raw tarball, and re-import that as an image.
Unfortunately that will cause all image metadata to be lost, including its
history but also ports, env, default command, maintainer info etc. So it's
really not great."
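For reference, the squash cycle described in that quote comes down to three
commands. A rough sketch only - the image and file names are placeholders, a
running Docker daemon is assumed, and, as the quote warns, all image metadata
is lost along the way:

```shell
# Create a throwaway container from the layered image
CID=$(docker run -d layered-image /bin/true)

# Export its filesystem as a raw tarball (no layers, no metadata)
docker export "$CID" > flat.tar

# Re-import the tarball as a single-layer image
docker import - flat-image < flat.tar
```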

~~~
mpasternacki
I want to inherit from the base image, to keep the shared files actually
shared. This is a feature. It's just the dozen layers of that inheritance that
bothers me.

------
gmuslera
Docker 0.6.2 added `-rm` as a builder parameter to delete intermediate
containers.

~~~
mpasternacki
Doesn't it just remove the containers, while still keeping the generated
images layered? I'll take a look (running 0.6.1 here), but it seems to solve a
different issue.

------
themodelplumber
As a graphic designer who clicked: Darn.

------
darklajid
Nice, thanks!

That allows me to work around a 'blocker' [1] for me right now: I can write a
normal Dockerfile and use this ready-made tool to test/build the image until
my 'docker build' issue is resolved one way or another. Cool!

1:
[https://github.com/dotcloud/docker/issues/1916](https://github.com/dotcloud/docker/issues/1916)

------
nickstinemates
That's a pretty perl script. Well done.

~~~
noonespecial
It comes as a bit of a surprise to some, but a huge amount of this kind of
"advanced sysadmin" work is carried out in Perl.

The kind of people who use Perl like this are generally much more advanced
users of Perl than the developers of the clumsy CGIs that might have formed
your first impressions of the language.

~~~
xb95
Yup. Matt's Script Archive was a long time ago. Modern Perl is about a million
times better and easier to use than it used to be.

Dancer, Moose, DBIx::Class (a bit more advanced), Plack/PSGI, etc etc. Not to
mention the language changes they've made with 5.10/5.12/5.14/etc (I'm in love
with the defined-or operator).

~~~
noonespecial
>Yup. Matt's Script Archive was a long time ago...

Argh. FormMail. _FormMail._ It's a near-PTSD-like flashback.

------
_lex
"Docker itself is intentionally limited: when you start a container, you’re
allowed to run only a single command, and that’s all. "

- Not exactly true: you can launch a shell as that one command, in
interactive mode, so you're then able to run as many commands in that shell as
you'd like.

~~~
joshstrange
Did you read the rest of the post? He goes on to say you can do that. In fact
the entire message of the post is that entering the container, running
commands, and then committing at the end is more efficient (in terms of number
of layers and disk space; at what other cost, I don't know yet) than using a
Dockerfile and 'docker build'.
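The flow being described - one interactive shell as the container's single
command, then a commit at the end - could look roughly like this (the image
and names are placeholders; a running Docker daemon is assumed):

```shell
# Start one interactive shell in a container - the "single command"
docker run -i -t ubuntu /bin/bash

# ...inside the container, run as many commands as you like, e.g.:
#   apt-get update && apt-get install -y nginx
#   exit

# Commit the stopped container as a single new layer on top of the base
docker ps -a                       # find the container ID
docker commit <container-id> my/flattened-image
```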

~~~
mpasternacki
The main cost is less clarity: the build steps aren't isolated anymore, so
it's harder to pinpoint issues. There's also an obvious risk of my script not
interpreting all options correctly.

Actually, I've just disabled the VOLUME statement in the script, as it seems
to be a no-op in Docker. The only trace it leaves in the image is setting the
image's _command_ to '/bin/sh -c "#(nop) VOLUME [\"/data\"]"'.

------
contingencies
_TLDR_ : Docker uses _aufs_ to provide copy-on-write snapshots, integral to
docker container-image builds. _aufs_ is not that widely used, and reportedly
has a depth limit of 42. This script flattens an entire build process to a
single snapshot to avoid said issue.

 _Context_ : the Docker people have already announced an intention to unlink
themselves from the _aufs_ dependency.

 _Alternatives/Reality-check_: LVM2 can provide snapshots at the block layer:
either through the normal approach with a single depth limit (though you can
un-snapshot a snapshot through a process known as a _merge_ , and then
snapshot again as required), or through the new/experimental _thin
provisioning_ driver to get arbitrary depth (but 16GB max volume size). In
both cases it's filesystem neutral, and the first approach is very widely
deployed which means no roll-thy-own-kernel requirement. _zfs_ and _btrfs_
also provide snapshots, but are historically respectively poorly
supported/slow (userspace driver or build your own kernel for _zfs_ ) and
unfinished/in development ( _btrfs_ ). Linux also supports the snapshot-
capable filesystems _fossil_ (from plan9), _gpfs_ (from IBM), _nilfs_ (from
NTT). A related set of options are cluster filesystems with built-in
replication, see
[https://en.wikipedia.org/wiki/Clustered_file_system#Distribu...](https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems)
Overall, the architectural perspective on various storage design options can
be hard to grasp without digging, and higher-layer solutions such as NoSQL
distributed datastore applications remain strong options in many cases.
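The merge-then-resnapshot cycle mentioned for the classic LVM2 approach could
be sketched as follows (the volume group and names are made up; requires root
and the LVM2 tools):

```shell
# Take a CoW snapshot of an existing logical volume; --size preallocates
# the backing store for changed blocks
lvcreate --snapshot --name base-snap --size 1G /dev/vg0/base

# ...modify the origin or the snapshot (e.g. one build step)...

# "Un-snapshot": merge the snapshot back into its origin, collapsing the
# pair to a single volume, then snapshot again for the next step
lvconvert --merge /dev/vg0/base-snap
lvcreate --snapshot --name base-snap --size 1G /dev/vg0/base
```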

 _Trend/future?_: Containers in general are moving towards formalizing the
"here's what I need: _x_ -depth snapshots with _y_ -availability and _z_
-redundancy" environment requirements specifications for software. In the
nearish future I predict that we'll see this in terms of all types of
resources (network access at layers 2 and 3, CPU, memory, disk IO, disk space,
etc.) for complex, multi-component software systems as CI/CD processes mature
and container-friendly software packaging becomes normalized (we're already
much of the way there for single hosts - eg. with Linux _cgroups_ ).
Infrastructure will become 'smarter', and the historical disconnect between
network gear and computing hosts will begin to break down. Systems and network
administration will tend to merge, and the skillsets will become rarer as a
result of automation.

~~~
ithkuil
LVM snapshots have some issues:

* you have to preallocate the size of the snapshot backing storage

* if you create N snapshots of the same base block device, then for each block changed in the base, each snapshot gets a copy-on-write block added to its backing storage

* you cannot resize a snapshot (I mean the logical volume size, not the storage area for CoW data)

* you cannot shrink the snapshot backing storage

Snapshot-aware filesystems solve these issues. The slowness of ZFS you
mention is only true for the FUSE-based toy driver. The license
incompatibility between ZFS and the Linux kernel is a source of much
confusion. All it means is that you cannot distribute Linux kernel binaries
linked with ZFS code (where a kernel module can be seen as parts of the Linux
kernel API linked with ZFS code). However, nothing prevents you from compiling
the module on your machine, and there is a nicely packaged solution for doing
this, with support for several distributions:

[http://zfsonlinux.org/](http://zfsonlinux.org/)

There is also a new place for promoting ZFS:
[http://open-zfs.org](http://open-zfs.org)

AuFS seems to me a rather pragmatic approach for those who don't need the
advanced features and performance of an advanced filesystem, yet don't want to
waste IO bandwidth just to provision a lightweight container.

~~~
contingencies
All good points. I guess in response the only two things I would add are: (1)
If snapshots are for backup (most frequent use case? I guess so!) then LVM2
can do it for you without an exotic FS already. Sure, you may have to
preallocate. But it's generic (not filesystem-linked), so if you're an
infrastructure provider it future-proofs your backup implementation. Sometimes
that's worth a lot more due to engineering and testing cycles. (2) You
probably _can_ shrink snapshot backing storage if you remove them, for example
after the snapshot is complete and the data has been subsequently copied
elsewhere to long term storage (cheaper/slower/remoter/more geographically
dispersed disks?). You can make a new one next time you need it. That said,
people who are that short on disk space are few and far between these days...
it's cheap.

------
consonants
Off-topic, but pertains to Docker images:

Do people usually roll their own images from source/based on verified
binaries from the parent distribution's repositories, or are base images
provided by the community?

~~~
mpasternacki
I've seen both; Docker's main registry provides some base images (the most
used is named `ubuntu` and has base systems for Precise and Raring), and I've
seen many images descending from the author's own base - it's quite easy to
prepare a base image using debuild or other distros' equivalents. I can't
speak for non-Debian-ish distributions, but debuild does verify its downloads.

The place of trust here is the registry - usually, for convenience, tags are
used rather than hashes (and I'm still not quite sure whether the long hex IDs
are hashes or just unique random names). The registry returns a hex ID for a
given tag, and is trusted to deliver the correct files for an ID.

I believe that the main index/registry runs over HTTPS and provides basic
security, but it would be a huge issue if it were compromised. It's quite easy
to run your own registry, too. What I'd love to see on top of that is some
kind of GPG-based verification of downloaded images (Debian has this problem
basically solved in APT).
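Something in that direction can already be approximated by hand - shipping a
detached digest (which in turn could be GPG-signed, e.g. with `gpg
--detach-sign`) next to an exported image and re-checking it before import. A
minimal sketch with made-up file names, using a plain checksum as a stand-in
for the signature:

```shell
set -e
workdir=$(mktemp -d)

# Stand-in for an exported image tarball
printf 'pretend-image-bytes' > "$workdir/image.tar"

# Publisher side: record the digest next to the image (a real setup
# would sign image.tar.sha256 with GPG)
sha256sum "$workdir/image.tar" > "$workdir/image.tar.sha256"

# Consumer side: verify the tarball before importing it
sha256sum -c "$workdir/image.tar.sha256"
```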

