
Simple Dockerfile examples are often broken by default - itamarst
https://pythonspeed.com/articles/dockerizing-python-is-hard/
======
btilly
I have a mixed opinion about his first point.

There are two basic approaches to take with dependency management.

The first version is to lock down every dependency as tightly as you can to
avoid accidentally breaking something. Which inevitably leads down the road to
everything being locked to something archaic that can't be upgraded easily,
and is incompatible with everything else. But with no idea what will break, or
how to upgrade. I currently work at a company that went down that path and is
now suffering for it.

The second version is upgrade early, upgrade often. This will occasionally
lead to problems, but they tend to be temporary and easily fixed. And in the
long run, your system will age better. Google is an excellent example of a
company that does this.

The post assumes that the first version should be your model. But having seen
both up close and personal, my sympathies actually lie with the second.

This is not to say that I'm against reproducible builds. I'm not. But if you
want to lock down version numbers for a specific release, have an automated
tool supply the right ones for you. And make it trivial to upgrade early, and
upgrade often.
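In the Python world, one sketch of such a workflow is pip-tools (my choice of tool here, not one the comment names; `flask` is just a placeholder dependency):

```shell
# requirements.in lists only direct dependencies, loosely constrained
echo "flask>=1.0" > requirements.in

# pip-compile resolves the full dependency tree and writes a fully
# pinned requirements.txt, including transitive dependencies
pip-compile requirements.in

# "upgrade early, upgrade often": re-resolve everything to the latest
# allowed versions, then let CI build and test against the new pins
pip-compile --upgrade requirements.in
```

The release is built from the pinned file, but upgrading stays a one-command operation.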

~~~
oconnor663
> The first version is to lock down every dependency as tightly as you can to
> avoid accidentally breaking something...The second version is upgrade early,
> upgrade often...Google is an excellent example of a company that does this.

This is misleading. My understanding of Google's internal build systems is
that they _ruthlessly_ lock down the version of every single dependency, up to
and including the compiler binary itself. They then provide tooling on top of
that to make it easier to upgrade those locked down versions regularly.

The core problem is that when your codebase gets to the kind of scale that
Google's has, if you can't reproduce the entire universe of your dependencies,
there is no way any historical commit of anything will ever build. That makes
it difficult to do basic things like maintain release branches or bisect bugs.

> if you want to lock down version numbers for a specific release, have an
> automated tool supply the right ones for you. And make it trivial to upgrade
> early, and upgrade often.

This part sounds like a more accurate description of what Google and others
do, yes.

~~~
OJFord
For an easy open source example of such tooling, see Pyup.

We use it to do exactly that: pin down every dependency to an exact version,
but automatically build and test with newly released versions of each one.
(And then merge the upgrade, after fixing any issue.)

~~~
jrochkind1
Or the original ruby bundler, which locks down exact versions in a
`Gemfile.lock`, but lets you easily update to latest version(s) with `bundle
update`, which will update the `Gemfile.lock`.

Actually, it goes further, `bundle update` doesn't update just to "latest
version", but to latest version allowed by your direct or transitive version
restrictions.

I believe `yarn` ends up working similarly in JS?

To me, this is definitely the best practice pattern for dependency management.
You definitely need to ruthlessly lock down the exact versions used, in a file
that's checked into the repo -- so all builds will use the exact same
versions, whether deployment builds or CI builds or whatever. But you also
need tooling that lets you easily update the versions, and change the file
recording the exact versions that's in the repo.

I'm not sure how/if you can do that reliably and easily with the sorts of
dependencies discussed in the OP or in Dockerfiles in general... but it seems
clear to me it's the goal.

------
Nullabillity
Good points, but it's amusing that his solution to #1 didn't lock down the
patch version, nor the distro around it. I think that also makes a decent
point for Nix[0], which solves #1-#3 by default (since choosing a particular
version of Nixpkgs locks down the whole environment, and considers the build
as a DAG of dependencies rather than a linear history). It also supports
exporting Docker images, while preserving Nix's richer build caching.[1]

[0]: [https://nixos.org/nix/](https://nixos.org/nix/)

[1]: [https://grahamc.com/blog/nix-and-layered-docker-
images](https://grahamc.com/blog/nix-and-layered-docker-images)

~~~
itamarst
Good point, will go fix that. My soon-to-be-ready attempt at a production-
ready template
([https://pythonspeed.com/products/pythoncontainer/](https://pythonspeed.com/products/pythoncontainer/))
covers the tradeoff between point releases vs. not-point-releases, and it does
pin the OS.

And yes, Nix fixes some of the problems of building a production-ready image,
but only a subset.

~~~
j88439h84
Could you elaborate on the remaining problems with Nix for building Python
images?

~~~
itamarst
Not an expert on Nix, but it's not so much that Nix has problems (though I'm
sure it does; my initial research suggested it's not quite there yet for
Python packages) but that there are other things you need to get right.

For example:

1\. Signal handling (only one bit of [https://hynek.me/articles/docker-
signals/](https://hynek.me/articles/docker-signals/) is Dockerfile specific,
the rest still applies.)

2\. Configuring servers to run correctly in Docker environments (e.g. Gunicorn
is broken by default, and some of these issues go beyond Gunicorn:
[https://pythonspeed.com/articles/gunicorn-in-
docker/](https://pythonspeed.com/articles/gunicorn-in-docker/)).

3\. Not running as root, and dropping capabilities.

4\. Building pinned dependencies for Python that you can feed to Nix.

5\. Having processes (human and automated) in place to ensure security updates
happen.

6\. Knowing how to write shell scripts that aren't completely broken (either
by not writing them at all and using a better language, or by using bash strict
mode: [http://redsymbol.net/articles/unofficial-bash-strict-
mode/](http://redsymbol.net/articles/unofficial-bash-strict-mode/))

etc.
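For point 6, the linked strict-mode preamble is short enough to sketch here:

```shell
#!/usr/bin/env bash
# "unofficial strict mode": exit on any error (-e), treat unset
# variables as errors (-u), and fail a pipeline if any stage fails
set -euo pipefail
# split only on newlines and tabs, not spaces, when word-splitting
IFS=$'\n\t'
```

With this in place, a typo'd variable or a failed command aborts the script instead of silently continuing.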

~~~
tathougies
> though I'm sure it does, my initial research suggested it's not quite there
> yet for Python packages)

Can you expand on what's missing? I've successfully used nix to cross-compile
a pretty substantial python application (+ native extensions, hence the cross
compilation), for embedded purposes, and it pretty much worked out of the box.
Adding extra dependencies was straightforward.

I think you can use pypi2nix for pinned dependencies, and you can run it
periodically for security updates.

~~~
itamarst
Like I said, it was very preliminary research... I reached the bit where
pypi2nix did `nix-env -if
https://github.com/garbas/pypi2nix/tarball/master` and wasn't super happy
about the implications of "just use master" for production readiness.

If it works, though, that's great!

The more general point though is that in my experience no tool is perfect, or
completely done, or without problems. E.g. the cited
[https://grahamc.com/blog/nix-and-layered-docker-
images](https://grahamc.com/blog/nix-and-layered-docker-images) suggests you
need to spend some time manually thinking about how to create layers for
caching? Again, very preliminary research—I know people are using it, I'm just
skeptical it's a magic bullet because nothing tends to be a magic bullet.

~~~
Nullabillity
Regarding layering, it used to be a completely manual process (just like with
Dockerfiles), but the point of the blog post was that you can now use
`buildLayeredImage` and correct layering will Just Happen.

~~~
itamarst
Ah, neat, hadn't realized that was an actual Nix feature now. The post made it
sound like this was just something they were writing for themselves.

------
adrianmonk
I feel like programmers often fail to grasp the distinction between example
code and production-ready code.

On one project, we made some example code available, and people would copy and
paste it into their project, change a few lines, and launch it into
production. Then they were surprised it didn't handle this or that situation
or deal with this or that detail.

Yeah, no shit it doesn't handle those things, _it's example code!_ You're
supposed to read this code along with the documentation so you can get a gist
of what the API is like. It's a learning aid, not a software deliverable. Your
real code is going to be more complicated. This simplified code exists to get
you past the "how the hell does all this fit together at a high level?" hurdle
faster. Once you're over that hurdle, you can _start_ on the real
implementation.

~~~
dkarl
I wonder if people who complain about this have thought about what it would be
like for beginners to only get to see production code. Sometimes the cow needs
to be spherical.

~~~
ptyyy
This reminds me of that image of how to draw an owl. People forget that
everyone starts somewhere and there is no magical jump from beginner level
code to production code. Having good, well-explained examples of the beginner
and intermediate were incredibly beneficial to me.

------
tuco86
I have spent a ridiculous amount of time building this, so I'll take the
opportunity to share. It builds Python wheel packages in a build container and
installs them in an app container. Works great for CPython and PyPy. It also
lets you build for Alpine, and the approach works for most other languages. We
started to build basically everything that way.

[https://gist.github.com/tuco86/67d84dfb27268b1faf05d2dbb1acb...](https://gist.github.com/tuco86/67d84dfb27268b1faf05d2dbb1acb667)

Ok, I kind of cheated and added the user just now. Sue me. Also posted this in
the other Docker related news. Sue me again.
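For readers who haven't seen the pattern, a minimal multi-stage sketch of the same idea (tags and paths are my own placeholders, not taken from the gist):

```dockerfile
# build stage: compile wheels where build tools are available
FROM python:3.7 AS build
COPY requirements.txt .
RUN pip wheel -r requirements.txt --wheel-dir /wheels

# app stage: install the prebuilt wheels into a slim runtime image,
# so compilers and headers never reach production
FROM python:3.7-slim
COPY --from=build /wheels /wheels
RUN pip install --no-index --find-links=/wheels /wheels/*.whl
```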

~~~
Perceptes
Looks like the last line needs to be updated to have the server listen on 8080
instead of 80. (I'm guessing this is left over from before you added the non-
root user.)

------
linuxftw
The problems described here are called 'release engineering.' Dockerfiles
don't solve release engineering, they provide an abstraction for building a
release candidate, putting it through a pipeline, and then tagging a
successful build as your release. In other words, the end-container is the
immutable object that should be deployed, not the Dockerfile.

If you are building the container in each stage of your CI/CD pipeline, you
are doing it wrong.

------
jessemillar
> A broken Docker image can lead to production outages, and building best-
> practices images is a lot harder than it seems. So don’t just copy the first
> example you find on the web: do your research, and spend some time reading
> about best practices.

While I may not agree with absolutely everything in the article, this final
point is paramount. Please don't blindly use technology because you managed to
find a copypasta config that runs. Running != good.

~~~
zrobotics
Definitely very true. I write more C++ than anything else, and the sheer
number of online examples that start with

    
    
      using namespace std;
    

is just staggering. Sure, it works in a toy example posted to stackoverflow,
but it will cause problems in larger projects. I think globally there needs to
be better emphasis on using best-practices in tutorials and examples; I
remember this particular pet-peeve of mine also being present in college
textbooks. Especially for content aimed at newbies, it should be frowned upon
to show the wrong way to do things, since then it gets harder to show how to
do it the right way. I've had people who were surprised to find out that they
could type:

    
    
      using std::chrono::duration;
      using std::cout;
    

instead of pulling in the entire std namespace, simply because they'd only
ever seen examples that did it the lazy way.

edit: lack of semicolons strikes again!

~~~
nemetroid
While I agree with the general point of using best practices in code samples,
the C++ Core Guidelines actually encourage[0] using

    
    
      using namespace std;
    

for std specifically, giving the reasoning that:

> sometimes a namespace is so fundamental and prevalent in a code base, that
> consistent qualification would be verbose and distracting.

I also work mainly in C++, and personally I prefer using it, together with
-Wshadow to catch possible issues.

0:
[https://github.com/isocpp/CppCoreGuidelines/blob/master/CppC...](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#sf6-use-
using-namespace-directives-for-transition-for-foundation-libraries-such-as-
std-or-within-a-local-scope-only)

~~~
jhasse
Problems arise when you upgrade your compiler and a new symbol has been added
to std::

------
j88439h84
For reproducible builds, `python:3.7` isn't specific enough;
`python:3.7.3-alpine3.9` is more specific, for example. There aren't supposed
to be breaking changes in the bugfix releases, but they'll happen anyway.

~~~
kam
And
`python:3.7@sha256:35ff9f44818f8850f1d318aa69c2e7ba61d85e3b93283078c10e56e7d864c183`
is even better.
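If you don't want to copy a digest by hand, it can be resolved from a tag (a sketch that assumes a local Docker daemon):

```shell
# resolve the tag to its immutable content digest; tags can be
# re-pointed at new images, digests cannot
docker pull python:3.7.3-alpine3.9
docker inspect --format '{{index .RepoDigests 0}}' python:3.7.3-alpine3.9
```

The printed `python@sha256:…` reference can then be used directly in a `FROM` line.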

------
myroon5
This Dockerfile linter would warn you about multiple of these problems and
more:
[https://github.com/hadolint/hadolint](https://github.com/hadolint/hadolint)

------
kkapelon
I am actually preparing my own article titled "Docker antipatterns" that will
include many more points like this.

------
diminoten
"Broken" means "does not work". These examples _do_ work. I'm annoyed by this
incongruity. "Not sustainable"/"Not Forward Compatible", etc. would have been
preferable.

------
mychael
This is an advertisement disguised as a technical post on container security.

~~~
nirvdrum
Then I wish all advertising were like this. It's very informative and provides
a solution rather than just pointing out the problem. I hope this page ends up
ranking highly in search results because there are a lot of incomplete
Dockerfiles employing questionable practices that sit at the top of search
results and proliferate due to cargo culting.

------
zingmars
I feel like the better solution to #4 is setting up UID namespacing for Docker
instead of (just) creating random users within the container. Even if you
create a user, the process is still going to run as whatever UID that user has
within the container (1000 is the most likely UID if you're the only one using
said system).
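A sketch of enabling that on the daemon (note this would overwrite any existing `daemon.json`, and remapping has side effects on volumes and previously pulled images):

```shell
# map container UID 0 onto an unprivileged UID range on the host
echo '{ "userns-remap": "default" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```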

------
quickthrower2
Nice eBomb. Describes the problem, offers value, does it politely so it is HN
(and other places) friendly, then "there are still problems..." and then the
paid solution.

I've seen docker images that do a git clone from the master head to get the
source, so basically if their GitHub account gets hacked, you're f'd.

------
geggam
It seems like the folks writing Dockerfiles could stand to learn package
management with a mature package system before writing Dockerfiles.

I wonder how many still use Docker after they learn?

------
ru999gol
Why is running as root in a Docker container a problem? Isn't the whole point
of containers to isolate the container? So what is the difference between a
container running as root or as a user? If there is one, wouldn't that be more
of a Docker bug?

~~~
miduil
The page the article is linking to,
[http://canihaznonprivilegedcontainers.info](http://canihaznonprivilegedcontainers.info),
is mixing up running as root with running Docker with --privileged. The latter
renders Docker's security guarantees void, but is rarely required.

I’m not saying “go run your all your Docker images as root”, but this is
clearly FUD.

Non-privileged containers still have "root", just with far fewer capabilities
(see the Docker docs [0]).

I’m not an expert, but I guess that depending on what you are doing, the most
problematic capability might be AUDIT_WRITE, because it is not namespaced and
could be abused for DoSing syslog. But you might require it for things like
sshd, sudo, adduser, passwd, …

Depending on how you use them, NET_BIND_SERVICE and NET_RAW can be an issue
(it depends on what your Docker network looks like), but the others appear not
to be a security issue per se.

This page [1] gives a good overview of the default capabilities, though it
also confuses the reader with blanket "better disable this" advice.

I've created an issue [2], though I'm not sure I have the resources to fix
their page.

[0]: [https://docs.docker.com/engine/reference/run/#runtime-
privil...](https://docs.docker.com/engine/reference/run/#runtime-privilege-
and-linux-capabilities)

[1]: [https://www.redhat.com/en/blog/secure-your-containers-one-
we...](https://www.redhat.com/en/blog/secure-your-containers-one-weird-trick)

[2]:
[https://github.com/mhausenblas/canihaznonprivilegedcontainer...](https://github.com/mhausenblas/canihaznonprivilegedcontainers.info/issues/8)

~~~
itamarst
Non-privileged containers running as root are a definite security risk.

Real world example: CVE from February 2019 which allowed escalation to root on
host. It's preventable by (among other things) "a low privileged user inside
the container".

See [https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-
fr...](https://blog.dragonsector.pl/2019/02/cve-2019-5736-escape-from-docker-
and.html)

~~~
miduil
(Same discussion as on lobste.rs)

Thank you for this link, I've only seen the initial CVE announcement.

> This is not FUD [...]

The site is practicing FUD: it communicates its message in an untruthful
fashion by mixing two different things into one. (Just check out their Stack
Overflow links; it is not clear whether they are talking about root or
`--privileged`.)

People are confused about whether `docker root == host system root`, and this
site doesn't help them get a better understanding of whether that is the case.
(It isn't.) Plus it misses what its main goal should be: running a secure
Docker environment.

You are talking about a past exploit, not a permanent issue. Keeping your
host system up to date and applying additional hardening is always going to be
necessary in exposed environments.

> Use Docker containers with SELinux enabled (--selinux-enabled). This
> prevents processes inside the container from overwriting the host docker-
> runc binary.

The author's recommendation is also to use SELinux; this has also helped with
other Docker/kernel-related vulnerabilities in the past. Why isn't the page
even mentioning this?

\---

I think it is important to give a proper outlook on how problematic things are
and not to confuse people with super high expectations. You often end up
running containers that you have only a little control over.

1\. Avoiding root in self-built containers is definitely the way to go, since
it reduces (unnecessary) attack surface, but

    
    
       1. It requires some glue code
    
       2. Might slow down your builds (`Dockerfile` multistage `cp --from=0 /app /app` loses permissions, requires chown afterwards)
    

2\. Avoiding root in CI/CD is nearly impossible:

    
    
       1. many package managers won't work
    
       2. some capabilities are needed to test things (sshd for testing ansible scripts, for example)
    
       3. can you use kaniko for building Docker images from within Docker without root?
    

3\. Harden your Docker host

    
    
       1. Use SELinux
    
       2. Use monitoring
    
       3. Drop capabilities that aren't necessary (NET_BIND_SERVICE, NET_RAW, ...)
    
       4. Use docker network separation
    
       5. Frequent system updates
    

4\. Keep yourself up-to-date, especially if you are running an exposed
environment
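As a sketch of that hardening at `docker run` time (the image name and UID are placeholders):

```shell
# start from zero capabilities, add back only what the app needs,
# and refuse any future privilege escalation
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
    --security-opt=no-new-privileges --user 1000:1000 myimage
```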

I just googled, and these are rather more helpful:

\- [https://dev.to/petermbenjamin/docker-security-best-
practices...](https://dev.to/petermbenjamin/docker-security-best-
practices-45ih)

\- [https://blog.aquasec.com/docker-security-best-
practices](https://blog.aquasec.com/docker-security-best-practices)

\- [https://sysdig.com/blog/7-docker-security-
vulnerabilities/](https://sysdig.com/blog/7-docker-security-vulnerabilities/)

\- [https://github.com/docker/docker-bench-
security](https://github.com/docker/docker-bench-security)

------
jand
I do not intend to play down the importance of using docker carefully.

But the reproducible-build aspect of the critique seems unnecessary to me:
isn't that more a concern of the packaging system? (I'm not a Python scripter.)

If your packaging system supports version selection/locking, then use your
packaging system right. If your packaging system cannot pin a version, how
should Docker solve this?

~~~
derriz
Docker can't escape all the blame here - its layer caching mechanism is IMHO
flawed. It's fine to say that a packaging system should offer reproducibility
but Docker's layer caching design assumes that every RUN command produces
reproducible results.

You could of course blame users for not making sure that all the commands they
use in their Dockerfiles are actually reproducible, but many/most examples even
in the official documentation are clearly not reproducible.

Therefore you end up with what is in my opinion a semi-broken system -
building images seems to be reproducible (and fast) until you lose your layer
cache or you spin up a new CI build agent or a new dev joins the team and
tries to build the same image.

Not that I can think of a clean and performant solution to this problem.
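The classic package-install line is a concrete case: the layer cache keys on the literal command text, so Docker treats it as reproducible even though it isn't.

```dockerfile
# cached by the text of the command, not by its result: two machines
# building this at different times can get different package versions
# while both believe the layer is "the same"
RUN apt-get update && apt-get install -y --no-install-recommends curl
```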

~~~
tobbyb
We have been working on a simplified container build system which does away
with layers altogether. [1]

The use of layers at the build stage adds a lot of needless complexity for
very little benefit, and users really need to step back and question the value
they are getting from the use of layers. [2]

Words like 'immutability', 'declarative' and 'reproducibility' are often used
in ways that can lead to user misunderstanding, and the properties they
describe can be accomplished with simpler workflows. For instance,
immutability, reuse, and composition do not require layers. There needs to be
a lot more technical scrutiny to avoid confusion.

[1]
[https://www.flockport.com/docs/containers#builds](https://www.flockport.com/docs/containers#builds)

[2] [https://www.flockport.com/guides/say-yes-to-
containers.html](https://www.flockport.com/guides/say-yes-to-containers.html)

------
vorticalbox
I would move the requirements.txt and pip install steps to after the user
creation, seeing as you'll invalidate that cache whenever your requirements
change.

The best part is that this was brought up as an issue in the article, only for
the article to do the same thing in its example.
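A sketch of the cache-friendly ordering (the base image, user name, and paths are illustrative, not taken from the article):

```dockerfile
FROM python:3.7-slim
# user creation changes rarely, so it goes first
RUN useradd --create-home appuser
# copying only requirements.txt means the expensive install layer is
# invalidated when dependencies change, not on every code edit
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
# application code changes most often, so it is copied last
COPY . /home/appuser/app
USER appuser
CMD ["python", "/home/appuser/app/main.py"]
```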

------
hayd
Running pip with sudo doesn't seem a great idea either...

------
treis
(1) and (2) aren't really broken, IMHO. For most cases always using the most
up to date version is better than having 100% reproducible builds. After all,
you have the docker image that you can distribute if you really need to.
Better to pick up security and performance patches as they become available.
If those updates break something then you can make the decision to fix on a
known good version.

~~~
erik_seaberg
If you always pin, you have history to tell you which versions were good. If
you mostly don't, you have to start disassembling a bunch of old images just
to figure out what they were built from.

------
jo-wol
The final example in the article is broken: the Python interpreter running as
PID 1 can't handle Linux signals.

~~~
itamarst
This is why I have a caveat at the top of the article as well as right after
the last example. This particular issue is fixable with `docker run --init`,
so it's not strictly necessary to fix in images.
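If you'd rather fix it inside the image than rely on `docker run --init`, one sketch is to install a handler yourself (a minimal illustration, not the article's recommendation):

```python
import signal
import sys

def handle_sigterm(signum, frame):
    # PID 1 in a container ignores signals with no handler installed,
    # so a bare `python app.py` would not stop on `docker stop` until
    # the SIGKILL timeout; exiting here allows a prompt, clean shutdown
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
```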

------
mistrial9
Not everyone is on a daily-upgrade churn, and they shouldn't have to be! If
you are externally exposed, sure, because security... but really, isn't there
some room here for different life cycles?

------
batbomb
In general, in Go, Java, and Python I've resorted to copying in the Gopkg
files, pom.xml, or requirements.txt, then running the requisite dependency
installer for the language (dep, pip, mvn, etc.), and then copying in the rest
of the repo, relying on a .dockerignore with a default-ignore for everything
and specifying the individual files/directories you want to add, plus in some
cases a rootfs folder when necessary.

This seems to be the happy medium for me. I don't have very strong opinions on
requirements.txt always being the pinned output from a pip freeze, and it
seems like pipenv may actually die in a few years, and poetry will evolve to
take the mantle, but I do lots of things with conda anyway.

------
neves
Isn't it ironic that he isn't pinning down the Docker version?

------
DoctorPenguin
Isn't the example on how to make the referenced file better another
contribution to the pool of "broken by default" images? Either that or I don't
get the argument.

------
eyeareque
Not locking it to a specific version is better for security updates. Do you
want it to run stable with vulnerabilities or to run secure and broken?

~~~
bdcravens
> Not locking it to a specific version is better for security updates.

The idea is that you should take responsibility for your containers and verify
fixes and test your application.

> Do you want it to run stable with vulnerabilities or to run secure and
> broken?

If these are your two choices, you have a staffing or a workflow problem.

------
bytematic
Probably going to want to use tagged Docker repos so that updating certain
packages, no matter the language, doesn't suddenly break your images.

------
est
One more broken part is

    
    
        CMD [ "python", "./yourscript.py" ]
    
    

This breaks if you want to debug yourscript.py on startup. Better to use a
shell script to wrap it.

~~~
deathanatos
Why/what do you think this breaks / what does wrapping it in a shell do for
you?

E.g., for me, the following Dockerfile:

    
    
      FROM python:3
    
      RUN pip3 install ipdb
    
      COPY test.py /test.py
      CMD ["python", "test.py"]
    

where test.py is:

    
    
      import ipdb; ipdb.set_trace()
      print('Hello, World.')
    

run as `docker run -ti --rm $IMAGE_ID` works as expected:

    
    
      » docker run -ti --rm 52e98c118dc3
      > /test.py(2)<module>()
            1 import ipdb; ipdb.set_trace()
      ----> 2 print('Hello, World.')
    
      ipdb> p globals()
      {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f809a8c8278>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'test.py', '__cached__': None, 'ipdb': <module 'ipdb' from '/usr/local/lib/python3.7/site-packages/ipdb/__init__.py'>}
      ipdb> ^D
    
      Exiting Debugger.
    
      »

~~~
est
Can you add extra environ before python executes any code?

~~~
Faaak
Did you even try ?

`docker exec -e foo=bar -it ....`

------
42n4
Very good tips!

