
A deep dive into the official Docker image for Python - itamarst
https://pythonspeed.com/articles/official-python-docker-image/
======
bransonf
> Debian names all their releases after characters from Toy Story.

I can’t believe I’m just now learning this, but that’s good to know next time
someone asks why the names seem random.

~~~
dazzawazza
It's cute but really frustrating. I wish they'd just use the version number or
force everyone to use

    "<version #>-<stupidname>".

but I'm a FreeBSD user so what do I know?

~~~
kemotep
To be fair, you can just point to the `stable` repositories in the
`/etc/apt/sources.list` configuration file and refer to your install as
"Debian Stable" ('s/stable/testing/g').
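
For example, a sources.list line that tracks the suite name rather than a
codename (mirror URL illustrative):

    deb http://deb.debian.org/debian stable main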

However, every few years there might be a not-so-"stable" update around a new
release.

~~~
dazzawazza
It's not for me, it's for when I get bug reports and I need to decipher what
flavour of Linux, what release of the kernel, what libc, etc. "I'm running
decrepit duck" doesn't really narrow it down much, especially when multiple
distributions use these jovial monikers.

~~~
mikepurvis
I think Ubuntu strikes a good balance— goofy, lovable names, but a clear
alphabetical sequence. Robot Operating System (ROS) inherited this approach
with its naming as well:
[https://wiki.ros.org/Distributions#List_of_Distributions](https://wiki.ros.org/Distributions#List_of_Distributions)

~~~
chrisseaton
Why do we need the names at all though?

Every now and again I have to translate between a code name and a version
string, or between a version string and a code name, and I think 'why on earth
am I having to do this lookup work as a human?'

~~~
mikepurvis
I feel like there are scenarios where it's convenient to have a plain text
string that is certain not to gum up parts of the system that would choke on
punctuation or a leading digit (see for example this package-mapping yaml:
[https://github.com/ros/rosdistro/blob/master/rosdep/base.yam...](https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml)).
Of course, Fedora 19 is codenamed `schrödinger` including the unicode
character, so that's been an interesting experiment in ensuring that all the
systems which ingest and process the codenames can handle that.

Anyway, the obvious thing is that a codename is more fun and memorable than a
number; it's something to hang marketing and such off of. Presumably this is
why macOS versions have had public codenames since 2002. But I think the
practical reasons are valid as well.

------
claytonjy
> The packages—gcc and so on—needed to compile Python are removed once they
> are no longer needed.

Is there a reason to prefer this method, where installation, usage, and
removal all happen in one RUN, vs. using a multi-stage build? I tend to prefer
the latter but am not aware of tradeoffs beyond the readability of the
Dockerfile.

~~~
vbernat
The reason is to minimize the number of layers, mostly the layers that are not
used in the final image (in this example, a layer would include gcc while
another would remove it, but you would still need to download the layer with
gcc).
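
A sketch of the difference (the package name some-c-extension is made up):

    # Two RUNs: the first layer ships gcc forever; the purge in the
    # second layer only masks it, so pulls still download it.
    RUN apt-get update && apt-get install -y gcc
    RUN apt-get purge -y gcc

    # One RUN: gcc never lands in any layer of the final image.
    RUN apt-get update && apt-get install -y gcc \
        && pip install some-c-extension \
        && apt-get purge -y gcc \
        && rm -rf /var/lib/apt/lists/*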

~~~
claytonjy
I thought non-final stages in a multi-stage build are left out of the final
artifact; is that incorrect?

I'm thinking the "build step" would be done in an earlier stage; it could be
the exact same RUN statement, or it could be split into multiple for
readability, and wouldn't bother removing any installed packages, since they
won't carry over to the final stage. Then the big RUN in the final stage would
be replaced with something like `COPY --from=builder ...`.

~~~
verst
EDIT: Provided example below

If you are doing multi-stage builds, it only matters to combine as many
statements as possible in the final stage.

I agree that for clarity it is nice not to optimize layers in the build stage
- those will be thrown away anyway.

I vastly prefer multi-stage builds over having to chain install and cleanup
statements.

Example: I usually want to use the python:3-slim image, but this doesn't have
the tools to compile certain python libraries with C extensions. Generally I
will use the python:3 image for my build stage to do my "pip install -r
requirements.txt" and then copy the libraries over to my final stage based on
the python:3-slim image

Of course I could install and uninstall GCC and other tools in a single
stage... but that actually takes longer to do and is messier in my opinion.
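
A minimal sketch of that pattern (the `--user` install and /root/.local path
are one way to do the copy, and the entry point is illustrative; both stages
need to resolve to the same Python minor version so the copied packages
match):

    FROM python:3 AS builder
    COPY requirements.txt .
    # The full image has gcc etc.; install into the user site so
    # everything lands under /root/.local
    RUN pip install --user -r requirements.txt

    FROM python:3-slim
    # Copy the installed libraries (and console scripts) over
    COPY --from=builder /root/.local /root/.local
    ENV PATH=/root/.local/bin:$PATH
    COPY . /app
    WORKDIR /app
    CMD ["python", "main.py"]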

~~~
jackthetab
> Example: I usually want to use the python:3-slim image, but this doesn't
> have the tools to compile certain python libraries with C extensions.
> Generally I will use the python:3 image for my build stage to do my "pip
> install -r requirements.txt" and then copy the libraries over to my final
> stage based on the python:3-slim image

Example on how to do that, please.

------
mikepurvis
An interesting aspect of this that isn't touched on in the article is about
the much-maligned [1] scheme where Debian's Python is patched to look for
packages in `dist-packages` instead of `site-packages`. The scenario in the
Docker image is I believe the exact one that this is intended to provide
sanity for, since you have two Pythons in play— the Debian-packaged python and
also the built-from-source Python. Things get hairy once you start using the
pip from either of these pythons to start installing pypi packages, especially
if those packages or their dependencies have compiled parts which may not work
for the other Python. Anyone who has used a Mac with brewed Python has also
likely experienced this pain— especially in the days before wheels, where you
basically had to use brewed Python for scientific work unless you wanted to
compile big packages like numpy from source.

So the deal is that the system-supplied Python can get deps from apt packages,
or from pip (into ~/.local, or as root into /usr/local), but either way will
install them into a dist-packages directory. This keeps them separate from the
packages which the built-from-source Python installs into site-packages.
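
A quick way to see the split, assuming both interpreters are present (output
abridged and version-dependent):

    # The /usr/local Python from the official image uses site-packages:
    $ /usr/local/bin/python3 -c "import site; print(site.getsitepackages())"
    ['/usr/local/lib/python3.8/site-packages']

    # A Debian-packaged Python reports dist-packages instead:
    $ /usr/bin/python3 -c "import site; print(site.getsitepackages())"
    ['/usr/lib/python3/dist-packages', ...]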

Upstream Python has put a bunch of pieces in place to address this natively,
for example the "magic" tags that keep compiled assets separate when a single
directory of packages is potentially used by multiple different interpreters
(see
[https://www.python.org/dev/peps/pep-3147/#proposal](https://www.python.org/dev/peps/pep-3147/#proposal)).
And obviously the ideal solution where possible is to simply use a virtualenv
and be totally isolated from the system python. But there are situations where
that isn't possible or desirable, such as in this docker container, and so
here we are.

1: eg
[https://github.com/pypa/setuptools/issues/2232](https://github.com/pypa/setuptools/issues/2232)

------
chromedev
One thing I've learned is that rather than use a docker-entrypoint.sh, most
Linux software can be run just by passing `--user 1000:1000` (or whatever
UID/GID you want to use), as long as you map a volume that works with those
permissions. It is a lot cleaner this way.
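
e.g. (image name and host path illustrative):

    docker run --user 1000:1000 -v "$PWD/data:/data" my-python-app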

~~~
westurner
> _Why Tini?_

> _Using Tini has several benefits:_

> \- _It protects you from software that accidentally creates zombie
> processes, which can (over time!) starve your entire system for PIDs (and
> make it unusable)._

> \- _It ensures that the default signal handlers work for the software you
> run in your Docker image. For example, with Tini, SIGTERM properly
> terminates your process even if you didn't explicitly install a signal
> handler for it._

> \- _It does so completely transparently! Docker images that work without
> Tini will work with Tini without any changes._

[...]

> _NOTE: If you are using Docker 1.13 or greater, Tini is included in Docker
> itself. This includes all versions of Docker CE. To enable Tini, just pass
> the `--init` flag to docker run._

[https://github.com/krallin/tini#why-
tini](https://github.com/krallin/tini#why-tini)
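
e.g. (image name illustrative):

    docker run --init my-python-app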

~~~
chromedev
I didn't know it was included by default. I'll check it out, thanks!

------
fizixer
After a few years' break from professional s/w dev, I'm really trying to get
into the docker-based dev mindset.

You can use Docker (or, less efficiently, a full-blown VM) for any purpose,
but the killer app that has emerged appears to be devops. I guess the same
goes for k8s, as well as Chef/Puppet/Salt/Ansible/whathaveyou.

However, I'm noticing a new way of doing s/w dev emerging, let's call it
modern software development, which uses some of these tools to maximize s/w
dev productivity. I'm just not clear on the best way to approach this.

The core issue is obviously dependency hell, and install-reinstall-reconfigure
hell.

I guess a useful way to think about it is: what if I want to do web dev,
Android dev, and iOS dev, but I don't want these dev environments to
interfere with each other, and all of them should be accessible on a single
workstation or powerful laptop?

I guess I could have one Docker environment for web dev, one for Android dev,
and so on. I came across Docker Compose, and then I heard it's known to be
cumbersome for dev environments, and that someone created binci to address
those problems (though it's not a well-known tool).

~~~
harpratap
The solution for this is to do your development in the cloud directly.
Projects like Skaffold, Tilt, and VS Code in the browser are one step in this
direction. Developer machines would become just thin clients and GUIs for
these remote machines.

~~~
fizixer
So you mean fire up an instance at AWS/Azure/GCP/etc that acts as an android-
dev vm?

So dev environment is on the cloud? and all the data (working dir, data files,
pdfs, whatever is involved during the dev phase) is also on the cloud storage?

That would be expensive, especially if you're an indie dev, and a tremendous
waste of local compute/storage capabilities.

Again, I'm really not sure what near-future looks like in this space.

~~~
harpratap
Indie devs are a small minority; the majority of folks work in companies. You
are already paying 2000-4000 USD for your laptops, plus you also have to
factor in the lost productivity from not being able to replicate your stack
locally. Eventbrite is already doing this for all their employees -
[https://kelda.io/blog/eventbrite-interview/](https://kelda.io/blog/eventbrite-interview/)
So the economics of this do check out.

~~~
jdmichal
Larger companies can do this with virtual machines in their own datacenters,
which cuts down a lot on the cost. Especially for developer machines, which
have relatively high requirements on memory.

------
bluntfang
is it really a deep dive if it doesn't go into the base image at all?

edit: ok reading the article further, there are handwavy explanations. Don't
call it a deep dive if you're gonna say

"There’s a lot in there, but the basic outcome is:..."

and then not explain anything beyond that.

------
willemmerson
Why doesn't it just use the Debian python package?

~~~
itamarst
Debian Stable currently ships Python 3.7. If that's what you want, great.

But if you want Python 3.8, the soon-to-be-released Python 3.9, or other
versions, you don't have that.

~~~
Hamuko
> _Debian Stable currently ships Python 3.7._

That's surprisingly new, even. I remember Debian being very behind when it
came to Python 3.

~~~
kingosticks
But isn't that only because it was released just last year? Won't it stay
stuck on Python 3.7 until bullseye is released in approx 2 years time? It'll
likely be very dated by then.

~~~
sgerenser
Next Debian stable should be released in less than a year. It’s been more than
a year since Buster (current stable) was released. But yes the point stands
that 3.7 will be all you get built-in for the next several months regardless
of how many Python releases occur.

~~~
kingosticks
Thanks for correcting, you are totally right. No idea where I got the idea
their releases were roughly every 3 years; it's clearly every 2 years.

------
dirtnugget
I would love to read this article on a phone but the top banner is not only
highly confusing but also blocking a lot of the content ... please fix this

------
monkpit
> A common mistake for people using this base image is to install Python
> again, by using Debian’s version of Python

Is that really a common mistake?

~~~
digitallogic
Whoa boy, is it ever, but maybe not for the reason you're thinking. I.e., it
isn't caused by people typing `apt-get install python`.

There are _many_ packages that have Python as a dependency these days. For
example, on my Ubuntu system:

    ~$ apt-cache rdepends python | wc -l
    4649

I think the best illustration of how this can happen is installing postgres
libraries needed to build the psycopg2 PG client. If you know to install
`libpq-dev` then you're great. But if you do something that on the surface
feels totally reasonable, like installing the `postgresql-client` package...
guess what? You just installed another Python interpreter.
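
A sketch of the libpq-dev route on a slim base (the gcc purge assumes
nothing else in the image needs it):

    # libpq-dev provides the PG headers without pulling in Debian's
    # Python; gcc is only needed to compile the extension.
    RUN apt-get update && apt-get install -y libpq-dev gcc \
        && pip install psycopg2 \
        && apt-get purge -y gcc && rm -rf /var/lib/apt/lists/*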

edit: formatting

------
westurner
There are Alpine [1] and Debian [2] miniconda images (within which you can
`conda install python==3.8` and 2.7 and 3.4 in different conda envs)

[1] [https://github.com/ContinuumIO/docker-
images/blob/master/min...](https://github.com/ContinuumIO/docker-
images/blob/master/miniconda3/alpine/Dockerfile)

[2] [https://github.com/ContinuumIO/docker-
images/blob/master/min...](https://github.com/ContinuumIO/docker-
images/blob/master/miniconda3/debian/Dockerfile)

If you build manylinux wheels with auditwheel [3], they should install without
needing compilation for {CentOS, Debian, Ubuntu, and Alpine}; though standard
Alpine images have musl instead of glibc by default, this [4] _may_ work:

    echo "manylinux1_compatible = True" > $PYTHON_PATH/_manylinux.py

[3] [https://github.com/pypa/auditwheel](https://github.com/pypa/auditwheel)

[4] [https://github.com/docker-
library/docs/issues/904#issuecomme...](https://github.com/docker-
library/docs/issues/904#issuecomment-544181321)

The miniforge docker images aren't yet [5][6] multi-arch, which means it's not
as easy to take advantage of all of the ARM64 / aarch64 packages that conda-
forge builds now.

[5] [https://github.com/conda-forge/docker-
images/issues/102#issu...](https://github.com/conda-forge/docker-
images/issues/102#issuecomment-619963279)

[6] [https://github.com/conda-
forge/miniforge/issues/20](https://github.com/conda-forge/miniforge/issues/20)

There are i686 and x86-64 docker containers for building manylinux wheels that
work with many distros:
[https://github.com/pypa/manylinux/tree/master/docker](https://github.com/pypa/manylinux/tree/master/docker)

A multi-stage Dockerfile build can produce a wheel in the first stage and
install that wheel (with `COPY --from=0`) in a later stage, leaving build
dependencies out of the production environment for security and performance:
[https://docs.docker.com/develop/develop-images/multistage-
bu...](https://docs.docker.com/develop/develop-images/multistage-build/)
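
A rough sketch of that flow (image tags, paths, and the package name
mypackage are illustrative):

    # Stage 0: build wheels with compilers available
    FROM python:3.8 AS build
    COPY . /src
    RUN pip wheel --wheel-dir /wheels /src

    # Later stage: install the prebuilt wheels; no gcc here
    FROM python:3.8-slim
    COPY --from=0 /wheels /wheels
    RUN pip install --no-index --find-links=/wheels mypackage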

~~~
athorax
Interesting! I use miniconda extensively for local development to manage
virtual environments for different python versions and love it. I hardly ever
actually use the conda packages though.

I assume the main benefit of using these images would be if you are installing
from conda repos instead of pip? Otherwise, just using the official python
images would be as good, if not better.

Edit: I guess if you needed multiple python versions in a single container
this would be a good solution for that as well

~~~
Cacti
Quite a few conda packages have patches added by the conda team to help fix
problems in packages relying on native code or binaries. Particularly on
Windows. If something is available on the primary conda repos it will almost
assuredly work with few if any problems cross-platform, whereas pip is hit or
miss.

If you’re always on Linux you may never appreciate it but some pip packages
are a nightmare to get working properly on Windows.

If you look through the source of the conda repos, you’ll see all kinds of
small patches to fix weird and breaking edge cases, particularly in libs with
significant C back ends.

~~~
westurner
Here's the meta.yaml for the conda-forge/python-feedstock:
[https://github.com/conda-forge/python-
feedstock/blob/master/...](https://github.com/conda-forge/python-
feedstock/blob/master/recipe/meta.yaml)

It includes patches just like distro packages often do.

------
Earmouse
Why would the image remove the *.pyc files?

~~~
acdha
It keeps the image size down and they’ll be recreated on first load for what
you actually use. If you use the compileall module you can generate them for
your app in the final layer.
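
e.g., as a late step in the Dockerfile (app path illustrative):

    RUN python -m compileall /app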

