Creating an up-to-date Distroless Python Image (2022) (alexos.dev)
45 points by tosh 8 days ago | 28 comments





I love efforts like this. The current state of containers where the dominant mentality is “shove everything possibly needed into a single image and treat it like a black box” isn’t the best imo. Only put in what’s necessary and nothing else; you don’t need a whole OS to run a single binary or script.

> The current state of containers where the dominant mentality is “shove everything possibly needed into a single image and treat it like a black box” isn’t the best imo.

I do something similar for my personal containers.

I have a common Ubuntu base image to which I add a bunch of tools I might need for debugging later. As part of the build process I install apt updates and remove the apt cache, so instead of changing the base image all the time to a more recent one, I still have an up-to-date container, at the expense of some additional layers.

Then, on top of that base image I have other specialized images, ones with Apache, some with JDK, some for .NET, Python, Node, Ruby etc., which I can then build my apps upon, while having a few useful tools in the container, as well as a bunch of shared layers that the nodes will be able to reuse.

The only exceptions are images like PostgreSQL, MariaDB, SeaweedFS etc., which I just take from Docker Hub, usually the Bitnami variety: https://bitnami.com/stacks/containers because I almost never need to connect to those or use them as base images, but rather just use them as connected services.

Should enterprises do that? Probably not. It is, however, an immensely pleasant approach that lets me not stress about optimization and just use what works for me.


> Only put what’s necessary

If this were that easy, chroot with PID isolation alone would be quite enough.


It is that easy, at least on distros built for containers. Most stagex based containers are a single binary.

nix can generate images with exactly what's needed and nothing more.

Even for the programs that do something like dlopen("libcrypt.so.3")? Allow me to doubt that.

Well no, obviously. Nix has not solved Rice's theorem. But you'll have this problem whether you're using BuildKit or Nix. The difference is that with Nix the final image only contains your explicitly stated dependencies, i.e. no apt, no Python, no Perl, no cacerts unless you explicitly include them.

in nix, you have to be explicit about bringing in something, like libcrypt. otherwise it'll just never be found. so if you can get it to work outside of docker, it's pretty simple to put it in docker with all the things that are needed but nothing else: https://nix.dev/tutorials/nixos/building-and-running-docker-...

https://www.bnikolic.co.uk/blog/python/nix/2023/03/23/nixpyd...


Okay, but how do you find out that this program may dynamically load libcrypt which it doesn't mention in its PT_DYNAMIC for some silly reason or another? Because if you already know that, then you just put "RUN apt install -y openssl" in your Dockerfile which is easy. It's finding out that you need to do that that's hard.
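One rough way to hunt for such hidden dependencies is to scan the binary for embedded strings that look like shared-library names. The sketch below is only a heuristic and not something proposed in this thread: the regex and the function names `candidate_sonames` and `candidate_sonames_in_file` are made up for illustration. It also reports libraries that are already declared normally, and it misses names that are assembled at run time.

```python
import re

# Heuristic pattern (an assumption): names such as b"libcrypt.so.3" stored as plain strings.
SONAME_RE = re.compile(rb"lib[A-Za-z0-9_+\-]+\.so(?:\.[0-9]+)*")


def candidate_sonames(blob: bytes) -> list[str]:
    """Return sorted, de-duplicated library-like names found in raw bytes."""
    return sorted({m.group(0).decode("ascii") for m in SONAME_RE.finditer(blob)})


def candidate_sonames_in_file(path: str) -> list[str]:
    """Same scan, applied to a file on disk (for example an ELF binary)."""
    with open(path, "rb") as fh:
        return candidate_sonames(fh.read())


# Stand-in bytes imitating a binary's string table; not taken from a real program.
sample = b"\x00dlopen\x00libcrypt.so.3\x00libssl.so\x00not-a-lib\x00libcrypt.so.3\x00"
found = candidate_sonames(sample)
print(found)
```

Anything the scan reports that the binary does not declare as a regular dependency is a candidate for a dlopen call and worth checking by hand.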

does that mean apt lives in your image? or is it just on the dev side? nix lives on the dev side, and your image is not polluted.

nix doesn't address the issue of people not documenting their dependencies, except that with an undocumented dependency you don't even get "works on my machine" until it's resolved.


I am optimistic about Chiselled Ubuntu, which automates this process: https://canonical.com/blog/chiselled-ubuntu-ga

Here is a full example: https://github.com/cogini/phoenix_container_example/blob/mai...


It's amazing, the amount of effort and layers of abstraction people will wade through in order to get their glue language to run somewhere.

This is a Python problem more than anything else. Other language stacks make this far simpler - Go being a good example.

That's kinda what I'm getting at - I'm amazed that people go through all this friction just to run a particular interpreted language instead of using tooling and ecosystems which make this easy.

The number one factor in choosing a programming language for a project is “what do I already know”

Yeah, people who are willing to achieve something are apparently ready to put quite some effort into achieving that something. Insane, isn't it?

Running something more self-contained, like a Rust or Go binary, is way easier. But even a C++ binary with a bunch of libraries is already trickier.

OP can get a small further reduction in image size: files such as /bin/echo that are copied in and then removed in a later layer still count toward the final image size. Instead, OP can build /etc/passwd and the other user files in a separate builder image and then COPY --from into the distroless image.
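To see why deleting in a later layer doesn't shrink anything, here is a small simulation with in-memory tar archives standing in for image layers. It is a sketch under assumptions: `make_layer` is a hypothetical helper, the payload is dummy bytes rather than a real /bin/echo, and the `.wh.` name follows the whiteout convention of the OCI layer format.

```python
import io
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive (one image layer) in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Layer 1: the helper binary is copied in (dummy payload of about 40 kB).
layer_add = make_layer({"bin/echo": b"\x7fELF" + b"\x00" * 40_000})
# Layer 2: the "removal" only adds a whiteout marker; layer 1 is still shipped.
layer_rm = make_layer({"bin/.wh.echo": b""})

shipped_with_delete = len(layer_add) + len(layer_rm)
print("add + delete layers:", shipped_with_delete, "bytes")
print("add layer alone:    ", len(layer_add), "bytes")
```

The delete layer makes the total larger, not smaller, which is why doing the work in a builder stage and copying only the results avoids shipping the helper at all.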

Stagex provides full-source-bootstrapped and reproducible Python packages as OCI images

https://hub.docker.com/r/stagex/python/tags

They are as bare as it gets: https://codeberg.org/stagex/stagex/src/branch/main/packages/...

Literally just Python. Import hash-locked musl, openssl, etc as needed (also provided by stagex), for a reproducible and tiny final Python project container image.
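For anyone unfamiliar with hash-locking: the idea is that a dependency is referenced by a content digest and rejected if the bytes don't match. A minimal sketch of that check, assuming a `sha256:<hex>` pin format and a made-up helper name `matches_pin` (this is not stagex tooling):

```python
import hashlib


def matches_pin(blob: bytes, pinned: str) -> bool:
    """Check raw bytes against a pin of the form 'sha256:<hex digest>'."""
    algo, _, expected = pinned.partition(":")
    if algo != "sha256" or not expected:
        raise ValueError(f"unsupported pin format: {pinned!r}")
    return hashlib.sha256(blob).hexdigest() == expected


# Illustrative bytes only; a real pin would come from a lock file or image reference.
artifact = b"pretend this is a layer tarball"
pin = "sha256:" + hashlib.sha256(artifact).hexdigest()

print(matches_pin(artifact, pin))
print(matches_pin(artifact + b"tampered", pin))
```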


musl is not an option for a stable and compatible system. Also much slower with Python.

Alpine, the most widely deployed distro in containers, is musl based.

Never seen a stability issue. Speed can be a problem with the stock malloc but you are free to include scudo, jemalloc, or any malloc you want.

You also are free to include glibc at the last mile and link your final binary against that.


Distroless still uses Debian packages but strips and repackages them. This leads to CVEs being fixed _slower_ than with Debian images. You really have to do that work yourself so you can get a trusted and fixed image (like in the article).

The best approach in my view is Bazel with a combination of rules_py, rules_pycross and rules_oci

You can configure Bazel to fetch a standalone Python interpreter. Your app is bundled with the right interpreter for the target platform. rules_pycross can fetch the correct dependencies across platforms. Bazel can put this into a minimalist base image.

I haven’t come across anything else that fixes this for Python. The conventional wisdom is to give up and build in Docker.


So in the case of an incident, what do you do when no tools are available in the container for debugging?

If you're debugging a running container and don't want to even restart it to instrument, you can enter its namespace from the outside and mount more directories with tools.

Details: https://jpetazzo.github.io/2015/01/13/docker-mount-dynamic-v...


just the file system is hardly enough for debugging

You can inject any binaries you want into a process namespace.

Shipping debug tools, a package manager, or even a shell in a production container increases attack surface and image size for no reason.

Do not ship dev tooling to prod. Pull and attach it on demand only. Servers and containers should be immutable appliances.


Processes in a container are just normal processes. But if your container lacks gdb or strace or whatever other tools, you can bring them in.
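A sketch of what "bringing them in" can look like with `nsenter` from util-linux: run a tool installed on the host inside selected namespaces of the container's main process. The helper `nsenter_argv` and the PID 4242 are made up for illustration. By default it joins only the network namespace, so the tool binary is still resolved on the host. With Docker, the host PID can be looked up via `docker inspect --format '{{.State.Pid}}' <container>`.

```python
# Long options accepted by nsenter for the namespaces this sketch supports.
NAMESPACE_FLAGS = {
    "mount": "--mount",
    "net": "--net",
    "pid": "--pid",
    "uts": "--uts",
    "ipc": "--ipc",
}


def nsenter_argv(host_pid: int, tool: list[str], namespaces: tuple[str, ...] = ("net",)) -> list[str]:
    """Build an nsenter command line that runs `tool` in the target's namespaces."""
    if host_pid <= 0:
        raise ValueError("host_pid must be a positive PID as seen from the host")
    unknown = [n for n in namespaces if n not in NAMESPACE_FLAGS]
    if unknown:
        raise ValueError(f"unknown namespaces: {unknown}")
    flags = [NAMESPACE_FLAGS[n] for n in namespaces]
    return ["nsenter", "--target", str(host_pid), *flags, "--", *tool]


# Example: list listening sockets inside the container using the host's `ss`.
argv = nsenter_argv(4242, ["ss", "-tlnp"])
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` as root on the host. Adding "mount" to the namespaces switches to the container's filesystem, in which case the tool has to exist there.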


