Build your own Docker with Linux namespaces, cgroups, and chroot (akashrajpurohit.com)
297 points by ghostfoxgod on June 27, 2023 | 130 comments



The fact that containers are such a simple technology always makes me think this:

Why is it still necessary to have whole, full-blown OS filesystems inside of our containers, if their purpose is running a single binary?

Dependencies/dynamic libraries are decent reason, sure. But wouldn't it make more sense to do things "bottom-up"? i.e. starting from an empty filesystem, and then progressively adding the files that are absolutely necessary for the binary to work, instead of the "top-down" approach, which starts from a complete OS filesystem and then starts removing the things that are not needed?


> Why is it still necessary to have whole, full-blown OS filesystems inside of our containers, if their purpose is running a single binary?

It's not. This is just the path of least resistance for most container builds, since many applications require various supporting files to function properly. Most of my Go containers contain only two files: the Go binary and the CA certificates.
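For illustration, a minimal sketch of such a Dockerfile (the binary name and certificate source are assumptions; the binary is assumed to be statically linked):

    FROM scratch
    # CA certificates so outbound TLS works
    COPY ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
    # the statically linked Go binary
    COPY myapp /myapp
    ENTRYPOINT ["/myapp"]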

> Dependencies/dynamic libraries are decent reason, sure. But wouldn't it make more sense to do things "bottom-up"?

In theory, yes. But with dynamic loading and support for loading plugins at runtime, this is virtually impossible to do. That's why starting from a base image that already includes everything you need is generally the chosen method. BSD jails/chroot operate this way and it's an absolute pain to set up for complicated applications.


It’s not. You can use systemd-nspawn to create a container that uses your own root filesystem by specifying --directory=/ --volatile=yes. This mounts a tmpfs into the container’s root, and then mounts your /usr into the container’s /usr in read-only mode. This allows the container to run all the software installed on your machine, while redirecting writes to the tmpfs.

Alternatively, instead of --directory=/ you could specify some other directory that contains an OS image (such as --directory=/var/lib/machines/debian-bookworm or --directory=/var/lib/machines/fedora-38). Multiple containers can transparently share the same image, since all the writes go to a per-container tmpfs.
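Concretely, the two invocations described above look roughly like this (the image path under /var/lib/machines is illustrative):

    # run a container off the host's own /usr, with writes going to a tmpfs
    sudo systemd-nspawn --directory=/ --volatile=yes
    # or share a separate OS image between containers the same way
    sudo systemd-nspawn --directory=/var/lib/machines/debian-bookworm --volatile=yes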

https://0pointer.net/blog/running-an-container-off-the-host-...


That's an interesting way to use systemd-nspawn!

But, if it mounts everything, wouldn't that also make container escapes very easy?


When it is volatile, it really only mounts the /usr inside the specified directory, rather than the directory itself. In particular, /dev and /etc will be empty.


This isn’t too far from how most containers are built nowadays. Usually you just use Alpine or slim Debian images. The benefit of using those distro images is that they provide a lot of utilities you might want in a container, like a shell.

As an aside, the approach you described is how Nix works, and you can use Nix to produce OCI images.


> The benefit of using those distro images is that they provide a lot of utilities you might want in a container, like a shell.

Why would you want a shell in a container? Surely the idea of a container is to run a program, not act as interactive environment.


"docker exec -it" debugging/troubleshooting. It's generally harder to debug/troubleshoot random issues if there is no shell in the container, and you can get a working shell in a 10mb container image so not like it adds some dramatic inconvenience or size. Sure its not 100 percent necessary, but it's enormously convenient at times and I personally wouldn't build my own container images without a shell I can drop into usually, unless there are technical or security reasons to avoid doing it for the given application.

The shell can also be useful for certain scripting steps during image build, not all containers are just going to copy in a statically compiled binary etc. In fact, I'd say the latter is significantly rarer than the former, at least in my experience.

Being able to do commands like:

$ docker exec -it mycontainer sh

is generally very helpful during development.

> https://docs.docker.com/engine/reference/commandline/exec/


Troubleshooting inside the container or simple scripting.


Depends on the application. If it's a single, stand-alone application meant to run in a bare container, sure. But many applications are more complicated, or have been shimmed into a container, but generally run on a given Linux system. Diagnostics and debugging inside the running container kind of requires some level of shell and interactivity. Alpine and Debian-slim images usually have enough for this purpose. If you can navigate the file system and list files at the very least, you can exfiltrate data as needed relatively easily though.


Like others have said, for debugging, although the proper-but-rarely-done solution is to use PID namespaces to attach another container containing the shell, rather than including it in the raw Docker image.
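A hedged sketch of that pattern with plain Docker (the container and image names are illustrative):

    # attach a throwaway debug container to the target's PID and network namespaces
    docker run --rm -it \
        --pid=container:myapp \
        --network=container:myapp \
        busybox sh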


It's not necessarily interactive, it's also pretty common to have a shell script for initialization/setup.


To debug stuff.


There's plenty of ways to build small container images that don't start with a "full" distribution, but whether they work for specific use cases will depend on what the applications need.

FROM scratch - starts with a totally blank image; this is the smallest option, but your application must work with no supporting files (e.g. statically compiled binaries).

Distroless - has a small number of standard OS support files but no package manager, so works where you don't need to install many OS packages.

Wolfi - Newer than the others, they're building an ecosystem of minimal images for specific purposes.
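For the distroless option above, a minimal sketch (the image tag and binary name are assumptions):

    FROM gcr.io/distroless/static-debian12
    COPY myapp /myapp
    ENTRYPOINT ["/myapp"]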


And let's not forget Alpine, which is a fairly common minimalist distro clocking in at ~5MB.


Because the main point of Docker is to allow you to run programs in their own OS. It became popular as a software distribution method for typical Linux software that is not self-contained - requires a ton of system dependencies and spews itself all over the filesystem.

If your program doesn't do that - e.g. most Rust and Go programs which are a single statically linked binary - then there's very little reason to use Docker in the first place.

People will say "but you still want to containerise things!" or "what about orchestration?" and sure those are some incidental benefits now, but they only really happened because everyone was more or less forced to use Docker to actually run apps.

Docker is a workaround for shitty software.


"Docker is a workaround for shitty software"

This just needs to be repeated for the pythonistas sitting in the back.


Or, as another HN user put it a few years ago:

"Docker is static linking for millennials"

:D


Because building a sane filesystem with only the needed libraries needs knowledge and some effort, whereas containers exist to keep people from using their brains.

If you know what you are doing you have several alternatives:

1. Just link your binary statically.

2. Set the path to the hand-picked special libs as an environment variable only for that binary (see the sketch after this list).

3. Re-think what's wrong with your system that leads to the pain of you wanting other lib versions for that binary.
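For option 2, a minimal sketch (the library directory and binary name are illustrative):

    # point the dynamic loader at a private lib directory for this one binary only
    LD_LIBRARY_PATH=/opt/myapp/lib /opt/myapp/bin/myapp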


1. Statically linking your application and then distributing it to others, such as via Docker Hub, will in many, many cases within the Linux ecosystem require your application to be GPL licensed. This may not be desired.

2. You can absolutely limit the system libraries that are included to only those needed. Some platforms make this easier or harder than others.

3. That you have other software that you want to run in parallel and don't want to spin up full virtualization or manage various chroot structures to organize.

I'd also add that just because a tool makes it easy to do something, and means you can bloat your file system doesn't make it inherently bad. Not everyone is trying to run a large database on a potato.


> The fact that containers are such a simple technology

But they're really not. This is a common misconception because of the user friendliness of Docker, but underneath all that there are many moving parts that the Linux kernel exposes, which are tricky to manage manually, plus additional features of Docker itself (Dockerfile, layered images, distribution, etc.). Sure, you can write a shell script that does a tiny fraction of what container tools do, but then you'd have a half-baked solution that reinvents the wheel because... it's lighter weight?

What TFA is doing is fine if you want to learn how containers work and impress your colleagues, but for real world usage, stick to the established tooling.

Like others mentioned, there are ways to do what you propose, and you can always create your own container tool based on that shell script that automates this :). Though you might be interested in unikernels, which is an extreme version of that approach using VMs.


If you use Nix, that's pretty much how the container will be built. You specify that your app should be there, and Nix, being aware of all dependencies, will place everything needed inside.


Came to say this; in the case of sane things* Nix works perfectly.

* pip is not guaranteed to get you to a working state even if you run the same commands, so don't even think about it


> will place inside everything needed

And more. In my experience, depending on the quality of the packages you happen to be consuming, you may end up with a container twice as large as a comparable (say, Alpine) container. For example, I once tried to bring git into my Nix container and was surprised to see over 300MB in increased size.

After having built Nix containers exclusively for a year now, I wouldn't really recommend it to others unless you're willing to invest a lot of additional time cleaning up community packages.


That's because by default Nix uses glibc (which is huge), while Alpine uses musl. You can create musl binaries, but you need to know Nix a little more and know how to cross compile.

Another great thing is that you can easily modify existing packages to remove some of their dependencies that you don't use.

This is shown here by packaging redis into a container and making it 3x smaller than the Alpine version: https://nixos.org/#asciinema-demo-cover

There's of course a possibility that someone specified a build dependency as a runtime dependency, but if that was done, that's a bug.


Docker came about when lxc was the container interface on Linux. It's hardly user friendly. Doing everything yourself is possible, but again it is really only user friendly for very simple operations. For instance, most users would probably find network namespacing or even uid/gid mapping pretty confusing. These days the images produced are quite different, in that Docker now produces OCI-compatible images, which makes them run anywhere. lxc on the other hand will produce images that run only on Linux. lxd eventually came around, which is a lot more analogous to Docker as an application and outputs OCI-compatible images.


I’m confused. If OCI images run anywhere, why does Docker Desktop for windows and mac run virtual machines with a Linux kernel behind the scenes?


Containers are a feature of the Linux kernel, as is namespacing, so non-Linuxes will always need a VM to run Linux containers.

lxc really only covers starting a process as a container with some basic configuration. Later on Docker developed libcontainer, which gave it interaction with other namespacing technologies like IPC, network, etc. The interfaces with these other namespacing technologies are not the same across Linuxes, which is what I mean by "run anywhere".


Most container images contain binaries that were made to use Linux syscalls for things like I/O and memory management.

Windows had an implementation of the Linux kernel interface for this exact purpose (WSL 1) but because of performance and compatibility challenges, they switched to plain virtual machines.

These days, virtual machines run with almost zero overhead anyway, so if you want to run a Linux binary they're a much simpler option than implementing a whole operating system.


True enough... When I have been under Windows, a lot of things still run faster under WSL-Ubuntu than in Windows. Hyper-V is pretty nice, generally speaking, and the WSL tooling over it is pretty great. Makes having Windows as a host OS mostly tolerable. I used to joke that "WSL" was my favorite linux distro.

I switched at home full-time when I saw ads in my Start menu search results that first time. The Edge nags weren't enough, the pre-installed games, etc. But an ad before what I was looking for on my local system, that was too far. I realize it was only a "test" feature, but even that someone wanted to test such a thing sickens me.

I've still had to sometimes run under Windows at work. I prefer the M1/M2 Macs now, just for silence and battery life, but they don't run Docker nearly as well; at least most of the x86_64 container issues are resolved, and most of what I touch has aarch64 targets.


OCI defines a spec for container images.

Both the Linux and Windows kernels support containers, but the way containers work is by running on the parent kernel (as opposed to traditional/micro VM's or unikernels).

So if you want to run a Linux container on windows you need a VM running the Linux kernel to provide the host kernel for the container.

The same would be true even on Linux if you needed to use a different kernel for the container than what the parent system is running


It's for portability.

Your Alpine binary can run in an Alpine container. But folks run that Alpine container on Ubuntu 18.04 or Debian 11 or macOS or WSL2.

Also, Docker brings a lot of usefulness to this mix: a Dockerfile "recipe" that builds on other recipes, a layered filesystem sort of like version control, a global namespace, etc.


It's fun to play around with going that way. I did it via chroot, using ldd to see what a program like redis needs copied in[1]. But eventually you might want to get a shell into the little environment, and then page through logs from inside it, and so eventually, unless you are going pure from-scratch and statically linked, you might want something like Alpine in there. But it's funny how the solution to dynamic linking ends up being shipping a whole OS.

[1] https://www.youtube.com/watch?v=JOsWB50LmwQ
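The core of that exercise looks roughly like this (redis-server and the paths are illustrative; cp --parents is GNU coreutils):

    # build a tiny rootfs containing just the binary and what ldd says it needs
    mkdir -p /tmp/minirootfs/bin
    cp /usr/bin/redis-server /tmp/minirootfs/bin/
    for lib in $(ldd /usr/bin/redis-server | grep -o '/[^ )]*'); do
        cp --parents "$lib" /tmp/minirootfs/
    done
    sudo chroot /tmp/minirootfs /bin/redis-server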


The extent people will go to reinvent a worse, slower and more insecure version of static bins + BSD jails is absurd...


> Why is it still necessary to have whole, full-blown OS filesystems inside of our containers, if their purpose is running a single binary?

It's not and Docker carries plenty of unnecessary complexity.


It's a challenge, but it seems to be becoming more prevalent for folks focused on supply chain attacks. Chainguard in particular has a bunch of standard containers (e.g. nginx) that reduce the container footprint as much as possible.

https://www.chainguard.dev/chainguard-images


As a sibling noted, people do that. But disk space is cheap, and you aren't really losing much other than disk space for unused portions of the OS that are in a container, but what you gain is a normalized environment that's easy to develop for, test, and add to as needed.


I understand this is easy for a developer, but I think it is not responsible security-wise: that container is a whole toolbox at the disposal of whoever may breach in it.

On top of this, as some sibling commenters were saying, it is a waste of bandwidth, time and disk space.


You think manually compiling all your libraries into a single executable, or custom-making your own minimal distro, is safer than using a minimal distro image from an actual OS distribution with security reviews, notifications about package updates, and easy patching, where you can literally start a shell instance in the container and just check for updates to see if you have security issues? Good luck manually tracking all the security announcements for all the libraries and modules you used. You are doing that, given this is a conversation about security in containers, right?

If you aren't shipping setuid binaries in your container, even if someone gets a full shell in the container they are locked down by the container permissions and cgroup limitations, which is the whole point. The fact that some extra utility or library with an exploit exists but isn't running really doesn't provide much of an attack vector. If an attacker can arbitrarily run something, the fact that some code is on the system is not likely to really change their abilities, unless it's setuid, so get rid of that stuff.


There are many facets to security. I would not like someone to have shell access + bash, curl, awk, jq in a container that has full network access to sensitive systems in my applications both upstream and downstream.

As for the dependencies of important projects: yep, I read their release notes. And since I do not like to do it, I try to keep them to a minimum, unless it is a prototype or a research project.

I get that everyone has their own sensibilities, but please next time try to respect those who have a different attitude. Nobody attacked you or wants to.


> I would not like someone to have shell access + bash, curl, awk, jq

Having an OS as the container doesn't mean shipping with everything. bash and awk are common, and curl might be for some as well since it may be a dependency for something else, but jq won't be, as well as many other things (and often you have to include the few things I noted anyway, as many applications will call external items and strict exec use isn't always the norm).

> I get everyone has his sensibility, but please next time try to respect those who have a different attitude. Nobody attacked you or wants to.

Please consider that perhaps you're being a bit too sensitive, given this is a text medium and you can't hear my inflection.

I understand I may have sounded a bit facetious, but I was being honest there. It's a lot of work keeping up with dependencies, so good luck with that if you are; I wouldn't want to do it myself, and if you are doing that, I'm always interested in hearing about ways in which people manage it[1]. It's directly relevant to my job. If you weren't doing that, then it was meant as a gentle note that hey, you really should be if you're concerned about security, because anyone that's statically compiling external requirements into a single binary and isn't has got much more to worry about than whether there are other utilities shipped in their container.

1: For example, I wrote https://news.ycombinator.com/item?id=36450815 just the other day, which is directly relevant to that idea.


If your container is big, it also takes a long time to fetch and start.


An Alpine Docker container starts at 5 MB.


OP was arguing that today the big size doesn't matter, because disk space is cheap. I was providing an argument that it still does.


Base image layers are deduplicated though. Now if only we could pick a single base image...


Theoretically that would work, in practice it doesn't.

Making distroless containers is the way to go.


Alpine or Debian. Those are the only base images you ever need.


Specifically, Debian-slim. But even then, the versions are varied.


You might be unnecessarily exposing attack surface by including unused binaries in your container.


Laptop SSD space for 100s of 1 GB containers is not cheap.


1 GB seems like someone is putting way more than a base OS in a container, and many of the cheapest laptops come with at least a 256GB SSD now (Dell will sell you one for $330).

A RHEL 9 universal base image, which you can add just the packages you need to, is 217MB.

And if you have hundreds of images on your system, maybe remove some you aren't using. I doubt you're using them all, and if you are using them all, they likely aren't using all the space you think they are because they're probably sharing layers.


I would suggest Alpine (5mb) or Debian Slim (<80mb) for base images. There's no need to add the overhead of RHEL inside containers, it really doesn't give you anything.


It gives consistency for those that use RHEL and derivatives outside containers, allows easy use of enterprise software aimed at that distro, and it allows tying into registries such as quay.io that will give you notes about when and what aspects of it are out of date (package wise).

I'm not saying to necessarily use it, but I wanted to pick something that was on the heavier end of the spectrum to forestall complaints about how for some people's workloads it's not realistic to run a 5MB alpine image given their work environment and needs.


The ability to easily see and clear the amount of SSD space that images are wasting on my system is probably supposed to attract me to the proprietary Docker desktop application.


On a related note, I wonder whatever happened to unikernels and why they haven't become the norm.



A unikernel is not the same as a microkernel.

I've found these after some quick googling:

https://unikraft.org/ https://hermitcore.org/ https://nanos.org/

Seems to be a living concept still, just not in the mainstream.


There are mainstream use cases, particularly for cloud providers who want lightweight runtimes with hypervisor managed isolation.

The only "example" I can think of off hand is Firecracker, but I'm not 100% confident that Firecracker is technically a unikernels.

My guess is that there are a number of unikernel implementations behind closed doors that see heavy use.


Lots of examples without the entire OS, as other comments mention; one example would be Google's distroless[0].

[0]: https://github.com/GoogleContainerTools/distroless


It is a bit nice to have a "full" operating system in your container, especially for debugging.

There doesn't seem to be a _huge_ drawback to running an Ubuntu image vs Alpine, aside from the image size.


Ironically, the Ubuntu image is worse for debugging than Alpine by default, even though it is substantially bigger. `vi` and `less`, for example, are missing from Ubuntu but are part of Alpine.


So? All you have to do is "docker exec -it container /bin/bash" and run "apt update" and "apt install vim", then remake the image with fixes or re-run the container once you've figured it out.


You can always smuggle the debugging tools in even `FROM scratch`, but my point was that "by default" these debugging tools were missing on Ubuntu.


The benefit of Ubuntu, at least for me, is that I know it pretty well. Alpine isn't bad, I'm just not familiar with it.


Try debian-slim; it should mostly be the same, but save you 100 MB+ off your final size most of the time.


> There doesn't seem to be a _huge_ drawback to running an Ubuntu image vs Alpine, aside from the image size.

I switched from using Alpine for my own images to using Ubuntu instead a while ago: https://blog.kronis.dev/articles/using-ubuntu-as-the-base-fo...

So far, it's surprisingly nice. Ubuntu based images take up more space, but thanks to layer reuse and storage being affordable, this isn't a big issue in practice (at my scale).

There are no surprises in regards to performance or package management that I have to deal with. Actually, I can use Ubuntu/Linux Mint locally and have the same install instructions for tools/dependencies as well (if I want to test things outside of containers, running on the system directly).

It's also really nice to be able to build my own base image with whatever tools I want available (e.g. nano and some debugging stuff) and have them be available in all of the language/stack-specific images that I build later.

The EOL is also pretty long and I don't think that there's any shame in having a "good enough" and somewhat boring distro for most dev stuff, only occasionally looking at PPAs for newer stuff.

That said, Alpine and Debian are both fine as well! I still run Bitnami images for most complex software like databases, which I think used Debian as a base: https://bitnami.com/stacks/containers


Would suggest debian-slim as a base image over Ubuntu. Still deb, and most of the tooling is the same, but smaller base size to start. Also make sure to look into cache and man-file cleanup as part of any apt-get steps.
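For the apt cleanup, the usual pattern looks something like this (curl is just a placeholder package):

    FROM debian:bookworm-slim
    RUN apt-get update \
     && apt-get install -y --no-install-recommends curl \
     && rm -rf /var/lib/apt/lists/*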


Even better

Get rid of those bloated OSes and run apps directly on HW/vHW


That's what I do.

I understand that there are some good use cases for containers, but I still haven't encountered a need for them.

I run everything from a /home/app folder, with the app user and restricted permissions.

I avoid projects that have container-only deployment (supabase, for example).

Whenever I peek behind the curtain, I end up horrified at the complexity.

Honestly, I just can't imagine why someone would want to run postgres, for example, in a container. Seems like a nightmare for maintenance and production support.


I think they meant "run apps directly on hardware / virtual hardware". As in building the app with a unikernel, bypassing the need for a (large) operating system.


It isn't, and very few people actually think that. There are plenty of "from scratch" images, e.g. any image that is a single Go binary.


Never made a ton of sense to me. Go crosscompiles easily, ship a binary.

Because the second you want to do anything involving https, you need certificates, and that's where having a minimal but existent base image starts shining, and mostly goes up from there...


Once you have a hammer… One advantage of software containers is having a single interface that is the same whatever is inside the container. Like shipping containers.

Whether it's a single Go binary, or a weird Python container running only on a specific version of Debian compiled once in a blue moon, or some 8 GB Java enterprise bloatware, it's the same.


See, this is actually my problem with containers.

"or a weird python container running only on specific version of Debian compiled during a blue moon"

This.

I guess I've been fortunate that I'm able to reject software like this from my stack. I know that everyone isn't so lucky.

I mostly do nodejs, and have zero need for containers when a simple npm install gets all deps.

Or if I need performance, a single go binary.

I tried doing the container thing just to understand how it all works and what the hype is about.

It seemed needlessly complex and hard to develop/debug.

For more complex situations where you need a bunch of interacting programs and services, I prefer stuff like Ansible and VMs, or just manually setting up a base image.

I guess it's a "get off my lawn" kind of thing. It seems like containers are used a lot by folks who don't want to learn ops, like how to install and configure postgres, redis, etc...

I think that's a mistake, and just pushes the problem onto others who have to support the software in production.


Containers are more than "just" running a binary now though. If your "deploy" script is:

    docker build -t my-app . && docker push my-app
Then all of a sudden it's a reproducible, reusable deployment script that works for any language, any application, and that any other dev on your team can run (as opposed to playing "which flavor of coreutils did they use when they wrote this?"). It's provider agnostic - you can run it on DO, AWS, GCP, whatver. You get free rolling/blue green/canary/whatever you prefer deployments, a "basic" cross compilation out of the box. They're not magic, but they are an excellent abstraction, despite the warts.


Sibling comments about Nix are not acknowledging that the base image size is around 600MB vs 5MB for Alpine.


This is only true if you use a Docker-based workflow using `FROM nixos/nix`. This image exists mainly as a way for people to try out Nix, not to build production images on top of. We ship many things which bloat the image size but make it nicer for interactive usage.

Using dockerTools from nixpkgs is much better and gives you much smaller images closer to Alpine size.
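A hedged sketch of that workflow from the shell (the package and image name are illustrative):

    # build a layered image with dockerTools and load it into Docker
    nix-build -E 'with import <nixpkgs> {};
      dockerTools.buildLayeredImage { name = "redis-minimal"; contents = [ redis ]; }'
    docker load < result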


I might have confused download volume with image size but the tar.gz for dockerTools.buildLayeredImage with just node and mariadb in the contents is still 220MB (just checked)

Edit: with nothing in the contents it's 144M, which is getting reasonable but still nearly 30x alpine base


> But wouldn't it make more sense to do things "bottom-up"? i.e. starting from an empty filesystem, and then progressively adding the files that are absolutely necessary for the binary to work

Unless you're using Go or a C/C++ stack that does static linking and thus needs no dependencies including libc, that's yak shaving to an extreme degree.


Nix does an incremental file system. I use process-compose to build and manage the process hierarchy.


This is what you are supposed to do, but people generally are lazy, and for complex binaries or programs this can honestly be a bit of a fiddly task. You can write scripts which dump out all you need if you want. Back when people only had 'jails' I think this was more common. Rapid development kills all :D


The article reads like it was ChatGPT generated.


Yes, this seems like ChatGPT’s style of writing. Another post has a conclusion that matches ChatGPT’s conclusion style:

> Remember, namespaces are a powerful feature that requires careful configuration and management. With proper knowledge and implementation, you can harness the full potential of Linux namespaces to create robust and secure systems.

A clue from this post itself is that all the links were added to the intro because GPT won’t intersperse links throughout.

Edit: Softened my language since there's no way to know, and whatever, ChatGPT is smart anyway. Better to judge content on the merits anyway, imo.


ChatGPT is amazing if you have not mastered the language you are writing in; that's what it is for. Give it some text, have it rewrite it. That it's generated by it doesn't mean any content was produced by it; often people just use it to rephrase. IMHO that's what it's for. (Hard to tell, though, which is which :D)


Yep, I do use GPT as one of the tools in my workflow. I write these blogs in markdown locally and have a helper script which takes the raw content and, with a prompt, helps me generate a title, summary, intro and conclusion (personal preference to keep these consistent on all blogs) and proofread the whole raw content for any mistakes (replaces Grammarly completely now).

Quite happy with this workflow because it helps me publish articles more frequently where I don't have to worry about stuff other than just dumping my thoughts in raw format.

It's similar to how I use Astro as a tool to generate static pages from these markdown files to easily deploy on the web, or TailwindCSS, etc. etc., you get the point.

https://media.tenor.com/1PMq-CFZno4AAAAC/avengers-endgame-hu...


You might want to specify that in the title or in the intro.


Noted. Although, adding that on top of each blog is very repetitive info and not specifically related to the blog. I will find a better place to add this info but will definitely add this by today, most likely under the `/uses/` page.


I would still put it in some prominent place, similar to how newspapers put "sponsored". I.e. I wouldn't want to unknowingly read a blog written (in large part) by ChatGPT, and I would feel deceived if it wasn't clear from the start.


I got really frustrated reading the article because most of the text felt like padding without actually explaining why these specific commands were necessary. Definitely felt like LLM output.


Yah, the ChatGPT vibes are insane with not only this blog post, but some of his others.


Likely generated and then edited.


I don't think the author even tried running the commands himself.


Yes, that's the problem: ChatGPT isn't always right.


These articles where someone uses Linux kernel level features to replicate Docker isolation are as old as Docker itself, and in my experience they always miss one of the most critical parts of the Docker ecosystem - the easily hackable and extensible container image format.

It's the ease of extension of the container image format that is as much responsible for the popularity of container based architectures as it is clever use of namespaces, cgroups and chroot for "robust isolation, resource management, and security". Without the image format, Docker is way less interesting and you arguably haven't "built your own Docker".


> (...) they always miss one of the most critical parts of the Docker ecosystem - the easily hackable and extensible container image format.

Exactly this.

Docker might have a ton of nifty features, but its killer feature is undoubtedly app packaging and deploying. Features like chroot are as old as time, but they never became nearly as popular as Docker for a good reason: they don't solve the problem most people need to get out of the way before being able to containerize apps.


Replace every occurrence of 'Docker' in this article with 'containerd' and it's a match!


Reminds me of Bocker[0]

[0]: https://github.com/p8952/bocker


I don’t know how I feel about the author admitting to using ChatGPT to write the article.

The topic is right in my area of interest, but I don't think I want to be reading ChatGPT articles from here on out if I can avoid it.


I'll expand once again: I use GPT to help me write the title, summary (for the excerpt), intro and conclusion. I mostly focus on dumping my thoughts from the things I am learning into a markdown file and use a script to sprinkle GPT magic on it, which makes it much better in terms of phrasing things into a short and crisp article format and proofreads the content for any mistakes (syntactic as well as semantic).

I am exploring Linux myself this year and hence am more focused on content around that these days.

That being said, I completely understand your sentiment here, so feel free to skip it, no hard feelings, but I'm gonna continue with this workflow till I find something better to improve upon it. :)


It may be something I'll eventually overcome. It may also be unavoidable or even undetectable very shortly.

But I find its wishy-washy tone tiring, lacking personality or spark. I'm not entirely sure it's the AI that's bothersome to me. Reading long Wikipedia articles is not much fun either. The committee process washes all the texture from the piece.

I think what we eventually seek in writing is character and that's missing.


At least they're up front about it. Personally I don't think I mind some ChatGPT magic if the creator doesn't think they can write better. As long as it was heavily curated and modified, and not just "write an article about manually using linux namespaces..." and copy-pasted into their blog.


I've seen lots of articles like these over the years. And yet, Docker persists. Perhaps working with the Linux internals is not the hard part of building a container ecosystem.


It's not the hard part in the simple case, certainly.

It's an interesting "hands on learning" style of exercise, although I'm not sure why we need a new article like this to make the rounds every so often


This topic is perfectly covered in a DockerCon 2015 presentation by Jérôme Petazzoni, a Docker developer, on how containerization works. After watching the video, I clearly understood that this is not virtualization at all, as many at first imagine.

https://www.youtube.com/watch?v=sK5i-N34im8


Isn't that quite the same as running

  debootstrap focal ./ubuntu-rootfs http://archive.ubuntu.com/ubuntu/
  systemd-nspawn -D ./ubuntu-rootfs

?


While I do like this concept, I've always done this simply by compiling all the dependencies into the same folder root as the application. It never needed to be any more complicated than that. Application developers made it complicated by not wanting to do anything but use a package manager.

You can have nginx, Apache, php, Perl, python, etc… siloed into the same root as your application and have multiple instances for every application simply by compiling with the “path” parameter in the configuration.


What are the additional features that Docker provides over the basic containers in the article?


Docker by default also applies a seccomp system call whitelist per [1] and restricts capabilities per [2], amongst numerous other default hardening practices that are applied. If a Docker container really had a need to call the "reboot" system call, this permission could be explicitly added.

More complex sandboxing techniques include opening handles for sockets, pipes, files, etc and then hardening seccomp filters on top to prevent any new handles being opened. In this way, some containers can read/write defined files on a volume without having any ability to otherwise interact with file systems such as opening new files (all file system related system calls could be disabled).

[1] https://github.com/moby/moby/blob/master/profiles/seccomp/de...

[2] https://docs.docker.com/engine/security/#linux-kernel-capabi...
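For example, explicitly granting the reboot capability and supplying a custom seccomp profile would look something like this (the image name and profile path are illustrative):

    docker run --cap-add=SYS_BOOT \
        --security-opt seccomp=./custom-seccomp.json \
        myimage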


The article says “Docker have features such as layered images, networking, container orchestration, and extensive tooling that make it a powerful and versatile “


Dockerfile and "market place" (hub) were the big ones in my opinion. Even though Dockerfile syntax was a mess in the beginning, being able to specify a base and a few commands was a huge improvement in usability. Then running build and push to make your image widely available. Collaboration was so easy compared to the alternatives, Linux jails, jailer, debootstrap, lxc and such.


In addition to the other features that siblings have mentioned, what Docker offers is simplicity, in that you don't need to understand the details of namespaces, capabilities, cgroups etc to get an application running in a container :).


It's hard to say, as "Docker" as a term or product name has been pretty much diluted; they variously use it for various products with various functionalities, and they don't have anything just called "Docker" any more. There's a lot of virtualization stuff for various operating systems, orchestration stuff, etc.

(Note that the article also doesn't link to any specific product, just to the Docker company front page. For me the first link the front page offers is "Docker Desktop for Linux", which is, I guess, a virtualization-based system.)


Documentation, third-party tooling, third-party training, large number of software engineers already experienced, extensive real-world usage and testing, plus the items others have mentioned.


This is mentioned near the top.

Docker have features such as layered images, networking, container orchestration, and extensive tooling that make it a powerful and versatile solution for deploying applications.


It already exists, though these guides are good to learn: LXC/LXD containers.


Brilliant! I know of pre-Docker teams that still use this kind of implementation in their systems (with added functionality). It's great to see this guide, as some things are deemed more complex than they really are. I guess tooling is what brings an advantage over existing solutions, and made Docker super successful (minus license discussions).


I'm confused.

I tried `unshare --uts --pid --net --mount --ipc --fork` but it failed due to permissions. `sudo unshare --uts --pid --net --mount --ipc --fork` left me in an environment where I was still in my home directory, able to see all the files and create new files which would persist after exiting.

I guess there are many other tutorials which would explain this in depth, but this blog post did not really teach me anything useful about `unshare`.


Best place to learn in depth about any command ⇾ https://man7.org/linux/man-pages/man1/unshare.1.html

Of course, it's not feasible to fit everything into a single blog post; it's up to the reader's curiosity to explore more.

PS: I find man pages very helpful now, and would recommend the same to others as well.


Some time back, I created a GitHub repo with all the steps, including cgroups and namespaces. I had also set up networking using veth and the existing Docker bridge.

http://github.com/nascarsayan/diy-container


This is great, thanks for sharing!


Add `--mount-proc` to `unshare` to have separated `ps` output.
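For example (a rough sketch; ./rootfs is assumed to be a prepared root filesystem, e.g. built with debootstrap, containing sh and mount):

    # isolated process view, but still the host filesystem
    sudo unshare --uts --pid --net --mount --ipc --fork --mount-proc /bin/sh
    # to also hide the host filesystem, chroot into a rootfs and mount proc inside it
    sudo unshare --uts --pid --net --mount --ipc --fork \
        chroot ./rootfs /bin/sh -c 'mount -t proc proc /proc && exec /bin/sh'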


Yes, "isolated and efficient environments" are what I need. How can I do the port mapping, disk mounting with this approach? Could you talk more about it?


Neat, was studying just this yesterday for safely exposing some system resources to a WebAssembly module. Good read, but short.


Thanks, and agreed, this one is a bit short because I skipped writing about namespaces, cgroups and chroot in detail and kept them in separate articles. Sadly, I've seen in the past that when I post relatively long articles they don't do well because of short attention spans (and I don't blame anyone for this; it's just something I noticed). Hence, for such large topics I tend to break them down into multiple articles. Sharing the individual links if you want to go through the individual topics in a bit more detail.

[1] Namespaces: https://akashrajpurohit.com/blog/linux-namespaces-isolating-...

[2] Cgroups: https://akashrajpurohit.com/blog/linux-control-groups-finetu...

[3] Chroot: https://akashrajpurohit.com/blog/how-to-create-a-restricted-...


There isn't any discussion of overlayfs here, which is a pretty important component of Docker.
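The bare-bones version is just an overlay mount: a read-only lower layer (the image), a writable upper layer for the container's changes, and a merged view (paths here are illustrative):

    mkdir -p /tmp/overlay/{lower,upper,work,merged}
    sudo mount -t overlay overlay \
        -o lowerdir=/tmp/overlay/lower,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
        /tmp/overlay/merged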


Also no discussion of how networking is accomplished, which is a big gaping hole that ties people to ecosystems like Docker or Podman.
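The manual equivalent is roughly a veth pair into a network namespace (names and addresses are illustrative):

    sudo ip netns add demo
    sudo ip link add veth-host type veth peer name veth-demo
    sudo ip link set veth-demo netns demo
    sudo ip addr add 10.200.1.1/24 dev veth-host
    sudo ip link set veth-host up
    sudo ip netns exec demo ip addr add 10.200.1.2/24 dev veth-demo
    sudo ip netns exec demo ip link set veth-demo up
    sudo ip netns exec demo ip link set lo up
    # NAT/bridging to reach the outside world is extra work on top of this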


This looks a bit like Sandstorm, which is freaking awesome.


I read this as ‘Build your own Doctor’ and thought the next words were going to be Llama, medical journals and your health records.


Ironically you were being an LLM in that moment by predicting the next words.


Hah we are all LLMs, free will is an illusion. Ask BrainGPT a question and an answer prints itself out word by word in your head.



