VSCode, Dev Containers and Docker (feabhas.com)
178 points by ingve 36 days ago | 228 comments

Does this deprecate my vim/gcc/gdb/make setup (in general, a "CLI-driven workflow"), which I have put so much time into in order to have a nice vendor/IDE-independent solution for development?

I'm curious to know the answer in an honest/practically-speaking sense not ideologically.

IMO, the downside of this container/web-app solution is the storage that all the SDKs would need, and this adds up eventually. But I'm not sure this fact alone could win any hearts over that sweet ease of entry to development.

People were complaining about latency and bloat in webapps and Electron GUIs... yet here we are... even SpaceX's console is a Chromium instance.

I don't think so. In fact, you could just package your "CLI driven workflow" up into a Docker image, and now you can instantly move it between machines, peers, etc...

Furthermore you could version and manage the evolution/drift of your workflow as underlying components change/get updated.

Exactly: I have CI produce a Docker image from my dotfiles repo https://bergie.iki.fi/blog/docker-developer-shell/

Hey that’s cool! I do the same thing! I use the Docker image as a way to use my development environment in a pinch and additionally as verification that my giant one-click setup script works on a fresh machine.


If needed, you could probably containerize that workflow as well. Conversely, if someone shared their containerized environment with you, you don't necessarily need to use VSCode. You can still just attach to the container directly using Docker.
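For example, attaching a plain shell to a shared container needs nothing beyond the Docker CLI (the image and container names here are made up):

```shell
# Start the shared dev environment in the background
# (my-team/dev-env and devbox are placeholder names)
docker run -d --name devbox -v "$PWD":/workspace my-team/dev-env:latest sleep infinity

# Attach an interactive shell; vim/gcc/gdb inside the image work as usual
docker exec -it -w /workspace devbox bash
```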

It actually makes your workflow more powerful as you can package it in a container and other devs can start using it in no time.

I have never managed to get a step-through debugger working with PyCharm. I know it's possible, but things like this are why devs won't be able to "start using it in no time". Docker solves one set of problems and introduces others.

Yeah, unfortunately the JetBrains IDEs don't really work with this workflow yet. I'm hoping one day they will ...

I saw online tutorials showing you how to set it up and wasted a few hours not managing to make it work.

Probably not, but your vim/gcc/gdb/make flow doesn't scale to other developers. If it works for you, by all means keep using it.

You can still have a shared base image for your teammates, then use that base image to create a new image just for your .vimrc etc.
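As a sketch of that layering, assuming a hypothetical shared `team/base-dev` image, the personal layer is just a short Dockerfile:

```dockerfile
# Personal image on top of the team's shared base (names are illustrative)
FROM team/base-dev:latest

# Bake in personal dotfiles without touching the shared image
COPY .vimrc /root/.vimrc
COPY .gitconfig /root/.gitconfig
```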

I meant more that by modern standards, vim/gcc/gdb/make is a very labor-intensive toolchain to write applications with.

"Doesn't scale" in the sense of "other developers are pretty unwilling to learn that chain if they didn't grow up with it."

> "other developers are pretty unwilling to learn that chain if they didn't grow up with it."

Funny, I've been running into this problem while trying to switch people to docker. Maybe we're all a bit guilty of this in our own way.

I've been playing around with dev containers for a while now and am loving the experience so far. The one thing I'm not super clear on, however, is doing end-to-end browser-based testing in web apps. I didn't find a lot of good documentation on this. I can't tell if the "correct" solution is to also load something like headless Chrome into the Dockerfile and install it manually, or to go down the path of setting things up in a different container with a docker-compose approach.
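A hedged sketch of the first option, baking a headless browser into the dev container image (the base image and package name vary by distro; `chromium` is the Debian name):

```dockerfile
FROM debian:bullseye-slim

# Install a headless-capable browser for end-to-end tests
# (on Ubuntu the package may be chromium-browser, or a snap, instead)
RUN apt-get update \
 && apt-get install -y --no-install-recommends chromium \
 && rm -rf /var/lib/apt/lists/*
```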

We found website testing inside a Docker container to be tough, so we still run Cypress on a native machine, but our web process still runs within the Docker container. In CI, we use the default Cypress GitHub action, which runs on the CI native system, and all of our servers are spun up with docker-compose. Source: https://github.com/NeonLaw/codebase/blob/development/.github...

A headless chrome works for testing. You can even open GUI apps (at least with X11 on Linux) from the container itself, and it will use the host's X session (chrome --no-sandbox).
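A minimal version of that X11 pass-through, assuming a Linux host and a hypothetical `gui-image` with Chrome installed:

```shell
# Allow local containers to connect to the host X server
xhost +local:docker

# Share the display and the X socket with the container
docker run --rm -it \
  -e DISPLAY="$DISPLAY" \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  gui-image chrome --no-sandbox
```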

I've been using a containerized Rails app with VSCode for a while now and absolutely love it.

You wouldn't happen to have an example Dockerfile you could share at all, would you? The examples I have come across online so far didn't look super straightforward.

Doing any sort of development with Docker on OSX can be painfully slow if you have a lot of files that change frequently. There are a few projects to improve it, but they aren't quite there yet. It's my least favourite aspect of modern web work.

To make it workable I had to just give up on the local host file integration.

I basically used the container like a VM. Configured it with all the tools I normally use (e.g. OhMyZsh, etc) and had it constantly running in the background. I would use VS Code as a front end and work directly inside the container (cloning repos and pushing commits).

It had its quirks but the main benefit was that my local machine was no longer a snowflake. I could easily move to any machine, pull my "development" image, spin up the container and everything was exactly as I liked it.

What would be the advantage of this setup over a traditional VM where you presumably have more portability of the stateful image?

Since they're lighter weight it's easier to run more of them. Think of a place with 7-8 different applications, a few different DBs to support them, redis, elastic search, etc. You can spin up a mirror of your production environment with one command. Theoretically you can do the same with VMs but it will consume a lot more resources.

This is only true of Docker on Linux hosts.

Even on Windows/Mac you're probably better off running 1 VM and then docker inside of it.

Isn't that essentially what mac Docker Desktop does?


Speaking for myself, containers are started and destroyed faster. When using VMs, the tendency is to keep updating the software within the VM, making changes to its state, etc.; this eventually leads to drift. When using containers, if you need to make a change, you destroy the container and re-build it, and the state is always consistent with what you (and possibly your teammates) use.

In addition to this, we've found it easier to spin up the same exact environment for CI/CD.

To be perfectly honest, in this scenario, there isn't much of an advantage.

The container approach is lighter weight, and I found it easier to manage the configuration via Dockerfiles. Managing a full VM with the OS install is a bit of a pain.

That being said, I worked at an organization that did the VM approach using Vagrant. It wasn't as nice as the VS Code/Docker approach, but the results were similar.

Have you tried docker on Windows? It's even worse.

On OSX I only had problems with GUI running in docker. I was used to sharing X between linux host and docker container also running linux.

For some projects, the only working solution I found was to run a VNC server in docker. Specific example: in a docker container, run a GUI built with Kivy and view the window on the OSX host. If anyone manages to do this without VNC, I'd like to know how!

> Have you tried docker on Windows? It's even worse.

I've been running Docker on Windows since Windows 10 1709, or roughly the time WSL 1 came around. That's since October 2017.

It's been really fast and stable here and now with WSL 2 it's even better.

There hasn't been a single Flask, Rails, Phoenix or Webpack related project I've developed in the last 3+ years where I felt like Docker was slowing me down[0]. I'm using a desktop workstation with parts from 2014 too (i5 3.2ghz, 16gb of memory and a 1st gen SSD). About a month ago I made a video showing what the dev experience is like with this hardware while using Docker[1].

Code changes happen nearly instantly, live reloading works and even Webpack happily compiles down 100s of KBs of CSS and JS in ~200ms (which could further be improved by using Webpack cache).

[0]: The only exception to this is invalidating cached Docker layers when you install new dependencies. This experience kind of sucks, but fortunately this doesn't happen often since most changes are related to code changes not dependency changes.

[1]: https://nickjanetakis.com/blog/for-the-time-being-16gb-of-ra...

Also been using Docker on Windows 10 since around the 2016/2017, and also had no issues with speed - it works marvellously!

The problem is "technically" still there in Windows it's just that Microsoft decided to push things along by creating WSL which essentially puts Linux (with Linux Containers) on Windows.

The Windows solution is the equivalent of Smart Hulk figuring out time travel.

I'm not saying it's not good, because it is.

I have been using Docker powered desktop environment for development on WSL2. Didn't have any issue except that I couldn't access the containers via their internal IPs inside WSL2.

> Have you tried docker on Windows? It's even worse.

In my experience, Docker on Windows with WSL 2[1] has been pretty snappy.

[1] https://docs.docker.com/docker-for-windows/wsl/

I wonder what the stats are for the VSCode remote plugin's users, but I would wager a cup of water that the majority of users who go this route are dealing with Windows in some fashion. I'd never heard of such witchcraft until I moved to an organization that, for whatever reason, has deemed that Windows laptops are the mandatory choice for web development.

WSL does have significant problems, and even when it works you deal with the idiosyncrasies of the magic it does to interact with the Windows environment: try installing Docker solely in your WSL2 environment and you'll find it still looks for an .exe executable, or have fun figuring out why something trivial doesn't work like it used to because it depended on systemd, which is purposely absent from operating systems you might install on WSL2. In truth, using WSL2 is a trick, and it can add headaches for anyone who doesn't want to deal with the added abstraction of running another operating system that interacts with your native operating system in ways neither was originally built for.

That's not to say that WSL2 isn't impressive. It has the "aha" that Parallels, CodeWeavers, or Wine (to name a few) all bring when they work. And when they don't, it's also time to get lost in the weeds. But right now, especially in its infancy, WSL2 has problems. Why wouldn't it? It's new to Windows' offerings and relatively obscurely tested, given that its audience is a small fraction of what Microsoft is up to.

This solution of VSCode + Docker containers seems to sidestep the whole WSL issue, as WSL is no longer necessary for development if you're containerizing everything anyway. While, I must admit, I like the idea of a project having the same steps for all users regardless of platform (to each his own), I don't believe the majority of people would like to ditch their IDE of choice for the one tool that does this somewhat seamlessly. I'm probably not characterizing this well, as I'm new to the workflow, and I like to customize my own terminal workflow - which, from what I've seen, this doesn't lend itself well to. Lemme just say it - I like the Linux CLI workflow more than Windows and have since the beginning. But that's just me. I'm sure there are plenty of peeps who feel the opposite. What I don't like, and I feel I share this annoyance with others, is now having to know both the Windows and Linux command-line interfaces.

Yes you’ve stepped into a rant, gotcha!

Here’s to hoping Microsoft goes all in, strips out Windows, and maybe goes the Edge route: shifting to putting a pretty face on Linux. That would be awesome. I’d buy Microsoft’s distro - frickin' base it off Debian and let’s get this show going! Apple ain’t really doing anything special at this point - so your move, Satya!

“I’d wager a cup of water”

I like this phrase. I would like to borrow it if you don’t mind :)

Me recently getting into Dune has altered my perception...

WSL2 and Docker Desktop, running it for a couple of years; so far so good.

Agreed - try running webpack in debug mode with watch; it's basically unusable, as are git operations and installing npm packages.

npm / webpack: That's not Docker's fault - that's node/JS's fault. Having to touch thousands and thousands of files is expensive. This is why I've come to loathe web frontend development.

It's definitely Docker that's the problem. It's a well known problem, and people are working on solutions. It affects far more than NPM and Webpack.

> I've come to loathe web frontend development.

Well, we don't have to do things this way just because everybody else does... developing "vanilla" Javascript is actually a pleasure especially with tools like Chrome's debugger. In my experience, front-end "frameworks" like Angular and React provide very little value and tons of unnecessary overhead.

Show me the team that's using vanilla JS, I'd like to sign up.

Same problem with PHP development: anyone trying to develop on Windows for any real-world framework will have epic IO lags, sometimes crashes with composer (the equivalent of npm), and also slowness at execution time. Is this composer/PHP/Symfony-Laravel-whatever's fault? Is it the caching lib's fault?

I work on Linux, but try to onboard Windows devs frequently ("Windows dev", ikr ;). The experience is always painful, and despite many efforts from the WSL team to move things forward, Docker for development on Windows is still barely usable right now ...

I use VSCode to connect to a cloud VM. VSCode is surprisingly good at this. The client is local and connects to a VSCode server over SSH.

Granted I work on Azure and the cost of the vm is not something I have to worry about.

We also use remote dev boxes, but can VSCode connect to a remote server and connect to remote docker containers?

For example, I don't have ruby installed on the remote dev box, but it is installed inside docker on the remote host. I also don't have ruby or docker running locally.

I think all the linting plugins either expect ruby to be available on the remote host, or inside docker, but not this combo... Is there something I'm missing? (disclaimer: only played a bit with VScode, I use vim on the remote box usually).

> can VSCode connect to a remote server and connect to remote docker containers?

Yes, it can. https://code.visualstudio.com/docs/remote/containers-advance... talks about setting it up.

Thank you! looks cool. Hope I’ll figure it out, considering I’m pretty clueless about VSCode

Yes. Vscode (server) is running on the remote machine. Extensions run on the remote as well.

A new terminal in vscode is a terminal on the remote.

Only the client GUI is local.

Thank you. I managed to get it working... still I'm not sure about things like ruby-rubocop ... I can install it remotely, but not inside a remote docker container (where my ruby runs)... Any tips?

Yes, if you can connect the docker cli to the remote host, then vscode can connect too.
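A sketch of that connection, using Docker contexts over SSH (the host name and user are made up):

```shell
# Point the local docker CLI at the remote engine over SSH
docker context create remote-dev --docker "host=ssh://me@devbox.example.com"
docker context use remote-dev

# docker ps / docker exec now target the remote host, and VS Code's
# "Attach to Running Container" goes through the same context
docker ps
```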

The Azure Codespaces VMs were awfully slow on I/O. It was not a great experience at all.

If you want decent I/O, better than an old notebook, Azure is extremely expensive.

I do this, but with a Linux VM running in Parallels on my Mac. Uses about 1/10 the CPU that Docker on Mac uses for the containers I run.

Same here, but with Hyper-V on Windows.

Originally I did this because of bad performance and bugs in WSL 1. I hear WSL 2 is better but I already have it all set up and it works great for me so I've just kept it.

There's a volume feature that sped up disk access to mounted volumes quite a bit. You add ":cached" to any volume. This changes some of the guarantees around consistency, but I haven't had any issues. See https://www.docker.com/blog/user-guided-caching-in-docker-fo...
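For example, appending the flag to a bind mount (the image and paths are illustrative):

```shell
# :cached relaxes host-to-container consistency guarantees;
# writes may be briefly delayed, but reads are much faster on Mac
docker run --rm -it -v "$PWD":/app:cached node:16-slim bash
```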

Not sure about OSX, but in Windows 10 + WSL 2 Docker development can be made faster by cloning code directly to a volume.

Isn't Docker using virtual machines on macos?

The solution to this is to enable NFS for your volumes. The default volume configuration is a huge bottleneck for Docker on macOS.

Although be aware, if you add shares to nfs.conf, your Mac will no longer sleep while connected to power, even if you close the lid.

Not sure if there's a way around it, but I was sitting down at my desk every morning to find a toasty laptop, lid closed and fans blaring.

A little bit off-topic, but I hope it's relevant enough: can someone who's well-versed in Docker give me some pointers as to how I can use it in a better way? Let me elaborate.

I'm a bit of an old fart when it comes to software development. I prefer stable, slowly evolving solutions. I am a fan of the role of classical distributions. I abhor bundling every piece of software with all its particular versioned dependencies until everything works. I'm not gonna change. And that means I'm probably not the type to use Docker to deploy anything.

That being said, I do see great value in it as a way for sysadmins to let semi-trusted people run their own OS on shared hardware without stepping on each other's feet. I love that it lets all of us run each our OS of choice on our compute machines at work. But that's just it: when I use Docker, I pretend it's a VM.

I really would like to learn to use it in a better and more appropriate way, but whenever I try to seek out information, quality search results are absolutely covered in garbage 10-second-attention-span "just type this until it works" blogposts.

Any pointers?

(Alternative question: are there some cleaner solutions than Docker out there for the workflow I describe above?)

You are confusing two different things.

One is using Docker as a deployment packaging method. The other is using Docker only for development and still deploying traditionally. Sure you can do both, but it doesn't have to be this way.

>when I use Docker, I pretend it's a VM

Also check anti-patterns 1 and 4 here


This is an important distinction and one that sometimes gets lost in the back-and-forth over Docker.

I feel like a lot of the complaints levelled at Docker pertain to the packaging and deployment use case. Where Docker really shines - even for small teams or solo devs - is as a development tool.

Anti-pattern 1 perfectly describes how I am (hamfistedly) using Docker. Thanks! However, it doesn't really explain how to fix my mindset. It just says that I should ;-)

Could you give me some hints?

"There is no easy fix for this anti-pattern other than reading about the nature of containers, their building blocks, and their history (going all the way back to the venerable chroot)."

Basically learn the basics (cgroups, namespaces)

You should also study this https://github.com/p8952/bocker
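To get a feel for those building blocks with stock Linux tools (requires root; a very rough sketch, nowhere near a real container):

```shell
# A new PID + mount namespace with its own /proc: the shell believes
# it is PID 1, much like a container's entrypoint process does
sudo unshare --pid --fork --mount-proc /bin/bash -c 'echo "my pid: $$"; ps aux'

# cgroups then bound what that process tree may consume, e.g. by writing
# its PID into a group under /sys/fs/cgroup (path varies by cgroup version)
```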

> Basically learn the basics (cgroups, namespaces)

I think I have a basic grasp of those things, but still don't get how Docker uses them.

> You should also study this https://github.com/p8952/bocker

Cool! That's very useful!

You have an increasingly rarefied perspective, and you should hang onto it.

Docker is a treadmill. Docker leads to Docker Compose leads to Kubernetes leads to whatever. It's a lot of noise and motion; you will increasingly encounter engineers who grew up on this stuff and assume it as a prerequisite, and are eager to climb the treadmill, thinking it's a ladder. You know about other options, and can decide when to stop.

Yeah, I appreciate this. And since I'm a researcher rather than an engineer, I'm also not forced to get on the treadmill. But like I said, I do see some value in using Docker as a tool to let a bunch of researchers run wild as semi-trusted superusers (by "semi-trusted" I mean we can be trusted not to purposefully do evil, but not trusted not to accidentally step on each others' feet). So I'd still like to learn a bit :-)

Try thinking of it as `git` but for an entire OS instead of merely a filesystem, and that you have multiple branches of your git tree opened simultaneously in different locations.

Or think of it as a set of databases that include the combination of the current state and the code ("migrations") to achieve that state, while allowing those databases to share the same history.

In other words, containers are a solution to state problems, not only "works on my machine" problems. The benefits are reproducibility and shared resources. It is "functional OS" as in "functional programming": a pipe of common operations applied to inputs to generate a consistent output, which can be forked anywhere in the history/pipeline as long as the pipeline does not hide held state.

Coming back to the `git` analogy, a traditional VM with Snapshots is like saving ProjectV1, ProjectV2, ProjectV3, ProjectV2_fixed, ProjectV2_fixed_final, while a container solution is saving only the history and places where histories diverge.

To answer your alternative question, nix package manager (which can also be run as a standalone OS, NixOS) is an interesting alternative solution from Docker. Reading its documentation may also help in appreciating the alternative set of perspectives.

Yes, similar idea, except a change in a dependency in nix triggers a rebuild of all dependents, whereas it does not in ostree.

In Docker...it depends.

I'm not quite sure that I understand the advantages of doing development work inside of the container. What am I missing?

Working with a team on a project with a moderately complex build process (say python and node), it soon becomes painful to onboard new team members and get their dev environments set up.

Even assuming they have the same version of macOS, homebrew constantly evolves so getting everyone developing with the same version of dependencies as you use in prod is super painful. If they’re on Linux they likely don’t run the exact same distribution as prod.

Even if all that is the same, maybe they have to work on multiple projects simultaneously with different dependencies.

Python and node are probably the easiest to get set up. With Python you have venvs, which do a lot of the heavy lifting required.

The issue arises when you are on a non-Linux environment for development and deploy to Linux in containers. If you build your code locally, then you are debugging $DESKTOP issues which might not be the same as on Linux.

Also, language environments with poor dependency management (e.g. C, C++) benefit from having an installable system.

> Even assuming they have the same version of macOS, homebrew constantly evolves so getting everyone developing with the same version of dependencies as you use in prod is super painful.

If someone is using homebrew for their development dependencies, unless they are targeting a release to homebrew, kindly ask them to stop doing this.

For me, the best advantage of working with containers is the initial setup for new colleagues, especially junior devs.

Setting up pyenv, the correct version of Python manually, nvm, postgres, etc. takes time, and our setup guides have grown a lot in the past 5 years.

With containers, all those long setup guides can boil down to `docker-compose up -d`.
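A minimal sketch of such a compose file (service names, ports, and versions are illustrative):

```yaml
# docker-compose.yml
version: "3.8"
services:
  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev-only
```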

Onboarding really does seem to be the major pull. I've been developing alongside Docker containers for years now and I just don't see that changing.

I do see value in it, though, after the comments and after doing some additional reading. I could see myself pushing this direction if the teams I worked on had higher churn rates or more frequent new hires. Also, I think that it lends itself well to certain tech stacks / languages / ecosystems than others.

There are many advantages of containerized development. One example would be the protection of root environment from version pollution. My team uses and supports 3 versions of a framework, how do I test and develop in all of them without one environment affecting the other?

Of course, you could use a version management tool like nvm/rvm/asdf etc, but I think containerisation is also a neat alternative, as you would be able to use multiple versions for languages/tools and libraries within those languages for which a version management tool doesn't exist.

Why would you be doing development as root?

Personally I would use virtualenvs with Python to solve that problem you described.

There are still system-level packages that you might need, which may come into version conflict with each other.

Note: I too default to virtualenv for local development; however, there are use cases where this becomes insufficient.

I think root refers to the main system environment, not the privileged user.

"3 versions of a framework" is quite vague. Are those versions Linux/Windows/Mac? Because then Docker is not very useful.

Docker is not a general solution to this problem because it ties you intimately to Linux, whether directly, or through VMs, or compat layers.

Let's say you maintain or occasionally code several big webapps in Ruby. One uses Ruby 2.4 and Ruby-on-Rails 3.2, another uses Ruby 2.6 and Ruby-on-Rails 5.2, and the last one is bleeding edge using Ruby 3.0 and Rails 6.1

Having all versions of your language and framework installed at the top level is a huge pain in the ass, since they inevitably will interfere with each other. Having separate containers with all the necessary dependencies for each app is a lot more manageable.

rbenv actually solves this quite well. You can set a project-level version in a .ruby-version file, and each lang version has its own gem cache
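For example (the project name and version numbers are illustrative):

```shell
cd my-rails-app
rbenv install 2.6.10           # once per machine, per version
echo "2.6.10" > .ruby-version  # committed with the project
ruby -v                        # resolved per-project via rbenv shims
```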

Yeah. I have over a dozen Ruby on Rails apps on my development machine, targeting a variety of Ruby and RoR (and other common gem) versions. I'm not aware of any problems anyone has experienced in recent years, since everyone has moved to rbenv. We used to see issues with rvm.

I don't use docker with any of these projects. They're mostly legacy for us at this point and are shipping to EC2 instances directly.

Our more current projects do use Docker, however, and we're doing development alongside Docker in those instances, and that seems to be working fine for us.

I do appreciate that a dev container would / may be a better approach to this for other reasons and especially, potentially, other languages and ecosystems, though.

I used to work at a company where I'd write and maintain multiple microservices, so I was dealing with dozens of small and big ruby applications across different language versions and different frameworks and dependencies. It was two years ago, but I did experience a ton of issues between different versions of Bundler and rake and rubygems. I'd absolutely go full docker if I was still working there.

I work with several programming languages; it's easier to make the problem language-agnostic.

My team uses macOS, windows and Linux. Dev Containers not only allow us to have our dev envs and prod environment as close as possible, as well as making all our dev envs identical, it makes working cross platform really straightforward.

If you work in enterprise 80% of developers have to rock windows laptops for compliance reasons but develop applications that will run on redhat/centos in the end.

It's not so much that it's a container, but rather it's Linux. The host OS isn't Linux.

I understand this can be done. And this post explains how it's done. I still don't get why it should be done.

What's the advantage of running your dev environment in a container?

If your project is targeting an environment you don't control, and doesn't behave exactly like an environment you have exclusive access to, your code won't run the same way in the production environment as it does on your machine. There are different strategies for dealing with this, but Docker seems to be the sweet spot for making it easier. You use Docker to create a container that mimics the production target, and then use that container to develop in.

You could reconfigure your laptop to look exactly like the production target, but then you have to keep doing that every time you change projects.

Before Docker, I used VMs for this, but VMs have certain disadvantages that Docker addresses. Like size, and documentability. Every time someone wanted me to look at a project, we had to figure out a way to transfer and store a copy of a 20+ GB VM. And they couldn't tell me everything they did to create that VM from scratch, because VMware doesn't do that and neither does Hyper-V. With Docker it is just a small text file that describes everything it takes to create what was previously a massive, undocumented VM image. It forces you to document how to create the environment, and it saves on the space and time of transferring VMs around.

I use Windows 10 as my daily driver, and often work on projects using Postgres, RabbitMQ, Redis, nginx and others.

Until Docker came along, it was a royal PITA. I always dreaded getting a new laptop or something breaking, as it took forever to set everything back up again, and it was never quite the same.

Docker changed all that. It forces you to configure everything in a reproducible way in a Dockerfile - and it's much simpler than trying to come up with scripts to install and configure everything in Windows, and I'd say it's also quicker than trying to come up with scripts for a Linux VM, just because you can spin containers up and down so quickly.

Docker has been a game changer for dev/test.

What exactly is difficult to understand about the value proposition? Seems pretty obvious to me...

If you mean why use dev environments in containers:

1) reproducibility,

2) re-use of container creation scripts for different environments,

3) isolation from your actual OS,

4) ability to run the same OS/libs/etc as the final deployment,

5) tons of base images with different environments already configured - from LAMP to data science,

6) easy sharing with others, team

7) ability to work with 1-2-5 or 100 different environments, with different OSes, versions, libs, python versions, whatever.

As for why have your IDE/editor work from inside a container (as described in the article).

Well, because you get all the benefits of containers (as above) PLUS get to use the editor as if you were programming directly on the target machine (including having visibility of installed libraries for autocomplete, running code directly there, and so on).

It's the same thing people have been doing with running Emacs in "server" mode inside another host, and programming with Emacs client on their machines as if they were locally at the machine.

> 3) isolation from your actual OS

For me, this is the major benefit. I don't have to worry about installing new tools or libraries and how they interact with my primary OS. While this isn't a huge issue for many, I don't want to have to worry about how Go, Python, Java, etc... are installed on my Mac. I like being able to pull in a Docker container with everything already setup (or a customized one). Then when I throw away a project, I don't have orphaned installations on my Mac.

All of those benefits were solved problems with VM based workflows using something like Vagrant long before Containers started gaining traction (IMO).

Don't get me wrong, I am a fan of containers (in particular LXC), but I wouldn't list those benefits as if they are unique or novel to container based workflows.

Edit: To be clear, there _are_ benefits to containers over VMs, just not the things you listed above from my perspective.

Overhead... it's all overhead. Starting and connecting to a VM isn't nearly as streamlined as connecting to a Docker container (even if it is also running in a VM).

Even the hurdle of SSH'ing to a VM is more cumbersome than `docker run`.

Certainly this could be automated and scripted, but the Docker solution is so... streamlined.

And I say this as someone that used to use Vagrant. With smaller installs available (such as Alpine), maybe more modern VMs would be just as easy as containers...

Also -- for me, VirtualBox was just "meh". It worked, but really wasn't that great. It always seemed like it took too many resources to run. That was another issue with Vagrant. (And yes, I did also use Vagrant with VMWare, but again -- that's a lot of overhead).

I can't name a more dysfunctional hypervisor than VBox. If my choices were a Virtual Box VM or superhacky Docker-as-VM, I'd pick Docker every day.

But I'd say VMs and containers solve different problems. In the case of dev environments, VMs are too "persistent" and accumulate personal cruft very quickly. Container tooling can be built to be noninteractive.

Yes, a lot of containers start in less than 1 second. With SSDs, VM startup isn't as bad as it used to be, but it's still a lot slower than a container.

I also find the workflow of creating Dockerfiles to be much smoother than cobbling together scripts for a VM.

Plus Vagrant was a real PITA to get working on Windows (at least it used to be - I think I eventually gave up trying to get something running on Windows 7).

Schlepping around and running a whole VBox/Vagrant VM is heavy enough that I will be motivated to see if I can’t get it working under MacOS. Running a container is less offensive.

Isn't the Docker server running inside a VM though? You could also run an X server (available on both Mac and Windows) and ssh -X into a VM to run GUI apps inside the VM.

>Is not docker server running inside a VM though

No, docker server runs directly on top of the OS as a native program.

As for Docker containers managed by the Docker server, they are running on top of a supervisor - not in a full VM.

On Windows Docker runs in a VM.

That's wrong:

"With the latest version of Windows 10 (or 10 Server) and the beta of Docker for Windows, there's native Linux Container support on Windows. That means there's no Virtual Machine or Hyper-V involved (unless you want), so Linux Containers run on Windows itself using Windows 10's built in container support".

In any case, as Linux and macOS prove, there's no need for docker to have to run on a VM. And it seems there's no need on Windows either since 10.

You’re quoting a random blog post from Scott Hanselman and not Docker release notes.

I don’t know what happened with that but I was not wrong and Docker for Windows does still run in a VM: https://docs.docker.com/docker-for-windows/install/

I assume you don’t use Docker on Windows and just pasted the first google result?

Vagrant could have made it but it didn't. Seems like the cloud providers leaned into docker much more heavily.

>All of those benefits were solved problems with VM based workflows using something like Vagrant long before Containers started gaining traction (IMO).

Well, VMs are like overweight containers. Containers make "all of those benefits" easier, more performant, and more lightweight.

Consistency between team members.

Super easy for new team members to get started on a project. No need to manually install dependencies.

Environment versioning in git and docker. Your local environment gets automatically updated with a git pull.

> Super easy for new team members to get started on a project. No need to manually install dependencies.

I see this brought up a lot as an argument. So why do we want this? How often do people switch companies? Once every 3 years on average or something? Getting your development env set up takes what, a few hours max in 3 years?

It depends on how large your organization is and what specific quirks need to be configured to do your work.

It's not unusual for large organizations to have internally hosted registries (Artifactory), source control, and network proxies. This usually requires setting up different config files (.npmrc for Node.js/NPM), installing custom root certificates, SSH keys, etc. None of that includes project/team-specific configurations and workflows.

Take all that and multiply it by thousands of developers and you have a recipe for an endless stream of Slack chats, email chains, and Teams messages repeating the same config questions and answers.

If you can reduce all that down to a single docker pull, while making sure everyone's development environment is consistent, it can be a big win.
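As a hypothetical sketch of what that single pull can replace (the file names, certificate, and registry are invented for illustration), a company base dev image might bake the org-specific configuration in once:

```dockerfile
# Hypothetical company base dev image; all names and paths are placeholders.
FROM node:18

# Trust the internal root CA so HTTPS to internal services works
COPY corp-root-ca.crt /usr/local/share/ca-certificates/corp-root-ca.crt
RUN update-ca-certificates

# Point npm at the internally hosted registry (e.g. Artifactory)
COPY .npmrc /root/.npmrc
```

Each team then extends this image instead of re-answering the same setup questions on Slack.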

It can be much more complicated than that.

* You want means to keep all of the dev environments in sync so you don't get "works on my machine" problems.

* If you update something, then you need a way for everyone to have their environment reconfigured.

* As the number of projects/stacks/developers scales this becomes a bigger and bigger issue.

I've used some Ansible in combination with a shell script wrapper to handle some of this kind of stuff. Even still, it takes a lot of hands-on support to make it all work. So, if you can get something like this to scale, it might be a big win ... if...

(edit formatting.)

Ok, now I'd like to update a dependency.

In docker world, I create an MR that updates the dev container and deploy container dockerfiles at the same time, check that it runs tests, and merge it in. I push a new version of the dev dockerfile, and have the .vscode/devcontainer.json reference that new tag. Next time all the devs open up this repo, they'll get notified they need an update. You just updated a dev dependency across the whole group in a source-controlled way.
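For example (the image name and tag here are hypothetical, and the exact devcontainer.json schema has evolved over time), the change in that MR might be as small as bumping one field:

```json
{
  "name": "my-service",
  "image": "registry.example.com/my-service-dev:2021.05",
  "extensions": ["ms-python.python"]
}
```

VS Code sees the new image reference on the next reopen and prompts everyone to rebuild their container.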

What's your way to do it? Email everybody?

Fair point. It still comes up only a few times in a year. Most of the time I work with code bases that have most dependencies defined in some sort of package file (Gemfile, package.json, etc)

Updating a postgres version comes to mind as one of the possible differences and that usually only is an issue when working with pg_dump and pg_restore with different versions.

Good point nonetheless. I am not sure whether it is worth the work to maintain dev containers and the performance hit you get vs running a database directly for example.

> So why do we want this?

Just yesterday, I ran into an issue where a set of node unit tests were failing. My colleague and I were both getting failures, but different failures. The reason was: different versions of Chrome, and thus different versions of the Chrome integration plugin.

Given that we have effectively no control over Chrome's auto-updates, we'll never have truly identical development environments. A container with headless chrome would have resolved this for us.

But your customers still have widely varying versions of Chrome. So while your test might work, your code is still broken. Or did I misunderstand?

It's purely a test fixture setup that's failing because of a need for lockstep Chrome and the Chrome test fixture versions in Node. So, the code isn't broken for customers, just the local testing story.

It depends on your context. I've worked places that onboard new team members every few years, and I've worked places that onboard new team members every few weeks.

On larger teams, setting up a consistent environment like that also makes it easier for developers to collaborate. I've had experiences where attempts to pair program or share utility scripts generally stumbles and fails due to everyone's environment being a special snowflake.

But you can easily switch projects within a company, take a look at the one in the next cubicle, etc. You switch machines. You decide to work from home, etc.

I like the fact that I can switch to/from different computers quickly. I have a desktop and a laptop that I routinely use. If I change something in a devcontainer[1], then I can move between the two easily. This actually happened to me earlier in the week.

[1] syncing the changes is left as an exercise to the reader, but I use a common git repository.

Not how often people switch companies but how often people work on different projects. Someone working on multiple projects can waste a lot of time keeping up with the environment of each project.

Sure, that sounds like a good reason. Thanks.

> I see this brought up a lot as an argument. So why do we want this? How often do people switch companies?

It's not about you. The people coming in generally need confirmation and help when setting up their environments. Someone there would have to take time out of their day to help you. A few hours, a few days, a few weeks, a few deleted companies (https://news.ycombinator.com/item?id=11496947).

Here's an even more fun one: https://news.ycombinator.com/item?id=14476421

It also makes it easy to make changes and get it out to everyone on the team in an automated way.

Someone else raised the same point and I responded a bit differently, but I don't really have any experience with lots of these changes that really matter in many projects. Sure sometimes a version of imagemagick is fucky, but that happens so infrequently that I don't think it justifies running everything in docker.

I think you're really sleeping on the benefits of being able to spin up and change an environment easily.

It's easy to load a container with some stored test state. It's easy to load a completely fresh environment. It's easy to run multiple instances of things (with docker compose).

It's easy to totally shut down the environment.

It's easy to work on different branches with different/conflicting dependencies and juggle containers.

That's how we do it in my company:

- Clone project

- Build container

- Develop

If you're in a polyglot shop there are HUGE productivity gains in not needing to setup your environment manually, or worse, risk that vital information about it is distributed as tribal knowledge.

Plus, if your project has external dependencies like DBs, S3, etc... you can use docker-compose with VS Code as well.

Can only second this, we use a VSCode devContainer-based setup in all of our projects and even migrate our legacy projects to it (software agency).

Here's our current base go template, you only need Docker+VSCode on your system to get started: https://github.com/allaboutapps/go-starter

Bonus points:

* As all IDE operations solely run within the local Docker container, all developers can expect that their IDE will work the same without manual configuration steps.

* We can easily support local development in all three major OSes (MacOS, Windows, Linux) and even support developing directly in your Browser through GitHub Codespaces.

* Developing directly inside a Docker container guarantees that you use the very same toolset, which our CI will use to build these images. There are no more excuses why your code builds differently locally versus in our CI.

Edit: format/typos

Does that force everyone to use VS Code only? Can non VS Code developers also work efficiently?

You could always run something like Vim/Neovim directly in the container.

I, personally, hope that JetBrains comes up with something similar which will allow devs to use the same workflow with the JetBrains IDEs.

yep, currently it forces our engineers to use VS Code or a terminal-based setup (or GitHub Codespaces). Hopefully GoLand / IntelliJ catches up this year...

If you have projects with different versions (python2/3, Java5/8, node8/11 etc) you don't need to bother with version managers and package managers anymore.

Just launch the respective container and you are good to go.

for python, why can't you use virtualenv with pyenv?

Because it is a Python specific thing. Docker works with all programming languages and that's it.

If you are full stack or work with multiple programming languages there is no need to learn the "equivalent" of virtualenv everywhere else.

Also with docker there is no setup/installation involved. You just pull the image and that is it.

Also virtualenv requires that you already have pip/python installed. Docker requires nothing (apart from itself). So you can instantly launch Java/Node/Erlang/Haskell whatever without any SDK/libs installed.

Second this. Developing in Docker completely liberated me from the caprice of Python packaging and Node dependency management and made doing development work on multiple platforms (MacOS, Ubuntu) frictionless.

There’s a “pets vs cattle” angle here, too. Something goes wrong, just pave over it and start again.

YMMV, of course, but I can’t imagine going back.

I think every programming language that I have used has had something equivalent.

Let's assume that this is true (I don't think that languages like C/C++ have this but I may be wrong).

If I use 4 programming languages why learn 4 tools instead of one (Docker)?

Because setting up and configuring virtualenv is also a chore, with loads of things that can go wrong. It's different on Windows vs. Mac vs. Linux.

With Docker it's a single command to run after installing Docker.

So, speaking as a data scientist, the dependencies run deeper than Python kernels and libraries. For something like TensorFlow, you need to account for specific versions of Nvidia drivers and CUDA/cuDNN. Using something like the Nvidia container runtime for model development allows me to essentially version my entire development environment.

good point.

Because projects are seldom as simple as that, potentially having lots of other kinds of dependencies (a directory structure, system libraries/command, symlinks, other services running, etc).

And let's be honest, virtualenvs and all the various ways they're managed and updated and such aren't really bulletproof either.

It's really, really refreshing how much "yeah I managed to break my dev environment" or "I followed the wiki for how to start developing your project but it 'didn't work'" can be avoided if it's just "run docker(-compose)?".

And this is especially true for junior developers, who are probably fresh out of college and won't be familiar with lots of the tooling that exists in the world. Not that docker is a simple tool, but it can hide so much complexity that it is easier to just show people how to docker run and docker build and such.

You could, but then maybe you have some editor-specific settings you'd also want to tie to particular project.

And if you're working with multiple projects in multiple languages, why bother learning each language's equivalent of virtualenv (assuming it has one), when there's a universal method available?

I wrote this article a while back that compares setting up a Python development environment for web development with and without Docker: https://nickjanetakis.com/blog/setting-up-a-python-developme...

The TL;DR is there's a lot of things to set up yourself without Docker in order to run a typical web application and it's different depending on what OS / version you use. Some of these things are unrelated to Python too, such as running PostgreSQL, Redis, etc. but these are very important to your overall application.

Docker unifies that entire set up and the barrier of entry is installing Docker once and then learning a bit about it.
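A minimal sketch of that unified setup (the service names and versions are illustrative, not from the linked article):

```yaml
# Illustrative docker-compose.yml: the app plus its backing services,
# started together with a single `docker-compose up`.
version: "3.8"
services:
  web:
    build: .          # the Dockerfile installs the Python dependencies
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev
  redis:
    image: redis:6
```

The same file works on every OS that runs Docker, which is the point being made above.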

A couple of years ago I got your Docker course and I learned a lot from it, been doing Docker chores for my teams ever since. Anyway, good course!

Thanks a lot!

Since some of the purported features of containers are portability and consistency/reproducibility between environments, the idea here is that you want your development environment to be as close to the deployment environment as possible. In addition, if you want to do prototype development, quickly try and test new ideas, and iterate on that, it's nice to go through that without rebuilding and deploying the container to a test/production environment. Do all of that right where you do your development.

At the agency where I work we have about 40-50 somewhat active projects at any time. Any given week I work with maybe 3 to 10 of them.

It saves so much time to just pull down the repo, run "docker-compose up" and have everything running, almost exactly the way it's running in production. With the right node or php version, databases, Elasticsearch, Redis etc.

The problem I usually run into with this sort of arrangement is the database. Every non-newbie developer is very aware of using source control with their code. A lot of developers are careful about managing the dependencies for that code as well. But for the database, you have a second asset that often needs to be synchronised with the code, and that means both schema and possibly records as well.

Just deploying changes affecting both assets carefully in production can be quite awkward relative to a code-only deployment. You might need to do this in several stages, updating your code to allow for but not require the DB changes, then updating the DB schema, then maybe updating existing DB records, then at least one more round of code updates so everything is running on the new version of the DB. And you might need to make sure no-one else on your team is doing anything conflicting in between.

Doing the same in a staging environment isn't so bad because you're running essentially the same process. However, for a development environment where you want the shortest possible feedback loops for efficiency, you need to be able to spin up a DB with the correct schema, and possibly also with some preloaded data that may be wholly, partially or not at all related to controlled data you use to initialise parts of your database in production or staging environments.

It is not always an easy task to keep the code you use to access the database, the current schema in the database, and any pre-configured data to be installed in your database all in sync, and to ensure that your production, staging/testing/CI facilities, and local developer test environment are also synchronised where they should be.
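One common way to at least get the dev-environment side of this (a sketch, assuming the official Postgres image and a hypothetical ./db/init directory) is to mount schema and seed scripts into the image's init hook, so a fresh database always comes up with the expected schema:

```yaml
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev
    volumes:
      # *.sql / *.sh files in this directory run in alphabetical order
      # the first time the container's data directory is initialized
      - ./db/init:/docker-entrypoint-initdb.d:ro
```

This keeps the schema/seed scripts in the same repo as the code, though it doesn't by itself solve keeping them in sync with production migrations.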

There's also mutual deployments which suck really bad i.e. deployment A needs to go out with deployment B; but if you haven't set up your CI with that facility (and, let's face it, the easiest way to set up a CI with multi-repos is on a per-repo basis), it can get really hairy.

I agree that the database (and things like uploaded images) can be a challenge. For WordPress we usually sync down from production, to staging, to testing, to local (while filtering out PII). For other projects it's usually easier to use migrations and seeding.

The challenge is how you keep your migrations, source code and any seed data synchronised, but only where they should be. There is often a need for any separate migration process/scripts to be synchronised with corresponding changes to the DB access code in the main application, so the models in the application code always match the actual schema in the database. For seed data, there may be some "real" data that should always be present but also some "example" data you want to include (and reset to a known state) for each test in an integration test suite. Etc.

It kind of amazes me that there doesn't yet seem to be a way of handling this in the web development community that has become a de facto standard in the way most of us look at tools like Docker or Git these days.

It depends on what the environment is for. Your company work, or individual projects?

If you're an individual, then it would benefit someone who has multiple devices and/or multiple operating systems and doesn't want to manage their environment across all those devices and operating systems. For example, I personally have OSX, Windows, and several different Linux distributions on my laptop itself. My desktop also runs several operating systems.

Managing software across all of those is a pain. With Docker, I only have to manage the containers, and just have Docker installed on all the operating systems. Instead of managing like 50 different dependencies across 7 systems (think 7x50), I only have to manage Docker inside each system.

If you're working for a company, they will have their own dev environment. Instead of setting up and troubleshooting all their dependencies on your computer, you can just use their containers.

It lets you do dev work in an environment that matches prod (and your teammates' environments) as closely as possible.

It's frustrating at best and catastrophic at worst when you have code working on your machine, deploy to prod, and then discover incompatibility.

Reproducible build environment.

In big orgs it's a way to manage environment changes for big teams

So, you can extend your bio-break waiting for your build to finish as Docker furiously crunches those files, raises your laptop temperature and makes those fans spin mightily.

Remote development from different machines/thin client machine

You could do that already with VMs and SSH. Sure it is a bit easier with Docker, but that is not the big advantage.

While I love that VSCode can do this and would like to appreciate a post about the topic, I wonder if I should upvote a blog post that adds almost no value over the existing VSCode documentation, which is basically excellent already and covers every use case I've ever stumbled upon.

To be truly portable it is worth considering whether your newly created Dev environment can work without the internet or in an air-gapped environment.

Vscode can be problematic in that respect. Typically with dependencies used by an extension often assuming they can reach out to other servers.

I look at initial internet facing container creation as a separately managed snapshot process to grab dependencies which then gets configured for particular Dev, build, test, runtime and release containers that are built by the dependency collecting original container.

I.e. something like VSCode isn't installed in the internet-facing container; it is installed in the offline build of a dev container. This is where the difficulties lie in my approach.

As amazing and convenient as Docker is in practice, containers hide the inherent mess that is modern computing, and the more they are used, the less chance is that this mess is getting cleaned up... ever. Ultimately this is another dependency, complexity hidden by another layer...

I feel like calling modern computing a mess is harsh. Modern computing encompasses a plethora of applications, to a point where we can model most if not all workflows. It's the product of millions of humans working together for half a century in a very large graph with little connection between each clusters.

Nothing at that scale is a "mess". It's simply what was created by our collective distributed system of humans and we should appreciate that it's not really a problem that can be solved instead of talking about it as if we could "fix" it.

I agree. While I understand the practicality of Docker and facilitating development using it, I wouldn't call it "forward" progress.

The fundamental problem, as you say, is that our dependency ecosystems don't meet our requirements. Docker is one way to avoid the problem without fixing it since it's easier. Forward progress would be to fix the problem.

On one hand, Docker removes some pressure to fix the problem and encourages perpetuating it. On the other hand, maybe it gets people to think about the problem more. I don't know which influence is stronger.

> containers hide the inherent mess that is modern computing, and the more they are used, the less chance is that this mess is getting cleaned up... ever

If you have a roadmap for how 150 or fewer engineers can "Clean up the inherent mess that is modern computing" in less than 5 years, then I'd be eager to read it. In the meantime, tools which enable people to manage the symptoms of that mess are good.

The Chunnel lets us work around the fact that the ocean has not yet been boiled away.

Understand that people quite reasonably feel this way but I personally don’t.

You’ve got to pick your battles. If you’re, for example, a front-end dev working right up at the top of the stack, then delivering value to your clients means getting them their marketing webpage, CRUD app, what-have-you. To do that you have to abstract away a vertiginous amount of stuff under you, all the way down the stack. We’re all standing on the shoulders of giants.

Docker is an amazing tool for just this sort of thing.

agreed. that docker is needed represents a failure of our industry

After learning Docker, I second this.

The great mistake happened way back in the 1980s (maybe earlier) when most OS developers didn't implement a proper permissions system for executables. Basically, the user should always be prompted to allow a program read/write access to the network, the filesystem and other external resources.

Had we had this, then executables could have been marked "pure" functional when they didn't have dependencies and didn't require access to a config file. On top of that, we could have used the refcount technique from Apple's Time Machine or ZFS to have a single canonical copy of any file on the drive (based on the hash of its contents), so that each executable could see its own local copy of libraries rather than descending into dependency hell by having to manage multiple library versions sharing the same directories.

Then, a high-level access granting system should have been developed with blanket rules for executables that have been vetted by someone. Note that much of this has happened in recent years with MacOS (tragically tied to the App Store rather than an open system of trust).

There is nothing in any of this that seems particularly challenging. But it would have required the big OS developers to come on board at a time when they went out of their way to impose incompatibility by doing things like: using opposite slashes and backslashes, refusing to implement a built-in scripting language like Javascript, or even providing cross-platform socket libraries, etc.

The only parts I admire about Docker are that they kinda sorta got everything working on Mac, Windows and Linux, and had the insight that each line of a Dockerfile can be treated like layers in an installer. The actual implementation (not abstracting network and volume modes enough so there's only one performant one, having a lot of idiosyncrasies between docker and docker-compose, etc) still leaves me often reaching for the documentation and coming up short.

That said, Docker is great and I think it was possibly the major breakthrough of the 2010s. And I do love how its way of opening ports makes a mockery of all other port mapping software.

I'm not sure I quite agree with that. Having a controlled environment in a sandbox of its own clearly has benefits, both for consistency of what you're running and for safety if it doesn't work as you expected. It doesn't need to be Docker specifically that we use to create such an environment, but if not Docker then we'd surely have looked for some other way to achieve the same result.

Docker is literally the specification of all the missing parts of the operating system. It's not a very good specification, but it is fairly comprehensive.

How so? Docker is fundamentally about OS Level namespaces and isolation, not about more abstraction.

Slightly off-topic, but it really irks me when people talk about reproducible builds/environments while using Docker, while having something along the lines of apt-get update/pip install in their Dockerfile.

Like that completely destroys the reproducibility of your container!
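For those who do want to tighten this up, a sketch of the pinning involved (the digest and package version below are placeholders, not real values):

```dockerfile
# Pin the base image by digest instead of a floating tag,
# so rebuilds start from byte-identical layers.
FROM debian:bullseye-slim@sha256:<digest>

# Pin exact package versions; a plain `apt-get install curl`
# resolves to whatever the mirror serves at build time.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl=<exact-version> && \
    rm -rf /var/lib/apt/lists/*
```

Even this isn't fully reproducible (apt mirrors drop old versions), which is why many teams instead version the built images themselves, as the reply below describes doing.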

I can't speak for others, but, while we don't pin versions in the Dockerfiles for our build container images, we do version the images themselves.

Reproducibility is a continuum, not a binary. We've chosen a point on that continuum that we believe gives us the best trade-off between reliability and maintenance effort. 100% from-the-ground-up reproducibility would be ideal, of course, but there's a cost-benefit tradeoff, and we're not being paid to be perfectionists.

I mean in a sense, the fucked up behavior of system package managers relative to the actual concerns of building and distributing software to the masses is the reason we need Docker to sandbox environments in the first place. It's 2021, there is next to no reason why the default behavior is installation to /usr/lib with shared objects that are rarely shared, with global access when installed for one application used by one user, and linked dynamically when the object never changes over the lifetime of the running application.

For all its warts this is something that npm gets right. Package management is a tool for software development, not software distribution to end users.

You seem to be missing the point of what we call a "Linux distribution". Shared libraries are fantastic, and everything in /usr/lib should be shared.

I don't appreciate the condescension, is it not fairly well known that shared libraries aren't typically shared at all? You can distribute software that is reusable and packaged for many machines (the ultimate sense of "shared" in my opinion) but that's difficult when it relies on distribution-specific nuances and packages.

Do you really enjoy having separate packages for the same software for each flavor of package manager that does the same thing? Wouldn't it be nice if we didn't have to use containerization to distribute the same software to machines running a kernel with a stable ABI?

The majority of "shared" libraries are used by exactly one application: https://drewdevault.com/dynlib

It makes absolutely no sense to pollute a global namespace with these.

Out of 958 libs, 230 of mine on my desktop are only used by one application, according to the results after running the stuff from the link. I'm not disagreeing with you about polluting the global namespace, but at least in my case, it's certainly not a majority.

Except that - Linux distributions aren't solely tools for developing software, and their package managers are intended for all their end users.

That's not my point - dependency resolution is a build time requirement, not an install-time one. Requiring the user's machine to resolve dependencies to install your software is fragile, bug prone, and harms the user experience.

Like it or not, users don't care if your software has interchangeable parts. They care if it runs on their system. The only sane way to guarantee a piece of software runs outside your developer machine is to include its dependencies during distribution and packaging (not refer to them - which is what package managers require). The less sane way is to use containers, but those are required when developers don't package their software sanely.

This doesn't preclude users from installing software or replacing interchangeable components should developers support it. What it prevents is disgusting bugs and workarounds because dev A built on distro B while user C wants to use it on distro D, but the packages have to be built separately for everyone because the distro package managers don't agree with each other on what things are named or how they should be built.

No thanks. Apps like Firefox should use the libraries - windows, menus, etc. - installed on my system. Please don't ship your own libraries for this stuff.

Containers are an even more insane approach, so maybe we are in violent agreement.

Both apt and yum support installing to whatever directory you like as long as you have write permissions there.

Not really. They support dumping the package contents to an arbitrary directory, but you can't actually run the software from there without either the software having been written to support arbitrary paths (almost none is) or using namespacing and chroot to build it a sandbox wherein its baked-in paths actually work.

Contrast that with something like RiscOS AppDirs, classic Mac applications, or Next/Mac Application Bundles.

You very much can run the software from those directories, what are you talking about? Those paths are handled by the OS itself as well through LD_PRELOAD and PATH.

If what you're saying were true, unmodified software wouldn't work in a Docker container, either.

EDIT: Here's the exact command to do what you're saying isn't possible: apt-get download package; dpkg -i --force-not-root --root=$HOME package.deb

I was going to give you the benefit of the doubt and actually try this so I could show you that you are wrong, but I couldn't even get that far because dpkg always complains it is unable to access the dpkg status area. So clearly this is not as trivial as you make it out to be. I suspect because it expects a full filesystem in $HOME with its status file in the appropriate place. In other words, it is expecting a whole separate installation to be under $HOME.

Regardless, let's assume it did work. Here's what it would do: unpack the package, replacing '/' with '$HOME' in the destination paths. That's it. That software will not magically be able to find its associated libraries and configurations without the user mucking with environment variables at best, or chrooting or sandboxing such that $HOME appears to it to be a wholly separate installation.
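To make that concrete, here is a hedged sketch of the manual wiring a user would need after unpacking a .deb under $HOME (the package name and directory layout are illustrative; `dpkg -x` only extracts the archive, nothing rewrites the baked-in /usr paths):

```shell
# Illustrative sketch: unpack a package under $HOME instead of /
PKGROOT="$HOME/pkgroot"
mkdir -p "$PKGROOT"
# dpkg -x somepackage.deb "$PKGROOT"   # unpack step (needs a real .deb)

# Binaries now live under $PKGROOT/usr/bin, so the environment
# has to be wired up by hand:
export PATH="$PKGROOT/usr/bin:$PATH"
export LD_LIBRARY_PATH="$PKGROOT/usr/lib:${LD_LIBRARY_PATH:-}"

# Even then, anything that hard-codes /etc or /var paths still
# points at the real filesystem root, not $PKGROOT.
```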

That's not how sane systems do this sort of thing. I have been trying to do this sort of thing in Linux for pretty much as long as I have been using Linux, because I loathe the way Linux installs software, and in 20 years it has never been straightforward. AppImage is as close as we get, and software needs to be carefully built and packaged for that.

> If what you're saying were true, unmodified software wouldn't work in a Docker container, either.

> [...] or using namespacing and chroot to build it a sandbox wherein its baked-in paths actually work.

Point taken, though there's no reason the package manager couldn't change the environment variables based on the directory it's told to use for the install. The pieces are there, but nobody's really taken the time to put them together. For the filesystem, some simple bind mounts would probably suffice, though you'd still run into some permissions issues I imagine.

EDIT: It seems to be significantly easier on dnf, to the point that it could be trivial to add full support for home-directory installs.

"support" and "default" are very different things. Especially with package managers like homebrew, which comes with a loud warning that non-default (root) paths may not be supported by dependencies.

It also doesn't really, since this is a system level issue. Applications need to package their dependencies, not the other way around. Dependencies form graphs, not flat lists. The existence of a global cache of libraries shared by all programs is a total inversion of requirements.

Depends on how you do it, right?

For instance, we create base images configured with SDKs, libraries, frameworks, configurations, binaries, etc...

Those base images are then built, versioned, tagged and then pushed to our container repos ready to be used by developers, CI/CD, etc...

Images based on these base images never need an apt-get, pip install, etc... If there is a dependency missing, an update needed, etc... we'll create a new base image with it, following the steps above.
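A minimal sketch of that layering (registry, image names, versions and packages below are all illustrative):

```dockerfile
# --- base.Dockerfile: built by its own pipeline, then tagged & pushed ---
FROM ubuntu:20.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential cmake python3 \
 && rm -rf /var/lib/apt/lists/*
# pushed as: registry.example.com/team/base-sdk:1.4.0

# --- app.Dockerfile: no apt-get or pip here, only the pinned base ---
# FROM registry.example.com/team/base-sdk:1.4.0
# COPY . /workspace
```

The point being that every downstream image references an exact, already-built base tag, so the mutable package-manager steps only ever run in the base image's pipeline.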

I would love some constructive feedback.

This is what I think is the most practical approach. Pinning down ALL your dependencies to the exact version is much harder than it sounds. What you’re describing sound like the way Jib [1] does it. The pictures in this [2] blog post help visualize it.

The reason I like the approach you describe is because it keeps things simpler at the start of a project and consistent across most projects.

I also think it makes sense to have those support containers built on a schedule. For example, you build your build/CI container weekly and that's the CI container for the week. On-demand project builds use that CI container, which has all dependencies, etc. baked in.

It would be nice if CI systems would let me explicitly tag builds as (non)reproducible.

1. https://github.com/GoogleContainerTools/jib

2. https://phauer.com/2019/no-fat-jar-in-docker-image/

That solves the problem one way, but I don't know if I would call it reproducible. You're working around the fact that it's not reproducible by only doing it once. You don't get the benefits, e.g. everyone who uses your images has to trust that you built them the right way.

I guess what I mean is that everything built using this base image should be reproducible. There is no reason (hopefully) to reproduce the base image. Any changes to the base requirements (apt-get, pip, etc) requires a whole new build and results in an entirely different artifact.

And just to be clear, I'm not building (no human is) the base image. The base image is also created within its own build pipeline that has all of the necessary things to track its materialization and lineage. Logs, manifests, etc...

Once the image has been thoroughly tested and verified (both by humans and verification scripts) each time a change is merged, the git repo is tagged, docker image is built and tagged and then pushed to the container repo.

Perhaps you could explain what you mean by the other way? Why would you ever need to recreate the base image? Perhaps if the container repo dropped off the face of the earth and had to be created from scratch?

The industry seems to have adopted “hermetic” as a word that describes truly reproducible builds, while “reproducible” has a lower standard. In many cases, it seems to be used to mean “not dependent on local build environment”.

Yea - I get where you're coming from - you can't rebuild the container, but each team member can reuse the same container - which is at least a step towards being fully reproducible.

We don't rebuild the image every time. We store images in the GitLab registry. Also, just because we use apt-get to install python3 doesn't mean the build isn't reproducible. The actual toolchain and sysroot are highly version controlled, and python is just used to kick off the build script, so it almost doesn't really matter which version of python it is as long as it's backwards compatible with the build script.

We tried this in our research group, and found that issues come in when it gets to things like...

Oh no! CMake is too old a version to support a dependency we have to build in the image construction. So we better pull in a version of CMake from a PPA which is community maintained, and build it from source/etc.

If I use tools that can give me reproducible builds, like Bazel or Nix, I won't need or want to use containers for development.

I'm interested to know your alternate solution. How would you recommend getting the required libraries into your image?

I guess the OP meant such that apt/yum/dnf/whatever runs every time the image runs, rather than just once when it's built. Not that that's something I see very often, mind.

Isn’t the standard approach here then to derive from base images, which have exact versions? Being honest, I’m not a Docker/VM legend but I’ve seen a few attempts at managing this, and base images was one of them.

Personally, I don’t see the issue with it if you’re at least being a little careful— don’t make obvious mistakes like installing latest/nightly packages automatically, etc.

Yes, that's the standard approach, but the base images are frequently updated. If you really want to pin at a specific image, you need to specify the image hash, rather than using the "latest", or even a version tag (e.g. "2.1").
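For reference, a digest pin looks like this (the image name is an example and the sha256 value is made up for illustration):

```dockerfile
# A tag like :2.1 or :latest can move; a digest cannot.
FROM python:3.9-slim@sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
```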

In your image that extends from the base image, you'll typically update the package repo cache (it is typically cleared after building the base image, to reduce the size), then install whatever packages you want.

Like you, I don't see a particular issue with updating system-level packages - especially from a security standpoint.

I never understood why commands like apt-get don't take a version with the dependency name.

They do. You can do `apt-get install package=version`

As far as I'm aware it's supported (something like `apt-get install virtualbox=5.0.18-dfsg-2build1 -V`). It's just not commonly used, because you usually just choose a distro with the desired update granularity (whether you want the newest version out there, or a consistent version with backported security fixes, or something in between).
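In an image build that looks something like this (package name and version string are illustrative, and the pin only holds as long as the repo still serves that version):

```dockerfile
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl=7.68.0-1ubuntu2 \
 && rm -rf /var/lib/apt/lists/*
```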

Keeping apt-get in our container builds is vital, since it helps keep CVEs out of our containers. We can do automated re-builds of all of our containers weekly and (typically) it lets us keep on top of the CVE game.

And by that you mean that the version of those deps aren't pinned?

You depend on what the repository serves for `apt-get`. You may pin the version, but that still doesn't guarantee you get the same package contents if the package was replaced in the repository without a version bump.

At least for Python, just pinning the top-level dependencies is not enough. If you pin tensorflow==2.4.0, it doesn't pin its required packages; rather, it just defines a range. An example would be that tf will try to get wheel>=0.26 (which was released in 2015 and is currently on release 0.36.2).
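One common workaround is to freeze the fully resolved tree into a constraints file and install against it (file contents and versions below are illustrative):

```text
# requirements.txt — only the top-level pin
tensorflow==2.4.0

# constraints.txt — generated once with `pip freeze`; pins the
# transitive packages too
wheel==0.36.2
numpy==1.19.5
six==1.15.0
```

Installing with `pip install -r requirements.txt -c constraints.txt` then resolves every package, transitive ones included, to the frozen versions.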

What's the correct way to get libraries into your docker container? We have plenty of apt-get / pip calls in the dockerfiles at my work (set up by someone else).

Not if you specify the versions to install, at least

I don't know what could be more streamlined than clicking "Code -> Open in Codespace" on GitHub, and Visual Studio Code renders in your browser, with all of your dotfiles set and dependencies loaded to mess around with something.

(GitPod does a similar thing where you can append the URL to gitpod.io, but they can't use the VSCode extension marketplace).

Codespaces is in private beta. I have requested access for some time now and got no response.

All the services I mentioned are fully open for everybody today.

Unless, unimaginably, you need to keep your code on-prem...

I know, I know. Unthinkable.

I haven't tested, but I think gitpod works with on-prem installations.

Yes, Gitpod is open-source and can be self-hosted: https://github.com/gitpod-io/gitpod/

Site seems down, here is the archived version http://web.archive.org/web/20210121151535/https://blog.feabh...

One slight problem I had with .devcontainer in VS Code was running the devcontainer on a remote ssh server.

Remote SSH works. Local devcontainer works. But mixing the two requires configuring the docker engine settings to point to the remote. This forces other projects to also run on the remote machine.

This was a problem as of 2 months ago.

This is what I used (combined with stackoverflow posts) to try it.

Is it not possible to set the Docker host to the remote machine under your project workspace settings only, instead of in global settings?
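For example, something like this in the project's `.vscode/settings.json` (a hedged sketch; the host value is illustrative, and whether the Remote-Containers extension honors a workspace-level `docker.host` is exactly the open question here):

```jsonc
{
  // Only this workspace would talk to the remote engine
  "docker.host": "ssh://user@remote-host"
}
```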

Looks like gitpod to me: https://github.com/gitpod-io/gitpod

Throw in Azure with k8s and the MS sales department will organise a "training" for the managers in the Caribbean or on Ibiza.

This is the killer feature that keeps me on the proprietary version of VSCode. Makes my life so much easier!

Do you know the acronym ABM? If you're not a native German speaker, probably not. Arbeitsbeschäftigungsmaßnahme (roughly, a job-creation scheme). Docker, k8s ... it's all there to employ the unemployed. Also to make something simple complicated so cloud providers can monetize even this part, lock you in, and bill you tenfold for something that really isn't that expensive.

Why don't we develop on the production server???

Containers promote bad software development.

For example?

> Docker + "Moving Software Development Forward"

Oh! My sides!

This feels like a case study in how to take an antifragile system and just make it fragile.

As much as it is a bunch of small paper cuts to support varied developers, you are more resilient to a big shift.

Assuming wordpress/php/mysql/shitball happened

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact