VSCode, Dev Containers and Docker (feabhas.com)
178 points by ingve 36 days ago | 228 comments

Does this deprecate my vim/gcc/gdb/make setup (in general, a "CLI-driven workflow"), which I have put so much time into in order to have a nice vendor/IDE-independent solution for development?

I'm curious to know the answer in an honest/practically-speaking sense not ideologically.

IMO, the downside of this container/web-app solution is the storage that all the SDKs would need, and this adds up eventually. But I'm not sure this fact alone could win any hearts over that sweet ease of entry to development.

People were complaining about latency and bloat in webapps and Electron GUIs... yet here we are... even SpaceX's console is a Chromium instance.

I don't think so. In fact, you could just package your "CLI driven workflow" up into a Docker image, and now you can instantly move it between machines, peers, etc...

Furthermore you could version and manage the evolution/drift of your workflow as underlying components change/get updated.

Exactly: I have CI produce a Docker image from my dotfiles repo https://bergie.iki.fi/blog/docker-developer-shell/

Hey that’s cool! I do the same thing! I use the Docker image as a way to use my development environment in a pinch and additionally as verification that my giant one-click setup script works on a fresh machine.


If needed, you could probably containerize that workflow as well. Conversely, if someone shared their containerized environment with you, you don't necessarily need to use VSCode. You can still just attach to the container directly using Docker.
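For example, attaching a plain shell to a shared container needs nothing beyond the Docker CLI (the image and container names here are made up):

```shell
# Start the shared dev environment in the background
# (my-team/dev-env and devbox are placeholder names)
docker run -d --name devbox -v "$PWD":/workspace my-team/dev-env:latest sleep infinity

# Attach an interactive shell; vim/gcc/gdb inside the image work as usual
docker exec -it -w /workspace devbox bash
```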

It actually makes your workflow more powerful as you can package it in a container and other devs can start using it in no time.

I have never managed to get a step-through debugger working with PyCharm. I know it's possible, but things like this are why devs won't be able to "start using it in no time". Docker solves one set of problems and introduces others.

Yeah, unfortunately the JetBrains IDEs don't really work with this workflow yet. I'm hoping one day they will ...

I saw online tutorials showing you how to set it up and wasted a few hours not managing to make it work.

Probably not, but your vim/gcc/gdb/make flow doesn't scale to other developers. If it works for you, by all means keep using it.

You can still have a shared base image for your teammates, then use that base image to create a new image just for your .vimrc etc.
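As a sketch of that layering, assuming a hypothetical shared `team/base-dev` image, the personal layer is just a short Dockerfile:

```dockerfile
# Personal image on top of the team's shared base (names are illustrative)
FROM team/base-dev:latest

# Bake in personal dotfiles without touching the shared image
COPY .vimrc /root/.vimrc
COPY .gitconfig /root/.gitconfig
```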

I meant more that by modern standards, vim/gcc/gdb/make is a very labor-intensive toolchain to write applications with.

"Doesn't scale" in the sense of "other developers are pretty unwilling to learn that chain if they didn't grow up with it."

> "other developers are pretty unwilling to learn that chain if they didn't grow up with it."

Funny, I've been running into this problem while trying to switch people to docker. Maybe we're all a bit guilty of this in our own way.

I've been playing around with dev containers for a while now and am loving the experience so far. The one thing I'm not super clear on, however, is doing end-to-end browser-based testing in web apps. I didn't find a lot of good documentation on this. I can't tell if the "correct" solution is to also load something like headless Chrome into the Dockerfile and install it manually, or to go down the path of setting things up in a different container with a docker-compose approach.
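A hedged sketch of the first option, baking a headless browser into the dev container image (the base image and package name vary by distro; `chromium` is the Debian name):

```dockerfile
FROM debian:bullseye-slim

# Install a headless-capable browser for end-to-end tests
# (on Ubuntu the package may be chromium-browser, or a snap, instead)
RUN apt-get update \
 && apt-get install -y --no-install-recommends chromium \
 && rm -rf /var/lib/apt/lists/*
```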

We found website testing inside a Docker container to be tough, so we still run Cypress on a native machine, but our web process still runs within the Docker container. In CI, we use the default Cypress GitHub action, which runs on the CI native system, and all of our servers are spun up with docker-compose. Source: https://github.com/NeonLaw/codebase/blob/development/.github...

A headless chrome works for testing. You can even open GUI apps (at least with X11 on Linux) from the container itself, and it will use the host's X session (chrome --no-sandbox).
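A minimal version of that X11 pass-through, assuming a Linux host and a hypothetical `gui-image` with Chrome installed:

```shell
# Allow local containers to connect to the host X server
xhost +local:docker

# Share the display and the X socket with the container
docker run --rm -it \
  -e DISPLAY="$DISPLAY" \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  gui-image chrome --no-sandbox
```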

I've been using a containerized Rails app with VSCode for a while now and absolutely love it.

You wouldn't happen to have an example Dockerfile you could share at all, would you? The examples I have come across online so far didn't look super straightforward.

Doing any sort of development with Docker on OSX can be painfully slow if you have a lot of files that change frequently. There are a few projects to improve it, but they aren't quite there yet. It's my least favourite aspect of modern web work.

To make it workable I had to just give up on the local host file integration.

I basically used the container like a VM. Configured it with all the tools I normally use (e.g. OhMyZsh, etc) and had it constantly running in the background. I would use VS Code as a front end and work directly inside the container (cloning repos and pushing commits).

It had its quirks but the main benefit was that my local machine was no longer a snowflake. I could easily move to any machine, pull my "development" image, spin up the container and everything was exactly as I liked it.

What would be the advantage of this setup over a traditional VM where you presumably have more portability of the stateful image?

Since they're lighter weight it's easier to run more of them. Think of a place with 7-8 different applications, a few different DBs to support them, redis, elastic search, etc. You can spin up a mirror of your production environment with one command. Theoretically you can do the same with VMs but it will consume a lot more resources.

This is only true of Docker on Linux hosts.

Even on Windows/Mac you're probably better off running 1 VM and then docker inside of it.

Isn't that essentially what mac Docker Desktop does?


Speaking for myself, containers are started and destroyed faster. When using VMs, the tendency is to keep updating the software within the VM, making changes to its state, etc.; this eventually leads to drift. When using containers, if you need to make a change, you destroy the container and re-build it, and the state is always consistent with what you (and possibly your teammates) use.

In addition to this, we've found it easier to spin up the same exact environment for CI/CD.

To be perfectly honest, in this scenario, there isn't much of an advantage.

The container approach is lighter weight, and I found it easier to manage the configuration via Dockerfiles. Managing a full VM with the OS install is a bit of a pain.

That being said, I worked at an organization that did the VM approach using Vagrant. It wasn't as nice as the VS Code/Docker approach, but the results were similar.

Have you tried docker on Windows? It's even worse.

On OSX I only had problems with GUI running in docker. I was used to sharing X between linux host and docker container also running linux.

For some projects, the only working solution I found was to run a VNC server in docker. Specific example: in a docker container, run a GUI built with Kivy and view the window on the OSX host. If anyone manages to do this without VNC, I'd like to know how!

> Have you tried docker on Windows? It's even worse.

I've been running Docker on Windows since Windows 10 1709, or roughly the time WSL 1 came around. That's since October 2017.

It's been really fast and stable here and now with WSL 2 it's even better.

There hasn't been a single Flask, Rails, Phoenix or Webpack related project I've developed in the last 3+ years where I felt like Docker was slowing me down[0]. I'm using a desktop workstation with parts from 2014 too (i5 3.2ghz, 16gb of memory and a 1st gen SSD). About a month ago I made a video showing what the dev experience is like with this hardware while using Docker[1].

Code changes happen nearly instantly, live reloading works and even Webpack happily compiles down 100s of KBs of CSS and JS in ~200ms (which could further be improved by using Webpack cache).

[0]: The only exception to this is invalidating cached Docker layers when you install new dependencies. This experience kind of sucks, but fortunately this doesn't happen often since most changes are related to code changes not dependency changes.

[1]: https://nickjanetakis.com/blog/for-the-time-being-16gb-of-ra...

Also been using Docker on Windows 10 since around the 2016/2017, and also had no issues with speed - it works marvellously!

The problem is "technically" still there in Windows it's just that Microsoft decided to push things along by creating WSL which essentially puts Linux (with Linux Containers) on Windows.

The Windows solution is the equivalent of Smart Hulk figuring out time travel.

I'm not saying it's not good, because it is.

I have been using Docker powered desktop environment for development on WSL2. Didn't have any issue except that I couldn't access the containers via their internal IPs inside WSL2.

> Have you tried docker on Windows? It's even worse.

In my experience, Docker on Windows with WSL 2[1] has been pretty snappy.

[1] https://docs.docker.com/docker-for-windows/wsl/

I wonder what the stats are for the VSCode remote plugin's users, but I would wager a cup of water that the majority of users who go this route are dealing with Windows in some fashion. I'd never heard of such witchcraft until I moved to an organization that, for whatever reason, has deemed that Windows laptops are the mandatory choice for web development.

WSL does have significant problems, and even when it works you deal with the idiosyncrasies of the magic it does to interact with the Windows environment: try installing Docker solely in your WSL2 environment and you'll find it still looks for an .exe executable, or have fun figuring out why something trivial doesn't work like it used to because it depended on systemd, which is purposely absent from operating systems you might install on WSL2. In truth, using WSL2 is a trick, and it can add headaches for anyone who doesn't want to deal with the added abstraction of running another operating system that interacts with your native operating system in ways neither was originally built for.

That's not to say that WSL2 isn't impressive. It has the "aha" that Parallels, CodeWeavers, or Wine (to name a few) all bring when they work. And when they don't, it's also time to get lost in the weeds. But right now, especially in its infancy, WSL2 has problems. Why wouldn't it? It's new to Windows' offerings and relatively obscurely tested, given that its audience is a small fraction of what Microsoft is up to.

This solution of VSCode + Docker containers seems to sidestep the whole WSL issue, as WSL is no longer necessary for development if you're containerizing everything anyway. While, I must admit, I like the idea of a project having the same steps for all users regardless of platform (to each his own), I don't believe the majority of people would like to ditch their IDE of choice for the one tool that does this somewhat seamlessly. I'm probably not characterizing this well, as I'm new to the workflow, and I like to customize my own terminal workflow - which, from what I've seen, this doesn't lend itself well to. Lemme just say it - I like the Linux CLI workflow more than Windows and have since the beginning. But that's just me. I'm sure there are plenty of peeps who feel the opposite. What I don't like, and I feel I share this annoyance with others, is now having to know both the Windows and Linux command-line interfaces.

Yes you’ve stepped into a rant, gotcha!

Here’s to hoping Microsoft goes all in, strips out Windows, and maybe goes the Edge route: shifting to putting a pretty face on Linux. That would be awesome. I’d buy Microsoft’s distro - frickin' base it off Debian and let’s get this show going! Apple ain’t really doing anything special at this point - so your move, Satya!

“I’d wager a cup of water”

I like this phrase. I would like to borrow it if you don’t mind :)

Me recently getting into Dune has altered my perception...

WSL2 and Docker Desktop, running it for a couple of years; so far so good.

Agreed - try running webpack in debug mode with watch; it's basically unusable, as are git operations and installing npm packages.

npm / webpack: That's not Docker's fault - that's node/JS's fault. Having to touch thousands and thousands of files is expensive. This is why I've come to loathe web frontend development.

It's definitely Docker that's the problem. It's a well known problem, and people are working on solutions. It affects far more than NPM and Webpack.

> I've come to loathe web frontend development.

Well, we don't have to do things this way just because everybody else does... developing "vanilla" Javascript is actually a pleasure especially with tools like Chrome's debugger. In my experience, front-end "frameworks" like Angular and React provide very little value and tons of unnecessary overhead.

Show me the team that's using vanilla JS, I'd like to sign up.

Same problem with PHP development: anyone trying to develop on Windows for any real-world framework will have epic IO lags, sometimes crashes with composer (the equivalent of npm), and also slowness at execution time. Is this composer/PHP/Symfony-Laravel-whatever's fault? Is it the caching lib's fault?

I work on Linux, but try to onboard Windows devs frequently ("Windows dev", ikr ;). The experience is always painful, and despite many efforts from the WSL team to move things forward, Docker for development on Windows is still barely usable right now ...

I use VSCode to connect to a cloud VM. VSCode is surprisingly good at this. The client is local and connects to a VSCode server over SSH.

Granted I work on Azure and the cost of the vm is not something I have to worry about.

We also use remote dev boxes, but can VSCode connect to a remote server and connect to remote docker containers?

For example, I don't have ruby installed on the remote dev box, but it is installed inside docker on the remote host. I also don't have ruby or docker running locally.

I think all the linting plugins either expect ruby to be available on the remote host, or inside docker, but not this combo... Is there something I'm missing? (disclaimer: only played a bit with VScode, I use vim on the remote box usually).

> can VSCode connect to a remote server and connect to remote docker containers?

Yes, it can. https://code.visualstudio.com/docs/remote/containers-advance... talks about setting it up.

Thank you! looks cool. Hope I’ll figure it out, considering I’m pretty clueless about VSCode

Yes. Vscode (server) is running on the remote machine. Extensions run on the remote as well.

A new terminal in vscode is a terminal on the remote.

Only the client GUI is local.

Thank you. I managed to get it working... still I'm not sure about things like ruby-rubocop ... I can install it remotely, but not inside a remote docker container (where my ruby runs)... Any tips?

Yes, if you can connect the docker cli to the remote host, then vscode can connect too.
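A sketch of that connection, using Docker contexts over SSH (the host name and user are made up):

```shell
# Point the local docker CLI at the remote engine over SSH
docker context create remote-dev --docker "host=ssh://me@devbox.example.com"
docker context use remote-dev

# docker ps / docker exec now target the remote host, and VS Code's
# "Attach to Running Container" goes through the same context
docker ps
```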

The Azure Codespaces VMs were awfully slow on I/O. It was not a great experience at all.

If you want decent I/O, better than an old notebook, Azure is extremely expensive.

I do this, but with a Linux VM running in Parallels on my Mac. Uses about 1/10 the CPU that Docker on Mac uses for the containers I run.

Same here, but with Hyper-V on Windows.

Originally I did this because of bad performance and bugs in WSL 1. I hear WSL 2 is better but I already have it all set up and it works great for me so I've just kept it.

There's a volume feature that sped up disk access to mounted volumes quite a bit. You add ":cached" to any volume. This changes some of the guarantees around consistency, but I haven't had any issues. See https://www.docker.com/blog/user-guided-caching-in-docker-fo...
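For example, appending the flag to a bind mount (the image and paths are illustrative):

```shell
# :cached relaxes host-to-container consistency guarantees;
# writes may be briefly delayed, but reads are much faster on Mac
docker run --rm -it -v "$PWD":/app:cached node:16-slim bash
```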

Not sure about OSX, but in Windows 10 + WSL 2 Docker development can be made faster by cloning code directly to a volume.

Isn't Docker using virtual machines on macos?

The solution to this is to enable NFS for your volumes. The default volume configuration is a huge bottleneck for Docker on macOS.

Although be aware, if you add shares to nfs.conf, your Mac will no longer sleep while connected to power, even if you close the lid.

Not sure if there's a way around it, but I was sitting down at my desk every morning to find a toasty laptop, lid closed and fans blaring.

A little bit off-topic, but I hope it's relevant enough: can someone who's well-versed in Docker give me some pointers as to how I can use it in a better way? Let me elaborate.

I'm a bit of an old fart when it comes to software development. I prefer stable, slowly evolving solutions. I am a fan of the role of classical distributions. I abhor bundling every piece of software with all its particular versioned dependencies until everything works. I'm not gonna change. And that means I'm probably not the type to use Docker to deploy anything.

That being said, I do see great value in it as a way for sysadmins to let semi-trusted people run their own OS on shared hardware without stepping on each other's feet. I love that it lets all of us run each our OS of choice on our compute machines at work. But that's just it: when I use Docker, I pretend it's a VM.

I really would like to learn to use it in a better and more appropriate way, but whenever I try to seek out information, quality search results are absolutely covered in garbage 10-second-attention-span "just type this until it works" blogposts.

Any pointers?

(Alternative question: are there some cleaner solutions than Docker out there for the workflow I describe above?)

You are confusing two different things.

One is using Docker as a deployment packaging method. The other is using Docker only for development and still deploying traditionally. Sure you can do both, but it doesn't have to be this way.

>when I use Docker, I pretend it's a VM

Also check anti-patterns 1 and 4 here


This is an important distinction and one that sometimes gets lost in the back-and-forth over Docker.

I feel like a lot of the complaints levelled at Docker pertain to the packaging and deployment use case. Where Docker really shines - even for small teams or solo devs - is as a development tool.

Anti-pattern 1 perfectly describes how I am (hamfistedly) using Docker. Thanks! However, it doesn't really explain how to fix my mindset. It just says that I should ;-)

Could you give me some hints?

"There is no easy fix for this anti-pattern other than reading about the nature of containers, their building blocks, and their history (going all the way back to the venerable chroot)."

Basically learn the basics (cgroups, namespaces)

You should also study this https://github.com/p8952/bocker
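To get a feel for those building blocks with stock Linux tools (requires root; a very rough sketch, nowhere near a real container):

```shell
# A new PID + mount namespace with its own /proc: the shell believes
# it is PID 1, much like a container's entrypoint process does
sudo unshare --pid --fork --mount-proc /bin/bash -c 'echo "my pid: $$"; ps aux'

# cgroups then bound what that process tree may consume, e.g. by writing
# its PID into a group under /sys/fs/cgroup (path varies by cgroup version)
```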

> Basically learn the basics (cgroups, namespaces)

I think I have a basic grasp of those things, but still don't get how Docker uses them.

> You should also study this https://github.com/p8952/bocker

Cool! That's very useful!

You have an increasingly rarefied perspective, and you should hang onto it.

Docker is a treadmill. Docker leads to Docker Compose leads to Kubernetes leads to whatever. It's a lot of noise and motion; you will increasingly encounter engineers who grew up on this stuff and assume it as a prerequisite, and are eager to climb the treadmill, thinking it's a ladder. You know about other options, and can decide when to stop.

Yeah, I appreciate this. And since I'm a researcher rather than an engineer, I'm also not forced to get on the treadmill. But like I said, I do see some value in using Docker as a tool to let a bunch of researchers run wild as semi-trusted superusers (by "semi-trusted" I mean we can be trusted not to purposefully do evil, but not trusted not to accidentally step on each others' feet). So I'd still like to learn a bit :-)

Try thinking of it as `git` but for an entire OS instead of merely a filesystem, and that you have multiple branches of your git tree opened simultaneously in different locations.

Or think of it as a set of databases that include the combination of the current state and the code ("migrations") to achieve that state, while allowing those databases to share the same history.

In other words, containers are a solution to state problems, not only "works on my machine" problems. The benefits are reproducibility and shared resources. It is "functional OS" as in "functional programming": a pipe of common operations applied to inputs to generate a consistent output, which can be forked anywhere in the history/pipeline as long as the pipeline does not hide held state.

Coming back to the `git` analogy, a traditional VM with Snapshots is like saving ProjectV1, ProjectV2, ProjectV3, ProjectV2_fixed, ProjectV2_fixed_final, while a container solution is saving only the history and places where histories diverge.

To answer your alternative question, nix package manager (which can also be run as a standalone OS, NixOS) is an interesting alternative solution from Docker. Reading its documentation may also help in appreciating the alternative set of perspectives.

Yes, similar idea, except a change in a dependency in nix triggers a rebuild of all dependents, whereas it does not in ostree.

In Docker...it depends.

I'm not quite sure that I understand the advantages of doing development work inside of the container. What am I missing?

Working with a team on a project with a moderately complex build process (say python and node), it soon becomes painful to onboard new team members and get their dev environments set up.

Even assuming they have the same version of macOS, homebrew constantly evolves so getting everyone developing with the same version of dependencies as you use in prod is super painful. If they’re on Linux they likely don’t run the exact same distribution as prod.

Even if all that is the same, maybe they have to work on multiple projects simultaneously with different dependencies.

Python and node are probably the easiest to get set up. With Python you have venvs, which do a lot of the heavy lifting required.

The issue arises when you are on a non-Linux environment for development and deploy to Linux in containers. If you build your code locally, then you are debugging $DESKTOP issues which might not be the same as on Linux.

Also, language environments with poor dependency management (e.g. C, C++) benefit from having an installable system.

> Even assuming they have the same version of macOS, homebrew constantly evolves so getting everyone developing with the same version of dependencies as you use in prod is super painful.

If someone is using homebrew for their development dependencies, unless they are targeting a release to homebrew, kindly ask them to stop doing this.

For me, the best advantage of working with containers is the initial setup for new colleagues, especially junior devs.

Setting up pyenv, the correct version of Python manually, nvm, postgres, etc. takes time, and our setup guides have grown a lot in the past 5 years.

With containers, all those long setup guides can boil down to `docker-compose up -d`.
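A minimal sketch of such a compose file (service names, ports, and versions are illustrative):

```yaml
# docker-compose.yml
version: "3.8"
services:
  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev-only
```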

Onboarding really does seem to be the major pull. I've been developing alongside Docker containers for years now and I just don't see that changing.

I do see value in it, though, after the comments and after doing some additional reading. I could see myself pushing this direction if the teams I worked on had higher churn rates or more frequent new hires. Also, I think that it lends itself well to certain tech stacks / languages / ecosystems than others.

There are many advantages of containerized development. One example would be the protection of root environment from version pollution. My team uses and supports 3 versions of a framework, how do I test and develop in all of them without one environment affecting the other?

Of course, you could use a version management tool like nvm/rvm/asdf etc, but I think containerisation is also a neat alternative, as you would be able to use multiple versions for languages/tools and libraries within those languages for which a version management tool doesn't exist.

Why would you be doing development as root?

Personally I would use virtualenvs with Python to solve that problem you described.

There are still system-level packages that you might need, which may come into version conflict with each other.

Note: I too default to virtualenv for local development; however, there are use cases where this becomes insufficient.

I think root refers to the main system environment, not the privileged user.

"3 versions of a framework" is quite vague. Are those versions Linux/Windows/Mac? Because then Docker is not very useful.

Docker is not a general solution to this problem because it ties you intimately to Linux, whether directly, or through VMs, or compat layers.

Let's say you maintain or occasionally code several big webapps in Ruby. One uses Ruby 2.4 and Ruby-on-Rails 3.2, another uses Ruby 2.6 and Ruby-on-Rails 5.2, and the last one is bleeding edge using Ruby 3.0 and Rails 6.1

Having all versions of your language and framework installed at the top level is a huge pain in the ass, since they inevitably will interfere with each other. Having separate containers with all the necessary dependencies for each app is a lot more manageable.

rbenv actually solves this quite well. You can set a project-level version in a .ruby-version file, and each lang version has its own gem cache
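For example (the project name and version numbers are illustrative):

```shell
cd my-rails-app
rbenv install 2.6.10           # once per machine, per version
echo "2.6.10" > .ruby-version  # committed with the project
ruby -v                        # resolved per-project via rbenv shims
```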

Yeah. I have over a dozen Ruby on Rails apps on my development machine, targeting a variety of Ruby and RoR (and other common gem) versions. I'm not aware of any problems anyone has experienced in recent years, since everyone has moved to rbenv. We used to see issues with rvm.

I don't use docker with any of these projects. They're mostly legacy for us at this point and are shipping to EC2 instances directly.

Our more current projects do use Docker, however, and we're doing development alongside Docker in those instances, and that seems to be working fine for us.

I do appreciate that a dev container would / may be a better approach to this for other reasons and especially, potentially, other languages and ecosystems, though.

I used to work at a company where I'd write and maintain multiple microservices, so I was dealing with dozens of small and big ruby applications across different language versions and different frameworks and dependencies. It was two years ago, but I did experience a ton of issues between different versions of Bundler and rake and rubygems. I'd absolutely go full docker if I was still working there.

I work with several programming languages; it's easier to make the problem language-agnostic.

My team uses macOS, windows and Linux. Dev Containers not only allow us to have our dev envs and prod environment as close as possible, as well as making all our dev envs identical, it makes working cross platform really straightforward.

If you work in enterprise 80% of developers have to rock windows laptops for compliance reasons but develop applications that will run on redhat/centos in the end.

It's not so much that it's a container, but rather it's Linux. The host OS isn't Linux.

I understand this can be done. And this post explains how it's done. I still don't get why it should be done.

What's the advantage of running your dev environment in a container?

If your project is targeting an environment you don't control, and doesn't behave exactly like an environment you have exclusive access to, your code won't run the same way in the production environment as it does on your machine. There are different strategies for dealing with this, but Docker seems to be the sweet spot for making it easier. You use Docker to create a container that mimics the production target, and then use that container to develop in.

You could reconfigure your laptop to look exactly like the production target, but then you have to keep doing that every time you change projects.

Before Docker, I used VMs for this, but VMs have certain disadvantages that Docker addresses. Like size, and documentability. Every time someone wanted me to look at a project, we had to figure out a way to transfer and store a copy of a 20+ GB VM. And they couldn't tell me everything they did to create that VM from scratch, because VMware doesn't do that and neither does Hyper-V. With Docker it is just a small text file that describes everything it takes to create what was previously a massive, undocumented VM image. It forces you to document how to create the environment, and it saves on the space and time of transferring VMs around.

I use Windows 10 as my daily driver, and often work on projects using Postgres, RabbitMQ, Redis, nginx and others.

Until Docker came along, it was a royal PITA. I always dreaded getting a new laptop or something breaking, as it took forever to set everything back up again, and it was never quite the same.

Docker changed all that. It forces you to configure everything in a reproducible way in a Dockerfile - and it's much simpler than trying to come up with scripts to install and configure everything in Windows, and I'd say it's also quicker than trying to come up with scripts for a Linux VM, just because you can spin containers up and down so quickly.

Docker has been a game changer for dev/test.

What exactly is difficult to understand about the value proposition? Seems pretty obvious to me...

If you mean why use dev environments in containers:

1) reproducibility,

2) re-use of container creation scripts for different environments,

3) isolation from your actual OS,

4) ability to run the same OS/libs/etc as the final deployment,

5) tons of base images with different environments already configured - from LAMP to data science,

6) easy sharing with others, team

7) ability to work with 1-2-5 or 100 different environments, with different OSes, versions, libs, python versions, whatever.

As for why have your IDE/editor work from inside a container (as described in the article).

Well, because you get all the benefits of containers (as above) PLUS get to use the editor as if you were programming directly on the target machine (including having visibility of installed libraries for autocomplete, running code directly there, and so on).

It's the same thing people have been doing with running Emacs in "server" mode inside another host, and programming with Emacs client on their machines as if they were locally at the machine.

> 3) isolation from your actual OS

For me, this is the major benefit. I don't have to worry about installing new tools or libraries and how they interact with my primary OS. While this isn't a huge issue for many, I don't want to have to worry about how Go, Python, Java, etc... are installed on my Mac. I like being able to pull in a Docker container with everything already setup (or a customized one). Then when I throw away a project, I don't have orphaned installations on my Mac.

All of those benefits were solved problems with VM based workflows using something like Vagrant long before Containers started gaining traction (IMO).

Don't get me wrong, I am a fan of containers (in particular LXC), but I wouldn't list those benefits as if they are unique or novel to container based workflows.

Edit: To be clear, there _are_ benefits to containers over VMs, just not the things you listed above from my perspective.

Overhead... it's all overhead. Starting and connecting to a VM isn't nearly as streamlined as connecting to a Docker container (even if it is also running in a VM).

Even the hurdle of SSH'ing to a VM is more cumbersome than `docker run`.

Certainly this could be automated and scripted, but the Docker solution is so... streamlined.

And I say this as someone that used to use Vagrant. With smaller installs available (such as Alpine), maybe more modern VMs would be just as easy as containers...

Also -- for me, VirtualBox was just "meh". It worked, but really wasn't that great. It always seemed like it took too many resources to run. That was another issue with Vagrant. (And yes, I did also use Vagrant with VMWare, but again -- that's a lot of overhead).

I can't name a more dysfunctional hypervisor than VBox. If my choices were a Virtual Box VM or superhacky Docker-as-VM, I'd pick Docker every day.

But I'd say VMs and containers solve different problems. In the case of dev environments, VMs are too "persistent" and accumulate personal cruft very quickly. Container tooling can be built to be noninteractive.

Yes, a lot of containers start in less than 1 second. With SSDs, VM startup isn't as bad as it used to be, but it's still a lot slower than a container.

I also find the workflow of creating Dockerfiles to be much smoother than cobbling together scripts for a VM.

Plus Vagrant was a real PITA to get working on Windows (at least it used to be - I think I eventually gave up trying to get something running on Windows 7).

Schlepping around and running a whole VBox/Vagrant VM is heavy enough that I will be motivated to see if I can’t get it working under MacOS. Running a container is less offensive.

Isn't the Docker server running inside a VM though? You could also run an X server (available on both Mac and Windows) and ssh -X into a VM to run GUI apps inside the VM.

>Is not docker server running inside a VM though

No, docker server runs directly on top of the OS as a native program.

As for Docker containers managed by the Docker server, they are running on top of a supervisor - not in a full VM.

On Windows Docker runs in a VM.

That's wrong:

"With the latest version of Windows 10 (or 10 Server) and the beta of Docker for Windows, there's native Linux Container support on Windows. That means there's no Virtual Machine or Hyper-V involved (unless you want), so Linux Containers run on Windows itself using Windows 10's built in container support".

In any case, as Linux and macOS prove, there's no need for docker to have to run on a VM. And it seems there's no need on Windows either since 10.

You’re quoting a random blog post from Scott Hanselman and not Docker release notes.

I don’t know what happened with that but I was not wrong and Docker for Windows does still run in a VM: https://docs.docker.com/docker-for-windows/install/

I assume you don’t use Docker on Windows and just pasted the first google result?

Vagrant could have made it but it didn't. Seems like the cloud providers leaned into docker much more heavily.

>All of those benefits were solved problems with VM based workflows using something like Vagrant long before Containers started gaining traction (IMO).

Well, VMs are like overweight containers. Containers make "all of those benefits" easier, more performant, and more lightweight.

Consistency between team members.

Super easy for new team members to get started on a project. No need to manually install dependencies.

Environment versioning in git and docker. Your local environment gets automatically updated with a git pull.

> Super easy for new team members to get started on a project. No need to manually install dependencies.

I see this brought up a lot as an argument. So why do we want this? How often do people switch companies? Once every 3 years on average or something? Getting your development env set up takes what, a few hours max in 3 years?

It depends on how large your organization is and what specific quirks need to be configured to do your work.

It's not unusual for large organizations to have internally hosted registries (Artifactory), source control, and network proxies. This usually requires setting up different config files (.npmrc for Node.js/NPM), installing custom root certificates, SSH keys, etc. None of that includes project/team-specific configurations and workflows.

Take all that and multiply it by thousands of developers and you have a recipe for an endless stream of Slack chats, email chains, and Teams messages repeating the same config questions and answers.

If you can reduce all that down to a single docker pull, while making sure everyone's development environment is consistent, it can be a big win.
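As a hypothetical sketch of what that single pull can replace (the file names, certificate, and registry are invented for illustration), a company base dev image might bake the org-specific configuration in once:

```dockerfile
# Hypothetical company base dev image; all names and paths are placeholders.
FROM node:18

# Trust the internal root CA so HTTPS to internal services works
COPY corp-root-ca.crt /usr/local/share/ca-certificates/corp-root-ca.crt
RUN update-ca-certificates

# Point npm at the internally hosted registry (e.g. Artifactory)
COPY .npmrc /root/.npmrc
```

Each team then extends this image instead of re-answering the same setup questions on Slack.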

It can be much more complicated than that.

* You want means to keep all of the dev environments in sync so you don't get "works on my machine" problems.

* If you update something, then you need a way for everyone to have their environment reconfigured.

* As the number of projects/stacks/developers scales this becomes a bigger and bigger issue.

I've used some Ansible in combination with a shell script wrapper to handle some of this kind of stuff. Even still, it takes a lot of hands-on support to make it all work. So, if you can get something like this to scale, it might be a big win ... if...

(edit formatting.)

Ok, now I'd like to update a dependency.

In docker world, I create an MR that updates the dev container and deploy container dockerfiles at the same time, check that it runs tests, and merge it in. I push a new version of the dev dockerfile, and have the .vscode/devcontainer.json reference that new tag. Next time all the devs open up this repo, they'll get notified they need an update. You just updated a dev dependency across the whole group in a source-controlled way.
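For example (the image name and tag here are hypothetical, and the exact devcontainer.json schema has evolved over time), the change in that MR might be as small as bumping one field:

```json
{
  "name": "my-service",
  "image": "registry.example.com/my-service-dev:2021.05",
  "extensions": ["ms-python.python"]
}
```

VS Code sees the new image reference on the next reopen and prompts everyone to rebuild their container.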

What's your way to do it? Email everybody?

Fair point. It still comes up only a few times in a year. Most of the time I work with code bases that have most dependencies defined in some sort of package file (Gemfile, package.json, etc)

Updating a postgres version comes to mind as one of the possible differences and that usually only is an issue when working with pg_dump and pg_restore with different versions.

Good point nonetheless. I am not sure whether it is worth the work to maintain dev containers and the performance hit you get vs running a database directly for example.

> So why do we want this?

Just yesterday, I ran into an issue where a set of node unit tests were failing. My colleague and I were both getting failures, but different failures. The reason was: different versions of Chrome, and thus different versions of the Chrome integration plugin.

Given that we have effectively no control over Chrome's auto-updates, we'll never have truly identical development environments. A container with headless chrome would have resolved this for us.

But your customers still have widely varying versions of Chrome. So while your test might work, your code is still broken. Or did I misunderstand?

It's purely a test fixture setup that's failing because of a need for lockstep Chrome and the Chrome test fixture versions in Node. So, the code isn't broken for customers, just the local testing story.

It depends on your context. I've worked places that onboard new team members every few years, and I've worked places that onboard new team members every few weeks.

On larger teams, setting up a consistent environment like that also makes it easier for developers to collaborate. I've had experiences where attempts to pair program or share utility scripts generally stumbles and fails due to everyone's environment being a special snowflake.

But you can easily switch projects within a company, take a look at the one in the next cubicle, etc. You switch machines. You decide to work from home, etc.

I like the fact that I can switch to/from different computers quickly. I have a desktop and a laptop that I routinely use. If I change something in a devcontainer[1], then I can move between the two easily. This actually happened to me earlier in the week.

[1] syncing the changes is left as an exercise to the reader, but I use a common git repository.

Not how often people switch companies but how often people work on different projects. Someone working on multiple projects can waste a lot of time keeping up with the environment of each project.

Sure, that sounds like a good reason. Thanks.

> I see this brought up a lot as an argument. So why do we want this? How often do people switch companies?

It's not about you. The people coming in generally need confirmation and help when setting up their environments. Someone there would have to take time out of their day to help you. A few hours, a few days, a few weeks, a few deleted companies (https://news.ycombinator.com/item?id=11496947).

Here's an even more fun one: https://news.ycombinator.com/item?id=14476421

It also makes it easy to make changes and get it out to everyone on the team in an automated way.

Someone else raised the same point and I responded a bit differently, but I don't really have any experience with lots of these changes that really matter in many projects. Sure sometimes a version of imagemagick is fucky, but that happens so infrequently that I don't think it justifies running everything in docker.

I think you're really sleeping on the benefits of being able to spin up and change an environment easily.

It's easy to load a container with some stored test state. It's easy to load a completely fresh environment. It's easy to run multiple instances of things (with docker compose).

It's easy to totally shut down the environment.

It's easy to work on different branches with different/conflicting dependencies and juggle containers.

That's how we do it in my company:

- Clone project

- Build container

- Develop

If you're in a polyglot shop there are HUGE productivity gains in not needing to setup your environment manually, or worse, risk that vital information about it is distributed as tribal knowledge.

Plus, if your project has external dependencies like DBs, S3, etc... you can use docker-compose with VS Code as well.

Can only second this, we use a VSCode devContainer-based setup in all of our projects and even migrate our legacy projects to it (software agency).

Here's our current base go template, you only need Docker+VSCode on your system to get started: https://github.com/allaboutapps/go-starter

Bonus points:

* As all IDE operations solely run within the local Docker container, all developers can expect that their IDE will work the same without manual configuration steps.

* We can easily support local development in all three major OSes (MacOS, Windows, Linux) and even support developing directly in your Browser through GitHub Codespaces.

* Developing directly inside a Docker container guarantees that you use the very same toolset, which our CI will use to build these images. There are no more excuses why your code builds differently locally versus in our CI.

Edit: format/typos

Does that force everyone to use VS Code only? Can non VS Code developers also work efficiently?

You could always run something like Vim/Neovim directly in the container.

I, personally, hope that JetBrains comes up with something similar which will allow devs to use the same workflow with the JetBrains IDEs.

yep, currently it forces our engineers to use VS Code or a terminal-based setup (or GitHub Codespaces). Hopefully GoLand / IntelliJ catches up this year...

If you have projects with different versions (python2/3, Java5/8, node8/11 etc) you don't need to bother with version managers and package managers anymore.

Just launch the respective container and you are good to go.

for python, why can't you use virtualenv with pyenv?

Because it is a Python specific thing. Docker works with all programming languages and that's it.

If you are full stack or work with multiple programming languages there is no need to learn the "equivalent" of virtualenv everywhere else.

Also with docker there is no setup/installation involved. You just pull the image and that is it.

Also virtualenv requires that you already have pip/python installed. Docker requires nothing (apart from itself). So you can instantly launch Java/Node/Erlang/Haskell whatever without any SDK/libs installed.

Second this. Developing in Docker completely liberated me from the caprice of Python packaging and Node dependency management and made doing development work on multiple platforms (MacOS, Ubuntu) frictionless.

There’s a “pets vs cattle” angle here, too. Something goes wrong, just pave over it and start again.

YMMV, of course, but I can’t imagine going back.

I think every programming language that I have used has had something equivalent.

Let's assume that this is true (I don't think that languages like C/C++ have this but I may be wrong).

If I use 4 programming languages why learn 4 tools instead of one (Docker)?

Because setting up and configuring virtualenv is also a chore, with loads of things that can go wrong. It's different on Windows vs. Mac vs. Linux.

With Docker it's a single command to run after installing Docker.

So, speaking as a data scientist, the dependencies run deeper than Python kernels and libraries. For something like TensorFlow, you need to account for specific versions of Nvidia drivers and CUDA/cuDNN. Using something like the Nvidia container runtime for model development allows me to essentially version my entire development environment.

good point.

Because projects are seldom as simple as that, potentially having lots of other kinds of dependencies (a directory structure, system libraries/command, symlinks, other services running, etc).

And let's be honest, virtualenvs and all the various ways they're managed and updated and such aren't really bulletproof either.

It's really, really refreshing how much "yeah I managed to break my dev environment" or "I followed the wiki for how to start developing your project but it 'didn't work'" can be avoided if it's just "run docker(-compose)?".

And this is especially true for junior developers, who are probably fresh out of college and won't be familiar with lots of the tooling that exists in the world. Not that docker is a simple tool, but it can hide so much complexity that it is easier to just show people how to docker run and docker build and such.

You could, but then maybe you have some editor-specific settings you'd also want to tie to particular project.

And if you're working with multiple projects in multiple languages, why bother learning each language's equivalent of virtualenv (assuming it has one), when there's a universal method available?

I wrote this article a while back that compares setting up a Python development environment for web development with and without Docker: https://nickjanetakis.com/blog/setting-up-a-python-developme...

The TL;DR is there's a lot of things to set up yourself without Docker in order to run a typical web application and it's different depending on what OS / version you use. Some of these things are unrelated to Python too, such as running PostgreSQL, Redis, etc. but these are very important to your overall application.

Docker unifies that entire set up and the barrier of entry is installing Docker once and then learning a bit about it.
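A minimal sketch of that unified setup (the service names and versions are illustrative, not from the linked article):

```yaml
# Illustrative docker-compose.yml: the app plus its backing services,
# started together with a single `docker-compose up`.
version: "3.8"
services:
  web:
    build: .          # the Dockerfile installs the Python dependencies
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev
  redis:
    image: redis:6
```

The same file works on every OS that runs Docker, which is the point being made above.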

A couple of years ago I got your Docker course and I learned a lot from it, been doing Docker chores for my teams ever since. Anyway, good course!

Thanks a lot!

Since some of the purported features of containers are portability and consistency/reproducibility between environments, the idea here is that you want your development environment to be as close to the deployment environment as possible. In addition, if you want to do prototype development, quickly try and test new ideas, and iterate on that, it's nice to go through that without rebuilding and deploying the container to a test/production environment. Do all of that right where you do your development.

At the agency where I work we have about 40-50 somewhat active projects at any time. Any given week I work with maybe 3 to 10 of them.

It saves so much time to just pull down the repo, run "docker-compose up" and have everything running, almost exactly the way it's running in production. With the right node or php version, databases, Elasticsearch, Redis etc.

The problem I usually run into with this sort of arrangement is the database. Every non-newbie developer is very aware of using source control with their code. A lot of developers are careful about managing the dependencies for that code as well. But for the database, you have a second asset that often needs to be synchronised with the code, and that means both schema and possibly records as well.

Just deploying changes affecting both assets carefully in production can be quite awkward relative to a code-only deployment. You might need to do this in several stages, updating your code to allow for but not require the DB changes, then updating the DB schema, then maybe updating existing DB records, then at least one more round of code updates so everything is running on the new version of the DB. And you might need to make sure no-one else on your team is doing anything conflicting in between.

Doing the same in a staging environment isn't so bad because you're running essentially the same process. However, for a development environment where you want the shortest possible feedback loops for efficiency, you need to be able to spin up a DB with the correct schema, and possibly also with some preloaded data that may be wholly, partially or not at all related to controlled data you use to initialise parts of your database in production or staging environments.

It is not always an easy task to keep the code you use to access the database, the current schema in the database, and any pre-configured data to be installed in your database all in sync, and to ensure that your production, staging/testing/CI facilities, and local developer test environment are also synchronised where they should be.
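One common way to at least get the dev-environment side of this (a sketch, assuming the official Postgres image and a hypothetical ./db/init directory) is to mount schema and seed scripts into the image's init hook, so a fresh database always comes up with the expected schema:

```yaml
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: dev
    volumes:
      # *.sql / *.sh files in this directory run in alphabetical order
      # the first time the container's data directory is initialized
      - ./db/init:/docker-entrypoint-initdb.d:ro
```

This keeps the schema/seed scripts in the same repo as the code, though it doesn't by itself solve keeping them in sync with production migrations.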

There's also mutual deployments which suck really bad i.e. deployment A needs to go out with deployment B; but if you haven't set up your CI with that facility (and, let's face it, the easiest way to set up a CI with multi-repos is on a per-repo basis), it can get really hairy.

I agree that the database (and things like uploaded images) can be a challenge. For WordPress we usually sync down from production, to staging, to testing, to local (while filtering out PII). For other projects it's usually easier to use migrations and seeding.

The challenge is how you keep your migrations, source code and any seed data synchronised, but only where they should be. There is often a need for any separate migration process/scripts to be synchronised with corresponding changes to the DB access code in the main application, so the models in the application code always match the actual schema in the database. For seed data, there may be some "real" data that should always be present but also some "example" data you want to include (and reset to a known state) for each test in an integration test suite. Etc.

It kind of amazes me that there doesn't yet seem to be a way of handling this in the web development community that has become a de facto standard in the way most of us look at tools like Docker or Git these days.

It depends on what the environment is for. Your company work, or individual projects?

If you're an individual, then it would benefit someone who has multiple devices and/or multiple operating systems and doesn't want to manage their environment across all those devices and operating systems. For example, I personally have OSX, Windows, and several different Linux distributions on my laptop itself. My desktop also runs several operating systems.

Managing software across all of those is a pain. With Docker, I only have to manage the containers, and just have Docker installed on all the operating systems. Instead of managing like 50 different dependencies across 7 systems (think 7x50), I only have to manage Docker inside each system.

If you're working for a company, they will have their own dev environment. Instead of setting up and troubleshooting all their dependencies on your computer, you can just use their containers.

It lets you do dev work in an environment that matches prod (and your teammates' environments) as closely as possible.

It's frustrating at best and catastrophic at worst when you have code working on your machine, deploy to prod, and then discover incompatibility.

Reproducible build environment.

In big orgs it's a way to manage environment changes for big teams

So, you can extend your bio-break waiting for your build to finish as Docker furiously crunches those files, raises your laptop temperature and makes those fans spin mightily.

Remote development from different machines/thin client machine

You could do that already with VMs and SSH. Sure it is a bit easier with Docker, but that is not the big advantage.

While I love that VSCode can do this and would like to appreciate a post about the topic, I wonder if I should upvote a blog post that adds almost no value over the existing VSCode documentation, which is basically excellent already and covers every use case I've ever stumbled upon.

To be truly portable it is worth considering whether your newly created Dev environment can work without the internet or in an air-gapped environment.

Vscode can be problematic in that respect. Typically with dependencies used by an extension often assuming they can reach out to other servers.

I look at initial internet facing container creation as a separately managed snapshot process to grab dependencies which then gets configured for particular Dev, build, test, runtime and release containers that are built by the dependency collecting original container.

I.e. something like VSCode isn't installed in the internet-facing container; it is installed in the offline build of a dev container. This is where the difficulties lie in my approach.

As amazing and convenient as Docker is in practice, containers hide the inherent mess that is modern computing, and the more they are used, the less chance is that this mess is getting cleaned up... ever. Ultimately this is another dependency, complexity hidden by another layer...

I feel like calling modern computing a mess is harsh. Modern computing encompasses a plethora of applications, to a point where we can model most if not all workflows. It's the product of millions of humans working together for half a century in a very large graph with little connection between each clusters.

Nothing at that scale is a "mess". It's simply what was created by our collective distributed system of humans and we should appreciate that it's not really a problem that can be solved instead of talking about it as if we could "fix" it.

I agree. While I understand the practicality of Docker and facilitating development using it, I wouldn't call it "forward" progress.

The fundamental problem, as you say, is that our dependency ecosystems don't meet our requirements. Docker is one way to avoid the problem without fixing it since it's easier. Forward progress would be to fix the problem.

On one hand, Docker removes some pressure to fix the problem and encourages perpetuating it. On the other hand, maybe it gets people to think about the problem more. I don't know which influence is stronger.

> containers hide the inherent mess that is modern computing, and the more they are used, the less chance is that this mess is getting cleaned up... ever

If you have a roadmap for how 150 or fewer engineers can "Clean up the inherent mess that is modern computing" in less than 5 years, then I'd be eager to read it. In the meantime, tools which enable people to manage the symptoms of that mess are good.

The Chunnel lets us work around the fact that the ocean has not yet been boiled away.

Understand that people quite reasonably feel this way but I personally don’t.

You’ve got to pick your battles. If you’re, for example, a front-end dev working right up at the top of the stack, then delivering value to your clients means getting them their marketing webpage, CRUD app, what-have-you. To do that you have to abstract away a vertiginous amount of stuff under you, all the way down the stack. We’re all standing on the shoulders of giants.

Docker is an amazing tool for just this sort of thing.

agreed. that docker is needed represents a failure of our industry

After learning Docker, I second this.

The great mistake happened way back in the 1980s (maybe earlier) when most OS developers didn't implement a proper permissions system for executables. Basically, the user should always be prompted to allow a program read/write access to the network, the filesystem and other external resources.

Had we had this, then executables could have been marked "pure" functional when they didn't have dependencies and didn't require access to a config file. On top of that, we could have used the refcount technique from Apple's Time Machine or ZFS to have a single canonical copy of any file on the drive (based on the hash of its contents), so that each executable could see its own local copy of libraries rather than descending into dependency hell by having to manage multiple library versions sharing the same directories.

Then, a high-level access granting system should have been developed with blanket rules for executables that have been vetted by someone. Note that much of this has happened in recent years with MacOS (tragically tied to the App Store rather than an open system of trust).

There is nothing in any of this that seems particularly challenging. But it would have required the big OS developers to come on board at a time when they went out of their way to impose incompatibility by doing things like: using opposite slashes and backslashes, refusing to implement a built-in scripting language like Javascript, or even providing cross-platform socket libraries, etc.

The only parts I admire about Docker are that they kinda sorta got everything working on Mac, Windows and Linux, and had the insight that each line of a Dockerfile can be treated like layers in an installer. The actual implementation (not abstracting network and volume modes enough so there's only one performant one, having a lot of idiosyncrasies between docker and docker-compose, etc) still leaves me often reaching for the documentation and coming up short.

That said, Docker is great and I think it was possibly the major breakthrough of the 2010s. And I do love how its way of opening ports makes a mockery of all other port mapping software.

I'm not sure I quite agree with that. Having a controlled environment in a sandbox of its own clearly has benefits, both for consistency of what you're running and for safety if it doesn't work as you expected. It doesn't need to be Docker specifically that we use to create such an environment, but if not Docker then we'd surely have looked for some other way to achieve the same result.

Docker is literally the specification of all the missing parts of the operating system. It's not a very good specification, but it is fairly comprehensive.

How so? Docker is fundamentally about OS Level namespaces and isolation, not about more abstraction.

Slightly off-topic, but it really irks me when people talk about reproducible builds/environments while using Docker, while having something along the lines of apt-get update/pip install in their Dockerfile.

Like that completely destroys the reproducibility of your container!
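For those who do want to tighten this up, a sketch of the pinning involved (the digest and package version below are placeholders, not real values):

```dockerfile
# Pin the base image by digest instead of a floating tag,
# so rebuilds start from byte-identical layers.
FROM debian:bullseye-slim@sha256:<digest>

# Pin exact package versions; a plain `apt-get install curl`
# resolves to whatever the mirror serves at build time.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl=<exact-version> && \
    rm -rf /var/lib/apt/lists/*
```

Even this isn't fully reproducible (apt mirrors drop old versions), which is why many teams instead version the built images themselves, as the reply below describes doing.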

I can't speak for others, but, while we don't pin versions in the Dockerfiles for our build container images, we do version the images themselves.

Reproducibility is a continuum, not a binary. We've chosen a point on that continuum that we believe gives us the best trade-off between reliability and maintenance effort. 100% from-the-ground-up reproducibility would be ideal, of course, but there's a cost-benefit tradeoff, and we're not being paid to be perfectionists.

I mean in a sense, the fucked up behavior of system package managers relative to the actual concerns of building and distributing software to the masses is the reason we need Docker to sandbox environments in the first place. It's 2021, there is next to no reason why the default behavior is installation to /usr/lib with shared objects that are rarely shared, with global access when installed for one application used by one user, and linked dynamically when the object never changes over the lifetime of the running application.

For all its warts this is something that npm gets right. Package management is a tool for software development, not software distribution to end users.

You seem to be missing the point of what we call a "Linux distribution". Shared libraries are fantastic, and everything in /usr/lib should be shared.

I don't appreciate the condescension, is it not fairly well known that shared libraries aren't typically shared at all? You can distribute software that is reusable and packaged for many machines (the ultimate sense of "shared" in my opinion) but that's difficult when it relies on distribution-specific nuances and packages.

Do you really enjoy having separate packages for the same software for each flavor of package manager that does the same thing? Wouldn't it be nice if we didn't have to use containerization to distribute the same software to machines running a kernel with a stable ABI?

The majority of "shared" libraries are used by exactly one application: https://drewdevault.com/dynlib

It makes absolutely no sense to pollute a global namespace with these.

Out of 958 libs, 230 of mine on my desktop are only used by one application, according to the results after running the stuff from the link. I'm not disagreeing with you about polluting the global namespace, but at least in my case, it's certainly not a majority.

Except that - Linux distributions aren't solely tools for developing software, and their package managers are intended for all their end users.

That's not my point - dependency resolution is a build time requirement, not an install-time one. Requiring the user's machine to resolve dependencies to install your software is fragile, bug prone, and harms the user experience.

Like it or not, users don't care if your software has interchangeable parts. They care if it runs on their system. The only sane way to guarantee a piece of software runs outside your developer machine is to include its dependencies during distribution and packaging (not refer to them - which is what package managers require). The less sane way is to use containers, but those are required when developers don't package their software sanely.

This doesn't preclude users from installing software or replacing interchangeable components should developers support it. What it prevents is disgusting bugs and workarounds because dev A built on distro B while user C wants to use it on distro D, but the packages have to be built separately for everyone because the distro package managers don't agree with each other on what things are named or how they should be built.

No thanks. Apps like Firefox should use the libraries - windows, menus, etc. - installed on my system. Please don't ship your own libraries for this stuff.

Containers are an even more insane approach, so maybe we are in violent agreement.

Both apt and yum support installing to whatever directory you like as long as you have write permissions there.

Not really. They support dumping the package contents to an arbitrary directory, but you can't actually run the software from there without either the software having been written to support arbitrary paths (almost none is) or using namespacing and chroot to build it a sandbox wherein its baked-in paths actually work.

Contrast that with something like RiscOS AppDirs, classic Mac applications, or Next/Mac Application Bundles.

You very much can run the software from those directories, what are you talking about? Those paths are handled by the OS itself as well through LD_PRELOAD and PATH.

If what you're saying were true, unmodified software wouldn't work in a Docker container, either.

EDIT: Here's the exact command to do what you're saying isn't possible: apt-get download package; dpkg -i --force-not-root --root=$HOME package.deb

I was going to give you the benefit of the doubt and actually try this so I could show you that you are wrong, but I couldn't even get that far because dpkg always complains it is unable to access the dpkg status area. So clearly this is not as trivial as you make it out to be. I suspect because it expects a full filesystem in $HOME with its status file in the appropriate place. In other words, it is expecting a whole separate installation to be under $HOME.

Regardless, let's assume it did work. Here's what it would do: unpack the package, replacing '/' with '$HOME' in the destination paths. That's it. That software will not magically be able to find its associated libraries and configurations without the user mucking with environment variables at best, or chrooting or sandboxing such that $HOME appears to it to be a wholly separate installation.
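To make that concrete, here is a hedged sketch of the manual wiring a user would need after unpacking a .deb under $HOME (the package name and directory layout are illustrative; `dpkg -x` only extracts the archive, nothing rewrites the baked-in /usr paths):

```shell
# Illustrative sketch: unpack a package under $HOME instead of /
PKGROOT="$HOME/pkgroot"
mkdir -p "$PKGROOT"
# dpkg -x somepackage.deb "$PKGROOT"   # unpack step (needs a real .deb)

# Binaries now live under $PKGROOT/usr/bin, so the environment
# has to be wired up by hand:
export PATH="$PKGROOT/usr/bin:$PATH"
export LD_LIBRARY_PATH="$PKGROOT/usr/lib:${LD_LIBRARY_PATH:-}"

# Even then, anything that hard-codes /etc or /var paths still
# points at the real filesystem root, not $PKGROOT.
```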

That's not how sane systems do this sort of thing. I have been trying to do this sort of thing in Linux for pretty much as long as I have been using Linux, because I loathe the way Linux installs software, and in 20 years it has never been straightforward. AppImage is as close as we get, and software needs to be carefully built and packaged for that.

> If what you're saying were true, unmodified software wouldn't work in a Docker container, either.

> [...] or using namespacing and chroot to build it a sandbox wherein its baked-in paths actually work.

Point taken, though there's no reason the package manager couldn't change the environment variables based on the directory it's told to use for the install. The pieces are there, but nobody's really taken the time to put them together. For the filesystem, some simple bind mounts would probably suffice, though you'd still run into some permissions issues I imagine.

EDIT: It seems to be significantly easier on dnf, to the point that it could be trivial to add full support for home-directory installs.

"support" and "default" are very different things. Especially with package managers like homebrew, which comes with a loud warning that non-default (root) paths may not be supported by dependencies.

It also doesn't really, since this is a system level issue. Applications need to package their dependencies, not the other way around. Dependencies form graphs, not flat lists. The existence of a global cache of libraries shared by all programs is a total inversion of requirements.

Depends on how you do it, right?

For instance, we create base images configured with SDKs, libraries, frameworks, configurations, binaries, etc...

Those base images are then built, versioned, tagged and then pushed to our container repos ready to be used by developers, CI/CD, etc...

Images based on these base images never need an apt-get, pip install, etc... If there is a dependency missing, an update needed, etc... we'll create a new base image with it, following the steps above.
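A minimal sketch of that layering (registry, image names, versions and packages below are all illustrative):

```dockerfile
# --- base.Dockerfile: built by its own pipeline, then tagged & pushed ---
FROM ubuntu:20.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential cmake python3 \
 && rm -rf /var/lib/apt/lists/*
# pushed as: registry.example.com/team/base-sdk:1.4.0

# --- app.Dockerfile: no apt-get or pip here, only the pinned base ---
# FROM registry.example.com/team/base-sdk:1.4.0
# COPY . /workspace
```

The point being that every downstream image references an exact, already-built base tag, so the mutable package-manager steps only ever run in the base image's pipeline.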

I would love some constructive feedback.

This is what I think is the most practical approach. Pinning down ALL your dependencies to the exact version is much harder than it sounds. What you’re describing sound like the way Jib [1] does it. The pictures in this [2] blog post help visualize it.

The reason I like the approach you describe is because it keeps things simpler at the start of a project and consistent across most projects.

I also think it makes sense to have those support containers built on a schedule. For example, you build your build/CI container weekly and that's the CI container for the week. On-demand project builds use that CI container, which has all dependencies, etc. baked in.

It would be nice if CI systems would let me explicitly tag builds as (non)reproducible.

1. https://github.com/GoogleContainerTools/jib

2. https://phauer.com/2019/no-fat-jar-in-docker-image/

That solves the problem one way, but I don't know if I would call it reproducible. You're working around the fact that it's not reproducible by only doing it once. You don't get the benefits, e.g. everyone who uses your images has to trust that you built them the right way.

I guess what I mean is that everything built using this base image should be reproducible. There is no reason (hopefully) to reproduce the base image. Any changes to the base requirements (apt-get, pip, etc) requires a whole new build and results in an entirely different artifact.

And just to be clear, I'm not building (no human is) the base image. The base image is also created within its own build pipeline that has all of the necessary things to track its materialization and lineage. Logs, manifests, etc...

Once the image has been thoroughly tested and verified (both by humans and verification scripts) each time a change is merged, the git repo is tagged, docker image is built and tagged and then pushed to the container repo.

Perhaps you could explain what you mean by the other way? Why would you ever need to recreate the base image? Perhaps if the container repo dropped off the face of the earth and had to be created from scratch?

The industry seems to have adopted “hermetic” as a word that describes truly reproducible builds, while “reproducible” has a lower standard. In many cases, it seems to be used to mean “not dependent on local build environment”.

Yea - I get where you're coming from - you can't rebuild the container, but each team member can reuse the same container - which is at least a step towards being fully reproducible.

We don't rebuild the image every time. We store images in the GitLab registry. Also, just because we use apt-get to install python3 doesn't mean the build isn't reproducible. The actual toolchain and sysroot are highly version controlled, and python is just used to kick off the build script, so it almost doesn't really matter which version of python it is as long as it's backwards compatible with the build script.

We tried this in our research group, and found that issues come in when it gets to things like...

Oh no! CMake is too old a version to support a dependency we have to build in the image construction. So we better pull in a version of CMake from a PPA which is community maintained, and build it from source/etc.

If I use tools that can give me reproducible builds, like Bazel or Nix, I won't need or want to use containers for development.

I'm interested to know your alternate solution. How would you recommend getting the required libraries into your image?

I guess the OP meant such that apt/yum/dnf/whatever runs every time the image runs, rather than just once when it's built. Not that that's something I see very often, mind.

Isn’t the standard approach here then to derive from base images, which have exact versions? Being honest, I’m not a Docker/VM legend but I’ve seen a few attempts at managing this, and base images was one of them.

Personally, I don’t see the issue with it if you’re at least being a little careful— don’t make obvious mistakes like installing latest/nightly packages automatically, etc.

Yes, that's the standard approach, but the base images are frequently updated. If you really want to pin at a specific image, you need to specify the image hash, rather than using the "latest", or even a version tag (e.g. "2.1").
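For reference, a digest pin looks like this (the image name is an example and the sha256 value is made up for illustration):

```dockerfile
# A tag like :2.1 or :latest can move; a digest cannot.
FROM python:3.9-slim@sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
```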

In your image that extends from the base image, you'll typically update the package repo cache (it is typically cleared after building the base image, to reduce the size), then install whatever packages you want.

Like you, I don't see a particular issue with updating system-level packages - especially from a security standpoint.

I never understood why commands like apt-get don't take a version with the dependency name.

They do. You can do `apt-get install package=version`

As far as I'm aware it's supported (something like `apt-get install virtualbox=5.0.18-dfsg-2build1 -V`). It's just not commonly used, because you usually just choose a distro with the desired update granularity (whether you want the newest version out there, or a consistent version with backported security fixes, or something in between).
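In an image build that looks something like this (package name and version string are illustrative, and the pin only holds as long as the repo still serves that version):

```dockerfile
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl=7.68.0-1ubuntu2 \
 && rm -rf /var/lib/apt/lists/*
```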

Keeping apt-get in our container builds is vital, since it helps keep CVEs out of our containers. We can do automated re-builds of all of our containers weekly and (typically) it lets us keep on top of the CVE game.

And by that you mean that the version of those deps aren't pinned?

You depend on what the repository serves for `apt-get`. You may pin the version, but that still doesn't guarantee you get the same package contents if the package was replaced in the repository without a version bump.

At least for Python, just pinning the top-level dependencies is not enough. If you pin tensorflow==2.4.0, it doesn't pin its required packages; rather, it just defines a range. An example would be that tf will try to get wheel>=0.26 (which was released in 2015 and is currently on release 0.36.2).
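One common workaround is to freeze the fully resolved tree into a constraints file and install against it (file contents and versions below are illustrative):

```text
# requirements.txt — only the top-level pin
tensorflow==2.4.0

# constraints.txt — generated once with `pip freeze`; pins the
# transitive packages too
wheel==0.36.2
numpy==1.19.5
six==1.15.0
```

Installing with `pip install -r requirements.txt -c constraints.txt` then resolves every package, transitive ones included, to the frozen versions.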

What's the correct way to get libraries into your docker container? We have plenty of apt-get / pip calls in the dockerfiles at my work (set up by someone else).

Not if you specify the versions to install, at least

I don't know what could be more streamlined than clicking "Code -> Open in Codespace" on GitHub, and Visual Studio Code renders in your browser, with all of your dotfiles set and dependencies loaded to mess around with something.

(GitPod does a similar thing where you can append the URL to gitpod.io, but they can't use the VSCode extension marketplace).

Codespaces is in private beta. I have requested access for some time now and got no response.

All the services I mentioned are fully open for everybody today.

Unless, unimaginably, you need to keep your code on-prem...

I know, I know. Unthinkable.

I haven't tested, but I think gitpod works with on-prem installations.

Yes, Gitpod is open-source and can be self-hosted: https://github.com/gitpod-io/gitpod/

Site seems down, here is the archived version http://web.archive.org/web/20210121151535/https://blog.feabh...

One slight problem I had with .devcontainer in VS Code was running the devcontainer on a remote ssh server.

Remote SSH works. Local devcontainer works. But mixing the two requires configuring the docker engine settings to point to the remote. This forces other projects to also run on the remote machine.

This was a problem as of 2 months ago.

This is what I used (combined with stackoverflow posts) to try it.

Is it not possible to set the Docker host to the remote machine under your project workspace settings only, instead of in global settings?
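For example, something like this in the project's `.vscode/settings.json` (a hedged sketch; the host value is illustrative, and whether the Remote-Containers extension honors a workspace-level `docker.host` is exactly the open question here):

```jsonc
{
  // Only this workspace would talk to the remote engine
  "docker.host": "ssh://user@remote-host"
}
```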

Looks like gitpod to me: https://github.com/gitpod-io/gitpod

Throw in Azure with k8s and the MS sales department will organise a "training" for the managers in the Caribbean or on Ibiza.

This is the killer feature that keeps me on the proprietary version of VSCode. Makes my life so much easier!

Do you know the acronym ABM? If you're not a native German speaker, probably not. Arbeitsbeschäftigungsmaßnahme (roughly, a job-creation scheme). Docker, k8s ... it's all there to employ the unemployed. Also to make something simple complicated so cloud providers can monetize even this part, lock you in, and bill you tenfold for something that really isn't that expensive.

Why don't we develop on the production server???

Containers promote bad software development.

For example?

> Docker + "Moving Software Development Forward"

Oh! My sides!

This feels like a case study in how to take an antifragile system and just make it fragile.

As much as it is a bunch of small paper cuts to support varied developers, you are more resilient to a big shift.

Assuming wordpress/php/mysql/shitball happened

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact