Ask HN: Has anyone migrated off containerized infrastructure?
345 points by murkt on Aug 14, 2020 | 358 comments
I'm constantly aggravated by various quirks of containers, and don't really remember any big problems with non-containerized infra.

A random and non-exhaustive list of things that bother me from time to time:

— Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).

— If I need a quick hot fix RIGHT NOW, I can't just log in, change a couple of lines and restart; I must go through the full deploy cycle.

— Must remember that launched containers do not close when the ssh connection breaks, and they can easily linger for a couple of weeks.

I generally find it harder to change how things work together. It's probably possible to spend lots of effort to fix these things, but I don't remember having to deal with all this cruft with old-school infrastructure.

Hence the question - has anyone migrated off containerized infrastructure? Are you satisfied? Or am I misremembering things, and horrible things await me in the old-school ways?




Disclaimer: I'm a container infrastructure consultant at Red Hat, so take all of this with a big grain of salt :-)

What you are complaining about isn't really containers (you could still pretty easily run stuff in a container and set it up/treat it like a "pet" rather than "cattle"), it's the CI/CD and immutable infrastructure best practices you are really sad about. Your complaints are totally valid: but there is another side to it.

Before adopting containers it wasn't unusual to SSH in and change a line of code on a broken server and restart. In fact that works fine while the company/team is really small. Unfortunately it becomes a disaster and huge liability when the team grows.

Additionally in regulated environments (think a bank or healthcare) one person with the ability to do that would be a huge threat. Protecting production data is paramount and if you can modify the code processing that data without a person/thing in your way then you are a massive threat to the data. I know you would never do something nefarious - neither would I. We just want to build things. But I promise you it's a matter of time until you hire somebody that does. And as a customer, I'd rather not trust my identity to be protected because "we trust Billy. He would never do that."

I pine for the old days - I really do. Things are insanely complex now and I don't like it. Unfortunately there are good reasons for the complexity.


Another way of putting this, which largely amounts to the same thing, is that containerization was developed by and for very large organizations. I have seen it used at much smaller companies, most of whom had zero need for it, and in fact it put them into a situation where they were unable to control their own infrastructure, because they had increased the complexity past the point where they could maintain it. Containerization makes deploying your first server harder, but your nth server becomes easier, for values of large n, and this totally makes sense when your organization is large enough to have a large number of servers.


I think containers are great even for really small companies. You boiled it down to `n` servers, but it's `n` servers times `m` services times `k` software updates. That's easier as soon as n * m * k > 2!

First of all, containers can be used with Cloud Run and many other services that run containers without you managing servers at all! (Though if you can use services like Netlify and Heroku to handle all your needs cost-effectively, you probably should.)

Setting up a server with docker swarm is pretty easy, because there's basically one piece of software to install. From there on all the software to update and install is in containers.
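
For what it's worth, that "one piece of software" setup on a fresh Ubuntu-ish host is roughly two commands (the convenience script is Docker's official one; this is a sketch, not a hardened setup):

  # install docker, then turn this host into a single-node swarm
  curl -fsSL https://get.docker.com | sh
  docker swarm init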

If your software gets more complex, your server setup stays simple. Even if it doesn't get complex, being able to install software updates for the app independently of the host is great. I.e., I can go from Python 3.7 to Python 3.8 with absolutely zero fuss.

Deploying servers doesn't get more complicated with a few more containers. At some point that's not true but if you want to run, say, grafana as well, the install/maintenance of the server stays constant.

Imagine what you would do without containers... editing an ansible script and having to set up a new server just to test the setup, or, more likely and more horribly, ssh'ing in and running one-off commands with no testing, staging or reproducibility.

I vastly prefer Dockerfiles and docker-compose.yml and swarm to ansible and vagrant. There are more pre-built containers than there are ansible recipes as well. So your install/configure time for any off-the-shelf stuff can go down too.

Setting up developer laptops is also improved with Docker, though experiences vary... Run your ruby or python or node service locally if you prefer, set up a testing DB and cache in docker, and run any extra stuff in containers.

Lastly, I think CI is also incredibly worthwhile even for the smallest of companies and containers help keep the effort constant here too. The recipe is always the same.


Having used Docker and Kubernetes, and also spun up new VMs, I can say that Docker and Kubernetes are _not_ easier, if you're new at it. Spinning up a new VM on Linode or the like is easier, by far.

Now, this may sound incredible to you, because if you're accustomed to it, Docker and Kubernetes can be way easier. But, and here's the main point, there are tons of organizations for whom spinning up a new server is a once every year or two activity. That is not often enough to ever become adept at any of these tools. Plus, you probably don't even want to reproduce what you did last time, because you're replacing a server that was spun up several years ago, and things have changed.

For a typical devops person, this state of affairs is hard to imagine, but it is what most of the internet is like. This isn't to say, by any means, that FAANG and anybody else who spins up servers on a regular basis shouldn't be doing this with the best tools for their needs. I'm just saying, how you experience the relative difficulty of these tasks is not at all representative of what it's like for most organizations.

But, since these organizations are unlikely to ever hire a full-time sysadmin, you may not ever see them.


Some of us have notes, that we can mostly copy-paste to setup a server and it works well without magic and n·m·k images.

Last time I checked, docker swarm was accepting connections from anywhere (publish really publishes) and messing with the firewall making a least-privilege setup a PITA; docker was building, possibly even running containers as root; and most importantly - the developers thought docker was magically secure, nothing to handle.

How do you handle your security?


An nginx container handles redirects to HTTPS and SSL termination and talks to the other services using unpublished ports. Only 22 (sshd running on the server) and 80 and 443 (published ports) are open to the world. Swarm ports are open only between the swarm servers; that's handled with AWS security groups.

I don't build on my servers. A service (in a container) makes an outgoing connection to listen to an event bus (Google PubSub) to deploy new containers from CI (Google Cloud Builder).

Config changes (ie, adding a service) are committed then I SSH in, git pull and run a script that does necessary docker stack stuff. I don't mount anything writable to the containers.
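
For illustration, a stack file and deploy step along those lines might look roughly like the sketch below. Service names, images and the overlay network are made up; the point is that only nginx has published ports, so everything else stays internal:

  # stack.yml sketch: nginx publishes 80/443, the app is reachable only on the overlay network
  cat > stack.yml <<'EOF'
  version: "3.8"
  services:
    nginx:
      image: nginx:1.19
      ports: ["80:80", "443:443"]
      networks: [web]
    app:
      image: registry.example.com/app:latest   # hypothetical image
      networks: [web]                          # no ports section -> not published to the world
  networks:
    web:
      driver: overlay
  EOF
  docker stack deploy --with-registry-auth -c stack.yml mystack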


I cannot agree that "Containerization universally makes first server deployment harder". Even at single person scale, tools like Docker-Compose etc make my dev life massively simpler.

In 2020, I'd argue the opposite for most people, most of the time!

Also, if your container runtime is preinstalled in your OS as is often the case, the first run experience can be as little as a single command.


One of my favorite things is how it forces config and artifact locations to be explicit and consistent. No more "where the hell does this distro's package for this daemon store its data?" Don't care, it puts it wherever I mapped the output dir in the container, which is prominently documented because it pretty much has to be for the container to be usable.

Hell it makes managing my lame, personal basement-housed server easier, let alone anything serious. What would have been a bunch of config files plus several shell scripts and/or ansible is instead just a one-command shell script per service I run, plus the presence of any directories mapped therein (didn't bother to script creation of those, I only have like 4 or 5 services running, though some include other services as deps that I'd have had to manage manually without docker).

Example: Dockerized Samba is the only way I will configure a Samba server now, period. Dozens of lines of voodoo magic horsecrap replaced with a single arcane-but-compact-and-highly-copy-pastable argument per share. And it actually works when you try it, the first time. It's so much better.
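
To make that concrete, a single-share setup with the widely used community dperson/samba image looks something like this. The positional share syntax (name;path;browse;read-only;guest) is that image's own convention, so check its README before copying; the host path is illustrative:

  # one guest-accessible share on the standard SMB ports
  docker run -d --name samba --restart unless-stopped \
    -p 139:139 -p 445:445 \
    -v /srv/media:/media \
    dperson/samba -s "media;/media;yes;no;yes"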


> you could still pretty easily run stuff in a container and set it up/treat it like a "pet" rather than "cattle"

Keep in mind, though, if you've got a pet stateful "container" you can SSH into, it's not really a container any more; it's a VPS.

(Well, yes, it is technically a container. But it's not what people mean to talk about when they talk about containers.)

When $1/mo fly-by-night compute-hosting providers are selling you a "VPS", that's precisely what they're selling you: a pet stateful container you can SSH into.

And it's important to make this distinction, I think, because "a VPS" is a lot more like a virtual machine than it is like a container, in how it needs to be managed by ops. If you're doing "pet stateful containers you can SSH into", you aren't really 'doing containers' any more; the guides for managing containerized deployments and so forth won't help you any more. You're doing VMs—just VMs that happen to be optimized for CPU and memory sharing. (And if that optimization isn't something you need, you may as well throw it away and just do VMs, because then you'll gain access to all the tooling and guidance targeted explicitly at people with exactly your use-case.)


A VPS (which is usually a virtual machine) will be running on top of a hypervisor, and each VM on the host will have their own kernel. Containers on the other hand are different because the kernel is shared among every container running on the host. The separation/isolation of resources is done via kernel features rather than by a hypervisor like a VM. Adding SSH and a stateful filesystem to your container to make it long lived doesn't make it any less of a container. To me that seems like saying "my car is no longer a car because I live in it. Now it's a house (that happens to have all the same features as a car, but I don't use it that way so it's no longer a car)"

If you're defining "container" not by the technology but rather by "how it needs to be managed by ops" then we're working with completely different definitions from the start. We would first need to agree on how we define "container" before we can discuss whether you can treat one like a pet rather than cattle.


Where does an RV fit into your taxonomy?

If you have stateful containers where changes persist across restarts of the container, then I think you can't really call them containers anymore. Likewise, if you have VMs with read-only filesystem images generated by the CI/CD pipeline, it's not unreasonable to describe them as container-like. Once you throw containers with a stateful filesystem or VMs with read-only filesystems into the mix, 'container' is no longer a good description of what's going on, and more precise terms need to be used - especially as you get into more esoteric technologies like OpenVZ/Virtuozzo, which uses kernel features, not virtualization, to provide isolation, but whose containers are not the same as Docker's.

We could come to an agreement on the definition of container, but that wouldn't even be useful outside this thread, so maybe it's more useful to enumerate where the specific technology is and isn't important. The ops team cares about how the thing needs to be managed, and less so how it goes about achieving isolation. However, the exact technology in use is of critical importance to the security team. (Those may be the same people.) Developers, on the third hand, ideally don't even know that containers are in use; the system is abstracted away from them so they can worry about business logic and UX matters, and not need to worry about how to upgrade the fleet to have the latest version of the openssl libraries/whatever.


Containers were a thing before Docker was invented. LXC/OpenVZ/Solaris Zones count as containers too. We need a different term for the immutable container style that Docker popularized.


OpenVZ "VPS" offerings are, in fact, just containers with a shared kernel.


> A VPS (which is usually a virtual machine)

This is where I disagree. Like I said in my sibling post, the term "VPS" was invented to obscure the difference between VM-backed and container-backed workload virtualization, so that a provider could sell the same "thing" at different price-points, where actually the "thing" they're selling is a VM at higher price-points and a container at lower price-points. "VPS" is like "spam" (the food): it's a way to avoid telling you that you're getting a mixture of whatever stuff is cheapest.

Sure, there's probably some high-end providers who use "VPS" to refer solely to VMs, because they're trying to capture market segments who were previously using down-market providers and are now moving up-market, and so are used to the term "VPS."

But basing your understanding of the term "VPS" on those up-market providers, is like basing your understanding of the safety of tap water on only first-world tap water, and then being confused why people in many places in the world would choose to boil it.

(And note that I referred specifically to down-market VPS providers in my GP post, not VPS providers generally. The ones who sell you $1/mo VPS instances are not selling you VMs.)

> If you're defining "container" not by the technology but rather by "how it needs to be managed by ops" then we're working with completely different definitions from the start.

It seems that you're arguing from some sort of top-down prescriptivist definition of what the word "container" should mean. I was arguing about how it is used: what people call containers, vs. what they don't. (Or rather, what people will first reach for the word "container" to describe; vs. what they'll first reach for some other word to describe.)

Think about this:

• Linux containers running on Windows are still "containers", despite each running isolated in their own VM.

• Amazon Elastic Beanstalk is a "container hosting solution", despite running each container on its own VM.

• Software running under Google's gVisor is said to be running "in a container", despite the containerization happening entirely in userland.

• CloudFlare markets its Edge Workers as running in separate "containers" — these are Node.js execution-context sandboxes. But, insofar as Node.js is an abstract machine with a kernel (native, un-sandboxed code) and system-call ops to interface with that kernel, then those sandboxes are the same thing to Node that containers are to the Linux kernel.

• Are unikernels (e.g. MirageOS) not running as VMs when you build them to their userland-process debugging target, rather than deploying them to a hypervisor?

> To me that seems like saying "my car is no longer a car because I live in it. Now it's a house (that happens to have all the same features as a car, but I don't use it that way so it's no longer a car)"

A closer analogy: I put wheels on my boat, and rigged the motor to it. I'm driving my boat down the highway. My boat now needs to be maintained the way a car does; and the debris from the road is blowing holes in the bottom that mean my boat is no longer sea-worthy. My boat is now effectively a car. It may be built on the infrastructure of a boat—but I'm utilizing it as a car, and I'd be far better served with an actual car than a boat.


> CloudFlare markets its Edge Workers as running in separate "containers" — these are Node.js execution-context sandboxes.

This is inaccurate:

- Cloudflare Workers does not use Node.js at all. It is a new custom runtime built on V8.

- Cloudflare absolutely does not market Workers as using "containers", in fact we market them explicitly as not "containers": https://blog.cloudflare.com/cloud-computing-without-containe...

(Disclosure: I am the lead engineer for Workers.)

----

In the industry today, the term "container" refers to a hosting environment where:

- The guest is intended to be a single application, not a full operating system.

- The guest can run arbitrary native-code (usually, Linux) binaries, using the OS's standard ABI. That is, existing, off-the-shelf programs are likely to be able to run.

- The guest runs in a private "namespace" where it cannot see anything belonging to other containers. It gets its own private filesystem, private PID numbers, private network interfaces, etc.

The first point distinguishes containers from classic VMs. The latter two points distinguish them from things like isolates as used by Cloudflare Workers.

Usually, containers are implemented using Linux namespaces+cgroups+seccomp. Yes, sometimes, a lightweight virtual machine layer is wrapped around the container as well, for added security. However, these lightweight VMs are specialized for running a single Linux application (not an arbitrary OS), and generally virtualize at a higher level than a classic hardware VM.
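
As an aside, the namespace part is easy to poke at without Docker. On a Linux box with util-linux installed, something like this drops you into a private PID namespace where only your own processes are visible:

  # new PID namespace with /proc remounted to match (needs root)
  sudo unshare --fork --pid --mount-proc bash
  ps aux   # inside the new namespace: only this shell and ps show up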


Hmm, is this really true? Typically people mean lxd or docker when they say containers, but VPSes run on KVM or OpenVZ and are a different level of abstraction than a container. I could be misunderstanding VPSes but I believe they are true VMs?


OpenVZ is fundamentally a container system, almost exactly equivalent to LXC. (In fact, Linux namespaces and cgroups were effectively created through a refactoring and gradual absorption of OpenVZ-descended code.)

"Virtual Private Server" (VPS) is a generic marketing term used by compute providers to allow them to obscure whether they're backing your node with a true VM or with a container. Down-market providers of the kind I referred to always use it to mean containers.

Yes, these VPS provider containers are wrapped in a VM-like encapsulating abstraction by the compute engine (usually libvirt), but this is a management-layer abstraction, not a fundamental difference in the isolation level. VMs that use OpenVZ or Linux containers as their "hypervisor backend" leave the workloads they run just as vulnerable to cross-tenant security vulnerabilities and resource hogging as they would if said workloads were run on Docker.

-----

But all that's beside my original point. My point was that, when you run a "pet stateful container that you can SSH into", you're Greenspunning a VPS node, without getting any of the benefits of doing so, using tooling (Docker) that only makes your use-case harder.

If you acknowledge what you're really trying to do—to run your workload under VPS-like operational semantics; or maybe even under VM-like operational semantics specifically—then you can switch to using the tooling meant for that, and your life becomes a lot easier. (Also, you'll make the right hires. Don't hire "people who know Docker" to maintain your pseudo-VPS; they'll just fight you about it. Hire VPS/VM people!)


Just to be clear, I don't think anybody is arguing that you should use containers like you would a VPS, merely that you can. I would bet everyone here would agree that just because you can doesn't mean you should :-D


Yeah, I see what you mean (when taking the word 'container' in its technical meaning.) I'm not arguing with that; in fact, that was the same point I was making!

But I think that people don't tend to use the word "container" to describe "a container used as a VPS."

Which points at a deeper issue: we really don't have a term for "the software-artifact product of the Twelve-Factor App methodology." We refer to these things as containers, but they're really an overlapping idea. They're signed/content-hashed tarballs of immutable-infrastructure software that can be automatically deployed, shot in the head, horizontally shared-nothing scaled, etc. These properties all make something very amenable to container-based virtualization; but they aren't the same thing as container-based virtualization. But in practice, people conflate the two, such that we don't even have a word for the type of software itself other than "Docker/OCI image." (But a Google App Engine workload is such a thing too, despite not being an OCI image! Heck, Heroku popularized many of the factors of the Twelve-Factor methodology [and named the thing, too], but their deploy-slugs aren't OCI images either.)

My claim was intended to mean that, if your software meets none of the properties of a [twelve-factor app OCI image workload thing], then you're not "doing [twelve-factor app OCI image workload thing]", and so you shouldn't rely on the basically-synonymous infrastructure that supports [twelve-factor app OCI image workload thing], i.e. containers. :)


Ah ok, cool yeah I think we're in total agreement then. No doubt you are absolutely right, the word container is used commonly to mean all sorts of things that aren't technically related to the technology we call containers :-)

I do think a lot of enterprise marketing and startup product pitching has made this problem so much worse. I see this a lot with Red Hat customers (and Red Hat employees too for that matter). "Containers" are painted as this great solution and the new correct way of doing things, even though much of what is being sold isn't tied to the technical implementation of containers. There indeed isn't a good marketing-worthy buzzword to describe immutable infrastructure/12-factor app and all that at a high level.


No, it isn't true. The OP basically says:

If you use containers like VPSes then you basically have a VPS, but in a container.


No, this is dogma ;)

Everything is a host and can be used for anything.


> Before adopting containers it wasn't unusual to SSH in and change a line of code on a broken server and restart. In fact that works fine while the company/team is really small. Unfortunately it becomes a disaster and huge liability when the team grows.

Writing a script to ssh into a bunch of machines and run a common command is the next step. That works far longer than most people acknowledge.

> I pine for the old days - I really do. Things are insanely complex now and I don't like it. Unfortunately there are good reasons for the complexity.

Meh.

Containers provide solutions to the problems that someone else had. If you don't have those problems, then containers just create complexity and problems for you.

What problems do they solve? They solve, "My codebase is too big to be loaded on one machine." They solve, "I need my code to run in parallel across lots of machines." They solve, "I need to satisfy some set of regulations."

If you do not have any of those kinds of problems, DON'T USE CONTAINERS. They will complicate your life, and bring no benefit that you care about.


Counterpoint: in many ways it's much simpler than 20 years ago. Docker, k8s, etc. are miles beyond the type of automation I used to have to deal with from the operations type people.


We have used chroots + a bunch of perl scripts for 20 years. Besides APIs for adding/deleting nodes or autoscaling nodes, nothing much changed for us. And, as I have remarked here before (as it is one of my businesses), that extra freedom, esp. autoscaling, is almost never needed and, for most companies, far more expensive than just properly setting up a few baremetal machines. Most people here probably vastly underestimate how many transactions a modern server can handle and how cheap this is at a non-cloud provider. Of course, badly written software will wreck your perf and with that nothing can save you.


You’re comparing cowboy sysadmin (mutable servers, ssh in and live-edit stuff) to process-heavy devops with CI. These are orthogonal to containers/not-containers.

If you don’t use CI, it’s easy to get fast deploys with containers. Just build the images on your dev box, tag with the branch and commit hash, and push directly to a docker registry (docker push is smart enough to only push the layers the registry doesn’t already have). Code is running 30 seconds after compiling finishes.
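
A rough sketch of that workflow (the registry host and image name are made up):

  # tag with branch + short commit hash; push only uploads layers the registry lacks
  TAG="$(git rev-parse --abbrev-ref HEAD)-$(git rev-parse --short HEAD)"
  docker build -t registry.example.com/myapp:"$TAG" .
  docker push registry.example.com/myapp:"$TAG"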

(Don’t want to pay for a registry? It’s trivial to run one yourself)
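
For reference, the usual self-hosted starting point is Docker's official registry image; put TLS and auth in front of it before exposing it anywhere:

  docker run -d -p 5000:5000 --restart=always --name registry registry:2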

These aren’t foolproof, fully reproducible builds, but practically they’re pretty close if your tools require your repo to be clean and pushed before building the image, and if your build system is sane. Besides, if you’re used to editing code as it’s running on servers, you don’t care about reproducible builds.

Also, if you’re starting containers manually on the command line, you’re doing it wrong. At least use compose so your setup is declarative and lifetime-managed.

(Edit: s/swarm/compose/)


> Also, if you’re starting containers manually on the command line, you’re doing it wrong. At least use compose so your setup is declarative and lifetime-managed.

As I wrote in a nearby comment, I'm not starting containers manually - we have compose, swarm, it's declarative and lifetime-managed.

However, we often need to do some bespoke data analysis, so we often ssh into a server, type `make shell` to launch a REPL and type/paste some stuff into it.


You can do all that with containers quickly.

I develop my stuff with k8s, with all its lifetime management, and have 10 second deploys from my dev box.


Still 10 seconds. I have 0-second deploys for some ‘cowboy development’ I do; it is now a competitive advantage :) 10 sec (and for almost all setups I have seen it is vastly more than that) is a lot while devving and deploying for test. To each their own, but just fixing bugs live with my client (Zoom + me live on the test server fixing many issues in an afternoon) is vastly more efficient for me. Obviously committing from the test (now dev) server to github will result in ci/cd to the staging server, but workflows where I work on my local machine, commit, ci/cd to test and then the client tests are vastly slower and I do not like them; it feels like a waste of my time.

In my main business I am forced (regulatory) to do it all by the book; vastly less enjoyable for me.


10 seconds deploys? How is this possible :) Do you have any links that explain your workflow?


It's nothing fancy - I just use a typical k8s Service/Deployment object on GKE. A deploy is:

  1. docker build (most layers cached) - 2s
  2. docker push - 2s
  3. update deploy.yaml
  4. kubectl apply -f deploy.yaml
  5. kubectl rollout status deployments {name} - 6s
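
Scripted end to end, that might look roughly like the following. The image path and deployment/container names are placeholders, and it uses `kubectl set image` instead of hand-editing deploy.yaml:

  #!/bin/sh
  set -e
  IMG="gcr.io/my-project/myapp:$(git rev-parse --short HEAD)"   # placeholder image path
  docker build -t "$IMG" .
  docker push "$IMG"
  kubectl set image deployment/myapp myapp="$IMG"               # assumes the container is named myapp
  kubectl rollout status deployment/myapp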


Lucky!! With our infrastructure, deploys can take an hour or so. 10 minutes for the build, 10 minutes for the image to get built, plus the rest of the time for terraform to apply infrastructure changes across dev, staging, and prod. Only thing we do have is automated testing after every deploy so issues tend to get caught. But that's still so long for a deploy! I don't know of a good way to get it down faster.


Why is terraform deploying infrastructure for every container deployment? Can't you just rollout onto the existing infrastructure?

Also sounds like there is some low-hanging fruit available by adding some caching/layering in the build process.


Ah, I should have explained more. That's an hour for our ASG + EC2 deployments. The only benefit we get is easy roll backs because it always deploys a new ASG. We're switching over to Spinnaker and started with our EC2 infrastructure. I think container deployments will be faster but still, an hour for EC2 deployments!


Yeah this is exactly what I do too, works just fine. You probably already have something like this, but I hacked a bash+yq script that automatically updates all relevant yaml files with the latest image tag. So getting new code running is two lines:

make image push deploy

kubectl apply -f somewhere/deploy.yaml
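
The helper doesn't have to be fancy. A rough equivalent (here with sed rather than yq, and assuming the manifest's image line looks like "image: gcr.io/proj/app:<tag>") could be:

  #!/bin/bash
  # bump the image tag in the manifest to the current commit, then apply it
  set -e
  TAG="$(git rev-parse --short HEAD)"
  sed -i "s#\(image: gcr.io/proj/app:\).*#\1${TAG}#" somewhere/deploy.yaml
  kubectl apply -f somewhere/deploy.yaml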


Take a look at skaffold. Ours is under 10s


I think people kind of boiled themselves alive with Docker and don't step back to think if they're where they want to be often enough.

Docker first started getting traction when people were building their software with make. Make never quite got caching right (not really its fault), so nobody was really sure that their changes were going to show up in their release unless they ran "make clean" first. And, you had to have 700 helper utilities installed on your workstation for a build to work after "make clean". Docker to the rescue! It gives you a fresh Linux distribution, apt-get installs all the utilities needed for your build, and then builds your thing. It works 100% of the time! Celebrate!

At the same time, programming languages started moving towards being build systems. They want "mylang run foo.ml" to be fast, so they implemented their own caching. But this time they did it right; the "mylang" compiler knows about the effect every input file has on every output file, so it's guaranteed to give you the right answer with or without the cache. Some languages are so confident these days that you can't even disable the cache! They know it works perfectly every time. The result is extremely fast incremental builds that are just as reliable as clean builds, if you have access to that cache.

This, unfortunately, is not something that Docker supports -- layers have one input filesystem and one output filesystem, but now languages are producing two outputs -- one binary for the next layer, one cache directory that should be used as input (opportunistically) to the next build. The result is, to work around people writing makefiles like "foo.o: foo.c" when they actually meant "foo.o: foo.c foo.h helper.h /usr/include/third-party-library.h /usr/lib/libthird-party.so.42", EVERYONE suffers. "mylang run foo.ml" takes a few milliseconds on your workstation, but 5 minutes in Docker. Your critical fix is hamstrung by your CI infrastructure.

There are a number of solutions to this problem. You could have language-specific builders that integrate with your CI system, take the source code and a recent cache as inputs, and produce a Docker container as output. (Systems like Bazel even have a protocol for sharing cache between machines, so you don't have to copy gigabytes of data around that you probably don't need.)

But instead of doing that, people are writing articles about how their CI takes 3 hours when a build on their workstation takes 3 seconds, and it's because containers suck! But they don't actually suck -- the underlying problem is not the policy of "in production, we will only run .tar.gz files that contain everything the application needs". That part is actually wonderful. The underlying problem is "the first build step is 'rm -rf /' and the second is 'make world'".


> Docker first started getting traction when people were building their software with make.

A result of this is that containerisation took off hardest in ecosystems without good build and deploy tools. Getting exactly the right version of the application, every library, and the runtime has traditionally been a struggle in Ruby, a nightmare in Python, torture in C, etc, but pretty easy in Java. As a result, most of the Java shops i know are still deploying by copying a zip file or some such onto a server.


I'm astounded that docker still doesn't have the concept of 'build volumes' that can be used for streaming artifacts and caches into build steps.

That said, 'docker build' is not the only game in town. For a long time, Red Hat's source-to-image tool has been able to do incremental builds:

    s2i build --incremental /path/to/mycode mybuilderimage myapp
This creates a new container from the mybuilderimage image, copies the source code from /path/to/mycode into the container at /tmp/src, and runs mybuilderimage's 'assemble' script (which knows how to invoke "mylang run foo.ml" and install the build result into the right place so that the 'run' script will later find it). The result is committed as the myapp container image, which typically uses mybuilderimage's 'run' script as its entry point.

The next time the command is run, the first thing s2i will do is create a new container based on myapp, and invoke its 'save-artifacts' script, which has the job of copying out any build caches and other artifacts to be used in an incremental build. The container is then discarded.

Now, the build runs as before, but with the addition of s2i copying the saved build artifacts into /tmp/artifacts, so that the 'assemble' script can use them to make the build faster.

This isn't perfect: you pay for speedier builds with larger container images, since you can't delete build caches, etc. like you'd normally do at the end of a Dockerfile. But it's a good first step, and you can always have another step in your pipeline that starts with the myapp container, deletes unwanted files and then squashes the result into a single layer above the original base image that mybuilderimage was itself built from.


It does, now. The whole build system was uprooted with the change from legacy build to buildkit. One of the current "experimental" features is build volumes.
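
For anyone landing here from search: with BuildKit enabled, the cache-mount form (still marked experimental at the time of writing) looks roughly like this. The Python example and cache path are only illustrative:

  cat > Dockerfile <<'EOF'
  # syntax=docker/dockerfile:experimental
  FROM python:3.8-slim
  WORKDIR /app
  COPY requirements.txt .
  # pip's cache persists across builds without ending up in the image
  RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
  COPY . .
  EOF
  DOCKER_BUILDKIT=1 docker build -t myapp .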


Do you have a link with a description? I can only find years old open docker github issues, 3rd party software or some very hacky solutions


This is a great comment - especially the last 2 paragraphs.

Docker is not the same as containers - docker is just one way to build containers. I've not seen one yet that would be easier to use (and as a result faster than docker), but that does not mean we cannot create one!


I personally use Nix to build Docker-compatible containers in production.


If you need a hot fix "RIGHT NOW" you might be doing something wrong in the first place.

Being able to just ssh into a machine is one of the problems that we did solve with containers. We didn't want to allow anyone to SSH into a machine and change state. Everything must come from a well-defined state checked into version control. That's where containers did help us a lot.

Not sure what you mean with lingering containers. Do you launch yours through SSH manually? That's terrible. We had automation that would launch containers for us. We also had monitoring that notified us of any discrepancies between intended and actuated state.

Maybe containers aren't the right tool for your usecase - but I wouldn't want to work with your setup.

Btw. most of this is possible with VMs, too. So if you prefer normal GCE / EC2 VMs over containers that's fine, too. But then please build the images from checked in configuration using e.g. Packer and don't SSH into them at all.


> If you need a hot fix "RIGHT NOW" you might be doing something wrong in the first place.

Can we please grow up beyond this kind of comment?

I suspect everybody knows that hotfixing production isn't an ideal thing to do, and the many reasons why that's the case, but lots of us nevertheless do it from time to time. Where I work now we've probably hotfixed production a handful of times over the past couple of years, amongst thousands of "proper" deployments. It's really not a big deal.

We know so little of the context behind OP's issues that picking holes isn't helpful or informative. The specific question was about migrating from containerized infrastructure back to a more traditional hosting model. I for one would be interested in reading about any experiences people have in this area so it's quite frustrating to find this as the top comment when it's substantially off-topic.


>Can we please grow up beyond this kind of comment?

No platform that assigns quantitative virtue points to comments based on how many people agree with that comment will ever not have an abundance of people lobbing low-effort quips and dog whistles designed to appeal to the majority.


This should probably be the banner message when you arrive on this site. Well said.


this happens in the workplace as well though


> this happens in the workplace as well though

Exactly this. I'm fortunate in that where I work now it's pretty rare: the culture is fairly collegiate and friendly, and there's a strong sense of "we're all on the same side". Other places I've worked every meeting has been an exercise in point scoring, which is incredibly wearing and - try as one might - the culture does end up influencing one's own behaviour.

You can obviously get preachy about how people should be stronger characters and not so easily influenced but the phrase "Bad company corrupts good character" exists for a reason. Unless you're Gandhi it's incredibly difficult for many people to resist the culture around them day in and day out without some positive reinforcement from the behaviour of others, especially when you're incentivised to do otherwise, and may be penalised for not doing so[0].

[0] The answer is, if you can, find another job. Not always an option, but a good idea if it is.


Same experience here: for the last 4400 deployments over the last 4 years at my current job we had to do 3 live patches on production. This just happens in the real world. The last occurrence was 2 months ago, when a deployment step that had worked well several thousand times failed in an unrecoverable state with our deployment pipeline.


Strong disagree, I've worked at/seen so many places that do deployments in reckless and awful ways that I don't think it's obvious to most people.

Even in this thread, the arguments "against" containers, a lot of these things have alternatives that are still better. For instance, ok maybe you need to make a hotfix, you're still better off just compiling a one-off image on a dev machine and deploying it (I used to be able to do this in like, 30 seconds), as compared to trying to edit files directly on the server. Especially if you have multiple servers.


Thank you!

You're absolutely right. There is no context outside of the few lines OP asked and the quote you highlighted shows a lack of real world experience.

Yes, it's true that "right now" fixes indicate there is a problem, but in small shops it's generally the most reasonable approach. Now if you're on a team of 50 other people and you need to make "right now" fixes, then there is certainly a problem. Neither of which any of us can know from the context... but that's beside the point, since it's not even on topic.


I don't even think "right now" fixes necessarily indicate a problem in development approach. Sometimes unexpected things happen and you need a way to fix them quickly, without being bogged down by infrastructure.

A great example of this is using a production repl. It even served NASA well (https://stackoverflow.com/questions/17253459/what-exactly-ha...). Having to change your system on-the-fly is not always an indication of a poorly developed or managed system.


If the rate of unexpected things is "sometimes" and not "very rarely" it probably does indicate that there are problems with development, build process or something else in the infrastructure.

Most of us are building CRUD apps not sending people to the moon.


This is very common and just because it doesn't match your use case doesn't mean some businesses don't need a hot fix "right now." If you work in 24/7 ecommerce and your site is producing $60k per hour and there is a network failure that breaks something you need a hot fix right now, otherwise your 3 hour code review, build, q/a, deploy pipeline will cost the company $180,000.


> otherwise your 3 hour code review, build, q/a, deploy pipeline will cost the company $180,000

Taking code review away, because we all know we're not going to sit around waiting for a review while a patch needs to go out urgently, if your build, test, and deploy pipelines take three hours then you have some serious problems that you need to address, and containers aren't it.

There are methods for handling hotfixes/patches to production quickly that work well in high volume sales website setups.


> Taking code review away, because we all know we're not going to sit around waiting for a review while a patch needs to go out urgently

I worked at a trading firm where you could add a zero to that cost for a 3-hour outage, and I can tell you that the one thing we would absolutely never skimp on was code review. Because the cost of a bad "fix" that actually makes things worse has the potential to be greater still, and because humans are most likely to make silly mistakes when they're working under intense pressure.

What we would do instead is slightly intensify the code review process by pair programming the hotfix, and ensuring that a third developer who was familiar with the system in question was standing by to follow up with an immediate review.


Pair programming anything as urgent as a hotfix works really well. It takes some pressure off of the developer working on it and turns it into a team event.

We will even sometimes keep the video call open until deployment is done and production validation is complete - the devops guys get the information they need, someone else is keeping an eye on the checklist and calling out items if necessary, etc.


Oh yeah, absolutely. I was happiest when I also had the attending ops person looking over my shoulder while I worked on the fix.


I really like that approach. It also takes pressure off the tech lead or whoever is implementing the fix, and transforms the patch into a full-team responsibility. I bet this sort of behavior makes for strong and effective teams.


Please do tell. Our containers never take less than 10 minutes to deploy (a single one).


I have app stacks that deploy a dozen containers in seconds because they are stateless and close to "functional" – just transformations over inputs.

I have app stacks that deploy a dozen containers over an hour because the orchestration takes time: signal the old containers to drain, pause for an app with a very long initialization time to settle, gradually roll traffic to the new one to let caches warm, and then repeat.

In both of these cases, deployment is a function of the application. There's nothing infrastructural that puts a time floor on things.


Sure, not contending that. It's just that in my memories blue/green deployments still took less time, although I can't say how much.


Not a serious reply, but it may interesting: One million containers over 3500 nodes in 2 minutes: https://channel9.msdn.com/Events/Ignite/Microsoft-Ignite-Orl...


Well 10 minutes is way less than 3 hours.


Certainly. But it does add up and it does kill motivation for rapid iteration. :(


10 minutes to deploy to production should be fine, but rapid iteration shouldn't be happening in production. It sounds more like the complaint is related to running a similar setup in dev, and taking 10 minutes to see changes during development (which is understandably too long).


Yep, that's what I meant. I have no choice but to run a local k8s cluster or else I can't test.


and then we're back to the original statement, if you're rapidly iterating hot fixes you have serious problems and likely doing it wrong.


A lot of businesses that operate 24/7 run on containers quite well. When I had to do these sorts of quick hot fix things for a startup, almost all of the issues were caused by lack of testing. Testing was lacking because there wasn't an easy way to constantly make sure dev, staging and prod are absolutely the same. Same infrastructure, same code, same packages, etc.

That's an easier solve with docker containers. And testing, including UI testing, can be integrated much more easily with CI/CD tools and docker containers whose code goes by commit hashes and which ensure every package, down to the version, is controlled across environments.


Someone fucking directly with prod might also cost $180000.


One will cost and the other might; see the difference?


Time to deploy is a known value. The impact potential of having a workflow where ssh'ing into production is even possible can cost you buckets as: 1. You're messing with production, obviously. 2. Your infrastructure isn't stateless. 3. Your infrastructure is likely not HA. 4. You likely don't have canaries in place to mitigate the impact of bad production deployments.

And the impact of all of these on direct revenue/productivity can be immense. SSH into prod is a crutch.


They are both a might. Both are a probability distribution.


In the long run "fucking with production" as well <i>will</i> cost a fortune.


Citation needed. I’ve “fucked” with many prod systems over the last 15 years and not caused outages, or extended them.


Those costs need to be amortised annually.

There are other costs from outages which occur infrequently but are highly costly because developers/support are manually touching the servers.

The cost really should be analysed as the median impact on revenue (or profit), as a percentage. One-off pricing is pretty meaningless.


You roll back and then make a proper fix.


> If you need a hot fix "RIGHT NOW" you might be doing something wrong in the first place.

Maybe I phrased this point badly. I haven't found myself needing a hot fix RIGHT NOW for a long time. What I had in mind was this: sometimes there is an issue that is very hard to reproduce locally. If a developer doesn't understand it from the get-go and has to experiment a bit, it can be frustrating to commit, wait for deploy (even to the stage environment), test what happens and repeat.

> Not sure what you mean with lingering containers. Do you launch yours through SSH manually?

Of course the application itself is run automatically, web server, workers, etc.

However, we often need to do some bespoke data analysis, so we often ssh into a server, type `make shell` to launch a REPL and type/paste some stuff into it.


I think you might be falling into a cycle that I often find myself falling in to. It goes like this: Oh, there's a bug. But it's obviously caused by _this line here_, so I won't even bother testing it offline, I'll just fix it and run CI and then... oh no, wait, it was more complex than I thought, but I'm in the change-CI-test cycle now so I'll keep doing that. And then, all of a sudden it's taken me three hours or more to fix what was in reality quite a small bug.

The solution is to force yourself not to use the CI/container system at all during debugging, and instead to build the binary (or whatever) standalone. That's hard because you invariably aren't tooled up to run the component outside its deployment system, so you have to do some extra tricks, but in the long run it seems to be the way to go.


Over the years I've learned to do this very rarely. However, I'm not the lone wolf in the woods, I also have other developers working for me. I can talk to them however long I want and motivate them to debug locally to understand the problem, but in the end they have to gain some experience with their own pain. I would prefer that their experience gaining would be faster, so it would cost me less money :)


> I would prefer that their experience gaining would be faster, so it would cost me less money :)

Maybe your current strategy isn't working?


docker exec -it <container id> bash
kubectl exec -it <pod> -c <container> -- bash


Not sure what you want to say with these commands.


you want to be able to "ssh in and make changes to figure out how to fix things". why doesn't that do that for you?


Sorry, but did you not forget the --rm option to remove the container once it dies?


For both docker and k8s, that command is 'exec into already running container', not 'spin up a new container'.


This is all getting quite heated! Please take it outside.


This was a joke. Humour.


Please contain yourself.


I interpret it as a suggestion that you allow developers access to the running container in production to debug the issue

There are more sane ways to do this than through raw kubectl access, of course: see e.g. telepresence (https://www.telepresence.io/)


By sane, you mean yet another piece of infrastructure that has to be installed, documented, packaged into containers, deployed and updated, for a problem that did not exist without containers. For a very large organization (e.g. hundreds of devs) this makes sense, but for a medium sized company (> billion EUR turnover per year outside of the software business) this soon becomes just another piece of overhead.


This problem certainly exists without containers.

- How do you provide easy visibility into running applications?

- How do you prevent inspecting an application from affecting the behavior of an application?


You put this fairly succinctly. And I agree wholeheartedly.

I think of the phrase “adding epicycles” when this kind of stuff starts happening.


I’ve worked in a number of areas with basically no fail/delay SLAs. I think it’s naive to think “if you need a hot fix right now, you’re doing it wrong”...the number of times we needed to hot fix because of ourselves was very low. But when you’re in an integration heavy environment and one of the many moving parts (outside of your control) breaks, well thought out “put the fire out” stopgaps on the server consistently save the day (and the company money by not breaching the SLA)


That makes perfect sense and it's definitely true that sometimes the hotfix is not a bug in your code (which can be solved by a rollback) but instead having to patch a problem in a dependent system. But that seems orthogonal to the container issue. Shelling into a live server and changing something only works if you have the entire build toolchain on the production server which hasn't generally been the case in my experience. Even if you aren't using containers you still need to build artifacts and deploy them. It's just that you are deploying binary artifacts instead of containers. It doesn't seem like the container builds are the real long pole in that process.


Redeploy the older working version?


"outside your control" is key here. you're assuming a rollback would work. in many cases, some external system changes without your knowledge, and you're only seeing those changes on production.

I've got a client that has data feeds from multiple vendors. some are pulls, some are... "hey, we'll FTP this file to you". the file format has changed - unannounced - at least 3 times in the past... 15 months. Then something breaks on production, but you don't know what. You need to get on that machine and take a look.

"Redeploy the older working version" doesn't do anything except re-introduce more problems in these instances.


This is a good point. There are probably lots of people on HN working in cloud environments where your dependencies are actually organizationally within your control. If one of your dependencies makes a change that breaks you, you can escalate the problem and compel them to roll back the change. This is the luxury of building the entire world. My service depends on nothing that can't be escalated to my own VP, so "roll back to the old version [of whatever changed]" is a very satisfying answer, but it's not an option when your dependencies aren't obligated to keep you running.


I pity anyone whose system needs less than X downtime per month, but who depends the constant availability of an external system that is down for more than X per month :)


> Then something breaks on production, but you don't know what. You need to get on that machine and take a look.

If that's a problem you find yourself having at all, much less regularly, you have a serious observability problem.


Isn’t the larger issue that your production environment can be brought down by bad user input?


There's breakage outside "brought down". A system that's running but doesn't produce outputs because the input data changed can be "broken" and violating its SLA too. And that's not really something you can design around, other than "we won't promise anything", but then you lose to competitors that do take it on themselves to react quickly enough with hotfixes.


How fast can you grow if you’re constantly putting out fires? It sounds like you are in a B2B world. Businesses always want more. Where are the sales people/customer service managers that can set realistic expectations on what the client requires vs. what they are expected to do? “logging into production and manually correcting stuff” can only go so far and doesn’t scale.


> If you need a hot fix "RIGHT NOW" you might be doing something wrong in the first place.

Without any context that seems like an outrageously ignorant comment. (In saying that, most companies probably are doing many things wrong.)


> In saying that most companies probably are doing many things wrong

I think that sounds actually very plausible :)


Agreed. This confused the heck out of me.

Almost sounds like script kiddies who, when they start learning programming for the first time in their lives, assume that the test of successfully learning how to program is that your code should compile on the first try without errors.


Give me a break. Everyone needs a hotfix occasionally.

If your users are asking for something and your response is “you’re just doing it wrong!”, you’re probably the one who is wrong.


But if your developers are asking for something and your response is "you're doing it wrong" (hopefully not verbatim), you've got a good mentorship opportunity on your hands


I don't know what I'd do without being able to ssh into VM instances. Whether it's for looking at various logs, the occasional core dump, or uploading a custom binary to test something, it's incredibly time saving.


But... you can ssh into a container and change state (depending on config options), can't you? I'm not sure I follow this response or the OP's complaint.

I mostly write python code and one very nice pattern I've found is to run a container somewhere (either locally or on a server somewhere) with an open port. You can SSH into it and use the remote interpreter as your main project interpreter. That way your dev environment is 100% reproducible. VS Code and PyCharm Pro both support doing that.
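
For example, the pattern is roughly the following; the image name and user are hypothetical, and it assumes the image runs sshd:

  # dev container with sshd exposed; mount the project into it
  docker run -d --name pydev -p 2222:22 -v "$PWD":/workspace my-python-ssh-image
  # then point VS Code / PyCharm's remote interpreter at localhost:2222
  ssh -p 2222 dev@localhost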


I think OP is working with a farm of containers that are spawned and destroyed dynamically. Which means SSHing on one and fixing a problem would not really help.


For small deployments, running a Common Lisp web app with remote repl access is great for finding problems. I wouldn't recommend this for high traffic apps but there are many use cases in this world for small user base, focused web apps where maximizing developer productivity is required for profitability.


Depends on how you set up the container.


> Being able to just ssh into a machine is one of the problems that we did solve with containers.

you should be able to ssh into a machine and strace a process to see why something is going wrong. If your only solution is always "restart the container" or "revert to an old container" or "only deploy containers that are known to work" you're not actually debugging extant problems.


We did at feeder.co. We ran into so many quirks with Docker and Kubernetes that we decided to go back to "bare VM". It was great! We're a small team, so any resources saved are a great thing. It always felt like we containerized to be able to "scale up quickly" or "iterate faster". That of course was never needed; what was needed, however, was to scale up infrastructure to keep up with the resources required by Docker Swarm, k8s or just Docker in general.

Also, running services is hard already. Adding another layer that caused existing assumptions to break (networking, storage, etc) made it even harder.

Bare VMs are crazy fast and running apt install is crazy easy. Not looking back at the moment!


Yep, storage and networking are two things that I haven't mentioned in the post but they definitely annoy. Sometimes (rarely but it happens) network breaks and Docker Swarm falls apart, we then have to restart it.

Storage is ughh.


It's also hard to understand how the network works in K8s and Docker Swarm. Sometimes we'd hit random slowdowns that were impossible to understand (I'm definitely no networking expert); just restarting the server or moving to another node would fix it. I really want to use K8s, because it's a cool promise, but for us at least, it was too complicated in reality.


If you only have a few VMs, consider ditching swarm and just using docker (with compose) with host mode networking. I’m not sure swarm ever got stable enough to use in prod; I migrated our stuff from swarm to k8s a couple years ago due to similar issues. K8s has been solid but it’s a beast.

For storage, why not just mount host dirs into the container for stuff you want to persist? Then you’re no worse off than you were before.
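Something like this, e.g. for a Postgres container (paths and image tag are just examples):

    docker run -d --name db \
      -v /srv/pgdata:/var/lib/postgresql/data \
      postgres:12

The data lives in /srv/pgdata on the host and survives the container being recreated.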


It depends how you designed your app. My app uses an RDS instance and an S3 bucket for data and file storage. I feel it is a best practice that your containers should be stateless (except perhaps in development). Docker is not very good at storage and I wouldn't recommend using it in that way.


Maybe scale is not needed, but how do you achieve resiliency with bare metal VMs without adding LB and watchdog layers (which is what k8s is anyway)?


We use DigitalOcean's loadbalancer product, and we've also tried Cloudflare's for pure HTTP loadbalancing. We also use a single Redis instance for job queues. We use Graphite and Grafana for monitoring system metrics (running a Bitnami Graphite/Grafana instance on AWS because we had credits). And for the rest I guess just keeping the services simple? When we do need to scale up and add a new web server or task runner, it takes about an hour of my day.

One thing I realised with going bare-VM is that most services today are insanely stable. MySQL almost never crashes, Redis definitely almost never crashes, Rails/Passenger/Nginx never have any issues. The things that do happen are disks filling up, application bugs causing issues, or actual VM downtime, which is rare but happens when you have 30 VMs. With Docker or K8s we added a super complex layer that is still in development and has issues.

The 4 months we ran our web servers on K8s, I spent at least 1 month debugging issues that ended at an existing open ticket on Github.


A lot of it has been like this for a long time. Postgres, MySQL, Redis, Nginx are bulletproof solid. Sqlite might as well be a hammer, and so many businesses could easily run on it, if only there was a way to drop a column :o

The amount of $$ that was spent on Docker infra to run a couple dozen servers at the last place I used to contract for... oh, and it gets more fun when those machines have GPUs on them (a lot more fun). Then they decided to support Singularity as well because... I don't know.


> When we do need to scale up and add a new web server or task runner, it takes about an hour of my day.

So you manually create and configure your VMs?

Do you have some kind of HA for your database? If a VM goes offline, how quickly can you replace it?

Maybe you don't need to go back to Docker or Kubernetes, but at least consider using one of the hyperscale cloud providers, with its auto-scaling groups and multiple data centers per region, so you can have a system that heals itself even while you're asleep or on a plane.


Yes, I manually create and configure in the sense that I tweak a number in Terraform templates and manually run the ansible playbook for each new server. It's taken a lot of time to get to that level (I think keeping a setup of bash scripts would suffice in our case...)

We run a read-replica on every database, so in case a hardware error occurs on the main database we can manually switch it over. It might mean up to an hour of downtime if the worst happens. Some data loss is OK and can be solved with manual customer support most of the time. It's also a lot more cost-effective than working towards a 100% SLA.

Keeping the read-replicas alive is plenty pain enough! I can't imagine automating everything to auto-heal itself. (Sounds super fun though)

Codifying the setup for auto-scaling would be a massive undertaking. Each new change then requires destroying VMs and bringing up new ones. That would then require a k8s-like layer of infrastructure for secrets, DNS, service discovery, not relying on ephemeral storage (which is a lot faster than volumes/block storage).

I really love doing ops/devops, and would love to have the perfect setup which is 100% automatic and scaleable. Even now I have to stop myself from spending too much time scriptifying things that can just be run manually.


An hour of downtime and some data loss are not metrics acceptable to most businesses I know of.

And what if you have 10 customers joining every day? Still gonna be running that ansible manually?


I guess you assumed 1 customer = 1 new server? For enterprise purposes where data siloes are important a different approach definitely makes sense. We have 300+ new users per day, so our manual system scales well.


Well, if you rely on SaaS solutions for LB and HA, that's fine;

less so if you're limited to airgapped/onprem or there are some other security or regulatory considerations


All cloud providers offer LBs with backend instance health checks. Custom scaling rules too.


You should add a load balancer, depending on what you do. Most load balancers will check in on the backend nodes, and disable them if they fail, rerouting traffic to other nodes.

The load balancers we utilize will do failover between themselves, and do it really well, as in "you don't notice".

Many seem to underestimate the stability of modern virtualization, the built-in redundancies, the failover features and the capabilities of load balancers.

I would guess that most Kubernetes clusters are built on virtual machines, not physical hardware, meaning that you now have multiple layers of redundancy.


Instead of fixing something RIGHT NOW, meaning adding another commit to your build, why aren't you instead rolling back to a known good commit?

Image is already built.

CI already certified it.

The RIGHT NOW fix is just a rollback and deploy. Which takes less time than verifying new code in any situation. I know you don't want to hear it but really, if you need a RIGHT NOW fix that isn't a rollback you need to look at how you got there in the first place. These systems are literally designed around never needing a RIGHT NOW fix again. Blue/Green, canary, auto deploys, rollbacks. Properly designed container infrastructure takes the guesswork and stress out of deploying. Period. Fact. If yours doesn't, it's not set up correctly.
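On k8s that's literally one command, e.g. (assuming a Deployment named "web", which is a hypothetical name here):

    kubectl rollout undo deployment/web
    kubectl rollout status deployment/web

The previously built, previously certified image starts serving again while you work on the real fix at your leisure.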


I very much disagree. If your bad deploy included migrations that can't be reversed (drop an old table for example) then rolling back just gave you two problems.

As long as the dev wrote (and tested!) two-way migrations and they are possible, then yes you are correct.


The long-term viable approach is to not make forward/backward-incompatible migrations in a short period of time.

If your system is truly distributed (multiple machines hosting it), then the moment one server performs a migration that deletes the table, the other servers stop working.

You must have 2 checkins and 2 rollouts: create the new table while maintaining the old one and let it bake in; then delete the old table a week later (or whatever your cadence/release cycle is).


IMHO the answer here is to just never do irreversible migrations. (In other words: just leave old tables/columns around--if not indefinitely, then for a few weeks after they stop being used.)


I have done that in the past and I agree, it's a decent solution. The risk is that the old stuff never gets cleaned up and before long your DB is full of all kinds of cruft and nobody knows which part is needed and which isn't.

I saw a heinous bug once because a newer person was using a column that had stopped getting updated years earlier (because it was replaced). This person saw the column and it was exactly what they needed. Then customers started getting billed on old accounts that they had either closed or changed with us. They were really mad.


My solution to this is to rename that table/column to “table_deprecated” or “column_deprecated”. This has the nice property of being reversible, causes nice visible errors if something unanticipated is actually using the column/table, makes it obvious that it shouldn’t be used (well, one would hope), and makes it easy to find for permanent deletion later.
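A rough sketch of what that migration looks like in plain SQL (Postgres syntax; the table/column names are made up):

    ALTER TABLE legacy_invoices RENAME TO legacy_invoices_deprecated;
    ALTER TABLE accounts RENAME COLUMN billing_code TO billing_code_deprecated;

    -- rollback is just the renames in reverse:
    ALTER TABLE accounts RENAME COLUMN billing_code_deprecated TO billing_code;
    ALTER TABLE legacy_invoices_deprecated RENAME TO legacy_invoices;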


That's a great idea! Although I did work with a person once who would use anything that looked helpful regardless of "deprecated" being all over the place (Eclipse would even yell at the method calls but he didn't care). Granted that's Java and at that point in time APIs were being deprecated without workable replacements, so it's hard to fault him.


At that point you might as well just delete them.


I think you're missing the point. If your migration renames a table or column, then the migration can be rolled back, by simply reversing the rename. If you delete the thing, then there is no way of rolling back other than restoring from backup.

I can see it being useful to rename these things, as long as there is some process in place to delete the renamed items after a period of time has passed.

Would be good if you could schedule a future migration that will happen only after a certain date has passed, which deletes the renamed tables/columns.


This sounds so much like something that I'd do, that I had to stop and think about my personal hall-of-fame screwups to make sure that wasn't one of them.


In terms of things that scare me as a developer, stale data is WAAAAYYY less concerning than having people try to shotgun-fix an issue on a production server after an irreversible update took down the entire system... Especially now that the developer and the entire organization are likely in a panic state and not thinking clearly.

That bug you describe, maybe stale data is part of the problem, but ... why are you having a new person working on the billing system presumably without any code review? It sounds like you have a lot of process issues.


Just create an issue in your issue tracker with the highest priority to delete a column (and why) with a deadline in 4 weeks. Stupid simple solution but works perfectly.


"Clearly the answer for any software problem is to just not have any bugs."

You see how that isn't an actual solution?


An irreversible migration isn't a bug, nor is anyone saying the answer for any software problem is to not have bugs.

They're saying the solution to the specific problem of being unable to rollback due to irreversible migrations is to not write irreversible migrations, which is a completely valid solution and indeed the correct one. The whole point of migrations is to track db changes so that undoing them is easy.


That misses the point.

Code and the DB always need to be compatible one version forward and back. That's required engineering discipline.


Creating a table is effectively an irreversible migration: once you have data in it, you can't just reverse the migration by deleting the whole table.


I would never ever EVER want to do a deploy with migrations that can't be rolled back. That sounds like professional malpractice TBH.

Two way migrations are both possible and SHOULD be done for any real data.


> Properly designed container infrastructure takes the guesswork and stress out of deploying. Period. Fact. If yours doesn't, it's not set up correctly.

Hmmm, I don't have the experience to know if it's setup correctly or not. All I can do is watch it fail and then learn from my mistakes.

Is there a container "framework" that out of the box gives me all of " Blue/Green, canary, auto deploys, rollbacks..." so I don't have to guess if I'm doing it right?


I hate to say it, but yeah that's kind of the point of kubernetes deployments (https://kubernetes.io/docs/concepts/workloads/controllers/de...). Or openshift for more UI and "out of the box" experience.

You deal with all the headache of making your app stateless with a predictable API so that you can reap the benefits of a system like k8s, which can automatically manage all of it for you.

Similarly I'm a bit confused by your comment about SSH dying... in k8s you configure a readiness/liveness probe and the behavior when the probe starts to fail. If SSH is an important thing for a given container, maybe the "liveness" probe is the command "ps aux | grep sshd". Then if it dies, the container can be restarted automatically.
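For reference, the pod-spec fragment looks roughly like this (I've used pgrep instead of the ps|grep pipeline because an exec probe doesn't go through a shell; the numbers are just examples):

    livenessProbe:
      exec:
        command: ["pgrep", "sshd"]
      initialDelaySeconds: 10
      periodSeconds: 30
      failureThreshold: 3

When the probe fails enough times in a row, the kubelet restarts the container for you.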


Nomad [1] does it as well, also visualized nicely in their awesome UI.

[1] https://learn.hashicorp.com/tutorials/nomad/job-blue-green-a...


We’ve been using Convox[0] for the last 2 years. I’ve been pretty happy with how simple it is to work with. We’re still on version 2 which uses AWS ECS or Fargate. Version 3 has migrated to k8s and is provider agnostic. We just haven’t had the bandwidth to upgrade yet.

[0] https://convox.com/


We are using Convox v2 too and are happy with it, but I'm hesitant to do the upgrade: it would introduce the complexity of Kubernetes to our devs, and I'm not sure Convox is the right abstraction on top of Kubernetes when there is already a pile of abstractions in k8s itself (and so many other tools to choose from in the k8s universe).

https://github.com/aws/copilot-cli isn't ready for our use cases, but is more or less convox v2 built by AWS.


I remember that... But it was too aws specific. I'll be giving it another look, thank you for pointing that out!

Edit: sadly it's still another layer onto aws/gcp/other... I'll pass again.


This seems overly absolute to me. What about all of the cases where the bug wasn't caused by a recent commit? Some cases of this I've seen are:

* Time bomb bugs. Code gets committed that works fine until some future condition happens, such as a particular subset of dates that aren't handled properly.

* Efficiency issues. Some code might function properly and work fine with low amounts of data, but hit a wall when it has to handle loads beyond a particular size.

* Bugs in code that just hadn't received much traffic yet. A feature having a bug that only affects 0.1% of people using it might not be discovered until the feature gains traction down the line.


Having dealt with all of these in production, I can tell you the strategies I've used to combat these things:

1. Solid code reviews. Anyone of our developers can halt a code review for any reason. We require 3 approvers on each review. Sensitive areas require reviews from people familiar in that area. We also have tooling that allows us to generate amounts of test data in dev that is similar to prod loads. This helps us catch a lot of time bombs.

2. Feature toggles to decouple deploy of code from release of code. This allows us to test our code in production before turning it on for customers. It also allows us to slowly rollout a feature and watch how the code behaves. This also gives us a kill switch to turn off the code if it is bad.

3. An incredibly robust testing pipeline. It takes about 50 minutes from commit to production deployment. We can also deploy previous containers very quickly for situations that require it.

This doesn't solve all of our problems. Some changes cannot go behind feature toggles (DB migrations, dependency upgrades, etc). But we do pay a lot of attention to design and rollout plans for database migration changes and such.

All of these things come at an extra cost to us, but they allow us to move quickly when we need to, and we're in a much better place than we were when we were trying to do weekly releases. We have a good mix of team experience (sr vs jr) and a lot of discipline in our software engineering practices. We still have problems, like I said, but these strategies have greatly improved our ability to deliver software.


Out of curiosity, how many devs does your org have? I think a lot of the disagreements here come from people at orgs with 3 developers talking to people at orgs with 30,000.


about 40 and growing


Sometimes you have to quickly roll forward.


Those sometimes should be super rare, and you should build testing infrastructure to prevent that from needing to happen. When you release something, you move traffic from the ALB over to the new instance; if you have an issue, just move it back. If you are deploying breaking changes and don't provide yourself a stable upgrade and downgrade path, yeah, you're gonna have trouble.


I’ve learned to take the time and go through the normal deploy steps for any hot fix. More often than not, rushing the steps leads to longer outages, missing the actual bug, creating a new bug, etc.

Don’t cowboy it, deploy properly and you’ll be more relaxed in the long term.


Yeah, I should have been more clear. I'm 100% for using normal deploy steps and I'm not recommending cowboy-ing updates in a container.

He was asking about using non-containerized infra though. If you can commit a code hotfix and quickly deploy the code package, you can roll forward without the slow container build/deploy.


Incidents _requiring_ rolling forward are extremely rare. In the cases where you have to, just build the image and deploy to your cluster with a high max-surge configured.

If your image has correct caching, rebuilding it shouldn't take much time. Most of your time is likely spent in CI and rolling deployments, both of which you can manually skip.


> If your image has correct caching

This is the hangup for most CI/CD systems with containers. Typical configurations (e.g. Gitlab basic setup) don't leverage any caching, so every container is built 100% from scratch, every time.

Adjusting the system to properly utilize caching and ordering your container builds in a way that the most volatile steps are as late as possible in the build will massively speed up container builds.
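Concretely, for a node app it's something like this (the base image and commands are just an example):

    FROM node:14-slim
    WORKDIR /app
    # dependency manifests change rarely, so copy them first and let this layer cache
    COPY package.json package-lock.json ./
    RUN npm ci
    # application source changes every commit, so it goes last
    COPY . .
    CMD ["node", "server.js"]

With that ordering, a code-only change rebuilds just the last two layers instead of reinstalling every dependency.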


Please do not.


My guess is that they are writing code (migrations...) without thought given to rollback.


I'm not a huge fan of containerized infrastructure for the purpose of containerized infrastructure. Typically teams I've seen moving to k8s or a containerized solution don't have strong reasons to do so, and aren't using the pieces of those technologies that provide value.

I have worked with a few companies moving from containers to serverless and a few moving from containers to VMs.

I think that serverless often gives people what they were looking for out of containers in terms of reliable deployments, less infra management, and cloud providers worrying about the hairy pieces of managing distributed compute infrastructure.

Moving to more traditional infrastructure has also often simplified workflows for my customers. Sometimes the containerization layer is just bloat. And an image that can scale up or down is all they really need.

In any of these cases, devops is non-negotiable and ssh to prod should be avoided. Canary deployments should be utilized to minimize the impact of bad production builds and to automate rollback. If prod is truly down, pushing an old image version to production or reverting to an old serverless function version should be baked into process.

The real crux of your issue seems to be bad devops more than bad containers and that's where I'd try to guide your focus.


Serverless (Lambdas, functions) might be ok for some backend trigger type processes but it’s absolute shit for end user facing apis. Also managing deployment of that crap is worse than dealing with K8s.


Asking about non-container systems on HN is mostly going to get you "you're doing it wrong" responses -- HN people are very driven by blog-sexy tooling and architectures.

If you want to deploy fast you need to skip steps and reduce I/O - stateful things are _good_ -- deploying with rsync/ssh and HUPing services is very fast, but people seem to have lost this as an option in a world with Docker.
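The whole deploy can be something like this (the hosts, paths and pidfile are obviously placeholders):

    #!/bin/sh
    set -e
    # push only changed files, then reload the running service in place
    rsync -az --delete ./build/ deploy@web1.example.com:/srv/app/
    ssh deploy@web1.example.com 'kill -HUP "$(cat /srv/app/app.pid)"'

Single-digit seconds on most projects, and trivially loopable over a host list.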

I consult in this space - first 30m on the phone is free - hmu if interested.


> If you want to deploy fast you need to skip steps and reduce I/O

It's perfectly okay to build your artifacts on a stateful host and then put them into a container as a quick to add layer. A whole lot of work has gone into making applications quicker to build via incremental builds that depend on state, and it's worth taking advantage of.

People want to make containers the solution to every problem, and I think that's where the philosophy is hurting. It's okay to not have perfectly hermetic build / developer environments if your tooling is better for it.


rsync needs constant attention not to send unwanted artifacts. Git based deployments aren't as fast, but much more robust and controllable with pull requests.

Edit: of course this is for source code deployments, not binary output.


All the binary deploys I created at work run over git.

Honestly both of those are non-issues. Rsync is as much set-once-and-forget as git; you'll need to script and configure both either way. And git slowness is a one-time-per-deployment-server issue.

The one advantage of git is that it will keep an history of your deployments. Of course, CD tools do that too, but git is way more transparent and reusable. The one disadvantage is that it will make your deploy scripts more complex and stateful.


Yeah, with rsync you get both -- like I can deploy source code (python, et al) and stuff with outputs I don't want to track in git (webpack et al).

I can essentially invoke whatever build steps I need to once on a build host - then let rsync handle moving all the changed things to my deployment environments.


Asking the other way round: have you ever had the problem of not quite knowing how to replicate a given setup? For many tasks it saves you time: instead of having to mess with the host environment you can create an isolated environment and give Docker access to the host file system or network.

It takes time to get used to that mode of working (without automated testing it is very hard), but it does have a lot of advantages.


I use other tools - config management and whatnot I do in Saltstack or Ansible, so the VMs are throw-away really, nothing is configured by hand, it's just that they live longer than the lifetime of a single deploy.


I sure have, but not for any of those reasons, and especially not that cowboy "just log in, change couple of lines and restart". The reason is always when the time, cost and complexity of managing the control plane outweighs the benefits.

You can still have a perfectly good, quickly executing, end-to-end CI/CD pipeline for which the deployment step is (gasp) put some files on a server and start a process.

The inflexion point for this varies by organisation, but I've never seen an environment with less than three services where managing container scaffolding is a net positive.


We have a series of customers who I want to move away from Kubernetes, simply because the cost of managing the Kubernetes cluster outweighs the cost of managing the application, compared to running it on a few virtual machines.

I wouldn't even set the limit at three services, but perhaps closer to 10, depending on the type of services and the development flow.

Getting a late-night alarm for a virtual machine is still much easier to deal with than an error on a container platform.

One solution that seems to be somewhat simpler, but still managing to retain many of the advantages of having a containerized infrastructure is Nomad, but I still haven't tested it on anything large scale.


This may sound like heresy, but Docker Swarm is perfectly viable for this kind of use case.


Viable yes, annoying, also yes.

Have you tried fixing Docker Swarm when it randomly decides that one worker is missing and spins up the "missing" containers on the remaining worker, while reporting that you're missing a worker but your containers are somehow also over-replicated at the same time?


Yes, and I've also run into the networking issues more than once.

However, both fail states are fairly rare, and Docker Swarm is far simpler to manage than K8s.


I use K8s at work, but Docker Swarm at home because it is simple to set up and works well.


But that's the operational headache of using Kubernetes, especially when you are dealing with a small number of services. There are other simple and/or managed platforms that will give the same usage experience while hiding the complexity.


Absolutely, just plain Docker or Docker Compose are both wonderful and easy to use tools.


I'm not sure if you have seen a properly built and maintained automated e2e lifecycle.

You write code, you push it, 5 minutes later it is rolling out, tested, with health checks and health metrics.

Your infrastructure itself is keeping itself up to date (nightly image builds, e2e tests etc.)

It just works and runs. It doesn't make the same mistake twice, it doesn't need an expert to be used.

I'm not saying it's for everyone! Put three/four VMs on AWS, add a managed database and you are good to go with your ansible. Use a Jira plugin to create recurring tickets for doing your maintenance and that should work fine.

Nonetheless, based on your 'random list of things' it does sound like you are not doing it right.

There is something very wrong if you really think it's critical for you to be able to 'hot fix', aka playing hero by jumping on your VMs and hacking around. If you only have a single server for your production env there is no risk of forgetting a server to patch, but there is still the issue of forgetting to backport the fix (which is probably the wrong term if you don't hotfix your release branch).

Most mistakes I make are mistakes I make because I was able to make them.

And that might sound unfortunate, but there is a feeling of trust in your setup. At least I get that feeling, and I get it through automatisation: knowing and seeing the deploy process just working day in, day out; knowing that my monitoring and alerting is set up properly; knowing that the systems keep themselves up to date; knowing there are proper tests in place.


> You write code, you push it, 5 minutes later it is rolling out, tested, with health checks and health metrics.

Yep, I have that. It's more like 15 minutes than 5 for me, but this process has nothing to do with containers - it can be done in the same way without containers.


> It just works and runs. It doesn't make the same mistake twice, it doesn't need an expert to be used.

Except when you misconfigure something on Friday night and it makes the same mistake 100 times per hour until someone notices it.


This will happen once and then there will be a test for it.

My automated system will only get more resilient over time. This is a benefit for the system itself.

Of course when you do it manually, you will learn and gain experience, but that's only for YOU. It does not just get transferred to your colleagues, and when you are on holiday and shit hits the fan, it will not help.

My biggest reason why I like automatisation so much is: the company becomes less reliant on me.

It is the same mechanism behind the industry adding more computer logic to machines: machines are complicated and you need to train people. Make them easier to 'just use' and you have more people available who need less training.


That's why there is a don't deploy on Friday rule.


Yeah, I don't think that's a new problem with containers.


0. "Builds are slow" -> use multi-stage builds to ensure layer caching works great.

1. "Hot fix now" -> I do just log in and enter the container, change a couple of lines and restart, not sure what's your problem here

2. "containers do not close when ssh breaks" -> I guess that's also going to save you if you run <whatever mysql management command> without screen/tmux !

3. "Harder to change how things work" -> actually, it makes it much easier to add services to a stack: just add them to the compose file and use container names for hostnames in configurations !

4. "Must remember launched containers" -> why not use NetData for monitoring ? it really does all the monitoring/alerting you need out of the box ! And will show containers, tell you before /var/lib/docker runs out of space (use BtrFS to save a lot of time and space)

I'll add that containers make it easy to go from DevOps to eXtreme DevOps which will let you maintain a clean master branch, and that is priceless ! Details -> https://blog.yourlabs.org/posts/2020-02-08-bigsudo-extreme-d...

Where I'm migrating to : replacing Ansible/Docker/Compose with a new micro-framework that lets you do all of these with a single script per repo, instead of a bunch of files, but that's just because I can :)


I've heard a lot of the same complaints from people who almost universally have /not bought into the idea/ of containerisation. If you're using containers but you yearn for the days of being a web master, treating servers as pets rather than cattle, and wanting to edit code on the fly, then you're never going to get along with containers.

It's the same jump, from non-containerisation to containerisation, as it is from non-SCM to SCM. People who upload their files via FTP have a hard time picking up Git (or well, they did, ten years ago or so.) You'd have people complaining that they have to run a whole bunch of commands: git add, git commit, git push, then on the other side git pull, when they used to just drag the files into FileZilla and be done with it.

The thing is though, if you change the way you work, if you change the process and the mindset, you can be in a much better position by utilising the technology. And that requires that you buy in.

But, as for your questions: no, I haven't. I have always taken legacy or new projects and gone containerisation with continuous integration, delivery, and deployment.


Isn’t it quite a jump from treating servers as pets to containerization? There is a middle ground - autoscaling, health checked VMs behind a load balancer where the autoscaling group is using an image.


I tried the middle ground. I used Packer with Ansible to build the VM images, on the theory that the auto-scaling group should use final, ready-to-run images. My image builds took 15-20 minutes. Also, for the services that had only one instance, it was way too tempting to just SSH into the one VM and update things manually rather than suffering a full build and deploy. Do you have any suggestions for a better way to do this middle-ground approach?


yes, Chef Habitat


Stop being mediocre


That advice is impossible to act on. No one can be perfect at everything. So it's necessary to be mediocre at some things in order to accomplish the things one really cares about. So yes, I'm mediocre at operations. I want to get better, but if I tried to be perfect at operations, I wouldn't get other things done.

So, do you have any suggestions that can actually be put into practice?


If you get better at operations it makes other things easier. Being better at operations is the ultimate sharpening the saw move.


I run everything directly on VPSs and deploy via rsync.

Every now and then I have long discussions with friends who swear on containerizations.

All the benefits they mention are theoretical. I have never run into one of the problems that containerization would solve.


That sounds like a very small setup with very limited requirements if you can run it successfully like that.

The benefits they are mentioning are theoretical for you; I personally have not worked in a professional env where VPSs and rsync would be enough at all.


You sound exactly like my friends. Except that they know that my systems are several orders of magnitude bigger than theirs.

So they don't argue with the present "This cannot work" but with the future "This will lead to catastrophic failure at some point!".

This has been going on for years now.


What is your rough setup then?


> That sounds like a very small setup you run with very limited requirements if you run this successful.

No, not necessarily. Computers are fast, and if you don't add complexity until you need it, you can do a hell of a lot with a half-decent VPS and some rsyncing.

For context: a couple of years ago I ran a website that was in the Alexa top 1K for a while (back when that was still relevant), and that was heavily visited and used for the time during which it was relevant. If you worked at any news organization anywhere, it was probably on your daily list to check.

Yet it was relatively crappy PHP, not even very optimized aside from some very naive memcache caching, and ran off a random VPS with 2GB of RAM - and that included the database. The biggest challenge wasn't scaling or deployment processes, but fighting off constant DDoS attacks.

Of course, the key difference between that deployment and a typical startup deployment is that it wasn't built like a startup. It wasn't "measuring engagement", it wasn't doing "big data", it wasn't collecting data for targeted advertising - it just did one thing and it did it well, with the only complexity involved being that which was actually necessary for that purpose.

Over the years I've looked at a lot of complex "devops" setups for other people, and almost without exception the vast majority of the resource requirements and complexities originate from data collection that approaches kleptomania, and their choice of tooling - which ostensibly was chosen to better handle complexity. It's just a self-fulfilling prophecy that way. Most people don't actually have this degree of complexity to manage.

That's not to say that there's no organizations or projects at all that would benefit from automated cluster orchestration (with or without containers). But it's very much a "prove that you need it" kind of thing, not a "you need it unless..." kind of thing.

(I do think that there's inherent value in deterministic deployments. But that's separate from whether you need multi-system orchestration tooling, it can be achieved without containers, and even then the deployment process should be trivial enough to make it worth your while.)

Edit: To be clear, this is not an argument to prioritize performance over everything else or avoid dependencies/tools, at all. Just an argument to not add moving parts that you don't actually need. For anything you add, you should be able to answer "what concrete problem does this solve for me, and why is it worth it?".


These things can all be managed and automated using ansible/salt/etc -- containers just add another layer of abstraction to manage/maintain/understand/etc.


> I have never run into one of the problems that containerization would solve.

You've never had to migrate your app to another host, or manage dependencies for an app?


you have never made an overwriting change that broke your system in a way that made rollback difficult?


> At Viaweb, as at many software companies, most code had one definite owner. But when you owned something you really owned it: no one except the owner of a piece of software had to approve (or even know about) a release. There was no protection against breakage except the fear of looking like an idiot to one's peers, and that was more than enough. I may have given the impression that we just blithely plowed forward writing code. We did go fast, but we thought very carefully before we released software onto those servers. And paying attention is more important to reliability than moving slowly. Because he pays close attention, a Navy pilot can land a 40,000 lb. aircraft at 140 miles per hour on a pitching carrier deck, at night, more safely than the average teenager can cut a bagel.

> This way of writing software is a double-edged sword of course. It works a lot better for a small team of good, trusted programmers than it would for a big company of mediocre ones, where bad ideas are caught by committees instead of the people that had them.

http://www.paulgraham.com/road.html


The idea that Navy pilots don't crash because they have minds like steel traps is absurd. They have a ton of process and redundancy to reduce human error to a minimum, and they follow tedious checklists religiously. Even private pilots have this rigor.

If you applied the amount of process pilots use to software deployment, you'd improve ops by orders of magnitude.


And they sometimes can't land those planes on the pitching decks at night, and have to redirect or ditch. Source: old man was in Carrier Ops, talked about it constantly.


Not that I remember, no.


I can opine based on my current position, where I interact with both containerized and non-containerized infra, specifically a docker-compose-like system versus direct installs on AWS EC2 instances. In my opinion, a well made containerized system is far superior an experience:

- Deploy times are certainly slower, up to 50x slower than non-containerized. However, we're talking 30s deploys versus 20-minute deploy times, all-inclusive. The sidenote here is that you can drastically reduce containerized deploy times by putting in some effort: make sure the (docker) containers inherit from other containers (preferably self-built) with the executable versions that you need. For instance, you might inherit version X of program A and version Y of program B before building only a container with version Z of program C, as A and B barely change (and if they do, it's just a version bump in the final container). Even better, just build a code container during deploy (so a container with essentially only code and dependencies), and keep all the executables as separate images/containers that are built during development time;

- Containers do allow high-speed fixes, in the form of extremely simplified rollbacks. It is built into the entire fabric of containers to allow this, as you just change a version number in a config (usually) and can then rollback to a non-broken situation. Worst case, the deploy of fixed code in my case does take only 20 minutes (after the time it takes to fix/mitigate the issue, which is usually much longer);

- Local environment is _so much easier_ with containers. It takes 10 minutes to setup a new machine with a working local environment with containers, versus the literal hours it can take on bare metal, disregarding even supporting multiple OS'es. On top of that, any time production wants a version bump, you can start that game all over again without containers. Most of my devs don't ever worry about the versions of PHP or Node they are running in their containerized envs, whereas the non-container system takes a day to install for a new dev.

Containers can be heavy and cumbersome, but in many cases, a good responsibility split can make them fast and easily usable. In the specific case of docker, I find treating the containers like just the executable they are (which is a fairly default way of dealing with it) works wonders for the easy-and-quick approach.


Local environment specifically isn't fully containerized in my project. DB and similar things (elasticsearch, message queue) locally are inside containers, but the code itself is not. I worked on a project before where I had to have it containerized and it was a slow mess. I'd rather spend a couple more hours on setting up local dev environment for every new hire than deal with code in Docker locally.

In production we have it done the other way - PostgreSQL and Elasticsearch are run directly, but code is in containers.


To be honest, I have a fairly similar situation, I just use a different code-container for local than for production. In production we run some things directly and package the code, whereas in local we package the services, and have the code semi-separate.

In the production environment, I want the code image to be set in stone, that way a deploy or rollback will go to the exact git commit that I expect. So the CI-script for deployment is just a `docker build` command, the dockerfile of which clones a specific commit hash and runs dependency installation (yarn install, etc.), then sets the image version in the production env variables. The code is then encapsulated in 1 image, which is used as a volume for other containers, and the runtimes are each in their own container, connected by docker-compose.

For local, it's a much heavier code image that I've prebuilt that contains our current version of every tool we use, so that the host machine needs nothing but docker installed to be able to do anything. The services that actually display stuff on the screen (Node.js) run as their own container with their own processes, but you can hop into your code container (used as a volume for the services) and try out command line Node stuff there, without fear of killing the procs that show your local environment.

It took a long time to reach this point, lots of experimentation, but it's now pretty lightweight and pretty useful too.


Hmm, not sure I understand why you put code in one image and then use it as a volume for other containers. Why not run directly from the volume with the code?


Composability whilst maintaining a monolith codebase. The code in its single image is used in 3 different environments, without including any runtimes that those environments might not need, keeping the image small. At the same time all code can be kept in a single git repository.


It is slow in your local because you are probably using OSX or Windows. On Linux the speed is near native for me for local development.


Yes, locally we use OSX apart from one developer who uses Linux.


I guess thousands of developers would pay for a fast docker implementation on Mac. File access is so slow if you want to mount your source code into the container.

There are solutions like docker-sync but they have a kind of random delay: sometimes the sync happens fast, sometimes it takes a few seconds.


I also do this. We have DB and another service running via docker-compose. But our actual Webpack typescript app is done locally. We run on OSX, and due to the Docker file system slowness, the dev-test-run deploy cycle is far too slow. Much better running it outside of Docker.


> a well made containerized system is far superior an experience

Like everything, it depends on context. Is it good to have separate dev, staging, and production environments? For many companies, the answer is "D'oh!". But if you're one guy trying to get your first p.o.c. out, by all means, deploy directly to production.

Containers are sort of the same thing: if you're a small team, and don't have many customers, the disadvantages of containers (however small) may outweigh the advantages.


I largely agree with you, best tool for the job and whatnot. At the same time, I feel that due to the time I've spent investigating and experimenting with containers, I'm almost better at getting containers to work the way I want than to run it plain.

So if I was now starting a new project solo, I would probably go straight for containers and never deal with conflicting versions or difficult, hacked-together rollbacks, and I don't think it would take me, personally, more time to set up.

In other words, I think it's a good investment of time to spend, say, a full work-week on understanding containers and experimenting on how to make it work for your use-case.


Regarding local environments:

have you faced performance issues? Running 5-6 docker images (app, redis, db, mq, etc.) at the same time has been causing the machine to lag for us. Most of my team is on the 16-inch MacBook.


I won't be the first to tell you that docker + macos is a constant struggle. It can be done, and I support about 10 developers using about 10 simultaneous containers on macos. My troubleshooting workflow right now is the following:

- Is the container throwing errors? A running container somehow repeatedly throwing errors on its non-main process restarting will eat up all resources of any machine, any OS;

- Is the container trying to sync a folder or datasource between the container and the host? Especially on macos, using Docker for Mac, this will hurt your performance. Solutions are in the form of specialized syncing systems like docker-sync (http://docker-sync.io/) or manual syncing using rsync or unison;

- If you have many running containers, it can be useful to spin up a linux VM (ubuntu, debian) in virtualbox, then run the containers in there, finally using a tool like unison to sync dynamic content (the changing code) to the vm;

- Is 1 container using far more resources than the others? Is this strictly necessary? It is possible, in docker-compose at least, to limit certain resources for containers (it's probably also possible without docker-compose, just using docker run) https://docs.docker.com/config/containers/resource_constrain....
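For that last point, with plain docker run it's just flags (the image name and limits here are arbitrary examples):

    docker run -d --name mq --memory=512m --cpus="1.0" rabbitmq:3

In a docker-compose v2.x file, if I recall correctly, the equivalent keys are mem_limit and cpus on the service.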

Probably the dynamic content synchronisation is the biggest resource hog, and docker-sync has really helped with that in the past, plus it doesn't require different setups between linux and mac hosts, i.e. you could use the same docker-compose.yml file.

It doesn't hurt to inspect whether the host machines are potentially running other heavy services, such as heavy electron apps or a million open tabs in browsers. I've had to tell some devs to perhaps not use their work machines for personal chat applications (but just having their phone open then) because those applications were using >3GBs of RAM each on a 16GB ram machine, leaving very little for any work-related processes.


I was happy with docker-sync only in the first hour or so. I was trying it for a PHP+JS setup (so many small files) and got frustrated after some time. Sometimes the sync happened instantly, sometimes it needed multiple seconds. It's really frustrating to refresh a webpage, see the bug still happening, and not understand what's going on; to change the code again and still not fix it; to add some test output and not see it; and to finally understand that the sync is slow again, or that you need to restart it because it silently broke.


File access in Docker for Mac is a known issue. Best solution is to mitigate using docker-sync or similar.


Run it native - binaries plus supervisord - do the same thing on prod. It's fast, observable, easy to debug, etc.


What do you think about systemd and monit?


Maybe monit works - i like something that’s not the system init so you can use the same config on macos that you use on linux...


ah, that makes sense. I don't deploy to macos.


Neither do I, but it's convenient to use the same stuff in dev you use in prod.


I have long felt like containers, and VMs before then, have been abused to the point of absurdity. Most of the primary reasons for people jumping to them are already sufficiently solved problems.

* Resource and process isolation -> capability-based security

* Dependency management conflict resolution -> nix / guix style package management

* Cluster orchestration and monitoring -> ansible, salt, chef, puppet, etc.

If you need all of those things at the same time, maybe containers are the right choice. But I hate the fact that the first thing we do when we run into a pip package conflict is to jump to the overhead of containerization.


Others are talking about process reasons but I have a technical one:

We have an internal tool that listens to a message queue, dumps a database (from a list of permitted databases), encrypts it and sends it to S3 to be investigated by developers.

When running on a container, the process takes 2-3 minutes with small databases, about an hour or more with larger ones. When running on a regular EC2 image, the process takes about 5 minutes in the worst case scenario and is borderline instant with smaller databases.

Mapping internal volumes, external volumes, editing LVM settings, contacting AWS support etc. yielded nothing. Only migrating it to a regular EC2 instance had any results, and they were dramatic.

We run Docker containers for local development too but when restoring very large databases we need to use MySQL in a real VM instead of in our Docker setup because Docker crashes when faced with a disk-heavy workload.

So to conclude, the only reason I wouldn't want to containerise a workload is when the workload is very disk IO heavy, whereas most of our apps are very network IO heavy instead.


If your hosts run Linux, maybe take a look at disabling the userland-proxy in Docker for your development environment, and see if it helps. The userland-proxy _really_ slows down certain applications in my experience. Setting userland-proxy=false in daemon.json and restarting Docker switches from the userland proxy to iptables forwarding. FWIW it's still considered "beta" and may result in bugs of its own, but it has really helped with a few of our ($dayjob) more pokey apps in a few of our environments.
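For reference, that's just this in /etc/docker/daemon.json, followed by a Docker restart:

    {
      "userland-proxy": false
    }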


Isn't that the same as just running docker --net=host?


No, it still forwards to a docker internal/bridge ip, just using iptables forwarding instead of a tcp-proxy. net=host just uses the host context.


Yep, I moved our knowledge management platform[1] from Docker + Docker Swarm to just deploying with Ansible.

I think containerization is another one of those things that you're told is great for everyone, but really you need to have many teams with many services that all need to act in concert in order for containerization to be worth the effort / overhead.

That being said, I conceptually prefer how with tools like K8s you can have fully declarative infra as code, rather than the hybrid imperative/declarative mix of a tool like Ansible.

[1] https://supernotes.app


thank you for giving visitors of your website the choice to completely avoid cookies - without any dark ux patterns involved!


Why didn't anyone bother asking what this containerized infrastructure is for? The size of it? Purpose and redundancy options?

Everyone on HN starts criticizing vague container statements. This really turned into an Apple vs PC debate.


The desire to make production changes outside of version control and change management flows is not a complaint about containers. It isn't an Apple vs. PC debate. It is literally reading a question from that person on your team who makes your job extremely difficult and error prone.


isolation, ease of deployment, management of artifacts and network resources

if you're running a single app a lot of this may not apply


Still vague. In my opinion containers are not for everyone, it has to be a very specific scenario. Most internal things can be achieved with VMs easily.


Docker ≠ containers

You can run lxc/nspawn containers as lightweight VMs and save a lot of (runtime, management) overhead without having to worry about any of Docker's or k8s's quirks.

We're quite happy with that approach, Docker isn't production grade IMO and k8s doesn't make sense at our scale.
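For the curious, the whole workflow is roughly this (the distro and machine name are arbitrary):

    # build a minimal Debian tree, poke around in it, then boot it as a managed machine
    sudo debootstrap stable /var/lib/machines/web http://deb.debian.org/debian
    sudo systemd-nspawn -D /var/lib/machines/web
    sudo machinectl start web

No image registries, no extra daemons beyond systemd itself.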


I felt bamboozled when I first heard about nspawn. It's like docker but well integrated into systemd. How come there is so little discussion about it? Is the tooling lacking?


I love containers! I just hate that so many people assume that means docker and ignore the things you refer to. lxc is so nice and fast... I haven't taken the time to test out nspawn yet though.


We never bothered to migrate our small setup (circa 20 instances) to containers, we just use VMs.

We use Go binaries, so dependencies are compiled in, hosted on VMs at a cloud provider, set up with cloud-config, and using systemd to manage services sitting behind a load balancer, one VM per service. Automated test and deploy, so it's simple to make updates.
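The per-service setup is basically one unit file (names and paths here are placeholders):

    [Unit]
    Description=example Go service
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/myservice
    Restart=on-failure
    User=appuser

    [Install]
    WantedBy=multi-user.target

Drop it in /etc/systemd/system/, systemctl enable --now it, and systemd handles restarts while journald handles logs.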

Never really felt the need for containers or the extra layer of abstraction and pain it brings.

Re hot fixes, automate your deploy process so that it's fast, and you won't mind checking in a fix and then deploying because it should take seconds to do this. You don't need containers to do this.

If you can't deploy changes quickly and easily in a reproducible way something is wrong. Having a record of every change and reproducible infrastructure is incredibly important, so we check everything into git then deploy (service config, code changes, data migrations). You don't need containers to get there though, and I'm not really sure they make it easier - perhaps in very big shops they help to standardise setup and infrastructure.


I have migrated to containerized infrastructure recently and I can tell that it has its benefits, quite a lot actually. (Where before I worked only with VPSs.)

But after working with it, it's pretty visible that the abstraction layer is really huge and you need to learn the tools well. When you deploy to linux VPS, you probably have already worked on unix system and know plenty of the commands.

Another thing: I think having a designated person for the infrastructure makes it much less trying for a team. On the other hand, you have 'code' sitting in the repos and everyone feels like they can do devops. I don't think that's exactly true, because e.g. k8s is a pretty complex solution.


> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

If you have the need for that kind of thing, I don't know why you would use containers.

Containers is for organizations who have processes.

Unfortunately nowadays we teach every developer to have containers, ci/cd, terraform, test coverage, ... as a requirement


Logging in to fix something RIGHT NOW also really falls off as soon as you hit moderate scale and have to do the editing on 20-30 boxes.


Who really did that though?

I’m sure a few. But mostly back in the days before we (almost) all had containers we automated that stuff with puppet etc...

It worked ok but it had its own problems; we iterated and moved on to disposable workloads and infrastructure, which has a completely different set of problems. Amongst other things it makes scaling even easier - if you need it and if you do it right.

Before automation frameworks and services we had scripts, which were either nice and simple but limited, or a huge mess and highly complex (and often fragile).

Before that we either had mainframes / centralised computing or we didn’t have scale.

Maybe it was different in the windows world, but that’s pretty much what my experience has been across many clients over the past 16~ years with Linux/BSD etc...


You can have both processes and rare exceptions to processes. And even processes for the exceptions (e.g. a checklist that includes documenting what you did and reimplementing the change, e.g. in version-controlled Ansible).


You are right to ask, but you are also listing points (opinionated ones) that are intended to protect you against human error.

Pick the tools that suits your flow.

Nothing wrong with bare metal or virtual servers.

EDIT: to add, a good few years ago I was managing a PHP shop where all production was bare metal and development/staging was replicated in containers. Everybody was happy; hope it helps.


We use Nix and NixOS with no containers. You get some of the benefits of containers (eg different binaries can declare dependencies on different versions of X), without some of the wonkiness of containers

It has trade offs (eg worse docs), but you might like them better than eg Docker’s
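For a flavour of what that looks like, a minimal shell.nix pinning a dev/build environment might be something like this (the package names are just examples):

    { pkgs ? import <nixpkgs> {} }:
    pkgs.mkShell {
      buildInputs = [ pkgs.go pkgs.postgresql_12 ];
    }

Run nix-shell in that directory and you get exactly those tools, largely regardless of what's on the host.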


Heroku works fine for me. No k8s or docker container nonsense just push the code, deploy and done.


heroku is a PaaS, it could be running containers internally, you wouldn't know. docker, k8s and the other tools are just building blocks of a PaaS, you are comparing oranges to a full course meal.


Sure, but the point is that by running your own K8S you're running a cloud provider. Which is fine, if your business is being a cloud provider. If not, let someone else do it, they'll be better at it.


My startup would have gone bankrupt a couple of years ago if we would use something as expensive as Heroku. Or AWS.


Genuinely curious what you were doing. Renting your own racks in a DC, providing reliable networking, paying upfront for all the servers and then hiring staff for just managing all this bare metal is insanely expensive for a startup. How is not using cloud a cheaper option?


We're renting bare metal servers from Hetzner. It's multiple times cheaper than AWS. We have lots of data, and to handle it with no problems our DB server has 512 GB of RAM and 24 cores (AMD EPYC). How much does it cost to have a comparable server on AWS? More than $2k per month, last time I checked.

We bootstrapped the startup and I'm based in a much cheaper area than SF or NY (Eastern Europe, actually). I can hire a decent developer here for $2k per month.


Can you share your company?


I've sent you an email to the contacts in your profile.


Okay, so you're still using a cloud service but with a different layer of abstraction. It's not the same as on-prem bare metal, which is usually what people mean by "not using cloud".


There are options in the middle of "shared hosting" & "build your own cpu".


Yep, I haven't said I'm "not using cloud", I said I don't use Heroku or AWS because they're expensive.


This is to me a very scary comment. It says "we have no idea how computers work". Source: run a service on bare metal in a rack in a DC that I haven't entered in a year or more.


What is your scale? How much does it cost? What happens when your rack or DC goes down? All of this is not worth it for small-mid size companies. This is why the cloud took off in the first place. The revenue of AWS, Azure, GCP speaks for itself.


So much IT is relatively easy in the singular.

It's the plural that teases out the differences between the pets and the cattle.

And maybe that's the point of this thread: scalability.


Scaling what though?


Any system.


Yes, dynos are containers; their use of containers predates Docker.


Yes, for my home network. The entire setup is just for me and my hobbies, no ci-cd. Managing containers was much more work, even with docker-compose. After two years, I switched back. Maybe the tooling is better, or maybe I am better with the tooling, but it seems much easier this time around.

One big change in the last 2 years is that documentation on "how to use this image" has become more common. Figuring out how to use an image used to take hours - inspecting its internals, learning the config files for that specific tool, modifying just the lines you needed, or writing/mounting a custom file/folder, etc. Now, many images have docker-compose examples, and many images have loader scripts that can read env variables and configure themselves properly. Having a good entrypoint.sh is a huge benefit to a docker image's usability, and having a docker-compose example is good documentation.
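
For example, a minimal entrypoint.sh along these lines (the file names and variables are made up, just to show the shape of the pattern):

    #!/bin/sh
    # Hypothetical entrypoint: read env variables, render the config the
    # tool actually wants, then exec the real process so it runs as PID 1.
    set -e

    : "${LISTEN_PORT:=8080}"    # defaults if the user sets nothing
    : "${LOG_LEVEL:=info}"

    sed -e "s/__PORT__/${LISTEN_PORT}/" \
        -e "s/__LOG_LEVEL__/${LOG_LEVEL}/" \
        /etc/myapp/config.tmpl > /etc/myapp/config.ini

    exec /usr/local/bin/myapp --config /etc/myapp/config.ini "$@"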

Why did I switch back? The isolation finally became significantly more useful for me. Perhaps the range of my 'hobbies' increased - I started running many more technologies. Multiple tools had conflicting dependencies, or only supported super-old options (looking at you, Unifi Controller still depending on the EOL mongo3.4)


I work with a relatively immature team and a monolithic app that does not deal well with errors. The static infrastructure works flawlessly, which led to a huge debt in the form of missing error handling. The application, while processing a huge transaction load, also has strict real-time requirements and no scalability requirements (load is well controlled from within the org).

When I joined, the team was in the process of migrating to cloud, yet with no understanding of what that means. The basic plan was to split the app into smaller services and get them running in containers, with no provision to get the team to learn to support the app, debug problems, deal with spurious errors, etc.

We are currently migrating off to be able to focus on improving the app instead of spending the entire learning budget (I mean developers' limited focus ability, not money) on learning cloud tech. Improving here means refactoring, properly separating modules, building robust error handling, etc. There might be a time when we decide to get back on cloud, but currently I see this as only distracting from the really important issues.


I agree with you that Dockerizing everything is a fad. My team migrated off of Docker 4 years ago and couldn't be happier. But if you're keeping long-lived SSH sessions alive for weeks on production servers, and expecting to fix things RIGHT NOW by "logging in, changing a couple of lines and restarting," then you have process/methodology issues my friend. Handle that!


Never adopted it; too much resource consumption (RAM/CPU), which quickly adds to the cost of these systems in prod, especially if you are trying to scale on a low budget.

My system now is usually: deploy to a workbench, automate installing the services I need there, automate making a disk image I can use to provision n machines on the fly through a load-balancer daemon (which monitors CPU load, network in, and other custom tasks on the slaves to determine whether to scale up or down), while still having the flexibility to automate scp'ing (fabric) code to the slaves as things update (also through the daemon) without re-provisioning everything on the slaves via a boot from an image.

An AWS consultant tried to move our monolith to a full-on AWS monstrosity with docker + ELB + CloudFront + a bunch of other stuff; we went from about ~$15/day to ~$180/day in infrastructure costs, and a bunch of stuff was (more) broken. Decided to roll our own, and we're around ~$20/day now, and can still bring it down below what we were paying before.


> Must build an image before deploying and it takes time

not strictly true.

this is how we do with kubernetes and docker:

for testing: have test images with the basic supporting systems built into the image, but the application gets built from the entrypoint before starting; a configuration point provides the git credentials and branch to build.

startup is 5 minutes instead of instant, but that's handled by minReadySeconds or initialDelaySeconds, and there's no image build involved, just a change to the deployment metadata to trigger the pod cycling.

for production: you split out the basic supporting image and add your built application as a layer on top that depends on it, so instead of building everything from a single Dockerfile you only push the binaries, and the docker push is basically instant.

if the performance of that step concerns you because the binaries come with lots of assets, you can run CI directly on the docker registry host, so push bandwidth becomes a non-issue, or you can bundle assets in an intermediate image with its own lifecycle.
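
roughly like this, with made-up registry/image names:

    # sketch of the split described above: heavy base image rebuilt rarely,
    # per-deploy image is just a thin layer with the freshly built binaries.
    docker build -t registry.example.com/myapp-base:1.0 -f Dockerfile.base .

    cat > Dockerfile.app <<'EOF'
    FROM registry.example.com/myapp-base:1.0
    COPY bin/ /opt/myapp/
    CMD ["/opt/myapp/server"]
    EOF

    TAG=$(git rev-parse --short HEAD)
    docker build -t "registry.example.com/myapp:${TAG}" -f Dockerfile.app .
    # only the thin top layer actually gets uploaded here:
    docker push "registry.example.com/myapp:${TAG}"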


Personally I prefer repeatable deploys and knowing that the right dependencies are installed, however we've all found ourselves in OP's situation from time to time. We still have a few ancient pets around that may get in situ hotfixing when circumstances warrant, but most of our workloads are containerized and deployed to kubernetes via gitlab pipelines. There are a couple of things you can do to speed docker builds up, such as using build targets and structuring targets so that the layers of code that might need to change are added to the image last, allowing the earlier layers to be served from cache unless base dependencies change. For most of our services a commit-build-push-run cycle is probably on the order of 1-3 minutes. That might be slower than a nimble-fingered guru logging in to edit a line and restart a service, but it's close enough that the advantages of the system far outweigh its costs, for us anyway.
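
As a rough sketch of the layer ordering (a hypothetical Python service; names and paths made up), the idea is just:

    cat > Dockerfile <<'EOF'
    FROM python:3.8-slim
    WORKDIR /app

    # changes rarely -> served from cache on code-only commits
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # changes on every commit -> only this layer onwards is rebuilt
    COPY . .
    CMD ["python", "main.py"]
    EOF

    docker build -t myservice:latest .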


While we're on the subject: can anyone recommend a good place to start with containerized infrastructure "the right way" to someone who has only ever administered actual hardware the old-school way? It's not my day job, so it's not something I've been keeping too close an eye on.

Maybe some books? Much appreciated!


For Docker, I've found Docker Deep Dive to be a good book - https://www.amazon.com/Docker-Deep-Dive-Nigel-Poulton-ebook/...

For K8s, this book is from some of the k8s authors themselves - https://www.amazon.com/Kubernetes-Up-Running-Kelsey-Hightowe...


If you ask this question in HN, you will start another debate. Clearly everyone has a very unique way of managing and deploying containers. Even some consider Docker not a containerization. (you will see in the comments of this thread.)


OP was asking for a starting point to learn, not for best practices of managing and deploying containers.


OP sounded like had a very specific setup with some issues going on.


I moved back from AWS (and AWS with docker for a while) to plain linux debian with fixed number of instances, and I am quite happy with the move.

But the apps/services are not installed manually; instead I use Debian packages. Every package copies itself into a 'backup' directory during installation, so in case of rollback I reinstall a copy from there. It has been working this way for 2 years without issues. Configuration is preloaded inside the packages.

— Must build an image before deploying and it takes time ... still the same.

— If I need quick hot fix RIGHT NOW, I can't just log in ... perhaps this is more related to having fixed instances vs auto scaling.

— Must remember that launched containers do not close when ssh breaks ... + memory issues, disk issues... this is fixed and it is the biggest benefit for me.


Why are you rebuilding your image every time? You can cut a significant chunk of time by just reusing the last image if you only have code updates. Instead of FROM ubuntu:latest, use your imagename:lastversion, then do things like apt-get update instead of doing all the apt-get installs again.

You actually can get a shell in a container if you're debugging a deployment problem: make your container run something like sleep 9999d, then do kubectl exec -it podname -- bash
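
Something like this (deployment/pod names made up; using a JSON-patch "add" so it works whether or not a command was already set, and assuming the image has a shell and sleep):

    kubectl patch deployment mydeploy --type=json \
      -p='[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","9999d"]}]'

    kubectl get pods                              # find the replacement pod
    kubectl exec -it mydeploy-abc123-xyz -- bash  # poke at config, env, files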

You can debug what the hell went wrong - was it a config? An ENV? Fix all that, see if your deployment is working, then fix it all at once with your changes in code and deploy. But agreed with the sentiment that ssh sucks. It sounds like your iteration cycle for checking deployments is long because of either tooling or not knowing these small tricks that aren't written in manuals.


Overengineering at its worst. The vast majority of apps do not need containerization. They just need Apache and a competent person at the wheel. I've switched back from AWS for many of my projects and I love how simple it used to be.


At H5mag we use bare VMs and deploy using Debian packages. Build them with a Gitlab runner, push to a package server. Let the ‘nodes’ update themselves. It’s really nice since the only dependencies you have for deploying are rock solid.


We're quite similar, except our packages are first installed onto GCP images which are then deployed gradually as new nodes start. We had a similar solution when on AWS.


If your container builds are slow you are doing something wrong. You need to restructure your builds so that you get better layer caching. One of the primary reasons we switched to containers in the first place was because it allowed us to quickly and repeatably build complicated infrastructure and share the prebuilt artifacts while also speeding up rebuilds.

As for dealing with runtime issues/hot fixes, put something like supervisor in your containers that might need temporary interventions in production and use that to restart the service without having to restart the container.
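
For example (container and program names made up, assuming supervisord is the container's main process):

    # bounce just the managed service, without recycling the container itself;
    # "api" is a placeholder program name from supervisord.conf
    docker exec -it mycontainer supervisorctl status
    docker exec -it mycontainer supervisorctl restart api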


Maybe we are doing something wrong in our build and deploy process; we're going to re-examine all of it tomorrow.

I would think it’s pretty reasonable right now, but someone here said their deploy cycle is 10 seconds long and there are probably things we can improve without rebuilding whole infrastructure around different concepts.


Do you allow your CI workers to cache layers or do they build from scratch each time?


I moved to NixOS and, to be honest, I don't miss docker much. You can restrict your app pretty easily on Linux without using a container.

The reason I moved away from containers was because of a linux kernel bug which slowed down network requests through docker. I was working on a latency sensitive application at the time, so I just moved nginx and my containers to real machines.

Setting up things manually wasn't great, especially when deploying to multiple machines, so I just wrote a few different nix configurations and created some Digital Ocean machines with nixos-infect as an init script. There was definitely a learning curve, as the language is peculiar (2-3 days to get nginx + postgres + redis + a python app), but after doing it once I can pretty much deploy anything I want in a fast and immutable way. Replicating a similar system with a node.js app took less than an hour.

Changing something on the fly is still possible and you have access to everything. I run everything as systemd services that I can limit with cgroups.

You may run into problems if you're relying on an old package that's not in nixpkgs, but adding local derivations is quite straightforward (and you can always take inspiration from nixpkgs).


In my previous job we used Azure and moved from Service Fabric using Docker to App Service without containers - I should say without containers developed by us, as App Service uses containers itself.

Overall, everything was easier though that might be because of not using SF rather than moving away from containers. An end to end deployment (Iac, App and integration tests) of ALL microservices would take between 18 - 45 minutes.

The lower figure was if it skipped IaC (limitations on ARM templates and Vnets forced serial deployment) and the upper figure was slowness on our agents. You'd have to add anywhere between 2 and 30 minutes for the build itself (30 minutes was the remnants of the monolith that was being slowly and methodically dismantled)

We could save another 8-16 minutes by not running any tests, and work was ongoing to allow a more pick-and-choose approach, as the full deployment method was starting to become too slow as the number of microservices increased (I think we had 12 when I left).

There was nothing that, say k8s, offered and we needed that wasn't available on our setup, plus it was miles cheaper than SF or AKS.

If it works for you then fine; if it doesn't, then try something else.


We currently use habitat in production on VMs and have been happy with it.

It helped us solve a few pain points with deploying to VMs:

- dependencies are packaged and shipped with your app (no apt-get install), so dev and prod environments are the same

- local development happens against a docker container that contains all our prod services running inside of it (redis, postgres, Kafka, etc.)

- built-in CD and rollbacks via deployment channels


Sounds like you're not using Kubernetes? Might be worth exploring a managed platform such as GKE, or k3s on your own infra.

Tooling such as skaffold.dev would alleviate some of your complaints around deploy lifecycle. It will watch for code changes and automatically build, tag and deploy the image.

Paketo Buildpacks are great as well: no more Dockerfiles, and a consistent build process across different languages and frameworks.


I'm not using Kubernetes and I don't feel like using it at all. I want less abstraction layers, not more of them.


I would argue that by not using a container orchestrator, you are destined to write your own bad orchestrator


Kubernetes directly addresses some of your original complaints with those abstractions, they exist for good reasons


— If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

As somebody working on the ops side, this has saved our asses so many times from making an already bad situation worse. For example: the CI system once caught a typo in a db migration script meant to fix a bug that only showed up on production - it would have brought down the whole site and we would have had to do a full DB restore.

Most of the senior engineers I worked with knew that there's a good chance they can tell what their change would affect, but that if they forget something it could cause massive problems. They also understood that our manager/team lead/etc. would be asked the really uncomfortable question from the business side: why did we go down and lose money? And I don't want to put them in a position where the answer is: we made a change on production without testing because we were in a hurry, it brought us down, and as a result the 10-minute outage became a 12-hour one.


I feel your pain. Believe me. After years of building platforms I've just gotten to the point where I think: why can't I just take my source code and run it? Why does no one really offer me this, and of all the things that do, how many different UIs do I need to go through, how much setup do I need to do? I quite frankly never want to touch AWS ever again.

I built https://m3o.com with this primarily in mind. I spent 5 years building an open source framework so that M3O would not be a silo and multi-environment is a thing. We made it so simple that `micro run [git url]` just works. And what's even better: we just run your code. We don't build a container, there's no CI/CD cruft, we just literally pull your code and run it. That simple.

Containers are a great way to package an app and its dependencies, but we can prebake them and inject the source code. What's more, devs don't even need to think about it.


Being able to move fast (ahem, move quickly) is important. I don’t know your exact details but trust that you need this change.

Reforming your architecture so that it is reproducible without the overhead of containers, orchestration, etc is definitely possible.

What containers enforce upon you is the rigor of reproducibility. If you are disciplined enough to wipe your infrastructure and redeploy using shell scripts instead, you will get the same benefits without the infra overhead (training, bulkiness, etc. — looking at you, Ansible.)

Be prepared to implement a fair few things you used to get “for free” though. Also, you must strongly resist the urge to “just quickly implement [feature we used to have with Docker]”.

It’s quicker to think your way out of needing the feature than it is to realize how hard it was for Docker to reliably implement it.

I recommend at least basing yourself on LXC or LXD, and starting from there. It’s much easier to scriptably reinstall an LXC host than it is to reimage bare metal.


Containers are indeed cumbersome, but what's even more cumbersome is software that works on your laptop, passes the tests, and fails in production for whatever reason, such as a different version of a dependency.

So I'd rather deal with containers. However, never do kubernetes: it has nice features but it's not worth it for small and medium companies.


I disagree, Kubernetes is becoming a default choice. Small companies still want declarative config, load balancing and zero downtime deployments. It is so easy now to spin up a managed cluster and deploy an app, the complexity argument has been well addressed by the community and vendors


When I stopped working a few months ago, major cloud providers still had outstanding issues on their managed kubernetes.

The complexity of the configuration is huge. Zero downtime deployments and load balancing can be achieved without the complexity of kubernetes.

For example, drag and drop of .php files on a FTP server did that 15 years ago.


What happens if you need to drag and drop 2 php files and someone makes an HTTP request that gets processed right in-between the update of each file ?

They will surely get something weird, an error at best. Containers allow transactional deployments and FTP does not.


True. Perhaps you can upload to a temporary folder first, rename the old production folder to something else, and rename the temporary folder to the normal production folder. You may serve a 404 error if someone manages to do a request between the two renaming operation but I'm sure it's fine.


It's definitely better, but then it's not ZDD is it :)


Or just change a symlink to the new folder
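
Something like this (paths made up); the final rename is atomic, so there's no 404 window:

    rsync -a build/ /var/www/releases/2020-08-14/

    ln -sfn /var/www/releases/2020-08-14 /var/www/current.tmp
    mv -T /var/www/current.tmp /var/www/current   # rename() over the old symlink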


> zero downtime deployments

> drag and drop a PHP file

These two things run in conflict. It’s not about not dropping requests - It’s about a sane state and error reproducibility. If you drag the wrong file, there will be plenty of downtime...


True. But the same applies if you apply the wrong YAML file in kubernetes, or if your container has the wrong version.

Anyway, I'm not advising for uploading PHP files through FTP on live servers in 2020, I took it as an example of something we used to do without kubernetes.


This is an oversimplification of the problem. You need to drain the requests to the old version then wait until the new one passes a health check before rerouting the traffic


"It works on my machine" is not the problem which is always solved by containers. I had several cases where it worked on my docker on my machine.


We're considering it for workloads that need unique hardware setup (SSD's + formatting for ~5TB embedded dbs, CPU-pinning, GPU even)...

The overhead in some of the csp kubernetes platforms is quite annoying for managing the infrastructure & container runtime ops. We've "hacked" many setup approaches to get what we need.

Other than that, no way.


I hear all of your complaints and empathize with them. Containerized workflows are in fact more complex, and it does hurt a little.

But consider this. You can mount the entire host fs, say under `/var/host` or the like, and you're tied back to code on the machine. You can use the host network stack with `--net=host`. And you can even skip user and process space segmentation. So what would that get you?

Containers are just processes with good default isolation. By default, the system capabilities are greatly reduced in a container (PTRACE for example, though sometimes that one hurts a little too). Systemd does the exact same thing with its units, carefully segmenting them into process groups with constrained capabilities.

The point being that containers are just processes with good default isolation. That's a win for versioning and security, at the cost of complexity.


In my experience so far, containers are only really a bother if you're already doing things wrong and it's exposing practices that may be convenient in the short term but will bite you in the long term. I know in previous jobs the things that were obstacles to containerization were essentially anything that made a single server "special", but making "special" servers is a bad idea in the first place. Pets vs. cattle, etc. etc.

Specific points..

> Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).

How large are your images? Build times can be very variable, but if you have a good base image it can be very fast. I think on average my build times have been about 5-20 seconds. There's definitely an art to it, but dockerfiles are not in any way complicated and setting up a multi-stage build to cache the expensive parts (compiling C libs, etc.) is fairly straightforward.

> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

Oh my god, just don't do this. This is such an antipattern. If you think you need this you're really doing things wrong. If you need a hotfix RIGHT NOW, you should hit that fancy roll-back button (which containers and their orchestrators make easy..) and then figure out what went wrong, instead of trying to use vim to fix some line of code in production.

> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks.

Huh? TBH I don't understand why you would be expecting a closed SSH connection to shut it down -- these things are almost always meant for running a service -- but this is a really minor thing.

It sounds like you just don't want to change any of your current habits, not that the habits that containers encourage are somehow worse.


I haven't migrated off it, but I've stopped forcing it everywhere because I came to the realization that there will probably be only one copy running in production anyways.

These days I just keep an <project>/systemd/<bunch of service files> and <target os>_provision.sh

The provision file has all the commands the target OS needs to run the program. Service files are copied to their respective places under $HOME/.config/systemd/user and are ready to use after systemctl --user daemon-reload and systemctl --user enable <all services>. You can also use loginctl enable-linger <username> in order to start user services on startup.
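
Spelled out, the flow looks roughly like this (unit and project names made up):

    mkdir -p "$HOME/.config/systemd/user"
    cp myproject/systemd/*.service "$HOME/.config/systemd/user/"

    systemctl --user daemon-reload
    systemctl --user enable --now myapp.service

    loginctl enable-linger "$USER"   # start user services at boot, keep them after logout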

I install programs as services using systemd and monitor a bunch of them with monit. I also can't help but notice that they do work better on bare metal.


I agree with you. The problem is that individuals and small teams adopt enterprise best practices without realizing that smaller-scale solutions might work better.

On my own projects I even deploy with ~200 lines of Bash. It's super fast, I understand every bit of it, and I can fix things quickly.
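
The core of it is not much more than this kind of thing (host, paths and service name made up):

    #!/usr/bin/env bash
    # minimal sketch of a small bash deploy
    set -euo pipefail

    HOST="deploy@app.example.com"

    rsync -a --delete --exclude .git ./ "$HOST:/srv/myapp/"
    ssh "$HOST" "sudo systemctl restart myapp"

    # crude smoke test
    sleep 3
    curl --fail --silent https://app.example.com/health >/dev/null && echo "deploy OK"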

You should move to containers/CD/immutable infra when it makes sense to you and your project. But as someone already mentioned you can make containers fast as well.

Anyway if anybody is interested in actually understanding the basics of deployment I am writing a book on this topic called Deployment from Scratch[0]. I am also looking for beta readers.

[0] https://deploymentfromscratch.com/


Before all of my work time transitioned to deep learning projects about 7 years ago, I spent about 15 years (mostly as a remote consultant) creating and deploying systems for customers. A few random thoughts, admitting that my opinions may be out of date:

Question number one: what is the business cost of the system being offline? If the cost is moderate, I find it difficult to argue against running two copies, each on a single server or VPS, in different availability zones behind a robust (managed-as-a-service) load balancer. Scale the servers or VPSes for the anticipated workload.

Managed container services, or platforms like AppEngine or Heroku, have always made sense to me to reduce labor costs.

Containerized infrastructure makes sense when the benefits outweigh the labor costs.


I've worked in all kinds of places as a sysadmin (or now devops) since the beginning of the century.

I have seen what you'd maybe call old-school, the VMware era, Xen, colocated infra, owned infra, ... the Cloud(s)... and finally docker (and later kubernetes), over those years.

Now I can say I happily switched jobs 3 years ago, to a place that never entered the virtualization vogue, nor the containers one.

On a team of 3 and a half people (enough for each of us to cover one week of on-call per month), we manage our nearly 500 physical servers (all of them with 2x Xeon, between 32 and 256G of RAM, 10G networking (on 40G switches/uplinks) and all kinds of storage layouts with SATA, SSD and NVMe) in different colocations. Sometimes with the help of remote hands (mainly for hard disk replacements).

During those 3 years (and before) I have seen a lot of drama among my friends, caused by all kinds of issues related to container technologies.

Complexity issues, high availability issues, bugs, maintenance issues, team issues, cross-team issues, management issues, security issues, networking issues, chaos issues, nobody-knows issues, cost issues, operational issues, etc

Yes, you can have all of them with bare metal servers too, indeed, but I look at my daily work, then I talk with friends at pro-container companies, and I feel good.

Nothing stops you from using ansible on bare metal servers; indeed you need to automate everything if you want to be happy: IPMI setup, firmware updates, OS install, service management, operations, monitoring... the more of that you fully automate, the better your daily routine will be.

Also, really important: going "old-school" doesn't free you from needing a full staging environment, a well-thought-out and fault-resistant architecture, and good backups.

Regarding your random list:

> Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual).

Maybe going "quick and dirty" will release you of this, but doing things well, no mater if it's container or bare metal, won't.

I need to build packages (or ansible roles), test the change's effects, pass tests and CI - the same thing (it's not mandatory, but it's convenient).

> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

True, but we're at the same point... "old-school" is not the same as "quick and dirty". You can change containers in a quick and dirty way too if you want (and your company allows that).

> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks

Well each platform has their own issues to deal with. In bare metal you could have to deal with compatibility and degradation (or even breakage) issues for example.

I think a good balance between all this could be: develop "quick and dirty" and freely, release via CI, deploy properly. No matter the platform.

If developers don't have agile environments, and they need to commit, wait for build, wait for CI, review the results etc for every and each line of code they want to try... I get what you mean, it's a pain.


> Also, really important: going "old-school" doesn't free you from needing a full staging environment, a well-thought-out and fault-resistant architecture, and good backups.

Of course! We have a full stage environment, good backups, etc. And I'm not willing to lose them.

I've worked once in a company where a senior developer dropped a couple of tables on production, back in 2008. They made backups but never tested them. Turns out that for some reason they were zero-sized. I'm testing all my backups since, so they're good :)


Just so you know, containers are not about virtualization, clustering, orchestration etc.

It is about application packaging.

For example, in some projects, I run containers via ansible.


On the one hand, something like Docker/Kubernetes offers fine-grained resource control that nothing else can match; on the other hand, this is an advanced optimization that very few companies actually need. Last year I had several clients who nearly destroyed themselves by over-investing in infrastructure setups that were excessive to their needs, so out of frustration I wrote this essay that listed all of the ways Docker/Kubernetes was excessive for nearly all of my clients:

http://www.smashcompany.com/technology/my-final-post-regardi...


Actually, containerization helped us a lot. Our automated CI/CD does linting, testing (we only test critical stuff atm), compilation and deployment in less than 5 minutes. We also double-check changes by requiring approval to merge into develop and a second approval if develop is merged into the main branch.

If we need to deploy a hot fix, it is pushed to master directly with admin access, or if there is something truly critical (which should never happen), we just block connections on our load balancers and/or clusters.

Containers are imho one of the best things out there for deployment, as you can deploy changes to a cluster of servers without needing to reconfigure them and without the code/containers interfering with each other.


I really enjoy the responses here, because I think this is the first time I've got what all the fuss about containerized systems is about.

Stateless is a pain unless it forces you to decouple from the state (in the database, presumably?) so you can roll back (or forth) easily.

I know everyone in the industry probably thinks: haven't you read all these manuals and guides? Even as a tourist, that is what we have been telling you.

As a tourist: no, I didn't see what the guides thought they were showing me. I've seen this a lot where I read through an intro just to learn (I don't program sites professionally), and then at the end I have an "aha" moment that was never (in my opinion) explicitly stated.


>— If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

Yes you can. You need to have your containers running sshd. Despite what people tell you, even Facebook does this on Tupperware (slide 18 of https://www.slideshare.net/Docker/aravindnarayanan-facebook1... ).

Our containers are built using supervisord running the app itself and sshd.

Containers allow you to do anything. It's people that are religious.


What you mention doesn't sound like container problems. To be frank, dropping hotfixes and ssh-ing into production machines sounds like you have a rather un-systematic approach to ops and containers now show it to you.


I've heard of "NanoVMs" trying to be a thing. Does anyone know much about them or have experience with them? I've heard them as a replacement/improvement over containerizing everything.


We had an haproxy running in a container. The reason it was in a container is that we didn't trust the infra team to trust it. Once we were shot of the infra team it went onto its bare metal.


This sounds a lot more like office politics and IT drinking the kool-aid than anything technical.


Don't ssh into containers. Ideally, don't make edits to code(?) on live environments. Your other complaints, I dunno? Omit the container builds, push around your jars/bins/wheels/static assets by hand. I don't think containerisation is the root of your frustration, and I definitely don't think containers are exclusively for large orgs, but do your own thing and enjoy wrangling infra as though it were the early 2000s..


How long do your deploys take? Our containerized Rails app, running on ECS, with deployment via Codeship, takes about 10 minutes from a push to master to production.


Took us 10 minutes or so as well. The testing + container build + ECS deploy.

We shaved about 3-6 minutes off the deploy by going to Lambda... We could shave another 1-2 minutes if we dropped docker entirely, but we still build docker containers to run tests and for local development work. The main time savings were in task definition rollover and health checks by the load balancer... but with lambda there's less concern over unhealthiness, since it's only detectable by the health check route in limited circumstances.


Takes about 15-16 minutes for a full cycle from a push:

— run tests (8 mins)

— build images (3 mins)

— deploy to stage (~2+ mins)

— manual approve, which takes however long it takes to click a button, from 2 seconds to 2 weeks :)

— deploy to production (~2+ mins)


Are the dependencies being installed every time? I got most of our builds down from 2-3m to 10s by just adding a couple of lines to the Dockerfile, improving caching of dependencies. And the deploy also went down, since there are fewer layers to push.


We have a custom in-house Concourse CI setup that most of our developers hate, built by a consulting firm that no longer has much interaction with the company. One of the things it does is run an entirely new container just to run `shfmt`, a linting tool.

Our new CI happens to use containers, but:

- We aren't managing the containers ourselves, because we're an arts platform not a cloud infrastructure provider.

- We don't spawn new instances to run a (expletive) linting tool.


Concourse is a great CI tool. Sounds like the real issue was a lack of understanding of how it works. Yes, each step in a pipeline is a container, but you can put all the logic in one container if you want, not just a lint command.


Besides the argument of whether or not you should move off containers, that is your problem ;) I do have something to say about:

> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

That is actually a feature, not a bug. You bypass QA at your peril, especially when you need a fix RIGHT NOW because you are likely going to make a bad situation much worse.


The non-exhaustive list of reasons to dislike containers is also a pretty light list of reasons we’re adopting them. We’re doing so thoughtfully, and moving at a deliberate pace–with this migration, we’re necessarily changing attitudes and culture. Of course, we’re always thinking about break-glass access for something that really needs to be done right now, but... slow is smooth, smooth is fast.


Building a container shouldn't take any longer than it takes you to ssh into a production box and convince yourself that the code you just changed isn't broken.

Keep image size as small as (reasonably) possible and optimize your images so that the things that change the most (usually code and config) are the last layers, to take advantage of caching for the things that don't change often.


I've hot patched prod - placed classes into the system class loader because we couldn't unpack and repack the war. I've edited PHP or Python or Ruby files directly. I've stuck a dll from a dev machine onto a prod IIS instance and restarted it for changes to take effect.

I have felt it worth my time to:

- separate the installation from the configuration

- have uniform environments (where a test environment is the same as prod in terms of tech stack and configuration, not necessarily volume)

- have consistent automated deployment approaches

- have consistent automated configuration approaches

With such an investment of time and effort, it has helped me:

- construct a prod-like environment (tech stack + configuration) to test in with the assurance that the app will experience the same constraints in prod.

- provide a tested deployment and configuration approach that environment owners can use to change production.

- push/place my hot fix in the test environment, validate it via automated tests, and then push the same hot fix to prod.

This has helped me ensure that the time between code-commit to go-live on prod is under an hour (including gatekeeping approvals from the environment owners) in even regulated environments. (I'm working on a book on this and will share more in a few weeks)

Depending upon the organisation and the specific project's automated test suite maturity, sometimes testing in prod may be the only option. If you must use containers but wish to retain the flexibility to change the binaries, then consider mounting the binaries into the container from a file system and use the containers for containment.

However, you should strongly consider moving to a package-once push-anywhere process.

If you face a situation where an intermediate environment is broken or blocked and you must push the artefact to prod, then by all means do so manually - after all, you ought to be certifying artefacts and not environments. An automated process that doesn't permit any authorised manual override is only getting in the way. Such a manual override should provide for the right auditability and traceability, though.

Having said all this, the ideal would be uniform environments, consistent deployments and configuration, package-once deploy-anywhere, and auditability and traceability.


Some devs would get lost trying to SSH in, grep the logs, fire up vi, make some changes and restart the server the way it was done before containers. Some devs are inefficient with containers in the same way.

It's important to go with the choice that works best for you and the devs you have.

A lot of great stuff runs with and without containers.


"If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle."

If you have CI/CD, and you must have CI/CD, this should never be done. As soon as it is allowed, you will eventually have changes applied to production that are not in your VCS.


> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

This is a feature, not a bug. Really this speaks to the entire post; as these aren't "quirks". It has a side effect of blocking the "Hero Programmer".


We migrated from containers to serverless (Azure Functions in our case). This removed a large amount of complexity from our architecture and we are pleased with the result. It is always a question of tradeoffs, serverless has its own issues but overall it has worked out well.


Was there a big cost difference?


It cost a bit more but not, for us, excessively more. A big upside was that we no longer had to tinker with docker and especially kubernetes. This allowed us to focus almost entirely on delivering functionality (and reliability and security) rather than infrastructure. It's always a matter of tradeoffs.


> If I need quick hot fix RIGHT NOW

Enable rollbacks?


I wrote a Heroku-like mini-PaaS (https://github.com/piku) to escape containers for a bit, but it now acts as a stepping-stone: I iterate quickly on it, then add a Dockerfile :)


> Must build an image before deploying and it takes time, so deploys are slow (of course we use CI to do it, it's not manual

Docker images are extremely fast to build if you use a dockerignore or staging directory properly. Our multi-GB image builds in seconds.


> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

The ability to do this is usually abused by developers, it is best not to have it. Cattle, not pets.


Yep. We've gone from Containerized to managed Serverless (ie. AWS Lambdas and the like).

Not for any of the reasons that you pointed out, but primarily because running container orchestration platforms is a pain in the rear end.


These are all reasons why I've never containerized anything. Lambda is the closest I've gone and found a use for. I guess I don't have workloads that gain a great advantage from containers.


Your best bet is writing local-first software, then running it "containerized" or on basic Linux n+1 hardware. Site autonomy and portability are what POSIX is all about.


I did, somewhat.

Running between 8 and 14 Linode and Digital Ocean VMs, I wanted to research whether k8s or any docker setup would be good for me.

I have been doing linux server maintenance for over 16 years now, and have set up and grown three webhosting companies. My current VMs are for my own tooling (selfhosted mail, nextcloud, matomo), my startup (mostly servers crunching OSM databases) and some leftover client hosting (mostly Rails and Sinatra).

The issues I ran into with Docker and k8s were all small and could probably be overcome given extra time and effort. But combined, I decided that containerization does not solve enough problems for me to warrant all the new problems it introduces.

In no particular order:

* Firewalling: On a normal linux machine or in a cluster a long-solved problem (resp: iptables/ufw or dedicated firewall servers/hardware). With containerized: no idea. Seems a largely unsolved issue, probably because "containerization should not do any firewalling". And partly because of how network mapping works, the problem is mitigated (but not solved!) there.

* Monitoring: at least three popular docker images that I used forgot to add log rotation, crashing after long-term running. That in itself is bad (and shows a lack of detailed finish in many images), but it shows you need to monitor. I used munin and am now migrating towards prometheus/grafana, but I really have no idea how to properly monitor a flock of containers. Another problem that has been solved for ages, but requires time & effort in a containerized env.

* Timed jobs (cronjobs): there is tooling to spin up and run containers on timed schedules, but none as easy and stable as having Ansible write a file to /etc/cron.d/osm_poi_extract (or use systemd, fine with me) - see the sketch after this list.

* fail2ban, tripwire, etc: small tooling that is hard to do or requires extra attention in a containerized setup.

* unity: if you rely on 3rd-party containers you'll quickly have a rainbow spectrum of machines: Ubuntu LTS, Ubuntu edge, Ubuntu ancient, Alpine, CentOS, a rare FreeBSD. The interface is consistent, the underlying tech is not: troubleshooting is a terror if you first have to spend 20 minutes googling "how do I know what version of Alpine I have and how do I get curl on this damned thing to see if elasticsearch is maybe giving results on localhost".
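
For the timed-jobs point above, the boring-but-stable version is just a file like this (user and paths made up; an Ansible copy/cron task would produce the same thing):

    cat <<'EOF' | sudo tee /etc/cron.d/osm_poi_extract
    # m  h  dom mon dow  user  command
    30   2  *   *   *    osm   /usr/local/bin/osm_poi_extract >> /var/log/osm_poi_extract.log 2>&1
    EOF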

I realize that my 16+ years of prior linux-sysadmin experience hinder me here: I'm probably trying to make a containerized setup do the same that I'm used to, but which is not really needed (tripwire, fail2ban, firewalls?).

But for me, containers - in production - solve no problem. Yet they introduce a whole range of problems that in a more classic setup have been solved for, sometimes literally, decades. "Infra and state in documented revision control" is solved mostly with ansible, saltstack, chef or puppet; this can and should certainly evolve and improve. Networking is solved with, well, networking. Hell, one of my previous hosting setups had a cron job that would fetch "the latest /etc/hosts" hourly: that was our entire DNS setup. It worked. Reliably. As for /etc/hosts and docker, don't get me started (the reply would probably be: but you don't need /etc/hosts in k8s).


Containers aren't really designed to run without some sort of management infrastructure to handle the problems you mention -- network policy, monitoring, log rotation, scheduling, etc.

There are many container runtimes. Kubernetes is popular and solves some of these problems.

> Firewalling

The base unit in Kubernetes is the Pod, which kind of acts as a mini-VM (without virtualization) that runs 1 or more containers. It's a mini-VM, so has its own network; 127.0.0.1 is the Pod, not the Node that it happens to run on. You can set many networking-related policies at the Pod level; PodSecurityPolicy for very coarse-grained security policies (access to the host network, access to become root, etc.), or NetworkPolicies for what people would traditionally consider firewall rules (only allow access to the MySQL pod from a pod in the MyWebApp service, etc).
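
For example, the MySQL case above looks roughly like this as a NetworkPolicy (all names and labels made up):

    kubectl apply -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: mysql-only-from-webapp
    spec:
      podSelector:
        matchLabels:
          app: mysql
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: mywebapp
        ports:
        - protocol: TCP
          port: 3306
    EOF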

> Monitoring

Kubernetes rotates your logs for you. Retaining logs and making them useful is still an unsolved problem; containerized or not. (I wrote a very complicated log analysis system when I was at Google. I miss it.) Your problems with monitoring (how to get Prometheus to discover containers to scrape) boil down to not having service discovery. Kubernetes provides service discovery, and Prometheus knows how to ask it for a set of endpoints to collect metrics from. (Service discovery is nice in general, and is often something sets of hand-rolled VPCs are missing.)

> Timed Jobs

Kubernetes has the concept of CronJobs. Overall, I don't like the approach of cron jobs over having a program that is always running and wakes up when it's time to do work. But both options are available to you.
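
For example (image name and schedule made up):

    kubectl create cronjob osm-extract \
      --image=registry.example.com/osm-tools:latest \
      --schedule="30 2 * * *" \
      -- /usr/local/bin/osm_poi_extract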

> fail2ban, tripwire, etc.

Typically people have a broader-scoped access control and rate limiting policy. All external traffic hits a proxy between the outside network and the network that Pods are on. The proxy sends information about the request to an authorization service and receives an access decision, and accepts or rejects the request accordingly. If you had ssh sitting behind such a proxy, you would instruct it to send success/failure information to the authorization service so that it can make an accurate decision for the next connection attempt.

There is some desire/work to make ACLs easier and do it at the NetworkPolicy/CNI layer. What you can do largely depends on what CNI plugin your cluster uses, and if you're using a managed k8s provider, you probably don't have much choice in the matter. Hence people replicating the "old world" with authenticating proxies.

In general, people are not running SSH servers, especially ones that can accept passwords from the outside world, on Kubernetes. As a result, you are unlikely to find a prebuilt piece of software that does that -- nobody needs it. Management is done out-of-band through the API server.

> unity

You probably shouldn't rely on production containers to have debugging tools. The ideal container is "scratch" with your application binary in it, or distroless if that is not possible. It's not the container's job to have debugging tools -- it's the management layer's job to attach your favorite debugging tools to the pod while you're debugging. Support for this is pretty limited, though; k8s 1.16 got ephemeral containers, but they're still "alpha".

> containers -in production- solve no problem

They solve a variety of problems. You are guaranteed to get the code that you want into production, without any other things that you don't want. You get service discovery. You get declarative management. If you just have one thing you want to run, it's a lot of overhead.

Traditionally management has been the job of the application itself. If it wants rate limiting, it adds rate limiting. If it wants log rotation, it adds log rotation. If it wants monitoring, it adds monitoring. If it wants authentication, it adds authentication. This is a waste of time for every application to have to implement these core features -- if the application knows that it's going to run inside infrastructure that already has that, it only has to focus on its core functionality. That means smaller, more reliable apps with less surface area for bugs or security problems. But it also means that to run it, you have to build the infrastructure. Right now, things are transitional -- apps exist on a spectrum of what services they expect to get from the operator and which services they provide themselves. It's a mess. But it will get better.


Thanks for the eloquent reply!

During my PoC I did see that a lot of things that I traditionally do on the machine are now done external. But for me that makes it harder.

A practical example was: How do I get all my "fail2ban"-rules ported to "whatever this cluster-management uses"? Simple things like "if someone requests admin.php on this Rust-only API: ban it for 2 hours". I did see this is possible, but the amount of tools and their complexity was off-putting to me.

I really appreciate how you suggest good alternatives and solutions to problems I stated: I hope they'll help me or someone else moving to k8s in future.

And about:

> Traditionally management has been the job of the application itself.

Yes. Been there; hell, I even wrote rate-limiters for Drupal, and later for WordPress, for the large customers of the Drupal (and later WordPress) webhoster that I ran.

But I also understand how microservices is the solution to this.

However: microservices do not require containerization, IMO. I'm running between 8 and 14 VMs on Linode and DigitalOcean exactly because of this: because stuff is "microserviced". I have a flock of Elasticsearch servers. A server that ingests, processes and then stores the OSM database weekly into that Elasticsearch. Several postgres database servers. A statistics server. I had an "avatar server", there's a dedicated url-shortener somewhere, a mailserver. I'm moving an authentication server onto my flock next week. I don't need k8s for that. Microservices are a good solution to specific problems, but the underlying tech can easily be $6.99 VPSs managed with ansible.


Maybe if you need right now hot fixes then you also need to start asking "Five whys" style questions.


> I'm constantly aggravated by various quirks of containers, and don't really remember any big problems with non-containerized infra.

Beautiful summary of why trends are cyclical. Problems with current trend lead to invention of new trend. Everyone jumps to new trend. Eventually people find problems with new trend but they've now forgotten the problems with old trend so begin to move back.


> has anyone migrated off containerized infrastructure?

No, nor would I ever.

> Are you satisfied?

Yes, I am. Not claiming that it's perfect - everything has its benefits and downsides, and it's always a question of using the right tool for the job. But in my case, the way I work, the benefits of containers very clearly outweigh the downsides by a large margin. So much so, that I'd still be using containers for local development even if the production environment was non-containerized. Switching between projects with different setups has never been so easy.

> Must build an image before deploying and it takes time, so deploys are slow

Deployments sure can be slow. A full deployment to AWS can take 10 to 15 minutes for me. But it comes with zero downtime - which is top priority for my systems. The load balancer only spins down the old instances, after the new ones are up and running (yes, I have multiple containers, and multiple instances of each container running at the same time). I can build the new containers, fully test them locally, and only deploy them AFTER I know everything is fine. And I can 100% rely on the fact that the deployed containers are exactly identical to what I tested locally.

> If I need quick hot fix RIGHT NOW, I can't just log in, change couple of lines and restart, must go through full deploy cycle.

I completely stopped doing that over 15 years ago. Back then I started modifying files locally, pushing them to versioning and then pulling the new version on production. That helped to avoid so much pain, that I'm never going back ever. No way.

While it would be possible to setup a container to pull the newest source on every startup (thus allowing hot fixes through versioning as described above) - I actually prefer to build containers, test them locally, and only deploy them, once I know everything works. This way I rarely ever need fast hot fixes in the first place.

It's just my way of doing things - and for me containers are the right tool for the job. That does of course not mean, that they are the right choice for everyone. But I'm currently doing a lot of things that I think are pretty great, which would be outright impossible without containers.

> Must remember that launched containers do not close when ssh breaks connection and they can easily linger for a couple of weeks.

What? How? I don't even...


Containers are the leaky abstractions of the devops world. Same pros and cons apply


You just presented a list of the benefits of containerized infrastructure.


I'm running a couple of small servers with containers and I was quite early on the Docker train, and for me, it definitely solves some real problems even at small scale. Of course, as anything, it also brings new problems. Here's why I went for it:

- Before, when I just SSH'd into servers and rsynced files, I had many situations where I forgot what change I made on a server, how it was set up, and so on. I found Docker when I was looking for tools to put those various commands and bash scripts into one place for central storage and easy re-creation of the environment. Dockerfiles and containers make everything reproducible and reduced to the smallest number of steps needed to get something correctly set up.

- I would find that something worked locally but not on the remote due to different versions of some dependency. Docker images ensured I could test in almost identical environments. It's also easy to try new apps without worrying about polluting the current environment, so I'm now faster in trying out solutions and rolling back/forward dependencies.

- I would test things on the server, because I was not able to run the exact setup on my local computer. This takes time and risks breaking real stuff. Docker images fix this.

- I would struggle with knowing what services ran or not. Part of this came from me not knowing all the ins and outs of Linux, so I felt it was hard to get an overview of what's running. docker ps makes it easy to see what's running.

- Updating a traditional app often required me to change more things than just update a source tree. It could be starting/stopping services, adding files in other places. So updates tended to become manual and error-prone (I didn't do them often enough to remember by heart what's needed). Docker and primarily docker-compose encapsulates all the stuff into simple commands.

- Before, my apps would use mixed sources of configuration - environment, config files in different places, command line arguments. More importantly, they would often be stateful, saving things to files that needed to be managed. With Docker, i was somewhat forced to align all config in one place and make everything else stateless and that makes things much cleaner.

- As a hobbyist, I rarely had the time before to go over the security of my servers. I find that Docker provides a better secure default in terms of making it clear what services are open to the world and by reducing the attack surface.

Of course, containers have brought some problems too:

- Lots of apps were not ready to be containerised, or required a lot of hacks to do so. So I've done a lot of debugging and sleuthing to find the right way to run and configure various apps in Docker. These days, the official Docker images are much better and exist for almost all apps, but there is still a lot of "messy magic" built into their Dockerfiles.

- More often than not you want to get into the container to run something, debug, ping, check a file, etc. This gets more convoluted than before, and you need to learn some new tricks to e.g. pipe in and out data. It's made harder by the fact that many images are so minimalistic you don't have access to the full range of tools inside the containers.

- Logging in Linux was IMO already a mess before, and with Docker it's still not great: just mashing up stdout for all containers, with unclear rotation procedures. There are many ways to deal with it, but they often require a lot more tooling, and it still gives me some headache.

- Yes, waiting for build and transfer of images adds a bit of time to deploy. And it's somewhat annoying to deal with two version control systems, e.g. git and Docker Hub. I haven't gone all in on CI yet but that would automate more and just let me use git.


Containers are pretty good. My current company uses C++ and CMake and they eschewed containers for nix.

I can assure you with nix and cmake it's 1000x more complicated than it needs to be.



