
Over the years, I kept tweaking my setup and have now settled on running everything as a Docker container. The orchestrator is docker-compose instead of systemd, and the proxy is Caddy instead of nginx. But same as the author, I also write a deploy script for each project I need to run. Overall I think it's quite similar.
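
For reference, the deploy script itself is nothing fancy. A minimal sketch (the app name and paths are made up):

    #!/usr/bin/env bash
    # Minimal per-project deploy sketch; "myapp" and the paths are hypothetical.
    set -euo pipefail

    cd /srv/myapp
    git pull --ff-only     # or: docker compose pull, if images come from a registry
    docker compose build
    docker compose up -d   # recreates changed containers; the brief downtime happens here
    docker image prune -f  # clean up dangling layers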

One of the many benefits of using Docker is that I can use the same setup to run third-party software. I've been using this setup for a few years now and it's awesome. It's robust, like the author mentioned, but if you need flexibility, you can still do whatever you want.

The only pain point I have right now is rolling deployments. As my software scales, a few seconds of downtime on every deployment is becoming an issue. I don't have a simple solution yet, but perhaps Docker Swarm is the way to go.




I do the same as you, using Caddy.

To avoid downtime, try using:

    health_uri /health
    lb_try_duration 30s
Full example:

    api.xxx.se {
      encode gzip
      reverse_proxy api:8089 {
        health_uri /health
        lb_try_duration 30s
      }
    }
This way, Caddy will hold the request and give your new service up to 30 seconds to come online when you're deploying a new version.

Ideally, during a deployment the new version should come up and become healthy before Caddy starts routing to it (and the old container is killed). I've looked at https://github.com/Wowu/docker-rollout and https://github.com/lucaslorentz/caddy-docker-proxy but haven't had time to prioritize it yet.


That's neat, I wonder if there's a way to do that with nginx?

edit: closest I found is this manual way, using Lua: https://serverfault.com/questions/259665/nginx-proxy-retry-w...


If I understand you correctly, you do a sort of blue-green deploy? Load balancing between two versions while deploying, but only one the rest of the time?

How do you orchestrate the spinning up and down? Just a script to start service B, wait until service B is healthy, wait 10 seconds, stop service A, and Caddy just smooths out the deployment?
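
Roughly like this, I imagine (the service names, port, and health endpoint are all made up):

    # Rough blue-green sketch; "app_a"/"app_b" and the health URL are hypothetical.
    docker compose up -d --no-deps app_b                            # start the new version
    until curl -fsS http://localhost:8090/health; do sleep 1; done  # wait until healthy
    sleep 10                                                        # let in-flight requests drain
    docker compose stop app_a                                       # retire the old version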


Thanks for that. I didn't know this was a thing in Caddy. It seems low-effort, so I'll probably do that for now. I omitted it, but I'm actually using caddy-docker-proxy. It's awesome; it lets each project carry its own piece of the Caddy config. I hadn't seen docker-rollout though. It looks promising.


If you've got a load balancer (like Caddy) in front of your pods you can configure it to hold requests while the new pod comes up: https://twitter.com/bradleyjkemp/status/1486756361845329927

It's not perfect but it means rather than getting connection errors, browsers will just spin for a couple seconds.

The same technique is used by https://mrsk.dev/


If you have more than one backend, you can also reconfigure Caddy on the fly to serve only from the active ones while each is updated in turn.
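
For example, something along these lines (a sketch; the backend and file names are made up, and it relies on caddy reload applying a new config gracefully):

    # Take one backend out of rotation, update it, put it back.
    # "app1", the port, and the file names are hypothetical.
    sed '/app1:8080/d' Caddyfile.full > Caddyfile  # config without app1
    caddy reload --config Caddyfile                # graceful, no dropped connections
    docker compose up -d --no-deps --build app1    # update app1 while it takes no traffic
    cp Caddyfile.full Caddyfile
    caddy reload --config Caddyfile                # back in rotation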


I've built up the software stack of the startup I work for from the beginning, and went straight for Docker to package our application. We started with Compose in production, and improved on that with a CD pipeline that upgrades the stack automatically. Over time, the company and user base grew, and we started running into the problems you mention: restarting or deploying would cause downtime. Additionally, a desire to run additional apps came up, and every time this meant preparing a new deployment environment. I dreaded the day we'd need to start using Kubernetes, as I've seen the complexity it causes first-hand, and was really wary of having to spend most of the day caressing the cluster.

So instead, we went for Swarm mode. Oh, what a journey that is. Sometimes Jekyll, sometimes Hyde. There are some bugs that simply nobody cares to fix, some parts of the Docker spec that simply don't get implemented (but nobody tells you), implementation choices so dumb you'll rip your hair out in anger, and the nagging feeling that Docker Inc employees seem incapable of talking to each other, thinking things through, or staying focused on a single bloody task for once.

But! There is also much beauty to it. Your Compose stacks simply work, while giving you opportunities to grow in the right places. Zero-downtime deployments, upgrades, load balancing, and rollbacks work really well if you care to configure them properly. Raft keeps the cluster as reliably available as it does everywhere else. And if you put in some work, you'll get a flexible, secure, and automatically distributed self-service platform for every workload you want to run - for a fraction of the maintenance budget of K8s.
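
For example, most of the zero-downtime behaviour hangs off the update and rollback settings. A sketch (the service name and image are made up):

    # Rolling update with automatic rollback; "my_app" and the image are hypothetical.
    # start-first brings the new task up (and healthy) before the old one is stopped.
    docker service update \
      --update-order start-first \
      --update-parallelism 1 \
      --update-delay 10s \
      --update-failure-action rollback \
      --image registry.example.com/app:v2 \
      my_app

The same knobs live under update_config in the deploy section of the stack file, so they can ship with each project's compose file.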

Prepare, however, for getting your deployment scripts right. I've spent quite a while building something in Python to convert valid Docker-spec compose files to valid Swarm specs, update and clean up secrets, and expand environment variables.

Also, depending on your VPS provider, make sure you configure network MTU correctly (this has shortened my life considerably, I’m sure of it).


That's encouraging, thanks. Are you able to share your Python converter script, by any chance?


I've extracted a gist: https://gist.github.com/Radiergummi/fe14c4ed93c68f2928a6a275...

Let me know if that helps, or if you need more guidance. Maybe I could open source the whole thing properly, if that is useful to someone :)


What's the correct configuration of MTU?


There is no one-size-fits-all answer to that. The standard is 1500, but the usable MTU drops with each level of encapsulation, since you need those bytes for the encapsulation overhead.

There's also "Jumbo Frames" though you're not likely to encounter that day to day in a VPS.


I do the same. Swarm is the natural next step since you already have compose files, but I've decided it's not worth it until you hit scaling issues (as in many customers/users).


I built a similar setup, but I don't like pushing images with docker save and docker load over ssh. Do you run your own registry?
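
That is, the usual registry-less pattern (image and host names are made up):

    # Ship an image over ssh without a registry; names are hypothetical.
    docker save myapp:latest | gzip | ssh user@server 'gunzip | docker load'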


Nowadays I use GitHub's Packages registry. I used to run my own registry along with the docker save method, but both are annoying to deal with. I have GitHub Pro, so it's pretty much free. Even if I need to pay for it in the future, I probably will; it's just not worth the headache.
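
The workflow is the standard registry dance. A sketch, assuming ghcr.io (GitHub's container registry) and made-up names:

    # One-time login with a personal access token that has the packages scopes:
    echo "$GITHUB_TOKEN" | docker login ghcr.io -u myuser --password-stdin
    # Build and push from the dev machine or CI:
    docker build -t ghcr.io/myuser/myapp:latest .
    docker push ghcr.io/myuser/myapp:latest
    # The server-side deploy script just pulls:
    docker pull ghcr.io/myuser/myapp:latest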


How often do you rebuild your containers?


Whenever I have anything to deploy, so it depends on the project. On actively developed ones, it could be once or twice a day; on slower ones, maybe once every two or three days.





