Ask HN: Docker, Kubernetes, Openshift, etc – how do you deploy your products?
135 points by BloodKnight9923 on Feb 1, 2017 | 114 comments
I use Docker extensively with Python-backed Ansible scripts to manage my product deployments (with a Jenkins CI/CD pipeline). That has been a lot of fun, but I have also played with both Kubernetes and OpenShift.

I love what OpenShift Origin can do, but the learning curve is like a brick wall (see the Dwarf Fortress definition of Fun for an example) and the costs are far from minimal.

Kubernetes is easier to learn, but comes with its own gotchas.

What do you do to maintain stable deployments that allow for easy CI/CD? How do you minimize costs with your solution?

I recently (past 6 months) joined a new startup as the operations person, and we standardized on Kubernetes for deployment. In the past I've worked with Puppet/Chef/Ansible/Heroku/AWS/App Engine/VMware, you name it, and Kubernetes is the nicest and most flexible platform to build on top of.

There's a learning curve, and new features are being added, but at this point I would not hesitate to recommend Kubernetes to just about anyone.

CI: We standardized on CircleCI and it gets the job done, but it has some serious shortcomings. I've come close to building my own on top of the k8s cluster, but it's not the right time investment for me right now; I'd consider building my own in the future. I've yet to find a CI framework I really like.

Have you looked at Buildkite (http://buildkite.com)? It's a very different approach to CI: you use your own machines. That means you have access to your own S3 buckets, you can talk to your other services, and you get to use all the CPU and RAM your machine has.

We switched over from Circle and it's been awesome to scale our build process without fear of resource constraints. It's pretty great :). They even have a CF setup that will auto scale boxes based on CI load.

We also use Buildkite. Absolutely love it. Support and troubleshooting is very easy to come by as well :)

Well that's exciting. I've been pushing towards Kubernetes with the company I'm at right now, got half the battle done by dockerizing our services and using Harbor to manage the containers.

Do you have any advice, or gotchas on getting into kubernetes development?

Have you had a proper look at GitLab CI? I know, it's hard to make that sound credible with all that's going on with GitLab right now, but I found their CI to be the best around, and you can even use your own machines to run stuff on.

To be fair, GitLab's current issues are purely operational on the GitLab.com side: their backup strategy and testing need some work, but the product is really solid and most of the features are already there in GitLab CE.

For me, getting GitLab up and running on kubernetes was a breeze (using the popular docker image [1]) and my `pg_dump; duplicity;` backups are chugging right along. I haven't played with their CI yet, but I'm pretty excited to see how much it can do for me automatically managing my cluster.

[1] https://github.com/sameersbn/docker-gitlab
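For anyone curious what a `pg_dump; duplicity;` backup looks like inside Kubernetes, here's a minimal sketch as a CronJob. This is an assumption on my part, not the commenter's actual setup: the image, secret, hostnames, and bucket are all placeholders, and the right apiVersion depends on your cluster version.

```yaml
# Hypothetical nightly backup: dump Postgres, ship it offsite with duplicity.
# All names (image, host, bucket, secret) are placeholders.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: gitlab-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: example/pg-duplicity:latest   # placeholder image with pg_dump + duplicity
            command: ["/bin/sh", "-c"]
            args:
            - >
              pg_dump -h gitlab-postgres -U gitlab gitlabhq_production > /backup/gitlab.sql &&
              duplicity /backup s3://s3.amazonaws.com/example-bucket/gitlab
            envFrom:
            - secretRef:
                name: backup-credentials   # placeholder: AWS keys, PGPASSWORD, etc.
```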

Yes, out of all solutions I'm most excited about GitlabCI and Drone.io

I've worked with a self hosted Gitlab instance before and that was nice, but I'm not quite sure how well the Gitlab CI would integrate when the code is on github.

If you are doing a lot with docker images, it might be worth considering Concourse CI (https://concourse.ci). I haven't played with it in any depth yet so I'm not sure how it compares with other solutions, but I like the idea of your entire build pipeline running through docker images. Theoretically you should even be able to run it in kubernetes.

There's a github repo where someone mounted Concourse on kubernetes[0], but it looks a bit stale. Given the evolutionary pace of k8s, YMMV.

[0] https://github.com/vyshane/concourse-kubernetes
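To give a flavour of the "entire build pipeline running through docker images" idea: a minimal Concourse pipeline is just YAML in which every task declares the container image it runs in. A sketch (the repo URL, image, and commands are placeholders, not from the thread):

```yaml
# Hypothetical minimal Concourse pipeline: one git resource, one test job.
resources:
- name: app-source
  type: git
  source:
    uri: https://github.com/example/app.git   # placeholder repo
    branch: master

jobs:
- name: test
  plan:
  - get: app-source
    trigger: true          # run on every new commit
  - task: run-tests
    config:
      platform: linux
      image_resource:      # the task runs inside this Docker image
        type: docker-image
        source: {repository: python, tag: "3"}
      inputs:
      - name: app-source
      run:
        path: sh
        args: ["-c", "cd app-source && make test"]
```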

Unfortunately, Concourse, when not deployed with BOSH, does not have any automation (spawning workers, auto-discovery).

Any thoughts on Concourse?

I like Concourse from a manageability perspective, as compared to Jenkins, in that it is more of a "bring your own plugin" in a "just in time" fashion via resource-types-as-docker-images as opposed to requiring a priori provisioning/support of an installation-wide plugin/suite.

It is also very nice in that you can execute pipelines, and even just jobs, on the Concourse installation from your local CLI. This makes it much simpler to test pipeline/job tweaks without a single commit being necessary until it is verified to work. Yes, you can do something similar with the Jenkins Pipeline plugin suite using the Replay functionality but it is significantly more tedious (cut and paste from editor into a textbox in the Jenkins UI, bleh).

I like it: strong, simple building blocks. It reminds me of Ansible, in that the design seems well factored.

Take a look at http://runnable.com

What shortcomings did you find in Circle?

Openshift is essentially Kubernetes + Redhat Extensions + Redhat Support

I use GitLab CI and helm [1] for deploying. The last step of the CI process checks out the helm chart, which is just another git repo, and executes a helm install/upgrade CHART-NAME. Making things accessible is done through Kubernetes ingress with nginx [2] (which includes getting Let's Encrypt certificates automatically for all external endpoints), so when I want to deploy a new staging version of the app I can do helm install app --set host=my-stage.domain.com.

There are still a few gotchas, like pods not updating when a ConfigMap changes, which matters because I keep the container configuration as ConfigMaps. A crude workaround for this is [3], which triggers a configuration reload of the application running inside the container.

This solution has no licensing cost, unlike OpenShift (Tectonic [4] is another enterprise Kubernetes distribution, free for up to 10 nodes), and the costs are based on the amount of time to set this up. But once you've gotten into Helm and more complex Kubernetes deployments, it should be easy.

[1] https://github.com/kubernetes/helm

[2] https://github.com/jetstack/kube-lego

[3] https://github.com/jimmidyson/configmap-reload

[4] https://coreos.com/tectonic/
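Another common workaround for the ConfigMap gotcha, if you're already templating with Helm, is to hash the rendered config into a pod annotation so that any config change forces a rolling update. A sketch only, assuming the chart keeps its config in templates/configmap.yaml; the resource names and image are placeholders:

```yaml
# Sketch of the checksum-annotation trick in a Helm deployment template.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
spec:
  template:
    metadata:
      annotations:
        # Changes whenever the rendered ConfigMap changes, so pods roll on upgrade.
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      containers:
      - name: app
        image: example/app:latest   # placeholder
```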

OpenShift Origin is the open source project. There's no licensing costs unless you want Red Hat support.

Hi, nice, this is a lot like our initial approach. Glad to see that! To make this flow a bit easier, we've created a tool [1] (apologies for the plug) that sits on top of Helm. What it does is take a diff between a bunch of Chart references with values (the desired state) and what's currently in Kubernetes (the actual state), and perform the necessary creates/updates/deletes. So, for instance, you don't have to add an explicit helm delete if you remove a component; instead, just remove it from the desired state. In our day-to-day work we add and update tons of components through this system, only updating .yaml files, checking them into Git, and having the CI/CD do all the work. Our tool also has a dry-run mode, which acts like a test stage for pull requests.

[1]: https://github.com/Eneco/landscaper

Hi rollulus,

thanks for the info. As far as I understand it you configure the state of the helm release + configured values in a file and apply it to a kube cluster. Depending on your cluster setup this is really helpful. Do you have a solution for triggering a rolling update of pods if a configmap has changed? In your examples I didn't see any configmaps.

Good point, eicnix. No, we don't have a solution for that, but we haven't had the need so far either. Sorry.

I used to use Marathon on Mesos for deploying Docker containers, and orchestrated it via a hacked together Jenkins cluster, which worked well but took a lot of configuration and was somewhat brittle.

I moved to Kubernetes about 6 months ago and have been really enjoying it. My first production cluster was hand rolled on AWS, where I found the cloud-provider load balancer integrations extremely helpful (https://kubernetes.io/docs/user-guide/load-balancer/).

I'm now using Google Container Engine which is effectively just a hosted Kubernetes cluster on GCP, which has really been 0 effort setup, and have been deploying to it with Wercker (http://www.wercker.com) [Disclaimer: I currently work at Wercker as of the last few months, but was a fan/user for many years before joining]

One thing I noticed across Openshift, Mesos, and Kubernetes: none of them handle the Docker daemon on a node hanging particularly well, which in my experience happens fairly often.

I use Convox (http://www.convox.com). It is backed by ECS, which gets me out of the infrastructure game for the most part, and the CLI interactions in Convox are similar to Heroku-style commands, so the learning curve is much simpler than deploying and learning my own Kubernetes or OpenStack or ECS configurations. They've also thought of the other things you need, like environment-based secrets (uses DynamoDB and KMS behind the scenes), as well as external load balancing, TLS, RDS integrations, and more, with single simple commands.

They also have CI/CD out of the box and builds can be triggered in your existing cluster with a 'convox build' or triggered on pushes to your private github repos.

Overall, unless you have a team that actually sees benefit in managing your own container and cluster manager (you'd better be big), I'd recommend embracing Convox or something like it. The complexity still exposed by Kubernetes, OpenStack, or ECS is significant.

Seconding Convox. It's also worth noting that the tooling is open-source, and they profit off an optional-to-use closed-source web interface which has free and paid tiers.

I'm one of two developers at a very young startup, and the one responsible for backend + devops stuff. I simply don't have the time to learn a more complex tool like Kubernetes (not that I didn't consider it) while also working on the actual product. Its simplicity has been a bit limiting on occasion, but they're happy to accept PRs for well thought out changes. I recently had a PR merged regarding UDP ports and ELBs that should make microservice architectures much easier and cheaper to implement.

If you don't want to manage Kubernetes, you can use Google Container Engine (GKE), which is Kubernetes-as-a-service.

A nice bonus is that they only charge you for the minion nodes. The Kube master is free.

The master is free for small clusters (0-5 nodes). After that, you pay for it.


(I work for Google Cloud)

As I understand it, Convox is closely adapted to AWS. Which, like any closely adapted software, can be great or terrible, according to circumstances.

Disclosure: I work for Pivotal on Cloud Foundry, a competing system.

We use Convox at Balsamiq as well.

I deploy on bare metal. Docker, Kubernetes, et al. add layers of complexity that I don't need. I'm not saying that they don't have benefits at a certain scale, but for the types of single-server deployments I do, I have not been convinced.

Please join #sig-onprem on kubernetes.slack.com. We are discussing a lot of stuff to make this easier.

The biggest challenge right now is the ingress/loadbalancer abstraction. Hopefully, that should get resolved over the next few months.

Been scanning the thread to find this stance, and there I go! Good to know I'm not entirely alone there =)

Yep, a few shell scripts are enough. Ansible is nice to parallelize multiple boxes, but that is about it.

this. just a simple rsync and docker-compose up can be all you need in most cases.
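In that spirit, the whole "deployment system" can literally be one Compose file synced to the box and a docker-compose up -d. A sketch, with made-up image names and ports (era-appropriate Compose v2 format):

```yaml
# docker-compose.yml for a single-server deployment; names are placeholders.
version: "2"
services:
  web:
    image: example/web:latest
    ports:
      - "80:8000"
    env_file: .env        # secrets stay on the server, outside the repo
    restart: always
  db:
    image: postgres:9.6
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always
volumes:
  pgdata:
```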

Our team is <15 engineers. The setup is roughly as below. We have around 40 services. Ping me if you wanna talk more.


* Development: docker + docker-compose. Ideally, we would want to get rid of docker-compose for development.

* CI: Travis (planning on switching to something that is more on the CD side)

* Infrastructure management: terraform

* Prod: AWS, CoreOS, Kubernetes with 1 master node and 5-6 worker nodes (m4.large) in an autoscaling group.

Infrastructure deployments and updates are done by Terraform. Blue/Green deployments thanks to the autoscaling group.

Kubernetes deployments and updates are done by kubectl.

There's still problems with each piece, but for the most part they work great without much trouble.

If you want to get rid of docker-compose, you could try using minikube (https://github.com/kubernetes/minikube), which is a local Kubernetes deployment, similar to the desktop version of Docker. It works well and supports all major platforms.

Docker 1.13 swarm mode has full compatibility with the Compose file format for launching services on a cluster. You should try that. https://www.infoq.com/news/2017/01/docker-1.13
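Concretely, what 1.13 added is that the same Compose file format gains a deploy section and can be fed straight to docker stack deploy -c docker-compose.yml mystack. A sketch with placeholder names:

```yaml
# Compose v3 file usable with `docker stack deploy` on a Docker 1.13 swarm.
version: "3"
services:
  web:
    image: example/web:latest   # placeholder image
    deploy:                     # swarm-only section, ignored by plain docker-compose
      replicas: 3
      update_config:
        parallelism: 1          # rolling update, one task at a time
        delay: 10s
    ports:
      - "80:8000"
```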

In my experience, Docker itself is not very stable, and swarm is nowhere close to what Kubernetes offers.

We mainly use drone and have built a templating tool that wraps around kubernetes deployments to give us feedback on whether they were successful or not.

Example kube-deploy files: https://github.com/UKHomeOffice/kube-piwik

Example app / drone files: https://github.com/UKHomeOffice/docker-piwik

Platform Documentation: https://github.com/UKHomeOffice/hosting-platform

KD - our deployment tool https://github.com/UKHomeOffice/kd

I can't really comment on whether or not this specific pipeline actually works, as I've just picked a random open source example, but the workflow is there.

We also have a legacy tool and use jenkins sometimes, but mostly that won't be open sourced.

Legacy deployment tool - don't use this. https://github.com/UKHomeOffice/kb8or

At Schezzle (https://schezzle.com) we use docker swarm on AWS.

The build jobs create images that are published to ECS repositories, and there are auto scaling groups that add and remove engine hosts to and from ALB target groups for each deployed service. It makes service discovery, scaling, etc. really easy.

Definitely try swarm out if you haven't already. 1.12 was good, 1.13 is amazing (secrets, health-based VIP membership, etc).
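To show the two 1.13 features named above, here is what secrets and a healthcheck look like in a Compose file. A sketch only (service names, image, and endpoint are placeholders; in-file secrets need Compose file format 3.1 or later, while docker secret create works from 1.13):

```yaml
# Sketch of swarm secrets plus a healthcheck that gates VIP membership:
# unhealthy tasks are removed from the service's virtual IP.
version: "3.1"
services:
  api:
    image: example/api:latest        # placeholder
    secrets:
      - db_password                  # mounted at /run/secrets/db_password
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
secrets:
  db_password:
    external: true                   # created beforehand with `docker secret create`
```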

could you talk about docker swarm ? any pitfalls that you are seeing, etc?

we have been considering playing with swarm. How many images and instances do you have, etc

And especially, how have you leveraged 1.13

I'm currently on an all-Docker pipeline, but I resent it. It's slow, tedious, and everybody's trying to use Docker against its design (everybody tries to make images with as few layers as possible; I think Docker should just do away with the layers altogether). It also makes it harder than it should be to make an image that works for both local development and deployment at the same time. Also, docker-compose is riddled with fairly old but important bugs (for instance, .dockerignore files are ignored by docker-compose's build).

I'd much prefer doing simple bare-metal deployments again.

> I think docker should just do away with the layers altogether

I'd take the opposite stance, really. As far as the image format, it's the major differentiation Docker has, and IMO a really clean way of keeping image pulls DRY. Once your hosts have pulled a single image, given that you don't actively undermine it, subsequent pulls, even for different images, only need to retrieve the absolute minimum since they already have hopefully pulled the majority of the file system.

If you run on AWS, we at Boxfuse (https://boxfuse.com) offer a solution that could work for you: tiny machine images (starting at 5 MB for Golang apps) generated in seconds, no layers, immutable infrastructure, blue/green deployments and auto-scaling directly at the VM level both on EC2 as well as on VirtualBox (for rapid local testing).

Disclaimer: I am the founder and CEO of Boxfuse


Surprised Docker Cloud wasn't already in the long list of suggestions. I have been using the tool since it was called Tutum; the people behind Docker bought Tutum and renamed it to Docker Cloud. It's currently set up to redeploy services when I push an image to my repositories. Really loving the simplicity, even though it's got some quirks.

You can now link your Bitbucket or GitHub repositories, let it build your containers, and deploy them to production. This way you can build an easy CI/CD pipeline.

+1 on everything.

With some personal flavor:

- I use auto-redeploy only in a test environment

- We have a locally running old server with drone.io that is web-hooked to GitHub / BitBucket

http://rancher.com/ works with minimal fuss. The tools in this space are so much in flux I just care about something working easily and reliably in the short/medium term.

I use dokku for deployment and on-premise gitlab for CI.

Dokku's main advantage is that it's a no-brainer: if you're used to deploying Heroku apps, it's very similar. It also automates the creation of data containers for database services, for example. On top of that, while I can use Heroku's buildpacks for small side projects, I can also take full control of the build using a Dockerfile (which is what I do for bigger projects). The main inconvenience is that it can't manage multi-host container deployments, like docker swarm or Kubernetes can (I don't need that, so no need to compromise on simplicity).

GitLab's pipelines offer both CI and CD, with a lot of cool features around them, like being able to tell on a commit page when it has been deployed to production, for no configuration cost.

Regarding costs: well, it's the cost of a dedicated server.

hear hear. I've taken the last couple of months to refactor all of my side projects to 12-factor apps for deployment on a nice Dokku instance. Absolutely effortless.

Next steps now is getting these projects deployed through a CI/CD. I evaluated a few, and it looks like I'm down to drone.io or jenkins.

This will bring some much-needed sanity to keeping all these side/personal projects in order. I can go weeks or months without touching them, but then know EXACTLY how they will get tested and deployed.

We use Nomad [0]; we pretty much use HashiCorp's entire stack (Consul, Vault, and Nomad). Vault has been fabulous for secrets, authentication, etc., Consul for service discovery, and Nomad for job running/deployment. We have a mix of static binaries that we run and Docker containers; most of our new stuff is all Docker containers. We use Jenkins as our CI/CD, which just runs Nomad jobs and confirms their successful deployment.

Cost management is easy: all the projects are open source, and since we can spin Nomad up against any cloud provider or internal machine hosts, we use whatever is cheapest at the time. It's pretty easy to wrap your head around Nomad and make it do what you need.

0: https://www.nomadproject.io/

We are evaluating Openshift Origin on an existing OpenStack on-premise cloud. So far I have been playing around with the oc cluster up deployment on a local workstation and it works fine but I haven't played around with the CI/CD option (they support jenkins deployments, etc). From the docs I see that there is a bit of complexity regarding the security constraints and integration of volumes that I need to wrap my head around.

I also attended the DevConf.cz and saw a lot of presentations regarding Openshift. They have most of the talks on youtube (https://www.youtube.com/channel/UCmYAQDZIQGm_kPvemBc_qwg) in case somebody is interested

My big issue was getting multi-node deployments working well in AWS. I hit walls of configuration issues, DNS issues, poor documentation on fields, and generally could not make much forward progress. Running locally or on a single node OpenShift was fantastic, the haproxies for ingress were easy to configure and launching new services was impressively easy.

I was leveraging EFS as NFS mounts for my persistent volumes and had good results.

You might check out fabric8 if only for their visualizations of what is going on in your openshift / kubernetes environment.

Thanks for the youtube link! I'll be sure to check it out

Yeah I can imagine that setting up a multi-node deployment on AWS might be an issue. Fortunately for openstack there is a redhat maintained heat template that should hopefully make the installation quite straightforward (but haven't tried it yet).


Yep, have used these several times. They work very well.

Hi Timeu! I am from Red Hat in Nordics and would be happy to learn more about your case and what kind of questions you are dealing with. Any chance you could send me an email with your contacts? you can try our corporate email address norge at redhat dot com :)

I can recommend Rancher. I’ve used Openshift, Kubernetes and Rancher - so far, Rancher has been the best experience.


You can also test Rancher now easily with zero setup: https://try.rancher.com

We deploy all our containerized applications to Rancher (using Cattle for orchestration) via Jenkins jobs with a standardized Makefile for build, test, and deploy, making things consistent.

We looked at running straight k8s, but it was like using a chainsaw to sharpen a pencil for our use case.

In addition their devs are extremely helpful and also have a hobby of getting things to run on ARM.

We're migrating to Rancher from a mix of Jenkins tasks and manual deploys on AWS machines (ie: a mess) and the product is great. I've evaluated both DCOS and k8s about 7-8 months ago and found a k8s a bit complex and moving really fast (we're a small ops team so we can't spend too much time browsing documentation and keeping up with the latest way of doing things). I didn't like DCOS for various reasons (it seemed less mature and the community was too small.)

Rancher is also a breeze to deploy; I could manually deploy it with one hand tied behind my back in 10 minutes on AWS.

I read something about using Rancher to deploy an Elixir cluster with Docker the other day. That jumped out at me because prior to reading that, the general feeling was that Docker + Erlang/Elixir cluster was a no go.

Rancher was the first thing I'd seen that claimed to be able to pull it off. I'll definitely give this more of a look.

I'll take a look at it, thanks for the suggestion!

At the company I work at we use Docker with Kubernetes. The deployment process involves Ansible and Jenkins CI.

I, personally, prefer the bare-metal deploys of automated scripts. I usually just spin up a VM and write a bash script to "prep" it the way I want. After that, I just run "./deploy" and it pushes where I want. I like this because I feel like I have more control and it actually feels easier. Plus, I've run into weird issues with Docker that take so long to debug that it completely cancels out the benefit of using it for me.

The bash script I have works for every side project I create, and is simply copied from project to project. :)

> bare-metal deploys of automated scripts. I usually just spin up a VM

You're mixing terms here. A VM is not bare-metal.

Yes I'm aware. Using the term in a different sense, but I can see how that's confusing under this context.

I meant just an old-school deploy without containers and the sort.

We use AWS Elastic Beanstalk. It's simple to set up a high-availability environment. And if needed, you can always access the underlying EC2 instances or Elastic Load Balancer.

Jenkins has a plugin that integrates with elastic beanstalk. This makes ci/cd straightforward.

There's no extra cost for elastic beanstalk, other than what you'd pay for ec2, s3 and elastic load balancer.

We've a starter template with a bunch of .ebextensions scripts that simplify common installation tasks.

If your application is a run-of-the-mill web app speaking to a database - elastic beanstalk is pretty much all you need.
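For readers who haven't seen .ebextensions, the kind of installation task such a starter template automates looks like this. A hypothetical fragment, not the poster's template; the package, environment variable, and command are placeholders:

```yaml
# .ebextensions/01-setup.config (hypothetical example)
packages:
  yum:
    git: []                         # install an OS package on each instance
option_settings:
  aws:elasticbeanstalk:application:environment:
    RAILS_ENV: production           # set an app environment variable
commands:
  01_timezone:
    command: "ln -sf /usr/share/zoneinfo/UTC /etc/localtime"
```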

Nobody mentioned it, but I'm using Vagrant and the digital_ocean plug-in to manage local VMs and droplets for my small projects, it's a simple, quick and convenient way to bring up fully replicable apps/services. I'm using small scripts to provision my machines with Caddy, PHP7, MySQL, and a few other goodies. Given available droplet sizes, I'm not hard pressed to scale beyond a single machine per app/service, and this keeps everything simple; otherwise I'd probably go with Kubernetes.

We use Rancher with Cattle and do CI/CD via our self-hosted GitLab CI. Pretty easy to setup & maintain. Would definitely recommend taking a look at Rancher if you haven't yet.

I haven't played much with Rancher, but I'm really curious about it. I'd like to hear more about your setup. Perhaps you could even contribute a post on setting it up with GitLab CI?

At my last job, we started off using Mesos and Marathon, but eventually ended up dropping that in favor of a homemade solution using SaltStack (the manager demanded we drop Mesos/Marathon and use Salt - it was pretty shitty).

At my current place, we are using Teamcity to run tests and build images, and Rancher for the orchestration part. I built a simple tool to handle auto-deployments to our different environments.

I cannot recommend Rancher enough. Especially for small teams, it's just a breeze to set up and use.

I've been working on creating a platform for a non profit to get veterans coding (http://operationcode.org/). We're a slack based community and have been rolling out some home grown slack bots and we currently have a rails app hosted on heroku. Managing and keeping track of the different apps was getting unwieldy so in an effort to consolidate our apps and reduce costs I evaluated a few different options. I ended up going with rancher and after working with it a bit I'm pretty happy.

I have github hooked up to travis. When a new PR (or commit) is pushed travis shoves the app into its container, and runs the test suite inside the container.

If that passes AND the branch is master we push the image to docker hub. As of now we manually update the app inside of rancher but I think automating that will be a simple API call. Once we get more stable I'll be investigating that.

I still haven't quite figured out secret management but outside of that and a tiny learning curve it's been pretty smooth sailing.

An example travis config: https://github.com/OperationCode/operationcode_bot/blob/mast...
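Since the link above may rot, the shape of such a Travis config, sketched from the description (image name and test command are placeholders, not taken from the actual repo):

```yaml
# Hypothetical .travis.yml: build the image, test inside it,
# push to Docker Hub only for commits on master.
sudo: required
services:
  - docker
script:
  - docker build -t example/app:$TRAVIS_COMMIT .
  - docker run example/app:$TRAVIS_COMMIT bundle exec rake test
after_success:
  - if [ "$TRAVIS_BRANCH" = "master" ] && [ "$TRAVIS_PULL_REQUEST" = "false" ]; then
      docker login -u "$DOCKER_USER" -p "$DOCKER_PASS";
      docker push example/app:$TRAVIS_COMMIT;
    fi
```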

I do something very similar but with GitHub and Teamcity.

Automating the upgrades (i.e. redeploys) in Rancher is pretty straightforward; their API is super easy to use. I ended up writing a simple tool in mostly Bash to handle it, and threw it in a Docker container to run on Teamcity.

Offtopic, but that's a really awesome idea for our veterans. Thanks for working on it!

Since most of the solutions mentioned here are container based, I will provide something different.

Started using Juju[1]. Basically, Juju handles bootstrapping/creating instances you need in the public clouds and you can use Juju Charms to specify how to deploy your services. So our deployment looks like this:

  - juju bootstrap google # Get instance in GCE
  - juju deploy my-app    # My app is deployed to GCE
You can actually try this with already publicly available apps. For example, you can deploy Wikimedia [2] by just doing:

  juju deploy wiki-simple
This will install Wikimedia and MySQL and create the relationship needed between Wikimedia and the database.

In our case, we have production and development environments. Both are actually running in clouds in different regions:

  - juju bootstrap google/us-east1-a production
  - juju deploy my-app

  - juju bootstrap google/europe-west1-c development
  - juju deploy my-app
In addition to running in different regions, development looks at any changes to development branch in our GitHub repo.

We don't use any containers. Juju allows us to deploy our services in any clouds (aws, gce, azure, maas ...) including local using lxd.

[1] https://www.ubuntu.com/cloud/juju

[2] https://jujucharms.com/wiki-simple

We use Dokku (https://github.com/dokku/dokku) in production, using its tag:deploy feature to manage all of our containers (apps in Dokku). We've automated it so fully that we no longer interact directly with the individual instances. Pushes to master kick off builds that create Docker Hub images, then a deployment is triggered on the production machines.

We use Kontena at the moment.


I'll check it out! Thanks for the link

How is it?

We've had quite a few issues due to using RHEL with the devicemapper storage driver, but overall I like the concept behind it.

At Weaveworks, we have built a tool called Flux [1]. It is able to relate manifests in a git repo to images in a container registry. It has a CLI client (for use in CI scripts or from a developer's workstation); it also has an API server and an in-cluster component, as well as a GUI (part of Weave Cloud [2]).

Flux is OSS [3], and we use it to deploy our commercial product, Weave Cloud, which itself runs on Kubernetes.

1: https://www.weave.works/continuous-delivery-weave-flux

2: https://cloud.weave.works

3: https://github.com/weaveworks/flux

We have a relatively simple cloud app: a couple (micro)services, but we also use Postgres and ElasticSearch. We started using Docker + Spinnaker + k8s, but then we ran into the problem of setting up the app for local dev (where we wanted to use local PG) and prod (where we wanted to use RDS).

<plug>we've been working a bit on an open source tool, pib, that supports setting up multiple environments because we ran into this problem (behind the scenes it uses terraform, k8s, and minikube). would love to hear if anyone here has seen anything similar or has thoughts! https://github.com/datawire/pib</plug>

I've spent a fair amount of time evaluating different solutions through my startup[0] and have found Kubernetes, by far, to come with the least pain. It's not hard to get started with, but also works well as you grow and mature. It makes most of the decisions right from the start and kubectl gives you most of the functionality you need to manage deployments easily.

Also, while I have a vested interest in saying this, you don't always want to solve this yourself. Look at hosted solutions like GCP and CircleCI to make things even more painless.

[0] http://getgandalf.com/

Offtopic: How do you implement your startup's product, precisely? It seems too good a promise to be true, and it seems people might end up very disappointed.

At its core, Gandalf is just a large collection of scripts and playbooks which are written to work very generically. They're slotted into an overall framework which I wrote that can "learn" the architecture of a particular company/app and customize things intelligently. NLP works on top of this to map user input to playbooks.

There's still a decent amount of human intelligence involved though, since obviously we want to give customers a good experience. This mainly comes in upfront (where we tune the implementation for each customer) and for any tasks which Gandalf hasn't learned to do yet. I've also invested in making it easier to train Gandalf to do things—for example, I can say "watch me" and then do a bunch of things with the AWS console/API and they get turned into a parametrizable playbook.

Any good DevOps engineer invests heavily in automation. Gandalf is just one level up of automating the process of automation.

If you have any other questions, feel free to email morgante@getgandalf.com.

We selected Kubernetes on AWS, but there are a lot of details in going from source code all the way through to automated k8s deployments. We are currently using our own framework (https://github.com/closeio/devops/tree/master/scripts/k8s-ci...) but I'm keeping an eye on helm/charts to see if it makes sense to incorporate that at some point. Pykube (https://github.com/kelproject/pykube) has made it easy to automate the k8s deployment details. We needed a process that would take Python code from our GitHub repos, build and test on CircleCI, and then deploy to our k8s clusters.

A single commit to our master branch on GitHub can result in multiple service accounts, config maps, services, deployments, etc. being created/updated. Making all of that work is complicated enough, but then we also need to deal with things like canary deployments and letting us build and deploy to k8s from our local workstations. And then there are details like automatically deleting old images from ECR so your CI/CD process doesn't fill up without you knowing. Integrating CI/CD processes with Kubernetes is kind of new, so there are a lot of different projects and services starting to address this area.

I've worked with a lot of tools. I've decided that I like things that are simple and don't cost much money to get started. For new projects I always start with Heroku, or Parse (on a free back4app plan now).

I love Ansible. Chef is alright. I've been using AWS OpsWorks recently, and it's not bad. Elastic beanstalk is ok, too.

I've spun up some Kubernetes clusters, and it's nice, although I have no need for it yet. I remember the database situation was difficult when I was trying it last year. Something about persistent storage being difficult, so you had to run Postgres on a separate server.

I still like Capistrano. You can automate it with any CI pipeline. For one client, I used the "elbas" [1] gem for autoscaling on AWS. It automatically created new AMIs after deployment. Not super elegant, but it worked fine.

I don't see much of a middle ground between Heroku and Kubernetes. Just start with the one free dyno. Maybe ramp it up to 3 or 4 with hirefire.io. Once you're spending a few hundred per month on Heroku, that's probably the time to spin up a small kubernetes cluster and deploy stuff in containers.

[1] https://github.com/lserman/capistrano-elbas

This is a great question and something we've been trying to figure out ourselves. Historically, we were using Ansible to deploy Docker containers to EC2 instances, but have moved some services over to Kubernetes, Swarm and Lambda/Serverless. All of these create the same deployment challenges -- the current products out there don't fit perfectly. The more we want to deploy at a higher level than "just Docker", the less Ansible provides today. But we wanted to stick to the core concepts of automation, continuous delivery (at least to staging), and chatops-style management of production.

Our current approach is using an Operable (https://operable.io) Cog we wrote which takes the kubernetes yaml and applies it to a running cluster. It's not perfect, but I'm pretty happy with the direction it's going. We built this cog in a public repo (https://github.com/retracedhq/k8s-cog) so you are welcome to use any of it, if it's useful. Then we have our CI service send a message (using SQS) after a build is done to deploy to staging.

I've been dancing over the line between devops and backend development. It sounds like I am where you were historically, I'm using ansible to deploy docker images to EC2 instances, and a few monitoring scripts in python to do a little more finely tuned orchestration.

Lately I've been using home-brew chatops to manage products so it's nice to hear what other people are using. Operable looks really interesting, I'm going to give it a shot. Thanks for the example cog!

Would you mind expanding on why you're using both K8s and Swarm?

We don't use both in the same product. Currently we have a product deployed on k8s and a different one on swarm (well, 1.12 swarm mode, not the original swarm). We won't keep it this way forever, but we've definitely learned a lot about managing each in a production environment while running this way.

For Kubernetes, checkout https://www.distelli.com

Its a SaaS (and enterprise) platform for automated pipelines and deployments to Kubernetes clusters anywhere.

Previous discussion: https://news.ycombinator.com/item?id=13160218

disclaimer: I'm the founder at distelli.

I work at an established company and most of our apps are still deployed with RPM and puppet.

For our dockerized services we use Nomad internally and for a different product we've built in AWS we're using Elastic Beanstalk with all of the resources defined in terraform.

We use jenkins to manage the CI/CD for each method.

We currently use Docker for all our services in AWS and we deploy them with Ansible scripts. Services with a single container are fairly straightforward, but for services with multiple containers running, we use the DR CoN pattern (Docker, Registrator, Consul, Nginx), which works fairly well. Our Ansible scripts handle everything from deploying the container, to deploying Registrator, to updating the nginx templates, so it's fairly automated.
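For anyone unfamiliar with DR CoN: the nginx template updates are typically driven by consul-template, which re-renders the config whenever Registrator updates Consul with container registrations. A minimal upstream block might look like this (the service name `web` is a placeholder):

```
upstream web {
  {{range service "web"}}
  server {{.Address}}:{{.Port}};
  {{end}}
}
```

consul-template watches the `web` service in Consul, rewrites this block when containers come and go, and then reloads nginx.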

For CI, we use our own product (Runnable [0]), which allows us to test our branches with their own full-stack environments, which is great for solid integration tests. We often use it for e2e too. We're planning on adding more CD features in the near future though.

[0] http://runnable.com

We use Deis (https://deis.com/workflow/), which is a sort of Heroku on top of Kubernetes. For CI, we use CircleCI and automatically deploy when tests pass on the master branch.

Have been working with the Kubernetes teams on Slack. Kubernetes is definitely building a lot of the right things from the ground up, but it's like HBase vs Cassandra -- the former needs a full-time dedicated team to get stuff working.

Docker Swarm (especially 1.13: https://www.infoq.com/news/2017/01/docker-1.13) is like Cassandra for me. Yes, it has a few shortcomings, but it lets you stand up a fairly reasonable cluster very quickly with a simple compose.yml file.

I use Cloud 66 for my side project https://backmail.io. All the components are dockerized and deployed/managed through the Cloud 66 stack. For smaller projects/teams, Cloud 66 provides an easier way to get everything working, with single-click SSL and easy scaling -- both vertical and horizontal, using cloud VMs or your own dedicated machines. It also supports a CI pipeline to build Docker images, though I use my own Jenkins setup for that.

We work with folks (very large banks, automotives, governments, manufacturers, retailers) who use Cloud Foundry, often combined with Concourse to deploy both apps and the platform itself.

It's surprising the number of people who want to build a homebrew Kubes PaaS. When I first started working in development, every company was building its own CMS, until it invariably realised that it was hard and that they were better off using a commercial or open source solution. Seems that container-based platforms are history repeating itself.

I've been using Docker, I love it. Hope to weigh the pros and cons of Swarm and Kubernetes and try those out too, but for most of my applications networked Docker containers are sufficient.

We're heavily invested in Azure and their ARM system (Azure Resource Manager). Our entire infrastructure is code, as ARM templates which we deploy to dev/test/production. There are no discrepancies between environments. Our entire application is then deployed on top. Everything is done through VSTS (Visual Studio Team Services). We're very happy with it: it's very flexible and we have a very stable platform because of it.

We do this for a living: http://gravitational.com/managed-kubernetes/

This is Kubernetes, plus monitoring of your choice, running on your infrastructure, remotely managed by our team. The side benefit is that the same setup works on different infrastructure options, so you deploy and run the same stack on AWS and also on-premise/bare metal.

Has anyone tried Docker Swarm or Docker Datacenter? We've been looking at them, but are on the fence vs Kubernetes...

I have been using Convox to deploy our Docker containers. It has been great for the past year and is improving daily.

I really liked fleetd, so it's sad that it's wrapped up. It felt unixy and was small enough to understand. Now I'm looking toward serverless and total abstraction of the infrastructure. I kind of see the space in between filled by Mesos, Kube and others as a bit ephemeral.

Custom wrapper around Amazon ECS. We need more fine grained control over the instances to support encryption, secret injection, log aggregation, and so forth than other frameworks provide.

> How do you minimize costs with your solution?

Autoscaling groups triggered off of "cluster capacity".
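For context, a "cluster capacity" trigger usually compares the resources already reserved by scheduled tasks against what the cluster can hold, rather than instance CPU load. A toy sketch of the decision (the thresholds here are made up for illustration):

```python
def desired_instance_delta(reserved_cpu, total_cpu, high=0.80, low=0.40):
    """Return +1 to scale out, -1 to scale in, 0 to hold, based on the
    fraction of cluster CPU already reserved by running tasks."""
    utilization = reserved_cpu / total_cpu
    if utilization > high:
        return 1   # not enough headroom left to place new tasks
    if utilization < low:
        return -1  # cluster is mostly idle; shed an instance
    return 0

print(desired_instance_delta(900, 1000))  # 1 (scale out)
print(desired_instance_delta(100, 1000))  # -1 (scale in)
```

In practice you'd publish the utilization number as a custom CloudWatch metric and let the autoscaling group's alarms act on it.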

Docker and nvidia-docker, since it allows PCIe passthrough for Nvidia GPUs.

I'm working for a startup right now. We're using Kubernetes via GKE on Google Cloud.

Back in 2015, I implemented a Kubernetes cluster by hand in AWS. I'm not going to do something like that again. GKE is fairly painless and it has most of the sensible defaults that I want. Networking just works -- pods can talk to each other as well as to any VM instances from any availability zone and region. Integrating with GCP service accounts just works. Spinning up experimental clusters is easy, as is horizontally scaling the clusters. One gotcha is that Google has not made K8s 1.5 generally available in all regions or availability zones. Otherwise, upgrades are pretty easy.

I have deployed with Docker Compose (not doing that again -- it is easier to use shell scripts). I have deployed with AWS ECS (not doing that again; it does not have the concept of pods, which severely constrains how you deploy). I used to deploy with Chef. I've heard of Chef's Habitat, but have not played with it.

Back for the 2015 project, I wrote Matsuri as a framework to manage the different Kubernetes templates. It's useful if you know Ruby. It uses idiomatic Ruby to generate and manage K8S specifications, and run kubectl commands. I wanted a single tool that could work with all the different environments (production, staging, etc.) as well as manage the dev environment. For example, if I want to diff my version-controlled spec on dev with what Kubernetes master currently has, I would use `bin/dev diff pod myapp`. If I want to diff the deployment resource by the same name, I would use `bin/production diff deployment myapp`. I can write hooks specific to the app. For example, `bin/production console mongodb` uses hooks to query Kubernetes to find a pod to attach to, determine the current Mongodb master, and invoke the command to go directly into the Mongodb shell. But I could have invoked `bin/staging console mongodb` or `bin/dev console mongodb`. I could do this because I have been developing software for a long time and I have enough ops experience to be able to put it all together. YMMV.

We're using Go.cd for the CD. I could have used Jenkins, but decided to give Go.cd a try. Go.cd has some advantages (such as much better topologies and tracking value streams) though there are also things it does not do as well as Jenkins (Go.cd auth mechanisms blow, and I had to write my own custom proxy to get Github hooks working more securely and reliably). Setting up GCP service accounts so that go.cd agents can deploy was a lot easier than I thought, once I read through the GCP docs. (Much easier than AWS).

Docker containers are still difficult to make. You want to vet things before using them. Handling this stuff is still going to be a full-time job for someone, both in terms of designing the infrastructure as well as the development tools. There are a lot of issues that come up because dev might throw things over the wall that might impact the overall reliability and performance of the system.

> I have deployed with AWS ECS service (not doing that again; it does not have the concept of pods which severely constrains how you deploy)

What have you found are the biggest advantages of pods over containers? How does ECS constrain how you deploy? Are you simply referring to rollout/rollback, scale up/down?

The last time I used ECS for a production deploy, you could group containers together (just as you can with Compose). However, there was no easy way to do service discovery. This made wiring containers together difficult. If I wanted one container to talk to another, I had to group them and deploy them as one unit.

That meant I could not horizontally scale one container more than the other. I can scale the whole group, but there is a lot of wasted resources at that point.

Kubernetes pods group containers together under a single IP address. Containers from one pod (one IP address) can talk to any other pod. Docker did not even have this functionality until 1.12, and that is too little, too late. (And I am not sure this is something ECS supports right now.) Combined with label selectors and long-running Service objects (which bind a DNS name to the set selected by the label selectors), I can horizontally scale pods and still maintain service discoverability. Using DNS makes service discovery stupid-easy. This means I can scale Kubernetes pods independently of each other.
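The label-selector mechanics described here are easy to state precisely: a Service selects every pod whose labels are a superset of its selector, and the DNS name resolves to that set. A sketch of the matching rule (plain Python standing in for what the Kubernetes control plane does):

```python
def matches(selector, pod_labels):
    """A pod matches when every key/value pair in the selector
    is present in the pod's labels (extra labels are fine)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

pods = [
    {"name": "web-1", "labels": {"app": "web", "version": "v1"}},
    {"name": "web-2", "labels": {"app": "web", "version": "v2"}},
    {"name": "db-1",  "labels": {"app": "db"}},
]
selector = {"app": "web"}
print([p["name"] for p in pods if matches(selector, p["labels"])])  # ['web-1', 'web-2']
```

This is also why rolling upgrades work transparently: v1 and v2 pods both match `app: web`, so the Service keeps routing while the deployment swaps them out.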

Another consequence of using Service objects to select a set based on label selectors is that routing can now be dynamic. Pods that need to talk to another pod go through the Service. I can then scale the dependency up and down, and it doesn't really affect the pod that requires that service. I can do rolling upgrades to the dependency, and it works because the Service abstracts that through label selectors.

There are still some warts related to this setup. Stateful sets still need a lot of work. I've also found that many applications cache IP addresses (Redis Sentinel being a notorious example). To work well with Kubernetes, it's better to always query DNS when making a connection. Ruby drivers for MongoDB and Redis, for example, will cache DNS lookups, making failover fragile (if you are running MongoDB and Redis inside Kubernetes; if you're not, you won't have this problem).
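To illustrate the DNS-caching failure mode: the fix is simply to resolve the service name on every connection attempt instead of once at startup. A toy sketch with the resolver injected so the failover can be simulated (a real client would open a socket to the returned address):

```python
def connect(hostname, resolve):
    """Resolve the name on *every* connection attempt instead of caching
    the first answer, so a repointed DNS record is picked up on failover."""
    ip = resolve(hostname)  # fresh lookup each time
    return ip

# Simulate a failover: the service name moves to a new pod IP.
records = {"redis-master": "10.0.0.5"}
print(connect("redis-master", records.get))  # 10.0.0.5
records["redis-master"] = "10.0.0.9"         # failover repoints the record
print(connect("redis-master", records.get))  # 10.0.0.9
```

A client that cached the first lookup would keep hammering 10.0.0.5 after the failover, which is exactly the fragility described above.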

I was choosing between Kubernetes and Mesos after ECS, but had not looked into either deeply. It was random chance that took me to Kubernetes instead of Mesos. Kubernetes solved many of the pain points of Docker Compose and ECS.

Forgot: Matsuri is here: https://github.com/matsuri-rb/matsuri

It's largely undocumented. I've heard some interest in places to use it. Time constraints are such that creating examples for it is low on my priority list. Frankly, if you don't know Ruby, you're probably better off looking at Helm.

A related question: how often are the people here scaling their applications up and down?

Do you have large workload spikes, or traffic spikes?

At Pivotal we use BOSH[0] almost exclusively for deploying distributed systems. The motivating usecase was Cloud Foundry[1], but it can be used for pretty much anything. Our founding role in both of these is why BOSH is our first choice for such occasions.

It has a plugin model (CPIs) for hosting substrates, so right now it can deploy and upgrade systems on AWS, GCP, Azure, vSphere, OpenStack and there are others I forget right now.

It's proved itself in large production systems for years. Every week or two we entirely upgrade our public Cloud Foundry, PWS, and nobody ever notices.

OK, that's a lie. You get an email from CloudOps: "We're going to deploy v251". Then a few hours later: "v251 is deployed". Or occasionally: "Canaries failed, v251 was rolled back".

There's nice integration with Concourse[2,3]. You simply "put" your deployment and it just gets deployed for you. Our CloudOps team do this now, which makes their lives that much easier.

Versioning is trivial, especially if you're working in a commit-deploy model via Concourse.

The downside is that BOSH is BOSH.

We're doing lots of work to make it friendlier and more approachable, but right now it's powerful and very opinionated. It does not have a smooth onramp, because the basis of its power and reliability is that it insists on certain minimum conditions first.

It's really meant for operators, not developers, but at Pivotal the main consumers by volume are developers. Usually to deploy Cloud Foundry and Concourse; though my current assignment is actually going to be shipped purely as a BOSH release.

Disclosure: I work for Pivotal on Cloud Foundry.

[0] http://bosh.io/

[1] https://docs.cloudfoundry.org/deploying/common/deploy.html

[2] http://concourse.ci/

[3] https://github.com/concourse/bosh-deployment-resource

I just do a "cap production deploy", and it does everything for me (I use bluepill + god for running background processes too)

I don't need Docker, and think it's too complex. I deploy to over 50 servers, so don't tell me it's because I run a simple setup :P

If you're on AWS, then you should be using ECS first of all.

Openshift is a wrapper on top of k8s.

You should just use helm.

In a team where Node and Golang were the languages of choice, we used GitHub private repos for code, TeamCity as the driver for CI/CD, and Salt to deploy the Docker images to our different environments running on AWS EC2 instances. I must say I really liked TeamCity: its different integrations with GitHub, its build processes (Node/NPM, frontend tooling, ...), and how variables could be shared down to projects and releases.

To deploy code with Salt, we had an SSH account on the Salt server configured with a bunch of deploy keys. Each of those had a forced command that would read $SSH_ORIGINAL_COMMAND and forward this information to an agent (running as root) that would execute Salt with the correct arguments, based on information in $SSH_ORIGINAL_COMMAND. This let us use a build step in TeamCity that basically did ssh deploy@mgmt-gateway [env] [project] [version]. Deployments were logged to New Relic and Slack.
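The important part of that setup is that the forced command never executes $SSH_ORIGINAL_COMMAND; it only parses and validates it before handing structured arguments to the agent. A hypothetical sketch of that validation (the whitelist and the "env project version" shape mirror the description above, but the names are illustrative):

```python
import re

ALLOWED_ENVS = {"staging", "production"}  # hypothetical whitelist

def parse_deploy_command(original_command):
    """Validate 'env project version' as received in $SSH_ORIGINAL_COMMAND
    and return the parts, or None if anything looks unsafe."""
    parts = original_command.split()
    if len(parts) != 3:
        return None
    env, project, version = parts
    if env not in ALLOWED_ENVS:
        return None
    # only allow simple identifiers -- no shell metacharacters
    if not all(re.fullmatch(r"[A-Za-z0-9._-]+", p) for p in (project, version)):
        return None
    return env, project, version

print(parse_deploy_command("staging api 1.4.2"))      # ('staging', 'api', '1.4.2')
print(parse_deploy_command("staging api; rm -rf /"))  # None
```

Since the deploy keys are restricted to the forced command, a compromised CI credential can at worst trigger a whitelisted deploy, not run arbitrary commands on the management node.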

In a different team that is fond of PHP, we use a private GitLab CE instance for code management, GitLab CI Multi-Runner as the build agent for CI/CD, and Ansible for configuration management and code deploys to different environments running on AWS EC2. As in the previous team, we have configured our .gitlab-ci.yml to pass some arguments in $SSH_ORIGINAL_COMMAND over SSH to a management node that in turn talks to Ansible.

Something I like about having a private GitLab CE instance is that development doesn't stop because your public Git host is DDoSed or has other problems (like the recently discussed one here on HN).

Test and staging servers are shut down/destroyed off-hours and restarted/recreated by cron jobs that execute Ansible plays, which identify eligible EC2 instances via EC2 tags. Production environments with multiple servers are similarly scaled down during off-hours. By simply modifying/removing the "shutdown" tag on the AWS resources, teams are able to exclude their test/staging environments from the scheduled shutdowns, which is useful for upcoming releases. ;)
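The eligibility check in those plays boils down to a tag filter. A minimal sketch of the selection logic (the boto3/Ansible calls that would fetch and stop the instances are assumed and omitted; tag values are illustrative):

```python
def eligible_for_shutdown(instances):
    """Pick the instance IDs carrying a 'shutdown' tag set to 'true'."""
    return [
        i["id"] for i in instances
        if i.get("tags", {}).get("shutdown") == "true"
    ]

fleet = [
    {"id": "i-staging1", "tags": {"shutdown": "true"}},
    {"id": "i-release1", "tags": {"shutdown": "false"}},  # opted out for a release
    {"id": "i-prod1",    "tags": {}},                     # never tagged
]
print(eligible_for_shutdown(fleet))  # ['i-staging1']
```

Driving the schedule off tags rather than a hardcoded host list is what makes the opt-out ("just flip the tag") so cheap for teams.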

In the Node/Golang shop I loved how simple Docker images were and how good it felt to deploy to isolated containers. Unfortunately, I don't see how that's possible (in a clean way, preferably without using two images) when both an Nginx process (static file serving, e.g. frontend resources) and a PHP-FPM process need access to the same code release.

(If you have experience with Nginx/PHP-FPM apps and Docker, feel free to enlighten me!)
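One common pattern (not the only one) is to build a single image containing the code and PHP-FPM, and share the release with a stock nginx container through a volume. A hedged docker-compose sketch -- image names and paths here are hypothetical:

```yaml
# docker-compose.yml -- the app code lives in one image and is
# shared with nginx through a named volume
version: "2"
services:
  app:
    image: myorg/php-app:latest   # contains /var/www/html, runs php-fpm
    volumes:
      - code:/var/www/html
  web:
    image: nginx:stable
    volumes:
      - code:/var/www/html:ro     # nginx serves the same release read-only
    ports:
      - "80:80"
volumes:
  code:
```

The nginx config then proxies PHP requests to `app:9000` via fastcgi_pass while serving static files straight from the shared volume, so both processes see the same release without baking the code into two images.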

Things I'm not entirely fond of about GitLab CI is that:

- each branch in each repo must have a .gitlab-ci.yml that is up-to-date (administrative challenge!)

- it's entirely driven from a git push (though the web gui provides buttons for existing builds to retry/manually execute steps to e.g. deploy code)

GitLab has no support for a centrally managed .gitlab-ci.yml file at the project group and/or project level. There's no way to define variables at the project group and/or project level. There's no way to schedule jobs so that you can execute daily/weekly tests, or to manage jobs (in a user-friendly way via the web GUI) that perform cron-like tasks, which would let you avoid putting these tasks on the servers themselves in /etc/cron.d (which becomes a problem when you restore backups / bake AMIs / do auto-scaling).

I'd love to look more into K8s and Google's cloud offerings, especially since I believe this might be the future and because I believe Google is light-years ahead of the competition when it comes to security and protecting the privacy of its customers. Unfortunately, I'm afraid it's not viable given my team's current investment in Nginx/PHP-FPM apps and various AWS services.

As long as your apps run well in containers (Docker or rkt or others even), they can run and work well on K8s, which also runs well on AWS. You should consider K8s to replace imperative CM-based (Salt/Ansible) deployment mechanisms. The native pod abstraction in K8s can also nicely address the multi-container composition issue you mentioned.

Matador Cloud (https://matador.cloud/) uses nixops to manage NixOS machines.
