Migrating from Heroku to AWS using Docker (doordash.com)
149 points by stanleytang on April 4, 2015 | 54 comments


If you're looking at doing this, especially if you're moving from Heroku and/or using Docker, seriously consider AWS Elastic Beanstalk. http://aws.amazon.com/elasticbeanstalk/

In case you don't know, Elastic Beanstalk is AWS's answer to Heroku, but _a lot_ more flexible. It's basically a layer on top of EC2 and ELB. It takes care of creating auto-scaled EC2 instances to your specification and deploying your application to them (with rolling updates!)

Out of the box, it supports PHP, Java, Python, Ruby, Node.js, .NET, Go, and Docker (so, everything else), and it's fairly trivial to set up a Heroku-like git-based deploy with something like CircleCI.
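A minimal sketch of such a git-triggered deploy with CircleCI (circle.yml 1.x format; the environment name is a placeholder and AWS credentials are assumed to be set as CircleCI environment variables):

    dependencies:
      pre:
        - pip install awsebcli            # the Elastic Beanstalk CLI

    deployment:
      production:
        branch: master
        commands:
          - eb deploy my-production-env   # deploys whatever just landed on master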


It's worth noting that Elastic Beanstalk just released support for multiple docker containers per instance. This takes EBS from being a novelty to being production-grade Docker orchestration. Previously, it ran a single docker container per instance which - while good for stage/prod parity and easy rollbacks - doesn't really take advantage of the core Docker advantages over a VM. Now, you can orchestrate multiple containers per host, including nice-to-haves like zero-downtime deploys, automatic load balancing, monitoring, and log aggregation.

We're using EBS at StaffJoy and in the process of switching to multiple containers per EBS host. This means that a staging environment can have cheap redundancy (e.g. for zero-downtime deploys) by utilizing the same EC2 hosts, but also that queue workers (sporadic, processor-intensive scheduling operations) can split a processor with the more memory-intensive web app.
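For anyone who hasn't seen it, a minimal sketch of what that looks like in Elastic Beanstalk's multi-container Dockerrun.aws.json (v2) file - the image names, ports, and memory limits below are placeholders, not StaffJoy's actual values:

    {
      "AWSEBDockerrunVersion": 2,
      "containerDefinitions": [
        {
          "name": "web",
          "image": "registry.example.com/web:42",
          "essential": true,
          "memory": 512,
          "portMappings": [
            { "hostPort": 80, "containerPort": 8000 }
          ]
        },
        {
          "name": "queue-worker",
          "image": "registry.example.com/worker:42",
          "essential": false,
          "memory": 256
        }
      ]
    }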

Getting EBS to work with the private docker registry was a bit of a pain because documentation was lacking - if you're having trouble with this, feel free to ping me. I'm considering a blog post on our deployment system, including easy rollbacks.

The problems we haven't figured out yet on EBS are streaming logs from hosts to another service (e.g. loggly/logentries/papertrail for realtime production error monitoring) and monitoring deploys (specifically to send a Slack notification that changes are fully deployed and to flush the Cloudflare cache).

(Note - here EBS = elastic beanstalk, not elastic block storage)

Edit: A final note - we ended up doing a Jenkins install instead of using CircleCI for build promotion purposes. We build a container, push to the Docker registry, then deploy to a stage. We have a separate job that lets us deploy to prod. However, there's no rebuilding - it's just a button that takes a build number and tells EBS to deploy that existing build to the master cluster. For rollbacks, you just enter a prior build number and that build immediately starts rolling out. We couldn't replicate this stage->master promotion/rollback in CircleCI.
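A rough sketch of how that promote/rollback button can work against Elastic Beanstalk (application and environment names are placeholders, and this is only one way to wire it up, not necessarily the poster's exact Jenkins job):

    # Assumes each build already registered an application version labeled "build-<number>".
    BUILD=42   # the build number typed into the Jenkins job

    aws elasticbeanstalk update-environment \
      --application-name myapp \
      --environment-name myapp-prod \
      --version-label "build-$BUILD"
    # Rolling back is the same call with an older build number.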

Edit2: We chose to build the image ourselves and push it to the private registry, instead of having AWS build it, because we run our unit tests within the built container - so we can guarantee that the image that passes tests is exactly the same as the one deployed to production. There was some concern that the image hash could differ - e.g. due to a pip download failure - if AWS built the image separately.
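And the build-test-push half of that pipeline might look roughly like this (registry host, tag scheme, and test command are all hypothetical):

    set -e
    IMAGE="registry.example.com/myapp:build-$BUILD_NUMBER"

    docker build -t "$IMAGE" .                  # build exactly once
    docker run --rm "$IMAGE" python -m pytest   # run unit tests inside the built image
    docker push "$IMAGE"                        # the image that passed tests is the one deployed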


> (Note - here EBS = elastic beanstalk, not elastic block storage)

I really, really don't recommend using EBS as an abbreviation for Elastic Beanstalk, since Amazon canonically uses that abbreviation for Elastic Block Store. If nothing else, just call it Beanstalk when the AWS context is clear.


The AWS abbreviation for Elastic Beanstalk is EB.


We use a beanstalkd queue too (SQS gets expensive at scale!), so "beanstalk" is unfortunately also ambiguous in our infrastructure. I think it's just bad naming for AWS Elastic Beanstalk overall.


You should do a blog post. I had no idea Elastic Beanstalk allows multiple docker containers per instance. This changes EVERYTHING! I would love to hear how you got this all tied together.


Here's the announcement - it's a little over a week old:

https://aws.amazon.com/about-aws/whats-new/2015/03/aws-elast...


I think he meant a blog post about your entire architecture and build/deployment would be great. And I agree :)


I am wondering how difficult it would be to "push to elastic beanstalk". It would be kind of cool to have some sort of git daemon running in the background, so you wouldn't necessarily need to use a third-party CI service. You could just push to a specific git remote, just like with Heroku.


You don't need a third-party CI service; you can just run "eb deploy" from your git repo:

http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create...
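A minimal sketch of that workflow (the environment name is a placeholder):

    pip install awsebcli    # the eb command-line tool
    eb init                 # one time: choose region, application, and platform
    eb create my-env        # one time: create the environment
    eb deploy               # thereafter: package the current git HEAD and deploy it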


That's definitely something we considered doing at DoorDash. I've heard it gets painful once you want more control over deployment/package upgrades, etc. How has that played out for you?


I'm also curious if any large projects use it. We tried it out for a couple weeks - it worked fine but we felt overly constrained by the configuration rules and ended up rolling our own solution.


We use it at Mi9 to run some of Australia's highest-trafficked websites, like ninemsn.com.au (and its family of news sites) and jumpin.com.au. Also Stan.com.au (Australian Netflix competitor) runs their microservice architecture on Elastic Beanstalk.


Suggesting that moving to Docker obviates the need for configuration management is frankly naive.

The whole point of CM tools is to make the layout/configuration of systems predictable. That doesn't make Docker or CM redundant. It means that when you build Docker containers, it makes good sense to install a CM tool and use it to do the systems configuration.

Saying that the Dockerfile works like BASH and thus makes life easier is a huge step backwards. Ultimately, administrators have to enter, troubleshoot and debug containers. Moving back to shell script-style configuration inside of containers just kicks the problem down the road.

Docker, and containers in general, are great. And you should treat their contents with the same respect that you do any system.


> It means that when you build Docker containers, it makes good sense to install a CM tool and use it to do the systems configuration.

If you belong to the "one container, one process" camp, I'd say CM has little utility.

Let's say we have a redis image. Using a CM I'd have to:

  * install ruby and various ruby lib pkgs
  * install chef or puppet, and a multitude of rubygems 
  * add recipes/manifests
My image is now several times bigger than it need be, and the configuration is now more opaque, in several files. It's a pain to debug and troubleshoot when something deep inside the recesses of chef-solo fails. And why would I?

I'd rather have a Dockerfile with `apt-get update && apt-get install -y redis-server` and perhaps a line adding a custom config file. Very readable.
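i.e. something along these lines (base image and config path are illustrative):

    FROM debian:jessie

    RUN apt-get update && apt-get install -y redis-server \
        && rm -rf /var/lib/apt/lists/*

    # the one line of "configuration management": drop in a custom config file
    COPY redis.conf /etc/redis/redis.conf

    EXPOSE 6379
    CMD ["redis-server", "/etc/redis/redis.conf"]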


Appreciate the comment. The point I was hoping to get across wasn't that Docker would completely replace CM (or that CM is a bad thing), but that it could help reduce the amount of work in the CM world. As mentioned, we still needed to use Chef anyway (and we're using OpsWorks to get a head start), so at least in this kind of environment CM is still necessary. That said I can see how the article could be slightly misleading =)


I appreciate your response, but -

When you discuss CM being necessary, you are talking about using it on the host and not within the container.

Ultimately, operability and proper configuration inside the container is critical. Using a Dockerfile with no CM inside it is not much of an improvement on not using CM anywhere.

You don't need stateful CM inside the container. It's fine to fire and forget - use Puppet in apply mode or Chef solo. But there's a reason these tools are used in building AMIs and containers over scripting languages - we've come a long way over the past 10 years, and I still feel that switching to the Dockerfile as a configuration mechanism is like moving back to configure/make/make install.


What else do you think is happening under the hood when you use a CM tool like, say, Ansible? Ansible translates your configuration to small Python scripts, uploads them to the remote host and runs them.

What I'd like to see is a "script dump" output that still lets you create your yml configuration but converts it into shell scripts that you can call from your Dockerfile without any dependencies.

Of course, now you have an additional build step... and what could you use to tie everything together? make/make build :)
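As a sketch of that idea (the yml-to-shell dump tool is imaginary - no such Ansible feature exists - and all names are placeholders; recipe lines are tab-indented):

    IMAGE := registry.example.com/myapp

    # Hypothetical step: render the playbook into a dependency-free shell script
    provision.sh: playbook.yml
    	./dump-playbook-to-shell playbook.yml > provision.sh

    # The Dockerfile then just COPYs and RUNs provision.sh
    build: provision.sh
    	docker build -t $(IMAGE) .

    .PHONY: build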


What happens under the hood is almost irrelevant, because it's predictable and repeatable.


Would you mind sharing more details about the Docker setup on AWS?

"Our code deployment flow was straightforward, and mostly consisted of building a Docker image off our latest codebase and distributing it to our server instances, with the help of a Docker image server. We only needed to write a few short Chef scripts to download the latest Docker image build and start up a new Docker container."

I was wondering what you meant: downloading the latest Dockerfile and building the image? Or, if that wasn't a slip, what sort of building are you performing with an image? The diagram suggests a slightly different setup.

In my mind, you build your Docker images on the CI/CB system and push them to a repository, then you can deploy them with docker pull. I was wondering if you used the docker-registry service or went for Docker Hub? I finished migrating some of our stuff to AWS/Docker last week, but we skipped the OpsWorks part. I am familiar with Chef, though - I just think it is overly complicated and hard to debug if something goes sideways.

Anyways, thanks for sharing.


Store your Dockerfile in version control, either in its own repo with other Docker files or in the same repo as the app.

Use Jenkins (or your preferred build system) to fetch the Dockerfile and your app, build the Docker image, and then docker push it to your own private Docker registry.

Bake your AMIs with your Docker image (tags are your friend). You don't want to have to scale and find out you're having problems fetching Docker images from your private registry. You'll still be using config management for the underlying Docker host.
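The baking step can be as simple as pre-pulling the image while the AMI is being built (image name and tag are placeholders):

    # Run during the AMI bake (e.g. from Packer or a Chef recipe), then snapshot the instance:
    docker pull registry.example.com/myapp:build-42

    # At launch/deploy time the layers are already local, so a pull of the same tag is a
    # cheap no-op and scaling out doesn't depend on the private registry being reachable:
    docker run -d --restart=always -p 80:8000 registry.example.com/myapp:build-42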

Hope this helps!


Yup, pretty much.

We actually were using Docker Hub but they went down a few times so we are going to try setting up a private registry. We also tried Docker Hub's "Automated Build" service but it doesn't do any layer caching between builds so we found it way too slow.

One thing I'd recommend is at least installing the Docker service itself on the AMI (spinning up EC2 instances, especially on OpsWorks, is slow enough as is).


I actually trust Docker Hub more than a private registry setup... we had so many hard-to-debug EOF errors, registry container crashes, etc.

But the Docker Hub is also a single point of failure, so now I'm considering just having our CI service dump (docker save) the container build to a tarfile and send it to S3, then having the destination host download it and run docker load.

Docker's handling of layers is excruciatingly slow, anyway.
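A sketch of that save-to-S3 flow (bucket and image names are placeholders):

    # On the CI box:
    docker save myapp:build-42 | gzip > myapp-build-42.tar.gz
    aws s3 cp myapp-build-42.tar.gz s3://my-deploy-artifacts/

    # On the destination host:
    aws s3 cp s3://my-deploy-artifacts/myapp-build-42.tar.gz - | gunzip | docker load
    docker run -d myapp:build-42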


We use private registry instances behind an ELB, with the containers stored in S3. When production counts on the registry being available, you've got to ensure reliability of the service and durability of the underlying container data.
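For reference, with the newer Distribution-based registry (registry 2.x) the S3 backend is just configuration - this is a sketch with placeholder bucket/credentials, and the older Python docker-registry uses a different (env-var based) setup:

    version: 0.1
    storage:
      s3:
        accesskey: AKIA...          # placeholder
        secretkey: "..."            # placeholder
        region: us-east-1
        bucket: my-registry-bucket
    http:
      addr: :5000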


Great, thank you!


A year into this I'd love to hear a cost benefit analysis including infrastructure & ops cost, uptime, disaster recovery, security management (key rotation, heartbleed-like threat mitigation, etc), and overall developer happiness.


Having run on Cloud66 for the past year, I can say that managing infrastructure and dealing with security management like that ourselves would have been prohibitively expensive. Cloud66 has taken care of things like Heartbleed and package updates (and alerts for Rails updates). It's been a good experience in that I learned that I really do want a managed hosting environment - since I can see what they've done, but don't actually have to do it myself.

I would also love to hear a cost benefit analysis a year into managing your own servers.


This is a great article and breaks down the components needed to run Docker in production on your own servers. We have been doing that in Cloud 66 - http://www.cloud66.com/ (on 8 cloud providers including AWS) for a while and it's been a great experience: http://blog.cloud66.com/docker-in-production-from-a-to-z/


I'd like to chime in here in support of Cloud66. I've been running normal (non-dockerized) Cloud66 for about a year on a wide variety of projects, including the main Bike Index site (https://bikeindex.org), and it's fantastic.

It's all the ease of Heroku and all the flexibility of actual servers, and way cheaper than Heroku. The only reason I've touched Docker on Cloud66 is JRuby - since this article is about a Python app, it would also require Docker. But for Ruby MRI applications, the logical first step as soon as Heroku stops being cost-effective is immediate (and painless) migration to Cloud66.


Did you know your site was blocked in Indonesia?

> This site is prohibited. The site is included in the list that should not be accessed through this network because contains porn, gambling, phising/malware, SARA or Proxy.

> In accordance with Indonesian Government Regulation refers to Law Number 36 of Year 1999 on Telecommunications Article 21, "Telecommunication provider is prohibited from conducting business telecommunications operation contrary to the public interest, morality, security or public order."


what is SARA?


An acronym that is a code word for ethnic, religious, tribal, and racial issues that are not to be discussed publicly and are censored by the ministry of information.

https://books.google.com/books?id=2oZQRuT78JIC&pg=PA80&lpg=P...


> With Heroku, you lose fine-grained control and visibility over your servers. Installing custom software in your server stack is far from straightforward, and it’s not possible to SSH into a server to debug a CPU or memory issue.

I found this annoying at first along with not being able to save files to the server. However, Heroku is forcing you to automate server setup so servers can be restarted, destroyed and created without losing data or functionality. If you allow SSHing in to install software and storing files on the server, you're probably going to get some nasty surprises when you need to recreate the server or add more servers for example.

You mention it was a couple of months of work to transition to your EC2+Docker setup. How much time do you find is required to maintain this compared to your Heroku setup? Heroku does seem expensive but I find it can have lower setup and maintenance costs than a custom solution.

Being able to run and test your Docker setup locally is a nice benefit though. I haven't found a satisfying solution to this for Heroku yet.


>I found this annoying at first along with not being able to save files to the server. However, Heroku is forcing you to automate server setup so servers can be restarted, destroyed and created without losing data or functionality. If you allow SSHing in to install software and storing files on the server, you're probably going to get some nasty surprises when you need to recreate the server or add more servers for example.

The point the author was making was that automating the installation of custom software on Heroku sucks and that you can't SSH in to debug issues that only occur in the Heroku environment and can't be reproduced locally. Nowhere did the author say they were SSHing in with the intention of installing custom software.

The author is using Docker to create images that contain the OS, the custom software, and the app itself. These images are then loaded onto EC2 instances by Chef and the application is run. There's no manual installation.
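Purely as an illustration of that pattern (these are not DoorDash's actual recipes; plain execute resources are used here rather than the community docker cookbook, and all names are placeholders):

    # Pull the image named by a node attribute and (re)start the app container.
    image = "registry.example.com/myapp:#{node['myapp']['build']}"

    execute 'pull app image' do
      command "docker pull #{image}"
    end

    execute 'restart app container' do
      command "docker rm -f myapp; docker run -d --name myapp -p 80:8000 #{image}"
    end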

Even if the author didn't use Docker, there are a lot of ways to automate server setup with AWS. Anyone seriously using AWS for production software is automating server setup. This isn't some kind of Heroku specific concept.


As skwirl pointed out, yes, SSHing is more for debugging production server issues than for setting up the servers.

The transition itself took one month. So far it's been several months and it's been very stable with pretty minimal maintenance. We did end up writing a number of Python scripts to help make the deploy process easy for developers (i.e. replicating most of the common Heroku features we used to use)


While you cannot ssh directly, you can get a bash shell on a heroku dyno quite easily by running:

heroku run bash

This can be very handy for debugging.

That said, heroku will spin up a new dyno instance to run the bash shell. You cannot (as far as I know) get a shell on a specific dyno of your choice.


This post is mostly focused on migrating application instances. I wonder what the best step-by-step way is to move a database from a Heroku/EY/etc. provider to Amazon.

And is it possible to first migrate app instances that would for some time use Heroku DB and then migrate the DB itself? Instead of doing everything in one big step?


Yes, it's possible to point a Heroku app at an EC2 DB, and it's also possible to point an app running on EC2 at a Heroku DB.
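For the first case, it's essentially just an environment variable (app name and connection string are placeholders):

    # Point the Heroku app at a Postgres instance running on EC2/RDS:
    heroku config:set DATABASE_URL="postgres://user:password@db.example.com:5432/myapp" -a myapp

    # The reverse also works: a Heroku Postgres database is reachable from outside Heroku,
    # so an app on EC2 can keep using it during the transition.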


If you are using Docker on AWS, then EC2 Container Service (http://aws.amazon.com/ecs/) would be an option. It's still in beta but looks promising. It takes care of all the cluster management, launching, stopping, etc.


I'm in a situation like this as well, using Heroku but not happy with the performance on the Dynos and looking to move to EC2 (we already use our own databases etc hosted on EC2).

The biggest issue is finding a solution which can give similar ease of use and functionality as Heroku. Doing an OpsWorks/Chef setup like this isn't appealing - over time we want to have many microservices which can be scaled independently, and you don't want to be doing a lot of setup for each one. It's also complex to set up the kinds of scaling and rollbacks with git deploys that you get on Heroku.

Right now Deis or Flynn look best, giving most of the advantages of Heroku, but they are also pretty immature. Deis only just got the ability to send logs to a syslog instance, and it can't be upgraded without either building a new cluster and migrating across (lots of work) or doing an in-place upgrade, which could mean up to 10 minutes of downtime.

Flynn has no ability to send logs to a syslog server from what I can see, and seems even more immature than Deis, but it looks like it might have a better technical foundation.

I've also been evaluating Amazon's EC2 Container Service, but that is unfortunately also very immature - no load balancing integration yet, and no information on logging, rollbacks, etc.

Elastic beanstalk is also interesting in a way, but we already have one app deployed on it using Docker and it's not great - logging is a kludge using a logger running in each container, and you either use your own docker repository (in which case you can't do rollbacks), or give them a zipfile with a Dockerfile and your app, which allows rollbacks but means the docker image gets built by every server in the cluster. It also doesn't seem to have any way of easily running and scaling multiple processes per app like you can with the Procfile on Heroku.

Anybody else in a similar situation? It seems that there are a bunch of interesting projects that will be very competitive with Heroku soon, but nothing that's really matured yet.

I wish Heroku would just introduce a new range of PX dynos (each running on its own EC2 instance) without the crazy pricing - right now the only PX dyno they have is $500 per month, when you can get a substantially faster instance on EC2 for ~$170 per month.


"It also doesn't seem to have any way of easily running and scaling multiple processes per app like you can with the Procfile on Heroku."

Have you seen that Elastic Beanstalk now lets you deploy multiple Docker containers in a single environment?

https://aws.amazon.com/about-aws/whats-new/2015/03/aws-elast...


I have, and I should really look into it more. Although my experience using Docker on Elastic Beanstalk hasn't been so great.

We actually upgraded to PX dynos recently and the performance is great - just expensive!


My first thought when I read the news about Heroku's beta pricing changes was that if a huge number of non-paying customers leave then Heroku might provide more resources for less money to paying customers.

As another commenter said, I would also like to see an honest writeup by the author of cost and engineering time trade offs - written a year from now.


Since you folks serve real users in the real world (as opposed to a much higher-QPS product like a consumer-oriented game or an analytics/data service), I can't imagine your needs are extremely high in terms of QPS. Why not optimize the front-end experience instead, like serving one CSS/JS file instead of many, serving assets from CDNs, etc.? Cheers on moving to AWS and using Docker, though!


Interesting way to go about it. But how scalable would this solution be? I would love to hear your opinion on your implementation vs. using the traditional configuration management tools. Which would perform better? My point being that in the future, when you have scaled out to a lot more servers, you shouldn't have to end up moving everything to Chef later on.


I'm curious about what kind of scalability challenges you had in mind?


I loved this article. It was very clear: the reasoning and the high-level step-by-steps were very helpful. +1 on the suggestions to follow up in 3/6/12 months. Also, I'd love to see a more detailed explanation, Chef scripts, etc.


Why did you not use Deis? Seems tailor made for a migration like this!


Docker on a single host is good enough for my development use, but I don't know whether Swarm-like tools are ready for real multi-server production.


Out of curiosity, did you consider Dokku, which approximately reimplements the Heroku API on top of Docker?


Good question - I did take a brief look but it wasn't really solving the main problem we had at the time, which was how to manage a cluster of servers. I feel like emulating the Heroku features is pretty easy to do yourself anyway (and kind of a fun exercise)


Dokku is a single host solution, so no redundancy and no horizontal scaling.


> it’s not possible to SSH into a server to debug a CPU or memory issue.

Not true. You can do all sorts of SSH-like things with 'heroku run bash'. Of course, your changes will be wiped when you end the session, but you can inspect things.


Yes, but you won't see production traffic there since it's an isolated instance. Great for doing one-off tasks with your app, but it doesn't help debug performance issues.



