I Want to Run Stateful Containers, Too (techcrunch.com)
152 points by kevindeasis on Nov 22, 2015 | 99 comments

I think the author is implicitly describing the paradigm shift from software and hardware abstractions to service abstractions. We used to make computers by soldering electronics together, then by assembling cases and components, then buying premade servers, then renting cloud capacity, and now we're starting to rent everything as services.

The cloud is about going from capex to opex and the "capex" now is the initial work needed to define your own architectures and stacks for every project (before you get to work on the actual project, i.e. the differentiating part). Amazon is eliminating most of this by offering building block services that fit together with little hassle.

So the challenge for open source is how to move on to this era of service abstraction. It's no longer enough to just provide an NPM package or a configure script.

I think Docker is in a good position to bring us there, but it's currently stuck at the stateless container level. Something needs to evolve so that launching a scalable and auto-maintainable database cluster along with a connected web application cluster is as easy with Docker as it is by renting a few Amazon services.

>The cloud is about going from capex to opex and the "capex" now is the initial work needed to define your own architectures and stacks for every project (before you get to work on the actual project, i.e. the differentiating part). Amazon is eliminating most of this by offering building block services that fit together with little hassle.

Maybe they are eliminating it for themselves, but if they are eliminating most of that for you, then that contradicts your prior thesis. You aren't defining your own architecture then, you are just conceding to becoming a part of Amazon's architecture. Amazon will lock you into their ecosystem as much as possible. As Amazon's ecosystem fills up with services, there will be little to differentiate your service from any other Amazon hosted/run service.

A much better idea would be to have architectures that don't depend on any specific vendor or company, which are deployable everywhere.

>there will be little to differentiate your service from any other Amazon hosted/run service.

Besides, you know, your product.

Infrastructure can kill your product if it's awful, but it certainly can't "differentiate" it in a good way.

How does OpenStack stack up in this regard? A step in the right direction?


If you want to run your own infrastructure from the ground up, maybe. Ironic (bare-metal deploys) + Magnum (container deploys with Kubernetes) is exciting in that area, but as someone who built and ran an OpenStack cloud, I wouldn't recommend it. For most people, something like ECS/GKE/bare Kubernetes on a cloud provider is a decent step in that direction.

Terrible usability, and mainly aimed at the lower levels of the stack, not the level of actually deploying applications. It is far easier to get a container stack running than to bother with OpenStack.

Personally I think it might make sense to run your stateful apps in openstack VMs and the stateless portion in containers.

Openstack is an equally terrible place to put stateful apps.

What exactly does containerizing everything give you anyway? I really don't get it.

Before: use chef/puppet to manage dependencies and distribute config files; run processes; maybe use something like upstart to restart on failure.

After: use Dockerfiles to manage dependencies (same thing, a bunch of install commands). Now you have a container for the web app, one for another service, etc., so everything is isolated. Great. What do you gain over running 2 separate processes? That's pretty damn isolated too: apart from sharing the same disk, they each have their own virtual memory, state, config files, etc.
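For comparison, here's a hypothetical minimal Dockerfile for the "after" case (base image, package names, and paths are all made up for illustration):

```dockerfile
# Hypothetical sketch: dependencies, config distribution and the run
# command all live in one versioned file that builds to an immutable image.
FROM ubuntu:14.04

# "manage dependencies" - the same bunch of install commands
RUN apt-get update && apt-get install -y python python-pip

# "distribute config files"
COPY app/ /opt/app/
COPY config/app.conf /etc/app.conf

# "run processes" - the container runtime restarts this on failure,
# playing roughly the role upstart did before
CMD ["python", "/opt/app/server.py"]
```

The practical difference from chef/puppet is less about isolation than about when the install commands run: at build time, producing a fixed artifact, rather than at deploy time against a live server.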

I'm not a (modern) ops expert at all, but I know my way around the command line. What do you gain from Docker, or from launching a mongo instance on the cloud instead of just renting a server and launching the process? I really want to know. At least on a small scale, say if you're managing 10-20 servers, I don't see the point.

I see a Docker image as sort of a compiled binary of your app and the environment it runs in. You can install your app and all its dependencies, get all the config files correct, run tests on the resulting image, and then start up any number of instances of it in production.

If any of those steps fails in the middle of some script, you haven't left any server in some halfway-there state.

If a rollback is needed, you can switch back to the previous image, and changed requirements won't trip you up.

We used to do it with zipped chroot environments and some startup/shutdown scripts, Docker is more or less that.

Of course you need to store data outside them, as otherwise you lose once you switch to a newer version of your instance, but that's easy enough.
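Keeping the data outside the container usually means a host-mounted or named volume; a hypothetical docker-compose sketch of the idea (service names, image tags and paths are invented):

```yaml
# Hypothetical docker-compose.yml: the app container is disposable;
# the database's data directory is mounted from the host, so it
# survives image upgrades and rollbacks.
web:
  image: myapp:1.4            # swap the tag to roll forward or back
  ports:
    - "8080:8080"
  links:
    - db
db:
  image: postgres:9.4
  volumes:
    - /srv/pgdata:/var/lib/postgresql/data   # state lives on the host
```

Replacing `myapp:1.4` with `myapp:1.5` (or back again) swaps the application environment without touching the data directory.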

All that said, still not really a fan of it.

Disclaimer: I work at Google on Kubernetes

I'd love to hear more - anything in particular that doesn't feel like a fit?

Nothing specific, they do what they do quite well.

I have a generic feeling that the rabbit hole is getting too deep though.

In our case we run large servers with virtualisation software (I think they run on Windows; I never touch this layer). Then we have virtual Linux servers running on them, with package managers that are also a way to solve this sort of problem. We run Docker instances on those servers; they are Linux again, so they have their own package manager. Then we run Python (with virtualenv or Buildout) or Node (with npm), which again have their own package managers that try to provide isolated environments. And of course they run bytecode in the Python or Node VMs...

And that ridiculous stack is used to run some relatively mundane web app that can't even mutate its data directly, and is used to send some set of Javascript and HTML and JSON to the user's browser. Which is where the app actually runs...

I wish we had some web framework as nice to use as Django but compiling to static binaries that are immediately sort of equivalent to a 12-factor app, and a kernel made for running them directly. Or so.

But this is the internet, right? It literally evolves, just like nature -- we can only build on top of existing layers, not remove a few and start over.

Disclaimer: I work at Google on Kubernetes

The number one feature we see people get out of containers (as we did at Google) is flexibility. Bin packing and (some) isolation are important and valuable, but the vast majority is just being able to throw 25k containers at a system (Borg for internal work, now Kubernetes externally) and say "Go run this somewhere." The system figures it out for you, and you never have to deal with server config. Then if you need to move to a new data center - presto, it happens in seconds.

Even on a small scale there are significant benefits, both in production and in development and test. As the article states, containers are great for moving away from a monolith architecture. Many applications are moving to a distributed, service-oriented architecture where each piece can be deployed separately. That leads to an easier, more maintainable development process. However, the larger number of independently deployable apps adds operational complexity, and trying to manage that complexity, even on a small scale, is no fun. Cue containers.

Containers by themselves are interesting: they let you run many more applications on a single machine. Unlike a VM, there is no extra operating system chewing up RAM and CPU. On machines only able to run 4 VMs reasonably well, I have seen nearly a hundred containers.

That is only a small part. Container platforms such as OpenShift make the process much nicer and more powerful. OpenShift uses Kubernetes under the covers. You arrange your applications inside pods, and the pods are wrapped in a Kubernetes service. The pods are automatically assigned internal IP addresses, which allows you to put, let's say, 20 WildFly servers on the same machine without worrying about IP and port clashes. The service acts as a load balancer and a router directs traffic accordingly. I don't have to worry about any of that; it is all automated. I create a build config and a deployment config for my services (there could be hundreds), then OpenShift (or another container PaaS) maximizes the potential of my hardware.

In development, it is much easier to completely replicate production. Everything is encapsulated in the container image: not just the application, but all of the software and configuration the application requires. If I need to recreate the container image, every step is in the Dockerfile, so I don't need documentation to remind me how everything is set up.

In OpenShift, I grab a template, then click a button; within seconds, the containers of my app are running. That would take much longer with VMs. Containers allow me to develop in a production-like environment because I can easily deploy an unscaled version. What took multiple physical machines to replicate now takes a few seconds to deploy on a laptop.
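The pod/service split described above looks roughly like this as Kubernetes manifests (names, replica counts, and the image are invented for illustration):

```yaml
# Hypothetical manifests: 20 replicas of an app server behind one
# stable service address - no per-instance IP/port bookkeeping.
apiVersion: v1
kind: ReplicationController
metadata:
  name: wildfly
spec:
  replicas: 20               # the scheduler spreads these across the cluster
  template:
    metadata:
      labels:
        app: wildfly
    spec:
      containers:
      - name: wildfly
        image: jboss/wildfly  # assumed image name
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: wildfly
spec:
  selector:
    app: wildfly             # load-balances over every matching pod
  ports:
  - port: 80
    targetPort: 8080
```

Each pod gets its own IP, so all 20 servers can bind port 8080 without clashing, and the service gives clients one stable name regardless of where the pods land.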

The point is better hardware utilization. Part of the "isolation" that containers provide is resource allocation. You assign limits to the CPU cycles, RAM, disk IO, network IO etc that each container can use.

Without any kind of virtualization, i.e. processes in an OS running on bare metal, processes can starve each other of resources - hog the CPU, fill up the disk etc. Virtual machines like KVM and Xen limit the resource usage of guest operating systems, but that comes with a lot of overhead. Each VM has to run its own kernel and a bunch of user-space programs in addition to the service you're trying to provide. The disk images have to include a full operating system as well - config files, libraries, executables and so on.

With containers you get the best of both worlds: the resource limits of virtual machines with the low overhead of processes sharing an OS. That means you can run multiple services per machine without them interfering with one another, and you can pack more of them onto each machine than if you were using virtual machines.
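Concretely, those limits ride on cgroups, and with Docker they're just flags on `docker run` (the values and names here are arbitrary examples):

```shell
# Hypothetical invocation: cap this container at 512 MB of RAM and
# half the default CPU weight, so a runaway process can't starve
# its neighbours on the same machine.
docker run -d \
  --memory=512m \
  --cpu-shares=512 \
  --name api \
  myapp:latest
```

`--cpu-shares` is a relative weight (default 1024), so it only bites under contention; `--memory` is a hard cap enforced by the kernel.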

So if you're Google (which did the initial kernel work to support containers in Linux) you can get a lot more computation out of your 100K-server data centre if you package all your software as containers and write sophisticated software to distribute them to servers and shuffle them around as load fluctuates and batch jobs get run.

For those of us that don't run our own data centres the benefit is much less clear. Running containers on top of virtual machines seems like a pretty terrible idea. With all that overhead, performance suffers and you end up running more containers to compensate, which just makes everything more complicated and more expensive.

I think the current container craze boils down to two things: first, no matter what ugly, undocumented, unspecified process you use to build a container image, once you have the image you can deploy it in a repeatable, consistent way. That's a big improvement over Chef/Puppet right there. Second, fashion. Doing ops the Google way is cool, and it'll make my resume look good even if I'm not currently in a position to reap the benefits that Google does. Being a Puppet expert is so passé.

Disclaimer: I work at Google on Kubernetes

Portability, as you mentioned, is huge, but beyond that there are also bin-packing benefits. You can basically cut your serving costs by 50% or more by using lots of single-process containers instead of poorly utilized VMs.
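A toy sketch of the bin-packing argument, with made-up workload sizes: six workloads that would occupy six same-sized VMs can share three machines when packed as containers.

```python
def first_fit_decreasing(workloads, machine_capacity):
    """Greedy bin packing: place each workload (largest first) into
    the first machine with room, opening a new machine if none fits."""
    machines = []
    for w in sorted(workloads, reverse=True):
        for m in machines:
            if sum(m) + w <= machine_capacity:
                m.append(w)
                break
        else:
            machines.append([w])
    return machines

# One-VM-per-workload would use 6 machines of capacity 4;
# packing the same workloads as containers uses 3.
workloads = [1, 2, 1, 3, 2, 1]
print(len(first_fit_decreasing(workloads, 4)))  # -> 3
```

Real schedulers like Borg or Kubernetes are far more sophisticated (multi-dimensional resources, constraints, preemption), but the utilization win comes from the same basic idea.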

Yeah. This is what I meant by "better hardware utilization".

I agree with the author that writing your own snowflake PaaS is a mistake. I work at Pivotal on the fringes of CloudFoundry; OpenShift Origin is a competing system. Either way, you should be using a full PaaS instead of rolling your own.

But this is the bit that surprises me:

    They get you hooked for free and the next level 
    is $1,496 per month… wtf! MongoLabs is little 

$1500 per month, versus weeks of engineering time spent tinkering with and upgrading and bug-fixing and trouble-shooting and security-patching a hand-made solution, is a fantastic bargain.

Supposing the author was running on Pivotal Web Services. It would've taken less time to add a MongoLab service (about 2 minutes, 3 if you include a re-stage) than to perform the CloudFormation calculation (say, half an hour, resulting in no running software).

PaaSes like Heroku, Cloud Foundry and OpenShift are feature-complete for the cases that application engineers and operators care about. If you roll your own you're directing effort to something that doesn't provide user value.

If I walked in on someone at a regular dev company rolling their own operating system for a web service, I'd be surprised. Writing their own programming language? I'd be skeptical. Oh you built a new HTTP server? Why? Outside of a research environment, why are you doing that?

And so it is with PaaSes. The marker was passed years ago. We don't need to go on these spiritual quests any more.

Disclaimer: I work for Pivotal, which donates the majority of engineering effort on Cloud Foundry. I'm actually in Pivotal Labs, the agile consulting division, which is where I morphed into a lay preacher for just-using-a-PaaS-dammit.

> $1500 per month, versus weeks of engineering time spent tinkering with and upgrading and bug-fixing and trouble-shooting and security-patching a hand-made solution is a fantastic bargain.

That's only true if you look at it from the point of view of an enterprise, a startup with VC funding, or generally a "rich" company. If you're a cash-strapped startup/small business with only a few thousand dollars profit per month, $1500/mo is a huge deal.

This is especially true when you consider what that same $1500/mo will buy you using something like DynamoDB or Aurora. Both those solutions will give you more storage, are managed for you and will mostly scale up with you, meaning you don't have to start anywhere near $1500/mo.

I know the article said tying yourself to Amazon feels wrong. But focusing your time and energy on managing infrastructure that could be managed by Amazon instead of focusing your time and energy on your product feels more wrong to me. There are exactly zero startups that have succeeded because they had a more reliable and performant MongoDB installation.

A big +1 for DynamoDB + S3 for a cash strapped startup.

If you are writing an app from scratch, and the cost of data services is a big concern, then don't pick a data store with complex and expensive replication properties.

DynamoDB is admittedly harder to understand and use than Postgres or Mongo, but once you figure it out, it's an HTTP data API with no setup or maintenance costs.

$8/mo of DynamoDB can easily cover your users and other CRUD.

> If you're a cash-strapped startup/small business with only a few thousand dollars profit per month, $1500/mo is a huge deal

If it takes 30+ hours of engineering time per month (less than one work week), then you've hit your break-even point. This is assuming the engineer is $50/hour, including overheads.
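The arithmetic, spelled out (the $1500/mo and $50/hr figures are the thread's examples, not real quotes):

```python
def break_even_hours(service_cost_per_month, engineer_cost_per_hour):
    """Monthly hours of self-hosting work at which the managed
    service becomes the cheaper option."""
    return service_cost_per_month / engineer_cost_per_hour

# $1500/mo managed service vs. a $50/hr (loaded cost) engineer:
print(break_even_hours(1500, 50))  # -> 30.0 hours/month
```

Anything beyond 30 hours a month of tinkering, patching, and firefighting and the managed service is already cheaper, before counting opportunity cost.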

Would you even need to worry about auto-scaling Mongo yourself at that stage of the company, though? You'd need quite a bit of traffic to need it. And if you do, hopefully you've generated the $ needed to support it.

Engineering time isn't free, even for cash-strapped startups.

For a startup, your hyperfocus has to be on creating a product that is attractive to users. Everything else is entirely secondary.

Yes, I think devs sometimes forget that this is the bleeding-est of bleeding edges. Hell, I can't even get a normal Digital Ocean server instance to programmatically drop and recreate, doing minimal things like creating a user and setting SSH keys. I suppose if I wanted to learn something heavyweight like Chef or Puppet I could, but for now I just maintain a list of commands to run, and understand that Digital Ocean servers can't quite be treated like livestock just yet, though I can fake it pretty well. Eventually Digital Ocean will get around to making the platform changes, which are all already in feature requests.

Engineering time is very limited and the number of support cases is exponentially large. The author wanted MongoDB supported in the hard case. Well, that means you also need to support 4 or 5 other, similar databases the same way.

If you want a bleeding-edge PaaS with corporate support, you should expect to pay corporate prices. If the project / organization doesn't allow for that, then either the requirements or the mindset is off-kilter.

I think what the author is really mad about is the vendor lock-in. That used to bother me too, until I realized that learning new platforms is done on my employer's dime and not mine.

A few more years, and some other challenge will be on the bleeding edge and this one will be mostly solved, meaning that open-source community solutions will exist that you can build once and forget about it. Knowing most devs, they'll spend all their time complaining about whatever the new edge is instead of marveling at how much easier things are now compared to the 'bad old days' of 2015.

> Hell, I can't even get a normal Digital Ocean server instance to programmatically drop and recreate, doing minimal things like creating a user and setting SSH keys. I suppose if I wanted to learn something heavyweight like Chef or Puppet I could, but for now I just maintain a list of commands to run, and understand that Digital Ocean servers can't quite be treated like livestock just yet, though I can fake it pretty well. Eventually Digital Ocean will get around to making the platform changes, which are all already in feature requests.

It sounds like you might benefit from Cloud-config for droplets: https://www.digitalocean.com/community/tutorials/an-introduc...

Or have you tried that already? It's also available through their API, BTW.
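For reference, a minimal cloud-config user-data sketch for exactly the create-a-user-and-set-keys case (the user name and key below are placeholders):

```yaml
#cloud-config
# Hypothetical user-data: passed at droplet creation time, so the
# server comes up with the account and SSH key already in place.
users:
  - name: deploy                       # placeholder user name
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAA... deploy@example # placeholder public key
```

Because it runs on first boot, dropping and recreating a droplet with the same user-data gives you a reproducible baseline without Chef or Puppet.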

> And so it is with PaaSes. The marker was passed years ago. We don't need to go on these spiritual quests any more.

The CloudFoundry installation instructions alone make me run away screaming. Same with OpenStack.

There's no "spiritual quest" involved, just an unwillingness to add unnecessary complexity. Of course it looks different when you've already eaten the cost of getting it up and running and/or have readily available in-house expertise.

> The CloudFoundry installation instructions alone make me run away screaming.

BOSH is getting some much-needed love right now for this reason. Incidentally, for long-running stateful services, BOSH is the right tool for the job. We use it for Cloud Foundry, MySQL and a bunch of others I don't recall right now.

Anyhow. If you just want to see quickly if CF fits your needs, the place to go is Lattice[1], explicitly designed to be installable on a laptop.

In the meantime, you can let Pivotal host you on Pivotal Web Services[2].

That plucky startup from Armonk, IBM, are the second-largest donor of engineering effort to Cloud Foundry. Their public installation is BlueMix[3].

Other hosted installations of Cloud Foundry are CenturyLink AppFog[4], anynines[5] and Swisscom Application Cloud[6].

When you decide you want something in-house, you can buy Pivotal Cloud Foundry with all the support bells and whistles. Or you can just use the standard cf-release repo and deploy it to your own OpenStack or vSphere environment. Or on AWS. Or on Azure. Or some mixture of the above.

[1] http://lattice.cf

[2] https://run.pivotal.io

[3] https://console.ng.bluemix.net/

[4] https://www.ctl.io/appfog/

[5] http://www.anynines.com/

[6] https://www.swisscom.ch/en/business/enterprise/offer/cloud-d...

Edit: I'd be interested in hearing why people object to this comment.

> Anyhow. If you just want to see quickly if CF fits your needs, the place to go is Lattice[1], explicitly designed to be installable on a laptop.

The issue is not whether or not CF can meet my needs, but that it is crazily complex. That it is complex enough that there's a need for a separate setup "explicitly designed to be installable on a laptop" is to me an admission that CF is flawed from the outset. I don't care if it's installable on a laptop, incidentally; I do all my development, testing and experimentation on setups that match (in architecture anyway) the environments I'll deploy to. Exactly because I refuse to follow a setup built to be easy to get up on a dev machine only to find out that the full production deployment ends up being totally different (a lot of software fails here).

> In the meantime, you can let Pivotal host you on Pivotal Web Services[2].

No, I can't. It's not remotely cost effective for the types of things I do. E.g. for one client I currently part-time manage a private cloud environment with about 150 containers. Firstly, none of Pivotal's options offer enough CPU or memory for us to be able to run it on 150 of the 2GB container options (it'd be more like 500+ at that size), but even if 150 were enough, the estimated cost would be substantially more than what it costs to lease racks + power + bandwidth + leasing costs on all the hardware + the cost of hardware upgrades/maintenance and the cost of server admin. With more realistic container counts, it'd bankrupt the company in question.

> and deploy it to you own OpenStack or vSphere environment. Or on AWS. Or on Azure. Or some mixture of the above.

The complexity and/or cost of these options makes CF massively unattractive for me. I'm just in the process of cutting a client's hosting costs by about 80% by ditching AWS for a setup on managed servers, for example, and the full deployment of all the infrastructure we need is substantially simpler than just deploying a "bare" OpenStack deployment, and certainly nothing like putting CF on top. They can afford an awful lot of my time to help implement additional functionality for what they save by that move.

"That it is complex enough that there's a need for a separate setup "explicitly designed to be installable on a laptop" is to me an admission that CF is flawed from the outset."

I don't get this. The setup for a Chef or Puppet managed 1000 node setup is quite different than it is for your laptop. Is it somehow fundamentally flawed?

"Exactly because I refuse to follow a setup built to be easy to get up on a dev machine only to find out that the full production deployment ends up being totally different (a lot of software fails here)."

I agree. That's actually not the case in CF / Lattice land: the install is actually bit-identical; Lattice just removes stuff you may not need but a big organization might (multi-tenant security, access/identity management, infrastructure orchestration, etc.). Lattice is pretty simple to get up and running with Terraform or Vagrant. The actual pieces aren't particularly complicated relative to other container schedulers like K8s or Mesos+Marathon.

"I'm just in the process of cutting a client's hosting costs by about 80% by ditching AWS for a setup on managed servers, for example, and the full deployment of all the infrastructure we need is"

Right, I don't think a platform cloud like CF or even Kubernetes is appropriate to your client base's needs. My issue was with the concept that something like CF is "fundamentally flawed" because it doesn't meet your niche use case. It fills a different niche that presumes an infrastructure cloud to begin with. Most companies are moving OFF of managed services onto clouds, so it's a reasonable assumption. But there will always be a market to cost cut with a dedicated setup, where you play, and that's great. The market's big enough for everyone to get what they want.

> I'm just in the process of cutting a client's hosting costs by about 80% by ditching AWS for a setup on managed servers...

And what are the additional long-term development/sysadmin costs of having to manage those servers and infrastructure?

And how did the client factor in the risk of being tied to you and your knowledge of this particular managed server setup?

And how did the client factor in the risk of being tied to a particular vendor's managed servers?

It may very well have still been the right decision for the client to move from cloud to managed servers.

However, my point is that if you solely base the cloud vs. managed decision on hosting costs, you are ignoring other real long-term costs and risks that are not as readily apparent, but can still have a significant impact in terms of upgrades, maintainability, downtime, availability, changes in personnel, etc.

Every time I've moved people off AWS, there's been an immediate, substantial and ongoing drop in development and sysadmin costs, as we have far more flexibility in setting up an environment that actually fits that customer, and we can make guarantees about the system that make for a less complex system overall.

As for the risk of being "tied to me": that risk is far greater with AWS, where the number of "moving wheels" is vastly greater. If I were thinking primarily of staying busy, I could easily charge far more for my AWS work than for the more "mundane" setups, so from that point of view it'd be in my interest to stick everything on AWS. The market rate and availability for "regular" ops guys who can handle these setups is far better, at least here in London.

In terms of being tied to a particular vendor's managed servers: they're not. Getting out of lock-in is part of the appeal of getting off AWS once they first make that decision (and to be clear, if they insist on deploying on AWS, I do that too; ultimately it needs to be their decision). The setups I do for clients typically only require a method for bootstrapping a server to a pre-agreed basic Linux setup (usually CoreOS these days) and bringing up ssh with a key. Everything beyond that is generally easy to make provider-independent. We do this exactly so that they can mix and match, sometimes at the same time (e.g. I have one client that is currently running systems at AWS, Google and a managed provider simultaneously, deployed off the same little setup).

They do get dependencies on certain performance characteristics etc., but typically those can easily be met at dozens of providers.

> However, my point is that if you solely base the cloud vs. managed decision on hosting costs, you are ignoring other real long-term costs and risks that are not as readily apparent, but can still have a significant impact in terms of upgrades, maintainability, downtime, availability, changes in personnel, etc.

I agree you have to consider these too, but generally they tilt the numbers further in favour of managed servers or co-located servers.

If we were comparing with "old style" manually deploying applications straight to bare metal, I'd agree. But the more realistic comparison is to deploy to containers or VMs running on a thin OS layer on bare metal or a hypervisor. The point is not to get rid of the cloud abstractions, but to get the flexibility of being able to pierce the clouds so to speak and make decisions about the lower layers to keep cost and complexity under control.

The main reason I see for AWS growth is that in my experience most of the tech guys pushing for it have no visibility into costs and budgets. Even lots of middle managers don't. And upper management rarely understands the technology tradeoffs.

I've been in the bizarre situation in the past of having people ask what I could possibly need salary statistics for my development team for, because it didn't cross anyone's mind to actually pull out cost data to determine how much money we were actually allocating to ops vs. feature requests vs. bug fixing, or to determine metrics for, e.g., when it makes sense to simply buy more server capacity vs. spending developer time to optimize.

The biggest problem with this is that people look at "cloud" as "I get to delete this capex line" without understanding the new opex that comes with it, and without understanding how little of the ops time a well-run team actually spends on the small bits of low-level hardware stuff that are unique to managing your own hardware vs. renting VMs.

Most people simply don't understand the cost of the technology choices they are making.

Being able to close the gap for people there is a very valuable service (and here's career advice to anyone making the leap into technology management: Be that guy that can actually price out the technology solutions your team proposes and that can digest and make the technology costs understandable for the business guys - it makes you surprisingly rare)

There are certainly people for whom AWS is the right choice. E.g. if you can make extensive use of spot pricing and bring up huge numbers of instances for short periods, then it can be hard to beat. But most people just don't have those usage patterns.

We're just not going to agree on this. I hear you saying "I can roll my own for less", and in my head I'm seeing all the snowflake PaaSes I've seen in the past few years while working for Labs.

It looks good now.

Then you move on and someone else is stranded with a completely custom environment that only does the things you thought of at first.

Cloud Foundry isn't complex for a developer. That's the point. Deployment involves typing `cf push` and waiting a minute or two.

But the essential complexity of managing very large herds of heterogeneous applications and services doesn't vanish simply because you aren't using them right now. It's always latent. We and other companies working on Cloud Foundry have already fixed problems you have never thought of.

Who said anything about rolling my own? I do nothing of the sort. Most setups don't need one. Most need a tiny selection of functionality that is easily served by selecting from an array of well-tested, well-understood components that need little enough glue that a single person can grasp it in a day or two. We know this, because we've gone through the exercise of bringing in people from the outside to walk through more than one system to validate the setups for various customers.

I understand your concern about maintainability, and that's exactly where I come at it from. I've had developers on my teams in the past who would swear at me (one tried to have me fired) because I made technology choices they found boring and un-sexy, because I actually paid attention to our long-term costs, such as the ease of finding and hiring replacements. I care deeply about that. And what I'm consistently seeing is that this definitively does not speak in favour of pulling in large, complicated systems like OpenStack or CloudFoundry unless you're working on very large, complicated systems that actually need all the functionality they offer, and where you can justify a team learning how it all hangs together.

Typically my requirement from the outset is that a mid level devops guy should be able to take over with minimal training for the simple reason that I have much more lucrative work to do than ongoing maintenance, so I'm 100% dependent on setting up systems that can be quickly and effectively managed by someone much cheaper than myself and that is part of the appeal.

As for not being complex "for a developer": I disagree. Automating a build pipeline is trivial compared to getting the application architecture right, and that is the area where dev teams often fall down. I see all too often that they fall down because they don't realise what it is they've signed up for and assume everything will be taken care of by magical ops fairies, because they have way too little visibility into what is actually happening when they deploy something.

> But the essential complexity of managing very large herds of heterogenous applications and services

Most companies never end up managing "large herds of heterogenous applications and services".

And this may be the fundamental disconnect.

If you are dealing with "large herds", then by all means pick a comprehensive platform, as presumably you have the resources to manage it properly too. But the type of businesses who do are relatively speaking few and far between. I'm sure you see lots of them because of what you do, and I'm not saying there aren't good use cases for Cloud Foundry and similar for certain types of businesses.

What I am saying is that most companies don't ever even get to the size where they can afford that complexity, much less to the size where they need it.

It reads as an extended advertisement for your company's services, and your comment doesn't really respond to the statement of the person you replied to. He said Cloud Foundry's installation instructions were bad; you replied as if he said he was unsure whether CF "fit his needs" and was looking for more ways to try it out, which is a pretty salesy response. I think mostly it's a question of tone and length. If you had a couple of sentences acknowledging that the installation instructions sucked and saying he should go with BOSH, you probably wouldn't have gotten downvoted.

Thanks. I was too emotionally invested.

I actually work for Pivotal Labs, not Pivotal Cloud Foundry (though as of this week I'm on secondment to a PCF-related team).

As a consulting engineer I get to see a lot of projects in a lot of companies. My eagerness for PaaSes comes from seeing a variety of approaches. Just using a PaaS makes large, expensive, disruptive discussions simply disappear.

Those who go through my history will note that I take pains to mention other PaaSes, usually Heroku and OpenShift. When I talk about hosted Cloud Foundry I usually namecheck ours and IBM's.

I am working on Convox, an open source "PaaS" that installs in your AWS account in minutes.

There is no additional layer of complexity. It configures AWS coherently with a few cloud formation templates.

I'd love to hear your opinion on this type of installer.


It's nice if you are OK with being tied into AWS, but I generally recommend my clients stay away from AWS because of the massive cost premium. In the cases where they've ignored that, I've generally been able to bill thousands to move them off AWS later, when they've realised how much they're spending.

I certainly would never choose to build on top of another layer that's AWS dependent - if I'm first going to layer something on top of AWS, I'd use it as an opportunity to reduce AWS dependency for the near-inevitable moment when I'm asked how to move somewhere cheaper.

On what planet is it worth paying $20,000 per month for a vm??

The author is correct. We should be running stateful containers, but until the software is ready, the default advice quite rightly should be not to unless you understand the implications.

For example, we run Ceph and use it in Kubernetes. We obviously only run replica sets (the Kubernetes feature, for those uninitiated) of 1 for these services. There are also some rough edges with locking for dead nodes, but it's definitely good enough for running the likes of Jira or GitLab, which have single points of failure pretty much whichever way you scale them (unless you build an HA NFS server first).

Before that, we were running test clusters with just fleet, using custom bash and etcd for service discovery. Works fine, but you're probably better off with frameworks.

Now, where things can go horribly wrong is if you decide to use a clustered database and run it in Kubernetes, thinking it's fine: since you can't mount external storage for replicas, you have 3 nodes over 3 zones and you have backups. But Kubernetes will quite happily schedule all of your replicas on the same host, meaning a single reboot and you're dead. There are hacks around this, such as defining a hostPort, but of course if someone updates the RC you could happily be restoring from backup. You could run multiple replication controllers to get around this, and that'll allow you to mount storage again, but that takes away a lot of the elegance of the framework.

Point is, you can do it, but you'll need a bunch of ugliness and you'll need to be careful. If that doesn't sound great, wait until the software properly supports it. :)

There are still a whole lot of pieces of the puzzle missing.

I think it's all about creating ready-to-use containers, running on different hosts, which can replicate their data between themselves and handle failover (they need to connect to etcd or ZooKeeper or Consul for that).

For PostgreSQL this seems to have already been solved (?) twice, by different people:

https://github.com/zalando/patroni https://flynn.io/

But we don't only need it for PostgreSQL.

You need ready to use data-replicating Docker images for storing/indexing log files.

And for statistics.

Probably ready to use containers for replicated/clustered Redis.

We need something for a replicated S3-like service for storing static files.

Which could be used for different things. For example, say you are running a website with WordPress: it can be configured to put its files in a 'CDN', so whenever you deploy WordPress you point a caching reverse proxy (like Varnish) at your S3-like service.

An S3-like service could also be used to hold your own Docker images. You could put a Docker registry in front of it to push and pull to.

We need something, maybe also an S3-type service, for storing the files of your local git repositories (which hold the sources of your Docker images).

As far as I know in the ecosystem there is still a scheduler missing which can deploy these containers on the right hosts.

And there is also no standard API for starting new containers or whole machines when the other containers (or some monitoring tool) in a cluster noticed one is missing.

These things take time. Lots of time. :-(

Flynn co-founder here. We plan to expand to support many more popular open source databases next year (and expose a framework for others to do the same). The goal is that a supported database should "just work" on the platform and be sanely configured out of the box with high availability, automatic failover/recovery, encrypted streaming backups, app credential provisioning, etc.

Our Postgres appliance is unique in that it is explicitly designed to not lose data in the face of failure. Simply wrapping up a database in a container with leader election and replication is not enough, as there are many pitfalls that can cause data to be lost during failures. Of course we are limited by the guarantees the database can provide, so some datastore appliances we build may have clearly defined caveats.

Storing binary blobs is another tricky matter, and we're exploring what we can do with little to no configuration. We currently store app images and git repos in Postgres, which works but will only scale so far.

We are also improving our scheduler so that it is capable of providing the constraints necessary to place stateful services properly.

I think your ideas are sound, but last time I tried the manual install I couldn't get it running. So that was disappointing. That was many months ago; a lot might have changed by now. I'll need to find time to have another look.

Anyway keep up the good work.

Can anyone explain why Docker image handling is so terrible? For one, building an image creates lots of unnecessary overlays (resulting in the absurd need for "garbage collection") rather than just compacting everything into one file. But also the whole registry thing. Why can't I just point Docker at an S3 bucket? Sure, you can run a private registry, but that involves (last I checked) running a daemon, Redis, and using "docker login" to get credentials set up... instead of, you know, a file path (like an NFS volume), an HTTP URL, or an S3 bucket. The registry seems like a nice thing for publishing public images, but it has nothing to do with private images.

1. The idea behind the overlays is to "cache" and reuse as much as you can between images. If you have a long list of packages you install before you copy your app source to the container, you don't have to install the packages again and again every time you do a change to the source and rebuild.
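As a sketch, a Dockerfile ordered to exploit that layer cache (the base image, packages, and paths here are just placeholders):

```dockerfile
FROM debian:8

# This layer is cached: it only rebuilds when the package list changes
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev

# Source changes only invalidate the layers from here down,
# so routine rebuilds skip the slow package install above
COPY . /app
WORKDIR /app
CMD ["./run.sh"]
```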

2. You can point Docker to a file - use 'docker save' and 'docker load'.
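As an illustrative sketch (image and bucket names are made up), that lets you ship an image through any file store, S3 included, with no registry involved:

```shell
# Export an image, with all its layers, to a single tarball
docker save -o myapp.tar myapp:latest

# Ship it anywhere - an NFS path, an HTTP server, or an S3 bucket
aws s3 cp myapp.tar s3://my-bucket/images/myapp.tar

# On the target host, fetch it back and load it
aws s3 cp s3://my-bucket/images/myapp.tar .
docker load -i myapp.tar
```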

Take a look at the new DaemonSet abstraction in Kubernetes. Great for running things like HDFS and other distributed data workloads/frameworks: https://github.com/kubernetes/kubernetes/blob/release-1.1/do...

Looks like Kubernetes is improving. Hadn't seen this yet.

Sometimes it gets hard to follow everything in the Docker ecosystem.

Thanks, I wasn't aware of that.

Disclaimer: I work at Google on Kubernetes

Yep - and the ability to run just one per node is coming "soon" (ideally by next milestone). It's something we wanted to get in, but just didn't have the time by 1.1.

My #1 recommendation is to do whatever you do today. If you mount an NFS mount into your VM to store your MySQL data, that's what you should do again. The cost of restarting a pod is very low - but you do need to do the (small) hack today if you want to be absolutely sure you don't have multiple pods on the same node.

To be clear, the ugliness is one line in your config file:

hostPort: <some arbitrary port between 1025 and 65535>
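Sketched in a hypothetical pod template (names and port numbers are placeholders), the fixed host port means no node can run two replicas:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo
spec:
  replicas: 3
  selector:
    app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo:3.0
        ports:
        - containerPort: 27017
          hostPort: 27017  # only one pod per node can claim this host port
```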

Does kubernetes not have a mechanism for defining host/rack diversity constraints?

The scheduler is just best effort at the moment there's some ways around it.

You could indeed create separate replication controllers against labels, such as your AWS zone, and run them separately. This flies in the face of having replication controllers and sucks, in my opinion.

The other option is a single replication controller where your containers use a resource that can't be reused, thus ensuring you're scheduled elsewhere. This is hacky.

I suspect it'll be another year before kubernetes is all singing and dancing, but I do like using it.

I know that OpenShift 3, built on kubernetes, supports both affinity and anti-affinity policies (https://access.redhat.com/documentation/en/openshift-enterpr...). I don't know if these are part of kubernetes or part of the extensions custom to OpenShift.

All of those scheduling policies are also available in native Kubernetes. The Kubernetes version of that documentation is here: https://github.com/kubernetes/kubernetes/blob/master/docs/de...

I'm not sure of the current status, but I believe the mechanism would be to label nodes with rack information. You'd then be able to specify anti-affinity constraints based on those labels that would be taken into account by the scheduler.
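As a sketch of the label mechanism: after tagging nodes with something like `kubectl label nodes node-1 rack=a`, a node selector pins a pod to a rack (label keys/values and names here are hypothetical; this gives affinity, so spreading replicas across racks means one spec per rack until real anti-affinity lands):

```yaml
# Pod spec fragment: schedule only onto nodes labeled rack=a
apiVersion: v1
kind: Pod
metadata:
  name: pg-replica-a
spec:
  nodeSelector:
    rack: a
  containers:
  - name: postgres
    image: postgres:9.4
```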

Edit: browsing issues briefly, I don't think this functionality is available just yet. Discussion of node selectors: https://github.com/kubernetes/kubernetes/issues/341#issuecom... -

Why not run good old classic bare-metal servers (or in VMs, if you like to decouple hardware from software) with the services being configured either by hand (using good documentation, something that is a lost art these days) or with a system like puppet?

The article, unfortunately, stinks of overengineering and overcomplexity for setting up a SIMPLE FUCKING SERVER ENVIRONMENT.

Or just stick to plain old Apache/Lighttpd with PHP and a standard MySQL/pgSQL database?

Don't reinvent the wheel just because it's "cool".

edit: also, I don't get why anyone would spend literally weeks reading docs, learning DSLs etc. just to get a deployment done. My personal maximum time for getting a "hello world" is two hours, if it's radically new a day. If the docs of the project are insufficient (or the quality), DO NOT RELEASE IT. Don't let your users do your work for you.

So I fought with this over the last two years, and went from running Postgres in a Docker container, to Amazon RDS after getting frustrated with maintaining Docker volumes, and now back to using Docker volumes via Kubernetes' volume attachment.

I think Kubernetes has done a great job at tackling this, at least as a first pass. Right now, I can attach an EBS volume as my Postgres data store, and not worry about where the container is running, since Kubernetes handles mounting the volume. Presumably, I can run an NFS server and have it use that instead of an EBS volume.
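A sketch of what that looks like in a pod spec (the volume ID, image, and names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:9.4
    volumeMounts:
    - name: pgdata
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: pgdata
    awsElasticBlockStore:
      volumeID: vol-0123456789abcdef  # pre-created EBS volume
      fsType: ext4
```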

Now, I can run backups and slaves however I like. It's not as easy as RDS, but I have more control now, and it's marginally cheaper.

Anyway, I see where the author is coming from, and we're not totally there yet, but the problem is being solved.

> Back to my point (I think I have one).

Customers rarely understand the feature set and deliverables they need for a given use case when there is a major shift in the way the system delivers that use case. I've been working with a very large company's operations team on a containerized PoC of a specific developer team's use case, which builds and tests their SaaS software stack. The ops team can't wrap their heads around how the software needs to be changed to enable their move to a containerized solution. They just thought they could "containerize" it for the developers and move on. Guess we should have been talking to the devs instead.

One point here about all this is that operations teams and developer teams must work closely together on objectives but their end goals are in direct conflict when it comes to providing infrastructure for software. Ops doesn't want the infrastructure to change much because it becomes difficult to scale and prevents reliable, repeatable root cause analysis when things break. Nobody wants to wake up in the night and troubleshoot stuff one doesn't understand when there are bears at the door.

Developers want the infrastructure to be flexible and support doing crazy shit with their software so they can satisfy customer's demands with use cases and requirements, and do it faster than the competitor does. The desire for sales and growth drives the need for it to remain extraordinarily reliable and scalable.

Immutability of the infrastructure provides a means by which both devs and ops folks can come together and achieve common goals, while keeping their objectives met: moving fast (devs) and keeping things reliable (ops).

Wanting stateful containers as a feature is simply a misunderstanding of what is needed from the infrastructure based on the developer's standpoint of needing "reliability" from both ops and devs. Reliability of the underlying infrastructure is brought about by making the container's deployments immutable. Reliability of the software is brought about by keeping state for a given configuration once it has been proven to work through many iterations of a given use-case.

How much of this could be avoided if the application didn't use Mongo? Needing to run a three-node cluster out of the gate seems like a big part of the problem. Sure, you want backups and redundancy for any database, but there are situations where a MySQL or pg slave that can be switched on makes more sense financially, especially if load doesn't require a three-node cluster.

That was my first thought, too. If the author didn't insist on using mongo, he could have opted for an inexpensive RDS setup.

=== Shameful Plug ===

If you're interested in running MySQL inside Kubernetes, check out Vitess:


We still have work to do, but we're not stopping at the easy part. We show you how to do replication, sharding, and even live re-sharding of MySQL inside Kubernetes. We're working on integrating with Outbrain's Orchestrator for automated failover, and our VTGate query routing service means those failovers will be transparent to your app.

We are admittedly running into the same limits in Kubernetes around using replication controllers for datastores. But Kubernetes is improving very quickly, and there's a reason for that. They have a cheatsheet of one way all of this can come together successfully in the form of Borg:


At YouTube, we run our main MySQL databases (the ones with tables for users, videos, views, etc) inside containers with local storage. Of course, Borg has a much more mature scheduler, which gives stronger safety guarantees for replicas. My point is that we've proven this approach can work at scale for datastores in a container cluster, and through Vitess we're trying to bring the same capabilities to Kubernetes.

I believe part of what the author is looking for is currently being pushed by the Deis guys with Helm[1]. Think of it as a package manager for creating Kubernetes configurations. I think we'll see some stable and reusable templates for both stateful and stateless services there.

[1] https://helm.sh/

Disclaimer: I work at Google on Kubernetes

We _LOVE_ what Helm is doing, and are actively helping. Please dive in! :)

Not sure where those estimated costs for running MongoDB on AWS came from. It jumps from a single t2.micro instance (which you can get for free) straight to 3x m3.2xlarge instances at $1500/month. That's a pretty big jump. There are at least 6 instance types between the two. Like 3x m3.medium instances with 500GB gp2 EBS volumes would cost $300/month. That's the on-demand pricing, you could probably save some more with reservations given the stateful nature of MongoDB.

The chart comes from this AWS-provided document, actually: https://s3.amazonaws.com/quickstart-reference/mongodb/latest...

Not sure why the author is complaining about VMs and the cloud.

You can get real machines through APIs: not from Amazon, but from other providers like Rackspace and IBM/SoftLayer (and others). He even links to Bryan Cantrill, so I'm pretty sure Joyent can deliver containers (even Docker?) on bare metal if you want them.

Right now VMs are the better choice for stateful use cases like running a conventional database that expects a real filesystem. In a few years I think containers and container schedulers will get good at doing persistent volumes.

In the meantime, I think a less known but great solution for reliably creating VMs is open source BOSH (http://bosh.io/), an excellent tool that will allow you to deploy VMs on most major IaaS platforms including AWS, Azure, OpenStack, vSphere, and others.

BOSH is hard to learn and has a different philosophy than typical configuration management tools such as Chef/Puppet/Ansible etc., but it is totally worth it: once you have it, you have an amazing power tool at your disposal.

There are a lot of BOSH releases for popular tools on github.com; for example, here is one for MongoDB: https://github.com/Altoros/mongo-bosh

One huge challenge in solving stateful containers is staying agnostic to the orchestration tool (Docker Swarm, Kubernetes, Mesosphere, etc.), all of which have different opinions about how clusters of containers should be orchestrated, while also accounting for variety in what hosting a cluster of stateful containers means to the user.

I work at ClusterHQ, our team believes the tools we are building like Flocker are going to get the community there.

It's pluggable into both orchestration tools and has a model for creating backend plugins.

Storage backend provider plugins that work with Flocker. http://doc-dev.clusterhq.com/config/configuring-nodes-storag...

If you're using AWS you can use a pre-task (i.e. a fleet unit file) to run something like https://github.com/leg100/docker-ebs-attach to attach a volume before running your container. You can also do this with Flocker (https://docs.clusterhq.com/en/1.7.2/config/aws-configuration...) if you want something fancier.
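A rough sketch of that pre-task pattern as a fleet/systemd unit, using the plain AWS CLI rather than docker-ebs-attach (the volume/instance IDs, device name, and mount point are all placeholders):

```ini
[Unit]
Description=PostgreSQL backed by an EBS volume
After=docker.service
Requires=docker.service

[Service]
# Pre-tasks: attach the EBS volume to this instance, then mount it
ExecStartPre=/usr/bin/aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef \
    --instance-id i-0abcdef1234567890 \
    --device /dev/xvdf
ExecStartPre=/usr/bin/mount /dev/xvdf /mnt/pgdata
ExecStart=/usr/bin/docker run --rm --name postgres \
    -v /mnt/pgdata:/var/lib/postgresql/data postgres:9.4
ExecStop=/usr/bin/docker stop postgres
```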

Still, I'd rather have AWS manage the data. Unless you're a really biggie-sized company, RDS/ElastiCache are really good ideas. Managing data and databases are headaches I'll gladly outsource.

I'm curious: would you be OK with being locked into X provider's storage-specific solution (say, an ECS-only way of doing things)? Is the headache in the setup, the config, or the risk of being at the helm of orchestrating your own data?

My experience is that you are going to be "locked in" in some way no matter what. Current infrastructure systems are a mess of vendor specific solutions and configurations. Migrating from one open source system to another is going to be just as hard as migrating from a proprietary thing like vanilla ECS to say kubernetes on bare metal.

There are other concerns like vendor pricing and stuff, but I have not had bad luck with that.

Disclaimer: I work at Google on Kubernetes.

I should mention Kubernetes is 100% open source, runs on AWS, Google Cloud, Azure, Digital Ocean, Vagrant, bare metal, VMWare, Rackspace and lots more I'm probably forgetting. Then you can pick the cloud you like, and lock-in be gone.

The author seems to be mixing some valid observations regarding the difficulty of stateful containers with some paranoia about the growth of cloud deployments and ownership of data. Or maybe it's not paranoia. I don't know. But it's different.

First off, containers don't absolutely have to be stateless. The first and foremost benefit of containers is dependency isolation and configuration management. Once you use them for any length of time this becomes clear. You make them, put them on a machine with a compatible kernel and network access to the right stuff and they just run.

It's a pretty short leap from containers that just run to the idea of container orchestration systems like kubernetes. We just deployed a new staging environment built on Google Container Engine, an implementation of kubernetes, and it's pretty damn amazing what you can do at the services layer, and yes, even at the gateway and persistence layers. But you have to treat the needs of these layers differently.

Statelessness is important at the services layer because ultimately you want to scale up and down seamlessly and automatically, and kubernetes allows you to do just that.

In the persistence layer it's the opposite: state is all that's important and scaling is a more complicated affair. That doesn't mean containers aren't useful in that layer. They still provide the above-mentioned benefits. The aforementioned staging environment uses elasticsearch running in a dedicated kubernetes cluster, where each pod is bolted to a persistent disk at cluster creation. It also uses a mongo replicaset that is just deployed on instances in the old manner, but we have a prototype containerized install and will be moving toward that. Lastly it uses mysql via Google's cloudsql managed offering.

So you have a lot of differences in the persistence layer, and a lot of choices for how to manage those differences. Things are a lot simpler and cleaner at the services layer, but that doesn't mean the benefits of containers in one layer are somehow less of a win than in the other. After three years of using and deploying them my feeling is it's pretty much all win.

Disclaimer: I work at Google on Kubernetes.

This is a really good point - I've said it before and I'll say it again - containers neither add to nor subtract from whatever you're doing today. If you have a single VM and no shared storage, you're exactly as vulnerable as if you were doing things in a container. And, in the majority of cases, the exact same techniques you'd use in a VM work in a container or Kubernetes too.

While I agree with the author of this article from the perspective of an engineer who likes to tinker with things and roll my own stuff, I must disagree from the business perspective.

From the business perspective a PaaS solution costs one dollar amount and a custom, hand rolled, stateful solution built and maintained by engineers costs another dollar amount.

With offerings from AWS the former will almost always beat the latter. It's not until you reach Facebook or Google level scale that the monthly savings from the latter can outweigh the benefits of just using an AWS solution like DynamoDB or ECS, etc.

I agree about this right now.

But I think enough open source code will be made to fill this space.

It's just a matter of time.

The complexity of this setup screams 1 major thing to me: security issues

Seeing as we are on this topic, I would like to pose 1 simple and theoretical question for all those who only need 1 decently-sized 4GB server to get their small projects running:


What type of setup can be used to get a simple Rails/Sinatra/Flask/Django webapp + Postgres-DB on a single 4GB server that has to be maintained by a single individual where time and complexity are highly-valued commodities?

The least-complex setup will be preferred, as the hundreds of 1-man side-projects will not be able to maintain their 43 container-clusters using x-software on top of y-software that is managed by z-software.


A good answer here will probably help hundreds of individuals here avoid the situation of "I should probably containerize my app because everyone else does it" scenarios.
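For what it's worth, one low-complexity sketch is a single docker-compose file (v1 syntax; the image, ports, credentials, and host data path are all assumptions - and plain nginx + gunicorn + a distro-packaged Postgres with no containers at all is arguably even simpler):

```yaml
# docker-compose.yml - app container plus Postgres on one 4GB box
web:
  build: .            # your Rails/Sinatra/Flask/Django app
  ports:
    - "80:8000"
  links:
    - db
  environment:
    DATABASE_URL: postgres://app:secret@db:5432/app
db:
  image: postgres:9.4
  volumes:
    - /srv/pgdata:/var/lib/postgresql/data  # survives container restarts
```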

Disclaimer: I work at Google on Kubernetes.

I put that disclaimer at the top of all my posts, but this one truly is HIGHLY biased.

The absolute easiest way to do what you describe is to use GKE (Google's hosted Kubernetes). For $0.15/hr, we'll manage everything for you, and you can build out teeny tiny clusters that do everything you need. It's even free for clusters <5 nodes. Start it up, use a sample app from this directory (https://github.com/kubernetes/kubernetes/tree/master/example...) and you're done.

Is that really the easiest? I couldn't see a sample app there that matched the description.

One other issue is that it's also over $100 per month. I can rent 32GB SSD-backed Xeon E3 servers for half that. Or a stack of ten 4GB VPSes.

That sounds like a job for Ansible:


Have you looked at gocircuit.org and the accompanying language for connecting templates, Escher.io? They are still not production-ready, but they aim to solve your problem in a general way. The circuit simply says you should be able to write the logic that builds out your software as programs against a simple live cluster API, provided by the circuit. Escher helps mix and match such functional logics. But the bottom line is this: every framework is a language. Adding frameworks adds complexity. This is why the circuit reuses the Go language for its concurrency and abstracts your cluster into a programmable dynamic data structure.

Much of the distributed system management tooling seems to be in its infancy.

For example, if I have one host, I can manage dependencies by adding them to my debian/control files, and apt-get/dpkg will figure them out for me. If the services are installed on separate hosts, I'm on my own. I haven't found a proper solution for managing service dependencies distributed over several hosts. (Compare https://news.ycombinator.com/item?id=10487126).

So when not even the most basic management tasks are solved for distributed systems, why does it surprise anybody that more advanced state management is still in the "you're on your own" stage?

I'm sorry, but I can't help my cynicism here.

https://crate.io Co-Founder here. We're participating in this game by working hard to build a fully distributed, shared-nothing SQL database.

In our vision, an app in a container is able to access the persistence layer like SQLite - just import it. Another cluster of containers - preferably with one instance node-local - takes care of the database needs.

The database is distributed and makes sure enough replicas exist on different nodes. It's easy to scale up and down, and local resources are utilized whenever possible (no NAS/SAN-like storage).

OpenNode founder here. This article nicely brings out the reasons why we started to develop the NodeFabric prototype design - mixing Docker app containers with highly available stateful backends. Homogeneous stateful prebuilt micro-clusters versus large-scale Docker orchestration with configuration management. Aggressively co-located and highly available by design. Simplicity. http://nodefabric.readthedocs.org/en/latest/

At the end of the day you want to be robust to failures, right? Be able to quickly restore a database after a hardware failure?

If so, why not just boot that way every time? It'll keep your backup system well exercised and it means fewer code paths since you don't have a separate hot boot.

We run a large stateful service with hundreds of images based on whaleware and a thin layer of orchestration on top, and it works. We just don't expect everything to happen magically through an orchestration tool.

Looks like MongoDB is the problem here.

MongoDB is the hub of this problem.

Why not just run lxc and lxd?

That is exactly what we do at work with stateful services. For now.

We run a sharded+replicated mongo cluster and a redundant postgresql array in docker containers. We had to write our own orchestrator on top of Etcd to get the functionality we wanted. Like the article author I feel there's some cultural divide happening. How is it possible that the 100.000 line Java project of Kubernetes lacks even basic resource management that our orchestration tool that's written in a few hundred lines of bash and Ruby does have?

* Kubernetes is not written in Java

* There is no one-size-fits-all, just like the never-ending Ruby vs JavaScript vs Java.

* Kubernetes just hit v1.0 not long ago.

* Seem like you already wrote the functionality you want, now what is the problem?

* Alright, sorry Kubernetes is in Go I must've gotten it confused with some other project.

* Well, alright, but they market it as a general purpose project. I'm just saying that to us it's strange that none of these frameworks out there deal with persistent storage, as the author of the article also observed.

* Ok.

* Well, the problem is that it'd be much nicer if we could just use Kubernetes. Obviously having a homegrown orchestrator is not very nice. There are bound to be lots of edge cases that our ops will run into that'll continuously cause us to perform maintenance on it, and maintaining an orchestration framework is not our core business.

Anyway, this is not necessarily a critique of Kubernetes, it's great software. It's just an affirmation of the point the article is making, that it's curious that there's no interest in stateful containers from the maintainers of these frameworks.

Disclaimer: I work at Google on Kubernetes.

We do care enormously about stateful solutions; I'll say what I've said elsewhere - how do you handle stateful services in your VMs?

We don't run anything in VMs at the moment. I guess if we would we'd do it like we're doing with the containers, mount the drives into them. Do you guys run the search engine in VMs?

Kubernetes is not a Java project.

Sorry, got confused; Mesos is a C++ and Java project, I think. Probably got it swapped with that one.

I'm curious: what were the responsibilities of your own orchestrator? What are examples of situations it had to handle?

Well, basically the same things most frameworks do. Encode relations between containers, allow us to spin containers up on machines, and automatically establish the links between them. In addition to that, encode persistent storage resources and link them to tasks. For example, machine-1 has one 2TB HDD and three 250GB SSDs. On SSD1 the task postgres-1 has a data resource allocated called abc123. Now when we restart the machine, we can restart the postgres-1 task and point it at its data resource and everything will work again.

Note that the orchestrators do sort of give solutions to this issue. They might for example recommend doing the storage inside the docker container, and have the data disappear whenever the container disappears. This might be a little less reliable or transparent, but since you have a redundant cluster you can always restore the data from some other container. Our data is a little bit too big for restoration processes to happen during normal operations. And we're a bit too dependent on the speed to have no control over on which disk a resource is stored.
