Docker will likely be more prevalent in a few years with startups that have built their infrastructure from the ground up.
That's great, if that's what you need. But most people aren't building a service like that. HN, I believe, runs on one machine, with a second for failover purposes. And HN still has many, many more users than typical company-internal services, community services, or at the extreme end personal services.
When you aren't operating at absurd scale, "Google-style" infrastructure doesn't do you any favors. But the industry sure wants to convince us that scalability is the most important property of infrastructure, because then they can sell us complicated tech we don't need and support contracts to help us use it.
(Disclosure: I'm the lead developer of https://sandstorm.io, which is explicitly designed for small-scale.)
And let's not forget: replace any and all efforts at code optimization with "just throw another rack of blades at it".
The problem with Docker is not that the problems it addresses aren't widespread. At its best, Docker gives you dev/production parity, and dependency isolation that is useful even for solo developers working part-time. The problem is that this isn't a well-defined problem you can solve by thinking really hard and coming up with an elegant model (the way version control is); it's messy, and the effort to make it work isn't worth it most of the time right now.
That's no reason to write off Docker though. Pushing files to manually configured servers or VPSes is messy and leads to all kinds of long-term pain. You can add Chef/Puppet, but that turns into its own hairy mess. There's no easy solution, but from where I stand, the abstraction that Docker/LXC provides is the one with the most unfulfilled promise in front of it.
I get that when I use the same OS and built-in package manager?
I would virtualize the environment using something like VirtualBox for dev and EC2/DigitalOcean/etc. in prod.
> and dependency isolation
If you're going to scale something, you're going to split everything out on different virtualized servers anyway, so you'll get your isolation that way.
Basically, current mainstream practice is to virtualize at the OS level, whereas Docker is pushing to virtualize at the process level.
I personally don't see the advantage ... just more complexity in your stack. I never have to mess with the current virtualization structure, I don't even see it. It looks just like a "server", even though it's not. Isn't that better?
But I agree, just use VirtualBox. I know IntelliJ IDEA already supports deploying to VMs, and they just look like another machine, so no learning curve. All the benefits with none of the hassle.
I don't use Docker, but those are problems I can think of off the top of my head.
(And, later, if you want to play with Docker, Packer lets you do that too. But you should use the Racker DSL in any case, because life is too short to deal with Packer's weird JSON by hand.)
Terraform, on the other hand, I think is a huge, huge mess, and I don't think they're going to fix it. I wrote a Ruby DSL for it the last time I tried to use it in anger, only to discover that Terraform didn't honor its own promises around the config language it insists on instead of YAML or a full-featured DSL of its own. My current client uses it, and every point release adds new and exciting bugs and regressions in things that should be caught by the most trivial QA. For AWS, I strongly recommend my friend Sean's Cfer as a better solution; CloudFormation's kind of gross, but Cfer helps.
 - https://github.com/seanedwards/cfer
> There's the issue of deploying changes fast without leaving files in an inconsistent state (you don't want half of some file to run). How about installing the required dependencies?
rpm / dpkg also install dependencies, are quite fast and well tested. They have the advantage of working in a standard environment which most sysadmins know but the disadvantage that you need to configure your apps to follow something like LSB (e.g. install to standard extension locations rather than overwriting system files, etc.).
The one issue everything has is handling replacement of a running service and that's not something which Docker itself solves – either way you need some higher level orchestration system, request routers, etc. Some of those systems assume Docker but that's not really the value for this issue.
Common misconception. You only need to do this if you're going to try to push the packages upstream. If they're for your own consumption, you can do what you like. Slap a bunch of files in /opt, and be done with it - let apt manage versions for you and be happy.
As with many things, this is one area where you've just got to know what to ignore. It's simpler than it looks.
/opt is defined in the FHS for local system administrator use, so installing your company's packages there is actually the recommended way to avoid conflicts with any LSB-compliant distribution, as long as you use /opt/<appname> instead of installing directly into the top-level /opt.
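For illustration, a minimal sketch of that approach using fpm (fpm itself is real; the package name, version, and paths here are all made up):

```
# Bundle a prebuilt app tree into a .deb that installs under /opt/<appname>.
fpm -s dir -t deb -n mycompany-myapp -v 1.0.0 --prefix /opt/myapp -C ./build .
sudo dpkg -i mycompany-myapp_1.0.0_amd64.deb   # apt/dpkg now tracks the version for you
```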
How would Docker help with this? Genuinely curious.
I store them in bash scripts outside the repo that populate the relevant data into environment variables and execute the code. The code then references the environment variables.
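A minimal sketch of such a wrapper (every value here is a placeholder):

```
#!/bin/bash
# Kept outside the repo: inject secrets via the environment, then exec the
# app so it inherits them.
export DATABASE_URL="postgres://app:not-a-real-password@db.internal:5432/prod"
export SECRET_KEY="not-a-real-key"
exec /srv/app/current/bin/server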
> How about installing the required dependencies?
There are two kinds. On the OS level and on the platform level.
On the OS level, you can have a simple bash script. If you need something more complex, there are things like Chef/Puppet/etc.
On the platform level, you have NPM/Composer/PIP/etc which you can trigger with a simple cron script or with a git hook.
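For example, a minimal git post-receive hook along those lines (paths hypothetical):

```
#!/bin/bash
# On the server: check out the pushed code, then pull platform-level
# dependencies with the language's own package manager.
GIT_WORK_TREE=/srv/app git checkout -f master
cd /srv/app && npm install --production
```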
> There's the issue of deploying changes fast without leaving files in an inconsistent state
So the argument here is that you're replacing one file in one go vs possibly thousands? That in the latter scenario the user might hit code while it's in the process of being updated?
Ok. With docker, you would shut it down to update. You would have to.
Same goes for the traditional deployment? Shut it down, update, start it back up?
You can, of course, automate all of this with web hooks on Github/Bitbucket, for both docker and the traditional deployment.
The traditional deployment should also be faster, since it's an incremental compressed update being done through git.
Edit: forgot to mention, the filesystem mount means that they don't need to be in env vars, which are fairly easy to dump if you have access to the box or if you're shipping containers around in plain text.
AWS does updates by first downloading the new code into a separate folder and then switching a symlink to point to the new folder.
But AWS still feels unsatisfactory because it downloads the entire codebase instead of doing an incremental git update. These are all issues that could be fixed, and someone has to do the work. I have no idea if Docker helps with any of them, but the opportunity is still there.
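A minimal sketch of that download-then-switch pattern (paths hypothetical):

```
set -e
release="/srv/app/releases/$(date +%Y%m%d%H%M%S)"
git clone --depth 1 /srv/repo.git "$release"
ln -s "$release" /srv/app/current.new
mv -T /srv/app/current.new /srv/app/current  # rename(2) is atomic: readers see
                                             # the old tree or the new, never half of each
```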
I still reckon that the main reason VMware ESX is as successful as it is comes down to the lack of isolation and the sheer deployment hell that Windows has been for years. The same can be said for Python or Ruby on a Linux machine, for example. Docker removes some of that pain like ESX does.
The same can be said for Ruby, Python and any number of other language environments where you have multiple services that were written at different times with differing base targets. I've seen plenty of instances where updating a host server to a new runtime breaks some service that also runs on that server.
With docker, you can run them all... granted, you can do the same with virtualization, but that has a lot more overhead. It's about maximum utilization with minimal overhead... For many systems, you only need 2-3 servers for redundancy, but a lot can run on a single server (or a very small cluster/set).
I have to agree on Ansible, systemd and Go... I haven't done much with Go, but the single executable is a really nice artifact that's very portable... and Ansible is just nice. I haven't had the chance to work with systemd, but it's at least interesting.
This is a solved problem in Python and Ruby. In Python, use virtual environments. In Ruby, use RVM. You won't have the issue of one tenant breaking another.
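For reference, that isolation is only a few commands (versions and names illustrative):

```
# Python: one isolated environment per app
virtualenv /srv/app1/env
/srv/app1/env/bin/pip install -r requirements.txt

# Ruby: per-app interpreter and gemset via RVM
rvm install 2.2.2
rvm use 2.2.2@app1 --create
bundle install
```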
A runtime environment for a given service/application can vary a lot, and can break under the most unusual of circumstances. An upgrade of a server for one application can break another. Then you're stuck trying to rollback, and then spend days fixing the other service. With docker (or virtualization) you can segregate them from each other.
Also, RVM in production? Sledgehammer to crack a nut :-)
I see docker as a valid attempt to fix limitations of existing and broken package system (eg: apt) at a price that I am not yet willing to pay.
The opposite seems likely ... Docker will fade and become deprecated as building infrastructure from the ground up locally to feed into the cloud becomes cheaper and cheaper still. AWS is not always so cost-effective when you truly dig in and crunch the numbers.
My guess as to why Docker won't succeed widely in production is because it's a software-based solution trying to glue together slippery pieces that just don't want to be glued together. The core issue of security will never be solved by a Docker-like solution; that problem is best solved by integrated hardware.
This very issue is being addressed in ClearLinux: http://sched.co/3YD5
With regards to docker/lxc/container security, you're right. Some of the biggest players haven't solved the lxc/docker/container security issues yet; it's a really hard problem to solve. Breaking out of a container will always be easier than breaking out of deeper levels of virtualization (Xen/KVM).
I agree it's not easy to get right, but it doesn't seem necessary that containers will always be leaky. Solaris/Illumos Zones are an OS-level virtualization approach that's pretty airtight, for example.
When you have a local server that supports both KVM and Zones, you choose KVM as the cleaner abstraction. While surrounded by neat tech, Zones are actually a bit of a pain and not all that portable between systems IME. OTOH I can `zfs send/recv` over SSH, drop a short bit of JSON in, and have my KVM instance reliably moved to another SmartOS box 100% of the time, no worries.
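The whole migration is roughly this (dataset, host, and manifest names hypothetical):

```
# Snapshot the VM's dataset and stream it to the other box over SSH
zfs snapshot zones/myvm@migrate
zfs send zones/myvm@migrate | ssh otherbox zfs recv zones/myvm
# The "short bit of JSON": feed the VM manifest to vmadm on the target
vmadm create < myvm.json
```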
So unless you're really worried about that last 5% or whatever of overhead, what's the point of Docker? It's not actually very portable at all it seems (on my Mac I'd have to run it inside VirtualBox). I don't have much experience with it, but my guess is that similar to Zones, you're at the mercy of the host system as far as common dependencies like OpenSSL or gcc go.
It seems like a solution to a problem I'm having trouble even imagining: a less secure, less portable, lightweight "VM" with slightly lower overhead. I guess if you're a PaaS and you could increase margins by 5% overnight by switching to Docker, that might make sense?
As someone who's set up Solaris 10, OpenBSD, FreeBSD, SmartOS, Debian, Redhat, Ubuntu, KVM, Xen, etc etc etc, I just have a real hard time figuring out Docker's value proposition. It seems like the Solaris world went from Zones to KVM, and some people are attempting to do just the opposite. Which I just can't think of a good excuse for.
I currently use it for MySQL DB restoration and remote bug-checking: I keep a handful of xtrabackup instances that I can quickly attach a container to, hand an IP to a developer, and he can then debug the problem with production data _at that exact point in time._
When they're done, I simply throw that container away.
It's a tool that (in my mind) doesn't solve any existing problems better than a lot of tools out there. It instead should be thought of like a better hammer for the same nail. Think of it like... would you rather have a giant set of wrenches, or a single ratchet with a set of sockets? They both accomplish the same thing, but both are better for certain jobs.
If you have a consistent level of traffic (i.e. you don't have inordinately wild upswings/downswings like e.g. Reddit), AWS isn't even remotely cost-effective. I was going to do the math to compare our current physical server infrastructure with AWS, and even if you factor in that physical servers need to be in pairs (for redundancy) and over-provisioned (for traffic spikes), I didn't even get as far as back-of-the-envelope math before it was obvious that AWS was completely infeasible.
Similarly, cloud offerings give you remote reach easily - one company I work at has its production servers almost literally on the direct opposite point of the globe. You can do datacentres with remote hands, sure, but it's another layer of complexity. Hardware also has a mild barrier to entry in the form of cost - for small shops, doling out the five or six figures you need for initial hardware is a pretty sizable chunk.
(2) Build environments -- it's helpful to build distribution Linux binaries in older Linux versions like CentOS 6 so that they'll work on a wider range of production systems.
(3) Installing and running "big ball of mud" applications that want to drag in forty libraries, three different databases, memcached, and require a custom Apache configuration (and only Apache, thank you very much).
#3 is really the killer app.
This has led me to conclude that Docker is a stopgap anesthetic solution to a deeper source of pain: the Rube Goldberg Machine development anti-pattern.
More specifically, Docker is a far better solution than the abomination known as the "omnibus package," namely the gigantic RPM or DEB file that barfs thousands of libraries and other crap all over your system (that may conflict with what you have).
Well written software that minimizes dependencies and sprawl and abides by good development and deployment practices doesn't need Docker the way big lumps of finely woven angel hair spaghetti do.
Docker might still be nice for perfect reproducibility, ability to manage deployments like git repos, and other neat features, but it's less of a requirement. It becomes maybe a nice-to-have, not a must-have.
But... if my software is not a sprawling mess that demands that I mangle and pollute the entire system to install it, why not just coordinate development and deployment with 'git'? Release: git tag. Deploy: git pull X, git checkout tag, restart.
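Concretely, that workflow is just this (tag and service names hypothetical):

```
# Release
git tag v1.2.3 && git push origin v1.2.3

# Deploy, on the server
git fetch --tags origin
git checkout v1.2.3
sudo systemctl restart myapp   # or however the service is supervised
```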
Finally, Docker has a bit of systemd disease. It tries to do too much in one package/binary. This made the rounds on HN a while back: bocker, Docker implemented in about 100 lines of bash.
It demonstrates that at least some of Docker's core functionality does not require a monster application but can be achieved by using modern filesystems and Linux features more directly.
So honestly I am a bit "meh" about Docker right now. But hey it's the hype. Reading devops stuff these days makes me wonder if "Docker docker docker docker docker docker docker" is a grammatically correct sentence like "Buffalo buffalo buffalo buffalo buffalo buffalo."
Docker actually doesn't help reproducibility at all, because the underlying reproducibility problems in the distros and build systems being used are still present. See GNU Guix, Nix, and Debian's Reproducible Builds project for efforts to make builds truly reproducible.
I had a good laugh when I read "the Rube Goldberg Machine development anti-pattern". This describes the situation of "modern" web development perfectly. I'll add that such software typically requires 3 or more different package managers in order to get all of the necessary software. And yes, Omnibus is an abomination and Docker is much better.
I think Docker is papering over issues with another abstraction layer. It's like static linking an entire operating system for each application. Rather than solving the problem with traditional package management, Docker masks the problem by allowing you to make a disk image per application. That's great and all, but now you have an application that can only reasonably be run from within a Linux container managed by Docker. Solving this problem at the systems level, which tools like GNU Guix do, allows even complex, big ball of mud software to run in any environment, whether that is unvirtualized "bare metal", a virtual machine, or a container.
You say it like it's a problem, but that's the most concise description of Docker I've yet read. It rhymes with the way all the fed up oldies using Go like its static linking.
I simply will not run apps like that unless I have no choice. If I see that, plonk it goes into the trash.
... and yes, the whole package management situation is comical. Every language has its own package management system, and the OS, and sometimes people use both at the same time. It's ridiculous.
```
The following packages are needed to run bocker.

util-linux >= 2.25.2
coreutils >= 7.5

Because most distributions do not ship a new enough version of util-linux you will probably need to grab the sources from here and compile it yourself.

Additionally your system will need to be configured with the following.

A btrfs filesystem mounted under /var/bocker
A network bridge called bridge0 and an IP of 10.0.0.1/24
IP forwarding enabled in /proc/sys/net/ipv4/ip_forward
A firewall routing traffic from bridge0 to a physical interface.
A base-image which contains the filesystem to seed your container with.
```
Is this the "well-written software" pattern that you're talking about? Because to me, this looks like a "big ball of mud" - i.e. dependence on an eclectic combination of libraries, co-programs, and environment configuration - and indeed, if for some perverse reason I felt like I wanted to deploy this in production, it's exactly the kind of thing I'd wind up writing a Dockerfile for. (Which, I notice, is functionality "Bocker" doesn't attempt to replicate.)
I hate being passive-aggressive so I'll be directly aggressive here: this mentality is a way of saying, "I don't want to revisit the operational aspects of my system because I don't like to do that work. Find someone else."
Like any aspect of your system, your ops and deploy components can rot. Pretending otherwise is outright ignoring a consistent lesson offered by those who came before and have failed over and over.
Docker offers to take over, as a project, many aspects of the system that are subject to bit-rot, and to provide an explicit and consistent container abstraction for software to compose. While it lacks many features we do need (I agree wholeheartedly that it'd be great to parallelize layer creation, less so about secret exposure since the environment & volume tooling can already handle that), it has also replaced whole categories of software and devops tooling with simple and extensible metaphors.
And then there's the part where Weave is slow, so you might as well stick to VMs or hardware...
The idea that docker introduces that much uncertainty is outright fear mongering. There is a huge amount of recalcitrance in the community against doing anything meaningful in this space, justified as risk aversion. My personal opinion is that we're all pretending we didn't write incredibly delicate and brittle provisioning and monitoring code with very dated tools.
Many people I know, and more than a few I respect, ultimately point to all their provisioning shell scripts as the ultimate reluctance to change things. "It will be really hard to migrate and test these! Generating them is a pain!" Of course, the elephant in the room is we all knew this going into it and we all know we SHOULDN'T have been doing things like generate shell script execution and using git to provision on production boxes and w/e other hacky shit we've done.
Of course, what we have is not any one thing but all too often an amalgam of spare hours and quick fixes and patches laid over some existing provisioning system like salt, ansible (or just a whole shit ton of puppet work).
Counter-intuitively, suddenly everyone has become a devops luddite when it comes to a genuinely novel approach even though container abstractions have already proven themselves at scale. People hem and haw and suggest that somehow it's not ready for production. Meanwhile major players in the space are already using them, even for core services, with excellent results.
Lightweight containerization has been used to solve this for a while now. Docker as a product and initiative is relatively new, but to suggest it was the first example of a container engine used in production ignores the actual history of lightweight containers.
Show me a docker-aware Rapid7
There are a lot of tools for security and compliance completely thrown out with the bathwater when you move to containers. You're not going to get enterprises to bite until you can satisfy the auditors.
Last time I checked, Python and Django are agnostic to their operational concerns and deployment.
Packer is used to build the AMIs with the containers built in, and Docker is used both in prod (a single container per AMI) and dev (Docker Compose to bring up the entire dev env locally). Both use a shared docker registry.
He tried to demo what they have currently, and the damn thing timed out during login. I laughed.
The cost of these headaches is easily avoidable. Get off the ground and running first, pay the kind-of-premium Heroku bill, and when you're ready to really scale, make the switch.
There are a few variations on managing an infrastructure yourself - RackSpace, a cluster of AWS nodes, your own metal, etc. - versus something like Heroku.
- 1 webserver/proxy, let's say nginx
- 1 simple Rest API server, let's say in flask
- 1 database, let's say PostgreSQL
and I want to connect all 3 things, preserve logs the whole time, and preserve the state of the database (of course). Also, not to forget: make it all bulletproof for the Internet.
And here all sorts of problems arise: what underlying OS to use, how to connect these containers, how to preserve the state of my database and my logs (it's not trivial, as the article proves yet again).
So overall Docker makes life not easier on this simple use-case, it makes life (of the sysadmin) more complicated.
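For concreteness, the 2015-era wiring for that stack would look roughly like this, using the old --link mechanism (image names and host paths are illustrative):

```
# Persist Postgres data and nginx logs on the host with volume mounts;
# --link injects each upstream's address into the downstream container.
docker run -d --name db -v /srv/pgdata:/var/lib/postgresql/data postgres:9.4
docker run -d --name api --link db:db my-flask-api
docker run -d --name web --link api:api -p 80:80 -v /srv/logs/nginx:/var/log/nginx nginx
```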
- What underlying OS? CF provides a minimal Ubuntu Linux "stemcell" and then has a standard "rootfs" for Linux containers
- a Python buildpack to assemble the container on top of this OS for your Flask server
- a built-in proxy/LB so you don't need one, if you want a static web server there's a static buildpack for Nginx
- an on demand MariaDB Galera cluster for your database if you want HA; PostgreSQL is there too but non-HA I think
- A standard environment variable based service marketplace & discovery system for connecting the containers to each other or to the database
- high availability (with load balancer awareness) for your containers at the container, VM or rack level
- reliable log aggregation of your containers (which you can divert to a syslog server).
As I said, the only trouble when you want to make this "bulletproof" is that there are a dozen "support VMs" all there to make your app bulletproof and secure, e.g. an OAuth2 server, the load balancer, an etcd cluster, a Consul cluster, the log aggregator, etc. So it's overkill for one app, but good if you have several apps.
For single tenants and experimental apps, there's http://lattice.cf which runs on 3 or 4 VMs and is a subset of the above, but not what I'd call "production ready".
1. Data services, not true. There's MariaDB, Cassandra, Neo4J, Mongo, Postgres, among others. Yes, they're in VMs, but recoverable/reschedule-able persistent volumes in container clusters are at best experimental features anywhere you look.
2. NIH, compared to what? CF reuses etcd, consul, monit, haproxy, nginx, etc., and will use runC and appc as those get hammered out.
3. Lots of people love BOSH.
4. If you don't like all the decisions Full CF makes, this is why Lattice exists, it delegates config/install to Vagrant or Terraform (which have their own problems) so anyone can take the core runtime bits with Docker images and use them in new and interesting ways.
5. What container or cloud platform project isn't based on code contributed by one or two vendors? Realistically? None. The CF foundation at least is an honest attempt to give all the IP to a neutral entity (including the trademark soon), has several successful variants (mainline OSS, Pivotal, Bluemix, Helion, Stackato), and has customers and users joining the foundation, not just vendors.
Dokku - https://github.com/progrium/dokku
Can't really beat `git push deploy/uat`
I just run PostgreSQL on the host and connect to it from the containers. Sure I could containerise PostgreSQL itself but I don't really see the point.
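One way to do that, assuming the default docker0 bridge (the gateway address and names here are illustrative; 172.17.42.1 was the era's default, check with `ip addr show docker0`, and Postgres has to listen on that interface with pg_hba.conf allowing it):

```
docker run -d --add-host db:172.17.42.1 \
    -e DATABASE_URL=postgres://app@db:5432/myapp myapp
```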
I then run my own Dokku plugin (dokku-graduate: https://github.com/glassechidna/dokku-graduate) for graduating my apps from UAT to production.
I'm running Docker (specifically Dokku) because it drastically simplifies deploying new builds, and graduating those builds between environments.
I know a large part of this article was that Docker complicates rather than simplifies the situation. I guess if you're trying to be a Docker purist (for no reason) then sure. The same is generally true if you try to be a purist of any kind.
The reason I am using Docker is the forced honesty on the environment side: if your app runs on your laptop, that does not mean it will run on the production boxes. If the Docker container runs on your laptop, that gives you higher confidence that it will run on the production infra. No missing JARs, environment variables, misconfigured classpaths, etc.
Something that's more declarative is definitely superior. Why? Because it will be shorter and easier to debug. I am not a fan of `git push` as a deployment strategy (because git is a version control tool, not a deployment tool), but it does force you to create and use a system that's by definition declarative. This is why I use dokku for my new projects.
git push deploy/uat
Plus, by using Dokku I get the benefits of containerised apps.
Application and database servers are different animals. Not sure why a 'hybrid' approach would be surprising or unappealing.
Databases are also tricky to run in containers: even those with the best replication strategies can afford to lose nodes, but at a high cost (re-balancing nodes, etc.), and containers still don't have the stability to provide an acceptable uptime that's worth the risk.
On a side note, since you mentioned nginx and RESTful APIs, I would check out Kong (https://github.com/Mashape/kong) which is built on top of nginx, and provides plugins to alleviate some of these problems (http://getkong.org/plugins/).
edit: I guess you can cram all of the various Kubernetes master/etcd servers on a single node but whoops there goes reliability.
Agreed that she doesn't need to use Docker. But if she is writing a paper on those results, she might want a way to reproduce her findings years down the road (even after she switches distros), or to collaborate with others who want to reproduce/build on her research (and may not be running her distro).
It's easy to think "oh, this script just requires python 2.7", but most of the time you actually have many more dependencies than that (libxml, graphviz, latex, eggs, etc.) A Dockerfile requires some work to setup, but it tracks your requirements in an automated way.
So I'm not going to say "all researchers should use Docker". But I will say "Docker could be useful to some researchers". Just like Source Control, it's a tool that solves real problems. Source Control has gotten easy enough to use that it's recommended everywhere. Docker (or some other container standard) will get there eventually.
Docker is really good for dev environments. I've had a relatively painless time dockerizing snapshots of old internal web apps so I can hack on them without installing things into my main desktop environment. It lets me have lots of server things side by side.
This is not to say that Kubernetes is bad but … it's a commitment which isn't appropriate for everyone. If you aren't exercising its abilities heavily, that's probably going to be a distraction from more pressing work unless you're scaling up heavily right now.
Once you understand how docker works, using the YAML file can become useful to lighten your load.
Multi-Host is moderately more difficult. A full orchestration and resource scheduling stack that scales with load even more so.
But you have to ask what your needs are if you're being realistic.
Do me a favor and if you got a startup, stay clear of all this. Everyone wants to reinvent their own flavor of heroku and make your deployment and build pipeline god-awful complex. Their tool of choice? Docker.
Before you know it you'll be swimming in containers upon containers. Containers will save us, they'll cry! Meanwhile you have 0 rows of data before you've paid them their first month's salary and have spent time on solving problems of scale you'll never have.
Focus on your product, outsource the rest. And leave customized docker setups to mid-stage startups and big corps who already have these problems, or at least the money and people to toil on them. Not everything needs to be a container! And most companies are not and will never be Google!!
I quit the job.
The scenario played out just as you said: I ended up single-handedly and poorly re-engineering something that already existed (they did have a working Ansible setup) for no visible gain. "Swimming in containers upon containers" is exactly what happened; they kinda worked, but the farther we got, the more kludges piled on top of each other. In four months of work we didn't even hit production - the most we got was a CI/QA service that was actually nothing more than a loose bunch of Python scripts. Between managing dev/test/prod differences, tracing missing logs, removing unused volumes, networking all that stuff together and trying to provide at least a decent level of security, I realized that I was wasting everyone's time and money. Developers hated it because it filled their workflows with traps and obstacles. Admins hated it because of the lack of tooling. Business hated it because it caused unexplainable delays. The only thing we really accomplished was some compliance with The Twelve-Factor App - something that could've been done in a week. Hardly a victory.
My advice? Forget about Docker unless your primary business is building hosting systems. It will take years before Docker gets mature enough for production, and not without a ton of tooling on top of it and some major architectural changes. Until then, go back to the old UNIX ways of doing things... it worked perfectly since the Epoch and it will continue to work long after the 32-bit time_t rolls over. You'll be fine.
The services in question are built with a hodge-podge of shell scripts and build tools, so getting them all to compile locally is a challenge, let alone deploying them. My hope was that containerizing the builds would isolate any configuration problems, and that containerizing the deployed services would cut down on outages by permitting trivial rollbacks (say, by snapshotting all the service containers before each deploy and merely restoring them should a deployment fail). Of course, all of the above could be fixed by traditional means (e.g. rewriting the build system with a single, standard tool; streamlining the deployment process, etc.), but it seemed like Docker could solve 80% of the problems while easing the implementation of the proper solutions down the line.
Considering the above, do you still think Docker's a poor fit for businesses that aren't building hosting systems? Oh, and any nuggets of wisdom you could throw to a newcomer to the industry? :)
You don't have a standard repeatable way to set up an environment now. You need to do that first before jumping on docker, I think. Once you have that, you can start replacing parts of the setup with docker and see if it fits your needs.
The advantage of Ansible is that it is idempotent, and the changes it makes to the system are the same ones you would make manually or via bash scripts. So it is quite easy to debug.
If you come to a new place and "there's an error in here somewhere", the difference between layers of images held together with shell scripts and an Ansible/Puppet/Chef script is like night and day.
Something like the following
docker run -v `pwd`:/tmp/buildresult your-weird-hodgepodge build-command
docker run -v `pwd`:/...
As for rollbacks, the exceptionally bad Docker tagging system just adds headaches to rolling back efficiently. If your production OS has a package management system, consider building packages for that - after all, it will have been battle-tested and known to work on that OS. There will be a learning curve for any packaging system, but using a native one means less faffing around later - remember also that docker is changing a lot with each release.
Also, as mentioned in the article, logging with docker is difficult and hasn't been solved properly yet, and if you like production logs for troubleshooting, Docker requires some attention before you can get those logs. My devs just run the app and watch STDOUT... which isn't easy to log in docker. Then, of course, they complain that they don't have production logs to debug, and subsequently complain when I ask them to modify their logging so I can slurp it :)
Anyway, Docker is not a packaging system for use in-house; if you're only using it to package stuff... you will be ripping it out later on down the line (this is what happened to me). On the other hand, if you open-source your stuff and want to provide 'canned images' for random members of the public to use, then there is a point to using docker, since you don't control what those host machines will be running.
In short, Docker is a complex ecosystem with its own learning curve, and it doesn't really save you from the learning curves of other things. If you can't articulate the exact problems that using Docker will solve for you in production, I would advise against it.
Edit: If you need a standardised provisioning system, start out with Ansible. It's pretty straightforward. Admittedly I've only used it and Puppet... and Puppet is better aimed at large/complex infrastructure environments.
> Focus on your product, outsource the rest.
What do you mean by outsource the rest?
Do you mean, "hey, we're using AWS <Everything>-as-a-Service because we don't want to manage a DB cluster or deal with a load balancer"?
Or do you mean, rely on existing available tools and stop reinventing the wheel every week?
iamleppert means: Identify your company's core competency and do that in house, but outsource or avoid that which is not your core.
For example, we're making a game. Gameplay, art, and tech are all done in-house and not with remote contractors, because they need to be -- they're the part of the product we love and the part our players will end up loving. Email, forums, chat, HR, applicant tracking systems, and git hosting are outside of our core and best handled by others.
Installing an exchange server is arguably letting email be "handled by others" because you are not responsible for how it works, just the setup and monitoring, which could be handled by in-house staff or by a contractor.
My point is that "focus on the core competency" doesn't have to mean "make our company reliant on a dozen other SaaS businesses who may go offline or change their business model/functionality on a whim"
Regarding the outsourcing, that's what we're shooting for at Giant Swarm. We've written a stack that runs containers and manages the metal underneath for you. We run the solution as a shared public cluster at giantswarm.io, but can also do private hosted deployments or managed on-prem deployments. It's a complete solution for running containers that feels like a PaaS, but without all the opinionated crap associated with a PaaS.
We're basically offering to be your little devops team that could - with containers.
Services like Cloud66 are interesting (they manage deployments onto your own EC2 or other cloud infrastructure), but the developer experience doesn't quite match Heroku yet.
Heroku really needs some more competition...
Also, one of the killer features of Heroku (which few services seem to replicate) is log drains - I can easily add a http or syslog endpoint and have Heroku send the logs over. The other killer feature which isn't often replicated is One-off dynos, where we can spin up a new instance and get a console attached to it in one command - useful for running database migrations or using Ruby as a CLI to access data.
If we were on .NET that would probably be attractive, but it's still not really competing with Heroku.
If I have less than 50 (maybe even 100) EC2 instances for my applications there is no way in hell I am going to run 3 service discovery instances, a few container scheduler instances and so on and so forth.
For whatever it's worth, we completely agree with the sentiment (and I like your "blue collar apps" term) -- and we deliberately have designed Triton for ease of use by virtualizing the notion of a Docker host. I think that the direction you are pointing to (namely, ease of management for very small deployments) is one that the industry needs to pay close attention to; the history of technology is littered with the corpses of overcomplicated systems that failed because they could not scale down to simpler use cases!
Fortune 500 technology customers. Fortune 500 companies who have hundreds of millions and decades of work invested in their infrastructure generally aren't going to jump on whatever the latest infrastructure trend is.
You could go "old school" and have some (virtual) servers do more than one thing :)
Officially Docker is only supported on RHEL 7 and up, and most systems I've seen are still on RHEL6.
I think it's just a matter of time before Docker goes into production. Where I'm working, we're seriously looking at "Dockerizing" lots of things, but the OS support question keeps popping up.
I really wish RH had found the time to fix RHEL 6 and support docker.
RHEL 7/CentOS 7 is a big step for many. RHEL 6 isn't even near EOL and many people (including myself) wanted to get more mileage out of CentOS 6.
But really, the most painful aspect of using Docker in production, at least in environments where you need multiple physical servers (or VMs), is overall orchestration of the containers, and networking between them.
Things are much better today than they were a year (or 6 months!) ago... but these are two parts of Docker configuration that take the longest to get right.
For orchestration: there are currently at least a dozen different ways to manage containers on multiple servers, and a few seem to be gaining more steam, but it feels much like the JS frameworks era, where there's a new orchestration tool every week: flynn, deis, coreos, mesos, serf, fleet, kubernetes, atomic, machine/swarm/compose, openstack, etc. How does one keep up with all these? Not to mention all the other tooling in software like Ansible, Chef, etc.
For networking: if you're running all your containers on one VM (as most developers do), it's not a big deal. But if you need containers on multiple servers, you not only have to deal with the servers' configuration, provisioning, and networking, but also the containers inside, and getting them to play nicely through the servers' networks. It's akin to running multiple VMs on one physical machine, but without using tools like VMWare or VirtualBox to manage the networking aspects.
Networking is challenging, but at least we have a lot of experience with VMs, which are conceptually similar. Orchestration may take more time to nail down and standardize.
> Where to put logs
Well, I just throw them aside and use `docker logs [container]`
> How to manage state
One container should perform one service. I haven't run into a problem here.
> How to schedule containers
ECS :) But honestly, I subscribe to the approach that containers = services and thus should just always be running.
> How to inspect app
`docker exec -it [ container id ] bash` ("ssh" into the container)
`docker logs -f [ container id ]` (follow the logs)
> How to measure performance
Probably the same way you measure system performance.
> How to manage security
Everything of mine is in a VPN; some services can talk to certain services over certain ports... Personally, I don't really understand all this talk about security. Protect your systems and that should protect your containers. Why is it that isolated processes are causing people to throw up their arms like security is unimaginable in such a world? There are ways...
> Consistency across docker containers
This can be a pain if you need it, yeah. They seem to be adding better & better support for allowing containers to talk to one another (and ONLY to one another).
> Ain't nobody got time for that.
Hmm, personally I don't have time to go thru what Puppet, Chef, and even Ansible require to get your systems coordinated. I see this as far more work than creating a system specification within a file and finding a way to run it on some system.
It all comes down to requirements, though, and where your technical stack currently is. To any newcomers who are also plowing into the uncertain fields of a dockerized stack, fear not! You are in good company, and if I can make it work, you can too.
1) 'docker logs' relies on using the json log driver, which means the log file is stored in /var/lib/docker/..... and grows forever. No rollover. No trimming. FOREVER.
2) What if your container dies? What if your host dies? Do you have any state at all, or have you abstracted that out? Are your systems distributed?
3) Always running does not answer where to run them
4) That only works if the container is running. What if it died? Also, docker logs is a fool's game
5) bingo, that's right at least
2 - If a system dies and it has a state, then what do you do? If a dockerized process dies, and it has a state, then what do you do? This isn't some new problem to Docker. If my database service dies, you know what happens? It starts back up and connects to the persistent volume. Personally speaking, yes all of my services / systems are distributed.
3 - Most people don't need to start their services exactly at this point and then stop at another certain point (which is why I pretty much brushed over it). If they do, there's plenty of tools to do this that can also utilize docker.
4 - What if a system died? Does that mean SSH'ing in isn't a viable option? (yes...)
5 - Yes, you love negativity so clearly this is your favorite
6 - ...? What? Do you have something more to say?
It's cute that you like to poke holes and personally attack people, but really my comment was just about how I go about things on a day-to-day basis. This is coming from someone who has 6 major Docker services abstracted out, running all the time across 3 environments... all capable of being updated via a `git push`. I think I have decent, practical advice to offer other docker-minded practitioners, and just decent advice for newcomers.
Your grievances circle around logs not being centralized or easily accessible (1, 2, 4). You also don't outline any solutions yourself.
Yes, the unrotated container logs are kept in a root-accessible-only location in a directory named after a long key that changes on every image restart - not conducive to manual log inspection, and definitely not conducive to centralised logging. That's not a 'system problem', it's Docker just being rude. Yes, a relatively experienced engineer can work around that... but why should they need to 'work around' it in the first place?
Ironic, really, that if you put a user in the 'docker' group they can do anything they want with the docker process, destroying as much data as they like or spinning up containers like nobody's business... but they can't see the container logfiles.
Even without that issue, I'd prefer my logs to be centralised. So, as well as my app, should I be running a logging daemon, process monitoring, etc. for each docker instance?
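One partial option, for what it's worth: Docker 1.6+ added pluggable log drivers, so container output can go to the host's syslog, where the usual rotation and centralisation machinery applies (note that `docker logs` stops working for containers started this way):

```
docker run -d --log-driver=syslog my-app
```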
People are not kidding, though, when they say that everything gets very complicated. All the things that we did by convention and manual configuration in regular VMs that are babysat manually have to be codified and automated.
Docker is going to be a great ecosystem in 3 years, when the entire ecosystem matures. Today, it's the wild west, and you should only venture forth if having a big team of programmers and system administrators dedicated just to work on automation doesn't seem like a big deal.
Stop with the blaming statements.
This avoids saying "You're an idiot", which is nearly never constructive or helpful, and instead makes education and cooperation its goal. Most people respond better to that.
Who the fuck cares? Why are you debating if you should call someone an idiot or not? My personal philosophy is to be nice to others, always because I don't know what they're going through. What's there to be gained by not only calling someone an idiot but defending it on the meta scale? Yes we should always cloak our rebuttals with negativity -- for what other way could there be?! I must call this person an idiot, don't you see?! 50% of people are below average intelligence -- surely I must let them be aware of the fact that I believe they're mundane and forgettable!
I don't care if other people think poorly of me, I'm going to believe in myself... you critics are so annoying. I can't even write a comment trying to help people without jackasses flying in poking holes in what I said AS IF it were gospel! It's a comment! I wrote it in 2 seconds and, sure, maybe I should have put some more time into it but I was just trying to help out anyone who got scared by that list. It's such a different mindset. I didn't set out to be RIGHT, which is what's most holy & sacred around these parts. The best thing that could have happened is some people would have been like, yea but how is X going to solve Y when Z happens? And I woulda been like, good question mate, blah blah and we woulda all been better off.
Instead, a kid comes flying in drunk on keyboard ego and is like "You should stop talking"; think about MY intention vs HIS intention. Think about the INTENTION behind calling someone an idiot and what it does to that person. So stupid.. honestly. There's a bigger picture at play than being right or wrong... You don't do certain things not because it's empirically correct to do it, but because it's the moral thing to do or the mature thing to do or the compassionate thing to do.
If you want to argue "this is the right way", be prepared to bring data and defend your statements.
People here might be making critical decisions based on knowledge shared here, and they deserve the most accurate information possible.
Not sure what you're implying either, but you can tell from my comment that I never came out of the gate saying "This is the best way!"
That's why you should talk to the points and not make personal statements like "you shouldn't give advice".
You have to remember that your containers are coming and going all the time, which is one of the biggest challenges. It basically means you have to have everything centralised, and that means a lot of additional infrastructure/complexity.
At the end of the day, you have to view it as building a reliable system that performs a function. Docker is one tool you can use to do that. Virtual machines are another tool. They don't solve all the problems you describe, nor are they intended to. If you're a tiny startup, you can just go the AWS route, but that leaves you beholden to AWS and their pricing. That's fine early on, but eventually you'll want to go full-stack for one reason or another.
Just one non-trivial example: I can secure Ubuntu against sshd attacks pretty easily with `sudo apt-get install fail2ban`. Now try to secure CoreOS against sshd attacks. There are guys out there who tried to run fail2ban in a container (without luck), and so far I've only found one hacky script that tries to do the same oO https://github.com/ianblenke/coreos-vagrant-kitchen-sink/blo...
That's not to say you're wrong; containers probably aren't that useful to most small shops. But that summary doesn't make any sense for this article.
Also, see https://titanous.com/posts/docker-insecurity
In the two hours I've spent with OSv, I've gotten much lighter weight VMs that boot my large scala app extremely quickly (a few seconds, max), with less configuration and more predictable performance.
For instance there's still work being done to add native PAM and by extension Kerberos support, and the daemon runs as root, thus requiring extra caution about who may run docker commands.
If you're (for example) in an enterprise where developers may never have root access under any circumstances, you end up with a chicken and egg scenario: if developers don't have the ability to test container creation (because doing so might grant them root access in a container), who does?
In summary from a person in that scenario:
1. Not known of and too short of time horizon - People still run Windows XP in the real world. Changes where the rubber meets the road (IT and DevOps) take years of hard evidence, infrastructure cost, justifications, etc. to catch on. It does not behove these groups to be an early adopter.
2. Not flexible enough yet - I have a ton of use for this if I could run it more like a VM but faster and easier to deploy. I devop with a product that uses its own kernel... I tried to talk Dev into compiling a kernel with Docker for a use case I have - you can guess where that went.
Docker is great, but I can only use it with my devs in its current state and for myself in specific cases.
I keep hearing about people putting Docker in dev and test environments and not production. This use case makes no sense to me as you would throw away the entire point of containers and have a wildly inconsistent path to production.
Relying on Puppet (as with prod) means development VM setup/change time is measured in hours. My company's Puppet catalog takes 15 minutes to compile, 6 hours to run. Entire days of developer productivity are lost trying to get development VMs working. Docker would make that instantaneous. It's also very hard to manage and synchronize data (i.e. test fixtures) across all those services. With Docker you could have a consistent set of data in the actual images and revert to it at will.
Even a simple `npm install` in a docker container fails on Windows because of the lack of support for symlinks (adding --no-bin-links means npm's run scripts can't be used to their full and useful extent).
Simple tools (rpm + yum + docker) allowed us to replace these people with a simple shell script. Literally.
I agree with the article that Docker is missing some things. Two that I would like to see:
- Auto cleanup
- Clean and easy proxying
That means we're like a year away from it being boring and just working, right?
But then, if you don't feel like you need it, that's probably because you don't need it.
(If people are downvoting your question, it's probably because you're giving off a bit of a "I don't understand Docker so it must be crap" vibe, which is not helpful.)
Sorry if my initial question came across with a weird vibe. I'm genuinely curious. I have colleagues working at places where they actually are being asked to drop everything and implement Docker. I asked why and what's driving this, and got the predictable response of "management/dev/someone wants something new".
Catching it before pushing your changes is far more preferable.
There are a lot of different methods and processes to fix this. Docker is a new one that simplifies a number of the pieces of the puzzle by constraining the environment in useful ways.
However, if you already have a process worked out and aren't experiencing pain then you probably don't need to switch for the sake of switching.
That's true until something breaks in production: then you want to replicate the same situation in the dev environment, as closely as possible.
Docker isn't magical, but the process that it lends itself to can be very useful. Those companies aren't using docker to be successful. They are successful because of the processes (and intensity) that docker fits into.
I'm sorry that your colleagues are being asked to drop everything and look at anything (much less Docker). That's not a nice way to work -- and I'm sure it influences their notions of Docker.
Also, shout out to the fanboys for downvoting my question, which was just a question asking for thoughts and answers and didn't make any statement whatsoever.
It has literally nothing to do with the ability to upgrade independent pieces on the sysadmin's schedule and everything to do with abstracting sysadmins clean out of the process. The entire profession has established itself as a roadblock to progress, so like good engineers do, we're busy coding the problem away.
Docker's answer to storage so far has been "don't use Docker". That's their answer. Use volumes to map some other storage, but then you have to have some way of mapping storage to containers outside of Docker. Now you're really stuck.
Containers are awesome, but unless your product doesn't do work, you'll need to store data at some point. And that's when the magic stops.
It also does not link containers, instead opting to attach the database to the first IP address of the network Docker sets up, thereby avoiding the need for complicated service discovery. It also includes instructions on how to deploy Redis on the same box and use that with WordPress. Also includes instructions on how to do SSL for each site. It's being used in production.
Containers are only going to grow in uptake; companies like Weave and ClusterHQ have a very bright future if they can solve real pain points like the ones in this article.
I mean if your app needs the entire fucking OS to provide isolation from other apps, then you are clearly doing it wrong.
Docker could be much more successful in the Windows world, the ability to package very precise versions of databases, libraries, weird obsolete application into one image that can be deployed easily would be extremely helpful in many companies. It would be the wrong solution, but an easy work-around for broken upgrade paths.
Having containers able to package weird obsolete (unpatched) applications, specific (out-of-date) versions of libraries, and poorly-written homespun code is a recipe for exploits. The out-of-date version of the library (e.g. Java 7) likely has exploits out in the wild that have been patched in more recent versions. The weird obsolete application (e.g. DTS) likely not only has exploits patched in the active codepath, but has multiple bugs and integration issues. The homespun code likely reimplements something done better in another application or library, and introduces more bugs and vulnerabilities to the network.
Sorry for going off on this, but being able to repackage unsupportable applications would be a nightmare in places I've worked before.
Unfortunately full App-V is only for Windows enterprise customers.
Prove it. I'm not saying it's impossible, but it's certainly not trivial.
Also, take a look at what Joyent are doing with Triton.
It is also worth mentioning that since Joyent has implemented their own docker client, not all features are there yet. Last time I tried, docker-compose didn't really work right yet. There is a full list of divergences on their github page. It has a lot of potential though.
Not our own docker client, our own Docker engine, https://github.com/joyent/sdc-docker , which was necessary for the whole DC to be the host. For a taste of the details see https://www.joyent.com/developers/videos/bryan-cantrill-virt... .
Your larger point is correct; we're still working hard every day to increase the support, particularly for the newer docker APIs and extensions. docker-compose 1.2 now works in the production datacenters, with docker-compose 1.3 in the east-3b (beta) DC.
VMs have the advantage of shielding the kernel with a hypervisor, but they also have the disadvantage of lots of complicated driver code that can allow exploits such as VENOM.
see: bocker
Build, test, and ship the same artifact, whether it's a Vagrant box on your Mac, AWS, or metal in your colo datacenter.
>and why cant you just run cgroups without the overhead of docker ?
If you're running cgroups yourself, then by the time you've given yourself a reasonable API to work with, you've created your own half-baked implementation of Docker. That might make sense if you're Google, but otherwise probably not.
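For a taste of what "running cgroups yourself" looks like (cgroup-v1 paths, assuming the memory controller is mounted in the usual place):

```
# Make a group, set a memory limit, move the current shell into it
mkdir /sys/fs/cgroup/memory/myservice
echo 512M > /sys/fs/cgroup/memory/myservice/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/myservice/tasks
```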
Every service I deploy gets its own VM (which is automatically provisioned/locked down by a bash script), and they automatically update when a new revision is pushed to our production git branch.
It seems that docker is more useful when you have physical hardware? and/or lots of under-utilized infrastructure?
I'm frustrated though because I keep pinging them about adding branch information to their (dockerhub) webhooks so I can actually deploy environments via branches... It's crazy vital in my opinion and seems like it should be an easy fix, but 2 months later it still doesn't seem to be scheduled in.
Nevertheless, I'm sure Docker has its technical shortcomings but really, I wouldn't say it's not succeeding.. it's just young. Adoption takes time.
That said, what we do is we have our CI system build our docker images, push them to dockerhub (private registry) if the tests all go great, and then we deploy using https://github.com/remind101/deploy. We also tag all our images with the git SHA that they were created from, so we have immutable identifiers for each image, which has been useful.
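That tagging step is just a couple of lines (registry and app names illustrative):

```
# The tag is the immutable commit the image was built from
sha=$(git rev-parse HEAD)
docker build -t registry.example.com/myapp:$sha .
docker push registry.example.com/myapp:$sha
```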
We just recently put direct github deployment support in Empire, so that's been really nice (before we had to use another service that pulled deployments and put them into Empire).
Anyway, not quite the workflow you're talking about, but it's really worked well for us, so maybe it'd help you as well :)
No one seems to know anything about it.
Also, when we upgraded from 1.6.3 to 1.7, devicemapper started having issues.
On top of the bugs, the limited networking support is very, well, limiting.
I would be very hesitant about using it in production at the moment. That said, I can also see the potential, and it seems to be heading in the right direction. It's just not ready at this moment.
Hmm, I don't think so. My reason is that, in addition to the maturation and feature growth of containers, there will also be feature growth in Puppet et al.
IMO any tool that does procedural run-time configuration like Chef/Ansible/Puppet will generally be inferior to an image-based infra management solution. (Unless you're using said tools to build images - which is another ball of wax that will likely end up looking like a reimplemented Docker.)
The problem with procedural run-time config is that unless you blow away the VM, build from scratch, and run a test suite you don't really have good assurances your infrastructure is in a good state. With images, you have a bit for bit copy of what was built and tested in CI or QA. This is, for us, worth the price of admission.
Reproducibility implies being able to regenerate the full container including software version control and visibility of the full dependency chain all the way down to BLAS and glibc! You can't do that by using apt, rpm, Perl CPAN, rubygems, Python pip and the like. None of these package managers have been designed for true isolation of packages and full reproducibility. That is why today people go with Docker. The shortcomings of these package managers drive people to Docker.
The technology for regenerating exact Docker containers exists in the form of GNU Guix and/or Nix packages. The fun fact is that when using GNU Guix, Docker itself is no longer required.
Watch GNU Guix.
The article mentions that "most vendors still run containers in virtual machines", presumably since if someone hacks an app in a container they might be able to break out of the container and access other apps running on that host. But clustering systems like Kubernetes, CoreOS, AWS Container Service, etc. seem to be all the rage these days and they seem fundamentally at odds with this. The cluster might schedule multiple containers on the same host at which point somebody who hacks one can hack all of them.
How do you reconcile this? Do people running these clusters in production typically run tiers of separate clusters based on how sensitive the data they have access to is?
It becomes as simple as asking what name the cluster should have.
It also makes sense from managing resource concerns to some extent, such as a cluster with cheap instances for low priority applications but need HA support or a cluster with beefy instances in a subnet that has fewer hops should be used for edge tier applications.
But that's as far as I will take it. Docker is mainly used (from what I've seen) as a nice way to package something without having to write an actual package (RPM/deb) that will work across multiple platforms (for the most part). If you take the time to learn how to properly package your application, docker is unnecessary in almost every case.
Throw in a database, a cache server, couple of versioned libraries your jar file needs, and more developers, and suddenly a reproducible image with all this packaged will make a lot of sense.
I have a database and a cache server. They don't run on the same server as the application jar... they run on separate machines tuned to their purpose. Why would I want them packaged together? So my team doesn't have to run "apt-get install postgresql" on their dev machines? Or to maintain an exactly consistent dev environment?
I'll highlight that the website is off by one... Reading the website, I have no idea how it works or what technical debt I'm adding to my team's stack by using it. "Build/Run/Ship"? I'm doing that already. I have no idea if it's using VMs or something else for containers, no idea if my hardware works with it, and no idea if the distros used for images are a year old or -nightly, so whose security issues am I inheriting?
Also, moving around a 700MB+ image when you can deploy a Debian package (or even set up a virtualenv; I do mostly Python) sounds like a waste of resources. Add to that the fact that moving volumes around is still an issue and... well, Docker has a lot of potential, but it doesn't fit very well in any of the projects that I'm involved in.