Docker: The good parts (blog.shrikrishnaholla.in)
123 points by shrikrishna on Jan 13, 2014 | 67 comments

It is pretty cool, but I'm starting to see a lot of comments like this about the Dockerfile:

> I don’t need to worry about the version of node, nor of the dependencies nor anything else. If it’s worked for them, it’ll work for me. As simple as that!

This isn't true as far as I can tell. The Dockerfile will have a series of lines like this:

    RUN apt-get install x
    RUN apt-get install y
    RUN apt-get install z
    RUN echo "config line" >> /etc/config.conf
which does not guarantee success any more than a makefile on a clean install would. Those versions can change, bugs can be introduced or features changed, and the config file location or type can change. If you build it today and I build it tomorrow, we could end up running different images.
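
To make the concern concrete, here is a minimal sketch of a check that flags `apt-get install` lines with no explicit version. It assumes the `package=version` pinning syntax that apt-get accepts; the function name and the sample input are hypothetical, and it only looks at single-line `RUN` instructions.

```python
import re

# Captures everything after "apt-get [flags] install" on a RUN line.
INSTALL_RE = re.compile(r"apt-get\s+(?:-\S+\s+)*install\s+(.*)")

def unpinned_packages(dockerfile_text):
    """Return package names installed without an explicit '=version' pin."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        match = INSTALL_RE.search(line)
        if not match:
            continue
        for token in match.group(1).split():
            if token.startswith("-"):        # skip flags such as -y
                continue
            if token in ("&&", ";", "\\"):   # stop at the next shell command
                break
            if "=" not in token:
                unpinned.append(token)
    return unpinned

example = """
RUN apt-get install x
RUN apt-get install -y y=1.2.3-1
RUN echo "config line" >> /etc/config.conf
"""
print(unpinned_packages(example))  # ['x']
```

A pin only helps for as long as the repository still serves that exact version, so this narrows the problem rather than solving it.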

I'm aware that this file is to generate a "run anywhere" image, but I worry people might be treating it as a huge step on from installation scripts when it's very similar. The image part afterwards, however, is a huge step onwards.

Spot on. Once an image has been built, it's safe to say that it can be moved around and run even after a lot of time has passed. The build process is another story: even though you have the steps (the logic) for creating the image (the result), you still need to control the package sources (the input data) to get an image that works correctly. I think the build process maintains its "run anywhere" property, but it doesn't maintain its "run anytime" property.

I've been doing a lot of research lately into more deterministic dependency management and reproducible builds by leveraging hashes via git.

You might be interested in MDM[1], which is a general-purpose dependency manager for binary blobs.

Specifically for container images, you also might be interested in hroot[2] -- it separates the concept of the image and transport out from the containerization system.

I agree wholeheartedly that it's the image permanence that's the interesting part about containers right now. In the last 24 hours I actually had an experience where a docker setup full of apt-get's failed to reproduce an image (new deps were added upstream that broke the system). Fortunately with hroot, I had the exact filesystems I had previously produced in a permanent, transportable system, and all covered by a hash so my production system could fetch exactly the correct version. I could have done this all manually with tars, but that's a pain for nontrivial use cases, and I could have done it with a docker registry, but I'm too much of a security nut to use the public one, and I already have git infrastructure set up, so it's actually easier to use that than try to spin up a private docker registry and secure it, etc.
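
The "covered by a hash" idea can be sketched generically. This is not how hroot or git store things internally; it only assumes that a digest of the exported filesystem was recorded at build time and is checked again before deploying. The function names and the sample bytes are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest that identifies one exact sequence of bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_image(data: bytes, expected_digest: str) -> bool:
    """True only if the fetched bytes are exactly what was recorded."""
    return sha256_hex(data) == expected_digest

# Stand-in for an exported filesystem tarball.
image_bytes = b"pretend this is an exported filesystem tarball"
recorded = sha256_hex(image_bytes)  # stored next to the deployment config

print(verify_image(image_bytes, recorded))                # True
print(verify_image(image_bytes + b"tampered", recorded))  # False
```

Because the digest depends only on the bytes, it does not matter which transport or mirror delivered them.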

[1] http://github.com/polydawn/mdm [2] http://github.com/polydawn/hroot

Why would you use apt-get with an upstream repository? Deployment 101 is to set up your own local repo mirror so that you control exactly what binary objects get deployed.

Dockerfiles are just "run anywhere", not "run any time". Maybe I should have made it clear. The creator still needs to maintain his Dockerfile. Maybe http://blog.docker.io/2013/11/introducing-trusted-builds/ will make things easier.

I think the level of detail you have worked fine for the writing style, my comment is somewhat nitpicky. Your statement is generally true, if they use the dockerfile they posted then it's extremely likely to work for you too.

It was mostly a clarification for people reading that the dockerfile doesn't guarantee repeatable builds.

Thanks for the post :)

Hi, Docker author here.

This is a perfect example of how we're trying to design Docker: by looking for the right balance between evolution and revolution.

Evolution means it has to fit into your current way of working and thinking. Revolution means it has to make your life 10x better in some way. It's a very fine line to walk.

I think a lot of bleeding edge tools sacrifice evolution because it involves too many compromises - there's a kind of "if they don't get it, their application is not worthy of my tool" mentality, and as a result the majority of developers are left on the side of the road. I see several tools named in this thread which suffer from this problem, and as a result will never get a chance to solve the problem at a large scale.

In this example of build repeatability, "evolution" means we can't magically make every application build in a truly repeatable way overnight. However, we can frame the problem in such a way that lack of repeatability becomes more visible, and there's an easy and gradual path to making your own build repeatable.

Sure, you can litter your Dockerfile with "run apt-get install" lines, and that does partially improve build repeatability: first with a guaranteed starting point, second with build caching, which by default will avoid re-running the same command twice. Your build probably wasn't repeatable to begin with, and in the meantime you benefit from all the other cool aspects of Docker (repeatable runtime, etc), so it's already a net positive.
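
A toy model of why caching helps on one machine but not across machines. It assumes a cached step is identified by its parent layer plus the instruction text; it is a simplification for illustration, not Docker's actual implementation, and every name in it is hypothetical.

```python
import hashlib

class ToyBuildCache:
    """Toy model: a step is identified by its parent layer plus its instruction text."""

    def __init__(self):
        self.layers = {}    # cache key -> result produced when the step really ran
        self.executed = []  # instructions that were actually executed

    def build(self, base, instructions, run):
        parent = base
        results = []
        for instruction in instructions:
            key = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
            if key not in self.layers:  # cache miss: really run the step
                self.layers[key] = run(instruction)
                self.executed.append(instruction)
            results.append(self.layers[key])
            parent = key
        return results

upstream = {"x": "1.0"}

def fake_run(instruction):
    # Stand-in for running the command: "installs" whatever upstream serves now.
    package = instruction.split()[-1]
    return f"{package}={upstream[package]}"

steps = ["RUN apt-get install x"]
cache = ToyBuildCache()
first = cache.build("ubuntu", steps, fake_run)
upstream["x"] = "2.0"                                     # upstream publishes a new version
second = cache.build("ubuntu", steps, fake_run)           # same machine: cache hit
fresh = ToyBuildCache().build("ubuntu", steps, fake_run)  # other machine: no cache
print(first, second, fresh)  # ['x=1.0'] ['x=1.0'] ['x=2.0']
```

The second build replays the cached result, while the build with an empty cache picks up whatever upstream serves at that moment.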

Later you can start removing side effects: for example by building your dependencies from source, straight from upstream. In that case your dependencies are built in a controlled environment, from a controlled source revision, and you can keep doing this all the way down. The end result is a full dependency graph at the commit granularity, comparable to nix for example - except it's not a requirement to start using docker :)

Hi, nice to speak to you.

I agree, this is the right way to go about it. Someone with a nicely repeatable build can go ahead and get that with docker too; someone without one still gets a nicely distributable image. Docker seems to have taken off quickly as there's a benefit very soon after you start using it, and very little to get in the way of you having something running.

There's an issue in that people see the claims of one part and think they apply to the whole (I don't think the poster thinks that, but people reading it might get that impression), but this is a problem of education, not a technical one.

The dependency issues are solved if you reuse a given docker image; as you said.

Yes, the procedure of generating the image (Dockerfile) is basically a glorified installation script.

But I don't understand why you use the properties of the image creation tool to refute runtime properties of docker images.

EDIT: It's as if, when discussing the properties of a perfect, headache-free binary package management system, you mentioned that the code is still not guaranteed to be the same because two builds of the same package could be slightly different. The purpose of the binary packaging scheme is to use the built artifact. Repeatable builds are a secondary goal.

> But I don't understand why you use the properties of the image creation tool to refute runtime properties of docker images.

I'm not, my concern is that the language used suggested that having a Dockerfile meant that if it builds for one person it would build for all when this isn't the case.

That's not a problem with docker, as it's not something docker is trying to (or claiming to) solve. I'm worried that some people might think it is, so thought I'd post here to clarify things.

This is solvable by using a more deterministic system like Nix as the base build box.

Or by using a package manager that lets you specify exact versions of the artifacts, like npm or maven.

or apt-get...

For that to work you'd have to have your own repository with all the packages you install. Old specific versions will not be preserved in the upstream repos if a new package is released.

Right, but that's not really a problem, is it? My understanding is that committing dependencies is strongly discouraged -- if you're writing libraries or modules intended for distribution and others' use. On the other hand if you're deploying a standalone app, locking down any specific dependencies' versions and committing them is actually considered a best practice, as it's the only way to be sure it won't unexpectedly change. Right? I'm genuinely interested in others' thoughts and experiences here.

The problem is that you're now forcing all of your users to use the library you bundled. First, this bloats the system, since you now don't share dependencies between applications. Second, it makes the system more complicated because there are multiple library copies floating around the filesystem. Third, it means that when you really do need to change a library/dependency, you can't just update the libfoo package, you have to update every application.

In my experience (Linux on the desktop), bundling dependencies is an indicator of poor software quality, but I realize that the situation may be different on other systems.

In this case, I think that the solution is to use the distro's package manager to pin minor versions and rely on the distribution's updates for security fixes. Hosting your own repo is a bad idea, since it means you won't get any software updates. Software updates are really important -- they have security patches and bug fixes. If you're worried about the distro changing something from under your feet, you should pick something more stable. Debian or CentOS are good choices.

Ok, but here it seems like you're referring to "[my] users" as if they were consumers of the library -- that is, as if they were developers. That's still my first case in which I agree committing / bundling dependencies is a bad idea. I'm talking about the 2nd case, of locking down dependencies used to build an application whose "users" are not even aware of the presence of dependent libraries nor the structure nor configuration of the underlying software. In this second case, isn't [freezing/shrinkwrapping/committing/bundling] the dependencies generally considered the right thing to do?

Of course, updating the dependencies (whether for patches or more consequential updates) in subsequent releases is fine... but within a given release cycle, one otherwise runs the risk of unexpected inconsistency, between e.g. a developer build and a CI build done a short time later.

I think this discussion illustrates the problem and might be more articulate than I'm being: https://github.com/bower/bower/pull/538

Thanks @hdevalence -- and anyone else who cares to comment. :)

That is just basic common sense. It is easy to mirror upstream repos that you need. Don't allow your systems to have dependencies that are outside of your control.

If run on the same machine, I think it will install the same version every time... won't docker cache the change and just play it back?

Not if there is a non-cacheable command like "apt-get update". And if the subsequent command is something like "apt-get install nodejs", then two builds can potentially end up with two different versions.

Sandboxed applications that take care of all of their dependencies are a no-brainer for me. Amazingly, I see quite a lot of focus on separating out all of an application's dependencies into separate containers and then linking them. I feel like in most circumstances that is not taking advantage of Docker. The exception is if you have a huge amount of time invested in learning Puppet/Chef or whatever, have nothing to do except play around with Puppet/Chef configs all day (it's your only job), and are looking for a reason to keep using them with Docker. I think that is why some people are using links when things would run just fine, and would be simpler, if everything lived inside one container.

By "linking them" what do you mean exactly? Building container hierarchies using "FROM"? Exposing services through ports? Exposing resources through volumes?

If your application is simple, then sure, you can get away with almost any deployment and provisioning approach and it'll work "well enough". But these linking capabilities and products like Chef exist for more complex scenarios, and it would do you well to investigate the rationale behind them before being so dismissive.

I currently have a requirement to run 100s of applications provided by mutually untrusted 3rd parties, and co-ordinate startup/shutdown (for backup) and RPC access to these applications.

I need to be able to start an arbitrary combination of these applications on a node, depending on load (I cannot foresee the bandwidth/CPU requirements of each application without running it, and it will change unpredictably over time, sometimes to the point where a 1Gb/s link will be saturated by a single application for a few hours, and then change again to a trickle).

Sometimes I need to start multiple copies of this infrastructure for independent services that I may need to bring up/down independently.

In my scenario, using Docker alone to deploy the whole caboodle is not a maintainable solution. Using Docker to package the untrusted applications and selectively expose just the volumes for backup and a single port to just the host control process (keeping applications from talking to one another), and Chef to deploy/undeploy applications to nodes in arbitrary and constantly varying configurations that automatically rewire themselves is very maintainable.

The way I use these tools:

- Chef/Puppet/etc. = infrastructure deployment and configuration management

- Docker = application deployment and confinement

This separation is useful because the operations people can do their job, and the developers can do theirs, without stepping on each other's toes and with minimal co-ordination. If you do everything in Docker, the ops team has a nightmare managing change in complex applications; if you do everything in Chef, your developers suddenly have to become Chefs, which is overkill and will waste time co-ordinating with the ops people.

My example above is child's play compared to what some organisations need to deploy and manage.

By linking them I meant using Docker's links feature. http://docs.docker.io/en/latest/use/working_with_links_names...

I have seen people discussing using that link feature where it seemed like they were just setting up a database or something for a single application and then linking that database container. It seemed like it would be easier to set up the database in the same container, if possible.

I wasn't saying you can't use Puppet or Chef, just commenting on that particular case with using links for things like database dependencies for a single application.

The use case you describe obviously is not something you would try to manage with Docker alone.

From what you are saying it sounds like you have a good solution.

One thing that I remembered when you mentioned "requirement to run 100s of applications" was this new devops tool called Salt (saltstack.com). I actually don't know much about it but it sounded a little bit like what you are talking about. What do you think of Salt compared to Chef?

One advantage I see is, for example, the possibility of upgrading the application server while leaving the db running. You can then switch to the new app server using nginx, making the process transaction-like (it either completes successfully or fails completely).

I have never used Puppet/Chef, so I can't speak for them, but part of the reason why sandboxing in Docker isn't overkill is that it uses a layered file system that shares as much as can be shared; so, although the containers are isolated in user space, they still share the same base. This is one of the reasons for its high performance.

Not sure I understand what you are saying. I know that Docker uses AUFS. I'm not saying sandboxing in Docker is overkill. I'm saying that using separate containers for application dependencies rather than running them all in the same container is often making things more complicated than necessary.

Obviously some people have good reasons to use links, like they need to run lots of databases on different servers or something. But for most installations that don't need to scale to serve millions of people, putting all of the application dependencies in one container makes a lot more sense.

Oh! It seems I misunderstood your comment. My apologies. Yes, it makes sense to use one container for all the dependencies pertaining to an application. However, if you are running multiple applications on the same server, you can sandbox them by running them in different containers.

I've settled on running 5 or 6 containers rather than one big one with 5 or 6 processes because:

- I can host some containers on other physical hosts - I don't need to keep thinking (what if I fill up the biggest droplet on Digital Ocean)

- If one of the services dies or needs kicking - 5/6 of the stack is unaffected

- Orchestration is really only a matter of network endpoints getting written to environment variables - not too taxing
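
A minimal sketch of the "endpoints in environment variables" approach, assuming each service address is exposed as a `tcp://host:port` value. The variable names, addresses, and the format itself are assumptions for illustration; the comment above does not say which convention is used.

```python
from urllib.parse import urlparse

def endpoint_from_env(env, name):
    """Parse NAME=tcp://host:port from an environment mapping into (host, port)."""
    parsed = urlparse(env[name])
    return parsed.hostname, parsed.port

# Hypothetical variable names and addresses, for illustration only.
env = {
    "DB_PORT": "tcp://172.17.0.5:5432",
    "CACHE_PORT": "tcp://10.0.0.7:6379",
}
print(endpoint_from_env(env, "DB_PORT"))  # ('172.17.0.5', 5432)
```

In a real process you would pass `os.environ` instead of the sample dictionary, so moving a service to another host only means starting the dependent containers with different values.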

Running everything in one container also has its advantages - like being able to push the whole stack as one image.

Completely off-topic nitpick, but reading this font made my head hurt. I didn't manage to read till the bottom, even though I'm interested. Could I suggest a font that has the same x-height for each character?

Irony is that I chose this theme purely for the typography.

I guess it looks better on your machine. This is what it looks like on Linux for me: http://i.imgur.com/JRHzYp7.png

Extremely annoying to read.

That looks really bad. I use Linux as well, but hadn't checked the rendering in Firefox. Have to change it. Pity, it looked good in Chrome

This looks like the font file you're using lacks some rendering hints (all points that touch the median line should be annotated as such, it looks like they aren't).

Another version of the same font or the same font from another source may work. Webfont sites often try to tweak the hinting tables, they're trying to make the fonts look better but it breaks stuff all the time.

It's weird, it looks good on my machine, using FF25 on Ubuntu.

It's nowhere near that bad, but it doesn't exactly look great on Chrome OS X.

And again in Chrome on Win 8.

This is how it looks for me. http://i.imgur.com/9wNxAZM.png

Chrome Version 32.0.1700.72 m on Windows 8.1

Similar on android

Looks OK to me on android - http://i.imgur.com/JqV9bK0.jpg

Changed the theme to CleanPress. Hope it resolves the issue

Here are three instances where Docker made my life way easier:

Docker as an educational tool can be pretty powerful. One of the most annoying parts of CS courses is the initial install/configure/dependency wrangling you have to do to install required applications in whatever courses you happen to be taking that semester. Since courses may have different and conflicting requirements, just preparing your machine to use for coursework can be a nightmare.

Docker solved this problem for me as a student, and I can imagine it being solved easily for others if professors would just latch on to it and provide Dockerfiles for their students. Sure, OS X and Windows users may have the initial hassle of setting up VirtualBox or whatnot, but I think the trade-off is worth it. And when the course is over, there's no longer a lot of development software sitting around your hard drive that you may never use again. Take any source you developed, the Dockerfile you used, throw it all in a repo and then you can easily replicate the build environment if you need it later on.

As a developer, I use Docker to replicate "large scale" deployments on my own machine. Typically this is just a database container, a nodeJS server container, and a container for my web application code. However, as an exercise I've spun up a container with NGINX to act as load balancers for multiple running instances of my webapp container. It was simple, repeatable, and can be easily replicated on production servers.

Finally, onboarding of new developers becomes MUCH simpler with Docker. I developed bash scripts to quickly spin up containers for development and production workflows. So onboarding new developers to my codebase is fairly easy. I distribute the source code of the project, a repo that contains Dockerfiles and bash scripts, and a small readme. Developers are typically up and running in less than an hour, regardless of their operating system of choice.

I completely associate with what you said about using Docker as an educational tool. I have my projects littered around, and half of them might not even run anymore. Might be good to build a tar and archive them somewhere.

Sure. And if Docker was leveraged to create an easily reproducible build environment, you'd have a fairly good chance of all that source running in that container again at a later date.

I have been using docker as an application sandbox for a while now. It seems like a breeze! Instead of concentrating on setting up and resolving dependencies, I can concentrate on the development. That is the best take away for me from docker!

I've been using Docker as a sandbox for an online contest judge. I'm working to add SELinux into the Docker container as well. So far, it's been working great for me.

I'm not sold on using Docker for development though. I haven't attempted to set up multiple Vagrant machines so maybe that's why I'm not seeing the value, but setting up single dev machines through Vagrant is just so simple and straightforward.

This may be slightly OT, but could someone explain what etcd does? CoreOS hosts containers, and an etcd instance runs on the host (master) and in each container. Is that it?

Let's say I'm on a VPS and I'm running multiple instances of CoreOS each hosting multiple containers. Can etcd be used in this case?

Yes, etcd can be used for a few different use-cases on CoreOS clusters. You can do service discovery by "announcing" each container to the cluster. With an etcd watch, any process on the cluster could be listening for changes to your service discovery keyspace, and take any action you specify (Reconfigure proxies, etc.)

Etcd also provides distributed locking for the cluster through a module. If you need to prevent an action from happening more than once, you can take a lock on a specific key to prevent others from processing that item.
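
The locking pattern can be sketched with an in-memory stand-in. This is not the etcd client or its lock module; it only illustrates the "create the key if nobody holds it" idea within a single process, and the class, key, and worker names are hypothetical.

```python
class ToyKeyStore:
    """In-memory stand-in for a key-value store with create-if-absent semantics."""

    def __init__(self):
        self._owners = {}  # key -> current lock holder

    def acquire(self, key, owner):
        """Take the lock only if nobody holds it; return whether we got it."""
        if key in self._owners:
            return False
        self._owners[key] = owner
        return True

    def release(self, key, owner):
        """Release the lock, but only if the caller is the current holder."""
        if self._owners.get(key) == owner:
            del self._owners[key]
            return True
        return False

store = ToyKeyStore()
print(store.acquire("jobs/42", "worker-a"))  # True: first taker wins
print(store.acquire("jobs/42", "worker-b"))  # False: already being processed
print(store.release("jobs/42", "worker-b"))  # False: not the holder
print(store.release("jobs/42", "worker-a"))  # True
print(store.acquire("jobs/42", "worker-b"))  # True: lock is free again
```

What a distributed store adds is that the check and the write happen as one atomic step across all machines in the cluster, which a plain dictionary cannot give you.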

Related to locking is the leader election module which offers an easy way to choose a new leader for a distributed service. Module docs are here: https://github.com/coreos/etcd/blob/master/Documentation/mod...

etcd[1] is a distributed key-value store with some interesting properties for making things like locking mechanisms. It's like zookeeper[2] except it uses the raft consensus algorithm[3][pdf] and is written in Go.

[1]: https://github.com/coreos/etcd

[2]: http://zookeeper.apache.org/

[3]: https://ramcloud.stanford.edu/wiki/download/attachments/1137...

We use docker for on-demand temporary PostgreSQL databases (for testing). Being able to get a clean db in 1 second is pretty neat. In addition to the docker stuff, there's another process that destroys containers over X minutes old.
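
The cleanup side of a setup like this can be sketched as a pure selection step. It assumes the reaper already has a list of container IDs with their creation times; how that list is fetched and how containers are removed is left out, and the IDs and the 30-minute limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def expired_containers(containers, max_age, now=None):
    """Return IDs of containers created more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return [cid for cid, created in containers if now - created > max_age]

# Hypothetical container IDs and creation times.
now = datetime(2014, 1, 13, 12, 0, tzinfo=timezone.utc)
containers = [
    ("pg-test-1", now - timedelta(minutes=45)),
    ("pg-test-2", now - timedelta(minutes=5)),
]
print(expired_containers(containers, timedelta(minutes=30), now=now))  # ['pg-test-1']
```

Keeping the age check separate from the actual removal makes the policy easy to test without touching any running containers.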

That is awesome.

I like the idea of a sandboxed application, but I worry about the security implications - what happens if there's a security fix, but the creator of the Docker version is AWOL?

Docker has a facility to build containers automatically from source. As long as you have access to the source and it has a Dockerfile, you can rebuild the container itself after making all the changes you want (including security fixes).

Your Google Analytics is showing: http://imgur.com/D8vWM3k


Thanks! Fixing it

Cute headline :) (at least I think so, as a node JavaScript developer). All fun aside, I love docker (and previously vagrant) as an on-demand mobile back end for native iOS and Android dev. It works isolated and disconnected, and can be deployed when I need to stage for reviews. It's great!

This is likely very naive. But can you elaborate on how you use it for this?

Am just starting to explore Docker.

I am a node js developer as well. That's where I got the idea! :D

Except that Douglas Crockford was talking about Javascript which (at the time that book was published, and perhaps even now) was viewed as a steaming pile of crap. So he wrote a book that highlighted "the good".

Docker is a pretty modern piece of tech that has very little wrong with it, and seems to work exactly as designed. So you are kind of shoehorning the reference...

While I think the bad parts of Javascript are far more bad than the bad parts of Docker, I think it's fair to call out the good parts of Docker in such a way that the reader walks away and starts thinking "well, okay, but I'm guessing there are some bad parts too - now what are they?".

I'm a pretty smart person; I've pored over the Docker documentation, run through the interactive tutorial twice now, and I still don't have a great sense of 1) what it really is for, 2) how to really use it, or 3) how to handle slightly non-trivial use cases.

For #3, reading through a few examples online of how to get MySQL or Redis up and running in a container... honestly makes my head hurt. And those represent just one or two parts of the system I'm thinking could some day run on Docker - I don't have the time or patience right now to figure out how to get Nginx, Node.js, Redis, MySQL, RabbitMQ, and a few other things here or there - stitched together into a dockerfile.

If I had to make a guess at what the "bad parts" of Docker are, it's that it's complex and not all that understandable - yet. Maybe there aren't that many things that are technically wrong with it, but at this point, it's pretty painful to wrap one's mind around (IMO), and at least I personally think that's a "bad" part.

Docker feels like one of those things that is going to be indispensable and incredibly useful in a year or two. I'm certainly keeping my eye on it, but I'm staying away from the diving board for now.

I'll be more than happy to work with you 1:1 to get clarity. Your medium of choice.

That goes for anyone else reading this. I love answering questions and helping people. It's like pure bliss, so don't worry about being a bother.

That would be incredible - would you be willing to shoot me an email? rringham@letsgohomeapp.com

I'd be happy to take whatever I learn and apply to my app and contribute it back to the community, maybe as a quick tutorial up on GitHub, or something along those lines.

Sure thing. And for anyone else interested, email nick @ docker . com or say hi to keeb on Freenode

Docker has chinks in its armour as well. There are some pretty irritating issues like #643 and #1171. Anyway, the naming wasn't done with that intention. I agree that the context just isn't the same.

Those are not really Docker issues; they are upstream Linux issues. 42 layers of aufs is probably a sane limit, and the NFS one looks at a glance like an obscure NFS bug...

Docker and LXC look interesting. What is your experience of the performance overheads of using containers?

In that sense, we found that all container-based systems have a near-native performance of CPU, memory, disk and network. The main differences between them lies in the resource management implementation, resulting in poor isolation and security. [1]

[1] http://marceloneves.org/papers/pdp2013-containers.pdf
