I don't see the parallels to Git at all. If you just looked at the command line options without understanding what's going on at all it might look similar.
> a tool like Puppet or Chef is needed when you have long-running VMs that could become inconsistent over time.
Uh, no, Puppet and Chef are designed for configuration management. To manage your configs. They are not designed to replace customization and they don't address package management or service maintenance. (They have options to munge these things, but you still need a human to make them useful and coordinate changes with your environment) Neither does Docker. Docker also doesn't do configuration management. All different things.
> You'll be 100% sure what runs locally will run on production.
Incorrect; you're using unions to fill in the missing dependency gaps, so there's no guarantee what was on your testing container's host is on your production container's host. Your devs also might be running with container A and B, but in production you're using containers A and C. Not to mention the kernels may be different, providing different functionality. All this assuming A, B and C don't need instances of different configuration. There are no guarantees.
You know what else is crazy fast and easy to manage? Packages. There's this new idea where you can have an operating system, run the command 'apt-get install foobar', and BAM all of a sudden foobar is installed. If you need a dependency, it just installs it. And it only downloads what it needs. Also does rollback, transactions, auditing, is multi-platform, extensible, does pre-and-post configuration, etc. Sound a lot like docker? That's because it's a simpler and more effective version of docker [without running everything as a virtualized service].
Deploy using your package manager. Except for slotted services which (AFAIK) no open-source package manager supports, it will do everything you need. And what it doesn't do, you can hack in.
Package managers are okay mechanisms for installing software, but really limited for configuration management. For example:
- I need to configure 50 virtual hosts, each containing a different set of web content - should I make 50 separate packages? or make one meta package?
- I need to install a java web application, and configure a database.properties file with a JDBC URL that might vary based on environment
- I have a set of cron jobs that I need to be configured based on the application sets that I have installed, but a different set that I need to be consistent across all systems. Now I need to build logic that figures out whether any of the cron jobs configured match jobs that already exist, or run the risk of them running more than once.
- I need to install tomcat 15 times with slightly varied configurations.
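To make the second bullet concrete, here is a minimal Python sketch of rendering database.properties from per-environment metadata. Everything named here is an assumption for illustration: the environment names, the hostnames, the PostgreSQL-style JDBC URL, and the `render_properties` helper are hypothetical, and a real configuration management tool would fetch the metadata from a central source rather than a hard-coded dict.

```python
from string import Template

# Hypothetical per-environment metadata (assumption: normally this would be
# retrieved from a central store, not hard-coded).
ENVIRONMENTS = {
    "dev": {"db_host": "localhost", "db_port": 5432, "db_name": "app_dev"},
    "prod": {"db_host": "db.internal.example", "db_port": 5432, "db_name": "app"},
}

# Hypothetical template for the one line that varies between environments.
PROPERTIES_TEMPLATE = Template(
    "jdbc.url=jdbc:postgresql://${db_host}:${db_port}/${db_name}\n"
)


def render_properties(environment):
    """Return the database.properties content for one environment."""
    # substitute() raises KeyError if a value is missing, so an incomplete
    # environment fails loudly instead of producing a half-filled file.
    return PROPERTIES_TEMPLATE.substitute(ENVIRONMENTS[environment])
```

The same package can then ship to every environment, and only the rendered file differs.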
Now, you can say that some of these things are not relevant in development environments, or that you can do some of them with packages, and so on. But there are real advantages to using a config management tool to build your dev environment, and then when you're ready to move to production, using the same config management model to build that environment.
It's not that you can't make packages do most or many of the things you want. It's about using the right tool for the right job.
1. If you believe in the paradigm of package manager as deployment tool, making 50 different packages would be really simple and effective. One meta package would get clunky to use, but require less package maintenance (and instead require more configuration/maintenance on the target)
2. Make one package for the web application. You can put the database.properties as a post-install configure section if you already know what it should be, or have it run a script on the target that loads the correct variables. Or make separate packages just for the database.properties. Again, it comes down to where you want to do maintenance.
3. This one isn't too difficult. Use /etc/cron.d/ and name your cron jobs uniquely based on what they do. Then make packages however you want. Even if multiple packages deliver the same cron job (more than once), you're just overwriting a file that already exists, so no duplicates. If that causes a conflict you can deliver to a temp location and have a %post section test if the target cron file already exists, and delete the temp file.
4. This is where slotted services (admittedly a term I made up) come in handy. No package managers really deal with this properly, which is where a completely virtualized service becomes a much easier way to handle it. But you can still install a chroot directory and run the service from it using just the package manager, no deploy tool required. Optionally, you can also build a set of packages and dependencies and install them to version-specific directories, and set up directories of symlinks to the versions you want, and target your application at the symlink-directory-tree that matches the versions you want to run. It can be a hassle if you don't have a tool to do it for you. I think there are some existing open source tools designed to do this, but I haven't used them.
(That was for running 15 different versions of tomcat, by the way. If you just want to run tomcat with different configurations, just make your configs and run tomcat for each instance! The scripts that ship with tomcat already support instance-specific configurations; iirc, HOME_DIR was the base path, and INSTANCE_DIR was the instance-specific configuration path, or something similar)
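The version-specific directories plus symlinks idea from point 4 can be sketched roughly like this in Python. The `activate_version` helper and the directory layout are hypothetical, and the sketch assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```python
import os


def activate_version(versions_dir, version, link_path):
    """Point link_path at versions_dir/version, replacing any existing link."""
    target = os.path.join(versions_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"version directory not found: {target}")
    # Build the new symlink under a temporary name, then rename it over the
    # old one so the application never sees a missing link.
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link_path)
```

The application is then pointed at the link path, and switching versions (or rolling back) is a single call.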
Everything you mentioned is relevant to development. And I'm not trying to discount real configuration management. If anything, it's critical to use a real configuration management tool to manage large, complex sets of configurations across large orgs. Using a package manager is just an easy deploy tool for the configuration, but how you manage it is left as an exercise for the engineer.
(That being said: why don't deploy tools incorporate change management, persistent locking, user authentication, pre/post install hooks, and audit trails? We need more open source solutions that fulfill enterprise requirements)
1. This is debatable, but probably too long a discussion to have over HN.
2. Once you're "running a script on the target that loads the correct variables", you're doing configuration management. There needs to be a mechanism for retrieving the metadata from somewhere, usually centralized. You'd be better off delivering this through a CM tool to build and maintain the file.
3. This doesn't work. Not only do you have the conflicts to deal with, and a post-configure script is a hacky way to deal with them, but now if you need to change a single cron job, you have two unsatisfying options: make a new meta-package for that one cron job and overwrite the versions delivered by the five other packages, which will break any config validation you're doing (as one would hope you're doing), or update all of the packages that provide that cron job (which works, but now you have to come up with a way to notify your package that just in this case it shouldn't restart services and the like just because the package was deployed). On top of that, there's an even worse issue: how do you know when the last cron job is removed? If I have five packages that all create that file, either removing the first one removes it, which breaks the other four, or I have to come up with post-remove script logic that tries to programmatically determine if this is the last reference to that file and remove it only then. If I did the latter, my meta-cron job package update would break this model as well, and I'd have to remember to remove that specifically as part of uninstalling the other packages.
4. I guess this works, but now you're dealing with chroot'ed environments, which means deploying not just the specific stack you want, but all of the necessary libraries, and as you say, your original package manager idea doesn't really deal with this.
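The "last reference" problem from point 3 is essentially reference counting. A minimal Python sketch, assuming an in-memory registry (the `SharedFileRegistry` class is hypothetical and does not correspond to any real package manager's API):

```python
class SharedFileRegistry:
    """Track which packages deliver each shared file (hypothetical model)."""

    def __init__(self):
        self._owners = {}  # file path -> set of package names

    def install(self, package, path):
        """Record that a package delivers the file at path."""
        self._owners.setdefault(path, set()).add(package)

    def remove(self, package, path):
        """Drop one owner.

        Returns True only when no package references the file any more,
        meaning it is now safe to delete it from disk.
        """
        owners = self._owners.get(path)
        if owners is None:
            return False  # nothing recorded for this path
        owners.discard(package)
        if owners:
            return False  # other packages still need the file
        del self._owners[path]
        return True
```

Every post-remove script would have to consult something like this, which is the extra bookkeeping being objected to.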
And tomcat gets a lot more complicated too, when you're trying to manage shared XML resource files. In fact, the whole package manager notion really requires the "X.d" style approaches to loading files.
But all of these challenges are why package managers are the worst solution for deploying configuration files. Package managers are great for deploying static software, shared libraries, and the like. I'll even concede that dropping code on a machine is fine with a package manager. But they're not designed to deal with dynamic objects like config files and system configurations.
In fairness, there's a whole other third class of objects that currently both package managers and configuration management tools do terribly, and that's things that represent internal data structures - database schema, kernel structures, object stores, etc.
I've built a couple of configuration management tools and work for a company that has a few more, so this is something I've spent a lot of years working with. Package management as configuration distribution is attractive for simplicity, but falls apart beyond the simple use cases. Model-driven consistency is vastly superior.
3. If you consider base OS packages to be hacky... That's how most packages deal with potentially overwriting hand-edited configs. 'deliver foo.conf.new; [ ! -e foo.conf ] && mv foo.conf.new foo.conf'
But you have a good point! Dupe files are hard to manage. Some package managers refuse to deal with it, others have complicated heuristics. The best solution would be to just deliver the files and let a configuration management tool sort out what needs to be done based on rulesets. This can still be accomplished with packages as a deploy tool, and a configuration management tool to do post-install work, instead of the %post section.
4. You already deal with chroot environments using lxc/docker/etc. They're just slightly more fancy. But even with docker's union fs you still have to install your deps if they don't match the host OS's. Unless, of course, you package all the deps custom so they can be installed along with the OS ones. Nothing is going to handle that for you, there is no magic pill. Both solutions suck.
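For reference, the 'deliver foo.conf.new' idiom from point 3 looks roughly like this as a Python function. The `deliver_config` name is hypothetical, and like the shell one-liner it has a small window between the existence check and the rename.

```python
import os


def deliver_config(new_path, target_path):
    """Move new_path into place only if target_path does not exist yet.

    Returns True if the new file was installed, False if an existing
    (possibly hand-edited) config was left untouched.
    """
    # Same check as `[ ! -e foo.conf ]`: never clobber an existing file.
    if os.path.exists(target_path):
        return False
    os.rename(new_path, target_path)
    return True
```

When the target already exists, the `.new` file stays next to it so a human or a configuration management tool can reconcile the two later.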
Most configuration management eventually becomes a clusterfuck as it grows and gets more dynamic and complex. In this sense, delivering a static config to a host in a package is simpler and more dependable. I can't tell you how much more annoying it is to have to test and re-deploy CM changes to a thousand hosts, all with different configurations, only to find out 15 of them all have individual unique failures and having to go to each one to debug it. On the other hand, you could break out those configs into specific changes and manage them in VCS. Or even pre-generate all the configs from a central host, verify they are as expected, and package and deliver them. I have done both and both have their own special (read: retarded) problems.
For reference, the sites that I worked at that delivered configuration via package management spanned several thousand hosts on different platforms, owned by different business units and with vastly different application requirements. But you have to adjust how you manage it all to deal with the specific issues. Edit: much of it involves doing more manual configuration on the backend so you can 'just ship' a working config on the frontend. Sounds backwards but (along with a configuration management tool!) it works out.
> There's this new idea where you can have an operating system, run the command 'apt-get install foobar', and BAM all of a sudden foobar is installed.
Nah dude, that's never gonna fly. See, as mentioned in the top comment, no one uses and knows the OS they are developing for nowadays (though only windows users are looked down upon for some strange reason).
It is always more fun to not spend precious minutes reading about how to create an apt package, but reinvent the wheel for the hundredth time and implement another under-featured and buggy package manager in our current favorite language, and then add workaround after workaround to actually make it somewhat close feature wise to apt or rpm (i.e. usable).
> Docker includes git-like capabilities for tracking successive versions of a container, inspecting the diff between versions, committing new versions, rolling back etc. The history also includes how a container was assembled and by whom, so you get full traceability from the production server all the way back to the upstream developer. Docker also implements incremental uploads and downloads, similar to git pull, so new versions of a container can be transferred by only sending diffs.
I get it now. They support features which are comparable to features found in Git. Similar to how Apt and RPM are just like Git, because they also have the same features.
What I don't see is specifically git-like functionality, which would be incredibly useful for anyone who packages deployments of OSes, applications, configs, etc. For example, with Git you have a workspace that allows you to work with a tree of files, make changes, make specific commit logs for specific changes, merge, compare, search, etc. From what I see of docker, it's all just "how do I move my already-built containers around" functionality, and of course a shell-script (or specfile, or debian rules file) replacement called a Dockerfile.
There are lots of deployment solutions out there. What there isn't is a handy way to manage and track the assembling and customization of your various things-to-be-deployed, independent of the platform. A deployment tool that did that would become very popular.
> There are lots of deployment solutions out there. What there isn't is a handy way to manage and track the assembling and customization of your various things-to-be-deployed, independent of the platform.
Have you actually tried Docker? It does exactly what you describe.
Docker containers are versioned similarly to git repositories. You can commit them to record changes made by a running process; audit those changes with a diff. Unroll the history of any container to reconstitute how it was assembled, step by step. You don't get commit messages because typically changes are snapshotted automatically by a build tool - instead you get the exact unix command which caused the change, as well as date etc. This means you can point to any container, ask "what's in there?", and get a meaningful answer. In theory that would be true if 100% of all code deployed used rpms or debs. In practice that never happens because developers never package everything that way.
You can branch off of any intermediary image. This branching mechanism is used by the build tool as a caching mechanism: if you re-build an image which runs "apt-get install", it will default to re-using the result of the previous run. Uploading and downloading of containers takes advantage of versioning, so that you only transfer missing versions (similarly to git push and pull), and only store each version on disk once with copy-on-write.
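As a toy model of the caching behaviour described above (not Docker's actual implementation), each build step can be keyed by its parent image plus the command that ran on top of it. The `BuildCache` class and the image IDs below are hypothetical.

```python
import hashlib


class BuildCache:
    """Toy build cache keyed by (parent image, command). Not Docker's code."""

    def __init__(self):
        self._layers = {}   # (parent id, command) -> resulting image id
        self.executed = []  # commands that actually had to run

    def run_step(self, parent_id, command):
        key = (parent_id, command)
        if key in self._layers:
            return self._layers[key]  # cache hit: reuse the earlier result
        self.executed.append(command)
        # Stand-in for running the command and snapshotting the filesystem.
        digest = hashlib.sha256(f"{parent_id}\n{command}".encode()).hexdigest()
        self._layers[key] = digest[:12]
        return self._layers[key]

    def build(self, base_id, commands):
        """Apply commands in order on top of base_id; return the final id."""
        image_id = base_id
        for command in commands:
            image_id = self.run_step(image_id, command)
        return image_id
```

Two builds that share a prefix of steps branch off the same intermediary image, so only the steps after the first difference run again.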
A Dockerfile is a convenience for developers to specify exactly how to assemble a container from their source, independently of the platform. Each step of the Dockerfile is committed, and benefits from the aforementioned benefits.
Customization is a special case of assembly: just use a pre-existing container as a base, and assemble more stuff on top.
All of this can be tracked, managed and automated as described above.
> A deployment tool that did that would become very popular.
Your documentation seems limited to basic functions, with nothing explaining really what this does, or why I would need it. I've just spent like 20 minutes looking all over it and I have no idea what to do with it.
Thanks for your feedback, this kind in particular is very helpful and valuable.
We have a video explanation and walk-through of ShipBuilder in the works which should help communicate more clearly what ShipBuilder is and what it can do for you.
I have a few additional questions if you wouldn't mind helping me improve this aspect of ShipBuilder:
1. Have you used Heroku before?
2. Are you confused about the purpose of ShipBuilder?
(i.e. "what does ShipBuilder do?")
3. Are you confused about how to set up ShipBuilder?
4. Are you confused about how the client works?
Finally, please feel free to contact me personally; I'd love the opportunity to answer questions or help you (or anyone) get started with ShipBuilder.
contact info: #shipbuilder on irc.freenode.net or jay [at) jaytaylor do.t com
Look at how easy heroku makes your first few steps. I'd advise doing something similar but with, for example, instructions for setting up a quick shipbuilder server on amazon ec2 or the like. Bonus: it will show you where the pain points are for shipbuilder at the moment, because you'll have to write too many small gotchas&workarounds into the tutorial!
I think the main question that would help me would be "Why do I need Shipbuilder?" with different examples of when I would need it, how it compares to other tools, what it was designed to accomplish, all the system and network dependencies, and a breakdown of each individual component and how it works with other components. Mainly I want to be able to have an image in my head of how it fits into my network/system, what it might replace, or how I can take advantage of it.
How does ShipBuilder compare to other open-source PaaSes like http://www.cloudfoundry.com ? CloudFoundry's pretty far from production-ready at this point, but it does have the huge advantage of supporting existing Heroku buildpacks.
I haven't quite heard this perspective before. Thank you, I find it interesting!
I believe the use cases are orthogonal at best. If you want to distill it, just as packages are great for dependency management and application install (which, if you read many Dockerfiles, you'll see the common approach is to have the package manager do most of that work,) Docker is great at combining the technologies and providing 2 types of experiences.
1) Developer intent. It is up to the developer to specify that the application receives traffic on specific ports. That it is going to store persistent data in a specific location. That it should run as a specific user.
2) Fulfillment (sysops). This is a prod environment? Let's put that storage on a NBD instead of local storage. Need static port allocation? Map it at run time. Host based routing? Run time.
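One way to picture the split between the two roles is as two separate data structures that only get combined at run time. This is a hedged Python sketch: `ContainerIntent`, `Deployment`, and `resolve` are hypothetical names and do not mirror Docker's real data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContainerIntent:
    """What the developer declares (hypothetical structure)."""
    exposed_ports: tuple
    data_dir: str
    run_as: str


@dataclass
class Deployment:
    """What ops decides at run time (hypothetical structure)."""
    port_map: dict = field(default_factory=dict)  # container port -> host port
    volume_source: str = None                     # where data_dir really lives


def resolve(intent, deployment):
    """Combine developer intent with ops decisions into one run plan."""
    return {
        "user": intent.run_as,
        # A port with no static allocation maps to None (left to the runtime).
        "ports": {p: deployment.port_map.get(p) for p in intent.exposed_ports},
        "mounts": {intent.data_dir: deployment.volume_source},
    }
```

The developer's half never changes between environments; only the `Deployment` half does.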
I've found that the duality of the roles here can be quite powerful. And I believe it can only get better.
There was a grandiose tool at a previous place I worked that was built in-house as a universal deployment tool. It was based on SVN and Rpm with a MySQL backend and an HTTP API, and friendly console and web tools to manage it all.
You could build something as a developer and install it on a machine, and it could run multiple versions of the same application at the same time, including with different ABIs. There were build servers and all the build scripts were automated and vcs-managed. You could package config changes or applications. You could go back and rebuild old crap nobody had looked at in 3 years, and have it actually work. Ops and devs could both use it independently, with ops having the ability to overwrite dev changes. It was slightly clunky, but the functionality was beautiful.
Decentralized, distributed, automated, auditable, and able to support maintenance of pretty much any kind without interrupting existing services. It was fucking sweet, and I've never seen another tool that could match it.
It was built on a single "stack", but all of the code and technology involved was multi-platform. It would probably take a month or two to make it fully portable. But the RPM database it used was independent of the OS, and it provided all its own dependencies across multiple architectures and versions of the OS. (You could also run it on Solaris...)
With the correct versioning, you can sort the guarantees out - there is some discussion on the docker forum at the moment on signing / hashing or otherwise verifying the images.
For slotted services, I suggest looking at nix and nixos, a package manager (and a distribution) which pinches some ideas from containers.
As for the main point of your comment:
Yes, native package management is lighter-weight than containers (which are lighter than VMs, which are lighter than separate physical machines). Perhaps unsurprisingly, that weight brings additional features. The main one that containers (and upwards) add is segregation. apt (lovely as it is) can only ensure packages don't conflict on the files that they install - you are on your own for ensuring there are no runtime conflicts. Yes, with proper user creation + management you can restrict their ability to tread on each other's toes (hope there are no setuid programs in there), but that is all more effort than the 'their filesystems are separate' that the heavier options give you.
There is also the question of tidying up / migrating. Let's say I install a number of packages for some thing I'm deploying on a box. After a while I realise the load is too high and decide to migrate one/some of the apps to another machine. apt, etc can tell me what files a package has installed. It can't tell me what files a package has created while running. I'll have to go around and figure out the data (config, user config, log, etc) file locations and probably miss a couple and end up just duplicating the original machine. Or I copy the container file and the half a dozen images that make it up.
It's true that docker (and to a lesser extent vagrant et al) are perhaps suffering from over-use as they are 'the new hotness', but that's because we have a new tool and haven't yet fully figured out how to use it - it's somewhat inevitable behaviour. And yes, for some applications package management is fine and containers are unneeded overhead. But for others they aren't.
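To illustrate the gap: the package manager's file list (the kind of thing `dpkg -L` prints) only covers what was installed, so runtime-created files have to be found by comparing it against what is actually on disk. A minimal Python sketch, with the `files_not_in_manifest` helper and its inputs being hypothetical:

```python
import os


def files_not_in_manifest(root, manifest):
    """List files under root that no package manifest claims.

    manifest is a set of absolute paths the package manager knows about;
    anything else was created at run time (logs, data, edited configs).
    """
    unowned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in manifest:
                unowned.append(path)
    return sorted(unowned)
```

Even then you only learn that the files exist, not which application they belong to, which is the part that makes migration by hand error-prone.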
I will add one more difference between Docker vs. traditional package managers: Docker is a tool developers enjoy using. I have yet to meet a developer who enjoys building his application as an rpm or deb. The shorter the development/deployment cycle, the worse it gets.