Docker solves a problem that most people don't have. It's not a PaaS; rather, it's (some of) the building blocks to create your own PaaS. Most folks don't need that. Most folks want to put files on a server and start a process. For those folks, Docker in the raw ends up being a whole lot of confusing & unnecessary scaffolding.
Moreover, Docker / Kubernetes aims to solve the problem of building services that can easily scale to hundreds of machines and hundreds of millions of users, "Google-style".
That's great, if that's what you need. But most people aren't building a service like that. HN, I believe, runs on one machine, with a second for failover purposes. And HN still has many, many more users than typical company-internal services, community services, or at the extreme end personal services.
When you aren't operating at absurd scale, "Google-style" infrastructure doesn't do you any favors. But the industry sure wants to convince us that scalability is the most important property of infrastructure, because then they can sell us complicated tech we don't need and support contracts to help us use it.
(Disclosure: I'm the lead developer of https://sandstorm.io, which is explicitly designed for small-scale.)
> But the industry sure wants to convince us that scalability is the most important property of infrastructure, because then they can sell us complicated tech we don't need and support contracts to help us use it.
And let's not forget: replace any and all efforts at code optimization with "just throw another rack of blades at it".
There was also a time when most people thought they didn't need version control. Back in the 80s and 90s it was a justifiable viewpoint because existing version control systems sucked.
The problem with Docker is not that it doesn't solve (or attempt to solve) widespread problems. At its best, Docker gives you dev/production parity, and dependency isolation which is useful even for solo developers working part-time. The problem is that it's not a well-defined problem that can be solved by thinking really hard and coming up with an elegant model—like, for example, version control—it's messy and the effort to make it work isn't worth it most of the time right now.
That's no reason to write off Docker though. Pushing files to manually configured servers or VPSes is messy and leads to all kinds of long-term pain. You can add Chef / Puppet, but it turns into its own hairy mess. There's no easy solution, but from where I stand, the abstraction that Docker/LXC provide is one that has the most unfulfilled promise in front of it.
> At its best, Docker gives you dev/production parity
I get that when I use the same OS and built-in package manager?
I would virtualize the environment using something like VirtualBox for my dev and EC2/DigitalOcean/etc on prod.
> and dependency isolation
If you're going to scale something, you're going to split everything out on different virtualized servers anyway, so you'll get your isolation that way.
Basically, current mainstream practice is to virtualize at the OS level, whereas Docker is pushing to have things virtualized at the process level.
I personally don't see the advantage ... just more complexity in your stack. I never have to mess with the current virtualization structure, I don't even see it. It looks just like a "server", even though it's not. Isn't that better?
Yeah, but then there's still the issue of secrets, you need to have testing PayPal credentials, testing mailing service credentials, etc. There's the issue of deploying changes fast without leaving files in an inconsistent state (you don't want half of some file to run). How about installing the required dependencies?
I don't use Docker, but those are problems I can think of off the top of my head.
Docker doesn't credibly solve the credentials problem and the other problems you outline (which do exist) are as practically solved with something like Packer. And I mean, I'm not a Packer fan--oh look, VirtualBox failed to remove a port mapping for the VM that just shut down, throw away the whole build--but it's built on much, much more battle-tested technology with a much wider base of understanding.
(And, later, if you want to play with Docker, Packer lets you do that too. But you should use the Racker DSL in any case, because life is too short to deal with Packer's weird JSON by hand.)
Thanks for pointing me to Racker (https://github.com/aspring/racker). I'm currently building Packer and Terraform images with cobbled-together Python scripts that work, but I wouldn't call them a great solution. I'm actually using Packer specifically so I can start with regular EC2, and then move to a more Docker-based infrastructure.
Packer severely frustrates me, with the maddening regularity with which it fails just for funsies. Or the consistent but completely inane ways that it fails, like refusing to proceed based on not finding a builder for an 'only' or 'except' clause (making it nearly impossible to re-use provisioners and post-processors across multiple projects). Racker does help--my shared Racker scripts are in a Ruby gem--though I think it pretty much reduces Packer to a dummy solution into which you dump directives on a per-builder basis. As a tool that you carefully feed the bare minimum of information to do its job in any specific situation, though, it works okay.
Terraform, on the other hand, I think is a huge, huge mess, and I don't think they're going to fix it. I wrote a Ruby DSL for it the last time I tried to use it in anger, only to encounter that Terraform didn't honor its own promises around the config language it insisted on instead of YAML or a full-featured DSL of its own. Current client uses it, and every point release adds new and exciting bugs and regressions in stuff that should be caught by the most trivial of QA. For AWS, I strongly recommend my friend Sean's Cfer as a better solution; CloudFormation's kind of gross, but Cfer helps.
Credentials have to be managed separately from Docker anyway.
> There's the issue of deploying changes fast without leaving files in an inconsistent state (you don't want half of some file to run). How about installing the required dependencies?
rpm / dpkg also install dependencies, are quite fast and well tested. They have the advantage of working in a standard environment which most sysadmins know but the disadvantage that you need to configure your apps to follow something like LSB (e.g. install to standard extension locations rather than overwriting system files, etc.).
The one issue everything has is handling replacement of a running service and that's not something which Docker itself solves – either way you need some higher level orchestration system, request routers, etc. Some of those systems assume Docker but that's not really the value for this issue.
> the disadvantage that you need to configure your apps to follow something like LSB (e.g. install to standard extension locations rather than overwriting system files, etc.).
Common misconception. You only need to do this if you're going to try to push the packages upstream. If they're for your own consumption, you can do what you like. Slap a bunch of files in /opt, and be done with it - let apt manage versions for you and be happy.
As with many things, this is one area where you've just got to know what to ignore. It's simpler than it looks.
I think we're actually talking about the same thing – I said “like LSB” simply to denote following some sort of consistent pattern, which will vary depending on how widely things are shared.
/opt is defined in FHS for local system administrator use, so installing your company's packages there is actually the recommended way to avoid conflict with any LSB-compliant distribution, as long as you use /opt/<appname> instead of installing directly into the top-level /opt.
Kubernetes secrets are a really great solution to this problem. They are stored at the cluster level and injected into the pod (a group of containers deployed together) via a file system mount. This means that each pod only has access to its own secrets, which is enforced by the file system namespace. If an entire machine is compromised, only the secrets of pods currently scheduled onto that machine can be stolen. That's a high-level summary, but it's worth taking a look at the design doc.
Edit: forgot to mention, the file system mount means the secrets don't need to live in env vars, which are fairly easy to dump if you have access to the box, or to leak when you ship containers around in plain text.
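To make the file-vs-env-var distinction concrete, here's a minimal sketch of the consuming side. The mount directory and secret name are made up for illustration; in Kubernetes the real path comes from the pod spec's volume mount:

```python
from pathlib import Path

def read_secret(name, mount_dir="/etc/secrets"):
    """Read a secret the orchestrator mounted into the pod as a file.

    Unlike an environment variable, the value never shows up in
    `ps e` output or in the container's inspectable config; access
    is gated by the file system namespace of the mount.
    (mount_dir is illustrative, not a Kubernetes-defined path.)
    """
    return Path(mount_dir, name).read_text().strip()
```

The app just reads a file at startup; rotating the secret is then a matter of updating the mounted volume, not rebuilding the image.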
I don't know if Docker helps with this, I don't use Docker. But some kind of solution has to exist.
The way AWS does updates is to first download the new code into a separate folder and then switch the link to point to the new folder instead.
But the AWS approach feels unsatisfying because it downloads the entire codebase instead of doing an incremental git update. These are all issues that could be fixed, and someone has to fix them. I have no idea if Docker helps with any of them, but the opportunity is still there.
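That download-then-switch-the-link scheme fits in a few lines. The `releases/...` layout and names below are illustrative, not how AWS actually arranges things; the key point is that the final step is a single atomic rename:

```python
import os

def activate_release(release_dir, link="current"):
    """Atomically repoint `link` at release_dir.

    Build the new symlink under a temporary name, then rename it
    over the old link; rename(2) is atomic on POSIX, so readers see
    either the old tree or the new one, never a half-written mix.
    """
    tmp = link + ".tmp"
    if os.path.lexists(tmp):   # leftover from a failed deploy
        os.remove(tmp)
    os.symlink(release_dir, tmp)
    os.replace(tmp, link)      # the atomic switch
```

Serving traffic through `current/` means "half of some file" can never run: the old release stays intact on disk until you garbage-collect it.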
Ansible, systemd and go are stealing my heart at the moment. Basically, pick the tech that doesn't cause the problems to start with.
I still reckon that the main reason VMware ESX is as successful as it is comes down to the lack of isolation and the sheer deployment hell that Windows has been for years. The same can be said for Python or Ruby on a Linux machine, for example. Docker removes some of that pain, like ESX does.
That's a bit of my point... If you're building relatively small independent services with Docker, you can deploy service A with node 0.10 as its tested environment and service B with iojs 2.4 on the same server, without them conflicting... and when you need to update/enhance/upgrade service A, you can then update the runtime.
The same can be said for ruby, python and any number of other language environments where you have multiple services that were written at different times with differing base targets. I've seen plenty of instances where updating a host server to a new runtime breaks some service that also runs on a given server.
With Docker, you can run them all... granted, you can do the same with virtualization, but that has a lot more overhead. It's about maximum utilization with minimal overhead... For many systems, you only need 2-3 servers for redundancy, but a lot can run on a single server (or a very small cluster/set).
I have to agree on ansible, systemd and go... I haven't done much with go, but the single executable is a really nice artifact that's very portable... and ansible is just nice. I haven't had the chance to work with systemd, but it's at least interesting.
> The same can be said for ruby, python and any number of other language environments where you have multiple services that were written at different times with differing base targets. I've seen plenty of instances where updating a host server to a new runtime breaks some service that also runs on a given server.
This is a solved problem in Python and Ruby. In Python, use virtual environments. In Ruby, use RVM. You won't have the issue of one tenant breaking another.
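As a sketch of what "use virtual environments" means mechanically, the Python stdlib can do it directly. The wrapper function and path layout here are my own illustration (with_pip=False just keeps the sketch fast and offline):

```python
import venv
from pathlib import Path

def make_isolated_env(path):
    """Create a per-app virtual environment with its own interpreter
    and site-packages, so upgrading one app's dependencies cannot
    break another app running on the same box."""
    venv.EnvBuilder(with_pip=False, clear=True).create(path)
    # On POSIX the env's interpreter lands at <path>/bin/python.
    return Path(path) / "bin" / "python"
```

Run each service with its own returned interpreter and the "one tenant breaking another" problem disappears at the language-runtime level, though not for system libraries.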
And with node, you can use nvm... however there are libraries and references at a broader scope than just Python, Ruby or Node... Say you need an updated version of the system's lib-foo.
A runtime environment for a given service/application can vary a lot, and can break under the most unusual of circumstances. An upgrade of a server for one application can break another. Then you're stuck trying to rollback, and then spend days fixing the other service. With docker (or virtualization) you can segregate them from each other.
On a local system, yes, but in production it's painful to work with. With RVM, for isolation you would create gemsets for each app with a specific Ruby version. That's OK for 2-3 applications, but anything more than that becomes a pain to work with. And then if you plan to put everything behind Passenger, it just gets too messy. Think of automating this: it would be a nightmare to maintain. Here, containerization does help.
Node has this too: nave, npm and n. But using these tools means that you are no longer using the system's package manager, and this can be a problem sometimes. E.g. you may need to open your firewall to something other than the standard pkg manager.
I see docker as a valid attempt to fix limitations of existing and broken package system (eg: apt) at a price that I am not yet willing to pay.
Two points; firstly, SRV records offer other advantages over A records beyond simply port diversity.
Most notably: the ability to exist at the apex of a domain, resolving to both IPv4 and IPv6 addresses simultaneously, and weighted-round-robin and fallback support.
The overload of the address record to discover service endpoints is ultimately a murky throwback to when servers were named like pets, not disposable commodities.
Secondly, in this age of containerized deployment it is already common to have many HTTP-substrate services bound to a single address, disambiguated by port. Leveraging the SRV record in DNS means not having to invent yet another endpoint discovery mechanism just to know what port number to connect to.
The reference to "typing an extra 5 characters" invokes a world of manual, static configuration that many of us are happy to have replaced with services that find each other through discovery protocols.
I can only agree, and further voice my strong disapproval at the continuing, damaging and absurd lack of DNS and IPv6 considerations, most notably the omission of any discussion of endpoint resolution.
Literally so: this protocol document does not specify how you determine which server to connect to. HTTP/2 is, by definition, only very loosely coupled to IP, despite making significant optimisations for TCP. Thus in implementation we simply get the same old mistakes and undefined behaviours. Issues with floating apex records, hacks based on IPv4/6 race conditions, unnecessary address wastage and so forth will continue; all derived from the colossal architectural wart of overloading the DNS host (A/AAAA) record as a service endpoint discovery mechanism.
Once again, I say unto the peanut gallery: shoulda used SRV. The benefits are many and the downsides greatly overstated. I bemoan the missed opportunity.
I can't agree enough about the missed opportunity to use SRV records. This would have been such a monumental step forward.
Edit: It makes me a bit giddy (which makes sense if you factor in my being a sysadmin) to think about what SRV records would've done for load-balancing, running servers on non-standard ports, IP address exhaustion, and server migrations. Anybody who doesn't appreciate proper service-location hasn't ever done serious sysadmin work and, IMO, has no business designing protocols.
An apex record is one at the root of a DNS zone. Sometimes called "naked domains".
For example, for "https://github.com/" they are the records for "github.com" itself, rather than for subdomains that might exist, such as "www.github.com" or "gist.github.com".
Apex records have a particular restriction: they cannot be aliases, because the apex includes DNS metadata that is not allowed to be aliased. Read on for how this becomes a problem.
I've used the term "floating" as a visual metaphor; what I'm about to describe lacks a universal standard name because it is an ugly hack:
HTTP resolves endpoints using host records, so a URL of "https://github.com" means looking up the A and AAAA records for "github.com". Yes, the protocol is arrogant enough to assume that the host address for the whole domain is that of the web server. (This is why we ended up prepending "www" to domain names, as a service selector.) In response to the query you get an IP address.
Unfortunately, IP addresses sometimes change without warning. The most common example today is the load balancer offered by Amazon Web Services. The solution to this is to use an alias record in your human-friendly domain, pointing at a hidden technical domain that the infrastructure provider keeps up to date (e.g. "my-elb-name-1-1160186271.ap-southeast-1.elb.amazonaws.com").
This is fine for "www.example.com" but not the naked "example.com", because aliases are prohibited at the apex.
As a result, DNS providers such as Route 53 have ended up with a hack: a spoofed record at the apex, one that tracks an external resource and synthesizes a fake A/AAAA response. Now you have a naked domain that tracks, or rather hopes to track, the correct endpoint. But it changes with the wind. Hence my description of it as "floating".
There is no consistent name for this kludge. AWS calls it an alias and, for reliability reasons, restricts it to their own infrastructure; DME call it an "ANAME" record. The model can even be readily implemented as a shell script run out of cron on your nameserver. It is fragile, it is often unreliable, it is not at all standardised, and it doesn't scale beyond one service.
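To show how small that cron-driven kludge really is, here's a toy version in Python (function names, TTL and apex are invented; a real job would also diff against the served zone and reload the nameserver on change):

```python
import socket

def format_apex_a(apex, addrs, ttl=60):
    # The synthetic A records a nameserver would serve at the apex.
    return [f"{apex}. {ttl} IN A {ip}" for ip in sorted(addrs)]

def synthesize_apex_a(apex, tracked_host):
    """Toy 'ANAME': resolve the provider's hostname (e.g. the ELB
    name) and spoof apex A records from whatever it currently
    resolves to. This is exactly the fragile tracking described
    above: the records change with the wind."""
    _, _, addrs = socket.gethostbyname_ex(tracked_host)
    return format_apex_a(apex, addrs)
```

Everything wrong with the scheme is visible here: the apex answer is only as fresh as the last cron run, and it tracks exactly one service.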
A better solution would be to require use of SRV records, which let one declare, for example, an "https" service for "example.com", alongside, say, the xmpp service, sip service, or any other service you care to announce. SRV records can exist at the apex. They can also bundle the A and AAAA (IPv6) addresses for the resulting endpoints in the answer, and select alternative port numbers without bothering the user about it.
Not quite a panacea: there is a minor hazard of zone cuts that could increase the number of client lookups, but that's an edge case, not one you can easily blunder into, and it is easy to fix.
HTTP/1.0 and earlier are forgiven, because they hail from a time when you just had a web server in a rack and called it "www". But HTTP/2 is supposed to respond to modern architectures.
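For the curious, the priority/weight selection that SRV records buy you can be sketched in a few lines of Python. This is a simplified reading of RFC 2782, not a full implementation (zero-weight targets inside a weighted group get no chance here, where the RFC gives them a small one):

```python
import random

def pick_srv(records):
    """Choose a target from SRV answer tuples of
    (priority, weight, port, target): the lowest priority wins
    outright (fallback), and within that group targets are chosen
    proportionally to weight (weighted round-robin)."""
    best = min(r[0] for r in records)
    group = [r for r in records if r[0] == best]
    total = sum(r[1] for r in group)
    if total == 0:
        return random.choice(group)   # all weights zero: uniform pick
    n = random.randrange(total)
    for rec in group:
        n -= rec[1]                   # walk the weight intervals
        if n < 0:
            return rec
```

With records like `(10, 60, 443, "big")`, `(10, 40, 443, "small")` and `(20, 100, 443, "backup")`, traffic splits roughly 60/40 between the first two, and "backup" is only used if the priority-10 hosts are gone; this is the load balancing and fallback the A record simply cannot express.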
I think "apex record" means the root domain name in a zone, and is unrelated to A records (except for the fact that you would usually make an A record for your apex so web browsers can reach your site even without a "www." subdomain/prefix)
Actually, they all seem to me pretty limited compared to the answers you get from Wolfram Alpha.
Also cf. the responses to one of their test questions, "how much is a quarter cup of butter?". Google makes fun of the inquiry. Wolfram Alpha gives you a thorough nutritional profile, and links to variations based on international cup sizing and different types of butter.
Wolfram Alpha is amazing at discerning the intention of the question.
In the article, the question "How old is the Lincoln Tunnel" struck me as incorrectly formatted for the parser (I know, that's the point), so I asked Siri, "When was the Lincoln Tunnel built." The Wikipedia article on the Lincoln Tunnel was returned. Wolfram Alpha was listed under other sources, so I chose that. The response? "1937"
I've noticed lately that many times things I know Alpha will slam-dunk don't get routed to it by Siri. I'm not aware of the details of the deal we have with them but from my observations of Siri it looks like Apple might be looking for certain keywords (such as how, what, why, etc) before it tries routing anything to Alpha. I hope they can relax that in future.
Luckily if you say "Wolfram XXX" instead of just "XXX" Siri will route your question straight to Alpha no-questions-asked.
There are approximate conversions for recipes, since many American recipes use volume measures and expect you to have measuring cups, while European recipes expect you to have a kitchen scale. But yes, there isn't any single conversion, since density varies: there's one cups/grams ratio for granulated sugar, one for powdered sugar, one for sifted flour, one for water, etc.
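A toy lookup table makes the per-ingredient point concrete. All figures are approximate round numbers for a US cup (~237 ml); packing, humidity and brand shift them, which is exactly why no single cups-to-grams ratio exists:

```python
# Approximate grams per US cup for a few common ingredients.
GRAMS_PER_CUP = {
    "water": 237.0,
    "butter": 227.0,            # 1 cup = 2 sticks
    "granulated sugar": 200.0,
    "powdered sugar": 120.0,
    "sifted flour": 115.0,
}

def cups_to_grams(ingredient, cups):
    """Convert a volume measure to weight using the table above."""
    return cups * GRAMS_PER_CUP[ingredient]
```

So `cups_to_grams("butter", 0.25)` gives about 57 g, half a stick, which is roughly the conversion hiding behind the "quarter cup of butter" question.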