Afterwards I asked Larry, "so, do you think you'll ever finish your PhD, either here or at Stanford?". He said, "If this Google thing doesn't work out I might, but I have a feeling it will work out ok."
It amuses me that Professor Brewer is now working for Larry. :)
I worked as a contractor at Google in 2013 and loved their infrastructure. It was amazing to fire off a Borg job that used hundreds to thousands of servers, with web-based tools for tracking the job, fantastic logging to drill into problems, etc.
And, Borg was two generations ago!
Even though I am very happy doing what I am now, sometimes I literally wake up in the morning thinking about Google's infrastructure. I now use lesser but public services like AppEngine, Heroku, and nitrous.io (a bit like Cider, Google's web-based IDE), but it is not the same.
BTW, not to be negative, but while Google is a great home for someone like Eric Brewer, it is a shame that many hundreds of future students at UC Berkeley will not have him as a professor.
Also, they had code labs: self-paced modules for learning specific tech. I didn't spend much time at work doing code labs, but I could access them at home on my corporate laptop, and I went through about a dozen of them there. Another nice thing: even mentioning that you couldn't figure something out from the documentation or code labs would cause someone to jump in to help you.
Also, I don't think that people leaving Google have problems getting other jobs :-) The retention rate at Google is surprisingly low, given the pleasant atmosphere there. People leave to go elsewhere, start their own companies, etc. I was 63 when I worked there, and although it was probably not the most awesome place I've worked, it was really great.
Apply for a job if you are interested, or go the easier route and get a contractor position.
So, why aren't such technologies mentioned in these discussions? I liked his explanation of the partitioning problem. Yet he and NoSQL advocates seem unaware that numerous companies surmounted much of the problem with good design. We might turn the CAP theorem into barely an issue if we can get the industry to put the same amount of innovation into non-traditional, strong-consistency architectures as it did into weak-consistency architectures. There is hope: Google went from a famous NoSQL player to inventing an amazing strong-consistency RDBMS (F1). Let's hope more follow.
Also incredibly expensive. Case in point: Google took off because they were able to scale out with off-the-shelf hardware, compared to the millions banks were pouring into scale-up configurations which handled much less load. Scale-up can quickly hit hard limits, and well before that it becomes exponentially expensive to continue down the path. This is true even today.
The NoSQL movement, if you wanna call it that, took off because most apps (including Google, including financial services, even health care) don't need some types of consistency offered by relational databases. Many large apps are significantly de-normalized and have many tables without foreign keys, often filled by scheduled jobs. That's fine for most apps; NoSQL architectures recognize that, and users account for it in design.
For the majority of use-cases out there, NoSQL databases offer enough consistency. For the remaining use-cases, there are tools available in NoSQL databases to make them work, though it requires a bit of work.
For applications with a nontrivial data model, ensuring that each logical operation only does a single document update (or that multiple nontransactional updates cause no conflict in maintaining consistency in the presence of other logical operations) is actually really challenging - and it adds a substantial design overhead to every new feature added. I think you're being extremely optimistic in your assertion that NoSQL systems are that widely safely applicable. My experience has been that NoSQL-based systems stay mostly-consistent because they happen to experience very little concurrent activity, not through understanding/design on the users' part.
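The hazard just described is easy to reproduce. Here's a minimal sketch, with a plain Python dict standing in for a hypothetical document store and the problematic interleaving made explicit rather than actually concurrent:

```python
import copy

# Two logical operations each do a read-modify-write without a transaction;
# this interleaving silently loses one of the updates.

def read(store, key):
    return copy.deepcopy(store[key])   # each client works on its own snapshot

def write(store, key, doc):
    store[key] = doc                   # last writer wins, no conflict check

store = {"order:1": {"items": ["book"], "total": 10}}

a = read(store, "order:1")             # client A reads
b = read(store, "order:1")             # client B reads before A writes

a["items"].append("pen"); a["total"] += 2   # A adds a pen
b["items"].append("mug"); b["total"] += 5   # B adds a mug

write(store, "order:1", a)
write(store, "order:1", b)             # clobbers A's write entirely

print(store["order:1"])   # {'items': ['book', 'mug'], 'total': 15} -- the pen is lost
```

Avoiding this without transactions means every feature has to be designed so its writes never overlap like this (single-document updates, compare-and-swap loops, etc.), which is exactly the design overhead being described.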
This is not to make light of the situations where NoSQL systems shine, but the idea that higher levels of consistency are rarely useful does not match my experience at all.
> NoSQL movement, if you wanna call it that, took off because most apps (including Google, including Financial Services, even Health Care) don't need some types of consistency offered by relational databases.
I'd say that Mongo (for example) took off because:
- They really nailed the setup experience for new users (which RDBMSs historically sucked at).
- The data model is much easier for simple apps.
- They had some fairly creative techniques for making their system look good - unacknowledged writes, and claims of support for replication which didn't really fully work.
- Most programmers don't really understand the ins-and-outs of maintaining consistency.
The NoSQL movement's origins are highly debatable. Here's what I saw in its beginning: an explosion of articles on the subject after a few success stories about big companies doing massive scale on cheap servers. Zealots argued that mature, strong-consistency solutions had problems. So, instead of fixing those, we should just ditch them, and strong consistency, for The New, Great Thing. The only time I even saw a real cost-benefit analysis was a few articles focused on a narrow class of applications where data consistency didn't really matter. So, my conclusion was that the movement was two things: 95% the social phenomenon that happens with each IT fad, and 5% a result of weaknesses in the relational model and tools. That last part makes sense and is why I've never opposed alternatives to RDBMSs.
On your last point, do you have a citation showing that weak-consistency databases offer enough integrity for the "majority of use-cases"? I thought the majority of use cases for databases were protecting data that's important to a business: orders, billing, HR, inventory, customer information, and so on. I usually just tell them to use PostgreSQL and that covers it with high reliability. If you really can back your claim, though, I'll be glad to start transitioning my R&D toward designs that trade away integrity of data.
There's a boatload of assumed knowledge in this quote. How likely is it that someone not familiar with RDBMSs would know what a foreign key is, for example? Not very likely, I think. I posit you give developers too much credit.
The problem is that at large distributed scale, forcing consistency is basically fighting the laws of physics. Is something in Australia always consistent with something in the US? Well, "same time" is a funny thing in physics, because it doesn't actually exist. So we have to work really hard to keep consistency going.
To illustrate, here is how Spanner (F1) works:
Spanner's 'TrueTime' API depends upon GPS receivers and atomic clocks that have been installed in Google's datacentres to let applications get accurate time readings locally without having to sync globally.
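The commit-wait idea built on top of that API can be sketched in a few lines. This is a toy stand-in, not Spanner's implementation: the real TrueTime interval comes from GPS receivers and atomic clocks, while `EPSILON` here is just an assumed uncertainty bound.

```python
import time

# Toy stand-in for TrueTime: now() returns an interval (earliest, latest)
# assumed to contain real time; EPSILON is an invented uncertainty bound.
EPSILON = 0.005

def tt_now():
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)

def commit(apply_txn):
    s = tt_now()[1]          # pick a timestamp guaranteed >= current real time
    apply_txn(s)
    # Commit wait: don't report success until s is definitely in the past,
    # so timestamp order matches real-time order across machines.
    while tt_now()[0] <= s:
        time.sleep(EPSILON / 2)
    return s
```

The cost of the wait is bounded by the clock uncertainty, which is presumably why it pays to spend money (GPS, atomic clocks) keeping that uncertainty small.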
I call that a kludge in the general sense, not amazing future technology. Yes, if you can afford to install GPS antennas on your datacenters, you can handle it and it is OK. But it is a crutch. "Well, but there is NTP," one might say. Yeah, there is, and connectivity to that fails as well.
The one interesting research area, though, is CRDTs. These are datatypes that know how to auto-converge to a known value even if they experience temporary inconsistency. So you basically experience temporary inconsistency, but it fixes itself.
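The simplest CRDT makes the idea concrete: a grow-only counter. Each replica only increments its own slot, and merging takes the element-wise max, so replicas converge to the same value no matter the order (or duplication) of state exchanges. A minimal sketch:

```python
# G-Counter: a state-based CRDT. Merge is commutative, associative, and
# idempotent, so replicas can gossip state in any order and still converge.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}            # replica_id -> count

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

# Two replicas diverge, then merge in either order and agree.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b); b.merge(a)
print(a.value(), b.value())   # 5 5
```

Supporting decrement, sets with removal, sequences, etc. takes progressively cleverer constructions, but they all rest on this same merge property.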
So, back to this current example. Synchronizing time was a huge problem for consistency. They had expensive, highly-custom datacenters throughout the world. Yet, there wasn't even a solution worth buying that wouldn't cost a fortune or need new infrastructure. An engineer noticed that a time-source existed which all datacenters could sync with using affordable, COTS equipment. One among others. So, they used that to solve the critical problem, solved other problems with other technologies, and integrated the resulting components into a solution to their real problem (F1).
What I've just described is not a hack: it's Engineering 101. Identify the problem(s), look for known solutions to it, adapt them to your needs, and deliver the solution. Their use of GPS to solve a problem that otherwise would cost millions of dollars to solve is exemplary engineering. Given datacenter costs, using this at each one would barely be a blip on the budget sheet and will get easier to deploy as adoption increases (network effect).
It's a hack, in all the positive senses of the word.
Can probably find some used ones on ebay for $1.5k or so.
FWIW, CAP is not about NoSQL or the 'NoSQL movement', it's about distributed systems and distributed shared memory, which applies to a whole range of computing problems.
Admittedly so, but at a very, very high price. Similarly, Sun and SGI had amazing technology in the server and workstation space (after Solaris 2.3, anyway), but over time Linux became "good enough" and we became willing to sacrifice Sun's niceties to save millions per data center.
The mere existence of technology isn't enough; it has to be affordable - and rational managers will have to make cost/benefit decisions that suit their goals.
The simple route, as I indicated, would be to copy, contract, or even buy an MPP vendor. In academia, MIT Alewife showed one custom chip was all that was necessary to build a NUMA-style, 512-node machine with COTS hardware. Existing shared-nothing clusters already scaled higher than Google using custom chips for interconnect. One can buy cards or license the IP for FPGAs, structured ASICs, etc. Much software for compute and management was open source thanks to groups such as Sandia. And so on. Plenty to build on that's largely been ignored outside academia.
Instead, they've largely invested in developing hardware and software to better support the inherently inefficient legacy software. So, they're doing the hardware stuff, just not imitating the best-of-breed solutions of the past or present. The only exception is OpenFlow: a great alternative to standard Internet tech that major players funded in academia and are putting in datacenters. Another success story is Microsoft testing a mid-90's approach of partitioning workloads between CPUs and FPGAs. So, they're... slowly... learning.
We generally tend to jump into these as "omg awesome new tech" with a very narrow view. But it also helps boost more thought-out techs (even though it feels less efficient to go through that route first, it's perhaps the only route that works for humans: try, fail, try again, etc.).
The other is a social thing that leads to the "network" effect. People flock to something for whatever reason. This builds a community (or network) that entices others to join. That community also tends to forget about other things and reinvent the wheel. Example: much of current work in Web applications aims to solve problems already solved in client-server apps with better efficiency, security, reliability, and portability. Even Facebook went back to that model for mobile, IIRC. Good luck convincing most Web technologists to switch to client-server, though.
Whoever solves both these problems will create ripple effects that grow innovation at a heightened, maybe exponentially better, pace. The reason will be a combination of avoiding wasted effort plus visibility into the best efforts. I have ideas on Problem 1, but the best minds need to get on Problem 2: it's a gold mine if it's solved.
"Computing spread out much, much faster than educating unsophisticated people can happen. In the last 25 years or so, we actually got something like a pop culture, similar to what happened when television came on the scene and some of its inventors thought it would be a way of getting Shakespeare to the masses. But they forgot that you have to be more sophisticated and have more perspective to understand Shakespeare. What television was able to do was to capture people as they were. So I think the lack of a real computer science today, and the lack of real software engineering today, is partly due to this pop culture." 
There have been countless reliability and security problems that occur due to buffer, pointer, data-becomes-code, and interface errors. These are about 99% of the worst problems. They happen because the underlying Intel/IBM/RISC architecture treats all data the same... mostly. Plus, the systems languages (C/C++) are fundamentally broken as far as preventing errors goes. The Burroughs team saw this [in 1961] and solved the problems at their source: CPU protection of pointers; a CPU that could tell code and data apart for security purposes; a hardware-managed stack; CPU bounds-checked arrays; a high-level language (Algol) for system code; interface types checked at compile and function-call time; hardware and software isolation of apps from the OS. Good luck crashing or hacking that!
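To make the tagging idea concrete, here is a toy model (emphatically not the actual Burroughs design, just an illustration of the principle): every word carries a tag, the "CPU" refuses to execute anything not tagged as code or dereference anything not tagged as a pointer, and array access goes through a bounds-checked descriptor.

```python
# Toy tagged architecture: tags on words stop the three classic errors --
# forged pointers, out-of-bounds access, and data executed as code.

class Fault(Exception):
    pass

class Word:
    def __init__(self, tag, value):
        self.tag, self.value = tag, value   # tag: "data" | "code" | "descriptor"

class Machine:
    def __init__(self):
        self.memory = []

    def alloc_array(self, values):
        base = len(self.memory)
        self.memory.extend(Word("data", v) for v in values)
        return Word("descriptor", (base, len(values)))

    def load(self, desc, index):
        if desc.tag != "descriptor":
            raise Fault("not a pointer")           # no forged pointers
        base, length = desc.value
        if not 0 <= index < length:
            raise Fault("bounds violation")        # no buffer overruns
        return self.memory[base + index].value

    def execute(self, word):
        if word.tag != "code":
            raise Fault("data is not executable")  # no data-becomes-code
        return word.value()

m = Machine()
arr = m.alloc_array([10, 20, 30])
print(m.load(arr, 1))        # 20
# m.load(arr, 3)             -> Fault: bounds violation
# m.execute(Word("data", 0)) -> Fault: data is not executable
```

In hardware this costs a couple of tag bits per word and a check per access, which is the "two bits of tagging" price mentioned below.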
So, I've read thousands of hardware, firmware, and software solutions to these problems. Yet very few will straight up fix the problem at its source. That's despite the existence of a proven solution since 1961 that costs a mere two bits of tagging. I'll give up a single-digit percentage of memory with a single-digit performance hit to stop 99% of attacks. I'll do it today. Yet industry's latest solutions are detection of this little tactic here, hardware extensions for that there, and no solution to the actual problem.
The failure of modern industry to do what Burroughs did, fix the underlying problem, is the source of most of our IT headaches. Aside from social reasons, backward compatibility with legacy is a big contributor. It's why heuristic-driven software-transformation systems such as Semantic Designs' toolkit or Racket need a huge boost in R&D. Such tech looks like our only hope for getting legacy software onto better underlying platforms, as nobody will pay for a human to understand and rewrite each codebase line for line, bug for bug.
The problem with folks hating on new technology "x" is that they often fail to see that the new thing might only offer one significant advantage over the prior tech, or they see it but discount it too heavily to be motivated to try it.
Hard for me to say.
But it's the less common choice.
I am guessing convenience wins out over the better solution, even though that solution would ultimately be just as convenient and more efficient if it got enough eyeballs?
At this rate we'll end up shipping containers as 'apps' to the clients machines with a suitable emulator at some point.
All this luxury comes at (considerable) cost, and not everybody seems to be doing the math before deployment, which more often than not leads to terrible efficiency.
But we're technology fans, so 'oohh! shiny!'.
While VMs, and containers in VMs, don't utilize the hardware as efficiently as a dedicated server, they make your people much more efficient and happy. Unless you are running at a fairly large scale, making your people more efficient gives you a much better ROI than making your servers more efficient.
Every dedicated server that you set up has an opportunity to not be duplicated perfectly. Every system that knows about your dedicated server has an opportunity to hard code what it shouldn't. Which makes these a potential point of failure. Add enough of those, and you're statistically guaranteed that the careful architecture that you have for failover is a pipe dream.
If everything is deployed with containers and discovery and the correct provisioning, then dealing with the fact that containers move around forces you to solve all of your other problems. And containers provide an abstraction layer that makes the rest of it straightforward.
Let me illustrate with an example.
When I worked at Google in 2010, I remember reading an article from eBay about how they finally managed to transition everything off of a running data center without interrupting live traffic, and how much planning it took them. And they were congratulating themselves on what a heroic feat they had managed. Most companies today would still consider that a pretty amazing feat, and would find it challenging.
At that same point in time, I was learning how things were set up at Google so that you could drop any data center at random with barely any interruption of live traffic, and with no manual intervention required. And Google occasionally does this without warning to important data centers, just to be sure that it works.
I like the test-the-panic-button attitude that Google brings to these things. I've yet to get someone to accept my challenge to power down their supposedly automated fail-over solution. It's supposed to work, but they usually can't be sure, management would surely not allow such a rash thing as a live test, and it's bad form for me to then walk up to the switchboard and trip the breaker. Very tempting...
This is, literally, the reason.
You can replace rails with any similarly bad technology. I got this explanation a few weeks ago at my job:
the java build process (I'm not kidding) has such a complex dependency graph that it must spin up full containers to do each build.
If dynamic languages supported better packaging/isolation, this entire thing would be relegated to research projects.
(I am reading through the comments trying to work out if I should learn Docker. I already know how to use a virtualenv and I already know how to use a VM.)
One of the preferred ways to deploy Rails (was? I did this ~2 years ago) is to check it in with git; the name/version of the Ruby environment to use is stored in the file ".rbenv-version", with dependencies managed by Bundler (which you point to a local gem server). Install is then 1) use rbenv/ruby-install to install a basic Ruby, 2) install the app with "git clone", and 3) run Bundler. Many tools exist to do this in one step over ssh/etc. automagically.
Even better than rbenv, you can just use chruby to point to any Rubies you want; just check one into your project itself (or whatever) and configure the siteruby/etc. load paths to point to project directories. Really, chruby just fixes up your dev environment to point to a specific Ruby; you set the actual project to be self-contained with known paths, just like you would do de facto in a container.
While dependency issues were a problem back in the ruby 1.8 / rails 2.x days, this
At least with the Borg containers (I'm not familiar with how Omega and Kubernetes do things), there wasn't any additional layer of abstraction - the fact that there were multiple jobs running on the same kernel wasn't hidden from those jobs (although they didn't have to be aware of it). The containers were purely used for in-kernel resource accounting and control.
Assuming you're operating at scale I don't see why that would be the case. And if you're not, what's the point?
> The more different things you can pack on a machine while still ensuring that the high-priority/low-latency jobs get prompt access to the resources that they've reserved, the higher overall utilization you can achieve (and hence bring costs down),
Yes, that's the theory. But in practice you're assuming better static control over the situation than the operating system running multiple jobs will have over the dynamic situation. So you'll need to over-provision and then you're back to square one with your utilization or alternatively you'll under-provision and then you will run into performance issues. TANSTAAFL.
(For instance, what's to stop each container from shipping another implementation of the same library as a dependency, say SSL?)
A lot of user-facing services at Google have to be over-provisioned in order to handle the cyclical usage patterns (the daily query peak is far higher than the average for most services) and to be able to survive the loss of a datacenter or two. This results in a lot of under-utilized servers for a big fraction of the time. So by packing lots of medium- and low-priority jobs on those same servers (and over-committing the resources on the server), you can soak up the slack resources; in the event that the resources are needed by the user-facing service, the kernel containers ensure that all the less latency-sensitive jobs on the machine don't compete for resources with the user-facing services.
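The arithmetic behind this is worth sketching. All the numbers below are invented for illustration: a service provisioned for its daily peak leaves slack off-peak, and filling that slack with preemptible batch work raises average machine utilization.

```python
# Invented numbers: a machine sized for the service's daily peak, with a
# cyclical hourly load pattern, with and without best-effort batch packing.

MACHINE_CAPACITY = 100   # arbitrary units

def utilization(service_load, batch_backlog, overcommit):
    used = service_load
    if overcommit:
        # Best-effort batch soaks up whatever the service isn't using right
        # now; if the service spikes, batch is throttled/evicted first.
        used += min(batch_backlog, MACHINE_CAPACITY - service_load)
    return used / MACHINE_CAPACITY

hourly_load = [20, 15, 10, 30, 60, 90, 95, 70]   # cyclical daily pattern

dedicated = sum(utilization(l, 0, False) for l in hourly_load) / len(hourly_load)
packed = sum(utilization(l, 1000, True) for l in hourly_load) / len(hourly_load)
print(dedicated, packed)   # packing roughly doubles average utilization here
```

The hard part, of course, is the "throttled/evicted first" comment: that's the performance isolation the kernel containers provide.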
It's true that the performance isolation when there are tens of jobs running on the same machine isn't going to completely match the performance isolation of running a service on a dedicated server, even with kernel resource isolation via containers, but you have to make cost trade-offs somewhere. The number of Borg services that could justify requesting dedicated machines was very small.
And to address your other concern about the OS not having so much insight into what's going on - Borg containers consisted generally of a single process, running on the machine's normal kernel. The containerization was just for in-kernel resource accounting/isolation. (Using Linux control groups, rather than anything fancier like LXC or Xen)
Thank you for the insight into the number of processes inside a typical Borg container. So that was basically a kind of 'heavy process' rather than a complete application with all dependencies (including other processes the main one depended on) packaged in; this is something I wasn't expecting at all.
My impression was that typically you'd have a process for the service, a borgmon process for monitoring, and maybe another process to ship logs off in the background.
Developers would only think about the service process (which itself typically was a fairly thin shim in front of other services), but a borg container would have more than that going on in it.
The logsaver would also be a separate job, although typically running co-located 1:1 with instances of the actual service job. The service and the logsaver would have access to the same chunk of disk (where the logs were generated) but otherwise they were separate as far as the kernel was concerned. (As far as Borg was concerned they were very much related, but that was at a much higher level than the kernel).
In another view, it's an alternative approach to what many look to Chef, Ansible, and Puppet to do. Combined with something like Mesos or Kubernetes, you can quickly deploy to a heterogeneous cluster without a lot of install scripts running.
Some of the other use cases, such as running multiple containers simultaneously on the same hardware, make less sense to me.
Here's my answer for "why": DRY. Once you've deployed hundreds of servers using the same exact Ubuntu 12.04 LTS kernel base, why not just completely abstract the OS away and focus the attention on scaling the services that matter? Why is it that when I decide that I need to scale out, I need to copy every library of the OS and every line of code for the kernel and redeploy it every time I add a node?
> At this rate we'll end up shipping containers as 'apps' to the clients machines with a suitable emulator at some point.
That's exactly the point. Care to elaborate on the downside of such a promise?
Because it adds a layer that makes no sense unless you have very specific use cases. Though I see the point regarding people efficiency; that one makes good sense (see other comment in this sub-thread).
> Why is it that when I decide that I need to scale out, I need to copy every library of the OS and every line of code for the kernel and redeploy it every time I add a node?
If you're doing it that way then you are simply doing it wrong. See: Chef, configuration management, and various deployment services (of which you could argue containers are one offshoot, but they focus (IMO) on the wrong level for all but the largest companies). Containers are like sandboxes with significant overhead, for applications that focus on ease of deployment. That's strange to me, because I see deployment as a one-time cost for most of my own use cases, though I can see how that equation would change if you deploy lots of things configured by lots of different people to a single set of servers, especially if there are conflicting requirements between those deployments.
> Care to elaborate on the downside of such a promise?
That's my personal view of hell, if you don't see any downside there please ignore my vision and continue as if nothing was said.
From open, text based standards to shipping arbitrary binaries in a couple of decades. And I thought GKS was about as bad as it got ;)
A Chef script is basically the automation of "I need to copy every library of the OS and every line of code for the kernel and redeploy it every time I add a node." I'm sorry if you didn't pick up on my implied remark. Two problems are then introduced when automating those actions: (1) it doesn't negate the fact that I need to store and deploy a 700 MB OS layer every time I want to add a node (with non-containerized configs this takes minutes, not seconds), and (2) maintaining config scripts can (not always) be painful (version control, rollbacks, etc.).
> unless you have very specific use cases.
> Containers are like sandboxes with significant overhead for applications
Again, do you have experience using containers? You seem awfully dismissive ("you're doing it wrong!") in a way that suggests that you might not entirely understand how they actually work...
In a nutshell: running a 'standard' combo of Apache and a DB server, as well as some auxiliary bits and pieces, inside 'containers' a year ago gave significant overhead compared to running those without the containers. I'll re-do this and I'll probably do a write-up, because the subject is interesting. This comment and its follow-up (https://news.ycombinator.com/item?id=9567623) are by people using this tech in production right now, and their experience echoes mine (but they're very far down the line compared to where I stopped).
Besides that particular use case (where performance and isolation are the key components to be looked at), some interesting points have been made in this thread which have shifted my stance on container use depending on the situation. So I don't think it is valid to classify me as 'awfully dismissive'.
FWIW I have not used containers in production (yet) but I'll be more than happy to if I can figure out where and how they can bring me an advantage, which is pretty much how I approach all tools.
That gives me environment separation (A needs Ruby 1.9, B needs Ruby 2.0), resource accounting on a per-app basis, and a repeatable foundation in case I need to re-deploy the server or spin up new instances.
Disk: the technologies used for disk isolation (save chroots) perform very poorly, and in some cases can cause resource contention between what would otherwise appear to be unrelated containers. As an example, using AUFS with Node creates a situation where any containers running on the same file system can only run one at a time, regardless of the number of cores. It's silly. Device mapper, on the other hand, is just plain slow (and buggy, when used on Ubuntu 14.04).
Network: the extra virtual interfaces, NAT, and isolation all come with a performance penalty. For small payloads, this manifests as a few milliseconds of extra latency. For transferring large files, it can result in up to half of your throughput lost. Worse, if you have two Docker containers side by side but, due to your discovery mechanisms, one container uses the host device to talk to the other container, you create what is known as asymmetric TCP, which can cut your performance by a fifth or more. Try it out sometime; it's entertainingly frustrating to figure out.
Security: my favorite. What's the point of creating a container for your application if you're going to include the entire OS (and typically not even bother to update it with security patches)? A real simple DoS on Docker boxes would be to get the process to fill the "virtual" disk with cruft. You'll impact all running processes and the underlying OS (/var/lib/ is typically on the same device as /), and create such a singularly large file that it's usually easier to drop the entire thing and re-pull images instead of trying to trim it down.
Sorry if I sound down on the tech, but I've been fighting to make this work for production, and all of these little niggles are driving me batty.
Docker is fun and great when it's running on your workstation and coddled by your fingers at the terminal, but there's a lot of gotchas and missing parts when it comes to putting things into production, to be taken care of in a hands-off manner. There still isn't an easy way to centralise logs from a container app's STDOUT. Yes, there are other containers you can install to ship logs (which work for the author's use-case, not necessarily yours) or you can hack together something horrible. If you want to look at container logs, you have to have root rights. You can be in the docker group and have full control over the daemon, but the container log location is root only, and is made afresh with every container. (and don't forget to rotate those logs!)
My latest fun with docker is that one of my docker servers, built from the same source image and running on the same configuration plan in ansible as my other docker servers, fails to start docker on boot. Some sort of race condition, I assume. Basically it fails to apply its iptables rules and dies. People talk about making problems go away with docker, but it's a trope in my team that any day I'm working with docker, I'll be spamming chat with problems I'm finding in it from an ops point of view. And I'm just a midrange sysadmin :) But the point is that adding Docker adds an extra layer of debugging. The app stack still needs to be debugged, and now there's an extra abstraction layer that needs debugging.
Plus, in my particular case, there's the irony of using single-function VMs to run a docker container, which is running the same OS version as the VM :) (my devs bought into docker before I arrived...)
I can see some (mostly potential at this point) security advantages but that's about it (and maybe those advantages will be enough to justify the performance overhead but containers are mostly treated as a silver bullet by the adherents and I'd like to see a bit more balance).
No. A container is just a tarball of user-space code run with some isolation. The kernel is still the kernel. Run multiple containers on a machine, and the OS manages all of their processes at once.
And I just verified: you can kill a running process from outside a Docker container. So the OS does see it, and probably can do all its scheduling magic.
How does this perform in practice when they start talking to the outside world at or near capacity? How does it perform when they start talking to each other using some defined interface? (But presumably, no longer regular IPC).
So where one or more active containers could share resources, they won't, which leads to inefficiencies: you'll be running a much larger number of processes than you would otherwise (because of duplication), requiring a larger memory footprint and probably less efficient cache and/or IO utilization.
The deployment of the apps will be easier (which is a definite plus) but machine utilization will be lower and the amount of software running on a single machine will be far larger than otherwise, especially if multiple versions of dependencies are present on the same system.
A container is very much not a single process; it can contain many processes, and some of those processes will likely duplicate components in other containers, but without the resource optimizations that a kernel can normally perform.
Although true, that probably isn't very significant compared to the vast wasted resources of idle dedicated machines. Which is hard to avoid without the vast wasted resources of a highly paid somebod(y|ies).
I also don't quite understand how one can reserve CPU cycles and memory and deliver IO guarantees without the same over-provisioning that you'd have to do using regular virtualization. After all, as soon as you make a guarantee, nobody else can use what's reserved, so in that respect I see little difference between virtualizing the entire OS+app versus re-using the kernel (OK, that does save you the overhead of the kernel itself, but that's not a huge difference unless you run a very large number of VMs on a single machine).
In the event that there ends up being no best-effort resources available on a machine for a significant period of time (because all the user-facing jobs are busy and using their guaranteed resources) Borg will shift the starving batch jobs to other machines that aren't so busy.
Where regular virtualization runs multiple kernels (each in turn running whatever applications you assign to it), containers appear (to me, feel free to correct me) to be a way to share a single kernel across multiple applications, dividing them into domains that are as isolated as possible with respect to CPU, memory, namespace and IO (including network) provisioning, and allowing multiple versions of the same software to be present at the same time without interference.
The CPU, memory and IO provisioning can be thought of as a kind of 'virtualization light' and the namespaces partitioning should (in theory) help to make things a bit harder to mess up during deployment.
Leakage from one container to another will probably put a dent in any security advantages but should (again, theoretically) be a bit more robust than multiple processes on a single kernel with shared namespaces.
So I see them as a 'gain' for deployment but a definite detriment to performance, because it appears to me we get all (or at least most) of the downsides of virtualization. And of course you can expect both virtualization and containers to be used simultaneously in a single installation, with predictable (messy) results.
I'm really curious if there is an objective way to measure the overhead of a setup of a bunch of applications on a single machine installed 'as usual' and the same setup using containers on that same machine. That would be a very interesting benchmark, especially when machine utilization in the container-less setup nears the saturation point for either CPU, memory or IO.
And that's assuming that it'd work exactly the way you're thinking.
I feel like the win over running VMs (which incur something like a 12% overhead compared to both Docker and running right on the machine for a single application), plus flexibility, plus ease of deployment is worthwhile. I mean, the current situation is running VM images anyway, right? This is a step in the right direction over that, even you must admit.
But you've made me curious enough that I'll do some benchmarks to see how virtualization compares to present-day containers for practical use cases faced by mid-size and small companies. My fooling around with this about a year ago led to nothing but frustration, and it's always a risk to argue from data older than a few months in a field moving this fast; more measurements are the preferred way to settle stuff like this anyway.
The Linux kernel does not "lose track" of processes/libs inside containers; they are simply namespaced, like a more extensive chroot environment.
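You can see this from the host on any Linux box (this is standard procfs, nothing container-specific is needed): every process's namespace memberships are plain symlinks under /proc, and a containerized process simply points at different ones.

```python
# Show that namespaces are ordinary, host-visible kernel objects:
# each entry under /proc/<pid>/ns is a symlink naming the namespace
# the process belongs to. A containerized process just has different
# namespace IDs; the kernel schedules it like any other process.
import os

for ns in ("pid", "mnt", "net", "uts"):
    target = os.readlink(f"/proc/self/ns/{ns}")
    print(ns, "->", target)  # e.g. pid -> pid:[4026531836]
```

Comparing these links for a process inside a container (via /proc/&lt;pid&gt;/ns on the host) against the host's own links shows exactly which namespaces differ.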
Of course it does make it easier to package and deploy applications (and to ensure their correct application) but to pretend that there is no cost associated with this is simply not true.
There is also ksmd that is useful with VMs, where memory is at a premium, though I'm not certain it is compatible with lxc yet.
But in practice (at Google-scale, anyway), that's dwarfed by the efficiency gains you can get by squeezing lots of things on to the same machine and increasing the overall utilization of the machine. Prior to adding kernel containers to Borg to allow proper resource isolation between the different jobs on a machine, the per-machine utilization was really embarrassingly low.
Another point to consider is that not all jobs are shaped the same as the machines - some jobs need more memory (so if you put them on a number of dedicated machines adding up to the total amount of memory needed, there will be lots of wasted CPU), and other jobs use a lot more CPU and less memory (so if you put them on a number of dedicated machines adding up to the total amount of CPU needed, there will be lots of wasted memory).
By breaking each job up into a greater number of smaller instances and bin-packing on to each machine, you could take advantage of the different resource shapes of different jobs to get better overall utilization.
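A toy illustration of that bin-packing point (all job names and numbers invented; Borg's real scheduler is far more sophisticated): segregating CPU-heavy and memory-heavy jobs onto their own machines wastes one dimension on every machine, while mixing complementary shapes fills both.

```python
# Toy two-dimensional first-fit bin packing: jobs with complementary
# resource "shapes" (CPU-heavy vs memory-heavy) share machines better
# than like-shaped jobs do. All numbers are invented for illustration.

MACHINE = {"cpu": 16, "mem": 64}  # cores, GiB

def pack(jobs):
    """First-fit: place each job on the first machine with room."""
    machines = []
    for job in jobs:
        for m in machines:
            if (m["cpu"] + job["cpu"] <= MACHINE["cpu"]
                    and m["mem"] + job["mem"] <= MACHINE["mem"]):
                m["cpu"] += job["cpu"]
                m["mem"] += job["mem"]
                break
        else:
            machines.append({"cpu": job["cpu"], "mem": job["mem"]})
    return len(machines)

cpu_heavy = [{"cpu": 12, "mem": 8}] * 4   # lots of spare memory each
mem_heavy = [{"cpu": 2, "mem": 48}] * 4   # lots of spare CPU each

# Segregated: each shape wastes the machine's other resource.
segregated = pack(cpu_heavy) + pack(mem_heavy)
# Mixed: complementary shapes fill both dimensions of each machine.
mixed = pack([j for pair in zip(cpu_heavy, mem_heavy) for j in pair])
print(segregated, mixed)  # mixed needs half the machines
```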
No, you use containers despite the fact that your hardware utilization goes down (mainly because there are no shared pages between applications), because your huge sprawling environment is too hard to change with flag days.
Being able to strictly apportion resources between the different jobs on a machine (and decide who gets starved in the event that the scheduler has overcommitted the machine) means you can squeeze more out of a given server (by safely getting its utilization closer to 100%).
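A toy sketch of that "who gets starved" decision (all names, priorities and numbers are made up; Borg's actual policy, with preemption and resource reclamation, is much richer): hand out real CPU in priority order, and whoever is last in line absorbs the shortfall when the machine is overcommitted.

```python
# Toy version of "who gets starved when the machine is overcommitted":
# sort tenants by priority and grant real CPU until it runs out.

def apportion(capacity, tenants):
    """tenants: list of (name, priority, requested). Higher priority wins."""
    grants = {}
    for name, prio, want in sorted(tenants, key=lambda t: -t[1]):
        got = min(want, capacity)  # grant what's left, at most
        grants[name] = got
        capacity -= got
    return grants

machine_cpu = 16
tenants = [
    ("web-frontend", 100, 10),  # latency-sensitive: gets its guarantee
    ("log-batch",     10, 10),  # best-effort: squeezed when things are busy
]
print(apportion(machine_cpu, tenants))  # batch only gets what's left over
```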
There are other definitions of the word 'container' that are closer to 'virtual machine' and include things like a disk image which is much harder to share, but that's not what's being discussed in the context of Borg. (Not sure about Kubernetes, that's after my time)
I like your main point, however: I would like to know, given that virtualization has X% overhead, what is X?
Perhaps because that's not what happens, the host runs one copy of linux, which namespaces the containers.
Also, possibly a clarification: unless I'm misunderstanding what you mean by "running several Linuxes on a Linux machine", I believe you may be mistaken about the way containers work. Only one Linux is really running: the one the kernel comes from. The other stuff doesn't run unless you tell it to (so, no init, no daemons you don't specify, etcetera). Yes, the image size can be a little fat if you don't trim it down, but you can have a container nearly as small as your code is, if you statically compile, with on the order of just a few bytes of overhead.
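As an illustration of that statically-compiled case (image tags here are examples, and the multi-stage pattern is standard Docker, not anything specific to the setups discussed above): a static Go binary shipped in a `FROM scratch` image carries no distribution at all, just the binary, with the only running Linux being the host's kernel.

```dockerfile
# Multi-stage build: compile a static binary, then ship it alone.
FROM golang:1.22 AS build
WORKDIR /src
COPY main.go .
RUN CGO_ENABLED=0 go build -o /app main.go

# "scratch" is the empty image: no shell, no init, no daemons.
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The resulting image is essentially the size of the binary itself.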
you no longer have an OS in the traditional sense.
you just don't get the debugging stuff (which is okay as long as you can choose)
it does reduce the attack surface/amount of things.
The future of computing is not this horrible kludge.
The problem containers are trying to solve is not isolation of environments (though, that's a tremendously good outcome).
The problem is how do you develop microservices that are self-contained, easily composable and inherently portable. It doesn't matter if you're running on Java 1.1 on AIX 3.0 or Node on Ubuntu 26, if I have a ball of computing providing a service in Dubai, and want to move it to Ireland to take advantage of computing space that just opened up, containers make that trivial (and about a million other scenarios).
But I sort-of agree in that we're just starting to make the transition from whole-os VMs to app containers, with a rare few going further. But cgroup/jails type isolation is lightweight enough that we can easily apply it at a much finer-grained level.
Google has 40 programmers dedicated to that project. It's still very beta, btw, all programmed in Go.
There's also Mesos, and I think you can use both in tandem since they're targeting different things.
Anyway, if anybody is doing or thinking about containers, check Kubernetes and Mesos out. Also, of course, Docker and Rocket. Kubernetes officially supports Docker and will be supporting Rocket.
There are also articles about how rump kernels are better than containers. Just FYI.
I know that containers are faster because they don't virtualize the hardware; however, that comes at the cost of security.
I tried to address it here:
But in summary, the basic thing that containerized software offers is inherent portability and simple composition. VMs aren't going anywhere; running a container on top of them just makes everything more powerful.
But FreeBSD has had jails since back in the day. What's the benefit of containers over BSD jails?
I mentioned earlier - https://news.ycombinator.com/item?id=9570639 - but the biggest thing is not isolation, it's portability. You do not care what you're running on, just that you have a certain amount of disk, CPU and network available to you, and the rest you handle yourself. That gives your organization a crazy amount of flexibility in moving that ball of stuff around.
Does Kubernetes provide those guarantees to the containers? And, if so, what does it do once those guarantees can no longer be met, say, due to neighbor growth?
Additionally, a hardlink solution (as you point out) really doesn't sound too outrageous to me. It would be a trivial tool to write for managing immutable things like binaries. Or is there some trouble I'm failing to see (possible)?
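Something like the following (paths and the ELF stand-in bytes are made up for the demo): byte-identical immutable files collapse to a single inode, so N containers referencing the same binary cost the disk space of one.

```python
# Toy hardlink-based dedup for immutable build artifacts: replace
# byte-identical files with hardlinks to a single copy.
import hashlib
import os
import tempfile

def dedup(paths):
    """Hash each file; link duplicates back to the first copy seen."""
    seen = {}
    for p in paths:
        with open(p, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in seen:
            os.remove(p)
            os.link(seen[digest], p)  # hardlink to the first copy
        else:
            seen[digest] = p

d = tempfile.mkdtemp()
a, b = os.path.join(d, "v1/bin"), os.path.join(d, "v2/bin")
for p in (a, b):
    os.makedirs(os.path.dirname(p))
    with open(p, "wb") as f:
        f.write(b"\x7fELF...same binary...")

dedup([a, b])
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True: one inode, two names
```

The catch, of course, is that hardlinks only work within one filesystem and only help when files really are bit-identical.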
> THIS FILE SYSTEM TYPE IS NOT YET FULLY SUPPORTED (READ: IT DOESN'T WORK) AND USING IT MAY, IN FACT, DESTROY DATA ON YOUR SYSTEM.
I think this is the case due to the granularity of workloads, and what appears to be a continuum from metal > VMs > containers > lambdas (as first-class workloads).
True, Lambda is DEFINITELY an advance, but it's more like the salmon at a buffet that includes steak and chicken. Some applications will need just the ability to run code (Lambda), some will need defined environments (containers) and some will need total isolation (VMs/bare metal). You'll see a mix of all of these in every mature environment; some things just do not fit. For example, it's super unlikely that you'd be able to run a trading app on Lambda (it needs 10GB/sec of direct network access & memory); similarly, it'd be totally unnecessary to run a thumbnail processor on bare metal (though you could, of course).