Video: https://www.youtube.com/watch?v=1qUQsZwbu6s
Edit: I'm not one of the coauthors, since I left the team in April 2019.
* Borg is primarily for managing hardware resources. K8s, in comparison, is for managing applications. I often call them complementary twins: both are container-based cluster managers, but their focuses are opposite. I also use "machine oriented" for Borg and "application oriented" for K8s.
* Borg emphasizes scheduling capabilities, performance, and scalability, along with integration capabilities for wiring together the baseline software foundation and interfacing with hardware. For example, SDN, security, and the name service all need to hook into Borg to interact with applications.
* K8s emphasizes modeling application semantics and how to map them onto containers. K8s provides abstractions for onboarding apps and a lot of tooling.
* K8s could be considered a layer on top of Borg, if it were ever incorporated into Google production.
* As of now, there is nothing in sight that can be considered a next-generation Borg, just as there isn't a next-generation Linux. Software like Borg and Linux can only be supplanted, not succeeded; their vast capabilities defy succession. In other words, if one wants to build a better Borg (or Linux), they are doing it wrong. K8s did not try to best Mesos, OpenStack, or Borg per se; it's just a different thing.
Edit: There are numerous differences between Borg and Kubernetes. Send an email to the address listed in my profile if you are really into the gory details; we can set up a discussion.
I think it's sad that 99% of the resources are used by hogs, though. I always thought it'd be neat to build on tiny containers; this suggests that you really can be fine with a few giant tenants and minimal colocation.
sigh some day I'll get to design my dream Borg successor, even if it's a TempleOS-like art project after I've descended into dementia.
Borg is vast. Unimaginably vast. Any engineer can start a job in the free tier for a personal web server in minutes, all the way up to saturating 100,000 cores to process some dataset. You can browse cluster job lists forever and never reach the bottom. 1% of jobs is still a huge number of different jobs.
The "hogs" are going to be jobs like web search serving, indexing and related, logs processing etc.
The "mice" are going to be the long tail of jobs. Remember that this sort of paper presents Borg as a kind of exemplar of what large cluster systems look like, but as you already observe, that's not really true. Borg is really unique to Google and Google is really unique. In particular due to the personalities of its founders and its financial position Google has a massive long tail of products and web sites that don't see much usage, or see decent usage by non-Google standards but which gets lost in the noise of search, ads, YouTube etc.
So the hogs vs mice phenomenon is telling us more about the nature of Google and how it does projects than something fundamental about job scheduler systems.
We might have something to show.
It seems that K8s takes networking under its purview; Borg does not, and leaves it to the client (see another comment here).
K8s also implements StatefulSets for storage.
I recently got back onto a project that had been in maintenance mode for a year, but because it had a nice K8s setup, it had more or less managed itself through OS upgrades and rebuilds with very little user intervention.
However, the developer experience is hugely different, both in the things that Kubernetes does have (Configuration Canaries) and the things it doesn't have (Global Load Balancing). There are patterns and hacks around most of these, but they're often created by developers who didn't understand the tradeoffs.
Most importantly: Borg is very well sanded. Borg was a production-ready system ten years ago, and while it's had to grow and change, there are many things that "work together" nicely in the Google ecosystem that just don't yet outside of it. Partially because the outside world supports many more alternative implementations that don't quite share a standard API, but mostly because there are rough spots in how you deal with marginal/edge cases that haven't yet been found and had documentation written on how to avoid them.
K8s was meant to fix Borg's broken application management semantics and tooling.
As a scheduler, though, Borg is much more powerful in capabilities, scalability, and performance. It was designed to react to different workload demands in huge clusters very quickly. Imagine a data center going offline and needing to spin up the search engine while killing a MapReduce with 20k workers. A Borg cluster is a war zone; priorities and quotas win.
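As a toy illustration of "priorities and quotas win" (made-up types and numbers, nothing like the real scheduler internals): when a high-priority job arrives in a full cell, lower-priority tasks get preempted until it fits.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a running unit of work occupying some CPU share.
// All names and numbers here are illustrative.
type Task struct {
	Job      string
	Priority int // higher wins
	CPUs     int
}

type Cell struct {
	Capacity int
	Tasks    []Task
}

func (c *Cell) used() int {
	total := 0
	for _, t := range c.Tasks {
		total += t.CPUs
	}
	return total
}

// Schedule places t, preempting lower-priority tasks if the cell is full.
func (c *Cell) Schedule(t Task) bool {
	// Consider the lowest-priority tasks as victims first.
	sort.Slice(c.Tasks, func(i, j int) bool {
		return c.Tasks[i].Priority < c.Tasks[j].Priority
	})
	for c.used()+t.CPUs > c.Capacity {
		if len(c.Tasks) == 0 || c.Tasks[0].Priority >= t.Priority {
			return false // nothing left worth evicting
		}
		fmt.Printf("preempting %s (prio %d)\n", c.Tasks[0].Job, c.Tasks[0].Priority)
		c.Tasks = c.Tasks[1:]
	}
	c.Tasks = append(c.Tasks, t)
	return true
}

func main() {
	cell := &Cell{Capacity: 20000}
	// A big MapReduce fills the cell at batch priority.
	cell.Schedule(Task{Job: "mapreduce", Priority: 100, CPUs: 20000})
	// A datacenter fails over; serving has to come up right now.
	fmt.Println("scheduled:", cell.Schedule(Task{Job: "websearch", Priority: 200, CPUs: 15000}))
}
```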
CSIs exist because there are a billion vendors of "cloud storage" solutions, and apps need a way to decouple themselves from the details. If you just need a block device, your Kubernetes app can run against anything that implements that API; an app you developed on DigitalOcean will Just Work on GKE. That's the idea behind those.
For workload management, Borg feels very close to StatefulSets; your replica count could increase and decrease as you demanded it, but each task was individually named (0.your-job.cell). I ran all my jobs with semantics similar to Deployments (tasks did not care which id they had; they all showed up as load balancer targets) and I think Borg had different abstractions for things in Kubernetes like ReplicaSets (i.e., a particular [config,code] tuple that is running in production that logically makes up a deployment).
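To make the stable-identity point (and the CSI decoupling above) concrete, here's a sketch using the Go types from k8s.io/api; every name in it ("your-job", the image, the "standard" storage class) is made up. A StatefulSet gives each replica a fixed name (your-job-0, your-job-1, ..., the rough analogue of 0.your-job.cell) plus its own volume claim, which any CSI-backed StorageClass can satisfy.

```go
package main

import (
	"encoding/json"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	replicas := int32(3)
	className := "standard" // whatever CSI-backed class the cluster offers
	labels := map[string]string{"app": "your-job"}

	ss := appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "your-job"},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "your-job",
			Replicas:    &replicas,
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "server",
						Image: "example.com/your-job:latest",
						VolumeMounts: []corev1.VolumeMount{
							{Name: "data", MountPath: "/var/lib/your-job"},
						},
					}},
				},
			},
			// One PVC per replica: data-your-job-0, data-your-job-1, ...
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					StorageClassName: &className,
					AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
					// Note: this field's type is ResourceRequirements on k8s.io/api before v0.29.
					Resources: corev1.VolumeResourceRequirements{
						Requests: corev1.ResourceList{
							corev1.ResourceStorage: resource.MustParse("10Gi"),
						},
					},
				},
			}},
		},
	}

	out, _ := json.MarshalIndent(ss, "", "  ")
	fmt.Println(string(out))
}
```

A Deployment, by contrast, hands out random hashed pod names and shares nothing per replica, which matches the "tasks did not care which id they had" style described above.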
You didn't ask so I won't go into the details as it gets very complicated, but networking is vastly different. (Kubernetes tries to provide IPAM and connection-balancing that is transparent to workloads; Google preferred client-side "smart" load-balancing libraries. That meant that load-balancing complexity was outside the scope of Borg, but for Kubernetes, it's very much in scope. Often with confusing results, but I digress.)
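For flavor anyway, the client-side "smart" approach boils down to something like this toy sketch (not Google's actual RPC stack; real implementations, such as gRPC's round_robin policy, add health checking and dynamic backend lists from a name service):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer is the heart of the "smart client" style: the client library,
// not the network, owns backend selection. Addresses are illustrative.
type Balancer struct {
	backends []string
	next     atomic.Uint64
}

// Pick returns the next backend, round-robin.
func (b *Balancer) Pick() string {
	n := b.next.Add(1)
	return b.backends[(n-1)%uint64(len(b.backends))]
}

func main() {
	lb := &Balancer{backends: []string{
		"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080",
	}}
	for i := 0; i < 5; i++ {
		// In real code this address would feed an RPC dial, with retries.
		fmt.Println("request", i, "->", lb.Pick())
	}
}
```

Kubernetes instead gives every Service a virtual IP and rewrites connections in the dataplane (kube-proxy with iptables or IPVS), so the workload itself can stay dumb.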
My experience, having used both extensively, is that they are about the same to the end user. Before I worked at Google, I always let "someone else" handle deployment and maintenance of production. At Google, I felt like I could write and release a piece of software to Borg in the same day (and did, once!). After Google, I struggled with a bunch of orchestrators until I landed on Kubernetes, which gives me similar confidence and ease of use. Kubernetes has more API surface, probably because it has more users. At Google you could declare by fiat how jobs were to be run. In the real world, people aren't going to use your thing if you don't support their favorite feature. I have tried the opinionated container orchestrators in the real world (Heroku, Convox, ECS) and they didn't make me happy, while Kubernetes did.
To that extent, there's a hole left for things that need to manage state rather than just run networked API services. Kubernetes seems to have no good story here, whether it's the evolutionary progress happening around StatefulSets/PVC/PV, or a per-appliance operator for you, and you, and you, each punching a hole as big as you like in the scheduling abstraction. Streaming, for example, is a notable pain, but pretty much every OSS project created in this century that has state needs a compensating tool, typically an operator, to function on Kubernetes. I'm not even sure at this point whether state is a design consideration that can be retrofitted. That's not a criticism of Kubernetes, but it is a complication and an investment factor for stateful workloads. So if Kubernetes does in fact end up being a long-term technology, what may happen is that either the entire software industry offloads state management to vendors (i.e., to a handful of cloud services), or something in open source reacts and is created to fill the infrastructure gap for state management.
In my personal opinion it is more valuable to adopt the Google model where no local file is considered critical, than it is to try to cram statefulness into an otherwise cloud-native stack. I feel that if you still care about specific files on specific disks then you really haven't fully adopted the meaning of cloud-native.
Jobs can write to local disk. Where do you think their executables came from? They can also use local disk as a scratch space for various things. The main use was writing out logs. Logs get written then picked up by a co-located job and saved out to disk clusters like Colossus, where they are then in turn picked up for processing.
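The first hop of that pipeline is simple enough to sketch (a toy with made-up paths; the real saver shipped to Colossus, not another local directory):

```go
package main

import (
	"io"
	"log"
	"os"
	"path/filepath"
	"time"
)

// sweep ships rotated log files from the task's scratch dir to durable
// storage; a second directory stands in for the cluster filesystem here.
func sweep(scratch, durable string) {
	entries, err := os.ReadDir(scratch)
	if err != nil {
		log.Print(err)
		return
	}
	for _, e := range entries {
		// Only ship rotated (closed) files; the live log stays put.
		if e.IsDir() || filepath.Ext(e.Name()) != ".done" {
			continue
		}
		src := filepath.Join(scratch, e.Name())
		if err := copyFile(src, filepath.Join(durable, e.Name())); err != nil {
			log.Print(err)
			continue
		}
		os.Remove(src) // scratch space is transient by design
	}
}

func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}

func main() {
	for {
		sweep("/tmp/task-logs", "/tmp/durable-logs") // illustrative paths
		time.Sleep(10 * time.Second)
	}
}
```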
However it's true that local disk was considered to be used only for transient data, most of the time. You could configure Borg to not work like that, so you could certainly run MySQL clusters on it and things, but that was a very unusual setup and not at all recommended. After all, then you'd have to manage backups and machine failure yourself.
This practice worked amazingly well when writing software entirely in house. So it was great for things like web search, where there was no open source stack waiting to be adopted. It was disastrous for backwards compatibility with pre-written software, and I think this sort of thing contributed to Google's notoriously strong NIH syndrome. Using open source software at Google is hard; there are processes around it that must be satisfied, but more importantly, that software will expect POSIX file APIs to actually work, and on Borg they don't. They appear to work, right up until your job gets evicted and then the data goes poof. To solve this you need to work with files using Google's own proprietary VFS APIs that are backed by RPC clustered storage. GFS, and then Colossus, presented an FS API that was only vaguely related to POSIX, so you couldn't even just hack together a quick bridge.
Net/net, it was worth importing small in-memory libraries from the open source world, very rarely, but anything more substantial like a server - forget it. This could lead to absurd outcomes. I worked on a project there many years ago that would have really benefited from using Apache to do file serving, but Apache didn't integrate with Borg, so I ended up using "static GWS" as it was called at the time. But static GWS had just enough features to serve websites written exactly the way Googlers wrote them and nothing more, so it was a total fail at serving a third-party-developed website we'd agreed to host. Much pain.
One of the problems with Docker and container architectures in my mind is that they ultimately evolved out of Borg. The cgroups work in the kernel and other kernel features were developed by Paul Menage and others on the Borglet team for Google's internal use. Then the Docker guys picked them up and created this notion of containers that contain a whole OS - not at all how Google does it - but they kept the notion of transient local storage. Well that's a disaster, because normal software expects local disk to stick around. In my post-Google career I have witnessed more than one disaster caused by Docker of the form "whoops, we misconfigured our Dockerfiles and just erased our private key". There's no good justification for that. For most people systemd with some simple use of static linking would work just as well. That's what Borglets used to do - set up cgroups, have a simple base OS and then run statically linked binaries.
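That recipe really is about that small. A sketch of the core moves, assuming a Linux host with cgroup v2 mounted at /sys/fs/cgroup, root privileges, and a hypothetical statically linked workload:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	// Create a cgroup and cap its memory and CPU; limits are illustrative.
	cg := "/sys/fs/cgroup/demo-task"
	must(os.MkdirAll(cg, 0755))
	must(os.WriteFile(filepath.Join(cg, "memory.max"), []byte("256M"), 0644))
	must(os.WriteFile(filepath.Join(cg, "cpu.max"), []byte("50000 100000"), 0644)) // 0.5 CPU

	// Move ourselves into the group; the exec'd workload inherits it.
	must(os.WriteFile(filepath.Join(cg, "cgroup.procs"),
		[]byte(strconv.Itoa(os.Getpid())), 0644))

	// Replace this process with the statically linked binary. No image,
	// no bundled OS, and the machine's real filesystem underneath.
	bin := "/opt/tasks/my-static-server" // hypothetical path
	fmt.Println("exec", bin)
	must(syscall.Exec(bin, []string{bin}, os.Environ()))
}
```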
Just FYI, a lot changes in 5 years.
But there are big architectural differences too. Borg can split its master across multiple machines, for example. The scheduler is a very tightly optimised piece of code. Borg clusters are managed by a central, dedicated, specially trained team, so there's basically no limit to the operational complexity that a Borg cluster can take on as long as they're all consistent - which they are.
Early on (changing somewhat now), large scale and throughput were not big goals for K8S. Given most uses are with VMs, and pretty much all uses are tiny compared to Borg, this was a totally reasonable prioritization.
K8S fixes many, many serious issues in Borg that are mostly unfixable at this point (I worked on Borg for many years); it's better in many ways.
For scale specifically, Borg has a big advantage in that the state database is internal (the master maintains it directly on local disk). This is naturally much faster than using etcd (at least etcd as it is today).
It’s been around a long time, before Go existed :).
- The Borg are a fictional race of aliens in Star Trek: The Next Generation, first shown on TV in 1989. This appears to be the basis for the software projects' naming.
- Borg, Google's cluster management system, was introduced at some point prior to 2006 (per a publication from 2016 which states Borg was introduced "over a decade ago").
- Borg, the backup system, was forked from Attic in 2015.
Borg @ Google predates Borg the backup utility by several years.