Kata Agent: https://github.com/kata-containers/agent
Kata Shim: https://github.com/kata-containers/shim
Kata Proxy: https://github.com/kata-containers/proxy
KSM Throttler: https://github.com/kata-containers/ksm-throttler
And some forks to provide for their necessities, I suppose, as:
Linux Kernel: https://github.com/kata-containers/linux
docker run --pids-limit=64
e.g. the entire testing repo is empty
I would suggest we are looking at a conference driven development release
The idea of treating containers as secure and isolated as VMs is enticing for non-ephemeral services. Are these strictly tuned to exploit intel Hardware features or would they consider supporting the equivalent features in say AMD?
On the other hand, isn't this the realm of mainline distributions like RHEL, Debian and the like? To support such isolation facilities. I always thought clear Linux was a Intel playground for proof-of-concept which will eventually be up streamed to major Linux distributions.Is it not true?
I guess my question is why a separate project like this, instead of RedHat Enterprise Containers or Debian containers?
At least Debian doesn't develop isolation solutions on its own; it tends to package software that's already out there. And if it's popular enough, it might be integrated fairly tightly into the distribution.
From someone who works at big name companies: this should not impress anyone. Big companies love slapping their name on things that give them "innovation" credibility. It's like Pepsi sponsoring the X Games.
> why a separate project like this
This is an OpenStack project, so it's not vendor-specific. It's also supposed to be a new "standard container", which I highly doubt will happen because they're just slapping together two other projects.
Are you saying that security and isolation is not enticing for ephemeral services? I know that an ephemeral container is reset after a restart but I think that it's a bit naive to think that that is a good enough replacement for true isolation.
Didn't mean to imply the reverse logic of my statement. I believe Linux Containers (and hence Docker) depend only on Kernel namespaces to provide isolation. In my admittedly naive eyes, they were not good enough/mature to replace my KVM VMs yet. Too much to trade off for little convenience/performance.
However, if Linux containers matured up and offered the same isolation facilities that something like KVM does, then I can think about switching to them in future, and enjoy the performance boost.
>I know that an ephemeral container is reset after a restart but I think that it's a bit naive to think that that is a good enough replacement for true isolation.
If I'm looking to run an application for which I care about solid isolation of resources, I'd spend my time running it as VM. But, if I'm running a one-time script that chews some data and I don't care much about it bothering other workloads in the system or other workloads bothering it, then I'd fall back on the isolation facilities offered by namespaces by using Containers. Nothing wrong with that.
Security view on these is another argument. If I can't afford the application escalating it's view and looking into other workloads in that system, I just wouldn't run them in Containers today.
However, what you get is indeed a virtual machine. It is simply impossible for "real" containers to provide the same isolation as virtual machine, simply because the attack surface is that of the shared kernel; a hypervisor presents a much more constrained interface to a VM than the full kernel, even if you add QEMU to the mix.
So it's true that, as in the famous Theo de Raadt rant, virtualization overall adds to the attack surface compared to containers. But it would also be stupid to ignore that it also introduces very important bottlenecks: to get to the hypervisor, you have to break KVM which is only ~60000 lines of code; getting remote code execution in QEMU might be easier, but then you also have to break the host kernel from a process that has access to almost nothing on the system.
There is also vhost, which is implementing PV drivers in the host kernel. This however is also a small amount of code, and it is generally used only for networking and AF_VSOCK sockets.
RedHat / Suse / Ubuntu / $Vendor take the upstream project, tidy it a bit, package it, get it integrated in their ecosystem, and add an easy installer.
Having it in a vendor neutral foundation means that all the vendors can colaborate, and not have one group with a massive advantage or complete control over a roadmap.
I didn't mean to undermine the work that goes into turning a project like this, OpenStack, Kubernetes, Cloud Foundry etc into a real product that users can download and install on random hardware in random configurations, and get a working system - it is a ton of work, and is massively important for getting actual users to install what are very complex distributed systems.
Docker became popular because it was pretty easy to use, and to publish and reuse existing containers. Whatever competes with it only stands a chance if it can either reuse the existing container ecosystem, or offer something roughly as good.
.io is the TLD for the British Indian Ocean Territory, technically speaking.
They also said the same about Xen, that a special purpose microkernel was a better choice than Linux as a hypervisor...
Xen was successful because it was innovative, and because it worked around the fact that x86 was not virtualizable at the time. But after ten years of healthy competition, the only reason to prefer Xen to KVM would be things like QubesOS.
Generally, treating any OS-level technology as a silver bullet is a huge mistake. Any serious developer would make multiple levels of security that _should_ be sound.
While not applicable to FreeBSD alone, this polemic thread:
is a pretty accurate description of container level security and not much has changed. Stuff built on a foundation is always subject to the foundation's qualities.
Jails are secure. As are SmartOS zones. Whoever you heard that there are “many instances of breaking out of a jail” from is full of sh47. And you would be wise to never listen to them ever again. No really, EVER.
And no, breaking the ps4 was not a jail exploit. The attacker already had elevated privileges. So you would be sunk no matter what.
But when we say "elevated privileges" are we talking root inside of a jail? Because if that breaks jails, then a large class of Docker exploits also wouldn't classify as 'exploits' under that criteria. One of the biggest problems with Linux namespaces is the band-aid put over root, via capabilities.
As far as I know, though, the PS4 exploit was more Sony's fault. IIRC, they broke out of the jail by exploiting custom syscalls not in stock FreeBSD. Bugs in syscalls in FreeBSD aren't unheard of though, even if less commonly found than Linux.
My entire point is that good security implies not treating any solution as a panacea, lest you find yourself in a digital Titanic scenario. Multiple layers of solid security beats one layer of solid security.
More likely, there are fewer people looking for vulnerabilities in BSD than in Linux.
>less commonly found
rather than less common. Impossible to know with 100% certainty what's literally less common.
If I had to guess, I'd guess FreeBSD had less bugs in general, just because the surface is generally smaller, and the system is more homogeneous.
hn discussion: https://news.ycombinator.com/item?id=10093862
Here are the first two that pop up if you google his name.
He gave a talk at DTrace conf 2016 about all the security vulnerabilities he personally found in DTrace in SmartOS. Here are the slides: http://slides.com/benmurphy/deck
but I can't find a mention of Hyper-V anywhere (which doesn't mean there was no inspiration). Maybe you confused Hyper runv and Hyper-V here (the naming certainly doesn't help)?
So put more finely: containers are not secure, anywhere. Virtual machines are. So you should run your containers inside virtual machines if security is important to you. Environments that can't run containers natively are forced into the more secure configuration.
If you're interested in the ongoing work to make containers more secure, Jessie Frazelle has very clear posts on the subject . The Bubblewrap project also has a great summary of various approaches being used to "jail" container processes properly. 
These were both in Illumos found by the person you are replying to.
Can you (or someone else) ELI5 what makes containers insecure? Not a low level Linux or security expert.
Linux containers run a new environment on top of the host's kernel. It's the same kernel in one container as another and the same as in the host. If you manage to break out of the namespace or otherwise exploit the kernel, you're already in some other container's business. Worse, there's a good chance you've exploited the kernel in a way that you can get all the other containers and the host all at once with one exploit.
That's not usually the case, though.
With containers, you get the kernel exploit, and you're in, for the most part.
In what sense is this "OCI compatible"? Do they implement the runtime, image format spec, or both? My understanding of containerization and OCI runtimes is that they're fundamentally different from hardware-level virtualization.